Dynamic selection of device type

farshad.akhbari · March 18, 2020, 3:05pm

I have multiple kernels which are spawned in a flow graph on multiple devices. For load-balancing reasons, these kernels need to run on different devices, as decided by an upper layer. Is there an example of how to select a device before a SYCL queue is set up? Figuratively, I am looking for this kind of flow:

if (cpu_device) {
  cl::sycl::cpu_selector device_selector;
} else if (gpu_device) {
  cl::sycl::gpu_selector device_selector;
} else {
  cl::sycl::default_selector device_selector;
}

cl::sycl::queue sycl_queue(device_selector);

Thanks in advance.

duncan · March 18, 2020, 4:26pm

Hi @farshad.akhbari, the simple answer is that the selectors are part of an inheritance hierarchy so you could use an std::unique_ptr<sycl::device_selector> to store whatever selector you eventually choose. That said, it is almost always better to create queues once on application startup and reuse them if possible, so I would recommend creating and caching a queue per-device when initialising your graph, then submitting work based on what device your upper layers require.

If you must create a queue each time, which I would strongly recommend against, you could also use a custom selector, like in some sample code we have on GitHub.

I hope this helps,
Duncan.

farshad.akhbari · March 18, 2020, 4:46pm

Hi Duncan,

The intent is really not to recreate queues but to set them up in the object’s constructor (the flow graph object, that is). I think the solution I am looking for is to use the std::unique_ptr, but I am not sure how it should be used properly. I see a null pointer at runtime when I try this. Is there a working example I can compare mine against?

Thanks,

Farshad.

duncan · March 18, 2020, 5:11pm

You could do something like:

// Hold the selector through a pointer to the base class (needs <memory>).
std::unique_ptr<sycl::device_selector> selector =
    std::make_unique<sycl::default_selector>();
if (use_cpu_device) {
  selector = std::make_unique<sycl::cpu_selector>();
} // else if (...) { /* and so on */ }

That said, if you’re putting that much logic into it, you’re honestly as well wrapping it all into a custom selector, much like the sample shows.
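As a rough illustration of what wrapping the logic in a custom selector means: in SYCL 1.2.1 a selector subclasses sycl::device_selector and overrides operator() to return an integer score for each device; the highest score wins, and a device with a negative score is never chosen. The sketch below is not the linked sample. It uses hypothetical stand-in types (`Device`, `KindSelector`, `select_device`) so that it builds without a SYCL toolchain, and it only mirrors the scoring mechanism.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for sycl::device, reduced to what the sketch needs.
enum class DeviceKind { Cpu, Gpu, Accelerator };
struct Device {
  std::string name;
  DeviceKind kind;
};

// Mirrors the shape of a SYCL 1.2.1 selector: operator() scores one device,
// the highest score wins, and a negative score rules the device out.
class KindSelector {
 public:
  explicit KindSelector(DeviceKind wanted) : wanted_(wanted) {}
  int operator()(const Device& dev) const {
    return dev.kind == wanted_ ? 100 : -1;
  }

 private:
  DeviceKind wanted_;
};

// Stand-in for what the runtime does with a selector: score every device and
// return the best one. This sketch returns nullptr when every score is
// negative; a real SYCL runtime reports that case with an exception instead.
const Device* select_device(const std::vector<Device>& devices,
                            const KindSelector& selector) {
  const Device* best = nullptr;
  int best_score = -1;
  for (const Device& dev : devices) {
    const int score = selector(dev);
    if (score > best_score) {
      best = &dev;
      best_score = score;
    }
  }
  return best;
}
```

The upper layer's decision then becomes a constructor argument, for example `KindSelector(DeviceKind::Gpu)`, instead of a chain of if/else branches at the point where the queue is built.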

farshad.akhbari · March 18, 2020, 5:56pm

You mean the unique_ptr below? I am not certain that sycl::queue can dynamically find the right constructor:

error: no matching constructor for initialization of ‘cl::sycl::queue’
cl::sycl::queue sycl_queue(device_selector);
                ^          ~~~~~~~~~~~~~~~
/CL/sycl/queue.hpp:29:12: note: candidate constructor not viable: no known conversion from ‘std::unique_ptr<cl::sycl::device_selector *, std::default_delete<cl::sycl::device_selector *> >’ to ‘const cl::sycl::property_list’ for 1st argument
explicit queue(const property_list &propList = {})

duncan · March 18, 2020, 6:30pm

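It behaves just like a pointer, so you can dereference it to get the selector itself. This assumes the pointer is declared as std::unique_ptr<sycl::device_selector>, so that it points at the selector object directly rather than at another pointer: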

sycl::queue queue(*selector);
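Putting the pieces of this exchange together, the whole flow might look like the sketch below. It is an illustration under stated assumptions, not verified SYCL code: `DeviceSelector`, `CpuSelector`, `GpuSelector`, `DefaultSelector` and `Queue` are hypothetical stand-ins for sycl::device_selector, its subclasses and sycl::queue, so that the example builds without a SYCL toolchain, and `DeviceKind`, `make_selector` and `make_queue` are names invented here.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for sycl::device_selector and its subclasses.
struct DeviceSelector {
  virtual ~DeviceSelector() = default;
  virtual std::string target() const = 0;
};
struct DefaultSelector : DeviceSelector {
  std::string target() const override { return "default"; }
};
struct CpuSelector : DeviceSelector {
  std::string target() const override { return "cpu"; }
};
struct GpuSelector : DeviceSelector {
  std::string target() const override { return "gpu"; }
};

// Stand-in for sycl::queue: like the real constructor, it takes the selector
// by const reference, not by pointer.
struct Queue {
  explicit Queue(const DeviceSelector& selector) : target(selector.target()) {}
  std::string target;
};

enum class DeviceKind { Default, Cpu, Gpu };

// The pointer's element type is the base class, so any subclass fits and the
// virtual destructor cleans up correctly.
std::unique_ptr<DeviceSelector> make_selector(DeviceKind kind) {
  switch (kind) {
    case DeviceKind::Cpu: return std::make_unique<CpuSelector>();
    case DeviceKind::Gpu: return std::make_unique<GpuSelector>();
    case DeviceKind::Default: break;
  }
  return std::make_unique<DefaultSelector>();
}

// Dereference the smart pointer to hand the queue a reference.
Queue make_queue(DeviceKind kind) {
  const std::unique_ptr<DeviceSelector> selector = make_selector(kind);
  return Queue(*selector);
}
```

Two details matter here. The template argument of the smart pointer is the base class itself, not a pointer to it, and the queue is built from `*selector` rather than from the smart pointer, which is the mismatch the compiler error in this thread complains about.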

farshad.akhbari · March 18, 2020, 8:17pm

Duncan,

I need a complete solution. This is a serious DX issue. Can you send me a working sample code after verification?

Regards,

Farshad.

duncan · March 18, 2020, 9:25pm

I’ve thought a little about your situation and tried to come up with a different approach. If you are using every device on the system, I imagine it is easier to enumerate them and create a queue directly from each one, for use in this lower layer.

auto platforms = sycl::platform::get_platforms();
std::vector<sycl::queue> queues;
for (auto plat : platforms) {
  for (auto dev : plat.get_devices()) {
    queues.push_back(sycl::queue{dev});
  }
}

This will give you a vector of queues which you can distribute among these lower layers.

Otherwise, the selector example I linked in a previous reply will show you how to encapsulate the logic of which device to choose inside a single class that you can use in your lower layers.

farshad.akhbari · March 18, 2020, 10:46pm

Oh my lord!!

There is a known platform, and the CL capabilities of the platform are already known. The dilemma is allocating queues as the upper layer spawns work. Since we are already at runtime, querying devices at this point is very costly, because SYCL kernel execution is halfway down the pipeline, unless I move the query to the very beginning, which I can do (though that is a different challenge). At the FG point of entry, I need to send kernel A to the CPU, kernels B, C and D to the GPU, and another kernel to an FPGA device. I hope there is a solution that would help me set up the proper devices and their respective queues with minimal complexity. If there is setup and initialization time, I can move it up the stack to avoid added latency.

Given the above info, any pointers?

Regards,

Farshad.

duncan · March 19, 2020, 7:44pm

Yes, that’s helpful. Queue creation is definitely one of the slower steps in a SYCL program from my experience. If you know exactly which devices will be executing the work, I would recommend moving the queue construction earlier. Could you store the queues as members of the lower layers? That way the queue will be ready to submit work to at the points you need it, and since you have a constrained number of devices to begin with it shouldn’t be too hard to do. I don’t think we’ve got any larger samples showing multiple device queues.
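One possible shape for storing the queues as members is sketched below, assuming a fixed set of three device kinds as in the CPU, GPU and FPGA scenario described above. Everything in it is hypothetical: `Queue` is a stand-in for sycl::queue that only counts submissions, so the example builds without a SYCL toolchain, and `FlowGraph`, `DeviceKind` and `queue_for` are names invented for illustration.

```cpp
#include <map>
#include <string>
#include <utility>

enum class DeviceKind { Cpu, Gpu, Fpga };

// Hypothetical stand-in for sycl::queue; submit() only counts calls.
struct Queue {
  explicit Queue(std::string name) : device_name(std::move(name)) {}
  void submit() { ++submitted; }
  std::string device_name;
  int submitted = 0;
};

// Owns one queue per device kind. All queues are built in the constructor,
// so the slow setup happens once, before any work is dispatched.
class FlowGraph {
 public:
  FlowGraph() {
    queues_.emplace(DeviceKind::Cpu, Queue("cpu"));
    queues_.emplace(DeviceKind::Gpu, Queue("gpu"));
    queues_.emplace(DeviceKind::Fpga, Queue("fpga"));
  }

  // Cheap lookup at dispatch time; throws std::out_of_range for a kind that
  // was never registered.
  Queue& queue_for(DeviceKind kind) { return queues_.at(kind); }

 private:
  std::map<DeviceKind, Queue> queues_;
};
```

At the flow graph's point of entry, kernel A would then go to `queue_for(DeviceKind::Cpu)`, kernels B, C and D to `queue_for(DeviceKind::Gpu)`, and the remaining kernel to `queue_for(DeviceKind::Fpga)`, with no device query or queue construction on that path.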
