Direct initialization of device memory for sycl::buffer

Hi,
I’m experiencing some conflicting behaviour with sycl::buffers.
According to the specification I’m under the impression that the buffer abstraction is supposed to implicitly handle data movement between devices when needed, but these data transfers tend to be inconsistent.

I want a quick solution to pass ownership management to the sycl::buffer so that it is able to transfer the data to the appropriate device when a kernel is invoked.

This ensures data availability on the host side (no ownership transfer to sycl::buffer):

std::vector<T> buf_data(N);
sycl::buffer<T> buf(buf_data.data(), sycl::range<1>(buf_data.size()));

Depending on the how the scope of the buf_data object is handled, the data is implicitly copied to device memory when a sycl::accessor is requested for the buffer in a queue submission on the device. But I have found this to be inconsistent when data ownership is handled separately.

In this thread I found that the data can be force-copied to the device by constructing two buffers and performing a copy between them in a queue submission on the device:

    // copies data to device memory
    {
      sycl::buffer<int, 1> data_buf(host_data.data(),
                                    sycl::range<1>(num_elements));
                                    
      // : command group scope
      queue.submit([&](sycl::handler& cgh) {
        auto data_acc = data_buf.get_access<sycl::access::mode::read>(cgh);
        auto dev_data_acc =
            dev_data_buf.get_access<sycl::access::mode::write>(cgh);
        auto index_space = sycl::nd_range<1>(sycl::range<1>(num_elements),
                                             sycl::range<1>(num_elements / 4));

        auto copy_dev_mem_kernel_func = [=](sycl::nd_item<1> item) {
          auto id = item.get_global_id(0);
          dev_data_acc[id] = data_acc[id];
        };

        cgh.parallel_for<kernels::copy_dev_mem_kernel>(
            index_space, copy_dev_mem_kernel_func);
      });
    }

Is it really necessary to run a queue submission like this every time I want a vector of data initialized on device memory in a sycl::buffer?

Aren’t there any one-liners to pass buf_data directly into the ownership of the buffer ,without having to worry about device memory copying and explicit copying via queue submissions?

Hello @Heinzelnisse ,

I understand you want to explicitly manage your data transfers between devices.

Is it really necessary to run a queue submission like this every time I want a vector of data initialized on device memory in a sycl::buffer?

Not really, that was just one example. Better and cleaner, you can just enqueue dedicated copy commands to/from device for this behaviour via the explicit memory operation APIs in the command group handler - sycl::handler::copy (see: 4.9.4.3. SYCL functions for explicit memory operations in the SYCL 2020 (Revision 7) specification.

Aren’t there any one-liners to pass buf_data directly into the ownership of the buffer ,without having to worry about device memory copying and explicit copying via queue submissions?

Yes, sycl::handler::copy can be used in place of the separate copy_dev_mem_kernel_func kernel, i.e.:

queue.submit([&](sycl::handler &cgh) {
  auto dev_data_acc = ...
  cgh.copy(host_data.data(), dev_data_acc);
}

Destination here is sycl::accessor as you’d need to define the range of the memory to be written (see this example).

Buffer objects enqueue implicit data transfers to the device when the associated accessors with the buffer memory region are enqeued in the device commands and most likely this would be when that memory is actually required on the device. If you want to explicitly manage when data transfers to device should happen, the above approach via using the SYCL explicit memory copy functions should work.

Please let me know if there is more to your use-case that needs to be clarified. Thanks!

Kind regards,
Georgi

1 Like