When does memory transfer happen in SYCL?

Hello All

I have a basic question that I could not resolve after reading various online sources.

Suppose I have code like this:

void first_differential_x(uchar* input, uchar* output, size_t M, size_t N, queue myQueue, size_t TILE_SIZE){

    buffer<uchar, 2> input_buffer{input, {M, N}};
    buffer<uchar, 2> output_buffer{output, {M, N}};

    auto local_range = range<2>(TILE_SIZE, TILE_SIZE);
    auto global_range = range<2>(M / TILE_SIZE + 1 , N / TILE_SIZE + 1) * local_range;

    auto launchParams = nd_range<2>(global_range, local_range);

    myQueue.submit([&input_buffer, &output_buffer, M, N, launchParams](handler& cgh){

        auto accessor_input = input_buffer.get_access<access::mode::read>(cgh);
        auto accessor_output = output_buffer.get_access<access::mode::write>(cgh);

        cgh.parallel_for<class dx>(launchParams, [accessor_input, accessor_output, M, N](nd_item<2> ndItem){
            auto x = ndItem.get_global_id(0);
            auto y = ndItem.get_global_id(1);

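            // Guard: the global range is rounded up past M x N, and y + 1 must stay in bounds.
            if (x < M && y + 1 < N) {
                accessor_output[x][y] = accessor_input[x][y] - accessor_input[x][y + 1];
            }
        });
    });
}
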
When does the memory transfer happen: when I create the buffers, or when .submit is called?


Thanks for your question. The answer is that it is not possible to know exactly when the memory transfer happens, because it is not invoked by any specific code you write. The command group handler deals with scheduling, and the SYCL runtime decides the best time to do the memory transfer.
The main things you can rely on are that (a) the data is on the device when a kernel starts executing and (b) it is on the host when you need it there.

Hello Rod

Thank you for your reply. My intention is to profile the run time (wall clock) of the kernel without taking into account the memory transfer from host to device and vice-versa. Any idea how I could do that?

Ok that makes sense.
There are a couple of options.
The easiest would be to use the profiling tools provided for your processor, as they are likely to report kernel time and transfer time separately.
If you want to do it manually, however, you can do explicit copies.
I think this post will give you some hints, I’ll try to find an example for you.

I think I found an example of how to do the copies before running the execution kernel.

This one copies the data in two separate commands, then runs the execution command. I wouldn’t recommend this in general for development, as the scheduler knows the best time to do the transfers, but if you are specifically trying to get this timing information then it would be the simplest way.

Thank you so much @rod