Hi,
Thanks for your question. It’s not possible to know exactly when the memory transfer happens, because it’s not invoked by any specific code you write. The command group handler deals with scheduling, and the SYCL runtime decides the best time to do the memory transfer.
The main things you can rely on are that (a) the data is on the device when a kernel starts execution and (b) it is on the host when you need it there.
Thank you for your reply. My intention is to profile the run time (wall clock) of the kernel without counting the memory transfers from host to device and back. Any idea how I could do that?
Hello,
OK, that makes sense.
There are a couple of options.
The easiest would be to use the profiling tools for your processor, which are likely to report exactly this information.
If you want to do it manually, however, you can use explicit copy commands.
I think this post will give you some hints. I’ll try to find an example for you.
I think I found an example of how to do the copies before running the execution kernel.
This one copies the data in two commands, then runs the execution command. I wouldn’t recommend this for general development, since the scheduler knows the best time to do the transfers, but if you specifically need this timing information it is the simplest way.