How should I accurately time the execution of SYCL kernels? Because the first launch includes JIT compilation, my plan is to leave the first run unmeasured and time only the second:
run kernel
begin timing
run kernel
end timing
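In C++ terms, the pattern above could look roughly like the following sketch. `time_runs` is a hypothetical helper, not a SYCL API, and the names `run_kernel`, `warmups`, and `repeats` are made up for illustration. It assumes the callable blocks until the device work has finished (for example by calling `wait()` on the queue after the submit); otherwise only the launch is timed, not the kernel.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper (not part of SYCL): runs `run_kernel` a few times
// untimed, then times each of `repeats` further runs with a host clock.
// Assumption: `run_kernel` returns only after the kernel has completed.
template <typename F>
std::vector<double> time_runs(F&& run_kernel, int warmups, int repeats) {
    assert(warmups >= 0 && repeats > 0);

    // Untimed warm-up runs, meant to absorb one-off costs such as JIT compilation.
    for (int i = 0; i < warmups; ++i) {
        run_kernel();
    }

    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (int i = 0; i < repeats; ++i) {
        auto begin = std::chrono::steady_clock::now();  // begin timing
        run_kernel();                                   // run kernel
        auto end = std::chrono::steady_clock::now();    // end timing
        seconds.push_back(std::chrono::duration<double>(end - begin).count());
    }
    return seconds;  // one wall-clock sample per timed run, in seconds
}
```

With one warm-up and one timed run, `time_runs(f, 1, 1)` corresponds to the four steps listed above; a larger `repeats` gives one sample per run, like the list of timings further down.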
Running with spir64 on Intel CPUs, the timings are more or less stable, within 5%. However, with ptx64, running on a V100, the timings are all over the place:
0.02003 sec
0.0200269 sec
0.00789094 sec
0.000488997 sec
0.00049901 sec
0.00693297 sec
0.0199871 sec
0.0200288 sec
0.0200479 sec
0.000420094 sec
0.000442028 sec