Accurately timing kernels

How should I accurately time the execution of SYCL kernels? Because of JIT compilation, my plan is to leave the first run unmeasured and time only the second:

run kernel
begin timing
run kernel
end timing

With the spir64 target on Intel CPUs, the timings are stable to within about 5%. With ptx64 on a V100, however, they are all over the place, ranging from roughly 0.4 ms to 20 ms:
0.02003 sec
0.0200269 sec
0.00789094 sec
0.000488997 sec
0.00049901 sec
0.00693297 sec
0.0199871 sec
0.0200288 sec
0.0200479 sec
0.000420094 sec
0.000442028 sec

You are doing the correct thing by measuring the second execution.
Our PTX support is currently in an experimental state.
We rely on callbacks from the NVIDIA OpenCL driver. We have noticed that the driver delivers these callbacks quickly at first, but the callback latency often increases as execution goes on. I suspect this is what you are seeing.
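
For reference, the warm-up-then-measure pattern can be sketched in plain C++ along the following lines. This is only an illustration, not part of any SYCL implementation: `time_after_warmup` and `median` are hypothetical helper names, and the callable passed in is assumed to block until the kernel has finished (for example by submitting the kernel and then calling `wait()` on the queue). Collecting several samples and looking at the median makes the spread visible, but it will not remove the callback latency described above.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `run_kernel` once untimed (the warm-up run that
// absorbs JIT compilation), then times `repeats` further runs and returns the
// wall-clock seconds of each. Assumption: `run_kernel` blocks until the work
// is complete, e.g. it submits the kernel and then waits on the queue.
template <typename F>
std::vector<double> time_after_warmup(F&& run_kernel, std::size_t repeats) {
    run_kernel();  // warm-up run, deliberately not measured

    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run_kernel();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Median of the samples, which is less sensitive to outliers than the mean.
// Returns 0.0 for an empty input.
inline double median(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

A call such as `median(time_after_warmup(run, 20))`, where `run` is your own blocking kernel launch, gives one representative figure; printing the full vector shows whether the variation is random or drifts over time.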