I would like to use the NVIDIA profiler nvprof to profile some SYCL code generated with the ptx64 backend to run on an NVIDIA GPU. I have no problems running the profiler on code that uses CUDA, but when I view the generated timeline of the SYCL executable, it’s empty. Is there something special I need to do, or is this not possible at all?
If this is not possible, are there alternatives for profiling SYCL code on NVIDIA (or AMD) hardware? The profiling section of the manual only mentions Intel hardware.
There’s an article on how to profile manually with the Community Edition that might help you. It will be migrated to the developer website documentation shortly.
Additionally, nvprof doesn’t seem to work with OpenCL kernels, which is what ComputeCpp ultimately ends up running on NVIDIA devices. There used to be workarounds, but as far as I’m aware none of them have worked for a couple of years now.
If you’d like to profile on NVIDIA devices, I would recommend contacting NVIDIA directly to request that their profiling tools support OpenCL as well as CUDA. I would also say that performance is likely to appear quite bad on NVIDIA devices at the moment, because our profiling will tell you that kernels take multiples of 100ms to run, and basically never less than that. Unfortunately we don’t have a timeframe for fixing this, but we are aware of it.