It seems like the direction for running SYCL programs on NVIDIA hardware is transitioning from ComputeCpp's experimental PTX64 bitcode target to the CUDA backend that Codeplay has added to DPC++ (the LLVM-based compiler extensions for generating PTX/CUDA code): https://developer.codeplay.com/products/computecpp/ce/guides/platform-support/targeting-nvidia-ptx
We’re doing some prototyping with SYCL. Originally we used ComputeCpp with the PTX64 target on x86-64 machines with NVIDIA GPUs, but we’ve since moved to getting the applications running on embedded devices built around the Jetson platform (TX2 and AGX Xavier). We ran into issues with ComputeCpp recognizing the GPU cores on those devices. With the CUDA backend of DPC++, we’re able to run on x86-64 without issue, and simple programs are now working on the Jetson devices as well.
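For reference, the "simple programs" are roughly along the lines of the smoke test below. This is only an illustrative sketch using the standard cl::sycl buffer/accessor API (the kernel name and sizes are placeholders, not our exact code):

```cpp
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

namespace sycl = cl::sycl;

int main() {
  // Ask for a GPU device; on the Jetson boards this should pick up
  // the integrated NVIDIA GPU via the DPC++ CUDA backend.
  sycl::gpu_selector selector;
  sycl::queue q{selector};

  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>()
            << std::endl;

  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  {
    // Buffers wrap the host vectors for the lifetime of this scope.
    sycl::buffer<float, 1> bufA(a.data(), sycl::range<1>(N));
    sycl::buffer<float, 1> bufB(b.data(), sycl::range<1>(N));
    sycl::buffer<float, 1> bufC(c.data(), sycl::range<1>(N));

    q.submit([&](sycl::handler &cgh) {
      auto accA = bufA.get_access<sycl::access::mode::read>(cgh);
      auto accB = bufB.get_access<sycl::access::mode::read>(cgh);
      auto accC = bufC.get_access<sycl::access::mode::write>(cgh);
      // Simple vector add, one work-item per element.
      cgh.parallel_for<class vec_add>(sycl::range<1>(N), [=](sycl::id<1> i) {
        accC[i] = accA[i] + accB[i];
      });
    });
  } // buffers destroyed here, results copied back to the host vectors

  std::cout << "c[0] = " << c[0] << std::endl;
  return 0;
}
```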
I watched a couple of presentations from you guys introducing the new CUDA support added to DPC++, and I have a question about cl::sycl::image support. From those presentations it sounded like the SYCL image functionality doesn’t map well onto CUDA. However, I was able to build sample applications using VisionCpp on x86-64 hardware with NVIDIA GPUs and a GPU device selector; the programs ran and appeared to be accelerated, but perhaps I was fooled. I’m still working through issues getting clang to build the programs on the Jetson boards, but in the meantime I’m wondering what the status of cl::sycl::image support is in the CUDA backend of DPC++. Has it actually been implemented on top of CUDA, or am I being fooled when running on the x86-64 machines with NVIDIA GPUs?
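In case it helps narrow things down, this is the kind of runtime check I can add to see what the selected device actually reports. It is a minimal sketch using the standard cl::sycl::info::device descriptors; I’m assuming image_support is the right query here:

```cpp
#include <CL/sycl.hpp>
#include <iostream>

namespace sycl = cl::sycl;

int main() {
  // Same GPU selection path as the application uses.
  sycl::queue q{sycl::gpu_selector{}};
  auto dev = q.get_device();

  // Print which device was actually picked and whether it claims
  // to support images.
  std::cout << "Device:        "
            << dev.get_info<sycl::info::device::name>() << "\n"
            << "Vendor:        "
            << dev.get_info<sycl::info::device::vendor>() << "\n"
            << "Image support: "
            << (dev.get_info<sycl::info::device::image_support>() ? "yes"
                                                                  : "no")
            << std::endl;
  return 0;
}
```

My assumption is that if the NVIDIA device reports image_support as false, then something other than the GPU must be handling the image paths in the VisionCpp samples, which would explain how I could be "fooled".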