I’m trying to AOT compile with -fsycl-targets=nvptx64-nvidia-cuda for architectures that are not installed locally on my machine.
I know which GPU I’m going to use (nvidia_gpu_sm_72), but I do not have the opportunity to install oneAPI with your plugin on this remote node. Trying to compile for it results in the following error:
clang++: fatal error: cannot find libdevice for sm_72; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
Disabling linking (-nocudalib) results in the following:
"/opt/intel/oneapi/compiler/2023.2.0/linux/bin-llvm/llvm-foreach" --out-ext=o --in-file-list=/tmp/clang-c6bf17/Vector_Remap-sm_50-26a78b.s --in-replace=/tmp/clang-c6bf17/Vector_Remap-sm_50-26a78b.s --out-file-list=/tmp/clang-c6bf17/Vector_Remap-sm_50-54f499.cubin --out-replace=/tmp/clang-c6bf17/Vector_Remap-sm_50-54f499.cubin -- ptxas -m64 -g --dont-merge-basicblocks --return-at-end -v --gpu-name sm_50 --output-file /tmp/clang-c6bf17/Vector_Remap-sm_50-54f499.cubin /tmp/clang-c6bf17/Vector_Remap-sm_50-26a78b.s
llvm-foreach: No such file or directory
I suspect that the linking step produces files that are required by the rest of the toolchain, which makes it necessary to link against the appropriate device library.
Is it possible to avoid linking the device library (-nocudalib), or to configure CUDA appropriately to compile for the device?
I do have access to the libdevice code on the node; is it possible to inject this in some way?
I have the oneAPI compiler installed with the corresponding oneAPI for NVIDIA GPUs plugin:
Thanks for your question. I’ve actually seen similar errors before. I think there are a few things you can try:
Firstly, you can try using the setvars script to put llvm-foreach onto the PATH with source /path/to/oneapi/setvars.sh --include-intel-llvm. If you are currently using the modulefiles, this might not be possible.
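For example (a sketch; /path/to/oneapi is a placeholder for your actual installation prefix):

source /path/to/oneapi/setvars.sh --include-intel-llvm
which llvm-foreach    # should now resolve to a binary inside the oneAPI installation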
You could also try creating a container with the correct NVIDIA libraries available. NVIDIA makes them available through Docker Hub.
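For instance (a sketch; the exact image tag is an assumption, but any recent -devel tag should ship ptxas and libdevice):

docker pull nvidia/cuda:12.2.0-devel-ubuntu22.04
docker run --rm -it nvidia/cuda:12.2.0-devel-ubuntu22.04 which ptxas    # ptxas ships in the devel images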
You might even be able to use a “normal” llvm-foreach by installing the LLVM tools on your machine; I’ve not tried that personally.
Hi @Heinzelnisse,
I’m not sure I fully understand your setup. Am I right in thinking you have a compilation machine where you have oneAPI and CUDA installed? If that is the case, I think you might just be missing the right --cuda-path setting in your compilation command, in case CUDA is not installed in the default /usr/local/cuda-<ver> location.
You can see the CUDA location with which nvcc, or even put it directly into the compilation command:
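Something along these lines (a sketch; /opt/cuda and app.cpp are placeholders):

$ which nvcc
/opt/cuda/bin/nvcc
$ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda --cuda-path=/opt/cuda app.cpp -o app

Note that --cuda-path expects the toolkit root, i.e. the directory above bin/.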
The problem is that my local CUDA installation does not support the target architecture. (I’m trying to compile for sm_70, but I do not have an sm_70 device locally on my machine.)
I have tried targeting the correct CUDA installation (/opt/cuda/).
Ultimately, I’m after a working solution. Is it possible to add sm_70 to the list of devices supported by my local CUDA installation?
(It would be nice if the Codeplay plugin and the clang++ translation supported -nocudalib, though, so that the CUDA side of things could be resolved on the remote side later, without requiring the complete oneAPI package.)
Oh, if you had already added the LLVM binaries to the PATH, then I’d consider that a bug. I’ll let the team know.
I’m not very familiar with how CUDA selects which device binaries are available. In my personal experience I’ve always had support for whatever hardware I’ve been using.
Okay, if it’s the case that CUDA supports all architectures regardless, is there some way to manually add architectures to dpcpp (nvidia_gpu_sm_70) so that it will compile?
Yes, CUDA 12 can generate device code for all architectures down to sm_50, so sm_70 is definitely supported. You don’t need a GPU or drivers to compile CUDA device code, just the toolkit with compilers and libraries.
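If you want to double-check which architectures your local toolkit can target, recent nvcc versions can list them (a sketch; the flag is available in CUDA 11.0 and newer):

nvcc --list-gpu-arch    # prints the supported compute_XX virtual architectures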
To generate sm_70 device code from SYCL, you can add the following compilation flags:
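The flags themselves are not preserved above; based on the explanation that follows, they were most likely of this form (a reconstruction, not a verbatim copy of the original post; app.cpp is a placeholder):

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
    -Xsycl-target-backend --cuda-gpu-arch=sm_70 \
    app.cpp -o app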
The first part tells clang++ that the following flag should apply to the device code compiler (CUDA in this case), and the second part sets the target architecture.
I can reproduce the issue with -nocudalib, we’ll investigate this.
Great! Let us know if that works. Note also that the oneAPI plugin is needed to run the program, not to compile it. The plugin contains the runtime library that allows the SYCL runtime to execute code on NVIDIA GPUs.
For the -nocudalib issue, we found that the llvm-foreach error message was rather unhelpful, but the actual problem is simple. To compile SYCL code to CUDA device code, even without linking the device library (libdevice.10.bc), clang++ needs to execute ptxas, which comes with the CUDA toolkit installation. It looks like your environment’s PATH doesn’t include the location of the ptxas executable, so llvm-foreach failed to execute it.
The solution is simply to update your PATH to include the directory containing the ptxas executable. For example, this fails for me:
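(The original example isn’t preserved here; the following is a sketch, with app.cpp as a placeholder and the toolkit assumed to live under /usr/local/cuda.)

$ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -nocudalib app.cpp
llvm-foreach: No such file or directory

Prepending the CUDA bin directory, which contains ptxas, makes the same command succeed:

$ export PATH=/usr/local/cuda/bin:$PATH
$ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -nocudalib app.cpp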