oneAPI for NVIDIA "Cross-compilation"

I’m trying to AOT-compile with -fsycl-targets=nvptx64-nvidia-cuda for an architecture that is not installed locally on my machine.
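
For reference, a minimal sketch of the kind of command involved (file names here are placeholders, and passing the architecture via --cuda-gpu-arch is just one way to request sm_72):

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_72 src.cpp -o app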

I know which GPU I’m going to use (nvidia_gpu_sm_72), but I do not have the opportunity to install oneAPI with your plugin on this remote node. Trying to compile for it results in the following error:

clang++: fatal error: cannot find libdevice for sm_72; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice

Disabling linking (-nocudalib) results in the following:

 "/opt/intel/oneapi/compiler/2023.2.0/linux/bin-llvm/llvm-foreach" --out-ext=o --in-file-list=/tmp/clang-c6bf17/Vector_Remap-sm_50-26a78b.s --in-replace=/tmp/clang-c6bf17/Vector_Remap-sm_50-26a78b.s --out-file-list=/tmp/clang-c6bf17/Vector_Remap-sm_50-54f499.cubin --out-replace=/tmp/clang-c6bf17/Vector_Remap-sm_50-54f499.cubin -- ptxas -m64 -g --dont-merge-basicblocks --return-at-end -v --gpu-name sm_50 --output-file /tmp/clang-c6bf17/Vector_Remap-sm_50-54f499.cubin /tmp/clang-c6bf17/Vector_Remap-sm_50-26a78b.s
llvm-foreach: No such file or directory

I suspect that the linking step produces files that are required by the rest of the toolchain, which makes it necessary to link against the appropriate device library.

Is it possible to avoid linking against the device library (-nocudalib), or to configure CUDA appropriately so that I can compile for the device?

I do have access to the libdevice code on the node; is it possible to inject it in some way?

I have the oneAPI compiler installed with the corresponding oneAPI for NVIDIA plugin:

Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.2.0/linux/bin-llvm

CUDA:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Hi @Heinzelnisse,

Thanks for your question. I’ve actually seen similar errors before. I think there are a few things you can try:

Firstly, you can try using the setvars script to put llvm-foreach on the PATH with source /path/to/oneapi/setvars.sh --include-intel-llvm. If you are currently using the modulefiles, this might not be possible.

You could also try creating a container with the correct NVIDIA libraries available. NVIDIA makes suitable images available on Docker Hub.
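
For example, something along these lines (the image tag here is just an illustration; pick a -devel tag that matches your CUDA version):

docker run -it --rm -v "$PWD":/work -w /work nvidia/cuda:12.2.0-devel-ubuntu22.04

The -devel images ship the full CUDA toolkit, including ptxas and libdevice, and no GPU is needed just to compile.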

You might even be able to use a “normal” llvm-foreach by installing the LLVM tools on your machine, though I’ve not tried that personally.

I hope this helps,
Duncan.

Thanks, setvars.sh --include-intel-llvm was already sourced prior to the initial attempt.

I will try the Docker solution, and another llvm-foreach.

Hi @Heinzelnisse,
I’m not sure I fully understand your setup. Am I right in thinking that you have a compilation machine with both oneAPI and CUDA installed? If that is the case, you might just be missing the right --cuda-path setting in your compilation command, in case CUDA is not installed in the default /usr/local/cuda-<ver> location.

You can see the CUDA location with which nvcc, or even put it directly into the compilation command:

--cuda-path=$(realpath $(dirname $(which nvcc))/..)
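
For example, a full invocation might look like this (test.cpp stands in for your source file):

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda --cuda-path=$(realpath $(dirname $(which nvcc))/..) test.cpp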

Cheers,
Rafal

The problem is that my local CUDA installation does not support the target architecture. (I’m trying to compile for sm_70, but I do not have an sm_70 GPU locally on my machine.)

I have tried targeting the correct CUDA installation (/opt/cuda/).

Ultimately I’m after a working solution. Is it possible to add sm_70 to the list of devices supported by my local CUDA installation?

(It would be nice if the Codeplay plugin and the clang++ translation could support -nocudalib, though, so that the CUDA side of things could be resolved later on the remote node without requiring the complete oneAPI package.)

Oh, if you had already added the LLVM binaries to the PATH, then I’d consider that a bug. I’ll let the team know.

I’m not very familiar with how CUDA selects which device binaries are available. In my personal experience I’ve always had support for whatever hardware I’ve been using.

Okay, if it’s the case that CUDA supports all architectures regardless, is there some way to manually add architectures to DPC++ (nvidia_gpu_sm_70) so that it will compile?

Yes, CUDA 12 can generate device code for all architectures down to sm_50, so sm_70 is definitely supported. You don’t need a GPU or drivers to compile CUDA device code, just the toolkit with compilers and libraries.
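
If you want to double-check which architectures your local toolkit can target, nvcc can list them:

nvcc --list-gpu-arch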

To generate sm_70 device code from SYCL, you can add the following compilation flags:

-Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_70

The first part tells clang++ that the following flag should apply to the device code compiler (CUDA in this case), and the second part sets the target architecture.
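
Putting it together, a full AOT compile command could look like this (test.cpp is a placeholder):

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_70 test.cpp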

I can reproduce the issue with -nocudalib, we’ll investigate this.

Thanks, now it compiles successfully!
I will test the binary as soon as the remote node is available.

Great! Let us know if that works. Note also that the oneAPI plugin is needed to run the program, but not really needed to compile it. The plugin contains the runtime library that allows the SYCL runtime to execute code on NVIDIA GPUs.
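
Once the plugin is installed on the remote node, you can check that the CUDA backend is visible by listing the available devices:

sycl-ls
# expect an entry along the lines of [ext_oneapi_cuda:gpu:0] for the NVIDIA device (exact wording varies by release)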

For the -nocudalib issue, we found that the llvm-foreach error message was rather unhelpful, but the actual problem is simple. To compile SYCL code to CUDA device code, even without linking the device library (libdevice.10.bc), clang++ needs to execute ptxas, which comes from the CUDA toolkit installation. It looks like your environment PATH doesn’t include the location of the ptxas executable, so llvm-foreach failed to execute it.

The solution would be to just update your PATH to include the location of the ptxas executable. For example, this fails for me:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -nocudalib test.cpp

but this does not:

PATH=/my/path/to/cuda/bin:${PATH} clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -nocudalib test.cpp

Can confirm: with the CUDA binaries added to PATH, it works as it should with -fsycl-targets=nvptx64-nvidia-cuda. Case closed.
