OneMKL / NVIDIA

According to this page: oneAPI for CUDA® - Codeplay Software Ltd, oneMKL is supported on NVIDIA GPUs. However, a simple example provided with oneAPI (fcorr_1d_buffers.cpp) fails with the error below. It seems to happen right when the oneapi::mkl::rng::generate() function is called, around L48 of the code.

Running on: NVIDIA GeForce RTX 3050 Laptop GPU
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)

The oneAPI example code is here: oneAPI-samples/fcorr_1d_buffers.cpp at master · oneapi-src/oneAPI-samples · GitHub

From the error, it sounds like you have not built the binary with the flag needed to target NVIDIA hardware.

Are you compiling with the flags from the Get Started Guide, i.e. using the -fsycl-targets=nvptx64-nvidia-cuda flag?

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda simple-sycl-app.cpp -o simple-sycl-app

Thanks for responding.
The short answer is: yes.

The exact line used to compile is:
clang++ -O2 -DMKL_ILP64 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -qmkl=parallel -o CMakeFiles/CodePlay.dir/simple-sycl-app.cpp.o -c simple-sycl-app.cpp

The exact line used to link is:
clang++ -O2 -DMKL_ILP64 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -qmkl=parallel CMakeFiles/CodePlay.dir/simple-sycl-app.cpp.o -o CodePlay

The results are the same with the -O2 flag omitted. The MKL flags are needed for the MKL portions of the code. The contents of simple-sycl-app.cpp are the same as fcorr_1d_buffers.cpp (the example provided with oneAPI).

Thanks for the information.

You will need to build oneMKL with the right backend enabled to use these functions, including the RNG ones.
The project includes CMake flags to enable the backends; I believe the appropriate one here is ENABLE_CURAND_BACKEND.
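
As a rough illustration (assuming the standard CMake options documented for the oneMKL interfaces project, not a command taken from this thread), the configure step for the cuRAND backend would look something like:

cmake .. -DENABLE_CURAND_BACKEND=True

with the DPC++ clang++ set as the C++ compiler for the build.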

Just to check, did you get the oneMKL binaries with the oneAPI base toolkit release?

Yes, they are the ones that came with the oneAPI Base Toolkit.
Should I follow the directions here: Building the Project — oneAPI Math Kernel Library Interfaces 0.1 documentation?

As a follow-up, can oneMKL be compiled with multiple backends (e.g. ENABLE_MKLCPU_BACKEND, ENABLE_CURAND_BACKEND and ENABLE_CUBLAS_BACKEND) enabled? I assume this is what I need to run common code (on either the CPU or the GPU, selected at runtime) on my laptop, which has an Intel CPU and an NVIDIA GPU.

Your help is much appreciated.

Yes, take a look at the "building for CUDA" instructions; you'll need to enable each of the backend options you need.
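
For example, a configure line enabling all three backends mentioned above could look roughly like this (a sketch based on the project's documented CMake options; check the build instructions for the exact invocation):

cmake .. -DCMAKE_CXX_COMPILER=clang++ -DENABLE_MKLCPU_BACKEND=True -DENABLE_CURAND_BACKEND=True -DENABLE_CUBLAS_BACKEND=True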

Hi,

Did you resolve your problem? I have the same issue. I've rebuilt the oneMKL library with cuFFT, cuBLAS and cuRAND support, but with no effect. The exception "Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)" still occurs.

Hi @vandyke,
is your sycl::queue using a device selector which selects the NVIDIA GPU? It could be that the library is correctly compiled for the NVIDIA backend, but your queue defaults to selecting an OpenCL or Level Zero backend, hence the "invalid binary" error.

You can check this by running your application with the environment variable SYCL_PI_TRACE=1, for example:

SYCL_PI_TRACE=1 ./my-app

This should print (among other things) something like:

SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100-SXM4-40GB

The score and device name will differ for you, but it should be the NVIDIA CUDA BACKEND.

If that's not the case, you can select the right backend either with a custom device selector (see the sketch at the end of this post):
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:device-selection

or using the default selector (used by the default constructor of sycl::queue) and narrowing down the list of devices using the ONEAPI_DEVICE_SELECTOR environment variable. For example:

ONEAPI_DEVICE_SELECTOR=cuda:gpu SYCL_PI_TRACE=1 ./my-app

The environment variables are documented here:
https://intel.github.io/llvm-docs/EnvironmentVariables.html
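
As an illustration of the custom device selector option, a minimal sketch (not code from the sample; the ext_oneapi_cuda enumerator is a DPC++ extension) could look like this:

#include <iostream>
#include <sycl/sycl.hpp>

// Accept only devices exposed by the CUDA backend; reject everything else.
int cuda_gpu_selector(const sycl::device &dev) {
  return dev.get_backend() == sycl::backend::ext_oneapi_cuda ? 1 : -1;
}

int main() {
  sycl::queue q{cuda_gpu_selector};
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";
}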

Hi @rbielski

Thanks for your response. I've found where the problem is, and I don't think it is related to the selected backend. I tried compiling and running the example shown at this link:

https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/efficiently-implementing-fourier-correlation-using.html

Everything works fine until the oneapi::mkl::vm::mulbyconj function is executed. On the NVIDIA GPU, the mulbyconj call raises this exception:

"Exception: Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)"

Hi @vandyke,
I see now what the issue is. You are using a function which is part of the Vector Math domain in the oneMKL API specification. Unfortunately, the oneMKL interfaces library does not yet implement the full API specification, and the Vector Math domain is one of the missing parts. You can find more information in the README of the project:

The “Supported configurations” section lists which domains are available for which backends/platforms.

The first question in the FAQ explains the difference between:

- the oneMKL specification,
- the oneMKL interfaces open-source project, and
- the Intel oneMKL product.

The instructions from intel.com you're following assume you're using the last one (the Intel product), and unfortunately they don't fully apply to the second one (the interfaces project), as its implementation is still in progress.
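
For context, mulbyconj is just an element-wise multiply-by-conjugate (out[i] = a[i] * conj(b[i])). Until the Vector Math domain is implemented for the CUDA backend, one possible stop-gap (a rough sketch, not part of the Intel guide or the oneMKL interfaces API) is to express that step as a plain SYCL kernel over interleaved real/imaginary data:

#include <sycl/sycl.hpp>

// corr[i] = sig1[i] * conj(sig2[i]) for n complex values stored as
// interleaved (re, im) float pairs.
void mul_by_conj(sycl::queue &q, sycl::buffer<float, 1> &sig1,
                 sycl::buffer<float, 1> &sig2, sycl::buffer<float, 1> &corr,
                 size_t n) {
  q.submit([&](sycl::handler &h) {
    sycl::accessor a{sig1, h, sycl::read_only};
    sycl::accessor b{sig2, h, sycl::read_only};
    sycl::accessor c{corr, h, sycl::write_only};
    h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
      const size_t re = 2 * i[0], im = 2 * i[0] + 1;
      c[re] = a[re] * b[re] + a[im] * b[im];   // real part of a * conj(b)
      c[im] = a[im] * b[re] - a[re] * b[im];   // imaginary part
    });
  });
}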

@rbielski thanks for your response. Everything is clear now :+1:
