Running Intel oneAPI (Intel C++ compiler + MKL) on Nvidia GPUs - no offload

Hi all,

I am trying to enhance our legacy code, which works very well using oneAPI. We use Intel MKL extensively in our code, and now I intend to offload computations to GPUs if possible. CUDA Toolkit 12.4 is installed on our server. We use CMake to build our project, and I have updated our CMakeLists script so that the following flags are used when compiling and linking our code:


Now the code builds successfully, but when I run some examples, there is no process running on the GPUs. I use nvidia-smi to check the GPU load, and it is always zero.

Have I missed something? Am I building my code correctly? I have installed the following package from Codeplay:

and have followed all the instructions you have provided. Even the example code you supplied works, and the results match.
I am wondering if I need to do something to let MKL know to offload the calculations to the GPU. Any help is much appreciated.



Hi @danesh.daroui,
Intel MKL is a proprietary library built only for Intel devices (CPUs and GPUs). If you want to run oneMKL computations on NVIDIA GPUs, you can use the open-source oneMKL interface library, which can be built to support multiple proprietary or open-source backends:

See the building options here:

Assuming you’re running on a machine with an Intel CPU and NVIDIA GPU, you could build it with:

-DENABLE_CU<lib>_BACKEND=ON # with <lib>=BLAS,SOLVER,RAND,FFT to enable CUDA proprietary backends
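As a rough sketch, a build of the oneMKL interface library with both the Intel CPU backend and the cuBLAS backend enabled might look like the following (the compiler, paths, and install prefix here are placeholders — adjust them to your environment, and check the project's build documentation for the exact option names your version supports):

```shell
# Hypothetical build sketch for the oneMKL interface library;
# assumes a local clone of the repository and a SYCL-capable compiler.
cd oneMKL                                # local clone of the interface library repo
mkdir build && cd build
cmake .. \
  -DCMAKE_CXX_COMPILER=icpx \
  -DENABLE_MKLCPU_BACKEND=ON \
  -DENABLE_CUBLAS_BACKEND=ON \
  -DBUILD_FUNCTIONAL_TESTS=OFF           # skip tests to avoid extra dependencies
cmake --build .
cmake --install . --prefix /opt/onemkl   # example install prefix
```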


1 Like

Hi Rafal,
Thanks for your help. I will try it out. One more question! Can I just build my code as it is, or do I need to at least include new header files and also update my code where I call MKL routines? I saw an example in the GitHub repo that is a bit scary, since it goes into lower levels to specify queues for the CPU and GPU.

If you’re only including the top-level header #include <oneapi/mkl.hpp> and only using the Intel MKL APIs that are part of the open standard specification, then you should be able to swap the library in place without modifying your code. Just point your compiler at the include directory of the oneMKL interface library installation instead of the Intel MKL one, and make sure you link the right libraries, e.g. -lonemkl_blas_mklcpu -lonemkl_blas_cublas if you wish to have both BLAS backends (CPU and GPU) available at runtime, selected by your SYCL queue’s device.
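To illustrate the runtime selection, here is a minimal sketch (my own example, not from the repo — it assumes the oneMKL interface library is installed and the code is compiled with a SYCL compiler such as icpx with -fsycl) showing the same GEMM call dispatched to different backends purely by the queue’s device:

```cpp
#include <oneapi/mkl.hpp>   // top-level oneMKL interface header
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    // The backend is chosen at runtime from the queue's device:
    // a CPU queue dispatches to the mklcpu backend, while a queue on
    // an NVIDIA GPU dispatches to the cublas backend.
    sycl::queue cpu_q{sycl::cpu_selector_v};
    sycl::queue gpu_q{sycl::gpu_selector_v};

    const std::int64_t n = 256;
    std::vector<float> a(n * n, 1.0f), b(n * n, 1.0f), c(n * n, 0.0f);

    {
        sycl::buffer<float> A{a}, B{b}, C{c};
        // Identical API call for either device; only the queue differs.
        oneapi::mkl::blas::column_major::gemm(
            gpu_q,
            oneapi::mkl::transpose::nontrans,
            oneapi::mkl::transpose::nontrans,
            n, n, n,
            1.0f, A, n, B, n,
            0.0f, C, n);
    }  // buffer destruction synchronizes results back into c
    return 0;
}
```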

However, Intel MKL provides some additional APIs which are not part of the open standard. There are also some parts of the standard that are not yet implemented in the oneMKL interface library, e.g. the vector math domain API. If you encounter anything missing, you might need to change your code slightly to fall back onto the Intel MKL library and run on CPU for the missing parts.


Hi Rafal,
Thanks for your support. I have cloned the repo you mentioned, but when I execute the cmake script I get the following error:

CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find LAPACKE (missing: LAPACKE64_file)
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
cmake/FindLAPACKE.cmake:23 (find_package_handle_standard_args)
tests/unit_tests/CMakeLists.txt:25 (find_package)

In addition, I get some warnings which look like:

CMake Warning (dev) at cmake/FindcuBLAS.cmake:20 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

Call Stack (most recent call first):
src/blas/backends/cublas/CMakeLists.txt:22 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.

Would it be safe to ignore these warnings?

I am running Fedora 39 and have already installed the lapack package, but I am not sure how to provide the package that cmake is complaining about.
Can you please advise?