CMake integration for OneAPI for Nvidia GPUs

I’ve followed the setup guide for OneAPI-cuda,
and managed to compile and execute the provided simple-sycl-app.cpp on my GTX1660.

Many of my other applications compile fine from the command line in an environment-activated shell (with source /opt/Intel/oneapi/ --include-intel-llvm),
but I’m not able to make this work with CMake. I’ve tried the following:

  • Building CMake projects from the source-activated shell with -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang
  • Activating the environment with the OneAPI environment extension in VS Code, followed by a CMake Tools configuration using one of these kits:
    {
      "name": "DPCPP",
      "compilers": {
        "C": "/opt/intel/oneapi/compiler/latest/linux/bin/icx",
        "CXX": "/opt/intel/oneapi/compiler/latest/linux/bin/icpx"
      },
      "keep": true
    },
    {
      "name": "DPCPP-clang++",
      "compilers": {
        "C": "/opt/intel/oneapi/compiler/latest/linux/bin-llvm/clang",
        "CXX": "/opt/intel/oneapi/compiler/latest/linux/bin-llvm/clang++"
      },
      "keep": true
    }

All CMake configurations lead to the following error:

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)

Relevant CMakeLists.txt options configured:

find_package(IntelDPCPP REQUIRED HINTS "/opt/intel/oneapi/compiler/latest/linux/IntelDPCPP")

CMake is not able to find and link the SYCL libraries properly without the IntelDPCPP package.
Is this disrupting the Codeplay-configured clang++?

[usr@usr build]$ clang++ --version
Intel(R) oneAPI DPC++/C++ Compiler 2023.0.0 (2023.0.0.20221201)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.0.0/linux/bin-llvm

How do I properly configure my environment for CUDA?

Hi Heinzelnisse

May I suggest you have a look at the CMakeLists.txt of this Codeplay example? It may give you some clues.
Here: SYCL-For-CUDA-Examples/CMakeLists.txt at master · codeplaysoftware/SYCL-For-CUDA-Examples · GitHub
Please let me know if you have resolved the issue.
In the meantime I will try to reproduce your problem and resolve it.

Hi Heinzelnisse

Looking at the error produced, is it possible that the wrong target device is specified in your CMake file, so that the parameter value does not match your actual Nvidia card?

The compiled binary runs smoothly on the GPU without target specification when compiled from the command line, like in the example. I will test the code in your provided example as soon as I have the GPU available.

Using the same compiler options as found in FindSYCL.cmake for dpcpp (in your example) did not resolve the issue.

As previously mentioned, the environment variables from running:

source /opt/intel/oneapi/ --include-intel-llvm

are initialized correctly for CMake when it is configured directly from the shell.
Sourcing the same script before opening VS Code from the command line solved the issue: CMake (configured with CMake Tools) is now able to find and link the SYCL libraries.

Final function for generating executables:

# SYCL flags shared by the compile and link steps (CPU and Nvidia GPU targets)
set(DPCPP_FLAGS -fsycl -fsycl-targets=spir64_x86_64,nvptx64-nvidia-cuda -Xcuda-ptxas -v -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_60 -Wno-linker-warnings)

# Builds an executable named <source_file> from <source_file>.cpp
function(add_sycl_executable source_file)
    add_executable(${source_file} "${source_file}.cpp")
    # The same flags are passed to the compiler and to the linker
    target_compile_options(${source_file} PRIVATE $<$<COMPILE_LANGUAGE:CXX>:${DPCPP_FLAGS} -sycl-std=2020 -std=c++20 -fsycl-unnamed-lambda>)
    target_link_options(${source_file} PRIVATE ${DPCPP_FLAGS} -sycl-std=2020 -std=c++20 -fsycl-unnamed-lambda)
endfunction()

Thanks for the confirmation. Are you able to point us to or provide a minimal reproducer, like a small project where we can try the compilation and figure out where the problem occurs?

  1. Run Vscode with CMake-Tools extension without sourcing /opt/intel/
  2. Initialize the OneAPI environment using the Intel OneAPI environment configurator extension.
  3. Open your provided hashing example.
  4. Set appropriate target architecture (--cuda-gpu-arch=sm_60 in this case)

Invoking CMake configure and build via CMake-Tools produced these warnings for this minimal example:

[build] clang++: warning: linked binaries do not contain expected 'nvptx64-nvidia-cuda' target; found targets: 'spir64_x86_64-unknown-unknown, nvptx64-nvidia-cuda-sm_60'

Modifying -fsycl-targets=spir64_x86_64,nvptx64-nvidia-cuda to -fsycl-targets=spir64_x86_64,nvptx64-nvidia-cuda-sm_60 results in:

[build] clang++: warning: linked binaries do not contain expected 'nvptx64-nvidia-cuda-sm_60' target; found targets: 'spir64_x86_64-unknown-unknown, nvptx64-nvidia-cuda-sm_60-sm_60'

Additionally this occurs, which I suspect to be caused by the misconfigured environment:

[build] clang++: error: unable to execute command: Executable "sycl-post-link" doesn't exist!

Note that this is on a GTX1660 with CUDA 11.8

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Here is an even simpler example circumventing the -fsycl-targets issue:


CMakeLists.txt:

PROJECT( test )

find_package(IntelDPCPP REQUIRED)

add_executable( test test.cpp )

test.cpp:

#include <CL/sycl.hpp>
#include <vector>
#include <random>
#include <iostream>
//create square multiply kernel
class square_multiply_kernel;

int main()
{
    std::vector<int> v(10);
    //fill vector with random numbers
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 100);
    for (int i = 0; i < 10; i++)
        v[i] = dis(gen);

    //square 10 random numbers in parallel in sycl kernel on gpu
    //create queue on gpu
    sycl::queue q(sycl::gpu_selector{});

    {
        //create buffer over the host vector; results are copied back to v when buf goes out of scope
        sycl::buffer<int, 1> buf(v.data(), sycl::range<1>(v.size()));

        //submit command group to queue
        q.submit([&](sycl::handler& cgh)
        {
            //get read-write access to buffer (the kernel reads and writes each element)
            auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);

            //execute kernel
            cgh.parallel_for<square_multiply_kernel>(sycl::range<1>(v.size()), [=](sycl::id<1> index)
            {
                acc[index] = acc[index] * acc[index];
            });
        });

        //wait for queue to finish
        q.wait();
    }

    //print results
    for (int i = 0; i < 10; i++)
        std::cout << v[i] << std::endl;

    return 0;
}

Which (again) returns the initial error:

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)

Thank you for the example code and CMake script. We will try it and try to replicate the issue.

Hi Heinzelnisse,
Just to confirm, the problem only occurs when using VSCode and the Intel VSCode extension Environment Configurator to source the VSCode dev environment?

If you use a terminal after sourcing /opt/intel/, do your CMake configuration and build succeed?

I am trying to establish if it is VSCode Intel Environment Configuration problem or a compiler problem.

We are not able to replicate the issue as we do not have the hardware. However, we would like to suggest the following.
The CMake configuration is incorrect because the architecture flags also need to be passed when linking; this is what the warnings point to. Appending -sm_60 to the -fsycl-targets value doesn’t work; the architecture needs to be passed in separately. This document can help:

It works only when sourcing with --include-intel-llvm as an argument. As mentioned, this works fine when done before launching VS Code, provided find_package(IntelDPCPP) is also omitted. It would be nice to have a similar CMake config file for the Codeplay-configured compilers though, or a modification of IntelDPCPPConfig.cmake that works.


source /opt/intel/oneapi/ --include-intel-llvm

then start vscode.

The --include-intel-llvm argument is important because
Intel put clang++ and sycl-post-link into different directories:


If you only set the path to find clang++, it will miss sycl-post-link, so you need both bin and bin-llvm on your PATH.
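
One way to catch this early is a configure-time check in CMakeLists.txt. The following is only a sketch, assuming both tools are expected on PATH after sourcing the environment; the cache variable names SYCL_CLANGXX_EXECUTABLE and SYCL_POST_LINK_EXECUTABLE are made up for this example and are not part of any Intel or Codeplay package.

```cmake
# Sketch: stop at configure time if the SYCL toolchain is only partly on PATH.
# The two cache variable names below are hypothetical, chosen for this example.
find_program(SYCL_CLANGXX_EXECUTABLE clang++)
find_program(SYCL_POST_LINK_EXECUTABLE sycl-post-link)

if(NOT SYCL_CLANGXX_EXECUTABLE OR NOT SYCL_POST_LINK_EXECUTABLE)
    message(FATAL_ERROR
        "clang++ or sycl-post-link was not found on PATH. "
        "Source the oneAPI environment with --include-intel-llvm "
        "before starting CMake (or the editor that launches it).")
endif()

# Print what was found so a mismatch between bin and bin-llvm is visible in the log.
message(STATUS "clang++:        ${SYCL_CLANGXX_EXECUTABLE}")
message(STATUS "sycl-post-link: ${SYCL_POST_LINK_EXECUTABLE}")
```

With this in place, a VS Code session started without the sourced environment should fail at the configure step with a readable message instead of at link time.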

The warnings and runtime error are likely related, so here are a few ideas of what could be the problem:

  • From the CMake source it looks like you still need to set the -fsycl-targets option. If it is not set, set it.
  • Though the warnings suggest that -fsycl-targets is set to nvptx64-nvidia-cuda-sm_60, this is not a valid triple anymore. Use -fsycl-targets=nvptx64-nvidia-cuda with -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_60
  • Make sure that the gpu_selector is actually picking an Nvidia GPU. Check by calling get_device() on the queue and then get_info on the device.
  • Add the SYCL flags to add_link_options in CMake.
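
Put together, the flag-related suggestions above might look roughly like the following CMakeLists.txt. This is a sketch under assumptions, not a tested configuration: the project and target names (sycl_cuda_example, test_app) and the SYCL_CUDA_ARCH cache variable are invented for illustration, sm_60 is simply the value used earlier in this thread, and the spir64_x86_64 CPU target is left out to keep it short.

```cmake
cmake_minimum_required(VERSION 3.13)  # target_link_options needs 3.13 or newer
project(sycl_cuda_example LANGUAGES CXX)

# Hypothetical cache variable; sm_60 is the value used in this thread.
set(SYCL_CUDA_ARCH "sm_60" CACHE STRING "CUDA architecture passed to the SYCL backend")

# The target triple stays plain; the architecture is passed separately
# through -Xsycl-target-backend. The SHELL: prefix keeps the two-word
# option together so CMake does not split or de-duplicate it.
set(SYCL_FLAGS
    -fsycl
    -fsycl-targets=nvptx64-nvidia-cuda
    "SHELL:-Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=${SYCL_CUDA_ARCH}")

add_executable(test_app test.cpp)

# The same SYCL flags go to both the compile step and the link step.
target_compile_options(test_app PRIVATE ${SYCL_FLAGS})
target_link_options(test_app PRIVATE ${SYCL_FLAGS})
```

Configuring with -DCMAKE_CXX_COMPILER=clang++ from a shell that has the oneAPI environment sourced is assumed here; the sketch does not call find_package(IntelDPCPP).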

There are issues with the IntelDPCPP package and we aim to address them in time for the next release.

Have you been successful in this issue? Or is the issue still open? If still open, has there been any progress?

This issue is now closed.