Linker error (trying oneAPI DFT on NVIDIA device)

Hello everyone,

a small program of mine, which uses the oneAPI DFT interface, fails to link.
I provide a minimal working example, the Makefile, and the error message:

#include <random>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <vector>
#include <complex>

#include <sycl/sycl.hpp>
#include <oneapi/mkl/dft.hpp>

namespace dft_ns = oneapi::mkl::dft;

template< typename T >
constexpr bool is_usable_compute_type = std::is_same_v< T, float > ||
                                        std::is_same_v< T, std::complex< float > > ||
                                        std::is_same_v< T, double > ||
                                        std::is_same_v< T, std::complex< double > >;

template< dft_ns::precision prec, dft_ns::domain dom >
void ready_descriptor( dft_ns::descriptor< prec, dom >& desc, sycl::queue& q )
{
  std::int64_t rank;
  desc.get_value( dft_ns::config_param::DIMENSION, &rank );
  std::vector< std::int64_t > lengths( rank );
  desc.get_value( dft_ns::config_param::LENGTHS, lengths.data() );  // LENGTHS expects a raw std::int64_t*
  desc.commit( q );
}

template< typename T, std::enable_if_t< is_usable_compute_type< T >, bool > = true >
sycl::event compute_inplace_real_dft( sycl::queue& q,
                                      const std::vector< std::int64_t >& lengths,
                                      T* device_accessible_usm_data
                                    )
{
  constexpr bool is_single_precision = std::is_same_v< T, float > || std::is_same_v< T, std::complex< float > >;
  constexpr auto prec = is_single_precision ? dft_ns::precision::SINGLE
                                            : dft_ns::precision::DOUBLE;
  auto desc = dft_ns::descriptor< prec, dft_ns::domain::REAL >( lengths );
  ready_descriptor( desc, q );
  return dft_ns::compute_forward( desc, device_accessible_usm_data );
}


std::string manufacturer = "NVIDIA";

int device_selector( const sycl::device& dev )
{
  int score = 0;
  if( dev.is_gpu() ) ++score;
  std::string vendor = dev.get_info< sycl::info::device::vendor >();
  if( vendor.find( manufacturer ) != std::string::npos )
    ++score;

  return score;
}

void do_some_work( sycl::queue& q )
{
  auto dev = q.get_device();
  auto ctxt = q.get_context();

  bool has_usm_shared = dev.has( sycl::aspect::usm_shared_allocations ),
       has_usm_device = dev.has( sycl::aspect::usm_device_allocations ),
       has_usm = has_usm_shared || has_usm_device;

  constexpr int N = 2048;

  if( has_usm )
  {
    // An in-place real DFT of length N stores N/2 + 1 complex results,
    // so the buffer needs room for N + 2 floats.
    constexpr int padded_size = N + 2;
    float* data = nullptr;
    if( has_usm_shared )
      data = sycl::malloc_shared< float >( padded_size, q );
    else
      data = sycl::malloc_device< float >( padded_size, q );

    std::random_device rnd_device;
    std::mt19937 mersenne_engine{ rnd_device() };  // Mersenne Twister seeded from a random device
    std::uniform_real_distribution< float > dist{ -1.0f, +1.0f };
    auto gen = [&](){
                      return dist( mersenne_engine );
                    };

    // Generate the input on the host and copy it over: device-only USM
    // cannot be written to directly from host code.
    std::vector< float > host_data( N );
    std::generate( host_data.begin(), host_data.end(), gen );
    q.memcpy( data, host_data.data(), N * sizeof( float ) ).wait();

    std::vector< std::int64_t > lengths{ N };
    compute_inplace_real_dft< float >( q, lengths, data ).wait();
    sycl::free( data, q );
  }
}

int main( int argc, char** argv )
{
  int num_args = 1;

  while( num_args < argc )
  {
    if( std::string( "-m" ) == argv[num_args] && num_args + 1 < argc ) manufacturer = argv[++num_args];
    ++num_args;
  }

  sycl::queue q{ device_selector };
  do_some_work( q );

  return 0;
}

The Makefile is

LDIR    = /home/pablo/intel/oneapi/mkl/2025.1/
IDIR    = /home/pablo/intel/oneapi/mkl/2025.1/include

CFLAGS  = -Werror -Wall -Wpedantic -I${IDIR} -std=c++17
LFLAGS  = -fsycl  -fsycl-device-code-split=per_kernel ${LDIR}/lib/libmkl_sycl.a -Wl,-export-dynamic -Wl,--start-group ${LDIR}/lib/libmkl_intel_ilp64.a ${LDIR}/lib/libmkl_tbb_thread.a ${LDIR}/lib/libmkl_core.a -Wl,--end-group -L${LDIR}/lib/ -ltbb -lsycl -lOpenCL -lpthread -lm -ldl
DEVICES = -fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_60 

example: example.cpp
        icpx -o example example.cpp -fsycl ${CFLAGS} ${DEVICES} ${LFLAGS}

When I attempt to compile and link the code, I get the following error message:

icpx -o example example.cpp -fsycl -Werror -Wall -Wpedantic -I/home/pablo/intel/oneapi/mkl/2025.1/include -std=c++17 -fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_60  -fsycl  -fsycl-device-code-split=per_kernel /home/pablo/intel/oneapi/mkl/2025.1//lib/libmkl_sycl.a -Wl,-export-dynamic -Wl,--start-group /home/pablo/intel/oneapi/mkl/2025.1//lib/libmkl_intel_ilp64.a /home/pablo/intel/oneapi/mkl/2025.1//lib/libmkl_tbb_thread.a /home/pablo/intel/oneapi/mkl/2025.1//lib/libmkl_core.a -Wl,--end-group -L/home/pablo/intel/oneapi/mkl/2025.1//lib/ -ltbb -lsycl -lOpenCL -lpthread -lm -ldl
icpx: error: linked binaries do not contain expected 'nvptx64-nvidia-cuda-sm_60' target; found targets: 'spir64-unknown-unknown, spir64_gen-unknown-unknown, spir64' [-Werror,-Wsycl-target]
make: *** [Makefile:9: example] Fehler 1

On my system, sycl-ls gives

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) UHD Graphics 12.2.0 [1.6.33276]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) i9-14900HX OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics OpenCL 3.0 NEO  [25.13.033276]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4070 Laptop GPU 8.9 [CUDA 12.8]

What flags should I pass to the linker so that it generates code an NVIDIA device can run?

Thank you in advance!

I’ll give you a bit of background to the answer so you can see how the pieces fit together.
The Intel oneAPI Base Toolkit contains binaries for DPC++ and oneMKL that can be used to build for Intel targets.
When you download the plugin from Codeplay, you only get the DPC++ binaries that allow you to build for Nvidia targets, not the oneMKL part.
So what you would need to do is build oneMath (formerly the open-source oneMKL Interfaces project) for Nvidia and link against that binary when compiling your project.

I appreciate this is not totally convenient; we have looked at making the oneMath binaries available with the plugin, but at the moment it is just the DPC++ compiler part.

Thank you, Rod, for the detailed answer.
Unfortunately, I cannot get it to work.

I downloaded a snapshot of oneMath and compiled it, choosing the static libraries. In my Makefile I switched the include and lib directories to the new ones (those of oneMath instead of MKL's). However, the error message stays the same:

icpx -o complex_fwd_usm_mklcpu_cufft complex_fwd_usm_mklcpu_cufft.cpp -fsycl -v -fsycl -Werror -Wall -Wpedantic -I/home/pablo/intel/oneMath/include /home/pablo/intel/oneapi/mkl/2025.1/include -std=c++17 -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_50  /home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -v -fsycl -L/home/pablo/intel/oneMath/lib -lsycl -lOpenCL -lpthread -lm -ldl
Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/pablo/intel/oneapi/compiler/2025.1/bin/compiler
Configuration file: /home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/../icpx.cfg
Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/15
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
Found CUDA installation: /opt/cuda, version 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -list 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-spir64-unknown-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -list 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -list 
icpx: error: linked binaries do not contain expected 'nvptx64-nvidia-cuda-sm_50' target; found targets: 'spir64-unknown-unknown' [-Werror,-Wsycl-target]
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -check-section -base-temp-dir=/tmp/icpx-0a5cd9e409 
/home/pablo/intel/oneapi/compiler/2025.1/bin/compiler/clang-offload-bundler -type=ao -input=/home/pablo/intel/oneMath/lib/libonemath_dft_cufft.a -list 
make: *** [Makefile:15: complex_fwd_usm_mklcpu_cufft] Fehler 1

I noticed that oneMath comes with examples, but I could not find a way to tell the compiler where to look for sycl/sycl.hpp, so the examples do not compile on my system.

Hi @pablo_s,
DPC++ picks up all of its SYCL include directories, library directories and the libraries to link just from the -fsycl flag. There is no need to specify any -I, -L or -l arguments for the SYCL runtime.
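
For example, a minimal SYCL build needs nothing beyond the flag itself (the file name here is just a placeholder):

icpx -fsycl my_dft_example.cpp -o my_dft_example

The SYCL headers and runtime library are found automatically; additional -I, -L or -l options are only needed for third-party libraries such as oneMath.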

The oneMath library is just a host-side wrapper around calls into vendor-specific kernel libraries (cuFFT in this case), so it doesn’t contain any device code itself. The message from icpx about the missing target is surprising, but it is actually just a warning; it was promoted to an error because you compile with -Werror.

Could you try adding -Wno-sycl-target as well? It worked fine for me.

I built oneMath in the following way:

git clone https://github.com/uxlfoundation/oneMath.git
cd oneMath
cmake \
  -Bbuild \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DCMAKE_CXX_COMPILER=icpx \
  -DCMAKE_C_COMPILER=icx \
  -DENABLE_MKLCPU_BACKEND=False \
  -DENABLE_MKLGPU_BACKEND=False \
  -DENABLE_CUFFT_BACKEND=True \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=False
cmake --build build
cmake --install build

and then compiled my test file like this:

icpx \
my_dft_example.cpp \
-fsycl \
-fsycl-targets=nvptx64-nvidia-cuda \
-I/path/to/oneMath/install/include \
/path/to/oneMath/install/lib/libonemath_dft_cufft.a \
-L/usr/local/cuda/lib64 \
-lcuda \
-lcufft \
-Werror \
-Wno-sycl-target

I verified that my test file compiles and runs correctly. It looks like the -Wsycl-target diagnostic may simply be wrong in this case; we will look into the issue and recommend disabling it for now.
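
Applied to the Makefile from your first post, the whole build would look roughly like the sketch below. The oneMath install path and the CUDA library path are placeholders for your setup, and sm_89 is an assumption based on the compute capability 8.9 that sycl-ls reported for your RTX 4070; adjust as needed:

ONEMATH = /home/pablo/intel/oneMath/install

CFLAGS  = -Werror -Wno-sycl-target -Wall -Wpedantic -I${ONEMATH}/include -std=c++17
DEVICES = -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_89
LFLAGS  = ${ONEMATH}/lib/libonemath_dft_cufft.a -L/usr/local/cuda/lib64 -lcuda -lcufft

example: example.cpp
        icpx -o example example.cpp -fsycl ${CFLAGS} ${DEVICES} ${LFLAGS}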