I’ve followed the setup guide for oneAPI for CUDA,
and managed to compile and execute the provided simple-sycl-app.cpp on my GTX 1660.
Many of my other applications seem to compile fine from the command line in an environment-activated shell (with source /opt/intel/oneapi/setvars.sh --include-intel-llvm),
but I’m not able to make this work with CMake. I’ve tried the following:
building CMake projects from the source-activated shell with -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang
Activating the environment with the oneAPI Environment Configurator extension in VSCode, followed by a CMake Tools configuration using one of these kits:
All CMake configurations lead to the following error:
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)
Looking at the error produced, is it possible that the wrong type of target device is being specified in your CMake file? The parameter value may not match your actual NVIDIA card.
The compiled binary runs smoothly on the GPU without a target specification when compiled from the command line, as in the example. I will test the code in your provided example as soon as I have the GPU available.
Sourcing setvars.sh successfully initializes the environment variables for CMake when the project is configured directly from the shell.
Sourcing the same script before launching VSCode from the command line solved the issue; CMake (configured with CMake Tools) is now able to find and link the SYCL libraries.
Thanks for the confirmation. Are you able to point us to or provide a minimal reproducer, like a small project where we can try the compilation and figure out where the problem occurs?
Set appropriate target architecture (--cuda-gpu-arch=sm_60 in this case)
Invoking CMake configure and build via CMake-Tools produced these warnings for this minimal example:
[build] clang++: warning: linked binaries do not contain expected 'nvptx64-nvidia-cuda' target; found targets: 'spir64_x86_64-unknown-unknown, nvptx64-nvidia-cuda-sm_60'
Modifying -fsycl-targets=spir64_x86_64,nvptx64-nvidia-cuda to -fsycl-targets=spir64_x86_64,nvptx64-nvidia-cuda-sm_60 results in:
[build] clang++: warning: linked binaries do not contain expected 'nvptx64-nvidia-cuda-sm_60' target; found targets: 'spir64_x86_64-unknown-unknown, nvptx64-nvidia-cuda-sm_60-sm_60'
Additionally this occurs, which I suspect is caused by the misconfigured environment:
Here is an even simpler example circumventing the -fsycl-targets issue:
CMAKE_MINIMUM_REQUIRED( VERSION 3.25 )
PROJECT( test )
find_package(IntelDPCPP REQUIRED)
add_compile_options("-fsycl")
add_compile_options("-O0")
add_compile_options("-Xsycl-target-backend=nvptx64-nvidia-cuda")
add_compile_options("--cuda-gpu-arch=sm_60")
add_executable( test test.cpp )
#include <CL/sycl.hpp>
#include <vector>
#include <random>
#include <iostream>

//create square multiply kernel
class square_multiply_kernel;

int main()
{
    std::vector<int> v(10);

    //fill vector with random numbers
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 100);
    for (int i = 0; i < 10; i++)
    {
        v[i] = dis(gen);
    }

    //square 10 random numbers in parallel in sycl kernel on gpu
    //create queue on gpu
    sycl::queue q(sycl::gpu_selector{});

    //create buffer
    sycl::buffer<int, 1> buf(v.data(), sycl::range<1>(v.size()));

    //submit command group to queue
    q.submit([&](sycl::handler& cgh)
    {
        //get write access to buffer
        auto acc = buf.get_access<sycl::access::mode::write>(cgh);

        //execute kernel
        cgh.parallel_for<square_multiply_kernel>(sycl::range<1>(v.size()), [=](sycl::id<1> index)
        {
            acc[index] = acc[index] * acc[index];
        });
    });

    //wait for queue to finish
    q.wait();

    //print results
    for (int i = 0; i < 10; i++)
    {
        std::cout << v[i] << std::endl;
    }

    return 0;
}
Which (again) returns the initial error:
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)
Hi Heinzelnisse,
Just to confirm, the problem only occurs when using VSCode with the Intel Environment Configurator extension to set up the development environment?
If you source /opt/intel/oneapi/setvars.sh in a terminal instead, do your CMake configuration and build succeed?
I am trying to establish whether this is a VSCode Intel Environment Configurator problem or a compiler problem.
We are not able to replicate the issue as we do not have the hardware. However, we would like to suggest the following.
The CMake configuration is incorrect because the architecture flags also need to be passed when linking; that is what the warnings are about. Appending -sm_60 to the -fsycl-targets triple does not work; the architecture needs to be passed separately. This document can help:
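As a rough illustration (we cannot verify this ourselves without the hardware), repeating the flags at the link step of your CMakeLists could look something like this, keeping sm_60 from your example:

# Sketch only: the same SYCL/CUDA flags used for compiling also have to
# reach the link step so the CUDA device images end up in the final binary.
add_link_options("-fsycl")
add_link_options("-fsycl-targets=nvptx64-nvidia-cuda")
add_link_options("-Xsycl-target-backend=nvptx64-nvidia-cuda")
add_link_options("--cuda-gpu-arch=sm_60")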
It works only when sourcing with --include-intel-llvm as an argument, but as mentioned this works fine when done before launching VSCode, in addition to omitting find_package(IntelDPCPP). It would be nice to have a similar CMake config file for the Codeplay-configured compilers though, or a modification of IntelDPCPPConfig.cmake that works.
The warnings and runtime error are likely related, so here are a few ideas of what could be the problem:
From the CMake source it looks like you need to set the -fsycl-targets option; if it is not already set, add it.
Though the warnings suggest that -fsycl-targets is set to nvptx64-nvidia-cuda-sm_60, this is not a valid triple anymore; use -fsycl-targets=nvptx64-nvidia-cuda together with -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_60.
Make sure that the gpu_selector is actually picking an NVIDIA GPU. You can check by calling get_device() on the queue and then get_info<sycl::info::device::name>() on the device.
Add the SYCL flags to add_link_options in CMake as well; a sketch combining these points follows below.
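Putting these ideas together, a corrected CMakeLists.txt for your reproducer might look roughly like the following. This is only a sketch that we have not been able to verify on NVIDIA hardware; for simplicity it targets only the CUDA backend, drops find_package(IntelDPCPP) as you mentioned, and keeps sm_60 from your example:

cmake_minimum_required(VERSION 3.25)
project(test LANGUAGES CXX)

# Sketch: use the plain nvptx64-nvidia-cuda triple (no -sm_60 suffix) and pass
# the GPU architecture through the backend option instead. The same flags are
# applied to both the compile and the link step.
set(SYCL_FLAGS
    -fsycl
    -fsycl-targets=nvptx64-nvidia-cuda
    -Xsycl-target-backend=nvptx64-nvidia-cuda
    --cuda-gpu-arch=sm_60)

add_executable(test test.cpp)
target_compile_options(test PRIVATE ${SYCL_FLAGS})
target_link_options(test PRIVATE ${SYCL_FLAGS})

Since a GTX 1660 is a Turing card (compute capability 7.5), it may also be worth trying sm_75 as the architecture value.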
There are known issues with the IntelDPCPP CMake configuration and we aim to address them in time for the next release.