Compile a SYCL program for both INTEL and NVIDIA GPUs

Hello all,

I am trying to compile a SYCL project using the Makefile provided below. The machine has both Intel and NVIDIA GPUs, and for AOT compilation I use the -fsycl-targets flag.

Here is the sycl-ls output:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
[opencl:cpu:1] Intel(R) OpenCL, Genuine Intel(R) CPU 0000%@ 3.0 [2023.16.7.0.21_160000]
[opencl:cpu:2] Intel(R) OpenCL, Genuine Intel(R) CPU 0000%@ 3.0 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 3.0 [23.43.27642.40]
[opencl:acc:4] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.8 [CUDA 12.4]
[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.8 [CUDA 12.4]
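
As a quick sanity check before choosing -fsycl-targets, the listing can be filtered down to the backends that actually expose a GPU. This is only a minimal sketch: it assumes every device line starts with "[backend:type:index]" as in the output above, and gpu_backends is a hypothetical helper name, not part of any oneAPI tool.

```shell
#!/bin/sh
# Print the distinct backends that expose a GPU, reading sycl-ls output on stdin.
# Assumption: device lines begin with "[backend:type:index]".
gpu_backends() {
    sed -n 's/^\[\([^]:]*\):gpu:[0-9]*].*/\1/p' | sort -u
}

# Sample lines copied from the listing above. On a real machine one would run:
#   sycl-ls | gpu_backends
sample='[opencl:cpu:1] Intel(R) OpenCL, Genuine Intel(R) CPU 0000%@ 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 3.0 [23.43.27642.40]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.8 [CUDA 12.4]
[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.8 [CUDA 12.4]'

printf '%s\n' "$sample" | gpu_backends
```

For the sample lines this prints ext_oneapi_cuda and opencl, one per line, which matches the two GPU backends visible on this machine.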

Using -fsycl-targets=spir64,nvptx64-nvidia-cuda, the compiler throws the following error:

icpx -fsycl -fsycl-targets=spir64,nvptx64-nvidia-cuda -Wno-unknown-cuda-version -DMKL_ILP64  -qmkl=parallel -qtbb -O3  -Iinclude -c src/bandwidth_reduction.cpp -o obj/bandwidth_reduction.o
/usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/bin-llvm/clang-offload-bundler: error: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'Intel.oneAPI.DPCPP.Compiler_2023.2.0' Reader: 'Intel.oneAPI.DPCPP.Compiler_2023.2.0')
icpx: error: clang-offload-bundler command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:63: obj/bandwidth_reduction.o] Error 1

Using -fsycl-targets=nvptx64-nvidia-cuda,spir64, the compilation succeeds, but the program fails when running on the Intel GPU with the following error (it runs fine on the NVIDIA GPU):

Device Name: Intel(R) Data Center GPU Max 1100
Device Set || Name: Intel(R) Data Center GPU Max 1100
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Native API failed. Native API returns: -46 (PI_ERROR_INVALID_KERNEL_NAME)
Aborted (core dumped)

Using -fsycl-targets=spir64, the program compiles and runs on the Intel GPU just fine.

Could someone please explain what I am missing here?

Here is the Makefile:

# Makefile for SYCL project
# Place this Makefile in the project root, next to the "src" directory (source files) and the "include" directory (.h headers)

# Compiler
ICPX := icpx

# Common Compiler flags
ICPXFLAGS :=   -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64 -Wno-unknown-cuda-version -DMKL_ILP64  -qmkl=parallel -qtbb
LINKFLAGS :=   -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64 -Wno-unknown-cuda-version -qmkl=parallel -qtbb

# Debug flags
ICPXFLAGS_DEBUG := -g -O0

# Release flags
ICPXFLAGS_RELEASE := -O3

# GPU architecture

# Directories
SRC_DIR := src
INCLUDE_DIR := include
OBJ_DIR := obj
BIN_DIR := bin

# Source files
CPP_FILES := $(wildcard $(SRC_DIR)/*.cpp)
H_FILES := $(wildcard $(INCLUDE_DIR)/*.h)

# Object files
OBJ_FILES := $(patsubst $(SRC_DIR)/%.cpp,$(OBJ_DIR)/%.o,$(CPP_FILES))

# Target executables
TARGET_DEBUG := $(BIN_DIR)/ex_sycl_debug.exe
TARGET_RELEASE := $(BIN_DIR)/ex_sycl_release.exe

# Main target
all: debug

# Create the necessary directories
$(OBJ_DIR) $(BIN_DIR):
	mkdir -p $@

debug: ICPXFLAGS += $(ICPXFLAGS_DEBUG)
debug: LINKFLAGS += $(ICPXFLAGS_DEBUG)
debug: $(OBJ_DIR) $(BIN_DIR) $(TARGET_DEBUG)

# Rule for C++ files
$(OBJ_DIR)/%.o: $(SRC_DIR)/%.cpp $(H_FILES)
	$(ICPX) $(ICPXFLAGS) $(ARCH) -I$(INCLUDE_DIR) -c $< -o $@

# Linking step
$(TARGET_DEBUG): $(OBJ_FILES)
	$(ICPX) $(ICPXFLAGS) $(LINKFLAGS) $(ARCH) $^ -o $@

release: ICPXFLAGS += $(ICPXFLAGS_RELEASE)
release: LINKFLAGS += $(ICPXFLAGS_RELEASE)
release: $(OBJ_DIR) $(BIN_DIR) $(TARGET_RELEASE)

# Linking step
$(TARGET_RELEASE): $(OBJ_FILES)
	$(ICPX) $(ICPXFLAGS) $(LINKFLAGS) $(ARCH) $^ -o $@

# Clean target
clean:
	rm -rf $(OBJ_DIR)
clean_all:
	rm -rf $(OBJ_DIR) $(TARGET_DEBUG) $(TARGET_RELEASE)
clean_debug:
	rm -rf $(OBJ_DIR) $(TARGET_DEBUG)
clean_release:
	rm -rf $(OBJ_DIR) $(TARGET_RELEASE)

Hi @sidarth,
The first issue, with the ordering of targets, is a known compiler bug in the version you’re using; it is mentioned on our troubleshooting page. The bug is fixed in newer versions of the compiler, but a simple workaround is to use the ordering that works, as you have already found.

I’m not sure about the second error. Would you be able to provide a minimal example code producing this issue?

Is this exception coming from a submission of a kernel you wrote, or from an external library? I see you’re linking the Intel MKL library and I’m wondering if it could be related. You can find the source of the exception in a debugger:

$ gdb-oneapi --args your-app <your args>
(gdb) catch throw
(gdb) run
# when it stops on the breakpoint:
(gdb) bt
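
The same session can also be run non-interactively, which makes it easier to paste the backtrace into a reply. The following is only a sketch: it assumes gdb-oneapi accepts the standard GDB options -batch and -ex (it is a GDB distribution), and bt_on_throw and the GDB variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Run a program under the debugger, stop at the first thrown C++ exception,
# print the backtrace, and exit.
# Assumption: gdb-oneapi supports the standard GDB options -batch, -ex and --args.
# Set GDB to use a different debugger binary (for example plain gdb).
bt_on_throw() {
    debugger=${GDB:-gdb-oneapi}
    if ! command -v "$debugger" >/dev/null 2>&1; then
        echo "bt_on_throw: $debugger not found in PATH" >&2
        return 127
    fi
    # The -ex commands run in order: set the catchpoint, start the program, then
    # print the stack once the catchpoint stops it.
    "$debugger" -batch -ex 'catch throw' -ex 'run' -ex 'bt' --args "$@"
}

# Usage (your-app and its arguments are placeholders):
#   bt_on_throw ./your-app <your args>
```

If the program never throws, run simply completes and bt reports that there is no stack.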

Thanks,
Rafal

Hello @rbielski,
Thank you for the response. I will try to reproduce the second error with a smaller code snippet.
Here is the gdb-oneapi output (the run doesn’t reach the part that makes MKL calls):

GNU gdb (Intel(R) Distribution for GDB* 2023.2.0) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.; (C) 2023 Intel Corp.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.

For information about how to find Technical Support, Product Updates,
User Forums, FAQs, tips and tricks, and other support information, please visit:
<http://www.intel.com/software/products/support/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/ex_sycl_debug.exe...
(gdb) catch throw
Catchpoint 1 (throw)
(gdb) run
Starting program: /mounts/work/snarayanan/save/SYCL/Laplace_code_convert/sycl_conversion/non_unique_face/cvg_version/bin/ex_sycl_debug.exe 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Residual 8.000000e+03
Residual 8.000000e+03
CPU time to solve 7.449459
-------------------
[New Thread 0x7fff765ff640 (LWP 148778)]
[Thread 0x7fff765ff640 (LWP 148778) exited]
warning: Temporarily disabling breakpoints for unloaded shared library "/lib64/libze_intel_gpu.so.1"
[New Thread 0x7fff765ff640 (LWP 148779)]
[New Thread 0x7fff76fde640 (LWP 148780)]
[New Thread 0x7fff73bbc640 (LWP 148781)]
Device Name: Intel(R) Data Center GPU Max 1100
Device Set || Name: Intel(R) Data Center GPU Max 1100
[New Thread 0x7fff508cf640 (LWP 148782)]

Thread 1 "ex_sycl_debug.e" hit Catchpoint 1 (exception thrown), 0x00007fffcdcad612 in __cxa_throw () from /lib64/libstdc++.so.6
(gdb) bt
#0  0x00007fffcdcad612 in __cxa_throw () from /lib64/libstdc++.so.6
#1  0x00007fffcd611728 in void sycl::_V1::detail::plugin::checkPiResult<(sycl::_V1::errc)13>(_pi_result) const ()
   from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#2  0x00007fffcd5f8312 in sycl::_V1::detail::ProgramManager::getOrCreateKernel(long, std::shared_ptr<sycl::_V1::detail::context_impl> const&, std::shared_ptr<sycl::_V1::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, sycl::_V1::detail::program_impl const*) ()
   from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#3  0x00007fffcd64fb73 in sycl::_V1::detail::enqueueImpKernel(std::shared_ptr<sycl::_V1::detail::queue_impl> const&, sycl::_V1::detail::NDRDescT&, std::vector<sycl::_V1::detail::ArgDesc, std::allocator<sycl::_V1::detail::ArgDesc> >&, std::shared_ptr<sycl::_V1::detail::kernel_bundle_impl> const&, std::shared_ptr<sycl::_V1::detail::kernel_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long const&, std::vector<_pi_event*, std::allocator<_pi_event*> >&, _pi_event**, std::function<void* (sycl::_V1::detail::AccessorImplHost*)> const&, _pi_kernel_cache_config) ()
   from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#4  0x00007fffcd6a6f32 in sycl::_V1::handler::finalize()::$_0::operator()() const () from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#5  0x00007fffcd6a2c4c in sycl::_V1::handler::finalize() () from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#6  0x00007fffcd6d24df in void sycl::_V1::detail::queue_impl::finalizeHandler<sycl::_V1::handler>(sycl::_V1::handler&, sycl::_V1::detail::CG::CGTYPE const&, sycl::_V1::event&) ()
   from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#7  0x00007fffcd6d1fdc in sycl::_V1::detail::queue_impl::submit_impl(std::function<void (sycl::_V1::handler&)> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, sycl::_V1::detail::code_location const&, std::function<void (bool, bool, sycl::_V1::event&)> const*) ()
   from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#8  0x00007fffcd6d1406 in sycl::_V1::detail::queue_impl::submit(std::function<void (sycl::_V1::handler&)> const&, std::shared_ptr<sycl::_V1::detail::queue_impl> const&, sycl::_V1::detail::code_location const&, std::function<void (bool, bool, sycl::_V1::event&)> const*) () from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#9  0x00007fffcd6d13c5 in sycl::_V1::queue::submit_impl(std::function<void (sycl::_V1::handler&)>, sycl::_V1::detail::code_location const&) ()
   from /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/lib/libsycl.so.6
#10 0x000000000041aff6 in sycl::_V1::queue::submit<sycl::_V1::queue::parallel_for<sycl::_V1::detail::auto_name, 1, sycl::_V1::ext::oneapi::experimental::properties<std::tuple<> >, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&>(sycl::_V1::nd_range<1>, sycl::_V1::ext::oneapi::experimental::properties<std::tuple<> >, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&)::{lambda(sycl::_V1::handler&)#1}>(sycl::_V1::queue::parallel_for<sycl::_V1::detail::auto_name, 1, sycl::_V1::ext::oneapi::experimental::properties<std::tuple<> >, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&>(sycl::_V1::nd_range<1>, sycl::_V1::ext::oneapi::experimental::properties<std::tuple<> >, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&)::{lambda(sycl::_V1::handler&)#1}, sycl::_V1::detail::code_location const&) (this=0x7fffffff6b60, CGF=..., CodeLoc=...)
    at /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/bin-llvm/../include/sycl/queue.hpp:506
#11 0x000000000041af53 in sycl::_V1::queue::parallel_for<sycl::_V1::detail::auto_name, 1, sycl::_V1::ext::oneapi::experimental::properties<std::tuple<> >, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&>(sycl::_V1::nd_range<1>, sycl::_V1::ext::oneapi::experimental::properties<std::tuple<> >, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&) (
    this=0x7fffffff6b60, Range=..., Properties=..., Rest=...) at /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/bin-llvm/../include/sycl/queue.hpp:1863
#12 0x000000000041a039 in sycl::_V1::queue::parallel_for<sycl::_V1::detail::auto_name, 1, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}>(sycl::_V1::nd_range<1>, GPU_Solver_non_unique_face::compute_time_step()::{lambda(sycl::_V1::nd_item<1>)#1}&&) (this=0x7fffffff6b60, Range=..., Rest=...)
    at /usr/people/shared/tools/centos/7/intel_oneapi/2023.2.1/compiler/2023.2.1/linux/bin-llvm/../include/sycl/queue.hpp:1880
#13 0x0000000000419b6a in GPU_Solver_non_unique_face::compute_time_step (this=0x4da20f0) at src/gpu_solver_non_unique_face.cpp:156
#14 0x0000000000429e33 in main (argc=1, argv=0x7fffffff6e78) at src/main.cpp:101