This article explains how to debug SYCL code with GDB on the host device by constructing a sycl::queue with sycl::host_selector{}.
However, when I try this with the following code snippet:
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>
using namespace sycl;
static const int N = 16;
int main()
{
    queue q(sycl::host_selector{});
    std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;
    std::vector<int> v(N);
    for(int i=0; i<N; i++) v[i] = i;
    buffer<int, 1> buf(v.data(), v.size());
    q.submit([&] (handler &h)
    {
        auto A = buf.get_access<access::mode::read_write>(h);
        h.parallel_for(range<1>(N), [=](id<1> i)
        {
            A[i] *= 2;
        });
    });
    buf.get_access<access::mode::read>(); // <--- Host Accessor to Synchronize Memory
    for(int i=0; i<N; i++) std::cout << v[i] << std::endl;
    return 0;
}
I get the runtime error:
No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html -1 (PI_ERROR_DEVICE_NOT_FOUND)
as well as a compiler warning that host_selector is no longer supported.
So I understand that host_selector can no longer be used, but is there another way to debug a SYCL application with GDB? I couldn’t find anything online.
I managed to debug this test application by printing values to the terminal from inside the kernel with sycl::stream, but that’s far less convenient than being able to use GDB.
Is there a current alternative to host_selector, and to the host device in general?
I’m not sure how to check the version of my SYCL / Intel DPC++ installation, but all the SYCL headers are installed in /opt/intel/oneapi/compiler/2023.2.1/linux/include/sycl/, so I assume this is a 2023 release. I installed the Intel oneAPI Base Toolkit using this method.
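Besides running icpx --version in a shell, one way to see which compiler and SYCL version a program was actually built with is to print the relevant predefined macros. A minimal sketch, assuming these macro names: __INTEL_LLVM_COMPILER is defined by Intel’s icx/icpx, SYCL_LANGUAGE_VERSION is predefined when compiling in SYCL mode (e.g. icpx -fsycl), and __VERSION__ is a GCC/Clang extension. The helper name compiler_version_summary is hypothetical; any macro the current compiler does not define is simply skipped, so the function also builds with a plain C++ compiler.

```cpp
#include <string>

// Hypothetical helper: collects the version-related predefined macros that
// the current compiler exposes into one printable string.
std::string compiler_version_summary() {
  std::string summary;
#ifdef __INTEL_LLVM_COMPILER
  // Intel icx/icpx: integer encoding of the compiler release.
  summary += "__INTEL_LLVM_COMPILER = " + std::to_string(__INTEL_LLVM_COMPILER) + "\n";
#endif
#ifdef SYCL_LANGUAGE_VERSION
  // Defined when compiling as SYCL (e.g. icpx -fsycl).
  summary += "SYCL_LANGUAGE_VERSION = " + std::to_string(SYCL_LANGUAGE_VERSION) + "\n";
#endif
#ifdef __VERSION__
  // GCC/Clang extension: human-readable compiler version string.
  summary += std::string("__VERSION__ = ") + __VERSION__ + "\n";
#endif
  return summary;
}
```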
Hi @Adhesive_Bagels,
Indeed, sycl::host_selector is not part of the SYCL 2020 specification, which is what the DPC++ compiler implements (including your version, 2023.2.1).
In SYCL 2020, the way to run on the CPU is to target a CPU device. You can select one with the built-in CPU device selector sycl::cpu_selector_v, like this:
sycl::queue q{sycl::cpu_selector_v};
q.submit([&](sycl::handler& cgh) {
  // your task here
});
Alternatively, with the DPC++ implementation you can use the default device selector:
sycl::queue q{};
q.submit([&](sycl::handler& cgh) {
  // your task here
});
and narrow down the list of available devices when starting the application with the environment variable ONEAPI_DEVICE_SELECTOR=*:cpu (documented here).
In the DPC++ implementation shipped with Intel’s oneAPI Base Toolkit, the CPU device is provided through the OpenCL backend and drivers. If you installed the toolkit through a package manager, OpenCL CPU support should already be available. You can confirm this by running sycl-ls, which should list an opencl:cpu device like this (among others):
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900K 3.0 [2023.16.6.0.22_223734]
Hi @rbielski, thanks for your answer.
I was able to run my code on the CPU using cpu_selector_v, but I still couldn’t debug it line by line with GDB (specifically, through the GDB integration in my Qt Creator IDE).
I also tried GDB on the command line instead of through Qt Creator, but even after setting a breakpoint in my functor kernel class (and compiling the code with cpu_selector_v), GDB doesn’t break.
How can I debug my code line by line?
Hi @Adhesive_Bagels,
Yes, it should certainly work. Did you compile with -g -O0 for debugging? The following example works for me:
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

constexpr static size_t N{1024};

int main() {
  std::vector<int> v(N);
  for (size_t i{0}; i<N; ++i) {
    v[i] = i;
  }

  sycl::queue q{sycl::cpu_selector_v};
  sycl::buffer buf{v};
  q.submit([&](sycl::handler& cgh) {
    auto acc{buf.get_access(cgh,sycl::read_write)};
    cgh.parallel_for(N, [=](sycl::id<1> id) {
      int value{acc[id]};
      acc[id] = 2 * value;
      acc[id] += 1;
    });
  });

  auto acc{buf.get_host_access(sycl::read_only)};
  for (size_t i : {0, 63, 255, 511, 1023}) {
    std::cout << "v[" << i << "] = " << acc[i] << std::endl;
  }
  return 0;
}
I compiled and debugged it as follows:
$ icpx -fsycl -g -O0 -o test ./test.cpp
$ gdb ./test
(gdb) list test.cpp:19
14 sycl::buffer buf{v};
15 q.submit([&](sycl::handler& cgh) {
16 auto acc{buf.get_access(cgh,sycl::read_write)};
17 cgh.parallel_for(N, [=](sycl::id<1> id) {
18 int value{acc[id]};
19 acc[id] = 2 * value;
20 acc[id] += 1;
21 });
22 });
23
(gdb) break test.cpp:19
Breakpoint 1 at 0x406502: file ./test.cpp, line 19.
(gdb) run
Starting program: /tmp/test
[Switching to Thread 0x7fff923ff640 (LWP 12195)]
Thread 8 "test" hit Breakpoint 1, main::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::id<1>)#1}::operator()(sycl::_V1::id<1>) const (this=0x7fff923fe230, id=...) at test.cpp:19
19 acc[id] = 2 * value;
(gdb) print id
$1 = {<sycl::_V1::detail::array<1>> = {common_array = {448}}, <No data fields>}
(gdb) print value
$2 = 448
(gdb) continue
Continuing.
[Switching to Thread 0x7fff5bfff640 (LWP 12210)]
Thread 23 "test" hit Breakpoint 1, main::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::id<1>)#1}::operator()(sycl::_V1::id<1>) const (this=0x7fff5bffe230, id=...) at test.cpp:19
19 acc[id] = 2 * value;
(gdb) print id
$3 = {<sycl::_V1::detail::array<1>> = {common_array = {256}}, <No data fields>}
(gdb) print value
$4 = 256
As you can see, I could set a breakpoint, stop there, and inspect the stack inside the kernel function in different threads.
I’m using the DPC++ compiler version 2023.2.1 and gdb version 12.1.
Hope this helps,
Rafal
Even with your simple example, I cannot seem to get my GDB to hit the breakpoint:
$ icpx -fsycl -g -O0 -o test ./test.cpp
$ gdb ./test
(gdb) list test.cpp:19
14 sycl::buffer buf{v};
15 q.submit([&](sycl::handler& cgh) {
16 auto acc{buf.get_access(cgh,sycl::read_write)};
17 cgh.parallel_for(N, [=](sycl::id<1> id) {
18 int value{acc[id]};
19 acc[id] = 2 * value;
20 acc[id] += 1;
21 });
22 });
23
(gdb) break test.cpp:19
Breakpoint 1 at 0x406692: file ./test.cpp, line 19.
(gdb) run
Starting program: /home/bagels/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdc3ac700 (LWP 8472)]
[New Thread 0x7fffd88c2700 (LWP 8473)]
[New Thread 0x7fffd84c1700 (LWP 8474)]
[New Thread 0x7fffc5f55700 (LWP 8475)]
[New Thread 0x7fffc5b54700 (LWP 8476)]
[New Thread 0x7fffc5352700 (LWP 8478)]
[New Thread 0x7fffc5753700 (LWP 8477)]
[New Thread 0x7fffc4f51700 (LWP 8479)]
v[0] = 1
v[63] = 127
v[255] = 511
v[511] = 1023
v[1023] = 2047
[Thread 0x7fffdc3ac700 (LWP 8472) exited]
[Thread 0x7fffc5352700 (LWP 8478) exited]
[Thread 0x7fffc4f51700 (LWP 8479) exited]
[Thread 0x7fffc5753700 (LWP 8477) exited]
[Thread 0x7fffc5b54700 (LWP 8476) exited]
[Thread 0x7fffc5f55700 (LWP 8475) exited]
[Thread 0x7fffd88c2700 (LWP 8473) exited]
[Thread 0x7ffff596ff80 (LWP 8468) exited]
[Inferior 1 (process 8468) exited normally]
My GDB doesn’t hit the breakpoint. Could this be related to my GDB version? I’m using GDB 9.2, but this post shows a working example with GDB 8.1.
I’m also using DPC++ 2023.2.1.
Hi @Adhesive_Bagels,
That’s odd, but it could indeed be due to the gdb version or the OpenCL driver version. Could you post the output of sycl-ls and clinfo?
Hi @rbielski
Here’s the output of sycl-ls:
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics 3.0 [23.30.26918.9]
and a pastebin of the clinfo output.
Hi @Adhesive_Bagels,
thanks for the extra details, your OpenCL installation looks good. I just tested gdb 9.2 on Ubuntu 20.04 and confirmed that indeed the breakpoint doesn’t work in that version. I see the same behaviour as you.
Fortunately, the oneAPI Base Toolkit comes with gdb-oneapi, which is built on top of a recent gdb version (gdb 13.1 for oneAPI 2023.2). It is gdb extended with functionality for debugging Intel GPUs, but it still contains everything in the standard gdb. Just use it like the regular version:
$ gdb-oneapi ./test
(gdb) break test.cpp:19
(gdb) run
Hi @rbielski,
Using gdb-oneapi works as expected: the debugger hits the breakpoint.
Thanks for the help!