I tried to write a basic comparison application to see the difference in speed between computing a vector add and multiply on the CPU and on an Nvidia GPU.
This is what I have: I take two inputs of 250 * 1024 * 1024 numbers each, add them, multiply by 0.3f, and write the result.
I tried this on the Intel built-in GPU, and on Nvidia using the compute++ -sycl -sycl-target ptx64 option.
However, the results confuse me on all targets.
First of all, on the CPU (using cpu_selector), it sometimes runs in 3-10 seconds (hot/cold) and sometimes reports:
Run on CPU selection
Vector adding 256000k elements
Error: [ComputeCpp:RT0107] Failed to create program from binary
Time: 1288ms
Using the gpu_selector, it should run on the P600 (see below), but it always reports:
error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 000002A1FAD58B60 device 0 (size 65):
error : Binary format for key='0', ident='' is not recognized
I’m not sure why you are seeing these issues with your devices. I would recommend printing out the device being used, to verify that it is the one you expect. There’s some code that shows how to do this in the sample code here.
For the issue you are seeing with the Nvidia GPU, can you open the .sycl file that is generated during the build stage and check that it shows the correct instruction set?
At around line 34 you’ll see something like this:
Thanks! That was exactly what I was looking for!
It turns out really weird things happen if you create more than one queue, even if the first queue goes out of scope before you create the second. It seems to mess up the device selection, or cause some other weirdness; I’m getting inconsistent error messages.
It now works fine on:
Intel GPU
Native CPU
Host device
But it still doesn’t work on Nvidia. I checked the .sycl file, and it does contain: unsigned char SYCL_main_cpp_bin_nvptx64[] = {…
However, running it on Nvidia results in:
D:\>"C:\Users\Jan Wilmans\source\repos\ComputeCpp SYCL C++1\x64\Release\ComputeCpp SYCL C++1.exe" 0
Allocate memory...
0) gpu: Quadro P600
Running on Quadro P600
Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 000001A8FDAB0F80 device 0 (size 65):
error : Binary format for key='0', ident='' is not recognized>)
I guess this is a very generic error message, so I have no idea why it is happening. However, the error message is the same whether or not I pass -sycl-target ptx64, so I think it’s failing before it gets to the point of actually executing anything on the GPU.
OK, I figured out why the second queue was behaving differently: I was messing it up with my hacky custom_selector. What is the best way to list all devices and select exactly one from the command line?
#include <CL/sycl.hpp>
#include <vector>
using namespace cl::sycl;

std::vector<device> get_all_devices()
{
  std::vector<device> devices;
  // Get list of platforms
  std::vector<platform> platforms = platform::get_platforms();
  // Enumerate the devices of every platform into one flat list
  for (const platform& plat : platforms)
  {
    std::vector<device> plat_devices = plat.get_devices();
    devices.insert(devices.end(), plat_devices.begin(), plat_devices.end());
  }
  return devices;
}
Then parse the index from the command line and take that device from the devices vector.
Testing on a different machine at work, I don’t see it either. I will try to narrow it down when I get back home; at least now I know it’s not the expected behavior.
I have come across this issue when using CMake: the default CMake behavior is to use the “RelWithDebInfo” configuration when requesting Release builds. However, this triggers the Debug layout for STL types, while the API types coming out of ComputeCpp are proper Debug or Release types. Querying for a std::vector<cl::sycl::XXX> with the wrong layout can result in cryptic behavior. Make sure your Debug build uses the Debug ComputeCpp libraries, and your Release build uses the Release ones.
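One way to confirm which STL layout a translation unit was actually compiled for is to report the relevant MSVC macros at startup. This is a sketch that assumes an MSVC toolchain, as the Windows paths above suggest: `_DEBUG` is defined by MSVC for /MDd and /MTd builds, and `_ITERATOR_DEBUG_LEVEL` is set by the MSVC STL headers and changes the layout of containers such as std::vector. The function name `stl_build_flavor` is hypothetical.

```cpp
#include <string>

// Reports which C++ runtime flavor this translation unit was compiled against.
std::string stl_build_flavor()
{
#if defined(_MSC_VER)
#  if defined(_DEBUG)
  std::string flavor = "MSVC debug runtime";
#  else
  std::string flavor = "MSVC release runtime";
#  endif
  // Defined by the MSVC STL headers (pulled in via <string>); 0, 1 or 2.
  flavor += ", _ITERATOR_DEBUG_LEVEL=" + std::to_string(_ITERATOR_DEBUG_LEVEL);
  return flavor;
#else
  return "non-MSVC toolchain";
#endif
}
```

If this reports the debug runtime in a build that links the Release ComputeCpp libraries (or the reverse), that is the kind of mismatch described above.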