I’ve been trying to launch a kernel using “nd_range”. I want to launch a kernel and ND-range of 16x16 divided into work-groups of 4x4. The code below shows that
int main (){
constexpr int G= 16;
constexpr int L = 4;
float orbital[G*G];
{//SYCL scope
cl::sycl::gpu_selector device; //Gpu device
cl::sycl::queue Q{device}; //command queue with the selected device
//Just for information: prints the device in use
std::cout << "Running on " << Q.get_device().get_info<sycl::info::device::name>()<< "\n";
cl::sycl::buffer<float, 1> buffOrb{orbital,cl::sycl::range<1>(G*G)};
Q.submit([&](cl::sycl::handler &cgh){
cgh.parallel_for<orbital_function>(cl::sycl::nd_range<2>(cl::sycl::range<2> {16,16},cl::sycl::range<2> {4,4}), [=](cl::sycl::nd_item<2> index){
});
});
}//END SYCL SCOPE
return 0;
}
But when I compile I get an error with this message:
Running on GeForce GTX 1060 with Max-Q Design
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
Aborted (core dumped)
it’s not quite clear what might be the problem: compile_program_error indicates there might be something wrong with the kernel, but the kernel is empty. I’d recommend adding error checking, you can look at some ComputeCpp SDK samples to see how that’s done.
Additionally, you can try enabling verbose output to get more information from the ComputeCpp runtime, see for example this post on how to enable it.
Have you observed different behavior depending on the ComputeCpp version used? Which version are you using at the moment?
Hi,
Have you added error checking into your code? This will give a more specific error message that we can use to understand where the problem is happening. Peter has pointed to some advice there, but I would also recommend looking at the materials here, sample code and accompanying video here.
ComputeCpp> (CE 2.0.0)
ComputeCpp> (Assuming platform supports SPIR-V)
ComputeCpp> (Selected Platform: NVIDIA CUDA)
ComputeCpp> (Selected Device: GeForce GTX 1060 with Max-Q Design)
Running on GeForce GTX 1060 with Max-Q Design
ComputeCpp> Error: [ComputeCpp:RT0100] Failed to build program
ComputeCpp> -> With OpenCL Code -9999: The Error string is not available
ComputeCpp> -> Triggered at: program_detail.cpp (567)
ComputeCpp> (Build log for program 0x1638580 device 0 (size 69):
ptxas fatal : Unresolved extern function '_Z17get_global_offsetj'
)
ComputeCpp> Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 0x1638580 device 0 (size 69):
ptxas fatal : Unresolved extern function '_Z17get_global_offsetj'
>
)
ComputeCpp> -> With OpenCL Code -9999: The Error string is not available
ComputeCpp> -> Triggered at: program_detail.cpp (778)
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
Aborted (core dumped)
I added error checking as @rod suggested and I had no new messages
Hello there.
The error is saying that the built-in you are trying to use does not exist. So this code cannot execute on the Nvidia GPU through OpenCL and ptx.
Not all OpenCL built-ins have been implemented for the ComputeCpp ptx implementation which is why it is in an experimental status, and we point developers towards using the DPC++ for CUDA implementation we have done if they want to target Nvidia GPUs.
One work around if you are determined to use ComputeCpp with this hardware is to figure out what method call is triggering the error and finding a way to make an alternative call that is working.
Rod.
#include<CL/sycl.hpp>
#include<iostream>
class mat_mul;
// Example device selector, looks for an Intel GPU:
class intel_gpu_selector : public sycl::device_selector {
public:
int operator()(const cl::sycl::device& dev) const override {
if (dev.has(cl::sycl::aspect::gpu)) {
auto vendorName = dev.get_info<cl::sycl::info::device::vendor>();
if (vendorName.find("Intel") != std::string::npos) {
return 1;
}
}
return -1;
}
};
int main (){
//auto Q = cl::sycl::queue{intel_gpu_selector{}}; //Intel GPU Works totally fine
auto Q = cl::sycl::queue{cl::sycl::gpu_selector{}}; //Targeting NVIDIA GPU: here is the problem
std::cout << "Chosen device: " << Q.get_device().get_info<cl::sycl::info::device::name>()<<std::endl;
std::cout<< "Max Work Group Size: "<< Q.get_device().get_info<cl::sycl::info::device::max_work_group_size>()<<std::endl;
int N = 8;
int M = 8;
float c_mat[N*M];
{
//Buffer creation:
cl::sycl::buffer<float, 1> buffc{c_mat, cl::sycl::range<1>{static_cast<size_t>(N*M)}};
Q.submit([&](cl::sycl::handler &cgh){
//Accessro creation:
auto acc_matC = buffc.get_access<cl::sycl::access::mode::write>(cgh);
// I think the problem is the use of "nd_range" with the NVIDIA GPU
auto myRange = cl::sycl::nd_range<2>(cl::sycl::range<2> {8,8},cl::sycl::range<2> {4,4});
cgh.parallel_for<mat_mul>(myRange,[=](cl::sycl::nd_item<2> it){
//NOTE: Even if I leave this part empty I got the same error********
//Mapping 2D to 1D:
int work_item_indx = it.get_local_id(1)*it.get_local_range(0) + it.get_local_id(0);
int work_group_indx = it.get_group(1)*it.get_group_range(0) + it.get_group(0);
int index = (it.get_local_range(0)*it.get_local_range(1))*work_group_indx + work_item_indx;
acc_matC[index] = index;
});
});
}
//Result:
for(int i=0; i<N*M; i++)
std::cout<<"matC = "<<c_mat[i]<<std::endl;
return 0;
}
I added the verbose mode to my compilation and I obtained this:
ComputeCpp> (CE 2.5.0 2021/04/23 )
ComputeCpp> (Assuming platform supports SPIR-V )
ComputeCpp> (Selected Platform: NVIDIA CUDA )
ComputeCpp> (Selected Device: GeForce GTX 1060 with Max-Q Design )
Chosen device: GeForce GTX 1060 with Max-Q Design
Max Work Group Size: 1024
ComputeCpp> (Creating program for binary nvptx64 )
ComputeCpp> Error: [ComputeCpp:RT0100] Failed to build program
ComputeCpp> -> With OpenCL Code -9999: The Error string is not available
ComputeCpp> -> Triggered at: program_detail.cpp (581)
ComputeCpp> (Build log for program 0x22ab840 (size: 69) ptxas fatal : Unresolved extern function '_Z17get_global_offsetj'
)
ComputeCpp> Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 0x22ab840 (size: 69) ptxas fatal : Unresolved extern function '_Z17get_global_offsetj'
>
)
ComputeCpp> -> With OpenCL Code -9999: The Error string is not available
ComputeCpp> -> Triggered at: program_detail.cpp (792)
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
Aborted (core dumped)
There is no native support of global offset in PTX and so this error is being triggered by the use of nd_range. Your only workaround would be to avoid the use of nd_range in this scenario.