Using nd_range with parallel_for does not work

Hello my SYCL friends, it’s me again,

I’ve been trying to launch a kernel using “nd_range”. I want to launch a kernel and ND-range of 16x16 divided into work-groups of 4x4. The code below shows that

int main (){


constexpr int G= 16;
constexpr int L = 4;
float orbital[G*G];

{//SYCL scope
    cl::sycl::gpu_selector device; //Gpu device
    cl::sycl::queue Q{device}; //command queue with the selected device

    //Just for information: prints the device in use
    std::cout << "Running on " << Q.get_device().get_info<sycl::info::device::name>()<< "\n";
    cl::sycl::buffer<float, 1> buffOrb{orbital,cl::sycl::range<1>(G*G)};

    Q.submit([&](cl::sycl::handler &cgh){

        cgh.parallel_for<orbital_function>(cl::sycl::nd_range<2>(cl::sycl::range<2> {16,16},cl::sycl::range<2> {4,4}), [=](cl::sycl::nd_item<2> index){

        });

    });


}//END SYCL SCOPE





return 0;
}

But when I compile I get an error with this message:

Running on GeForce GTX 1060 with Max-Q Design
 terminate called after throwing an instance of 'cl::sycl::compile_program_error'
 Aborted (core dumped)

I compiled using:

compute++  -sycl-driver -sycl-target ptx64 work_items.cpp -I /usr/local/computecpp/include/ -lOpenCL -L /usr/local/computecpp/lib/ -lComputeCpp -o work_items.exe 

-Juan

Hi,

it’s not quite clear what might be the problem: compile_program_error indicates there might be something wrong with the kernel, but the kernel is empty. I’d recommend adding error checking, you can look at some ComputeCpp SDK samples to see how that’s done.

Additionally, you can try enabling verbose output to get more information from the ComputeCpp runtime, see for example this post on how to enable it.

Have you observed different behavior depending on the ComputeCpp version used? Which version are you using at the moment?

Hello,

This is my ComputeCpp version:

Codeplay ComputeCpp - CE 2.0.0 Device Compiler

My problem arises when I used the

and

because when my kernel uses the

cl::sycl::range<1>(N), [=](cl::sycl::id<1> index)

Everything works fine.

I’m gonna try what you mentioned before

-Juan

Hi,
Have you added error checking into your code? This will give a more specific error message that we can use to understand where the problem is happening. Peter has pointed to some advice there, but I would also recommend looking at the materials here, sample code and accompanying video here.

Hello @peterzuzek @rod

I added the verbose mode and I obtained this:

ComputeCpp>  (CE 2.0.0)
ComputeCpp>  (Assuming platform supports SPIR-V)
ComputeCpp>  (Selected Platform: NVIDIA CUDA)
ComputeCpp>  (Selected Device: GeForce GTX 1060 with Max-Q Design)
Running on GeForce GTX 1060 with Max-Q Design
ComputeCpp> Error: [ComputeCpp:RT0100] Failed to build program
ComputeCpp> -> With OpenCL Code -9999: The Error string is not available
ComputeCpp> -> Triggered at: program_detail.cpp (567)
ComputeCpp>  (Build log for program 0x1638580 device 0 (size 69):
ptxas fatal   : Unresolved extern function '_Z17get_global_offsetj'


)
ComputeCpp> Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 0x1638580 device 0 (size 69):
ptxas fatal   : Unresolved extern function '_Z17get_global_offsetj'


>


)
ComputeCpp> -> With OpenCL Code -9999: The Error string is not available
ComputeCpp> -> Triggered at: program_detail.cpp (778)
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
Aborted (core dumped)

I added error checking as @rod suggested and I had no new messages

Hello there.
The error is saying that the built-in you are trying to use does not exist. So this code cannot execute on the Nvidia GPU through OpenCL and ptx.

Not all OpenCL built-ins have been implemented for the ComputeCpp ptx implementation which is why it is in an experimental status, and we point developers towards using the DPC++ for CUDA implementation we have done if they want to target Nvidia GPUs.

One work around if you are determined to use ComputeCpp with this hardware is to figure out what method call is triggering the error and finding a way to make an alternative call that is working.
Rod.

I’m going to try the DPC++for cuda. I really appreciate your help Rod.

Additionally, I want to do what you mentioned: figure out what method call is triggering the error.

I’ll keep you informed.

Juan.