ComputeCpp 2.5 Error using GPU devices: [ComputeCpp:RT0102]

Hello there!

I’m here again with a SYCL question. I recently attended IWOCL 2021 (amazing, by the way). Today I decided to work with ComputeCpp 2.5 and, just for fun, tried to run the code exercises from the IWOCL 2021 SYCL tutorial. I noticed that Exercise 03 and Exercise 05 did not work.

Exercise 05 in verbose mode gives me this error message:

 ComputeCpp>  (CE 2.5.0 2021/04/23 )
ComputeCpp>  (Selected Platform: Intel(R) OpenCL HD Graphics )
ComputeCpp>  (Selected Device: Intel(R) UHD Graphics 630 [0x3e9b] )
Chosen device: Intel(R) UHD Graphics 630 [0x3e9b]
ComputeCpp> Error: [ComputeCpp:RT0102] The requested kernel name could not be found (Unable to retrieve kernel function, is integration header included? )
ComputeCpp> -> Triggered at: program.h (491)
Exception caught: Error: [ComputeCpp:RT0102] The requested kernel name could not be found (Unable to retrieve kernel function, is integration header included? )

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
main2.exe is a Catch v2.12.1 host application.
Run with -? for options

-------------------------------------------------------------------------------
intel_gpu_device_selector
-------------------------------------------------------------------------------
solution.cpp:37
...............................................................................

solution.cpp:74: FAILED:
  REQUIRE( r == 42 )
with expansion:
  0 == 42

===============================================================================
test cases: 1 | 1 failed
assertions: 1 | 1 failed

Additionally, I decided to write this program:
#include <CL/sycl.hpp>
#include <iostream>

class intel_gpu_selector : public sycl::device_selector {
 public:
  int operator()(const cl::sycl::device& dev) const override {
    if (dev.has(cl::sycl::aspect::gpu)) {
      auto vendorName = dev.get_info<cl::sycl::info::device::vendor>();
      if (vendorName.find("Intel") != std::string::npos) {
        return 1;
      }
    }
    return -1;
  }
};
int main() {
  auto Q = cl::sycl::queue{intel_gpu_selector{}};
  std::cout << "Chosen device: "
            << Q.get_device().get_info<cl::sycl::info::device::name>()
            << std::endl;

  // Host arrays of 4 doubles; mat3 receives the result.
  double mat1[4] = {1.2, 2.2, 3.3, 4.4};
  double mat2[4] = {1.2, 2.2, 3.3, 4.4};
  double mat3[4] = {0.0, 0.0, 0.0, 0.0};
  constexpr int N = 4;
  {
    // Create buffers of 4 doubles, each initialized from a host pointer.
    cl::sycl::buffer<double, 1> buff1{mat1, cl::sycl::range<1>{4}};
    cl::sycl::buffer<double, 1> buff2{mat2, cl::sycl::range<1>{4}};
    cl::sycl::buffer<double, 1> buff3{mat3, cl::sycl::range<1>{4}};

    Q.submit([&](cl::sycl::handler &cgh) {
      auto acc_mat1 = buff1.get_access<cl::sycl::access::mode::read>(cgh);   // Accessor for mat1
      auto acc_mat2 = buff2.get_access<cl::sycl::access::mode::read>(cgh);   // Accessor for mat2
      auto acc_mat3 = buff3.get_access<cl::sycl::access::mode::write>(cgh);  // Accessor for mat3

      cgh.parallel_for(cl::sycl::range<1>(N), [=](cl::sycl::id<1> index) {
        acc_mat3[index] = acc_mat1[index] + acc_mat2[index];
      });
    });
  }  // The buffers are destroyed here, which copies the result back to mat3.

  for (int i = 0; i < 4; i++)
    std::cout << "mat3 = " << mat3[i] << std::endl;

  return 0;
}

But the error message in verbose mode is the same:

ComputeCpp>  (CE 2.5.0 2021/04/23 )
ComputeCpp>  (Selected Platform: Intel(R) OpenCL HD Graphics )
ComputeCpp>  (Selected Device: Intel(R) UHD Graphics 630 [0x3e9b] )
Chosen device: Intel(R) UHD Graphics 630 [0x3e9b]
ComputeCpp> Error: [ComputeCpp:RT0102] The requested kernel name could not be found (Unable to retrieve kernel function, is integration header included? )
ComputeCpp> -> Triggered at: program.h (491)
terminate called after throwing an instance of 'cl::sycl::invalid_object_error'
Aborted (core dumped)

NOTE: The errors above do not appear when I use the CPU device.
I’m compiling this way, since ComputeCpp is installed on my machine:

compute++ function_evaluation.cpp -I /usr/local/computecpp2.5/include/ -lOpenCL -L /usr/local/computecpp2.5/lib/ -lComputeCpp -std=c++17 -o main.exe -DSYCL_LANGUAGE_VERSION=2020

Thanks for your help!

-Juan

Hello there, I’m ashamed.

I was reading the ComputeCpp compiler manual and found that I needed to add:

-sycl-device
to my compile command.

So, when I add -sycl-device I get this message:

remark: [Computecpp:CC0027]: Some memcpy/memset intrinsics added by the llvm optimizer were replaced by serial functions. This is a workaround for OpenCL drivers that do not support those
      intrinsics. This may impact performance, consider using -no-serial-memop. [-Rsycl-serial-memop]

Does anyone know what it means? My code works fine, but I would like to understand it.

Additionally, I noticed that with -sycl-device the kernel needs to be named through a template argument:

cgh.parallel_for<mat_add>(cl::sycl::range<1>(N), [=](cl::sycl::id<1> index)

Why is that? I thought that with SYCL 2020 kernels no longer needed a name.

Thanks, and sorry for my silly mistakes.

Sorry, I responded to you separately, but I'm adding a response here for others. We recommend using the CMake files provided with the computecpp-sdk, which choose the most appropriate compiler flags for your target hardware. The remark is saying that you should try adding -no-serial-memop to your compute++ command to get the best performance.
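
For anyone wondering what "replaced by serial functions" means in that remark, here is a rough host-side sketch in plain C++. It only illustrates the idea of a copy or fill done one element at a time in a loop, in place of a memcpy/memset intrinsic. The names serial_copy, serial_fill and serial_ops_match_expected are hypothetical and made up for this example; this is not the code that compute++ actually generates.

```cpp
#include <cstddef>

// Hypothetical stand-in for a "serial" copy: one element per loop iteration,
// with no call to a memcpy intrinsic.
template <typename T>
constexpr void serial_copy(T* dst, const T* src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = src[i];
  }
}

// Hypothetical stand-in for a "serial" fill: one element per loop iteration,
// with no call to a memset intrinsic.
template <typename T>
constexpr void serial_fill(T* dst, T value, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = value;
  }
}

// Checks that the serial versions give the results a block copy/fill would.
constexpr bool serial_ops_match_expected() {
  double src[4] = {1.2, 2.2, 3.3, 4.4};
  double dst[4] = {0.0, 0.0, 0.0, 0.0};

  serial_copy(dst, src, 4);
  for (std::size_t i = 0; i < 4; ++i) {
    if (dst[i] != src[i]) return false;
  }

  serial_fill(dst, 0.0, 4);
  for (std::size_t i = 0; i < 4; ++i) {
    if (dst[i] != 0.0) return false;
  }
  return true;
}

// Evaluated at compile time; needs C++14 or newer.
static_assert(serial_ops_match_expected(),
              "serial copy/fill should match a block copy/fill");
```

The result is the same either way, which is why the program still works with the remark showing. As the remark itself says, the serial form may impact performance, so -no-serial-memop is worth trying if your OpenCL driver supports those intrinsics.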