I wrote a template data class containing a buffer and an accessor, which automatically creates the buffer and the corresponding accessor based on the template arguments. But when I used an instance of that class in a SYCL kernel, the compiler emitted errors saying that SYCL cannot capture a non-standard-layout class. How can I fix that?
Hi,
Can you post a code snippet of what you are trying to do so that we can figure out a solution for you?
Thanks,
Rod.
Hi, Rod
Here’s what I am trying to do.
    template <class T, size_t N>
    struct Data {
        buffer<T, N> buf;
        Data() {
            // initialize buf...
        }
        T& operator()(const int i) const {
            // get accessor from buf and return accessor[i];
        }
    };

    struct Func {
        Data<..> data;
        Func(Data<..> _data) : data(_data) {
        }
        void operator()(const int i) const {
            // do some computation using data(i)
        }
    };

    class my_kernel;

    template <class Functor, size_t N>
    void kernel(Functor functor) {
        //...
        cgh.parallel_for<class my_kernel>(nd_range<1>(N, N), [=](nd_item<1> item) {
            size_t global_id = item.get_global_linear_id();
            functor(global_id);
        });
        //...
    }

    int main() {
        //...
        kernel(/* an instance of Func as parameter */);
        //...
    }
Is there a way to make this work?
Thanks,
Ryan.
You can format code by indenting it with four spaces. You might also be able to introduce a block with three backquotes (```).
To put it simply, you can’t have buffers in device code. Specifically, types you instantiate on device must be `standard_layout` and `trivially_copyable`, which buffers are not. In this case you’ll need to split the `buffer` and `accessor` holding-classes apart. Personally, when programming in SYCL, I find it useful to wrap the buffer class a lot of the time, but just use raw accessors in the kernel interface, so I can just pull the accessor out of the buffer easily then pass it to the kernel.
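As a rough illustration of that rule, here is a plain C++ sketch (no SYCL headers needed) that checks the two properties with standard type traits. `OwningBuffer`, `View`, `BadFunctor`, `GoodFunctor`, `capturable_in_kernel` and `demo` are hypothetical names made up for this example; `OwningBuffer` only mimics a class that owns a resource the way a buffer does, and `View` only mimics an accessor-like handle.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a buffer-like class: it owns memory, so it needs
// a user-provided copy constructor and destructor.
class OwningBuffer {
public:
    explicit OwningBuffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    OwningBuffer(const OwningBuffer& other)
        : data_(new int[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }
    OwningBuffer& operator=(const OwningBuffer&) = delete;
    ~OwningBuffer() { delete[] data_; }

private:
    int* data_;
    std::size_t size_;
};

// Hypothetical stand-in for an accessor-like handle: a non-owning view.
struct View {
    int* ptr;
    std::size_t size;
    int& operator()(std::size_t i) const { return ptr[i]; }
};

// Functor that holds the owning class directly (like Func holding Data).
struct BadFunctor {
    OwningBuffer buf;
};

// Functor that holds only the view.
struct GoodFunctor {
    View view;
    void operator()(std::size_t i) const { view(i) = static_cast<int>(i) * 5; }
};

// The check described above: standard layout and trivially copyable.
template <class T>
constexpr bool capturable_in_kernel() {
    return std::is_standard_layout<T>::value &&
           std::is_trivially_copyable<T>::value;
}

static_assert(!capturable_in_kernel<BadFunctor>(), "owning member breaks the rule");
static_assert(capturable_in_kernel<GoodFunctor>(), "view-only functor passes");

// Host-side usage: the view-only functor writes through the view.
inline void demo() {
    int storage[4] = {0, 0, 0, 0};
    GoodFunctor f{View{storage, 4}};
    f(2);
    assert(storage[2] == 10);
}
```

The point is only that a functor's copyability is decided by its members: anything that owns a resource has to stay on the host side, and the functor should carry just the lightweight handle.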
Thanks, Duncan. It seems like we can only get an accessor from a target buffer inside a command group, since the function get_access takes a handler as an argument. Is it possible to get accessors outside queue.submit() so that I can directly use them in device kernels?
    // build a buffer and get its accessor here
    queue.submit([&](cl::sycl::handler& cgh) {
        //... do something
        cgh.parallel_for<class my_kernel1>(nd_range<1>(5, 5), [=](nd_item<1> item) {
            size_t global_id = item.get_global_linear_id();
            // use the accessor here
        });
    });
In your example, I would simply construct the accessor at the `// ... do something` line, since that code is still host code at that point. If you want an alternative, you could look up placeholder accessors in the specification, though they’re used much less frequently (but might match your problem better).
I had considered using a placeholder accessor. But if I am not wrong, we need to use `cgh.require()` to link it with a buffer, which means it still happens inside a command group, just like using `get_access()` at the `// ... do something` line.
Yes, so in this instance I would create the accessor at the `// do something` line. Is there a reason why you can’t do that here?
I want to build an abstraction above SYCL. As in the code I first posted (I have formatted it), I want to design a template struct `myData` that contains an accessor, whose underlying data can be accessed using `operator()(const int i)`. Then I will have a function object class `Func` which has a `myData` instance `fData` as a member, so that I can do some computation on the specific data of `fData` using `operator()(const int i)` (such as `fData(i) = i * 5`, etc.). Once I have `myData` and `Func` set, I can simply pass the `Func` object to the device kernel to run.
To put it simply, I want to simplify programming of SYCL by exposing a data api (what kind of data you wanna set) and a function object api (what kind of computation you wanna run on the set data).
Hi @Ryan_xjh,
I am not sure you can write the code as you have intended it here. Accessor construction must happen on the host, and buffers can only exist on the host, so you can’t really have your `Data` struct as it currently exists in SYCL.
I’m kind of assuming that you want to do this abstraction in order to paper over the differences between SYCL and some other kind of compute solution, like CUDA or similar, otherwise I don’t really see the point. In my opinion, ultimately you just end up sort of rewriting the API you’re abstracting but a little different for a small gain. I might be wrong in this instance, of course, but in this answer I am suggesting a nontrivial amount of work which (again IMO) generally would only pay off if there were some radically different systems underneath that you wanted to pretend were the same to the users writing compute kernels.
Looking at your sample in particular, I see you have `// do some computation using data(i)` in `Func::operator()(const int)`. If that’s the case, instead of passing in an index and letting the user operator dereference `data_` (which I’m assuming in 90% of cases would be `return acc[i]`), why not pass them a (mutable?) reference to a templated data type? That way you would be able to write the SYCL code which constructs the accessors in a way that you control, passing them to the user-written function object. You’d then be able to basically eliminate the `Data` class.
Of course, if you’d like to do more complicated stuff like halos then this idea won’t really work. At this stage, I would consider templating the user kernel on a pointer type of some kind, then instantiating the user template for each API you are abstracting over with the appropriate pointer type (for example, `cl::sycl::multi_ptr<T>`). That might get you to the stage where, for example, you could instantiate kernels for both CUDA and SYCL from the same user code. In this scheme `Data` would be a `using` declaration, or in a more extreme case a wrapper that papers over the differences between how APIs treat data buffers.
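A minimal sketch of the pointer-type templating idea, in plain C++ so it can be tried without a SYCL toolchain. `ScaleByFive`, `run_serial`, `HostKernel` and `demo_host` are hypothetical names, the element type is assumed to be `int`, and only a raw-pointer host instantiation is shown; a SYCL instantiation would substitute the pointer-like type of that backend for `int*`, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// User-written kernel body, templated on a pointer-like type Ptr.
// It only assumes Ptr supports operator[] (assumption: elements are int).
template <class Ptr>
struct ScaleByFive {
    Ptr data;
    void operator()(std::size_t i) const { data[i] = static_cast<int>(i) * 5; }
};

// Hypothetical host "backend": calls the functor once per index, serially.
template <class Functor>
void run_serial(Functor f, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

// Backend-specific alias: on the host the pointer type is a raw pointer.
using HostKernel = ScaleByFive<int*>;

inline void demo_host() {
    int out[4] = {0, 0, 0, 0};
    run_serial(HostKernel{out}, 4);
    assert(out[1] == 5 && out[3] == 15);
}
```

The user code never names the backend; only the alias changes per API, which is the role the `using` declaration plays in the scheme described above.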
Honestly, though, it’s quite a lot of faff. To get the library Eigen working on both CUDA and SYCL (which we added support for) was a lot of development effort.
Apologies if I have misunderstood your intentions here, but I’d rather not suggest some flippantly easy solution that ends up biting you later on.
Actually, on second thoughts: you could maybe split `Data` into two classes: a `HostData` class, which provides a `DeviceData` class from a member function, hiding the `buffer` and `accessor` respectively. Then you really are just rewriting the SYCL API though! (In this scheme, you would use placeholder accessors; in the `HostData::makeDeviceData()` function you would bind the buffer to the accessor, then `require()` it at the same time. The user `Func` would then be able to access the member through `DeviceData::operator()`, but again I see this as a straight rewrite of SYCL.)
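To make the shape of that split concrete, here is a host-only C++ sketch. It is not SYCL: a `std::vector` stands in for the `buffer` and a raw pointer stands in for the placeholder accessor, so the bind and `require()` steps are only marked by a comment. `HostData`, `DeviceData`, `makeDeviceData`, `Func` and `fData` follow the names used in this thread; `values`, `run` and `demo_split` are hypothetical helpers added for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Device-side half: a small copyable handle (stand-in for an accessor).
template <class T>
struct DeviceData {
    T* ptr;
    std::size_t size;
    T& operator()(std::size_t i) const { return ptr[i]; }
};

// Host-side half: owns the storage (stand-in for a buffer).
template <class T>
class HostData {
public:
    explicit HostData(std::size_t n) : storage_(n) {}

    // In a SYCL version, this is where the accessor would be bound to the
    // buffer and require()'d. Here it just hands out a pointer and a size.
    DeviceData<T> makeDeviceData() {
        return DeviceData<T>{storage_.data(), storage_.size()};
    }

    const std::vector<T>& values() const { return storage_; }

private:
    std::vector<T> storage_;
};

// User function object: holds only the device-side half.
struct Func {
    DeviceData<int> fData;
    void operator()(std::size_t i) const { fData(i) = static_cast<int>(i) * 5; }
};

// Hypothetical launcher standing in for the parallel_for call.
template <class Functor>
void run(Functor f, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

inline void demo_split() {
    HostData<int> host(4);
    run(Func{host.makeDeviceData()}, 4);
    assert(host.values()[3] == 15);
}
```

The owning object never enters the function object; `Func` is copied by value into the launcher and carries only the handle, which mirrors how the buffer would stay on the host while the accessor travels to the kernel.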