Hello,
We are using an NVIDIA A100 GPU with the Codeplay plugin for SYCL, and we are having a hard time implementing atomics because they give wrong results. Below is the basic (sequential) version of the code we would like to implement: if an element of usm_updating_mask is non-zero, its index is pushed to usm_pipe.
int iter = 0;  // number of indices pushed so far
for (int tid = 0; tid < no_of_nodes; tid++) {
    char condition = usm_updating_mask[tid];
    if (condition) {
        usm_pipe[iter] = tid;
        iter++;
    }
}
d_over[0] = iter;
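For reference, the sequential loop above can be wrapped as a self-contained C++ function. The name compact_serial and the std::vector parameters are assumptions made for illustration; they are not part of the original code.

```cpp
#include <cassert>
#include <vector>

// Sequential stream compaction: writes the index of every element whose
// mask value is non-zero into `pipe` and returns how many were written.
// compact_serial is a hypothetical name used only for this sketch.
int compact_serial(const std::vector<char>& updating_mask, std::vector<int>& pipe) {
    // In the worst case every index is pushed, so pipe must be large enough.
    assert(pipe.size() >= updating_mask.size());
    int iter = 0;  // next free slot in pipe
    for (int tid = 0; tid < static_cast<int>(updating_mask.size()); tid++) {
        if (updating_mask[tid]) {
            pipe[iter] = tid;
            iter++;
        }
    }
    return iter;  // corresponds to d_over[0] in the loop above
}
```

Because the loop runs in order, the indices in pipe come out sorted, which the parallel version does not guarantee.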
Our initial SYCL attempt is shown below, but its results are not valid:
const size_t local_size = 16; // Number of work-items per work-group
// Round the global size up to a multiple of local_size
const size_t global_size = ((no_of_nodes + local_size - 1) / local_size) * local_size;
// Set up the range
nd_range<1> range(global_size, local_size);
auto e = q.parallel_for<class PipeGenerator>(range, [=](nd_item<1> item) [[intel::kernel_args_restrict]] {
    int tid = item.get_global_linear_id();
    sycl::atomic_ref<unsigned int, sycl::memory_order_relaxed,
                     sycl::memory_scope_device,
                     sycl::access::address_space::global_space>
        atomic_op_global(d_over[0]);
    // Padding work-items beyond no_of_nodes do nothing
    if (tid < no_of_nodes) {
        char condition = usm_updating_mask[tid];
        if (condition) {
            usm_pipe[atomic_op_global] = tid;
            atomic_op_global += 1;
        }
    }
});
Is it valid to access atomic_op_global and update it at the same time like this?
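In the kernel, usm_pipe[atomic_op_global] performs an atomic load, and atomic_op_global += 1 is a separate atomic read-modify-write. Each operation is atomic on its own, but the pair is not, so two work-items can load the same value, write to the same slot, and then both increment, leaving overwritten entries and gaps. The usual pattern is to let a single fetch_add both reserve the slot (its return value is the old counter) and advance the counter. The sketch below is a CPU analogue that swaps in std::atomic and std::thread for the SYCL kernel so it runs with a standard C++ compiler; compact_parallel and num_threads are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// CPU analogue of the kernel: each thread handles a strided subset of tids
// and appends matching indices to a shared pipe through one atomic counter.
// compact_parallel and num_threads are hypothetical names for this sketch.
unsigned compact_parallel(const std::vector<char>& updating_mask,
                          std::vector<int>& pipe, unsigned num_threads) {
    assert(num_threads > 0);
    assert(pipe.size() >= updating_mask.size());
    std::atomic<unsigned> counter{0};  // plays the role of d_over[0]
    const int n = static_cast<int>(updating_mask.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; t++) {
        workers.emplace_back([&, t] {
            for (int tid = static_cast<int>(t); tid < n;
                 tid += static_cast<int>(num_threads)) {
                if (updating_mask[tid]) {
                    // Reserve a slot and advance the counter in ONE atomic
                    // read-modify-write; fetch_add returns the old value,
                    // so no two threads can receive the same slot.
                    unsigned slot = counter.fetch_add(1, std::memory_order_relaxed);
                    pipe[slot] = tid;
                }
            }
        });
    }
    // join() makes every thread's writes to pipe visible to the caller.
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Translated back to the kernel, the two lines inside if (condition) would become unsigned int slot = atomic_op_global.fetch_add(1u); followed by usm_pipe[slot] = tid; (sycl::atomic_ref provides fetch_add for integral types in SYCL 2020). As in the sketch, the order of indices in usm_pipe is likely to differ from the sequential loop; only the set of indices and the final count in d_over[0] should match, and d_over[0] has to be zeroed before the kernel is launched.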
Thank you