That uses the exclusive scan to calculate the block total_sum, but when I copy this function into my own code it produces 0 as output. What could be the reason?
Is there an alternative way to compute a prefix sum inside an nd_range kernel?
So this is what I want to achieve inside an nd_range kernel
The oneMKL functions are designed to be called from the host, and perform an optimised variant of the operation on an array. On the device side, you can use the exclusive and inclusive scan group algorithms detailed in the SYCL 2020 specification (for example sycl::exclusive_scan_over_group and sycl::inclusive_scan_over_group).
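To make the difference between the two scans concrete, here is a minimal host-side sketch in standard C++17. It is not SYCL device code and not the function from the original post; it only models what an exclusive and an inclusive scan compute over one block of values, and how a block total can be derived from the exclusive result. The helper names exclusive_prefix, inclusive_prefix and block_total are made up for illustration.

```cpp
#include <numeric>
#include <vector>

// Exclusive scan: out[i] is the sum of in[0..i-1], so out[0] is the
// initial value (0 here) and in[i] itself is NOT included.
std::vector<int> exclusive_prefix(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0);
    return out;
}

// Inclusive scan: out[i] is the sum of in[0..i], so in[i] IS included.
std::vector<int> inclusive_prefix(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Total of the block, recovered from the exclusive scan: the last
// exclusive result plus the last input element.
int block_total(const std::vector<int>& in) {
    if (in.empty()) {
        return 0;
    }
    return exclusive_prefix(in).back() + in.back();
}
```

For the input {3, 1, 4, 1, 5}, the exclusive scan is {0, 3, 4, 8, 9} and the inclusive scan is {3, 4, 8, 9, 14}. The first element of an exclusive scan is always the initial value, so if a kernel reads the total from the first work-item's result, or leaves out the last input element, it will not get the block sum. That may be worth checking when the output is 0, although it is only a guess without seeing the kernel.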