How are SYCL atomic operations implemented on MI250X?
Hello,
I found out the hard way that in HIP/ROCm the atomicAdd function is very slow unless the code is compiled with the flag -munsafe-fp-atomics or the alternative function unsafeAtomicAdd is used instead.
At a hackathon, someone from AMD explained to me that the unsafe atomics still produce correct results as long as I use hipMalloc to allocate the GPU variables.
The two versions differ greatly in execution time for kernels "abusing" atomics.
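To make the comparison concrete, here is a minimal HIP sketch of the two variants. The kernel names, the problem size and the reduction itself are assumptions chosen for illustration, not code from the actual application. It also assumes a ROCm version whose hip_runtime.h provides unsafeAtomicAdd, and it allocates everything with hipMalloc, as recommended above.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a HIP runtime call fails.
#define HIP_CHECK(call)                                                  \
    do {                                                                 \
        hipError_t err_ = (call);                                        \
        if (err_ != hipSuccess) {                                        \
            std::fprintf(stderr, "HIP error: %s\n",                      \
                         hipGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical kernel: every thread adds its element into one accumulator
// with the regular atomicAdd (slow in my case unless the code is built
// with -munsafe-fp-atomics).
__global__ void sum_safe(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Same kernel, but calling unsafeAtomicAdd explicitly.
__global__ void sum_unsafe(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) unsafeAtomicAdd(out, in[i]);
}

int main() {
    const int n = 1 << 20;              // illustrative problem size
    std::vector<float> host(n, 1.0f);

    // Device buffers allocated with hipMalloc.
    float* in = nullptr;
    float* out = nullptr;
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&in), n * sizeof(float)));
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&out), sizeof(float)));
    HIP_CHECK(hipMemcpy(in, host.data(), n * sizeof(float),
                        hipMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float result = 0.0f;

    // Variant 1: regular atomicAdd.
    HIP_CHECK(hipMemset(out, 0, sizeof(float)));
    hipLaunchKernelGGL(sum_safe, dim3(blocks), dim3(threads), 0, 0,
                       in, out, n);
    HIP_CHECK(hipMemcpy(&result, out, sizeof(float), hipMemcpyDeviceToHost));
    std::printf("atomicAdd:       %.1f\n", result);

    // Variant 2: unsafeAtomicAdd.
    HIP_CHECK(hipMemset(out, 0, sizeof(float)));
    hipLaunchKernelGGL(sum_unsafe, dim3(blocks), dim3(threads), 0, 0,
                       in, out, n);
    HIP_CHECK(hipMemcpy(&result, out, sizeof(float), hipMemcpyDeviceToHost));
    std::printf("unsafeAtomicAdd: %.1f\n", result);

    HIP_CHECK(hipFree(in));
    HIP_CHECK(hipFree(out));
    return 0;
}
```

Something like this should build with hipcc; timing each launch (for example with hipEvent timers or rocprof) is how the difference between the two variants would show up.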
In CUDA, atomicAdd is hardware supported.
Which version does oneAPI use on MI250X?
Cristian