Atomics on AMD MI250x

Hello,

How are the SYCL atomic operations implemented on MI250x?

I found out the hard way that in HIP/ROCm the atomicAdd function is very slow unless you compile with the flag -munsafe-fp-atomics or call the alternative function unsafeAtomicAdd.
At a hackathon, someone from AMD explained to me that the unsafe atomics still produce correct results as long as I use hipMalloc to allocate the GPU variables.
The two versions differ greatly in execution time for kernels “abusing” atomics.
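
To make the comparison concrete, here is a minimal sketch of the two HIP variants, assuming a kernel in which every thread adds into one accumulator. The names sum_safe, sum_unsafe, atomic_sum, and check are hypothetical and only for illustration. The plain C++ branch is a sequential stand-in so the file also compiles without hipcc, and it says nothing about GPU performance.

```cpp
// Build for an AMD GPU with: hipcc -O2 atomic_sum.cpp  (file name is hypothetical)
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

#ifdef __HIPCC__
#include <hip/hip_runtime.h>

// Hypothetical helper: abort on any HIP runtime error.
static void check(hipError_t err) {
    if (err != hipSuccess) {
        std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err));
        std::abort();
    }
}

// Every thread adds its element to a single accumulator, so the kernel
// is dominated by atomic traffic.
__global__ void sum_safe(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);        // default floating-point atomic
}

__global__ void sum_unsafe(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) unsafeAtomicAdd(out, in[i]);  // explicit "unsafe" variant
}
#endif

// Hypothetical helper: sums `host` with one of the two kernels above.
float atomic_sum(const std::vector<float>& host, bool use_unsafe) {
    if (host.empty()) return 0.0f;
#ifdef __HIPCC__
    const int n = static_cast<int>(host.size());
    const std::size_t bytes = host.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    // The accumulator lives in hipMalloc memory, which is the condition
    // reported in this post for unsafe atomics to give correct results.
    check(hipMalloc(reinterpret_cast<void**>(&d_in), bytes));
    check(hipMalloc(reinterpret_cast<void**>(&d_out), sizeof(float)));
    check(hipMemcpy(d_in, host.data(), bytes, hipMemcpyHostToDevice));
    check(hipMemset(d_out, 0, sizeof(float)));

    const int block = 256;
    const int grid = (n + block - 1) / block;
    if (use_unsafe) {
        sum_unsafe<<<grid, block>>>(d_in, d_out, n);
    } else {
        sum_safe<<<grid, block>>>(d_in, d_out, n);
    }
    check(hipGetLastError());

    float result = 0.0f;
    // Blocking copy on the default stream: waits for the kernel to finish.
    check(hipMemcpy(&result, d_out, sizeof(float), hipMemcpyDeviceToHost));
    check(hipFree(d_in));
    check(hipFree(d_out));
    return result;
#else
    // Sequential stand-in used only when the file is not built with hipcc.
    (void)use_unsafe;
    float total = 0.0f;
    for (float v : host) total += v;
    return total;
#endif
}
```

Calling atomic_sum(data, false) and atomic_sum(data, true) on the same input and timing each call is one way to see the gap. Building the same file with -munsafe-fp-atomics should, as far as I understand, move the plain atomicAdd kernel onto the fast path as well.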

In CUDA, atomicAdd is hardware supported.

Which version does oneAPI use on MI250x?

Cristian

Hi @Krisachi ,

Our website should have the information you’re looking for, in the AMD hardware section: Common Optimizations - Guides - oneAPI for AMD GPUs - Products - Codeplay Developer

Let us know if this works for you!

Duncan.