Hello,
I’ve recently tested the NVIDIA plugin for oneAPI with the BabelStream benchmark.
The resulting bandwidths match native CUDA very well.
However, on my AMD GPU there is a discrepancy between native HIP and SYCL which I would like to understand and rectify.
Spec:
- CPU: AMD Ryzen 7 5800X
- GPU: Radeon RX Vega 64 (gfx900)
- OS: Rocky Linux 9.2 with 5.14.0 kernel
Software stack:
- oneAPI: basekit 2023.2.1
- ROCm: rocm-hip-sdk5.4.3
rocminfo
*******
Agent 2
*******
Name: gfx900
Uuid: GPU-021505f72c144864
Marketing Name: AMD Radeon RX Vega
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 7 5800X 8-Core Processor 3.0 [2023.16.7.0.21_160000]
[ext_oneapi_hip:gpu:0] AMD HIP BACKEND, AMD Radeon RX Vega gfx900:xnack- [HIP 50422.80]
BabelStream
$ git clone https://github.com/UoB-HPC/BabelStream
HIP Bandwidth
$ cd BabelStream/src/hip/
$ hipcc -O2 -DHIP -I. -I.. HIPStream.cpp ../main.cpp -o stream.x
$ ./stream.x
BabelStream
Version: 4.0
Implementation: HIP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using HIP device AMD Radeon RX Vega
Driver: 50422804
Function MBytes/sec Min (sec) Max Average
Copy 393206.592 0.00137 0.00141 0.00138
Mul 395044.431 0.00136 0.00141 0.00137
Add 359068.066 0.00224 0.00229 0.00225
Triad 358939.871 0.00224 0.00229 0.00225
Dot 363135.835 0.00148 0.00149 0.00148
SYCL Bandwidth
$ cd BabelStream/src/sycl2020
$ icpx -O2 -DSYCL2020 \
-fsycl -fsycl-targets=amdgcn-amd-amdhsa \
-Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx900 \
-I. -I.. SYCLStream2020.cpp ../main.cpp -o stream.x
$ ./stream.x --device 2
BabelStream
Version: 4.0
Implementation: SYCL 2020
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device AMD Radeon RX Vega
Driver: HIP 50422.80
Function MBytes/sec Min (sec) Max Average
Copy 205989.209 0.00261 0.00297 0.00273
Mul 201782.697 0.00266 0.00299 0.00275
Add 220715.935 0.00365 0.00394 0.00371
Triad 226468.311 0.00356 0.00402 0.00367
Dot 348870.293 0.00154 0.00342 0.00166
The theoretical bandwidth of Vega 64 is 484 GB/s.
The efficiencies are roughly 74% and 46% of peak for native HIP and SYCL, respectively.
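For reference, here is a minimal sketch of how those percentages can be reproduced from the tables above. It assumes that BabelStream's MBytes/sec column counts 10^6 bytes, so 484 GB/s corresponds to 484000 MB/s. The function names `efficiency` and `print_efficiency_table` are just illustrative helpers, not part of BabelStream.

```cpp
#include <cstdio>

// Fraction of the theoretical peak reached by a measured bandwidth.
// measured_mbytes_per_s: a value from BabelStream's "MBytes/sec" column.
// peak_gbytes_per_s:     theoretical peak in GB/s (484 for Vega 64).
// Assumption: 1 GB/s = 1000 MB/s (decimal units).
double efficiency(double measured_mbytes_per_s, double peak_gbytes_per_s) {
    return measured_mbytes_per_s / (peak_gbytes_per_s * 1000.0);
}

// Prints the per-kernel efficiency for the HIP and SYCL runs quoted above.
void print_efficiency_table() {
    const double peak = 484.0;  // GB/s
    const char* names[] = {"Copy", "Mul", "Add", "Triad", "Dot"};
    const double hip[]  = {393206.592, 395044.431, 359068.066,
                           358939.871, 363135.835};
    const double sycl[] = {205989.209, 201782.697, 220715.935,
                           226468.311, 348870.293};
    for (int i = 0; i < 5; ++i) {
        std::printf("%-6s HIP %5.1f%%  SYCL %5.1f%%\n", names[i],
                    100.0 * efficiency(hip[i], peak),
                    100.0 * efficiency(sycl[i], peak));
    }
}
```

Calling `print_efficiency_table()` from `main` lists every kernel. The Add row gives about 74% for HIP and 46% for SYCL, while Dot is close between the two runs.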
Perhaps the parallel_for kernel is not well mapped to AMD hardware?
Thanks.