br-ko
24 October 2024 11:03
1
Hi,
I followed the steps from the website and configured everything; the example on the website works.
Also, the p2p_access example from the intel/llvm repository works with small changes:
$ ./a.out
Running on devices:
0: AMD Instinct MI210
1: AMD Instinct MI210
PASS
But my code, which previously worked with NVIDIA GPUs, runs perfectly fine with 1 GPU, while the 2-GPU version raises this error message:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_hip.so [ PluginVersion: 15.49.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 15.47.1 ]
UR HIP ERROR:
Value: 400
Name: hipErrorInvalidHandle
Description: invalid resource handle
Function: urDeviceGetGlobalTimestamps
Source Location: /tmp/tmp.GuKRlfBC0o/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/hip/device.cpp:989
terminate called after throwing an instance of 'ur_result_t'
I don’t know how to approach or debug this problem.
Thank you in advance.
Hi @br-ko ,
which version of DPC++ are you using? Could you also try a recent open-source daily build from github.com/intel/llvm/releases? We made some changes to the handling of multiple devices in recent months. The error looks as if a call was made to the HIP backend with a wrong/invalid device context.
If that doesn’t help, would you be able to share a minimal reproducer for this issue? It looks like you’re using two AMD GPUs and are submitting a kernel to a queue with the enable_profiling property, but it would be helpful to see how you create your queues and submit your kernels.
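For reference, a per-device queue with the profiling property is typically created along these lines (a minimal sketch for illustration, not your actual code):
```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  // One queue per GPU, each with the profiling property enabled.
  auto devs = sycl::device::get_devices(sycl::info::device_type::gpu);
  std::vector<sycl::queue> queues;
  for (const auto &d : devs) {
    queues.push_back(sycl::queue{d, sycl::property::queue::enable_profiling{}});
  }
  // Kernels would then be submitted to queues[0], queues[1], ...
}
```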
br-ko
24 October 2024 12:07
3
Thank you @rbielski, I will try the latest version from intel/llvm. I installed via the oneAPI DPC++ standalone compiler, version 2024.2.1.
I will give it a try with the LLVM build and try to create a simple reproducing example.
br-ko
24 October 2024 12:30
4
Also, I am trying to double-check whether the documentation website is up to date on this:
is it still the case that AMD GPUs only support work-group-level atomics, unlike NVIDIA GPUs, where device- and system-wide atomics are supported?
That is in fact outdated, thank you for flagging it. All atomic scopes should be supported, and we will update our documentation for the next release.
Apologies, it turns out the support was reverted due to some issues, so the documentation is correct at the moment. We plan to reinstate support for the device and system scopes in a future release. You can check which scopes a given context supports with get_info(), e.g. like this:
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  sycl::queue q{};
  auto scopes{q.get_context().get_info<sycl::info::context::atomic_memory_scope_capabilities>()};
  std::cout << q.get_device().get_info<sycl::info::device::name>()
            << " supports " << scopes.size() << " atomic memory scopes:" << std::endl;
  for (sycl::memory_scope s : scopes) {
    switch (s) {
    case sycl::memory_scope::work_item:
      std::cout << " work_item" << std::endl;
      break;
    case sycl::memory_scope::sub_group:
      std::cout << " sub_group" << std::endl;
      break;
    case sycl::memory_scope::work_group:
      std::cout << " work_group" << std::endl;
      break;
    case sycl::memory_scope::device:
      std::cout << " device" << std::endl;
      break;
    case sycl::memory_scope::system:
      std::cout << " system" << std::endl;
      break;
    default:
      std::cout << " unknown memory_scope value " << static_cast<int>(s) << std::endl;
      break;
    }
  }
}
This indeed prints only:
AMD Instinct MI210 supports 3 atomic memory scopes:
work_item
sub_group
work_group
in recent releases, but it will also print device and system in the future once the support is added back.
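For illustration, once the device and system scopes are reinstated, a device-scope atomic could be used in a kernel roughly like this (a minimal sketch with a hypothetical shared counter, not from our documentation):
```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q{};
  // USM allocation shared between host and device (hypothetical example data).
  int *counter = sycl::malloc_shared<int>(1, q);
  *counter = 0;

  q.parallel_for(sycl::range<1>{1024}, [=](sycl::id<1>) {
    // Atomic increment visible across the whole device.
    sycl::atomic_ref<int, sycl::memory_order::relaxed,
                     sycl::memory_scope::device,
                     sycl::access::address_space::global_space>
        ref(*counter);
    ref.fetch_add(1);
  }).wait();

  sycl::free(counter, q);
}
```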
br-ko
26 October 2024 17:45
7
Hi @rbielski, thank you for the detailed feedback. I think I found the commit that adds support for device- and system-level atomic operations. Is it this one? [HIP] Enable device and system memory scope for atomics on HIP · oneapi-src/unified-runtime@08b19b2 · GitHub
I am confused about how to get this commit from the intel/llvm repository, since for our application it is crucial to have system-wide atomics on the MI210.
Thank you again,
br-ko
28 October 2024 13:34
8
I forked unified-runtime and modified only the line I pointed out from the earlier PR, and now it prints that device and system scopes are supported, so that problem is solved.
Thanks!
br-ko
28 October 2024 14:00
9
Apologies, I have now found what triggers the issue in my code:
#include <sycl/sycl.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

#define NUM_GPU 2

int main() {
  auto Devs = sycl::device::get_devices(sycl::info::device_type::gpu);
  std::vector<sycl::queue> Queues;
  // Insert not all devices, only the ones required for the model.
  std::transform(Devs.begin(), Devs.begin() + NUM_GPU, std::back_inserter(Queues),
                 [](const sycl::device &D) {
                   return sycl::queue{D, sycl::property::queue::enable_profiling{}};
                 });
  ////////////////////////////////////////////////////////////////////////
  if (Devs.size() > 1) {
    if (!Devs[0].ext_oneapi_can_access_peer(
            Devs[1], sycl::ext::oneapi::peer_access::access_supported)) {
      std::cout << "P2P access is not supported by devices, exiting."
                << std::endl;
    }
  }
  std::cout << "\n----------------------------------------" << std::endl;
  std::cout << "Running on devices:" << std::endl;
  for (int i = 0; i < Queues.size(); i++) {
    std::cout << i << ":\t"
              << Queues[i].get_device().get_info<sycl::info::device::name>()
              << std::endl;
  }
  std::cout << "----------------------------------------" << std::endl;
  // Enables Devs[x] to access Devs[y] memory and vice versa.
  if (Devs.size() > 1) {
    for (auto gpuID_i = 0; gpuID_i < NUM_GPU; gpuID_i++) {
      for (auto gpuID_j = 0; gpuID_j < NUM_GPU; gpuID_j++) {
        if (gpuID_i != gpuID_j) {
          Devs[gpuID_i].ext_oneapi_enable_peer_access(Devs[gpuID_j]);
        }
      }
    }
  }
  // sycl::queue q{};
  auto scopes{Queues[0].get_context().get_info<sycl::info::context::atomic_memory_scope_capabilities>()};
  std::cout << Queues[0].get_device().get_info<sycl::info::device::name>()
            << " supports " << scopes.size() << " atomic memory scopes:" << std::endl;
  for (sycl::memory_scope s : scopes) {
    switch (s) {
    case sycl::memory_scope::work_item:
      std::cout << " work_item" << std::endl;
      break;
    case sycl::memory_scope::sub_group:
      std::cout << " sub_group" << std::endl;
      break;
    case sycl::memory_scope::work_group:
      std::cout << " work_group" << std::endl;
      break;
    case sycl::memory_scope::device:
      std::cout << " device" << std::endl;
      break;
    case sycl::memory_scope::system:
      std::cout << " system" << std::endl;
      break;
    default:
      std::cout << " unknown memory_scope value " << static_cast<int>(s) << std::endl;
      break;
    }
  }
}
When I delete enable_profiling it works, but with it the getElapsedTime error still occurs. How could I solve this problem, @rbielski?
I am using this pre-built release: Release DPC++ daily 2024-10-25 · intel/llvm · GitHub
duncan
28 October 2024 18:54
10
Hi @br-ko , can you paste the actual error you see please? I am not sure we’ve run into this before. Are you saying that when profiling is enabled, running this code exits with an error? Which line causes the error?
br-ko
28 October 2024 19:10
11
Hi @duncan ,
This is the output:
$ rm -rf a.out;clang++ -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a atomic_test.cpp ; ./a.out
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:0 Drivers Discovered
<HIP>[ERROR]:
UR HIP ERROR:
Value: 400
Name: hipErrorInvalidHandle
Description: invalid resource handle
Function: getElapsedTime
Source Location: /home/aac/llvm-nightly-2024-10-25/build/_deps/unified-runtime-src/source/adapters/hip/device.cpp:31
----------------------------------------
Running on devices:
0: AMD Instinct MI210
1: AMD Instinct MI210
----------------------------------------
AMD Instinct MI210 supports 5 atomic memory scopes:
work_item
sub_group
work_group
device
system
These are the environment setup and build/run commands:
export DPCPP_HOME=/home/aac/
export PATH=$DPCPP_HOME/llvm-nightly-2024-10-25/build/bin:$PATH;export LD_LIBRARY_PATH=$DPCPP_HOME/llvm-nightly-2024-10-25/build/lib:$LD_LIBRARY_PATH
rm -rf a.out;clang++ -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a atomic_test.cpp ; ./a.out
With this snippet it continues, but my actual code stops executing when I try to capture the profiling information with this function:
// Assumes the sycl namespace is in scope (e.g. `using namespace sycl;`).
double GetExecutionTime(const event &e) {
  double start_k = e.get_profiling_info<info::event_profiling::command_start>();
  double end_k = e.get_profiling_info<info::event_profiling::command_end>();
  double kernel_time = (end_k - start_k) * 1e-6; // ns to ms
  return kernel_time;
}
...
start_time[gpuID] = executeEvent[gpuID].get_profiling_info<info::event_profiling::command_start>();
EDIT: It works for 1 GPU but fails for 2 GPUs.
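For context, the surrounding application code follows roughly this pattern: one event per GPU from a kernel submission, then profiling queries on each event (a simplified, self-contained sketch with placeholder kernels, not the actual application):
```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

// Returns the kernel execution time in milliseconds from a profiled event.
double GetExecutionTime(const sycl::event &e) {
  auto start = e.get_profiling_info<sycl::info::event_profiling::command_start>();
  auto end = e.get_profiling_info<sycl::info::event_profiling::command_end>();
  return (end - start) * 1e-6; // ns to ms
}

int main() {
  auto devs = sycl::device::get_devices(sycl::info::device_type::gpu);
  std::vector<sycl::queue> queues;
  for (const auto &d : devs)
    queues.push_back(sycl::queue{d, sycl::property::queue::enable_profiling{}});

  // One event per GPU, each from a trivial placeholder kernel.
  std::vector<sycl::event> events;
  for (auto &q : queues)
    events.push_back(q.parallel_for(sycl::range<1>{1024}, [](sycl::id<1>) {}));

  for (std::size_t i = 0; i < events.size(); ++i) {
    events[i].wait();
    std::cout << "GPU " << i << " kernel time: " << GetExecutionTime(events[i])
              << " ms" << std::endl;
  }
}
```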
Thank you for the reproducer. We’re getting things ready to try this in a similar setup to yours and will let you know as soon as we know more.
Hi @br-ko ,
that is indeed a runtime bug, many thanks for reporting this. It should be fixed in this PR:
rafbiels:rafbiels/fix-hip-evbase → oneapi-src:main (opened 04:44 PM, 1 November 2024 UTC)
Without any default device in the current thread, all base events were associated with device 0, causing failures when used on other devices. Fix this by calling `hipSetDevice` before recording the event.
This issue was [reported by a user](https://support.codeplay.com/t/amd-gpu-sycl-plugin-urdevicegetglobaltimestamps-error/743) who was running on a system with two AMD GPUs and tried to do the following:
```cpp
#include <sycl/sycl.hpp>
int main() {
auto Devs = sycl::device::get_devices(sycl::info::device_type::gpu);
std::vector<sycl::queue> Queues;
for (auto D : Devs) {
Queues.push_back(sycl::queue{D,sycl::property::queue::enable_profiling{}});
}
}
```
Resulting in
```
UR HIP ERROR:
Value: 400
Name: hipErrorInvalidHandle
Description: invalid resource handle
Function: getElapsedTime
Source Location: _deps/unified-runtime-src/source/adapters/hip/device.cpp:31
```
in the constructor of the second queue.
intel/llvm PR: https://github.com/intel/llvm/pull/15964
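For background, the fix corresponds to the general HIP requirement that an event must be created and recorded while its owning device is current, roughly like this (a simplified sketch of the pattern with a hypothetical helper, not the actual adapter code):
```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Record a base timestamp event on the given device. Without the hipSetDevice
// call, the event would be created on whichever device is current (device 0 by
// default), and using it with another device fails with hipErrorInvalidHandle.
hipError_t recordBaseEvent(int deviceId, hipEvent_t *event) {
  hipError_t err = hipSetDevice(deviceId);
  if (err != hipSuccess) return err;
  err = hipEventCreateWithFlags(event, hipEventDefault);
  if (err != hipSuccess) return err;
  return hipEventRecord(*event, /*stream=*/nullptr);
}

int main() {
  int count = 0;
  if (hipGetDeviceCount(&count) != hipSuccess) return 1;
  for (int d = 0; d < count; ++d) {
    hipEvent_t ev{};
    if (recordBaseEvent(d, &ev) != hipSuccess)
      std::printf("failed to record base event on device %d\n", d);
  }
}
```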
br-ko
1 November 2024 17:14
14
Hi @rbielski ,
Thank you for the quick fix! Have a nice weekend!