AMD GPU SYCL Plugin urDeviceGetGlobalTimestamps Error

Hi,
I followed the steps from the website and configured everything; the example on the website works.
The p2p_access example from the intel/llvm repository also works with small changes:

$ ./a.out 
Running on devices:
0:      AMD Instinct MI210
1:      AMD Instinct MI210
PASS

However, my code, which previously worked with NVIDIA GPUs, runs perfectly fine with 1 GPU but raises this error message in the 2-GPU version:

SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_hip.so [ PluginVersion: 15.49.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 15.47.1 ]

UR HIP ERROR:
        Value:           400
        Name:            hipErrorInvalidHandle
        Description:     invalid resource handle
        Function:        urDeviceGetGlobalTimestamps
        Source Location: /tmp/tmp.GuKRlfBC0o/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/hip/device.cpp:989

terminate called after throwing an instance of 'ur_result_t'

I don’t know how to approach or debug this problem.

Thank you in advance

Hi @br-ko,
Which version of DPC++ are you using? Could you also try a recent open-source daily build from github.com/intel/llvm/releases? We made some changes to the handling of multiple devices in recent months. The error looks as if a call was made to the HIP backend with a wrong or invalid device context.

If that doesn’t help, would you be able to share a minimal reproducer for this issue? It looks like you’re using two AMD GPUs and are submitting a kernel to a queue with the enable_profiling property, but it would be helpful to see how you create your queues and submit your kernels.
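
For reference, the kind of pattern I have in mind is something along these lines (just a minimal sketch assuming two GPU devices and USM device allocations; the kernel and variable names are placeholders):

#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  auto devs = sycl::device::get_devices(sycl::info::device_type::gpu);

  // One profiling-enabled queue per device.
  sycl::queue q0{devs.at(0), sycl::property::queue::enable_profiling{}};
  sycl::queue q1{devs.at(1), sycl::property::queue::enable_profiling{}};

  // Submit a trivial kernel to the second device and read its profiling info.
  int *data = sycl::malloc_device<int>(1024, q1);
  auto e = q1.parallel_for(sycl::range<1>{1024}, [=](sycl::id<1> i) {
    data[i[0]] = static_cast<int>(i[0]);
  });
  e.wait();
  auto start = e.get_profiling_info<sycl::info::event_profiling::command_start>();
  auto end = e.get_profiling_info<sycl::info::event_profiling::command_end>();
  std::cout << "kernel time: " << (end - start) * 1e-6 << " ms" << std::endl;
  sycl::free(data, q1);
}

Seeing where your code differs from a pattern like this would help narrow down which call produces the invalid handle.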

Thank you @rbielski, I will try the latest version from intel/llvm. I installed via the oneAPI DPC++ standalone compiler, version 2024.2.1.

I will give the intel/llvm build a try and put together a simple reproducing example.

Also, I am trying to double-check whether the documentation website is up to date on this point:

Is it still the case that AMD GPUs only support work-group-level atomics, unlike NVIDIA GPUs where device-wide and system-wide atomics are supported?

That is in fact outdated, thank you for flagging it. All atomic scopes should be supported, and we will update our documentation for the next release.

Apologies, it turns out the support was reverted due to some issues, so the documentation is correct at the moment. We plan to reinstate support for the device and system scopes in a future release. You can check which scopes a given context supports with get_info(), e.g. like this:

#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  sycl::queue q{};
  auto scopes{q.get_context().get_info<sycl::info::context::atomic_memory_scope_capabilities>()};
  std::cout << q.get_device().get_info<sycl::info::device::name>()
            << " supports " << scopes.size() << " atomic memory scopes:" << std::endl;
  for (sycl::memory_scope s : scopes) {
    switch (s) {
      case sycl::memory_scope::work_item : {
        std::cout << "  work_item" << std::endl;
        break;
      }
      case sycl::memory_scope::sub_group : {
        std::cout << "  sub_group" << std::endl;
        break;
      }
      case sycl::memory_scope::work_group : {
        std::cout << "  work_group" << std::endl;
        break;
      }
      case sycl::memory_scope::device : {
        std::cout << "  device" << std::endl;
        break;
      }
      case sycl::memory_scope::system : {
        std::cout << "  system" << std::endl;
        break;
      }
      default: {
        std::cout << "  unknown memory_scope value " << static_cast<int>(s) << std::endl;
        break;
      }
    }
  }
}

This indeed prints only:

AMD Instinct MI210 supports 3 atomic memory scopes:
  work_item
  sub_group
  work_group

in recent releases, but will also print device and system in the future once the support is added back.
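
Once they are re-enabled, using the wider scopes on the device side is just the standard SYCL 2020 atomic_ref. Here is a minimal sketch (the shared counter is only illustrative):

#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  sycl::queue q{};
  int *counter = sycl::malloc_shared<int>(1, q);
  *counter = 0;

  q.parallel_for(sycl::range<1>{1024}, [=](sycl::id<1>) {
    // Atomic increment with system scope on USM shared memory.
    sycl::atomic_ref<int, sycl::memory_order::relaxed,
                     sycl::memory_scope::system,
                     sycl::access::address_space::global_space>
        ref(*counter);
    ref.fetch_add(1);
  }).wait();

  std::cout << "counter = " << *counter << std::endl;
  sycl::free(counter, q);
}

The scope you request should of course be one of those reported by atomic_memory_scope_capabilities.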

Hi @rbielski, thank you for the detailed feedback. I think I found the commit that adds support for device- and system-level atomic operations. Is it this one? [HIP] Enable device and system memory scope for atomics on HIP · oneapi-src/unified-runtime@08b19b2 · GitHub

I am confused about how to pull this commit in when building from the intel/llvm repository, since system-wide atomics on the MI210 are crucial for our application.

Thank you again,

I forked unified-runtime and modified only the line I pointed out from the earlier PR, and now it reports device and system support, so that problem is solved.

Thanks!

Apologies,
I have now tracked the issue down in my code:

#include <sycl/sycl.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

#define NUM_GPU 2

int main() {
  auto Devs = sycl::device::get_devices(sycl::info::device_type::gpu);

  // Create a profiling-enabled queue only for the devices required by the
  // model, not for all of them.
  std::vector<sycl::queue> Queues;
  std::transform(Devs.begin(), Devs.begin() + NUM_GPU, std::back_inserter(Queues),
                 [](const sycl::device &D) {
                   return sycl::queue{D, sycl::property::queue::enable_profiling{}};
                 });

  if (Devs.size() > 1) {
    if (!Devs[0].ext_oneapi_can_access_peer(
            Devs[1], sycl::ext::oneapi::peer_access::access_supported)) {
      std::cout << "P2P access is not supported by devices, exiting."
                << std::endl;
      return 0;
    }
  }

  std::cout << "\n----------------------------------------" << std::endl;
  std::cout << "Running on devices:" << std::endl;
  for (size_t i = 0; i < Queues.size(); i++) {
    std::cout << i << ":\t"
              << Queues[i].get_device().get_info<sycl::info::device::name>()
              << std::endl;
  }
  std::cout << "----------------------------------------" << std::endl;

  // Enables Devs[x] to access Devs[y] memory and vice versa.
  if (Devs.size() > 1) {
    for (auto gpuID_i = 0; gpuID_i < NUM_GPU; gpuID_i++) {
      for (auto gpuID_j = 0; gpuID_j < NUM_GPU; gpuID_j++) {
        if (gpuID_i != gpuID_j) {
          Devs[gpuID_i].ext_oneapi_enable_peer_access(Devs[gpuID_j]);
        }
      }
    }
  }

  auto scopes{Queues[0].get_context().get_info<sycl::info::context::atomic_memory_scope_capabilities>()};
  std::cout << Queues[0].get_device().get_info<sycl::info::device::name>()
            << " supports " << scopes.size() << " atomic memory scopes:" << std::endl;
  for (sycl::memory_scope s : scopes) {
    switch (s) {
      case sycl::memory_scope::work_item: {
        std::cout << "  work_item" << std::endl;
        break;
      }
      case sycl::memory_scope::sub_group: {
        std::cout << "  sub_group" << std::endl;
        break;
      }
      case sycl::memory_scope::work_group: {
        std::cout << "  work_group" << std::endl;
        break;
      }
      case sycl::memory_scope::device: {
        std::cout << "  device" << std::endl;
        break;
      }
      case sycl::memory_scope::system: {
        std::cout << "  system" << std::endl;
        break;
      }
      default: {
        std::cout << "  unknown memory_scope value " << static_cast<int>(s) << std::endl;
        break;
      }
    }
  }
}

When I delete the enable_profiling property it works, but otherwise it still gives the elapsedTime error. How could I solve this problem, @rbielski?

Using this (pre-built): Release DPC++ daily 2024-10-25 · intel/llvm · GitHub

Hi @br-ko, can you paste the actual error you see please? I am not sure we’ve run into this before. Are you saying that when profiling is enabled, running this code exits with an error? Which line causes the error?

Hi @duncan,

This is the output:

$ rm -rf a.out;clang++ -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a atomic_test.cpp ; ./a.out 
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
ZE_LOADER_DEBUG_TRACE:0 Drivers Discovered
<HIP>[ERROR]: 
UR HIP ERROR:
        Value:           400
        Name:            hipErrorInvalidHandle
        Description:     invalid resource handle
        Function:        getElapsedTime
        Source Location: /home/aac/llvm-nightly-2024-10-25/build/_deps/unified-runtime-src/source/adapters/hip/device.cpp:31


----------------------------------------
Running on devices:
0:      AMD Instinct MI210
1:      AMD Instinct MI210
----------------------------------------
AMD Instinct MI210 supports 5 atomic memory scopes:
  work_item
  sub_group
  work_group
  device
  system

This is the run command:

export DPCPP_HOME=/home/aac/
export PATH=$DPCPP_HOME/llvm-nightly-2024-10-25/build/bin:$PATH;export LD_LIBRARY_PATH=$DPCPP_HOME/llvm-nightly-2024-10-25/build/lib:$LD_LIBRARY_PATH
rm -rf a.out;clang++ -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a atomic_test.cpp ; ./a.out

With this snippet the program continues, but my actual code stops executing when I try to capture the profiling information with this function:

double GetExecutionTime(const event &e) {
  double start_k = e.get_profiling_info<info::event_profiling::command_start>();
  double end_k = e.get_profiling_info<info::event_profiling::command_end>();
  double kernel_time = (end_k - start_k) * 1e-6; // ns to ms
  return kernel_time;
}
...


           start_time[gpuID] = executeEvent[gpuID].get_profiling_info<info::event_profiling::command_start>();

EDIT: It works for 1 GPU but fails for 2 GPUs.

Thank you for the reproducer. We’re getting things ready to try this in a similar setup to yours and will let you know as soon as we know more.

Hi @br-ko,
that is indeed a runtime bug, many thanks for reporting this. It should be fixed in this PR:

Hi @rbielski,
Thank you for the quick fix! Have a nice weekend!