I switched to oneAPI 2025.0 as suggested. The ROCm version is now 6.1.0, and the plugin is libur_adapter_hip.so.0.10.0.
I also built an Apptainer container to rule out environment or file-copying issues and to make this easier to reproduce.
I still get a segfault.
rocm-smi finds the device OK:
Apptainer> rocm-smi
========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 32.0c 41.0W 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
1 33.0c 40.0W 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
====================================================================================
=============================== End of ROCm SMI Log ================================
Listing HIP devices segfaults:
Apptainer> ONEAPI_DEVICE_SELECTOR="hip:*" SYCL_UR_TRACE=-1 sycl-ls
INFO: Output filtered by ONEAPI_DEVICE_SELECTOR environment variable, which is set to hip:*.
To see device ids, use the --ignore-device-selectors CLI option.
<LOADER>[INFO]: loaded adapter 0x0x55f31f76b140 (libur_adapter_hip.so.0)
---> urAdapterGet(.NumEntries = 0, .phAdapters = {}, .pNumAdapters = 0x7ffcd7117f1c (1)) -> UR_RESULT_SUCCESS;
---> urAdapterGet(.NumEntries = 1, .phAdapters = {0x7f5735b6d190}, .pNumAdapters = nullptr) -> UR_RESULT_SUCCESS;
---> urAdapterGetInfo(.hAdapter = 0x7f5735b6d190, .propName = UR_ADAPTER_INFO_BACKEND, .propSize = 4, .pPropValue = 0x7ffcd7117f80 (UR_ADAPTER_BACKEND_HIP), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGetSegmentation fault (core dumped)
Full trace without filtering:
Apptainer> SYCL_UR_TRACE=-1 sycl-ls
<LOADER>[INFO]: failed to load adapter 'libur_adapter_level_zero.so.0' with error: libze_loader.so.1: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_level_zero.so.0' with error: libze_loader.so.1: cannot open shared object file: No such file or directory
<LOADER>[INFO]: loaded adapter 0x0x55b461a5ad50 (libur_adapter_opencl.so.0)
<LOADER>[INFO]: failed to load adapter 'libur_adapter_cuda.so.0' with error: libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: loaded adapter 0x0x55b461a5d4e0 (libur_adapter_hip.so.0)
<LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
---> urAdapterGet(.NumEntries = 0, .phAdapters = {}, .pNumAdapters = 0x7ffe7aa2912c (2)) -> UR_RESULT_SUCCESS;
---> urAdapterGet(.NumEntries = 2, .phAdapters = {0x55b461b520c0, 0x55b461b52100}, .pNumAdapters = nullptr) -> UR_RESULT_SUCCESS;
---> urAdapterGetInfo(.hAdapter = 0x55b461b520c0, .propName = UR_ADAPTER_INFO_BACKEND, .propSize = 4, .pPropValue = 0x7ffe7aa29190 (UR_ADAPTER_BACKEND_OPENCL), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urAdapterGetInfo(.hAdapter = 0x55b461b52100, .propName = UR_ADAPTER_INFO_BACKEND, .propSize = 4, .pPropValue = 0x7ffe7aa29190 (UR_ADAPTER_BACKEND_HIP), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGet(.phAdapters = {0x55b461b520c0}, .NumAdapters = 1, .NumEntries = 0, .phPlatforms = {}, .pNumPlatforms = 0x7ffe7aa291dc (1)) -> UR_RESULT_SUCCESS;
---> urPlatformGet(.phAdapters = {0x55b461b520c0}, .NumAdapters = 1, .NumEntries = 1, .phPlatforms = {0x55b461cdc730}, .pNumPlatforms = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_BACKEND, .propSize = 4, .pPropValue = 0x7ffe7aa2924c (UR_PLATFORM_BACKEND_OPENCL), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_NAME, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa29178 (16)) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_NAME, .propSize = 16, .pPropValue = 0x55b461cedae0 (Intel(R) OpenCL), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_NAME, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa29178 (16)) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_NAME, .propSize = 16, .pPropValue = 0x55b461cedae0 (Intel(R) OpenCL), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGet(.hPlatform = 0x55b461cdc730, .DeviceType = UR_DEVICE_TYPE_ALL, .NumEntries = 0, .phDevices = {}, .pNumDevices = 0x7ffe7aa291c4 (1)) -> UR_RESULT_SUCCESS;
---> urDeviceGet(.hPlatform = 0x55b461cdc730, .DeviceType = UR_DEVICE_TYPE_ALL, .NumEntries = 1, .phDevices = {0x55b461cedef0}, .pNumDevices = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_VERSION, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa28f08 (17)) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_VERSION, .propSize = 17, .pPropValue = 0x55b461cee450 (OpenCL 3.0 LINUX), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_NAME, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa28f08 (16)) -> UR_RESULT_SUCCESS;
---> urPlatformGetInfo(.hPlatform = 0x55b461cdc730, .propName = UR_PLATFORM_INFO_NAME, .propSize = 16, .pPropValue = 0x55b461cee450 (Intel(R) OpenCL), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_TYPE, .propSize = 4, .pPropValue = 0x55b461c4b468 (UR_DEVICE_TYPE_CPU), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_PARENT_DEVICE, .propSize = 8, .pPropValue = 0x55b461c4b470 (nullptr), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceRetain(.hDevice = 0x55b461cedef0) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_EXTENSIONS, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa28bf8 (1012)) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_EXTENSIONS, .propSize = 1012, .pPropValue = 0x55b461cf4b20 (cl_khr_spirv_linkonce_odr cl_khr_fp64 cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_extended_bit_ops cl_khr_icd cl_khr_il_program cl_khr_suggested_local_work_size cl_intel_unified_shared_memory cl_intel_devicelib_assert cl_khr_subgroup_ballot cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_clustered_reduce cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_required_subgroup_size cl_intel_spirv_subgroups cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_intel_device_attribute_query cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_image2d_from_buffer cl_intel_concurrent_dispatch), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_TYPE, .propSize = 4, .pPropValue = 0x7ffe7aa28e3c (UR_DEVICE_TYPE_CPU), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_VENDOR_ID, .propSize = 4, .pPropValue = 0x7ffe7aa28f28 (32902), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_DRIVER_VERSION, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa28ec8 (23)) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_DRIVER_VERSION, .propSize = 23, .pPropValue = 0x55b461cf5c60 (2024.18.10.0.08_160000), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_NAME, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa28ea8 (48)) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_NAME, .propSize = 48, .pPropValue = 0x55b461cf5d10 (AMD EPYC 7F72 24-Core Processor ), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceRelease(.hDevice = 0x55b461cedef0) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_TYPE, .propSize = 4, .pPropValue = 0x55b461c4b0c8 (UR_DEVICE_TYPE_CPU), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_PARENT_DEVICE, .propSize = 8, .pPropValue = 0x55b461c4b0d0 (nullptr), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceRetain(.hDevice = 0x55b461cedef0) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_EXTENSIONS, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 0x7ffe7aa28f18 (1012)) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 0x55b461cedef0, .propName = UR_DEVICE_INFO_EXTENSIONS, .propSize = 1012, .pPropValue = 0x55b461cf4f20 (cl_khr_spirv_linkonce_odr cl_khr_fp64 cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_extended_bit_ops cl_khr_icd cl_khr_il_program cl_khr_suggested_local_work_size cl_intel_unified_shared_memory cl_intel_devicelib_assert cl_khr_subgroup_ballot cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_clustered_reduce cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_required_subgroup_size cl_intel_spirv_subgroups cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_intel_device_attribute_query cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_image2d_from_buffer cl_intel_concurrent_dispatch), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
---> urDeviceRelease(.hDevice = 0x55b461cedef0) -> UR_RESULT_SUCCESS;
---> urDeviceRelease(.hDevice = 0x55b461cedef0) -> UR_RESULT_SUCCESS;
---> urPlatformGetSegmentation fault (core dumped)
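Both traces stop in the middle of a urPlatformGet call: the line has no "-> UR_RESULT_..." suffix and the shell's "Segmentation fault" message is fused onto the call name. For longer traces, a small helper along these lines could pick out the call that never returned. This is only a sketch; the function name `last_unfinished_call` is made up for illustration, and it assumes the trace format is exactly as pasted above.

```python
import re

# Start of a traced Unified Runtime call, e.g. "---> urPlatformGet(".
# The lazy match also stops before a fused "Segmentation fault" message.
CALL_RE = re.compile(r"^---> (ur[A-Za-z]+?)(?:\(|Segmentation fault|$)")


def last_unfinished_call(trace_text):
    """Return the name of the final traced call if it has no result, else None."""
    unfinished = None
    for line in trace_text.splitlines():
        match = CALL_RE.match(line.strip())
        if not match:
            continue  # loader messages and other non-call lines
        if "-> UR_RESULT_" in line:
            unfinished = None  # this call returned normally
        else:
            unfinished = match.group(1)  # call started but never reported a result
    return unfinished
```

Feeding it the output of `SYCL_UR_TRACE=-1 sycl-ls` captured to a file should point at urPlatformGet for the traces above.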
Stack trace from gdb-oneapi sycl-ls
(no debug info from libamdhip64 unfortunately):
Thread 1 "sycl-ls" received signal SIGSEGV, Segmentation fault.
0x00007fffeabfad0d in ?? () from /opt/rocm-6.1.0/lib/libamdhip64.so.6
(gdb) bt
#0 0x00007fffeabfad0d in ?? () from /opt/rocm-6.1.0/lib/libamdhip64.so.6
#1 0x00007fffea9a4cb9 in ?? () from /opt/rocm-6.1.0/lib/libamdhip64.so.6
#2 0x00007fffea9a6be7 in ?? () from /opt/rocm-6.1.0/lib/libamdhip64.so.6
#3 0x00007fffea9ab36f in ?? () from /opt/rocm-6.1.0/lib/libamdhip64.so.6
#4 0x00007fffea9aba2e in ?? () from /opt/rocm-6.1.0/lib/libamdhip64.so.6
#5 0x00007ffff4f40626 in std::call_once<urPlatformGet::{lambda(ur_result_t&)#1}, ur_result_t&>(std::once_flag&, urPlatformGet::{lambda(ur_result_t&)#1}&&, ur_result_t&)::{lambda()#2}::_FUN() ()
from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0
#6 0x00007ffff7766ec3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007ffff4f40198 in urPlatformGet () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0
#8 0x00007ffff70923c8 in ur_loader::urPlatformGet(ur_adapter_handle_t_**, unsigned int, unsigned int, ur_platform_handle_t_**, unsigned int*) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
#9 0x00007ffff70a189a in urPlatformGet () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
#10 0x00007ffff7ec0422 in std::call_once<sycl::_V1::detail::plugin::getUrPlatforms()::{lambda()#1}>(std::once_flag&, sycl::_V1::detail::plugin::getUrPlatforms()::{lambda()#1}&&)::{lambda()#2}::__invoke() ()
from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#11 0x00007ffff7766ec3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00007ffff7ebb179 in sycl::_V1::detail::platform_impl::get_platforms() () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#13 0x00007ffff7fa374d in sycl::_V1::platform::get_platforms() () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#14 0x000055555555d709 in main ()
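Since frames #0 to #4 are all inside libamdhip64 and are reached from the adapter's urPlatformGet, it may be worth checking whether the HIP runtime crashes on its own, without SYCL in the picture. Below is a minimal sketch using Python's ctypes. It assumes that the adapter's platform setup starts with ordinary HIP runtime calls such as hipInit and hipGetDeviceCount (I have not confirmed this from the adapter source), and the function name `probe_hip` is just illustrative. The default library path is the one shown in the backtrace.

```python
import ctypes


def probe_hip(lib_path="/opt/rocm-6.1.0/lib/libamdhip64.so.6"):
    """Call hipInit and hipGetDeviceCount directly through the HIP runtime.

    Returns (init_status, count_status, device_count), where a status of 0
    is hipSuccess, or None if the library cannot be loaded at all.
    """
    try:
        hip = ctypes.CDLL(lib_path)
    except OSError:
        return None  # library missing or its dependencies failed to resolve

    # hipError_t hipInit(unsigned int flags)
    hip.hipInit.argtypes = [ctypes.c_uint]
    hip.hipInit.restype = ctypes.c_int
    # hipError_t hipGetDeviceCount(int* count)
    hip.hipGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    hip.hipGetDeviceCount.restype = ctypes.c_int

    init_status = hip.hipInit(0)
    count = ctypes.c_int(0)
    count_status = hip.hipGetDeviceCount(ctypes.byref(count))
    return init_status, count_status, count.value
```

If calling `probe_hip()` inside the container also segfaults, the problem is likely in the ROCm installation or driver pairing rather than in the oneAPI plugin; if it returns something like `(0, 0, 2)`, the crash more likely depends on how the adapter calls into HIP.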
I’ll try to get the stack trace with symbols from the ROCm libs.
For what it’s worth, kernel and driver versions:
name: amdgpu
vermagic: 5.14.0-427.42.1.el9_4.x86_64 SMP preempt mod_unload modversions
rhelversion: 9.4
srcversion: 8DF76864569ABCCA399E1E1
Any ideas would be appreciated.