I’m trying to use the experimental root_group extension (docs link). I’ve looked at the experimental features listed in the docs (link) for the CUDA plugin, and root_group isn’t mentioned there, but sycl_ext_oneapi_non_uniform_groups is supported, which gave me hope since that extension mentions root_group. However, it does not seem to work for me.
I’ve checked my CUDA version and verified that the macro is defined, using the following code:
for (auto& plat : sycl::platform::get_platforms())
{
    std::cout << "CUDA-SYCL platform name: " << plat.get_info<sycl::info::platform::name>() << "\n"
              << "Reported version: " << plat.get_info<sycl::info::platform::version>() << "\n";
}

#ifdef SYCL_EXT_ONEAPI_ROOT_GROUP
printf("ROOT GROUP SUPPORTED\n");
#else
printf("ROOT GROUP NOT SUPPORTED\n");
#endif
// Resulting printout:
/*
CUDA-SYCL platform name: Intel(R) OpenCL
Reported version: OpenCL 3.0 LINUX
CUDA-SYCL platform name: NVIDIA CUDA BACKEND
Reported version: CUDA 12.8
ROOT GROUP SUPPORTED
*/
So the macro says it’s supported, but that could be coming from the CPU target.
I tested it using these lines:
namespace syclexp = sycl::ext::oneapi::experimental;
auto props = syclexp::properties{syclexp::use_root_sync};
// ...
cgh.parallel_for(sycl::nd_range{sg_region.global, sg_region.local},
props,
kernel());
// ... Inside the kernel
auto root = it.ext_oneapi_get_root_group();
sycl::group_barrier(root);
This results in the following error:
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): Native API failed. Native API returns: 66 (UR_RESULT_ERROR_ADAPTER_SPECIFIC)
This feature requires cuda 11.8 or later.
zsh: IOT instruction (core dumped)
So I wonder whether I’m doing something wrong, whether my install is not working properly, or whether the feature is simply not supported? I would love some feedback on this.
If it’s not supported, am I better off using an atomic add and a spinlock, or are there other good options for synchronising a wave? (My kernel is configured to be a complete wave or less.) A rough sketch of the atomic-counter approach I have in mind is below.
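For reference, this is roughly what I mean by an atomic add and a spinlock. It’s only a sketch: counter and generation are placeholder names for zero-initialised device allocations, and it only works (and avoids deadlock) if all the work-groups are resident at the same time, which is why I keep the launch to one wave or less.

// Hand-rolled wave/grid barrier sketch (NOT the root_group extension).
// Assumes counter and generation point at zero-initialised device memory and
// that all num_groups work-groups are co-resident; otherwise the spin loop
// can deadlock.
void wave_barrier(sycl::nd_item<1> it, unsigned num_groups,
                  unsigned* counter, unsigned* generation)
{
    sycl::atomic_ref<unsigned, sycl::memory_order::acq_rel,
                     sycl::memory_scope::device,
                     sycl::access::address_space::global_space> cnt(*counter);
    sycl::atomic_ref<unsigned, sycl::memory_order::acq_rel,
                     sycl::memory_scope::device,
                     sycl::access::address_space::global_space> gen(*generation);

    sycl::group_barrier(it.get_group()); // whole group arrives before its leader signals
    if (it.get_local_linear_id() == 0)
    {
        const unsigned my_gen = gen.load();
        if (cnt.fetch_add(1u) + 1u == num_groups)
        {
            cnt.store(0u);     // last group to arrive resets the counter...
            gen.fetch_add(1u); // ...and releases the other groups
        }
        else
        {
            while (gen.load() == my_gen) { /* spin */ }
        }
    }
    sycl::group_barrier(it.get_group()); // the rest of the group waits for its leader
}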
Hi @Jorgen ,
I’ve been able to reproduce this, but haven’t been able to debug why the feature isn’t working. I’ll take another look tomorrow and see if it is simply that the hardware doesn’t support the feature, and this is being misreported. Can I ask what GPU you’re trying to run this on?
Duncan.
Hi @Duncan,
I’ve tried to run it on both an RTX 3070 Ti running Arch with the open NVIDIA drivers and an RTX 4080 running Ubuntu with the proprietary drivers (I believe). It results in the same error message on both computers, and the SYCL_EXT_ONEAPI_ROOT_GROUP macro is defined in both cases as well, but I guess that could be due to CPU or overall compiler support for the feature?
Best regards,
Jorgen
That’s good to know, thanks. I’ll dig into things here and see if I can find out what error is being triggered and why.
The macro is indeed just set when the implementation supports the feature; it’s not enabled per backend in the compiler.
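If it’s useful, the extension also defines a runtime query for how many work-groups can actually be synchronised, which is a more reliable check than the macro. Something along these lines should work (an untested sketch from my reading of the extension spec; RootSyncKernel stands in for whatever kernel name you use):

namespace syclexp = sycl::ext::oneapi::experimental;

// Ask how many work-groups the backend can synchronise for this kernel on
// this queue; a value of 0 would mean root-group sync is not usable here.
sycl::queue q{sycl::gpu_selector_v};
auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(q.get_context());
auto kernel = bundle.get_kernel(sycl::get_kernel_id<RootSyncKernel>());
auto max_wgs = kernel.ext_oneapi_get_info<
    syclexp::info::kernel_queue_specific::max_num_work_group_sync>(q);
std::cout << "Max work-groups for root-group sync: " << max_wgs << "\n";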
OK, I think I have figured out the problem: the feature is supported in the code, but it relies on a component that is built as part of the oneAPI distribution, not on anything that is built when you compile your own code. There are two ways I can see to fix it.
If you are familiar with LLVM, you can try checking out the Intel fork of LLVM and building that, then using that as your compiler. It’s very similar to icpx, though there are a lot of additions inside icpx that aren’t in the open source. When you build this, the code in question (llvm/unified-runtime/source/adapters/cuda/enqueue.cpp at f6a9c4d11e1a772f9bab6799196c9217c4dd1b08 · intel/llvm · GitHub) will see the newer CUDA version and work fine (this is what I did).
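Once the fork is built, compiling for the CUDA backend with it looks roughly like this ($DPCPP_HOME is just a placeholder for wherever your intel/llvm checkout and build live):

$DPCPP_HOME/llvm/build/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda test.cpp -o test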
The alternative is to build just the component you require, which is the Unified Runtime CUDA adapter. This can be done in two ways: one is as part of the LLVM build. Instead of building all of LLVM, you can simply do something like
git clone git@github.com:intel/llvm
cd llvm
git checkout sycl-rel-6_1_0
python buildbot/configure.py --cuda
ninja -C build libur_adapter_cuda.so
This should build the one component that needs to be updated (the sycl-rel-6_1_0 tag builds the version matching oneAPI 2025.1). If you make sure that this version of the library is on the library path ahead of the version shipped with oneAPI, it should work.
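For example, something like this before running your binary (assuming the library ends up under build/lib; adjust the path if your build puts it elsewhere):

export LD_LIBRARY_PATH=$PWD/build/lib:$LD_LIBRARY_PATH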
It might also be possible to download and build Unified Runtime from GitHub, as there’s a mirror there. Let me know if you’d like to try that option (I haven’t yet tried it myself).
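If you want to go that route, I’d expect the standalone build to look something like the following, but treat it as an untested sketch and double-check the repo’s README for the exact CMake options (UR_BUILD_ADAPTER_CUDA is my best guess at the relevant one):

git clone https://github.com/oneapi-src/unified-runtime
cd unified-runtime
cmake -B build -DUR_BUILD_ADAPTER_CUDA=ON
cmake --build build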
I hope this helps.
Duncan.