Create A Setup for custom XPU

Hello everyone,

I’m working on a project where we’re developing a custom processor (an “XPU”) designed to function like a GPU with OpenCL capabilities. We’re aiming to build a Doom-like game using the Intel oneAPI Construction Kit with SYCL in C++. I’m not sure whether this is the correct category for this post.

Current Setup:

  • Development Environment: Our programmer has set up a Docker container using the Intel oneAPI Construction Kit, which I access via Visual Studio Dev Containers on my machine.
  • Progress: We’ve successfully compiled and run the clik example provided by the Construction Kit.

Challenges:

  1. Floating-Point Support: Our XPU lacks native floating-point capabilities. I’m unsure how to implement floating-point support—should this be handled in the driver, or is there another recommended approach?
  2. Memory Mapping and DMA Transfers: We plan to interface with the hardware via Infiniband and need to map SYCL buffers and accessors to the device memory using DMA. I’m not clear on how to correctly map these variables and ensure that the device processes them properly via kernels.
  3. Hardware Abstraction Layer (HAL): I believe hardware interaction is managed in the HAL files provided by the Construction Kit. However, I’m uncertain how to modify these to suit our hardware specifications.
  4. Variable Placement in SPIR-V: We have a compiler that translates SPIR-V code to our XPU’s instruction set (which we call “RayCore”). The compiler can indicate where variables should be placed in memory but doesn’t handle value initialization. How can I ensure that variables (e.g., game parameters like “health”) are correctly passed to the device and mapped to the appropriate memory locations?
  5. Memory Constraints: Our XPU’s main memory is read-only during kernel execution—the kernel cannot write back to it, and we can only erase the entire memory at once. This poses challenges for typical read-write operations.
  6. Development Environment Considerations: Should we continue using Docker and Linux for development, or would it be feasible to switch to Windows, considering that our XPU simulator is written in C# .NET and may run only on Windows?

Request for Assistance:

  • Advice on Implementing Floating-Point Support: What are the best practices for handling floating-point operations when the hardware doesn’t support them natively?
  • Guidance on SYCL Buffers and Accessors: How can I properly map SYCL buffers and accessors to our device memory, especially given our unique memory architecture?
  • Modifying HAL for Custom Hardware: Any pointers or resources on adapting the HAL files in the Construction Kit to interface with custom hardware?
  • Variable Initialization and Memory Mapping: How can I ensure that variables are correctly initialized and mapped to the device memory for kernel execution?
  • Strategies for Read-Only Memory Constraints: Are there design patterns or workarounds to handle a read-only main memory during kernel execution?

Additional Information:

  • We plan to deploy this in a data center environment, potentially building or upgrading infrastructure to accommodate it.
  • My C++ skills are not very strong, so detailed explanations or references would be greatly appreciated; I last used C++ 15 years ago and have worked only in C# since. (We may need to hire a “real” C++ hardware developer.)

Thank you in advance for your help!

Hi Max,

Thanks for your detailed question. I’ll give a brief answer today, but perhaps we can drill down into more detail over time. With respect to floating-point support, this can be done either entirely in the LLVM backend or by using LLVM’s compiler-rt. You should be able to test all of that (at least for code generation) entirely with clang/lld and a prebuilt library, assuming you have an LLVM backend for your processor. We do have an example of a simpler case, linking a library to provide memcpy, in the RISC-V target.

For buffers and accessors, the HAL implements the simplest scheme: buffers appear as if allocated on the device, and I suggest you start with that approach. However, there is a level above the HAL (mux) where you could support buffers being mapped into host memory; take a look at the “host” target for this. You could then modify the mux target and/or the HAL as you see fit, or remove the HAL entirely.

For modifying the HAL for custom hardware, the main resource is our tutorial (Tutorials - Guides - oneAPI Construction Kit - Products - Codeplay Developer), plus some information on updating the compiler side, for example Adding A Custom Builtins Extension - Guides - oneAPI Construction Kit - Products - Codeplay Developer.

I think we’d need more details to properly answer your questions about the read-only memory and value initialization, but it may be that some sort of “boot” code could be generated to initialize the variables, if that’s helpful. Essentially, you have all the information about the inputs in hal::kernel_exec, and you can do pretty much anything in there. I’d suggest using clik to get something working on your hardware as a starting point; that gives you a basic HAL. I’m not quite sure how you get outputs back if the memory is read-only during kernel runs, though?

The code for generating a new target tends toward something that works for a standard CPU-like architecture, but we’ve also done some work to demonstrate a more GPU-like “work item per thread” approach; see oneapi-construction-kit/examples/refsi/refsi_g1_wi/compiler/refsi_g1_wi at main · codeplaysoftware/oneapi-construction-kit · GitHub. You will need to think about how you want to start up kernels and what the interface looks like; then it will become clearer how to get the LLVM passes correct.

With respect to C++, the target-specific “runtime” (mux) code doesn’t require anything too hard, although the LLVM compilation side may require a bit more understanding.
I hope this helps as a start, and we can have follow-on conversations.