Create A Setup for custom XPU

Hello everyone,

I’m working on a project where we’re developing a custom processor (an “XPU”) designed to function like a GPU with OpenCL capabilities. We’re aiming to build a Doom-like game using the Intel oneAPI Construction Kit with SYCL in C++. I’m not sure whether this is the correct category for this post.

Current Setup:

  • Development Environment: Our programmer has set up a Docker container using the Intel oneAPI Construction Kit, which I access via Visual Studio Dev Containers on my machine.
  • Progress: We’ve successfully compiled and run the clik example provided by the Construction Kit.

Challenges:

  1. Floating-Point Support: Our XPU lacks native floating-point capabilities. I’m unsure how to implement floating-point support—should this be handled in the driver, or is there another recommended approach?
  2. Memory Mapping and DMA Transfers: We plan to interface with the hardware via Infiniband and need to map SYCL buffers and accessors to the device memory using DMA. I’m not clear on how to correctly map these variables and ensure that the device processes them properly via kernels.
  3. Hardware Abstraction Layer (HAL): I believe hardware interaction is managed in the HAL files provided by the Construction Kit. However, I’m uncertain how to modify these to suit our hardware specifications.
  4. Variable Placement in SPIR-V: We have a compiler that translates SPIR-V code to our XPU’s instruction set (which we call “RayCore”). The compiler can indicate where variables should be placed in memory but doesn’t handle value initialization. How can I ensure that variables (e.g., game parameters like “health”) are correctly passed to the device and mapped to the appropriate memory locations?
  5. Memory Constraints: Our XPU’s main memory is read-only during kernel execution—the kernel cannot write back to it, and we can only erase the entire memory at once. This poses challenges for typical read-write operations.
  6. Development Environment Considerations: Should we continue using Docker and Linux for development, or would it be feasible to switch to Windows, considering that our XPU simulator is written in C# .NET and may run only on Windows?

Request for Assistance:

  • Advice on Implementing Floating-Point Support: What are the best practices for handling floating-point operations when the hardware doesn’t support them natively?
  • Guidance on SYCL Buffers and Accessors: How can I properly map SYCL buffers and accessors to our device memory, especially given our unique memory architecture?
  • Modifying HAL for Custom Hardware: Any pointers or resources on adapting the HAL files in the Construction Kit to interface with custom hardware?
  • Variable Initialization and Memory Mapping: How can I ensure that variables are correctly initialized and mapped to the device memory for kernel execution?
  • Strategies for Read-Only Memory Constraints: Are there design patterns or workarounds to handle a read-only main memory during kernel execution?

Additional Information:

  • We plan to deploy this in a data center environment, potentially building or upgrading infrastructure to accommodate it.
  • My C++ skills are not very strong, so detailed explanations or references would be greatly appreciated; I last used C++ 15 years ago and have worked only in C# since. (We may need to hire a “real” C++ hardware developer.)

Thank you in advance for your help!

Hi Max,

Thanks for your detailed question. I’ll give a brief answer today, but perhaps we can drill down into more detail over time. With respect to floating-point support, this can be done either entirely in the LLVM backend or by using LLVM’s compiler-rt. You should be able to test all of that (at least for code generation) entirely with clang/lld and a prebuilt library, assuming you have an LLVM backend for your processor. We do have an example of a simpler case, linking a library to provide memcpy, in the RISC-V target.

For buffers and accessors, the HAL implements the simplest scheme: buffers appear as if allocated on the device, and I suggest you start with that approach. However, there is a level above the HAL (mux) where you could support buffers being mapped into host memory; take a look at the “host” target for this. You could then modify the mux target and/or the HAL as you see fit, or remove the HAL entirely.

For modifying the HAL for custom hardware, the main resource is our tutorial (Tutorials - Guides - oneAPI Construction Kit - Products - Codeplay Developer), plus some information on updating the compiler side, for example Adding A Custom Builtins Extension - Guides - oneAPI Construction Kit - Products - Codeplay Developer.

I think we’d need more details to properly answer your questions about the read-only memory and value initialization, but it may be that some sort of “boot” code could be generated to initialize the variables, if that’s helpful. Essentially, you have all the information about the inputs in hal::kernel_exec, and you can do pretty much anything in there. I’d suggest using clik to get something working on your hardware as a starting point; that gives you a basic HAL. I’m not quite sure how you get outputs back if the memory is read-only during kernel runs, though?

The code for generating a new target tends toward something that works for a standard CPU-like architecture, but we’ve also done some work to demonstrate a more GPU-like “work item per thread” approach; see oneapi-construction-kit/examples/refsi/refsi_g1_wi/compiler/refsi_g1_wi at main · codeplaysoftware/oneapi-construction-kit · GitHub. You will need to think about how you want to start up kernels and what the interface looks like; then it will become clearer how to get the LLVM passes correct.

With respect to C++, the target-specific “runtime” (mux) code doesn’t require anything too hard, although the LLVM compilation side may require a bit more understanding.
I hope this helps as a start, and we can have follow-on conversations.