Question about USM

@rod Thanks for sharing the SYCL Academy example. I am using USM with DPC++ to port the GPU multi-dimensional array library gtensor to SYCL (see https://github.com/wdmapp/gtensor/pull/19). I was hoping to test with ComputeCpp 2 as well, and am trying to figure out whether I can jerry-rig compatibility between the two.

So far I have it working with some toy examples here: https://github.com/bd4/sycl-test/commit/d79208bcec2068559b8a911c7afed58a14a52ef3. The header difference is awkward: I can't include CL/sycl.hpp first and then test for COMPUTECPP, because according to the example the wrapper header must be included before it, so I define my own preprocessor variable to test for instead. In gtensor it's easy enough to work around the includes, and the namespace differences just require a few ifdefs: there are already malloc/free interfaces to support CUDA and HIP, so the change is only needed in one place.

The usm_wrapper is more awkward, because the USM pointers live inside a complex nested CRTP data structure, and the wrapped pointers can only be used inside kernel lambdas (in particular, q.memcpy requires the unwrapped pointer). What is the latest thinking on this: is the wrapper an interim implementation, or is this on the table as a change to USM before it becomes an official part of the next SYCL spec?
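One way to keep the SYCL-implementation differences in a single place is a small allocation layer selected by build-time macros, mirroring the existing CUDA/HIP malloc/free interfaces. This is a minimal sketch under stated assumptions, not gtensor's code: `MY_USE_SYCL_DPCPP` and `MY_USE_SYCL_COMPUTECPP` are hypothetical macros set by the build system, `backend::device_allocate` and `backend::device_deallocate` are hypothetical names, the ComputeCpp branch is deliberately left unfilled because its USM names depend on the wrapper, and a plain host fallback lets the sketch compile without any SYCL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical build-system macros (e.g. -DMY_USE_SYCL_COMPUTECPP). They are
// needed because a ComputeCpp-defined macro is only visible after
// <CL/sycl.hpp> is included, while the USM wrapper header from the example
// has to be included before <CL/sycl.hpp>.
#if defined(MY_USE_SYCL_COMPUTECPP)
// #include the USM wrapper header from the example here, first.
#include <CL/sycl.hpp>
#elif defined(MY_USE_SYCL_DPCPP)
#include <CL/sycl.hpp>
#endif

namespace backend {

#if defined(MY_USE_SYCL_DPCPP)
// Hypothetical: one default-constructed queue shared by the library.
inline cl::sycl::queue& get_queue() {
  static cl::sycl::queue q;
  return q;
}

template <typename T>
T* device_allocate(std::size_t count) {
  return cl::sycl::malloc_device<T>(count, get_queue());
}

template <typename T>
void device_deallocate(T* p) {
  cl::sycl::free(p, get_queue());
}

#elif defined(MY_USE_SYCL_COMPUTECPP)
// The ComputeCpp calls (different namespace) would go here; they are not
// spelled out because the exact names depend on the wrapper.
#error "ComputeCpp branch is not filled in by this sketch"

#else
// Host fallback so the sketch builds without any SYCL implementation.
template <typename T>
T* device_allocate(std::size_t count) {
  return static_cast<T*>(std::malloc(count * sizeof(T)));
}

template <typename T>
void device_deallocate(T* p) {
  std::free(p);
}
#endif

}  // namespace backend
```

The rest of the library then only calls `backend::device_allocate<T>(n)` and `backend::device_deallocate(p)`, so the include-order and namespace ifdefs stay confined to this one file.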

The current gtensor port also does things that are not strictly allowed by the SYCL spec, such as passing kernel arguments whose types are neither standard-layout nor trivially copyable. In practice, though, they are simple enough that they shouldn't cause problems (and in fact they work with Intel oneAPI beta6). I had to remove the use of std::tuple from one of the data structures; after that it basically just worked. The CRTP data structure approach used in gtensor also works fine with CUDA and HIP.
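To catch this kind of problem at compile time rather than on a particular SYCL implementation, one option is a static check that each type captured by a kernel lambda is standard-layout and trivially copyable. The sketch below is an assumption, not gtensor's design: the trait `is_kernel_arg_safe_v` and the CRTP types `expr_base`, `span_view` and `add_expr` are hypothetical and only loosely resemble a nested expression structure. Note that `std::tuple` is not guaranteed by the C++ standard to be trivially copyable, so a member of that type may fail such a check.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical trait: standard layout + trivially copyable, the two
// properties discussed above for by-value kernel arguments.
template <typename T>
constexpr bool is_kernel_arg_safe_v =
    std::is_standard_layout<T>::value && std::is_trivially_copyable<T>::value;

// Minimal CRTP base: element access is dispatched statically to Derived.
template <typename Derived>
struct expr_base {
  const Derived& derived() const { return static_cast<const Derived&>(*this); }
  double operator()(std::size_t i) const { return derived().access(i); }
};

// Hypothetical non-owning view over a raw (e.g. USM) pointer.
struct span_view : expr_base<span_view> {
  span_view(double* d, std::size_t n) : data(d), size(n) {}
  double access(std::size_t i) const { return data[i]; }
  double* data;
  std::size_t size;
};

// Hypothetical expression node holding two sub-expressions by value.
template <typename L, typename R>
struct add_expr : expr_base<add_expr<L, R>> {
  add_expr(const L& l, const R& r) : lhs(l), rhs(r) {}
  double access(std::size_t i) const { return lhs(i) + rhs(i); }
  L lhs;
  R rhs;
};

// Compile-time checks: these fail to compile if a member (or a base) breaks
// either property, e.g. a user-provided copy constructor.
static_assert(is_kernel_arg_safe_v<span_view>,
              "span_view must be safe to capture in a kernel");
static_assert(is_kernel_arg_safe_v<add_expr<span_view, span_view>>,
              "add_expr must be safe to capture in a kernel");
```

The checks run once per type, so a nested expression such as `add_expr<span_view, span_view>` is verified by the compiler on every backend, including CUDA and HIP builds.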