Hi @daniel.vansa, SYCL implementations provide a host device, and ComputeCpp is no exception. Code running on the host device is exactly what your host compiler generates, so if your host compiler emits vector instructions, that's what you'll get. I believe ComputeCpp launches about 8 threads.
That said, be warned that the host device is pretty slow. Its use cases are fairly limited, since there are good OpenCL CPU implementations (for example, pocl and Intel's CPU runtime) that can run user code efficiently. ComputeCpp's host device, in contrast, simply calls your kernel as a regular function, so its performance depends heavily on the host compiler. It is also harder to provide optimised versions of some functions, such as barriers, which a CPU OpenCL implementation can implement efficiently.
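The following sketch makes the "just calls a function" point concrete. It is plain C++ with no SYCL headers and shows one way a host-side runtime *might* dispatch a 1-D kernel. `host_parallel_for` and `vector_add_on_host` are hypothetical names chosen for illustration. They are not ComputeCpp's API, and this is not its actual implementation. The thread count of 8 only mirrors the rough figure above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical host-side dispatcher (illustration only, not ComputeCpp code).
// Splits the index range [0, range) into contiguous chunks, one per thread,
// and invokes the kernel functor once per work-item as an ordinary call.
template <typename Kernel>
void host_parallel_for(std::size_t range, unsigned num_threads, Kernel kernel) {
  assert(num_threads > 0);  // avoid division by zero below
  std::vector<std::thread> workers;
  const std::size_t chunk = (range + num_threads - 1) / num_threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = t * chunk;
    const std::size_t end = std::min(range, begin + chunk);
    if (begin >= end) break;  // no work left for this or later threads
    workers.emplace_back([=] {
      // Each work-item is a plain function call in a loop; whether this
      // loop gets vectorised is entirely up to the host compiler.
      for (std::size_t i = begin; i < end; ++i) kernel(i);
    });
  }
  for (auto& w : workers) w.join();
}

// Hypothetical usage: element-wise vector addition on 8 host threads.
std::vector<float> vector_add_on_host(std::size_t n, unsigned num_threads) {
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
  host_parallel_for(n, num_threads, [&](std::size_t i) { c[i] = a[i] + b[i]; });
  return c;
}
```

Each work-item is just a call inside a per-thread loop, so there is no cheap way to make the work-items of a group wait for each other at a barrier. On a CPU that typically needs something like fibers or compiler transformations of the kernel's loops, which is part of what dedicated OpenCL CPU runtimes provide.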