std::complex multiplication

bdallen · 8 June 2020 15:50

Is there a convenient / portable way to use std::complex with sycl 1.2.1, including at least the four basic arithmetic operations?

I have noticed that addition and subtraction of std::complex work fine in both the ComputeCpp and Intel DPC++ SYCL implementations. Multiplication and division fail at runtime when running on an Intel GPU or CPU (using the latest intel-compute-runtime from git). ComputeCpp warns during compilation that __muldc3 is undefined, and the program terminates with ‘cl::sycl::compile_program_error’. DPC++ gives no compiler warning, but it also fails at runtime with a compile_program_error: error: undefined reference to ‘__muldc3()’. Interestingly, hipSYCL on AMD gfx803 works fine for this case. With DPC++, if I link libsycl-cmath-fp64.o and libsycl-complex-fp64.o, it works on the GPU. What is the recommended way to do this in ComputeCpp?

Here is my test code:

github.com/bd4/sycl-test/blob/master/complex.cxx

#include <CL/sycl.hpp>
#include <complex>
#include <iostream>

#define OP *

using namespace cl::sycl;

int main(int argc, char **argv) {
    constexpr int N = 16;
    using dcomplex = std::complex<double>;
    dcomplex h_a[N];
    dcomplex h_b[N];
    dcomplex h_c[N];
    int i;

    for (i = 0; i < N; i++) {
        h_a[i] = dcomplex(i, -i);
        h_b[i] = dcomplex(i,  i);
    }

(This file has been truncated; the full source is in the repository.)

I noticed that the ComputeCpp mandelbrot example uses separate re/im variables and its own complex multiply in the kernel. That's fine for simple kernels, but for porting libraries that already use thrust::complex / std::complex to SYCL, it would be useful to have a portable way to do this.

duncan · 8 June 2020 17:39

Hi @bdallen,

Great question! I knew I had some code lying around from a couple of years ago, when I was first experimenting with this, so I looked it up - sadly I didn't leave any particularly enlightening comments for myself. I have a sample similar to yours, using std::complex<T> on-device for some simple operations.

I’ve since had a quick search and I’m starting to remember what’s going on. For complex multiplication and division, the compiler replaces part of the operation with a call to a support-library function (__muldc3 for double-precision multiplication, __divdc3 for division) that handles cases like NaN and infinite inputs. I’m not sure it’s a great idea to do this in general, but on the assumption that your code avoids NaN and infinite values, you can use the compiler flag -ffast-math to stop the compiler emitting these calls, and I think at that point it basically just works.

I think a more specification-based answer would be that this isn't expected to work, but personally I find it interesting to look at code we might be able to run, because the technology allows us to do so.

I hope this helps,
Duncan.

bdallen · 9 June 2020 15:45

Thanks for the tip! Unfortunately I was not able to get it working with ComputeCpp-CE 2.0.0. I tried building with -ffast-math and -cl-fast-relaxed-math in different combinations, with the same result: it still warns that __muldc3 is undefined. Do you remember which version you were testing with, and which device/CPU backend you were using?

duncan · 9 June 2020 18:38

Hi @bdallen,

I tried my old code from when I was last messing with this, which appears to be working fine, and I also copied in your code (I hope you don’t mind!). I get the output:

[GPU  ] Intel(R) HD Graphics {Intel(R) Corporation}
0: (0,0) OP (0,0) = (0,0)
1: (1,-1) OP (1,1) = (2,0)
2: (2,-2) OP (2,2) = (8,0)
3: (3,-3) OP (3,3) = (18,0)
4: (4,-4) OP (4,4) = (32,0)
5: (5,-5) OP (5,5) = (50,0)
6: (6,-6) OP (6,6) = (72,0)
7: (7,-7) OP (7,7) = (98,0)
8: (8,-8) OP (8,8) = (128,0)
9: (9,-9) OP (9,9) = (162,0)
10: (10,-10) OP (10,10) = (200,0)
11: (11,-11) OP (11,11) = (242,0)
12: (12,-12) OP (12,12) = (288,0)
13: (13,-13) OP (13,13) = (338,0)
14: (14,-14) OP (14,14) = (392,0)
15: (15,-15) OP (15,15) = (450,0)

I assume this is basically the correct output? I’m using ComputeCpp 2.0.0 on an Intel GPU of some kind (though I should warn I have a very old driver installed, mostly down to laziness). I get the same results if I run on the Intel CPU driver.
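A quick hand check of that output, writing n for the loop index (the test sets h_a[n] = n - n i and h_b[n] = n + n i):

```latex
(n - n\,i)(n + n\,i) = n^2 + n^2 i - n^2 i - n^2 i^2 = 2n^2
```

So n = 2 gives 8 and n = 15 gives 450 with a zero imaginary part, which matches every line printed above.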

How are you compiling these samples? I am using the CMake scripts we distribute to compile them, and setting -ffast-math through the CMake variable COMPUTECPP_USER_FLAGS, which is designed for this purpose.

I hope this helps!

bdallen · 10 June 2020 12:12

Thanks, it does work if I build with CMake, using FindComputeCpp etc. from the SDK and the add_sycl_to_target macro/function with COMPUTECPP_USER_FLAGS=-ffast-math. I was using the -sycl-driver option to build, which does not work:

compute++ -sycl-driver -std=c++14 -ffast-math -I/opt/ComputeCpp-CE/include -L/opt/ComputeCpp-CE/lib -lComputeCpp -o build-computecpp/complex complex.cxx

I was able to get the Makefile build to work as well, using a separate -sycl pass with -ffast-math instead of -sycl-driver. I will experiment with it more.

duncan · 16 June 2020 10:23

Ah, that kind of makes sense. It’s really hijacking a host flag, and I imagine that in -sycl-driver mode -ffast-math gets applied only to the host code rather than the device code. Does that mean -cl-fast-relaxed-math doesn’t do anything, even in -sycl-driver mode?

anon94637465 · 30 June 2020 16:41

This is what I see:

(working) compute++ -O3 -sycl-driver -std=c++14 -ffast-math complex.cpp -lComputeCpp
(not working) compute++ -O3 -sycl-driver -std=c++14 complex.cpp -lComputeCpp
(not working) compute++ -sycl-driver -std=c++14 -ffast-math complex.cpp -lComputeCpp

Thank you for your verification

bdallen · 10 July 2020 13:01

Thanks zjin, I get the same results with ComputeCpp 2.0.0. -sycl-driver works if both -O3 and -ffast-math are enabled.

bdallen · 10 July 2020 13:02

I started a discussion about standardizing complex numbers in SYCL here: https://github.com/intel/llvm/issues/2061, in case anyone here is interested.
