MPI + SYCL seg faults on the example code send_recv_usm.cpp

Hello,

I am trying out the example program send_recv_usm.cpp, and the code segfaults at the MPI_Send call. I also tried the buffer version and got the same error.

Here is the compile command:

mpiicpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 -o n1_codeplay_sample_usm_orig n1_codeplay_sample_usm_orig.cpp

Here is output:

snarayanan@intel-eagle:n1$ mpirun -np 2 ./n1_codeplay_sample_usm_orig

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 746246 RUNNING AT intel-eagle.converge.global
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 746247 RUNNING AT intel-eagle.converge.global
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

I am using oneAPI version 2023.2.1 and the corresponding NVIDIA plugin from Codeplay. The CPU is an Intel Sapphire Rapids and the GPU is an NVIDIA A100. (I also tried this on a different machine with an i12700 + RTX A2000 and ended up with the same error.)

Could someone please point me to what I am missing here?

Thanks,
Sidarth Narayanan

Hi @sidarth,

Would you be able to try a more recent version of oneAPI? 2024.1 was released recently, and it’s what we use for testing.

Many thanks,
Duncan.

Hello @duncan ,

I tried it with version 2024.1 and the corresponding Codeplay plugin, and I get the same error.
Here is the output with SYCL_PI_TRACE set to 1:

snarayanan@intel-eagle:n1$ SYCL_PI_TRACE=1 mpirun -np 2 ./n1_codeplay_sample_usm_orig
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 14.39.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100-PCIE-40GB
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100-PCIE-40GB

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 1452978 RUNNING AT intel-eagle.converge.global
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 1452979 RUNNING AT intel-eagle.converge.global
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

Hi @sidarth,
I have seen this failure before when using an MPI implementation that wasn’t CUDA-aware. Which MPI implementation are you using, and has it been built with CUDA-awareness enabled?

See for example this page if you’re using Open MPI:
https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html#how-do-i-build-open-mpi-with-cuda-aware-support

Or this one for MVAPICH2:
https://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-userguide.html#x1-150004.5

If you’re using Cray MPICH, make sure the craype-accel-nvidia<NN> (<NN>=80, 90, etc) module is loaded:
https://cpe.ext.hpe.com/docs/mpt/mpich/intro_mpi.html#gpu-support-in-cray-mpich
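
If you’re on Open MPI (or an Open MPI derivative such as HPC-X), one quick way to check for CUDA-awareness from the shell, assuming `ompi_info` is on your PATH, is:

```shell
# Open MPI / HPC-X: query whether the library was built with CUDA support.
# A CUDA-aware build reports "...:value:true" for this parameter.
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```

Note this only tells you the library was *built* with CUDA support; some transports can still fall back to non-CUDA paths at runtime.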


Hello @rbielski ,

I am using the Intel MPI that comes with the oneAPI toolkit 2024.1.0. It works on Intel GPUs when I set I_MPI_OFFLOAD to 1. Please correct me if I am wrong, but it looks like Intel MPI is not CUDA-aware, so I might have to use a different, CUDA-aware MPI implementation.

We do have HPC-X, which is CUDA-aware, but I am not sure how to compile with it, given that I need to pass in SYCL arguments (-fsycl and -fsycl-targets=<PTX/SPIR>).

That’s correct, the Intel MPI packaged in the oneAPI toolkit is not CUDA-aware.

I’ve never used HPCX myself, but there seems to be some documentation on building it to use the Intel compilers:
https://docs.nvidia.com/networking/display/hpcxv218/installing+and+loading+hpc-x#src-2568379047_InstallingandLoadingHPCX-BuildingHPC-XwiththeIntelCompilerSuite

In addition, some MPI implementations allow you to pick a different compiler regardless of their build configuration. HPC-X is based on Open MPI, so these options should still work for it:
https://docs.open-mpi.org/en/v5.0.x/building-apps/customizing-wrappers.html

so you could try OMPI_CXX=icpx mpicxx -fsycl <args> (note it’s the C++ wrapper, mpicxx, that reads OMPI_CXX; mpicc reads OMPI_CC).
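
As a concrete sketch, assuming HPC-X’s Open MPI wrappers are on your PATH and reusing the SYCL flags from the original compile command:

```shell
# Point the Open MPI C++ wrapper at icpx, then pass the SYCL flags through.
# The wrapper only adds MPI include/link options; icpx handles -fsycl itself.
export OMPI_CXX=icpx
mpicxx -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
       -Xsycl-target-backend --cuda-gpu-arch=sm_80 \
       -o send_recv_usm send_recv_usm.cpp
```

You can verify which compiler the wrapper will invoke with `mpicxx --showme` before building.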