AMD GPU Out of Memory Issue for 2nd GPU

Hi again,
I am hitting the problem described in intel/llvm issue #16038, "MultiGPU malloc_device fails for second GPU (Out of Memory)".
I remember not having this issue before, so I was wondering what I am missing in my build routine. Do you have a recipe I could follow to get everything up and running?

Thank you in advance,

It also fails with 2025.0.0 and the Codeplay plugin, with this error:

<HIP>[ERROR]: 
UR HIP ERROR:
        Value:           2
        Name:            hipErrorOutOfMemory
        Description:     out of memory
        Function:        getNextTransferStream
        Source Location: /tmp/tmp.6nq6FrBCn9/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/hip/queue.cpp:106

Hi,
I was wondering if there is any update or progress on this issue?

Thanks!

Hi @br-ko,

We haven’t been able to replicate your setup, so we have not been able to try this yet. We will try to create the nearest possible setup soon, though we might not get to it before the end of the year.

If you have a small reproducer that exhibits this issue, it would be very helpful for us in recreating your problem.

Many thanks,
Duncan.

Hi Duncan,
Thank you for the reply and explanation.
Would the reproducer provided in the intel/llvm issue linked in the first message be helpful?

Best,

Thanks @br-ko, you’re right, that should do it. We will try to take a look soon.