ptxas fatal - Optimized debugging not supported when building with CMake on Windows

I have created a rudimentary CMake config for a SYCL project on Windows.

cmake_minimum_required(VERSION 3.25)
set(CMAKE_CXX_STANDARD 17)

if(NOT CMAKE_CXX_COMPILER)
    set(CMAKE_CXX_COMPILER icx)
endif ()
if(NOT CMAKE_LINKER)
    set(CMAKE_LINKER icx)
endif ()
if(NOT CMAKE_EXPORT_COMPILE_COMMANDS)
    set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
endif ()

set(project_name sycl_test)

project(sycl_test CXX)

find_package(IntelSYCL REQUIRED)

set(PROJECT_SOURCES
    ${CMAKE_CURRENT_LIST_DIR}/main.cpp
)

add_executable(sycl_test
    ${PROJECT_SOURCES}
)

add_sycl_to_target(TARGET sycl_test SOURCES ${PROJECT_SOURCES})

target_compile_options(sycl_test PRIVATE -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64)
target_link_options(sycl_test PRIVATE -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64)

After a lot of trial and error, I settled on target_xxx_options at the end of the file as the most reliable way to force the -fsycl flags to be used for both compiling and linking (otherwise I would get "No kernel named … was found" or UR_RESULT_ERROR_INVALID_BINARY errors at runtime).

The project configures correctly with cmake -S . -B build -G Ninja (after setvars.cmd was run), but building it in debug mode (the default) results in the following error:

ptxas fatal   : Optimized debugging not supported

The ptxas invocation is as follows (paths to the temp directory replaced with %TEMP% for readability):

"C:\\Program Files (x86)\\Intel\\oneAPI\\compiler\\2025.0\\bin\\compiler\\llvm-foreach" --out-ext=o "--in-file-list=%TEMP%\\icx-0ad6403e4c\\main-sm_50-7b6fe0.asm" "--in-replace=%TEMP%\\icx-0ad6403e4c\\main-sm_50-7b6fe0.asm" "--out-file-list=%TEMP%\\icx-0ad6403e4c\\main-sm_50-c5cd43.cubin" "--out-replace=%TEMP%\\icx-0ad6403e4c\\main-sm_50-c5cd43.cubin" -- "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6/bin\\ptxas" -m64 -O2 -v --gpu-name sm_50 --output-file "%TEMP%\\icx-0ad6403e4c\\main-sm_50-c5cd43.cubin" "%TEMP%\\icx-0ad6403e4c\\main-sm_50-7b6fe0.asm"

With CMAKE_BUILD_TYPE=Release, the project builds, links and runs correctly with the SYCL code going to the GPU.

How can one make it work in debug? I’d be happy to have just the host code be debuggable, with kernels not carrying any debug info, but I see no easy or documented way of doing that for a CMake project.

Hi @bjanowski,

I haven’t run into this issue before, but going by what I can find online, using ptxas is occasionally a little finicky, and LLVM can output things that it finds objectionable. There is a document describing the DPC++-specific flags here, and it’s possible that specifically trying the flags -Xs -O0 might fix this issue (this should ideally instruct the backend to pass no optimisation flags all the way down to ptxas).

If this doesn’t help then we’d need to look at setting more individual compilation flags to the different compilation steps and I don’t have as much experience with that, especially on Windows, but we can take a look!

Hi @duncan,
While -Xs -O0 (or more precisely -Xcuda-ptxas -O0) did not help, you did help me find out that -Xcuda-ptxas is even an option :)

After looking through the ptxas CLI, I found the -suppress-debug-info option, and the trick was to add the

-Xcuda-ptxas -suppress-debug-info -g

flags to the link options. For some reason, -suppress-debug-info by itself does not do much. (I believe the symbols then come from clang instead of being generated by ptxas, so ptxas has no way of suppressing them. Or maybe something else?) With -g added, however, the program compiles without issues in both Release and Debug, with full debug symbols for the host code.
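
For anyone landing here later, the change can be sketched against the CMakeLists.txt from the first post roughly as follows. Treat it as a sketch, not a verified configuration. The SHELL: prefix is an addition that is not part of the flags quoted above: CMake de-duplicates repeated options, and the prefix keeps -Xcuda-ptxas paired with its argument. Treating -g as a separate driver-level flag, and not as a second ptxas argument, is also an assumption based on how the flags are written.

```cmake
# Sketch only: placed after the existing target_link_options() call for sycl_test.
# "SHELL:" keeps -Xcuda-ptxas and its argument together so that CMake's option
# de-duplication cannot separate them (assumption: not in the original flags).
target_link_options(sycl_test PRIVATE
    "SHELL:-Xcuda-ptxas -suppress-debug-info"  # passed through to ptxas by the compiler driver
    -g                                         # assumed to be a driver-level flag, not a ptxas argument
)
```

Per the result described above, the same link options are used for both Debug and Release, so no per-configuration generator expression should be needed.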

Thank you for your help as well as willingness to assist more!

Great, I’m glad it’s working! I will make an issue internally to see if we can understand more about this and maybe improve the behaviour of the compiler, if this is a condition we can detect and avoid entirely.

Thanks, please get in touch if you have any other questions!
