AOT compilation using Intel GPU offline compiler ocloc

Hi,
I’d like to do AOT compilation for an Intel GPU, but I cannot get the Intel GPU offline compilation tool, ocloc, to work with ComputeCpp. Could you please check whether it can be supported?

Usage: ocloc [compile] -file <filename> -device <device_type> [-output <filename>] [-out_dir <output_dir>] [-options <options>] [-32|-64] [-internal_options <options>] [-llvm_text|-llvm_input|-spirv_input] [-options_name] [-q] [-cpp_file] [-output_no_suffix] [--help]

Compile flags:
-sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input"

Log shows:
Invalid option (arg 1): /tmp/backprop_sycl-bc19c8.o
Command was: ocloc /tmp/backprop_sycl-bc19c8.o -o /tmp/backprop_sycl-e97f31.o -device cfl -spirv_input

Thanks

Can you post the full log output please?

Here are the reproducer steps:

  1. You need an Intel GPU.

  2. Get the benchmark source code from GitHub - zjin-lcf/Rodinia_SYCL.

  3. cd to any benchmark, sycl/backprop for example.

  4. Add the AOT compilation flags to the Codeplay compiler section of the Makefile:
    CFLAGS += -sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input"

  5. make clean; make VENDOR=codeplay
    Log shows:
    compute++ -Wall -no-serial-memop -sycl -sycl-driver -sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input" -O3 -DUSE_GPU backprop_sycl.cpp -c
    backprop_sycl.cpp:96:26: remark: [Computecpp:CC0048]: Kernel name class forward is a local type - consider forward declaring the type
    inside of an accessible namespace [-Rsycl-kernel-naming]
    cgh.parallel_for(
    ^~~~~~~~~~~~~
    forward
    computecpp/2.8.0-pe/linux/include/SYCL/apis.h:1761:13: note: in instantiation of function template specialization
    'cl::sycl::detail::kernelgen_parallel_for_nd<forward, (lambda at backprop_sycl.cpp:98:55), 2>' requested here
    detail::kernelgen_parallel_for_nd<
    ^
    backprop_sycl.cpp:96:13: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<forward,
    (lambda at backprop_sycl.cpp:98:55), 2>' requested here
    cgh.parallel_for(
    ^
    backprop_sycl.cpp:148:26: remark: [Computecpp:CC0048]: Kernel name class adjust_weights is a local type - consider forward declaring
    the type inside of an accessible namespace [-Rsycl-kernel-naming]
    cgh.parallel_for(
    ^~~~~~~~~~~~~~~~~~~~
    adjust_weights
    computecpp/2.8.0-pe/linux/include/SYCL/apis.h:1761:13: note: in instantiation of function template specialization
    'cl::sycl::detail::kernelgen_parallel_for_nd<adjust_weights, (lambda at backprop_sycl.cpp:150:55), 2>' requested here
    detail::kernelgen_parallel_for_nd<
    ^
    backprop_sycl.cpp:148:13: note: in instantiation of function template specialization 'cl::sycl::handler::parallel_for<adjust_weights,
    (lambda at backprop_sycl.cpp:150:55), 2>' requested here
    cgh.parallel_for(
    ^
    Invalid option (arg 1): /tmp/backprop_sycl-589390.o
    Command was: ocloc /tmp/backprop_sycl-589390.o -o /tmp/backprop_sycl-4ca40b.o -device cfl -spirv_input
    compute++ -Wall -no-serial-memop -sycl -sycl-driver -sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input" -O3 -DUSE_GPU backprop.c -c
    compute++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
    Invalid option (arg 1): /tmp/backprop-eb58ac.o
    Command was: ocloc /tmp/backprop-eb58ac.o -o /tmp/backprop-3d8273.o -device cfl -spirv_input
    compute++ -Wall -no-serial-memop -sycl -sycl-driver -sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input" -O3 -DUSE_GPU imagenet.c -c
    compute++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
    Invalid option (arg 1): /tmp/imagenet-da35de.o
    Command was: ocloc /tmp/imagenet-da35de.o -o /tmp/imagenet-3e3bb7.o -device cfl -spirv_input
    compute++ -Wall -no-serial-memop -sycl -sycl-driver -sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input" -O3 -DUSE_GPU facetrain.c -c
    compute++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
    Invalid option (arg 1): /tmp/facetrain-790fdf.o
    Command was: ocloc /tmp/facetrain-790fdf.o -o /tmp/facetrain-a8d3eb.o -device cfl -spirv_input
    compute++ -Wall -no-serial-memop -sycl -sycl-driver -sycl-target custom-spirv64 --sycl-custom-tool=ocloc -sycl-custom-args "-device cfl -spirv_input" -O3 -DUSE_GPU backprop_sycl.o backprop.o imagenet.o facetrain.o -o backprop -lm -lComputeCpp
    compute++: warning: argument unused during compilation: '--sycl-custom-tool=ocloc' [-Wunused-command-line-argument]
    compute++: warning: argument unused during compilation: '-sycl-custom-args -device cfl -spirv_input' [-Wunused-command-line-argument]

compute++ --version
Codeplay ComputeCpp - PE 2.8.0 Device Compiler - clang version 8.0.0 (based on LLVM 8.0.0svn)

Thanks for the information. We are investigating.

ComputeCpp expects the offline compilation tool to take its arguments in a certain form, and ocloc takes them in a different one. You can work around this with a wrapper script. The following script shows how this could work, though you may wish to write something better suited to your own use.

#!/bin/bash

# Wrap ocloc for use with ComputeCpp.
#
# ComputeCpp expects to pass arguments in the form:
#
#     <tool> <in file> -o <out file> <flags>
#
# ocloc expects to receive arguments in the form:
#
#     ocloc -file <in file> -out_dir <out dir> -output <out basename> <flags>
#
# This wrapper translates from the form that ComputeCpp provides into the form
# that ocloc expects, and places the result where ComputeCpp specified.  ocloc
# cannot place the result exactly where ComputeCpp expects because it sets the
# suffix, and ComputeCpp may be expecting a different suffix.
#
# This wrapper assumes that input type flags, e.g. `-spirv_input`, will be set
# via the `-sycl-custom-args` passed to ComputeCpp when required. E.g. to
# compile via SPIR-V for a Skylake GPU using this wrapper:
#
#     compute++ <flags> -c file.cpp \
#         -sycl-target custom-spirv64 \
#         --sycl-custom-tool=ocloc_wrapper \
#         -sycl-custom-args "-device skl -spirv_input"
#
# Wrapper was tested on an i7-6560U with an Iris Graphics 540 using driver
# version OpenCL 3.0 NEO 21.46.21636.

function die {
  # Report errors on stderr so they are not mixed into the trace output.
  echo "ERROR: $*" >&2
  exit 1
}

echo " IN> $(basename "$0") $*"

# Data for running ocloc.
INPUT=""      # The input source file.
OUTPUT=""     # The output binary file.
OUTDIR=""     # Directory ocloc will put compiled binaries in.
OUTBASE=""    # Filename that ocloc will use.
EXTRA_ARGS="" # Additional compilation flags.
# Argument processing state.
PROCESSED_FIRST_ARGUMENT=0 # Whether we have processed the first argument yet.
NEXT_IS_OUTPUT=0           # When we see '-o' the output name comes next.
INPUT_SUFFIX=".o"

for arg in "$@" ; do
  if [ $PROCESSED_FIRST_ARGUMENT -eq 0 ] ; then
    # This is the first argument, which we know will be the input file.  We
    # also generate the OUTDIR and OUTBASE parameter values from the input,
    # as the input is a ComputeCpp intermediate file, so we know it should be
    # safe to temporarily leave the compiled binary there.
    INPUT="$arg"
    OUTDIR="$(dirname "$arg")"
    OUTBASE="$(basename "${arg%${INPUT_SUFFIX}}")"
    PROCESSED_FIRST_ARGUMENT=1
    [ ! -e "$arg" ] && echo "WARNING: IN FILE ('$arg') DOES NOT EXIST"
  elif [[ "$arg" == "-o" ]] ; then
    # Do nothing with this argument, note what to do with the next argument.
    NEXT_IS_OUTPUT=1
  elif [ $NEXT_IS_OUTPUT -eq 1 ] ; then
    # This is the argument after `-o`.
    OUTPUT="$arg"
    NEXT_IS_OUTPUT=0
  else
    # We know nothing about this argument, just pass it through to ocloc.
    EXTRA_ARGS+=" $arg"
  fi
done


[ ! -e "$INPUT" ] && die "Input file is missing."
[ ! -e "$OUTDIR" ] && die "Temporary output directory is missing."

# Where ocloc will place binaries, we will use `-output_no_suffix` below so
# that the names are not affected by device-specific compilation flags.
TMP_CPU="$OUTDIR"/"$OUTBASE"  # Unused in this wrapper.
TMP_GPU="$OUTDIR"/"$OUTBASE".gen

# Run ocloc!
echo -n "OUT> ocloc -file \"$INPUT\""
echo -n " -out_dir \"$OUTDIR\" -output \"$OUTBASE\""
echo    " -output_no_suffix $EXTRA_ARGS"
ocloc -file "$INPUT" \
      -out_dir "$OUTDIR" -output "$OUTBASE" \
      -output_no_suffix $EXTRA_ARGS \
      || die "ocloc returned an error code"

# Move the result to where ComputeCpp expects it to be.
[ ! -e "$TMP_GPU" ] && die "No GPU output file produced ($TMP_GPU)"
echo " MV> $TMP_GPU --> $OUTPUT"
mv "$TMP_GPU" "$OUTPUT" || die "Could not move result"
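
To see what the wrapper does to an argument list without having ocloc installed, the translation step can be tried on its own. The following is only a sketch: `translate_args` is a hypothetical helper that is not part of the wrapper above, it assumes the input file ends in `.o` as in the logs earlier in this thread, and it prints the command rather than running it.

```shell
#!/bin/bash
# Hypothetical dry-run helper (an illustration, not part of the wrapper):
# takes arguments in the form ComputeCpp passes them,
#     <in file> -o <out file> <flags>
# and prints the ocloc command line they would be translated into.
translate_args() {
  local input="$1"
  shift
  local output=""
  local extra=()
  while [ $# -gt 0 ]; do
    case "$1" in
      -o)
        # The argument after -o is the output file ComputeCpp expects.
        output="${2:-}"
        shift
        [ $# -gt 0 ] && shift
        ;;
      *)
        # Anything else is passed through to ocloc unchanged.
        extra+=("$1")
        shift
        ;;
    esac
  done
  local outdir outbase
  outdir="$(dirname "$input")"
  outbase="$(basename "${input%.o}")"   # assumes a ".o" input suffix
  echo "ocloc -file $input -out_dir $outdir -output $outbase -output_no_suffix ${extra[*]}"
  echo "then move the result to: $output"
}

# The failing command from the log above, minus the leading tool name.
translate_args /tmp/backprop_sycl-589390.o -o /tmp/backprop_sycl-4ca40b.o \
  -device cfl -spirv_input
```

For the arguments shown, the first line printed names `/tmp` as the output directory and `backprop_sycl-589390` as the output basename, which is where the wrapper then looks for the compiled binary before moving it.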

As an additional comment, we will work on a cleaner integration of this with the compiler via a flag. Keep an eye on our release notes for that.

Thanks, the wrapper works on my side after removing .gen from TMP_GPU.
ocloc is Intel's official tool, so I hope a future ComputeCpp release will support it. Thanks.
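
Since the output file had a `.gen` suffix in one setup and no suffix in the other, one way to make the wrapper cope with both is to probe for each name in turn. This is a minimal sketch only: `find_ocloc_output` is a hypothetical helper, and the two candidate names are simply the two cases reported in this thread. Other ocloc versions may name the file differently.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of the original wrapper):
# print the first existing candidate for the binary that ocloc wrote.
# Only the two names seen in this thread are tried.
find_ocloc_output() {
  local outdir="$1"
  local outbase="$2"
  local candidate
  for candidate in "$outdir/$outbase.gen" "$outdir/$outbase"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Possible use inside the wrapper, in place of the fixed TMP_GPU assignment:
#   TMP_GPU="$(find_ocloc_output "$OUTDIR" "$OUTBASE")" \
#     || die "No GPU output file produced"
#
# Caveat: if the input file does not end in ".o", OUTBASE is the input's own
# file name, so the suffix-less candidate would match the input file itself.
```

The suffixed name is checked first so that a leftover suffix-less file from an earlier run is less likely to be picked up by mistake.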
