Trying to compile the following kernel results in different errors on AMD and Intel runtimes.
compute_queue.submit([&](cl::sycl::handler& cgh)
{
    auto old_lattice = latticeImages[Buffer::Front]->get_access<cl::sycl::float4, cl::sycl::access::mode::read>(cgh);
    auto new_lattice = latticeImages[Buffer::Back]->get_access<cl::sycl::float4, cl::sycl::access::mode::write>(cgh);
    cgh.parallel_for<kernels::ConwayStep>(cl::sycl::range<2>{ old_lattice.get_range() },
    [=](const cl::sycl::item<2> item)
    {
        using namespace cl::sycl;
        using elem_type = cl::sycl::float4::element_type;
        sampler sampler(coordinate_normalization_mode::unnormalized,
                        addressing_mode::repeat,
                        filtering_mode::nearest);
        auto old = [=](cl::sycl::id<2> id) { return old_lattice.read((cl::sycl::int2)id, sampler).r(); };
        auto id = item.get_id();
        std::array<elem_type, 8> neighbours =
        { old(id + id<2>(-1,+1)), old(id + id<2>(0,+1)), old(id + id<2>(+1,+1)),
          old(id + id<2>(-1,0)), old(id + id<2>(+1,0)),
          old(id + id<2>(-1,-1)), old(id + id<2>(0,-1)), old(id + id<2>(+1,-1))
        };
        elem_type self = old(id);
        auto count = std::count_if(neighbours.cbegin(), neighbours.cend(), [](const cl::sycl::cl_float val) { return val > 0.5f; });
        auto val = self > 0.5f ?
            (count < 2 || count > 3 ? 0.f : 1.f) :
            (count == 3 ? 1.f : 0.f);
        new_lattice.write((cl::sycl::int2)id, cl::sycl::float4{ val, val, val, 1.f });
    });
});
With AMD Radeon Software 18.2.3 on Windows, this results in an access violation inside amdocl64.dll, which cannot be caught in any way:
Exception thrown at 0x00007FFA6D717A0C (amdocl64.dll) in SYCL-Conway.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFE8.
With the Intel runtime (still on Windows), it results in a cl::sycl::exception whose log entry contains:
+ message "Error: [ComputeCpp:RT0100] Failed to build program (<Build log for program 00000264885F2510 device 0 (size 363):\nerror: undefined reference to `ZN2cl4sycl7samplerC1E_ns1NS0_29coordinate_normalizatio... std::basic_string<char,std::char_traits<char>,std::allocator<char> >
This looks as if not only the abacus methods are missing, but the sampler references as well. Unfortunately, we have no Linux machine with both SYCL and OpenGL support to test this code on.
The code extracted from the original repo can be found here [OneDrive]. (It may still contain minor bugs inside the kernel, as we've never had the chance to actually run it; however, it should work.)
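Since the kernel has never actually run, the update rule itself can at least be checked on the host. The following is only a minimal sketch, not part of the original repo: it assumes a plain row-major std::vector<float> lattice instead of a SYCL image, and the name conway_step_host and its parameters are hypothetical. Neighbour lookups wrap around at the edges, which is the behaviour addressing_mode::repeat is intended to give.

```cpp
#include <cstddef>
#include <vector>

// Host-only reference for one Conway step; no SYCL involved.
// Assumption: the lattice is row-major with width * height floats,
// and a cell counts as alive when its value is > 0.5f (as in the kernel).
// conway_step_host is a hypothetical helper, not part of the original code.
std::vector<float> conway_step_host(const std::vector<float>& old_lattice,
                                    std::size_t width, std::size_t height)
{
    std::vector<float> new_lattice(old_lattice.size());
    const long w = static_cast<long>(width);
    const long h = static_cast<long>(height);

    // Wrap-around lookup: coordinates outside the lattice are folded back in.
    auto old = [&](long x, long y) {
        const std::size_t xx = static_cast<std::size_t>(((x % w) + w) % w);
        const std::size_t yy = static_cast<std::size_t>(((y % h) + h) % h);
        return old_lattice[yy * width + xx];
    };

    for (long y = 0; y < h; ++y) {
        for (long x = 0; x < w; ++x) {
            // Count the live cells among the eight neighbours.
            int count = 0;
            for (long dy = -1; dy <= 1; ++dy)
                for (long dx = -1; dx <= 1; ++dx)
                    if ((dx != 0 || dy != 0) && old(x + dx, y + dy) > 0.5f)
                        ++count;

            // Same rule as in the kernel above.
            const bool self = old(x, y) > 0.5f;
            const float val = self ? ((count < 2 || count > 3) ? 0.f : 1.f)
                                   : (count == 3 ? 1.f : 0.f);
            new_lattice[static_cast<std::size_t>(y) * width
                        + static_cast<std::size_t>(x)] = val;
        }
    }
    return new_lattice;
}
```

Once the kernel builds, the output of this function on a small lattice (for example a blinker on a 5x5 grid) could be compared with the contents of the back buffer.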