oneAPI DPC++ Compiler 2021-09

@tfzhu tfzhu released this 19 Nov 05:34
· 137973 commits to sycl since this release
bd68232

New features

SYCL Compiler

SYCL Library

Documentation

Improvements

SYCL Compiler

  • Added default device triple spir64 when the compiler encounters any
    incoming objects or libraries that have been built with the spir64 target.
    -fno-sycl-link-spirv can be used to disable this behaviour [1342360]
  • Added support for non-uniform IMul and FMul operations for ptx-nvidiacl
    [98a339d]
  • Added splitting modules capability when compiling for NVPTX and AMDGCN
    [c1324e6]
  • Added -fsycl-footer-path=<path> command-line option to set the path where
    the integration footer is stored [155acd1]
  • Improved read-only accessor handling: added the readonly attribute to the
    associated pointer to memory [3661685]
  • Improved the output project name generation. If the output value has one of
    the .a, .o, .out, .lib, .obj or .exe extensions, the output project
    directory name omits the extension. For any other extension, the output
    project directory name keeps the extension [d8237a6]
  • Improved handling of default device with AOCX archive [e3a579f]
  • Added support for NVPTX device printf [4af2eb5]
  • Added support for non-const private statics in ESIMD kernels [bc51fe0]
  • Improved diagnostic generation when incorrect accessor format is used
    [a292214]
  • Allowed passing -Xsycl-target-backend and -Xsycl-target-link when
    default target is used [d37b832]
  • Disabled kernel function propagation up the call tree to callee when in
    SYCL 2020 mode [2667e3e]
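
The output project directory naming rule listed above can be illustrated with
a small standalone sketch. This is not the compiler driver's code: the helper
name projectDirName is hypothetical, C++17 is assumed for std::filesystem, and
the sketch assumes that only the file name part of the output value matters.

```cpp
#include <array>
#include <filesystem>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the compiler): derives a project
// directory name from an output file name using the rule in the note above.
std::string projectDirName(const std::string &output) {
  // Extensions that are dropped from the directory name.
  static constexpr std::array<std::string_view, 6> kStripped = {
      ".a", ".o", ".out", ".lib", ".obj", ".exe"};

  const std::filesystem::path p(output);
  const std::string ext = p.extension().string();

  for (std::string_view s : kStripped) {
    if (std::string_view(ext) == s)
      return p.stem().string(); // listed extension: omit it
  }
  return p.filename().string(); // any other extension: keep it
}
```

Under these assumptions, an output value of app.exe maps to app, while
design.prj stays design.prj.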

SYCL Library

  • Improved information passed to XPTI subscribers [2af0599] [66770f0]
  • Added event interoperability to Level Zero plugin [ef33c57]
  • Enabled blitter engine for in-order queues in Level Zero plugin [904967e]
  • Removed deprecation warning for SYCL 1.2.1 barriers [18c80fa]
  • Moved free function queries to experimental namespace
    sycl::ext::oneapi::experimental [63ba1ce]
  • Added info query for device::info::atomic_memory_order_capabilities and
    context::info::atomic_memory_order_capabilities [9b04f41]
  • Improved performance of generic shuffles [fb08adf]
  • Renamed ONEAPI/INTEL namespace to ext::oneapi/intel [d703f57] [ea4b8a9]
    [e9d308e]
  • Added Level Zero interoperability which allows specifying ownership of the
    queue [4614ee4] [6cf48fa]
  • Added support for reqd_work_group_size attribute in CUDA plugin [a8fe4a5]
  • Introduced SYCL_CACHE_DIR environment variable which allows specifying a
    directory for the persistent cache [4011775]
  • Added a version of parallel_for accepting a range and a reduction variable
    [d1556e4]
  • Added verbosity to the handling of some errors [84ee39a]
  • Added SYCL 2020 sycl::errc_for API [02756e3]
  • Added SYCL 2020 byte_size method for sycl::buffer and sycl::vec
    classes. get_size was deprecated [282d1de]
  • Added support for USM pointers for sycl::joint_exclusive_scan and
    sycl::joint_inclusive_scan [2de0f92]
  • Added copy and move constructors for
    sycl::ext::intel::experimental::esimd::simd_view [daae147]
  • Optimized memory allocation when sub-devices are used in Level Zero plugin
    [6504ba0]
  • Added constexpr constructors for vec and marray classes
    [e7cd86b] [449721b]
  • Optimized kernel cache [c16705a]
  • Added caching of device properties in Level Zero plugin [a50f45b]
  • Optimized CUDA plugin handling of small kernels [07189af]
  • Optimized submission of kernels [441dc3b] [33432df]
  • Aligned implementation of SYCL_EXT_ONEAPI_LOCAL_MEMORY extension with the
    updated extension document [b3db5e5]
  • Improved sycl::accessor initialization performance on device [a10199d]
  • Added support for sycl::get_kernel_ids and a cache for sycl::kernel_id
    objects [491ec6d]
  • Deprecated ::half since it should not be available in the global
    namespace; sycl::half can be used instead [6ff9cf7]
  • Deprecated sycl::interop_handler, sycl::handler::interop_task,
    sycl::handler::run_on_host_intel, sycl::kernel::get_work_group_info and
    sycl::spec_constant APIs [5120763]
  • Marked sycl::marray device copyable [6e02880]
  • Made Level Zero interoperability API SYCL 2020 compliant for
    sycl::platform, sycl::device and sycl::context [c696415]
  • Deprecated unstable keys of SYCL_DEVICE_ALLOWLIST [b27c57c]
  • Added predefined vendor macros SYCL_IMPLEMENTATION_ONEAPI and
    SYCL_IMPLEMENTATION_INTEL [6d34ebf]
  • Deprecated sycl::ext::intel::online_compiler,
    sycl::ext::intel::experimental::online_compiler can be used instead
    [7fb56cf]
  • Deprecated global_device_space and global_host_space values of
    sycl::access::address_space enumeration, ext_intel_global_device_space and
    ext_intel_host_device_space can be used instead [7fb56cf]
  • Deprecated sycl::handler::barrier and sycl::queue::submit_barrier,
    sycl::handler::ext_oneapi_barrier and
    sycl::queue::ext_oneapi_submit_barrier can be used instead [7fb56cf]
  • Removed sycl::handler::codeplay_host_task API [9a0ea9a]

Tools

  • Added support for ROCm devices in get_device_count_by_type [03155e7]

Documentation

Bug fixes

SYCL Compiler

  • Fixed emission of integration header with type aliases [e3cfa19]
  • Fixed compilation for AMD GPU with -fsycl-dead-args-optimization [5ed48b4]
  • Removed faulty implementations of atomic loads and stores for acquire,
    release and seq_cst memory orders in libclc for NVPTX [4876443]
  • Fixed an issue where specialization_id had two specializations, one of
    which was invalid [f71a1d5]
  • Fixed context destruction in HIP plugin [6042d3a]
  • Changed queue::mem_advise and handler::mem_advise to take int instead
    of pi_mem_advice [af2bf96]
  • Prevented passing of -fopenmp-simd to device compilation when used along
    with -fsycl [226ed8b]
  • Fixed generation of the integration header when non-base-ascii chars are
    used in the kernel name [91f5047]
  • Fixed a problem which could lead to picking up an incorrect kernel at
    runtime in some situations when the unnamed lambda feature is used [27c632e]
  • Fixed suggested target triple in the warning message [7cc89fa]
  • Fixed identity for multiplication on CUDA backend [a6447ca]
  • Fixed a problem with dependency file generation [fd6d948] [1d5b2cb]
  • Fixed builtins address space type for CUDA backend [1e3136e]
  • Fixed a problem which could lead to an incorrect user header being picked
    up [c23fe4b]

SYCL Library

  • Added assignment operator to specializations of
    sycl::ext::oneapi::atomic_ref [c6bc5a6]
  • Fixed the way managed memory is freed in CUDA plugin [e825916]
  • Changed names of some SYCL internal enumerations to avoid possible
    conflicts with user macros [1419415]
  • Fixed status which was returned for host events by
    event::get_info<info::event::command_execution_status>() call [09715f6]
  • Fixed memory ordering used for barriers [73455a1]
  • Fixed several places in CUDA and HIP plugins where bool was used instead
    of uint32_t [764b6ff]
  • Fixed event pool memory leak in Level Zero plugin [0e95e5a]
  • Removed redundant memcpy call for copying struct using fpga_reg
    [a5d290d]
  • Fixed an issue where the native memory object passed to interoperability
    memory object constructor was ignored on devices without host unified memory
    [da19678]
  • Fixed a bug in simd::replicate_w API [d36480d]
  • Fixed group operations for (u)int8/16 types [6a055ec]
  • Fixed a problem with non-native specialization constants being undefined if
    they are not explicitly updated to non-default values [3d96e1d]
  • Fixed a crash which could happen when a default constructed event is passed
    to sycl::handler::depends_on [2fe7dd3]
  • Fixed sycl::link which was returning a kernel bundle in the incorrect state
    [6d98beb]
  • Fixed missing dependency for host tasks in in-order queues [739487c]
  • Fixed a memory leak in Level Zero plugin [a5b221f]
  • Fixed a problem with copying buffers when an offset is specified in CUDA
    backend [0abdfd6]
  • Fixed a problem which was preventing passing asynchronous exceptions produced
    in host tasks [f823d61]
  • Fixed a crash which could happen when submitting tasks to multiple queues
    from multiple threads [4ccfd21]
  • Fixed a possible multiple definitions problem which could happen when using
    the -fsycl-device-code-split option [d21082f]
  • Fixed several problems in configuration file processing [2a35df0]
  • Fixed imbalance in events release and retain in Level Zero plugin [6117a1b]
  • Fixed a problem which could lead to leaks of full paths to source files from
    the build environment [6bbfe42]

API/ABI breakages

  • Removed intel::reqd_work_group_size attribute,
    sycl::reqd_work_group_size can be used instead [c583a20]

Known issues

  • [new] SYCL 2020 barriers show worse performance than SYCL 1.2.1 barriers do
    [18c80fa]
  • [new] Using fallback assert in a separate compilation flow requires
    explicitly linking against lib/libsycl-fallback-cassert.o or
    lib/libsycl-fallback-cassert.spv
  • [new] Performance may be impacted by JIT-ing an extra 'copier' kernel and by
    running the 'copier' kernel and a host task after each kernel which uses
    assert
  • [new] Driver issue. When a two-step AOT build is used and there is at least
    one call to a devicelib function from within a kernel, the device binary
    image gets corrupted
  • [new] Alignment of allocation requests is limited to 64KB, which is the only
    alignment supported by Level Zero [7dfaf3b]
  • [new] In the following scenario on the Level Zero backend:
    1. Kernel A, which uses buffer A, is submitted to queue A.
    2. Kernel B, which uses buffer B, is submitted to queue B.
    3. queueA.wait().
    4. queueB.wait().
    The DPC++ runtime used to treat unmap/write commands for buffer A/B as host
    dependencies (i.e. they were waited for prior to enqueueing any command
    that's dependent on them). This allowed the Level Zero plugin to detect that
    each queue is idle on steps 1/2 and submit the command list right away.
    This is no longer the case since we started passing these dependencies in an
    event waitlist and the Level Zero plugin attempts to batch these commands, so
    the execution of kernel B starts only on step 4. The workaround restores the
    old behavior in this case until this is resolved [2023e10] [6c137f8].
  • User-defined functions with the name and signature matching those of any
    OpenCL C built-in function (i.e. an exact match of arguments, return type
    doesn't matter) can lead to Undefined Behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process
    execution. Creating a context opens the device associated with the context
    and places a lock on it for that process. No other process may use that
    device. Some queries about the device through device.get_info<>() also
    open up the device and lock it to that process since the runtime needs
    to query the actual device to obtain that information.
  • The format of the object files produced by the compiler can change between
    versions. The workaround is to rebuild the application.
  • Using sycl::program/sycl::kernel_bundle API to refer to a kernel defined
    in another translation unit leads to undefined behavior
  • Linkage errors with the following message:
    error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
    can happen when a SYCL application is built using a version of MS Visual
    Studio 2019 below 16.3.0 and the user specifies -std=c++14 or /std:c++14.
  • Printing internal defines isn't supported on Windows [50628db]