AMDGPU ukernels were ported from HIP to self-contained C in d704051.
AMDGPU ukernels for select targets are now bundled with the compiler and are selected, loaded, and linked in during compilation thanks to 8272490, cbb11f2, and dc29ee7. Externally-defined ukernels can also be provided: 82a89e3.
Compilation time for large models has improved substantially with 71d6de7.
Runtime
There were breaking changes to some iree_hal_* APIs in eae7bfb:
iree_hal_buffer_subspan now requires an iree_allocator_t host_allocator that was previously implicit.
iree_hal_subspan_buffer_initialize was removed as it was not safe.
iree_hal_deferred_buffer_t was removed as placement checks without an allocated buffer reference are not possible.
iree_hal_heap_buffer_wrap and all other buffer creation functions now require a placement.
iree_hal_buffer_initialize used by HAL implementations now requires a placement.
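For callers migrating code, the general shape of the subspan change (an allocator that used to be implicit is now passed by the caller) can be sketched as follows. This is a minimal, self-contained illustration only: every `demo_*` type and function is a hypothetical stand-in written for this example, not the real `iree_hal_*` API, and the actual parameter order and types should be taken from the IREE headers at eae7bfb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-ins for illustration; NOT the real IREE types.
typedef struct demo_allocator_t {
  void* (*alloc)(size_t size);
  void (*free)(void* ptr);
} demo_allocator_t;

typedef struct demo_buffer_t {
  struct demo_buffer_t* parent;     // NULL for a root buffer
  size_t byte_offset;               // offset relative to the root buffer
  size_t byte_length;
  demo_allocator_t host_allocator;  // allocator that owns this struct
} demo_buffer_t;

// Returns an allocator backed by the C runtime heap.
static demo_allocator_t demo_allocator_system(void) {
  demo_allocator_t allocator = {malloc, free};
  return allocator;
}

// Creates a view over [byte_offset, byte_offset + byte_length) of |buffer|.
// The caller states explicitly which allocator backs the view's bookkeeping.
// Returns 0 on success and nonzero on failure; |*out_buffer| is NULL on failure.
static int demo_buffer_subspan(demo_buffer_t* buffer, size_t byte_offset,
                               size_t byte_length,
                               demo_allocator_t host_allocator,
                               demo_buffer_t** out_buffer) {
  assert(buffer != NULL && out_buffer != NULL);
  *out_buffer = NULL;
  // Reject ranges that fall outside the parent buffer.
  if (byte_offset > buffer->byte_length ||
      byte_length > buffer->byte_length - byte_offset) {
    return 1;
  }
  demo_buffer_t* span = (demo_buffer_t*)host_allocator.alloc(sizeof(*span));
  if (span == NULL) return 2;
  span->parent = buffer;
  span->byte_offset = buffer->byte_offset + byte_offset;
  span->byte_length = byte_length;
  span->host_allocator = host_allocator;
  *out_buffer = span;
  return 0;
}

// Releases a view using the allocator recorded at creation time.
static void demo_buffer_destroy(demo_buffer_t* buffer) {
  demo_allocator_t allocator = buffer->host_allocator;
  allocator.free(buffer);
}
```

In this sketch the view remembers the allocator it was created with, so destruction does not depend on any hidden global state. That is one plausible motivation for making such a parameter explicit, not a statement of the IREE authors' rationale.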
The HIP runtime can now create a single logical device backed by multiple physical devices: 0e71e72, 1f19761.
Improved support for loading large programs in the HIP HAL: 47ccd93, 7ff83ea.