[BACKEND] Fix __builtin_clz implementation on Windows (#5774)
This PR fixes a typo in the Windows implementation of `__builtin_clz`
that was introduced in #5621.

According to [this in-code
comment](https://github.com/triton-lang/triton/blob/b3dcc32f387d1d54ccd6cbbbc087296c0539e703/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L12)
these Windows implementations should have been copied from [this gist
snippet](https://gist.github.com/pps83/3210a2f980fd02bb2ba2e5a1fc4a2ef0).
In the snippet, however, the `clz` implementation additionally [XORs the
result of
`_BitScanReverse`](https://gist.github.com/pps83/3210a2f980fd02bb2ba2e5a1fc4a2ef0#file-ctz_clz-cpp-L51-L53)
with 31 in order to convert the <i>index of the most significant set bit</i>
produced by `_BitScanReverse` into the expected <i>number of leading
zeros</i>. I believe the implementation was copied into Triton without
the final XOR by accident.
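
To illustrate why the XOR matters, here is a minimal, portable sketch. The helper `msb_index` is a hypothetical stand-in for `_BitScanReverse` (which is MSVC-only), and `clz_without_xor` / `clz_with_xor` are illustrative names, not functions from the Triton sources. The sketch assumes a 32-bit `unsigned` and a nonzero input.

```cpp
#include <cassert>

// Hypothetical stand-in for _BitScanReverse: returns the index (0..31) of the
// most significant set bit of x. The result is only meaningful for x != 0.
constexpr int msb_index(unsigned x) {
  int i = 0;
  while (x >>= 1)
    ++i;
  return i;
}

// Mirrors the code before the fix: returns the bit index, not a zero count.
constexpr int clz_without_xor(unsigned x) { return msb_index(x); }

// Mirrors the code after the fix: for r in 0..31, r ^ 31 equals 31 - r,
// which is the number of zero bits above the most significant set bit.
constexpr int clz_with_xor(unsigned x) { return msb_index(x) ^ 31; }

// Compile-time checks (assuming 32-bit unsigned).
static_assert(clz_without_xor(1u) == 0, "bit index of 1 is 0, not a clz");
static_assert(clz_with_xor(1u) == 31, "1 has 31 leading zeros");
static_assert(clz_with_xor(0x80000000u) == 0, "top bit set: no leading zeros");
static_assert(clz_with_xor(0x00010000u) == 15, "bit 16 set: 15 leading zeros");
```

The two variants agree only when the most significant set bit is at an index `r` with `r == 31 - r`, which never happens for integer `r`, so the version without the XOR returns a wrong count for every nonzero input.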

<b>What is affected by this error?</b>
This implementation of CLZ is used in
[`pext_i32`](https://github.com/intel/intel-xpu-backend-for-triton/blob/4a9967137548f8fe9b1a93383e4fd12646352231/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L635),
which is used in
[`delinearize`](https://github.com/intel/intel-xpu-backend-for-triton/blob/4a9967137548f8fe9b1a93383e4fd12646352231/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L662),
which in turn is used by the
[`ReduceOpToLLVM`](https://github.com/intel/intel-xpu-backend-for-triton/blob/4a9967137548f8fe9b1a93383e4fd12646352231/lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp#L243-L247)
pattern. This bug caused `tt.reduce()` ops to be lowered incorrectly on
Windows in cases where shared memory is needed to store temporary
reduced results.

Signed-off-by: dchigarev <[email protected]>
dchigarev authored Jan 31, 2025
1 parent b3dcc32 commit e7457d3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/Conversion/TritonGPUToLLVM/Utility.cpp
@@ -15,7 +15,7 @@
 static int __builtin_clz(unsigned x) {
   unsigned long r;
   _BitScanReverse(&r, x);
-  return static_cast<int>(r);
+  return static_cast<int>(r ^ 31);
 }

 static int __builtin_ctz(unsigned x) {
