[BACKEND] Fix `__builtin_clz` implementation on Windows (#5774) · triton-lang/triton@e7457d3


[BACKEND] Fix __builtin_clz implementation on Windows (#5774)

This PR fixes a typo in the Windows implementation of `__builtin_clz`
that was introduced in #5621.

According to [this in-code
comment](https://github.com/triton-lang/triton/blob/b3dcc32f387d1d54ccd6cbbbc087296c0539e703/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L12)
these Windows implementations should have been copied from [this gist
snippet](https://gist.github.com/pps83/3210a2f980fd02bb2ba2e5a1fc4a2ef0).
In the snippet, however, the `clz` implementation additionally [XORs the
result of
`_BitScanReverse`](https://gist.github.com/pps83/3210a2f980fd02bb2ba2e5a1fc4a2ef0#file-ctz_clz-cpp-L51-L53)
with 31 in order to convert the <i>index of the most significant set
bit</i> produced by `_BitScanReverse` into the expected <i>number of
leading zeros</i>. I believe the implementation was copied into Triton
without the final XOR by accident.
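
To illustrate why the XOR matters, here is a minimal portable sketch. It is not the Triton code: `msb_index` is a hypothetical stand-in for `_BitScanReverse` so the example builds on any platform, and `clz_without_xor` / `clz_with_xor` are illustrative names. It assumes a 32-bit `unsigned` and a non-zero input, since both `_BitScanReverse` and `__builtin_clz` leave the result undefined for zero.

```cpp
#include <climits>

static_assert(sizeof(unsigned) * CHAR_BIT == 32,
              "this sketch assumes a 32-bit unsigned");

// Hypothetical stand-in for _BitScanReverse: returns the index (0..31)
// of the highest set bit. Precondition: x != 0.
static unsigned msb_index(unsigned x) {
  unsigned r = 0;
  while (x >>= 1)
    ++r;
  return r;
}

// Behaviour before the fix: the bit index is returned unchanged,
// so it is not a count of leading zeros.
static int clz_without_xor(unsigned x) {
  return static_cast<int>(msb_index(x));
}

// Behaviour after the fix: for any r in [0, 31], r ^ 31 == 31 - r,
// which turns the bit index into the number of leading zeros.
static int clz_with_xor(unsigned x) {
  return static_cast<int>(msb_index(x) ^ 31u);
}
```

For an input of `1u`, the highest set bit has index 0, so the version without the XOR yields 0 while the expected leading-zero count is 31. The two versions disagree for every non-zero 32-bit input, because a bit index `r` never equals `31 - r`.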

<b>What is affected by this error?</b>
This implementation of CLZ is used in
[`pext_i32`](https://github.com/intel/intel-xpu-backend-for-triton/blob/4a9967137548f8fe9b1a93383e4fd12646352231/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L635),
which is used in
[`delinearize`](https://github.com/intel/intel-xpu-backend-for-triton/blob/4a9967137548f8fe9b1a93383e4fd12646352231/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L662),
which in turn is used by the
[`ReduceOpToLLVM`](https://github.com/intel/intel-xpu-backend-for-triton/blob/4a9967137548f8fe9b1a93383e4fd12646352231/lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp#L243-L247)
pattern. This bug caused `tt.reduce()` ops to be lowered incorrectly on
Windows in cases where shared memory is needed to store temporary
reduced results.

Signed-off-by: dchigarev <[email protected]>


dchigarev authored Jan 31, 2025

1 parent b3dcc32 commit e7457d3

lib/Conversion/TritonGPUToLLVM/Utility.cpp

@@ -15,7 +15,7 @@
     static int __builtin_clz(unsigned x) {
       unsigned long r;
       _BitScanReverse(&r, x);
-      return static_cast<int>(r);
+      return static_cast<int>(r ^ 31);
     }
     static int __builtin_ctz(unsigned x) {
