In-place shift of uint vectors corrupts s1 and further components #358
proski added a commit to proski/mfakto that referenced this issue on Jan 9, 2025:
The fallback implementation of amd_bitalign() triggers a bug with Intel Compute Runtime (NEO) versions from 23.22.26516.18 to 24.45.31740.9 inclusive. intel/intel-graphics-compiler#358 The bug affects all but the first component of the vectors, so the self-tests would pass with VectorSize=1. For higher values of VectorSize, including the default VectorSize=2, approximately half of the self-tests fail, all in barrett32 kernels. Add generic_bitalign() that is always implemented using shifts. Use it in all cases when the destination is the same as one of the sources. If Intel Compute Runtime is detected, use 64-bit shifts in generic_bitalign(). For other platforms, keep using 32-bit shifts. Make amd_bitalign() an alias to generic_bitalign() on systems where amd_bitalign() is not available. That way, it would also expand to 64-bit shifts for Intel Compute Runtime.
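The workaround relies on the documented semantics of amd_bitalign() from AMD's cl_amd_media_ops OpenCL extension: hi and lo are joined into one 64-bit value, shifted right by (shift & 31), and the low 32 bits are kept. A minimal host-side C sketch of that definition next to a 32-bit-shift fallback and a 64-bit-shift variant follows. The names bitalign_ref, generic_bitalign_32, generic_bitalign_64 and shl1_in_place are hypothetical C models written for illustration; they are not mfakto's actual OpenCL code.

```c
#include <assert.h>
#include <stdint.h>

/* Documented amd_bitalign() semantics: treat hi:lo as one 64-bit value,
   shift it right by (shift & 31) and keep the low 32 bits. */
uint32_t bitalign_ref(uint32_t hi, uint32_t lo, uint32_t shift)
{
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> (shift & 31));
}

/* Hypothetical fallback built from 32-bit shifts. Only valid for shift in
   1..31, because shifting a 32-bit value by 32 is undefined in C. */
uint32_t generic_bitalign_32(uint32_t hi, uint32_t lo, uint32_t shift)
{
    assert(shift >= 1 && shift <= 31);
    return (hi << (32 - shift)) | (lo >> shift);
}

/* Hypothetical variant built from a 64-bit shift, in the spirit of what the
   commit uses when Intel Compute Runtime is detected. */
uint32_t generic_bitalign_64(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t joined = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(joined >> (shift & 31));
}

/* In-place use, as in a multi-word left shift by one bit: the destination
   (*hi) is also one of the sources, the case the commit singles out. */
void shl1_in_place(uint32_t *hi, uint32_t lo)
{
    *hi = generic_bitalign_64(*hi, lo, 31);
}
```

For example, all three functions return 23 for (hi = 11, lo = 0x80000000, shift = 31), i.e. (11 << 1) | 1. The two fallbacks compute the same value; per the commit, the 64-bit form only serves to avoid the code pattern that triggers the compiler bug.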
This is a duplicate of intel/compute-runtime#790, moved here because there is evidence that the issue is compiler-related.
The issue can be reproduced with Intel Compute Runtime versions 23.22.26516.18 through 24.45.31740.9 (inclusive on both ends). Versions 23.17.26241.22 and older are not affected. Version 24.48.31907.7 (the latest release at the time of writing) is not affected either. Even though the latest release is not affected, I'd like someone to take a closer look, as the fix might be accidental.
The following is an improved version of the demo I posted in the original ticket.
Three uint4 vectors (d0, d1 and d2) are loaded with the values 10, 11 and 12 respectively, every component of a vector holding the same value. Each vector is then shifted left by 1 bit in place, and the top bit of the next vector (which should always be 0) is fed into the lowest bit.
The expected result is that all values are multiplied by 2, giving 20, 22 and 24.
The actual output shows that s1 and the higher components are corrupted in the vectors that received the top bit from another vector.
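For comparison, a hypothetical host-side C model of the operation described above computes the expected values lane by lane. It assumes each vector holds one value in all four lanes, that "next" means d0 receives from d1 and d1 receives from d2 (d2 receives nothing), and that the top bit is read after the in-place shift, following the order in the description. The names uint4_model, fill, shift_chain_left_1 and run_demo are illustrative only; this is not the original OpenCL demo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for OpenCL's uint4: four 32-bit lanes. */
typedef struct { uint32_t s[4]; } uint4_model;

/* Set every lane of v to the same value. */
void fill(uint4_model *v, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        v->s[i] = value;
}

/* Assumed data flow: shift each vector left by one bit in place, then OR
   the top bit of the next vector into bit 0. d2 has no next vector. */
void shift_chain_left_1(uint4_model *d0, uint4_model *d1, uint4_model *d2)
{
    for (int i = 0; i < 4; i++) {
        /* Step 1: in-place shift of each vector lane. */
        d0->s[i] <<= 1;
        d1->s[i] <<= 1;
        d2->s[i] <<= 1;
        /* Step 2: feed the next vector's top bit (0 for these inputs). */
        d0->s[i] |= d1->s[i] >> 31;
        d1->s[i] |= d2->s[i] >> 31;
    }
}

/* Load 10, 11 and 12, apply the shift, and check that every lane,
   not just s0, ends up doubled. */
void run_demo(uint4_model out[3])
{
    fill(&out[0], 10u);
    fill(&out[1], 11u);
    fill(&out[2], 12u);
    shift_chain_left_1(&out[0], &out[1], &out[2]);
    for (int i = 0; i < 4; i++) {
        assert(out[0].s[i] == 20u);
        assert(out[1].s[i] == 22u);
        assert(out[2].s[i] == 24u);
    }
}
```

On a correct compiler, the GPU kernel's s1, s2 and s3 lanes should match this model exactly as s0 does; the per-lane loop makes that comparison explicit.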
The issue was originally observed in mfakto: primesearch/mfakto#15
mfakto saves the compiled binary kernel and reuses it as long as the configuration remains the same. If mfakto is run and another version of Intel Compute Runtime is installed afterwards, mfakto's behavior doesn't change: whether buggy or correct, it is captured in the compiled binary file. That makes me think that the issue is in the compiler.