-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Avx10.2 Instructions in Floating Point Conversions #111775
base: main
Are you sure you want to change the base?
Conversation
…Lower Avx10.2 nodes accordingly.
…DE." This reverts commit 067e31e.
…DE." This reverts commit 067e31e.
…embedded rounding" This reverts commit 493572f.
…DE." This reverts commit 067e31e.
This reverts commit 61719f8.
Co-authored-by: Bruce Forstall <[email protected]>
…1996/runtime into kcm-avx102-api-public-pr
Note regarding the
|
1 similar comment
Note regarding the
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Co-authored-by: Michał Petryka <[email protected]>
@tannergooding can you help get this PR reviewed? It contains the optimization for saturating conversions instructions introduced with AVX10.2. |
src/coreclr/jit/codegenxarch.cpp
Outdated
@@ -7430,7 +7430,7 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode) | |||
noway_assert((dstSize == EA_ATTR(genTypeSize(TYP_INT))) || (dstSize == EA_ATTR(genTypeSize(TYP_LONG)))); | |||
|
|||
// We shouldn't be seeing uint64 here as it should have been converted | |||
// into a helper call by either front-end or lowering phase, unless we have AVX512F | |||
// into a helper call by either front-end or lowering phase, unless we have AVX512F/AVX10.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be AVX10.1+
since we're emitting optimized (just not fully optimal) codegen for any EVEX encoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh yeah that makes sense. I was trying to showcase the instructions (the new ones introduced in AVX10.2 and previou ones in AVX512F) rather than ISA. will change it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the changes in my latest commit.
src/coreclr/jit/instr.cpp
Outdated
@@ -2463,42 +2463,88 @@ instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr) | |||
break; | |||
|
|||
case TYP_FLOAT: | |||
switch (to) | |||
if (compiler->compOpportunisticallyDependsOn(InstructionSet_AVX10v2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to end up unnecessarily flagging code as requesting AVX10v2 support when to == TYP_FLOAT
It should probably be something like:
if (to == TYP_FLOAT)
{
return ins_Move_Extend(TYP_FLOAT, false);
}
else if (to == TYP_DOUBLE)
{
return INS_cvtss2sd;
}
bool isAvx10v2 = compiler->compOpportunisticallyDependsOn(InstructionSet_AVX10v2);
switch (to)
{
case TYP_INT:
return isAvx10v2 ? INS_vcvttss2sis32 : INS_cvttss2si32;
case TYP_LONG:
return isAvx10v2 ? INS_vcvttss2sis64 : INS_cvttss2si64;
case TYP_ULONG:
return isAvx10v2 ? INS_vcvttss2usis64 : INS_vcvttss2usi64;
case TYP_UINT:
return isAvx10v2 ? INS_vcvttss2usis32 : INS_vcvttss2usi32;
default:
unreached();
}
This reduces duplication, ensures the opportunistic check is only done where required, and better matches how we're doing instruction or intrinsic selection in some other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Initially i thought the existing as well as your code would lead to 1 isa check in each case but yes yours is more optimal in a way that it would not do those ISA checks for double -> double
and double -> float
kind of conversions. I will change it accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. I have made the suggested changes.
@tannergooding can you please help to review this PR? The prelim review changes have been made and the CI looks good here. |
Overview
This PR tracks optimizing x64 floating point to integer conversions using the new saturating instructions introduced in AVX10.2. We are following the spec doc to add the new instructions and optimize the x64/x86 conversions.
Addresses #109080
Testing
All of the changes made for testing are present in this branch
Step 1: Run superpmi.exe on library mch files using JITLateDisasm to check if any errors occur. Use JITLateDisasm to check for a valid decoding of the byte stream through LLVM disasmbler
For this step, a new coredistools was used built from the LLVM repo. After running superpmi with JITLateDisasm, no decoding failures were detected. Please contact for getting access to the superpmi logs.
Step 2: Run superpmi and check for asmdiffs and assert errors.
Below is the summary of superpmi run between this PR and PR #111209
Diff makes sense here. All of the diffs in superpmi logs belong to conversion scenario. E.g.
Since these diffs are expected, we can conclude that the superpmi run is successful
Step 3: Run the JIT test suite using a stable subset of tests on SDE
Results
![image](https://private-user-images.githubusercontent.com/13829923/406603154-abed5666-31d8-45f5-9375-5a481885f6d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNTMwNTYsIm5iZiI6MTczOTA1Mjc1NiwicGF0aCI6Ii8xMzgyOTkyMy80MDY2MDMxNTQtYWJlZDU2NjYtMzFkOC00NWY1LTkzNzUtNWE0ODE4ODVmNmQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDIyMTIzNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc1M2Q0YjBmYTNmMWJkZTFjYzc5MzViNGI1ZWU5ZjZjODA3ZGY0ZDQ3MDE3NWJiY2JkNzM3ZTQwOWRiZjg4YTcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.NWfqXFDtuSimvRvwN5KhEK7UgsP8IjWIum2-dy_0mP0)
Optimized ASM
Note: Below is a case by case basis of comparison between asm generated for
Avx512
vsAvx10.2
. TheAvx10v2
asm has been collected in sde.Case: Float to Int packed
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/42de079e-cb18-4c1b-98ab-f9e9fc4ef8f1)
Case: Float to UInt packed
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/c0b3a34d-7b7b-4cd7-b536-d8ea8ea2a4c2)
Case: Double to long packed
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/a198d974-d02e-4318-b02d-1a98766bb879)
Case: Double to Ulong packed
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/8854dce2-8581-46f6-94ee-c72834b103ef)
Case: Float to Int Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/84650a66-58a0-4e81-9be9-9fd08506b233)
Case: Float to UInt Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/d33a7acc-b9ed-47bc-955b-6a5c4f79f945)
Case: Float to Long Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/666bf13a-6a4a-4e25-b436-b40262ad6dc6)
Case: Float to ULong Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/7bbb2126-e84f-474f-8ee4-a35b8dc08580)
Case: Double to Long Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/c0cdc3e0-9dd6-4bda-a850-b4a2cc95f0e2)
Case: Double to ULong Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/5d3fad62-5db3-4beb-b334-f888a17ece2e)
Case: Double to int Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/d5f646ec-a9d8-4d61-b208-c9b136a6cf8c)
Case: Double to UInt Scalar
** Test code**
Left Side is AVX512 vs Right Side is AVX10.2
![image](https://github.com/user-attachments/assets/5aeb068c-9a80-4fbf-b150-b8f89e624ddc)