[JIT] Enable conditional chaining for Intel APX #111072
Update comments. Merge the REX2 changes into the original legacy emit path bug fix: Set REX2.W with correct mask code. register encoding and prefix emitting logic. Add REX2 prefix emit logic bug fixes Add Stress mode for REX2 encoding and some bug fixes resolve comments: 1. add assertion check for UD opcodes. 2. add checks for EGPRs. Add REX2 to emitOutputAM, and let LEA be REX2 compatible. Add REX2.X encoding for SIB byte Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled. Enable REX2 encoding for `movups` fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT legacy map index-er bug fixes some clean-up Adding initial APX unit testing path. Adding a coredistools dll that has LLVM APX disasm capability. It must be copied into a CORE_ROOT manually. clean up work for REX2 narrow the REX2 scope to `sub` only some clean up based on the comments. bug fix resolve comment
- SV path is mostly for debugging purposes Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
…lled before adding any prefix.
Refactor REX2 encoding stress logics.
(this will have side effect that the estimated code will go up and mismatch with actual code size.)
…om LOCK prefixd instructions.
…otion from LOCK prefixd instructions." This reverts commit 1be4b12.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
1. Intel SDE Testing
Test run with SDE:
Test run with SDE with
2. SuperPMI results
Diffs are based on 2,635,272 contexts (1,050,818 MinOpts, 1,584,454 FullOpts).
MISSED contexts: 2,984 (0.11%)
Base JIT options: JitBypassApxCheck=1
Diff JIT options: JitBypassApxCheck=1;JitEnableApxIfConv=1
Overall (-169,140 bytes)
FullOpts (-169,140 bytes)
// On X86, a FP compare is implemented as a fallthrough, which requires two flag checks; hence,
// we cannot simply get a single output condition to feed into a ccmp. Might be possible to chain
// this, but skipping those cases for now
GenCondition cond1;
if (op2->OperIsCmpCompare() && varTypeIsIntegralOrI(op2->gtGetOp1()) && IsInvariantInRange(op2, tree) &&
ProducesPotentialConsumableFlagsForCCMP(op1) && TryLowerConditionToFlagsNode(tree, op1, &cond1))
I think it would be preferable to get rid of `ProducesPotentialConsumableFlagsForCCMP` and add an argument to `TryLowerConditionToFlagsNode` about whether it is allowed to lower to a condition that requires multiple flags checks. Otherwise we end up having to keep `ProducesPotentialConsumableFlagsForCCMP` and `TryLowerConditionToFlagsNode` in sync.
You can use `GenConditionDesc::Get(cond).jumpKind2 == EJ_NONE` to check this condition in the appropriate places in `TryLowerConditionToFlagsNode`.
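The suggestion above could look roughly like the following sketch. It is a simplified, standalone model and not the JIT's actual code: the namespace, the `Condition` and `JumpKind` enums, the table contents, the `TryLowerCondition` helper, and the `allowMultipleFlagsChecks` parameter name are all hypothetical stand-ins. Only the `jumpKind2 == EJ_NONE` test mirrors the expression quoted in the review comment.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for JIT types; not the real definitions.
namespace flags_sketch
{
enum JumpKind { EJ_NONE, EJ_je, EJ_jne, EJ_jl, EJ_jp };

// A few illustrative conditions: two integer ones, two floating-point ones.
enum Condition { COND_EQ, COND_SLT, COND_FEQ, COND_FNEU };

struct ConditionDesc
{
    JumpKind jumpKind1;
    JumpKind jumpKind2; // EJ_NONE when a single flag check is enough

    static const ConditionDesc& Get(Condition cond)
    {
        // Illustrative table only; entries are indexed by the enum value.
        static const ConditionDesc table[] = {
            {EJ_je, EJ_NONE}, // COND_EQ:   one check
            {EJ_jl, EJ_NONE}, // COND_SLT:  one check
            {EJ_jp, EJ_je},   // COND_FEQ:  two checks (parity, then equal)
            {EJ_jp, EJ_jne},  // COND_FNEU: two checks (parity, then not equal)
        };
        return table[cond];
    }
};

// Sketch of the proposed shape: the caller states whether a condition that
// needs multiple flag checks is acceptable, so no separate predicate has to
// be kept in sync with the lowering routine.
inline bool TryLowerCondition(Condition cond, bool allowMultipleFlagsChecks, Condition* result)
{
    assert(result != nullptr);
    if (!allowMultipleFlagsChecks && (ConditionDesc::Get(cond).jumpKind2 != EJ_NONE))
    {
        return false; // would need two checks; leave *result untouched
    }
    *result = cond;
    return true;
}
} // namespace flags_sketch
```

In this model, a ccmp-chaining caller would pass `false` and so reject the two floating-point conditions, while other callers would pass `true` and keep their current behavior.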
Overview
This PR is built on top of #110881.
Design
This PR mostly enables the existing conditional chaining logic for X86 with the inclusion of the APX `ccmp` instruction. Currently, the optimization must be explicitly enabled via `DOTNET_JitEnableApxConditionalChaining=1`.
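For readers unfamiliar with conditional chaining, the following sketch models the idea in plain C++. It is an illustration only, not what the JIT emits: it tracks just the zero flag, supports only "equal" as the source condition, and every name in it (`ccmp_sketch`, `Flags`, `Cmp`, `CcmpIfEqual`, `BothEqualChained`) is hypothetical. The modeled behavior, where the second compare runs only if the incoming condition holds and otherwise the flags are set to a default value, follows the general description of APX `ccmp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of flag-based conditional chaining; ZF only.
namespace ccmp_sketch
{
struct Flags
{
    bool zf; // zero flag: set when the compared operands are equal
};

// Models `cmp a, b`.
inline Flags Cmp(int64_t a, int64_t b)
{
    return Flags{a == b};
}

// Models a conditional compare with "equal" as the source condition:
// if the incoming flags say "equal", do the compare; otherwise produce
// the default flag value `dfv` supplied by the instruction.
inline Flags CcmpIfEqual(Flags in, int64_t a, int64_t b, Flags dfv)
{
    return in.zf ? Cmp(a, b) : dfv;
}

// Reference form with short-circuit evaluation (typically a branch).
inline bool BothEqualBranchy(int64_t x, int64_t c1, int64_t y, int64_t c2)
{
    return (x == c1) && (y == c2);
}

// Chained form: cmp, then conditional compare, then read the flag once.
inline bool BothEqualChained(int64_t x, int64_t c1, int64_t y, int64_t c2)
{
    Flags f = Cmp(x, c1);
    f       = CcmpIfEqual(f, y, c2, Flags{false}); // default: "not equal"
    return f.zf;
}

// Small self-check that the two forms agree on a grid of inputs.
inline void SelfCheck()
{
    for (int64_t x = 0; x < 3; x++)
    {
        for (int64_t y = 0; y < 3; y++)
        {
            assert(BothEqualChained(x, 1, y, 2) == BothEqualBranchy(x, 1, y, 2));
        }
    }
}
} // namespace ccmp_sketch
```

The chained form evaluates `x == c1 && y == c2` without an intermediate branch, which is the kind of sequence conditional chaining aims to produce for integer compares.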
Testing
Note: The testing plan for APX work has been discussed in #106557; please refer to that PR for details. Only results and comments will be posted in this PR. Results are posted below.