[JIT] Enable conditional chaining for Intel APX #111072

Draft
wants to merge 78 commits into
base: main

Conversation

anthonycanino
Contributor

Overview

This PR is built on top of #110881.

Design

This PR mostly enables the existing conditional chaining logic on x86 by using the APX ccmp instruction. Currently, the optimization must be explicitly enabled via DOTNET_JitEnableApxConditionalChaining=1.
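As a rough illustration of what conditional chaining buys, the sketch below models a condition such as `a == 1 && b == 2` in two ways: with two separate checks, and as a compare followed by a conditional compare and a single flags check. This is a simplified model written for this description, not JIT code. The names `Flags`, `Cmp`, `CcmpIfEqual`, `BothEqualBranchy`, and `BothEqualChained` are hypothetical, and only the zero flag is tracked.

```cpp
#include <cassert>
#include <cstdint>

// Simplified flags model (assumption): only the zero flag (ZF) is tracked.
struct Flags
{
    bool zf;
};

// Models "cmp a, b": ZF is set when the operands are equal.
inline Flags Cmp(int64_t a, int64_t b)
{
    return Flags{a == b};
}

// Hypothetical model of a conditional compare with an "equal" source condition:
// if the incoming flags satisfy the condition (ZF set), do the compare;
// otherwise load the flags with the default value carried by the instruction.
inline Flags CcmpIfEqual(Flags in, int64_t a, int64_t b, Flags defaultFlags)
{
    return in.zf ? Cmp(a, b) : defaultFlags;
}

// Unchained form: each comparison gets its own check.
inline bool BothEqualBranchy(int64_t a, int64_t b)
{
    if (a != 1)
    {
        return false;
    }
    if (b != 2)
    {
        return false;
    }
    return true;
}

// Chained form: cmp, then ccmp, then one flags check at the end.
inline bool BothEqualChained(int64_t a, int64_t b)
{
    Flags f = Cmp(a, 1);
    // Default flags have ZF clear, so a failed first compare reads as "not equal".
    f = CcmpIfEqual(f, b, 2, Flags{false});
    return f.zf;
}
```

Both functions return the same result for every input; the chained form only needs one final flags check, which is the shape the optimization aims for.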

Testing

Note: The testing plan for APX work has been discussed in #106557; please refer to that PR for details. Only results and comments will be posted in this PR. Results are posted below.

Update comments.

Merge the REX2 changes into the original legacy emit path

bug fix: Set REX2.W with correct mask code.

register encoding and prefix emitting logics.

Add REX2 prefix emit logic

bug fixes

Add Stress mode for REX2 encoding and some bug fixes

resolve comments:
1. add assertion check for UD opcodes.
2. add checks for EGPRs.

Add REX2 to emitOutputAM, and let LEA to be REX2 compatible.

Add REX2.X encoding for SIB byte

Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled.

Enable REX2 encoding for `movups`

fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT

legacy map index-er

bug fixes

some clean-up

Adding initial APX unit testing path.

Adding a coredistools dll that has LLVM APX disasm capability.

It must be copied into a CORE_ROOT manually.

clean up work for REX2

narrow the REX2 scope to `sub` only

some clean up based on the comments.

bug fix

resolve comment
 - SV path is mostly for debugging purposes

Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
Refactor REX2 encoding stress logics.
(this will have side effect that the estimated code will go up and mismatch with actual code size.)
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 3, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 3, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@anthonycanino
Contributor Author

anthonycanino commented Jan 3, 2025

1. Intel SDE Testing

Test run with SDE:

base

Test run with SDE with DOTNET_JitEnableApxConditionalChaining=1

diff

2. SuperPMI results

Diffs are based on 2,635,272 contexts (1,050,818 MinOpts, 1,584,454 FullOpts).

MISSED contexts: 2,984 (0.11%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;JitEnableApxIfConv=1

Overall (-169,140 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 42,216,437 | -6,257 | -9.13% |
| benchmarks.run.windows.x64.checked.mch | 8,860,704 | -21,878 | -4.99% |
| benchmarks.run_pgo.windows.x64.checked.mch | 35,294,983 | -25,089 | -8.82% |
| benchmarks.run_tiered.windows.x64.checked.mch | 12,613,813 | -20,816 | -4.88% |
| coreclr_tests.run.windows.x64.checked.mch | 389,370,227 | -11,578 | -8.49% |
| libraries.crossgen2.windows.x64.checked.mch | 44,888,851 | +1,338 | -8.60% |
| libraries.pmi.windows.x64.checked.mch | 60,136,361 | -7,956 | -9.53% |
| libraries_tests.run.windows.x64.Release.mch | 322,952,768 | -56,484 | -10.31% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 147,678,294 | -20,097 | -6.96% |
| realworld.run.windows.x64.checked.mch | 10,242,976 | -451 | -6.47% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,496,305 | +128 | -6.51% |

FullOpts (-169,140 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 23,294,229 | -6,257 | -9.13% |
| benchmarks.run.windows.x64.checked.mch | 8,860,282 | -21,878 | -4.99% |
| benchmarks.run_pgo.windows.x64.checked.mch | 20,646,767 | -25,089 | -8.82% |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,214,524 | -20,816 | -4.88% |
| coreclr_tests.run.windows.x64.checked.mch | 118,145,145 | -11,578 | -8.49% |
| libraries.crossgen2.windows.x64.checked.mch | 44,887,136 | +1,338 | -8.60% |
| libraries.pmi.windows.x64.checked.mch | 60,023,468 | -7,956 | -9.53% |
| libraries_tests.run.windows.x64.Release.mch | 132,262,168 | -56,484 | -10.31% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 137,028,529 | -20,097 | -6.96% |
| realworld.run.windows.x64.checked.mch | 10,018,142 | -451 | -6.47% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,495,216 | +128 | -6.51% |

@anthonycanino anthonycanino changed the title Enable conditional chaining for Intel APX [JIT] Enable conditional chaining for Intel APX Jan 3, 2025
@BruceForstall BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Jan 7, 2025
Comment on lines +410 to +415
// On X86, a FP compare is implemented as a fallthrough, which requires two flag checks; hence,
// we cannot simply get a single output condition to feed into a ccmp. Might be possible to chain
// this, but skipping those cases for now
GenCondition cond1;
if (op2->OperIsCmpCompare() && varTypeIsIntegralOrI(op2->gtGetOp1()) && IsInvariantInRange(op2, tree) &&
ProducesPotentialConsumableFlagsForCCMP(op1) && TryLowerConditionToFlagsNode(tree, op1, &cond1))
Member

@jakobbotsch jakobbotsch Jan 9, 2025

I think it would be preferable to get rid of ProducesPotentialConsumableFlagsForCCMP and add an argument to TryLowerConditionToFlagsNode that says whether it is allowed to lower to a condition that requires multiple flags checks. Otherwise we end up having to keep ProducesPotentialConsumableFlagsForCCMP and TryLowerConditionToFlagsNode in sync.

You can use GenConditionDesc::Get(cond).jumpKind2 == EJ_NONE to check this condition in the appropriate places in TryLowerConditionToFlagsNode.
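A minimal sketch of the shape this suggestion could take, using mock types only. Everything prefixed with `Mock` or `MOCK_` is a hypothetical stand-in invented for illustration, including the table contents and the `allowMultipleFlagsChecks` parameter name; none of it is the actual JIT definition or the signature the PR ends up with.

```cpp
#include <cassert>

// Hypothetical stand-ins for the JIT's jump kinds and conditions (assumptions).
enum MockJumpKind
{
    MOCK_EJ_NONE,
    MOCK_EJ_je,
    MOCK_EJ_jl,
    MOCK_EJ_jp
};

enum MockCond
{
    MOCK_EQ,  // integer ==
    MOCK_SLT, // signed integer <
    MOCK_FEQ, // floating-point ==
    MOCK_COND_COUNT
};

// Mock of a condition descriptor: a condition needs either one or two flags checks.
struct MockConditionDesc
{
    MockJumpKind jumpKind1;
    MockJumpKind jumpKind2; // MOCK_EJ_NONE when a single flags check is enough

    static const MockConditionDesc& Get(MockCond cond)
    {
        // Illustrative table only; the entries are assumptions, not the JIT's table.
        static const MockConditionDesc table[MOCK_COND_COUNT] = {
            {MOCK_EJ_je, MOCK_EJ_NONE}, // MOCK_EQ: one check
            {MOCK_EJ_jl, MOCK_EJ_NONE}, // MOCK_SLT: one check
            {MOCK_EJ_jp, MOCK_EJ_je},   // MOCK_FEQ: two checks
        };
        return table[cond];
    }
};

// Mock lowering entry point with the extra argument proposed in the review.
// Returns false when the caller needs a single-check condition and the
// requested condition would require two flags checks.
inline bool MockTryLowerCondition(MockCond cond, bool allowMultipleFlagsChecks)
{
    if (!allowMultipleFlagsChecks && (MockConditionDesc::Get(cond).jumpKind2 != MOCK_EJ_NONE))
    {
        return false;
    }
    return true;
}
```

With this shape, a ccmp caller would pass `false` and have multi-check conditions rejected in one place, so no separate predicate has to be kept in sync with the lowering routine.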

Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

4 participants