Improve support for 512-bit Vector<T> #111472
Open
saucecontrol wants to merge 8 commits into dotnet:main from saucecontrol:vectort512
Commits
3bc39e4 improve support for 512-bit Vector<T> (saucecontrol)
eae32e9 fix size check (saucecontrol)
699f15e add MaxVectorTBitWidth configs to jitstress-isas-x86 pipeline (saucecontrol)
8064d04 Merge remote-tracking branch 'upstream/main' into vectort512 (saucecontrol)
94cb2c5 clamp Vector<T> size to largest accelerated fixed-sized vector (saucecontrol)
c7dab5b Merge remote-tracking branch 'upstream/main' into vectort512 (saucecontrol)
2c26436 move PreferredVectorBitWidth logic to VM (saucecontrol)
70f391c Merge branch 'main' into vectort512 (saucecontrol)
Filter by extension
Conversations
Should we avoid reading the config setting in both the VM and the JIT so that there is just one source of truth? I assume that the JIT should be able to get the PreferredVectorBitWidth from the VM's Vector<T> size, now that the Vector<T> size is set based on it.
There's still a disconnect, because we'll continue to default Vector<T> to 256-bit even when PreferredVectorBitWidth is 512 (or 0 with fully-accelerated AVX-512); i.e. 512-bit Vector<T> will continue to be strictly opt-in for the foreseeable future.

The fact that the fixed-sized vectors are known only JIT-side is odd, and I've been thinking through how it could be cleaned up, but I haven't come up with anything that markedly improves what we have now. If you have ideas, I could give it a shot.
Is this change a UX improvement then? It seems to make the explanation of how the switches work and interact more complicated.
It's early enough in the cycle that I think we can allow the default size of Vector<T> to be 512 if the hardware supports it. It would go out in P2 at the latest and give us an early indicator if too many users are broken or regress in performance because of it. Such users can always set MaxVectorTBitWidth=256 as a workaround, particularly if the number of breaks is "small", and we can revert and do the more complex configuration knobs if the signal is high enough.
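For reference, a minimal sketch (not part of this PR) of how a user could observe the widths being discussed; it assumes the DOTNET_MaxVectorTBitWidth environment-variable spelling of the MaxVectorTBitWidth config mentioned above:

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Prints the Vector<T> width and which fixed-width vectors report acceleration.
// Run with e.g. DOTNET_MaxVectorTBitWidth=256 (or 512) to see the knob's effect.
class VectorWidthProbe
{
    static void Main()
    {
        // Vector<byte>.Count equals the vector size in bytes, so *8 gives the bit width.
        Console.WriteLine($"Vector<T> bit width:   {Vector<byte>.Count * 8}");
        Console.WriteLine($"Vector<T> accelerated: {Vector.IsHardwareAccelerated}");

        // Fixed-width acceleration is reported independently of sizeof(Vector<T>);
        // e.g. Vector512 can be accelerated while Vector<T> stays at 256 bits.
        Console.WriteLine($"Vector128 accelerated: {Vector128.IsHardwareAccelerated}");
        Console.WriteLine($"Vector256 accelerated: {Vector256.IsHardwareAccelerated}");
        Console.WriteLine($"Vector512 accelerated: {Vector512.IsHardwareAccelerated}");
    }
}
```

Running the same program with MaxVectorTBitWidth set to 256 versus 512 on AVX-512 hardware is exactly the back-compat escape hatch described above.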
I think what we will end up conceptually having in the "worst" case is the following:

Where MaxVectorBitWidth is the conceptual value the hardware supports and the "worst" case is the need to default Vector<T> to no more than 256 bits in size. This ensures that sizeof(Vector<T>) is not greater than the maximum size that we report as hardware accelerated. For example, you could not on x64 have Vector512.IsHardwareAccelerated == false and Vector<byte>.Count == 64, as that would add needless complexity to the user consideration.

Ideally, though, we can simplify it to the following:

I think in both cases we don't need the JIT to handle anything, however. We can pick the size of Vector<T> entirely on the VM side and then have the JIT simply get it via the compExactlyDependsOn(InstructionSet_VectorT512) checks that currently propagate that size information to the JIT: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/compiler.h#L9285-L9330
Right, but then how does the JIT decide which fixed-width classes to report as hardware accelerated? It can't use the VectorT ISA, because we have to continue to support Vector512.IsHardwareAccelerated == true with sizeof(Vector<T>) == 32.

I agree with this either way. The name PreferredVectorBitWidth isn't helping; perhaps it could be better named, something like MaxAcceleratedVectorBitWidth? Then the largest fixed-width vector considered accelerated is Min(MaxAcceleratedVectorBitWidth, [Max hardware SIMD size]), and the Vector<T> size is Min(MaxVectorTBitWidth, MaxAcceleratedVectorBitWidth, [Max hardware SIMD size]), where we can still default MaxVectorTBitWidth to 256 for back compat.

That is, of course, the same thing I've implemented here, but the naming is easier to follow. I wouldn't mind taking on a cleanup task to deal with the 'virtual' ISAs that today only the JIT knows about. Removing this duplication (if it stays) should be one of the goals of that effort.
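A sketch of the two Min() expressions above, using the hypothetical MaxAcceleratedVectorBitWidth rename; maxHardwareSimdBits stands in for "[Max hardware SIMD size]":

```csharp
using System;

// Sketch of the proposed sizing rules; all parameter names are illustrative.
static class VectorSizing
{
    public static (int MaxAcceleratedFixedBits, int VectorTBits) Compute(
        int maxAcceleratedVectorBitWidth, // proposed rename of PreferredVectorBitWidth
        int maxVectorTBitWidth,           // MaxVectorTBitWidth, defaulted to 256 for back compat
        int maxHardwareSimdBits)          // e.g. 512 with AVX-512, 256 with AVX2 only
    {
        // Largest fixed-width vector reported as hardware accelerated.
        int maxAcceleratedFixedBits = Math.Min(maxAcceleratedVectorBitWidth, maxHardwareSimdBits);

        // Vector<T> is additionally clamped by MaxVectorTBitWidth.
        int vectorTBits = Math.Min(maxVectorTBitWidth, maxAcceleratedFixedBits);

        return (maxAcceleratedFixedBits, vectorTBits);
    }
}
```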
In the same way the VM currently sets one of InstructionSet_VectorT### to indicate its size, it should likely be using InstructionSet_Vector### to indicate acceleration. That is, the JIT doesn't need access to PreferredVectorBitWidth nor MaxVectorTBitWidth. Rather, it knows sizeof(Vector<T>) based on InstructionSet_VectorT### and it knows the opts.preferredVectorByteLength based on the largest InstructionSet_Vector###.

It can then continue internally setting InstructionSet_Vector### itself (after it has computed the opts.preferredVectorByteLength value) based on the InstructionSet_Isa that are supported, so that we still get the expected codegen, just without reporting itself as accelerated.
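A rough, hypothetical sketch of that division of labor, with the instruction-set names reduced to a small enum; the actual JIT code is C++ and uses compExactlyDependsOn-style checks rather than these helpers:

```csharp
using System;

// Hypothetical sketch of the proposed flow: the JIT never reads the config knobs;
// it derives both sizes purely from the instruction sets the VM reported.
enum Isa { VectorT128, VectorT256, VectorT512, Vector128, Vector256, Vector512 }

static class JitSideSketch
{
    // sizeof(Vector<T>) follows from whichever InstructionSet_VectorT### the VM set.
    public static int VectorTByteLength(Func<Isa, bool> vmReported) =>
        vmReported(Isa.VectorT512) ? 64 :
        vmReported(Isa.VectorT256) ? 32 : 16;

    // opts.preferredVectorByteLength follows from the largest InstructionSet_Vector###.
    public static int PreferredVectorByteLength(Func<Isa, bool> vmReported) =>
        vmReported(Isa.Vector512) ? 64 :
        vmReported(Isa.Vector256) ? 32 :
        vmReported(Isa.Vector128) ? 16 : 0;
}
```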
InstructionSet_Vector### is exactly the problem. Today, only the JIT knows about those ISAs. Moving them to the VM would be a breaking change for R2R and NAOT, because they would also have to pass those to the JIT, and they don't. Fixing that (and I do agree it needs to be fixed) is a much bigger change that is better done separately.
Talked through this on Discord by explaining the difference between readytoruninstructionset and corinfoinstructionset. The latter isn't tracked by R2R but is used by both the JIT (codeman.cpp) and NAOT (CorInfoImpl.cs) VMs. Accordingly, we should be able to do all the selection on the VM side. The nuance was that we can't get the JIT info from just the Vector<T> size unless we agree that Vector<T> implicitly grows up to PreferredVectorBitWidth. There are concerns with that due to it historically being no larger than 32 bytes and the likelihood that code may break (bad assumptions were hardcoded) or silently regress (code paths no longer accelerate but were guarded against larger sizes).

The safest thing for us, given that concern, is to have the VM (https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/codeman.cpp#L1225 and https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs#L4197) start setting InstructionSet_Vector### based on DOTNET_PreferredVectorBitWidth. This ensures we have a single "source of truth" and avoids mirroring config knob access between the JIT and VM. We can then, in a separate PR, do a one-line change in the VM to experiment with defaulting Vector<T> to the same size as PreferredVectorBitWidth, rather than restricting it to no more than 256 by default. This makes it easy to back out if the concerns are founded and decently widespread breaks are found.

This keeps things simple with one source of truth and doesn't require more comprehensive refactoring to achieve.
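To make the agreed direction concrete, a hypothetical sketch of the VM-side selection; the names, defaults, and reporting callback are illustrative only, and the real logic lives in codeman.cpp and CorInfoImpl.cs:

```csharp
using System;

// Hypothetical sketch of the VM-side selection described above: the config knobs
// are read in exactly one place, and only instruction-set flags reach the JIT.
// The flag strings mirror the InstructionSet_* identifiers quoted in this thread.
static class VmSideSketch
{
    public static void SelectVectorIsas(
        int preferredVectorBitWidth, // DOTNET_PreferredVectorBitWidth; 0 = no explicit limit
        int maxVectorTBitWidth,      // DOTNET_MaxVectorTBitWidth; 0 = default (256 today)
        int maxHardwareSimdBits,     // what the CPU actually supports, e.g. 512 with AVX-512
        Action<string> reportToJit)  // stands in for enabling an instruction-set flag
    {
        int limit = preferredVectorBitWidth == 0 ? maxHardwareSimdBits : preferredVectorBitWidth;
        int accelerated = Math.Min(limit, maxHardwareSimdBits);

        // Acceleration is reported via InstructionSet_Vector###.
        if (accelerated >= 512) reportToJit("InstructionSet_Vector512");
        if (accelerated >= 256) reportToJit("InstructionSet_Vector256");
        if (accelerated >= 128) reportToJit("InstructionSet_Vector128");

        // sizeof(Vector<T>) is reported via InstructionSet_VectorT###, clamped as discussed.
        int vectorTBits = Math.Min(maxVectorTBitWidth == 0 ? 256 : maxVectorTBitWidth, accelerated);
        reportToJit(vectorTBits >= 512 ? "InstructionSet_VectorT512"
                  : vectorTBits >= 256 ? "InstructionSet_VectorT256"
                  : "InstructionSet_VectorT128");
    }
}
```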
OK, I believe I have done what we discussed. It did allow removing a JIT flag and got the throttling logic completely on the VM side, but I couldn't move the Vector### ISAs completely over because the meanings are still overloaded; e.g. InstructionSet_Vector256 can't mean we want fully-accelerated Vector256 (AVX2 + PreferredVectorBitWidth >= 256), because it still means partially-accelerated Vector256 (AVX) is actually supported.

I also found that we were doing the Vector512 throttling on the AOT side, so I've duplicated the logic to clamp the Vector<T> size there as well. So we still have duplicated logic, but I suppose it's a bit cleaner now.