-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Allow stack allocate objects in loops #106526
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
It doesn't appear that this addresses the fundamental problems with why we do not allow this?
|
Thanks for the historical discussion link! Seems that we need more analysis to make sure the object doesn't escape from the iteration of the loop before enabling this. Going to let the CI finish to see how much we can benefit from this (approximately). |
@jakobbotsch It does give us really nice diffs. For example, it managed to remove the box for |
63e6984
to
463f36e
Compare
Addresses the fundamental problems by checking locals in preds to make sure locals in a loop won't be stored outsides the loop, now tests are passing. Ideally we can build a def-use for this instead of checking the preds, but the current way should be enough (though a bit conservative). |
if (basicBlockHasBackwardJump) | ||
{ | ||
lclsInPreds = BitVecOps::MakeEmpty(&m_bitVecTraits); | ||
for (BasicBlock* const predBlock : block->PredBlocks()) | ||
{ | ||
for (Statement* const stmt : predBlock->Statements()) | ||
{ | ||
GenTree* stmtExpr = stmt->GetRootNode(); | ||
if (stmtExpr->OperIsLocalStore()) | ||
{ | ||
GenTreeLclVarCommon* lclVar = stmtExpr->AsLclVarCommon(); | ||
unsigned int lclNum = lclVar->GetLclNum(); | ||
BitVecOps::AddElemD(&m_bitVecTraits, lclsInPreds, lclNum); | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is checking the preds like this sufficient? What prevents the allocated object from being stored in some other block inside the loop that reaches the backedge?
I think doing this analysis is going to require building the proper loop structures and probably also some form of liveness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents the allocated object from being stored in some other block inside the loop that reaches the backedge?
My thought is that when we see a backward edge, we check the local where the object is being stored to see if the target is a local defined in its preds, otherwise mark it escaping. For blocks that have back edges to the current block, they will be checked later so that we don't miss them.
IMO this should be a conservative approximation without accurate loop and liveness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Jakob here. I think the analysis is more complicated than this, and we need proper loop structure.
If you have a proper loop where the entry block dominates the body, then locals defined in the entry block can still be live on the back edges if they are first used before being defined.
Building loops and running liveness seems to be a necessary prerequisite, and unfortunately, liveness will be impaired at this stage of the jit, because we haven't yet figured out which locals are truly exposed, so we may end up also needing points-to analysis to help refine what can be aliased (and that likely should dovetail with or supplant the existing escape analysis), and proper loopness might also be a challenge given that we might end up creating a new loop after this phase because of a recursive tail call.
Assuming we can indeed soundly prove that an object's references cannot outlive a loop iteration (accounting for any possible liveness from EH), we'd also need to ensure that the object fields are zeroed (as they would be for a heap allocation) and/or that they are non-gc fields that are definitely written before they are read.
None of these problems are impossible, but it's a fair amount of work.
@BruceForstall, PTAL at this community PR. |
@AndyAyersMS since you're thinking about stack allocation enhancements (#104936), can you take a look? |
I'll try to take a look at this one soon... |
@hez2010 I don't believe we can take something like this yet, there are too many missing pieces. |
Feel free to close the PR as I don't have enough time to get into this yet. |
Ok, let's close this one then. I still think it's worth pursuing, someday, but we'll need to think carefully about how we can make it happen. |
Unlike stackalloc, stack allocated objects have no difference than using
struct
directly, and objects that don't escape from a loop are likely to be promoted later. Allow stack allocate objects in loops to get rid of more boxes.Inspired from https://devblogs.microsoft.com/java/improving-openjdk-scalar-replacement-part-1-3.
New codegen for
CompositeChecksum
: