[LAYOUTS] Implement generically getUniqueContigPerThread #5840
Conversation
LGTM
@@ -44,29 +44,30 @@ module attributes {"ttg.num-warps" = 8 : i32, "ttg.threads-per-warp" = 64 : i32}
  }
}

// -----
// // -----
@antiagainst This PR is breaking these two lit tests. I think it's coming from the changes in Allocation.cpp, but I have not tracked them down. I assume the fix would be something like using LinearEncodingAttr::getOrder instead of getOrder somewhere.
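To illustrate the distinction being suggested: an order read off an encoding attribute is just stored metadata, while an order derived from the linear layout itself reflects how register indices actually map to tensor coordinates. The sketch below is a toy Python model of that derivation, not Triton's actual C++ API; `register_bases` and `layout_order` are hypothetical names, and a real LinearLayout is considerably richer than a list of offset tuples.

```python
def layout_order(register_bases, rank):
    """Toy model: recover the fastest-to-slowest dimension order implied
    by a linear layout's register bases (hypothetical simplification of
    what LinearEncodingAttr::getOrder might compute from the layout).

    register_bases[i] is the tensor-coordinate offset contributed by
    register-index bit i, as a tuple with one entry per dimension.
    """
    order = []
    # Low register bits vary fastest, so the dimension each successive
    # basis touches first comes earlier in the order.
    for basis in register_bases:
        for d in range(rank):
            if basis[d] != 0 and d not in order:
                order.append(d)
    # Dimensions never touched by a register bit go last.
    order += [d for d in range(rank) if d not in order]
    return order
```

For example, bases `[(1, 0), (2, 0), (0, 1)]` make dimension 0 the fastest-varying, so the derived order is `[0, 1]`, regardless of what an attribute might claim.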
Would it be alright to merge this PR as-is and you guys could then send a forward-fix?
Feel free to land; we can fix later. @binarman FYI
@ThomasRaoux @Jokeren this one's ready for another review. I ended up having to fix a ton of things in the scan lowering, but hopefully all of them are reasonable enough.
I think a larger refactoring of reduce and scan is needed, as expected, though we should stop if there are more important tasks.
…ectorisation lengths
Looks good!
This allows vectorisation of global loads and smem accesses in some cases where we didn't use it before, as we now compute the order of the elements by looking at the actual LinearLayout associated with the given tensor shape, which is quite neat.
We end up touching a few things in the Scan lowering, as BlockedLayouts, when converted to LinearEncodings, may not have the same order of elems/threads/warps. This is a feature, not a bug, as it takes us closer to supporting arbitrary LinearEncodings within the tt.scan op.
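The idea behind the PR title, getUniqueContigPerThread, can be sketched the same way: the number of consecutive tensor elements a single thread holds in each dimension falls out of which low register bits map to unit strides, and that count bounds the usable vector width. This is a toy Python model under the same simplified layout representation as above; the function name mirrors the PR but the shape of the real Triton implementation is an assumption here.

```python
def unique_contig_per_thread(register_bases, rank):
    """Toy model: for each tensor dimension, count how many consecutive
    elements one thread holds contiguously (hypothetical simplification
    of getUniqueContigPerThread).

    register_bases[i] is the tensor-coordinate offset contributed by
    register-index bit i, as a tuple with one entry per dimension.
    """
    contig = []
    for d in range(rank):
        run = 1
        for basis in register_bases:
            # Bit i extends the contiguous run in dimension d only if it
            # contributes exactly the current run length there and zero
            # in every other dimension.
            expected = tuple(run if k == d else 0 for k in range(rank))
            if basis != expected:
                break
            run *= 2
        contig.append(run)
    return contig
```

With bases `[(1, 0), (2, 0), (0, 1)]`, each thread owns 4 contiguous elements along dimension 0 and 1 along dimension 1, so a lowering could emit 4-wide vector loads along dim 0; a layout whose first register bit jumps by a non-unit stride would cap the run at 1.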