Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LAYOUTS] Implement generically getUniqueContigPerThread #5840

Merged
merged 9 commits into from
Feb 13, 2025
Merged

Conversation

lezcano
Copy link
Contributor

@lezcano lezcano commented Feb 6, 2025

This allows vectorisation on global loads and smem in some cases we didn't use it before, as we now compute the order of the elements looking at the actual LinearLayout associated to the given shape of the tensor, which is quite neat.

We end up touching a few things in the Scan lowering as BlockedLayouts when converted to LinearEncodings may not have the same order on elems/threads/warps. This is a feature, not a bug, as it takes us closer to supporting arbitrary LinearEncodings within the tt.scan op.

@lezcano lezcano requested a review from ptillet as a code owner February 6, 2025 16:50
Copy link
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lezcano lezcano marked this pull request as draft February 6, 2025 22:25
@lezcano lezcano force-pushed the layouts branch 3 times, most recently from 04fb39a to 18d4265 Compare February 11, 2025 20:07
@@ -44,29 +44,30 @@ module attributes {"ttg.num-warps" = 8 : i32, "ttg.threads-per-warp" = 64 : i32}
}
}

// -----
// // -----
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antiagainst This PR is breaking these two lit tests. I think it's coming from the changes in Allocation.cpp, but I have not track them down. I assume the fix would be something like using LinearEncodingAttr::getOrder instead of getOrder somewhere or so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be alright to merge this PR as-is and you guys could then send a forward-fix?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to land; we can fix later. @binarman FYI

@lezcano lezcano marked this pull request as ready for review February 11, 2025 22:44
@lezcano lezcano requested a review from Jokeren as a code owner February 11, 2025 22:44
@lezcano
Copy link
Contributor Author

lezcano commented Feb 11, 2025

@ThomasRaoux @Jokeren this one's ready for another review. I ended up having to fix a ton of things in the scan lowering, but hopefully all of them are reasonable enough.

Copy link
Contributor

@Jokeren Jokeren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there needs to be a larger refactoring work on reduce and scan as expected, though we should stop if there's any more important tasks.

lib/Analysis/Allocation.cpp Outdated Show resolved Hide resolved
lib/Dialect/TritonGPU/IR/Dialect.cpp Outdated Show resolved Hide resolved
@chris002oak

This comment was marked as spam.

Copy link
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@lezcano lezcano merged commit 06941f4 into main Feb 13, 2025
7 checks passed
@lezcano lezcano deleted the layouts branch February 13, 2025 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants