Commit
[GPU] Add barriers when resolving GPUMappedForall to fix race condition (#19635)

The barriers added here can be pessimistic, and we can look into optimizing them later if needed. However, we end up with a race if we don't have them.

In some local testing I did on an MI300 GPU, I did not find any significant performance impact from these barriers. For example, an unaligned matmul + elementwise took 47us and 48us with and without the barriers respectively, using TileAndFuse with padding support, while the corresponding default path takes 68us. The prefill stage of ToyLLAMA took 325us and 324us respectively with and without barriers, while the default path takes 461us.

Signed-off-by: Nirvedh Meshram <[email protected]>
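As a rough illustration of the kind of race the message describes, the sketch below uses CPU threads and `threading.Barrier` as a stand-in for GPU threads and a workgroup barrier. It is an analogy only, not the IREE implementation: `run_workgroup`, `shared`, and `result` are hypothetical names, and the write-then-read-neighbour pattern is an assumed example of threads sharing a buffer.

```python
import threading


def run_workgroup(num_threads=8):
    """Each 'thread' writes one slot of a shared buffer, then reads a neighbour's slot."""
    shared = [None] * num_threads   # stands in for workgroup (shared) memory
    result = [None] * num_threads
    barrier = threading.Barrier(num_threads)

    def body(tid):
        # Write phase: every thread fills exactly one slot.
        shared[tid] = tid * 10
        # Without this barrier, the read below could observe an unwritten slot.
        barrier.wait()
        # Read phase: every thread reads the slot written by its neighbour.
        result[tid] = shared[(tid + 1) % num_threads]
        # Second barrier: keeps `shared` stable until all reads are done,
        # which matters if the buffer is reused by later work.
        barrier.wait()

    threads = [threading.Thread(target=body, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Placing a barrier on both sides of the shared-buffer access is conservative, much like the pessimistic barriers described above: it can synchronize more often than strictly required, but it rules out the race.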
1 parent 9b4906e, commit c484058
Showing 3 changed files with 22 additions and 3 deletions.