-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the possibility to store only cubin/hsaco files in the cache directory #5827
Conversation
…ctory Since AMDGCN, PTX, LLIR, TTGIR, and TTIR are not fundamental for caching when the primary goal is improving performance in terms of speed, the environment variable `TRITON_STORE_CUBIN_HSACO_ONLY` was introduced to avoid storing these files. This change helps mitigate storage impact when caching multiple kernels. Signed-off-by: Alessandro Sangiorgi [email protected]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures
Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache. I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. I also tried to :
The files in the cache did not change. So in that case, I did not have the IRs stored. |
I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs? |
You are totally right! With the I did not test twice using that env var. Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite? |
I see, I got thrown off by the description saying test_core.py runs without error. I think overall it should work, if there is a use case I don't see a strong reason not to support it |
The variable name can be improved. |
@ThomasRaoux I hope in a future where most kernels will run on Triton, and in production (or even dev) environments, you could end up caching thousands of kernels. Given that, I was thinking about ways to optimize storage, especially in cases where the cache can be reused across a homogeneous cluster, reducing the need for redundant recompilation and improving initial performance.
@Jokeren I like your suggestion, a lot! Edit : @Jokeren I just updated the var names. Let me know if there is anything else I can change! |
It feels a bit like a premature optimization from the description but since this is a trivial change I don't want to block it. |
I don't have a strong opinion. Feel free to merge |
Since AMDGCN, PTX, LLIR, TTGIR, and TTIR are not fundamental for caching when the primary goal
is improving performance in terms of speed, the environment variable
TRITON_STORE_BINARY_ONLY
was introduced to avoid storing these files.This change helps mitigate storage impact when caching multiple kernels.
Signed-off-by: Alessandro Sangiorgi [email protected]
New contributor declaration
I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these
rules.
I have run
pre-commit run --from-ref origin/main --to-ref HEAD
.Select one of the following.
/test
forlit
tests/unittest
for C++ tests/python/test
for end-to-end testsFILL THIS IN
.Select one of the following.
lit
tests.lit
tests I have added follow these best practices,including the "tests should be minimal" section. (Usually running Python code
and using the instructions it generates is not minimal.)