Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the possibility to store only cubin/hsaco files in the cache directory #5827

Merged
merged 14 commits into from
Feb 10, 2025

Conversation

fulvius31
Copy link
Contributor

@fulvius31 fulvius31 commented Feb 5, 2025

Since AMDGCN, PTX, LLIR, TTGIR, and TTIR are not fundamental for caching when the primary goal
is improving performance in terms of speed, the environment variable TRITON_STORE_BINARY_ONLY was introduced to avoid storing these files.

This change helps mitigate storage impact when caching multiple kernels.

Signed-off-by: Alessandro Sangiorgi [email protected]

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

…ctory

Since AMDGCN, PTX, LLIR, TTGIR, and TTIR are not fundamental for caching when the primary goal
is improving performance in terms of speed, the environment variable
`TRITON_STORE_CUBIN_HSACO_ONLY` was introduced to avoid storing these files.

This change helps mitigate storage impact when caching multiple kernels.

Signed-off-by: Alessandro Sangiorgi [email protected]
@fulvius31 fulvius31 requested a review from ptillet as a code owner February 5, 2025 16:16
Copy link
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.

If you run test_core.py 2 times in a row I expect you'll see failures

@fulvius31
Copy link
Contributor Author

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.

If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.

I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user.
Could you please clarify if there is a specific scenario I should test for?

I also tried to :

  • set TRITON_STORE_CUBIN_HSACO_ONLY to 1
  • run a kernel
  • set TRITON_STORE_CUBIN_HSACO_ONLY to 0
  • run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

@ThomasRaoux
Copy link
Collaborator

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.

I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. Could you please clarify if there is a specific scenario I should test for?

I also tried to :

  • set TRITON_STORE_CUBIN_HSACO_ONLY to 1
  • run a kernel
  • set TRITON_STORE_CUBIN_HSACO_ONLY to 0
  • run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs?

@fulvius31
Copy link
Contributor Author

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.
I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. Could you please clarify if there is a specific scenario I should test for?
I also tried to :

  • set TRITON_STORE_CUBIN_HSACO_ONLY to 1
  • run a kernel
  • set TRITON_STORE_CUBIN_HSACO_ONLY to 0
  • run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs?

You are totally right!

With the TRITON_STORE_CUBIN_HSACO_ONLY=1 the test_core.py does not pass.

I did not test twice using that env var.

Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite?

@ThomasRaoux
Copy link
Collaborator

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.
I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. Could you please clarify if there is a specific scenario I should test for?
I also tried to :

  • set TRITON_STORE_CUBIN_HSACO_ONLY to 1
  • run a kernel
  • set TRITON_STORE_CUBIN_HSACO_ONLY to 0
  • run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs?

You are totally right!

With the TRITON_STORE_CUBIN_HSACO_ONLY=1 the test_core.py does not pass.

I did not test twice using that env var.

Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite?

I see, I got thrown off by the description saying test_core.py runs without error.
Are you running into specific problems with the cache size?

I think overall it should work, if there is a use case I don't see a strong reason not to support it

@Jokeren
Copy link
Contributor

Jokeren commented Feb 6, 2025

The variable name can be improved. TRITON_STORE_CUBIN_HSACO_ONLY -> TRITON_STORE_BINARY_ONLY IMO

@fulvius31
Copy link
Contributor Author

fulvius31 commented Feb 6, 2025

Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite?

I see, I got thrown off by the description saying test_core.py runs without error. Are you running into specific problems with the cache size?

@ThomasRaoux
Not a specific problem, but I’ve been doing some initial experiments on the "power of cache" in terms of speed (GitHub repo).

I hope in a future where most kernels will run on Triton, and in production (or even dev) environments, you could end up caching thousands of kernels. Given that, I was thinking about ways to optimize storage, especially in cases where the cache can be reused across a homogeneous cluster, reducing the need for redundant recompilation and improving initial performance.

The variable name can be improved. TRITON_STORE_CUBIN_HSACO_ONLY -> TRITON_STORE_BINARY_ONLY IMO

@Jokeren I like your suggestion, a lot!

Edit : @Jokeren I just updated the var names. Let me know if there is anything else I can change!

@fulvius31 fulvius31 requested a review from ThomasRaoux February 7, 2025 15:36
@ThomasRaoux
Copy link
Collaborator

It feels a bit like a premature optimization from the description but since this is a trivial change I don't want to block it.
@Jokeren, you have an opinion?

@ThomasRaoux ThomasRaoux dismissed their stale review February 7, 2025 18:49

not block

@Jokeren
Copy link
Contributor

Jokeren commented Feb 7, 2025

I don't have a strong opinion. Feel free to merge

@pawelszczerbuk pawelszczerbuk merged commit 5a99ff5 into triton-lang:main Feb 10, 2025
7 checks passed
@fulvius31 fulvius31 deleted the cache branch February 10, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants