Add the possibility to store only cubin/hsaco files in the cache directory #5827

fulvius31 · 2025-02-05T16:16:13Z

Since AMDGCN, PTX, LLIR, TTGIR, and TTIR are not fundamental for caching when the primary goal
is improving performance in terms of speed, the environment variable TRITON_STORE_BINARY_ONLY was introduced to avoid storing these files.

This change helps mitigate storage impact when caching multiple kernels.

Signed-off-by: Alessandro Sangiorgi [email protected]

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these
rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- I have added tests.
  - /test for lit tests
  - /unittest for C++ tests
  - /python/test for end-to-end tests
- This PR does not need a test because FILL THIS IN.
Select one of the following.
- I have not added any lit tests.
- The lit tests I have added follow these best practices,
  including the "tests should be minimal" section. (Usually running Python code
  and using the instructions it generates is not minimal.)

…tory

…ctory Since AMDGCN, PTX, LLIR, TTGIR, and TTIR are not fundamental for caching when the primary goal is improving performance in terms of speed, the environment variable `TRITON_STORE_CUBIN_HSACO_ONLY` was introduced to avoid storing these files. This change helps mitigate storage impact when caching multiple kernels. Signed-off-by: Alessandro Sangiorgi [email protected]

ThomasRaoux

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.

If you run test_core.py 2 times in a row I expect you'll see failures

fulvius31 · 2025-02-06T02:42:45Z

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.

If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.

I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user.
Could you please clarify if there is a specific scenario I should test for?

I also tried to :

set TRITON_STORE_CUBIN_HSACO_ONLY to 1
run a kernel
set TRITON_STORE_CUBIN_HSACO_ONLY to 0
run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

ThomasRaoux · 2025-02-06T02:45:02Z

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.

I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. Could you please clarify if there is a specific scenario I should test for?

I also tried to :

set TRITON_STORE_CUBIN_HSACO_ONLY to 1

run a kernel

set TRITON_STORE_CUBIN_HSACO_ONLY to 0

run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs?

fulvius31 · 2025-02-06T03:46:20Z

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.
I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. Could you please clarify if there is a specific scenario I should test for?
I also tried to :

set TRITON_STORE_CUBIN_HSACO_ONLY to 1

run a kernel

set TRITON_STORE_CUBIN_HSACO_ONLY to 0

run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs?

You are totally right!

With the TRITON_STORE_CUBIN_HSACO_ONLY=1 the test_core.py does not pass.

I did not test twice using that env var.

Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite?

ThomasRaoux · 2025-02-06T07:02:33Z

That means the intermediate IR would be there on a cold cache but not when hitting the cache? A cache hit shouldn't be observable by user.
If you run test_core.py 2 times in a row I expect you'll see failures

Thanks for the comment. My edits ensure that we never store the intermediate IR files (TTIR, TTGIR, LLIR, PTX, AMDGCN) regardless of whether the cache is cold or warm. In both cases, only the final binary (cubin and hsaco) and the metadata JSON are written to the cache.
I ran test_core.py twice in a row, and the results are identical. the intermediate IRs are never present, so the cache hit should be completely transparent to the user. Could you please clarify if there is a specific scenario I should test for?
I also tried to :

set TRITON_STORE_CUBIN_HSACO_ONLY to 1

run a kernel

set TRITON_STORE_CUBIN_HSACO_ONLY to 0

run the same kernel again

The files in the cache did not change. So in that case, I did not have the IRs stored.

I see, how does test_core.py passes without the intermediate IR since a lot of tests rely on checking the intermediate IRs?

You are totally right!

With the TRITON_STORE_CUBIN_HSACO_ONLY=1 the test_core.py does not pass.

I did not test twice using that env var.

Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite?

I see, I got thrown off by the description saying test_core.py runs without error.
Are you running into specific problems with the cache size?

I think overall it should work, if there is a use case I don't see a strong reason not to support it

Jokeren · 2025-02-06T14:38:20Z

The variable name can be improved. TRITON_STORE_CUBIN_HSACO_ONLY -> TRITON_STORE_BINARY_ONLY IMO

fulvius31 · 2025-02-06T14:49:30Z

Do you think offering users the option to save storage by limiting cached files to only cubin/hsaco could be valuable? Or do you see cases where having intermediate IR files is essential beyond just running the test suite?

I see, I got thrown off by the description saying test_core.py runs without error. Are you running into specific problems with the cache size?

@ThomasRaoux
Not a specific problem, but I’ve been doing some initial experiments on the "power of cache" in terms of speed (GitHub repo).

I hope in a future where most kernels will run on Triton, and in production (or even dev) environments, you could end up caching thousands of kernels. Given that, I was thinking about ways to optimize storage, especially in cases where the cache can be reused across a homogeneous cluster, reducing the need for redundant recompilation and improving initial performance.

The variable name can be improved. TRITON_STORE_CUBIN_HSACO_ONLY -> TRITON_STORE_BINARY_ONLY IMO

@Jokeren I like your suggestion, a lot!

Edit : @Jokeren I just updated the var names. Let me know if there is anything else I can change!

ThomasRaoux · 2025-02-07T18:49:03Z

It feels a bit like a premature optimization from the description but since this is a trivial change I don't want to block it.
@Jokeren, you have an opinion?

not block

Jokeren · 2025-02-07T18:52:14Z

I don't have a strong opinion. Feel free to merge

fulvius31 added 8 commits January 22, 2025 18:25

Add the possibility to store only cubin/hsaco file on the cache direc…

a00faed

…tory

Merge branch 'triton-lang:main' into cache

eecb5c6

Merge branch 'triton-lang:main' into cache

0bb137d

Merge branch 'triton-lang:main' into cache

9c3434b

Merge branch 'triton-lang:main' into cache

3ce9d5f

Merge branch 'triton-lang:main' into cache

cc0491e

Merge branch 'triton-lang:main' into cache

f9aa200

fulvius31 requested a review from ptillet as a code owner February 5, 2025 16:16

Merge branch 'main' into cache

559a690

ThomasRaoux previously requested changes Feb 6, 2025

View reviewed changes

Merge branch 'main' into cache

31045e5

fulvius31 added 4 commits February 6, 2025 08:53

Change TRITON_STORE_CUBIN_HSACO_ONLY with TRITON_STORE_BINARY_ONLY

f46acb9

Change store_only_cubin_hsaco with store_only_binary

12dda28

Merge branch 'main' into cache

a1939df

Merge branch 'main' into cache

5948574

fulvius31 requested a review from ThomasRaoux February 7, 2025 15:36

Jokeren approved these changes Feb 8, 2025

View reviewed changes

pawelszczerbuk merged commit 5a99ff5 into triton-lang:main Feb 10, 2025
7 checks passed

fulvius31 deleted the cache branch February 10, 2025 16:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the possibility to store only cubin/hsaco files in the cache directory #5827

Add the possibility to store only cubin/hsaco files in the cache directory #5827

fulvius31 commented Feb 5, 2025 •

edited

Loading

ThomasRaoux left a comment

fulvius31 commented Feb 6, 2025

ThomasRaoux commented Feb 6, 2025

fulvius31 commented Feb 6, 2025

ThomasRaoux commented Feb 6, 2025

Jokeren commented Feb 6, 2025

fulvius31 commented Feb 6, 2025 •

edited

Loading

ThomasRaoux commented Feb 7, 2025

Jokeren commented Feb 7, 2025

Add the possibility to store only cubin/hsaco files in the cache directory #5827

Add the possibility to store only cubin/hsaco files in the cache directory #5827

Conversation

fulvius31 commented Feb 5, 2025 • edited Loading

New contributor declaration

ThomasRaoux left a comment

Choose a reason for hiding this comment

fulvius31 commented Feb 6, 2025

ThomasRaoux commented Feb 6, 2025

fulvius31 commented Feb 6, 2025

ThomasRaoux commented Feb 6, 2025

Jokeren commented Feb 6, 2025

fulvius31 commented Feb 6, 2025 • edited Loading

ThomasRaoux commented Feb 7, 2025

Jokeren commented Feb 7, 2025

fulvius31 commented Feb 5, 2025 •

edited

Loading

fulvius31 commented Feb 6, 2025 •

edited

Loading