Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPIR-V] Add proposal for I/O builtin in Clang #97

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Keenuts
Copy link
Collaborator

@Keenuts Keenuts commented Nov 8, 2024

Initial proposal to implement Input/Output built ins with both semantic & inline SPIR-V

Copy link
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that there should be a really high bar for changing the spec in a way that is not backwards compatible. For now, I don't think we should change the spec. However, I do not know the clang code very well. If others who know it better say that changing the spec is necessary, then I will go along with it.

proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
@damyanp damyanp requested a review from tex3d November 8, 2024 18:56
@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 15, 2024

Thanks, I have published another draft PR and updated this PR to avoid changing the spec.
This should be closer to what DXIL is doing, but will change a bit how the few semantics we already have are implemented (since we now have a generic intrinsic).

New draft PR is llvm/llvm-project#116393

@Keenuts Keenuts requested a review from s-perron November 15, 2024 15:29
@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 20, 2024

@tex3d what were your concerns about this design?

Copy link
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. The only question is what we will do with the address space. I'll wait until that is finalized before I approve.

@s-perron
Copy link
Collaborator

@tex3d what were your concerns about this design?

Some of his question were about how well it could be optimized. The two main cases are:

  1. If the input variable is never used, will the call to the load intrinsic be removed?
  2. If the output variable is never actually written it, could the store intrinsic be removed?

@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 20, 2024

@tex3d what were your concerns about this design?

Some of his question were about how well it could be optimized. The two main cases are:

  1. If the input variable is never used, will the call to the load intrinsic be removed?
  2. If the output variable is never actually written it, could the store intrinsic be removed?

For 1 and 2, I'd assume if we optimize everything else, and just leave a load/store, driver should be able to optimize those away.

However, form the Vulkan spec, an Output BuiltIn starts with an undefined value.
So in this design, if no value is written by the user, a default value will be written to the output in the dtor.
I'd say that's OK, since we replace an undefined behavior with another behavior.

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems reasonable to me and aligns with how we're handling inputs and outputs in DirectX today.

Copy link
Collaborator

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think input and output are different enough to warrant separate examples and descriptions of behavior. I'm not sure why we would be storing to an input in a destructor or loading from an output in a constructor.

This seems mainly about defining a mechanism for declaring global static variables as built-in shader inputs/outputs, rather than the wrapped entry parameter mechanism. So as a note, the example would be different wrapping a main with an output parameter, and implications for IR (pointer arg instead of value arg) and optimization opportunities are different as well. Perhaps this should only focus on the special global variable attribute, and not the standard HLSL entry parameter case.

I am a bit concerned how we specify that variables only read from or written to in control flow preserve these intended load/store locations. Unless you are always expecting unconditional load of input before shader and unconditional store of output after shader, this scheme relies on an unspoken assumption that optimizations can translate these cases:

From: unconditional load of built-in before entry and store to static global var, then conditional load from static global under control flow
To: conditional load of built-in under control flow.

From: conditional store to static global var under control flow, then unconditionally load static global after entry and store to built-in.
To: conditional store to built-in under control flow.

Additionally, if a built-in is never read or written, the load or store should be eliminated. Eliminating the store seems particularly problematic. This is one reason I think inputs and outputs need to be explored separately.

This proposal doesn't seem to discuss what's required for this to work properly.

I don't know of a language construct that expresses these semantics clearly without special-handling for these globals. Without a description of any required special language and compiler implementation implications, it's hard to evaluate whether the proposal can solve these issues when using a constructor/destructor paradigm.

I'm not a fan of the language semantics being significantly changed for a static global variable decl by using an attribute, but I realize that's the existing construct implemented in DXC, so I guess we have to support it somehow.

@tex3d
Copy link
Collaborator

tex3d commented Nov 26, 2024

@tex3d what were your concerns about this design?

Some of his question were about how well it could be optimized. The two main cases are:

  1. If the input variable is never used, will the call to the load intrinsic be removed?
  2. If the output variable is never actually written it, could the store intrinsic be removed?

For 1 and 2, I'd assume if we optimize everything else, and just leave a load/store, driver should be able to optimize those away.

However, form the Vulkan spec, an Output BuiltIn starts with an undefined value. So in this design, if no value is written by the user, a default value will be written to the output in the dtor. I'd say that's OK, since we replace an undefined behavior with another behavior.

The never-use case is the simplest to resolve with special handling, but I think these should be guaranteed to be eliminated before SPIR-V, otherwise they could be illegal, since you're adding everything declared globally to any entry compiled from the file (you could have a built-in that isn't valid in another entry point type).

I'm still not sure how you reliably move accesses to the correct control-flow locations without a special pass or something. If you are going to use a custom pass for these anyway, why bother with the constructor/destructor mechanism in the first place, since that will just obfuscate the real access locations which would be much easier and more reliable to translate directly from loads/stores of the original global variable?

Adding a special constructor/destructor to initialize/store a global variable when an attribute is present feels like a weird hack from the language semantics perspective already.

IMHO, this attribute should be deprecated in 202x and replaced with something cleaner for 202y.

@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 26, 2024

I think input and output are different enough to warrant separate examples and descriptions of behavior. I'm not sure why we would be storing to an input in a destructor or loading from an output in a constructor.

Agree, I'll modify the proposal to remove dtor for input as those are not useful.
For ctor for output, see part below about storing undef.

I am a bit concerned how we specify that variables only read from or written to in control flow preserve these intended load/store locations.

Ok I think I got it. The concern is: what if a lane conditionally stores to a built-in.
Since store to the builtin is done at the end, you worry the undefined value in the output will be changed in all paths, since the built-in store is now unconditional.

This specific issue can be solved by having a ctor on output builtins.

  • load undefined value in the global
  • shader modifies it or not in some control flow.
  • store back the value, which is either the original undefined, or a new value.

@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 26, 2024

@tex3d answered to your concerns, and changed the spec to share more details around those specifics handling of inputs/outputs.
Let me know if I missed something!

@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 27, 2024

@tex3d so Steven found a case which hinted as something incomplete: the PointSize BuiltIn.
Unlike this others, this one is assumed to have the value 1.0 if not stored to. But question was: would a load/store pair be OK.

The answer is actually not that clear.
In SPIR-V, those yield the same result:

  • Load, Store
  • Store 1.0
  • do nothing
    All 3 options gives us a point size of 1.0.
    And all are translated in a reg = 1.f in the final ISA I have access to (AMD RNDA).

But what's interesting is:

  • a = load(), store(a + 123)
    This specific case yields a reg = NaN. Which make me believe AMD has a special case for store(load()), but otherwise, a load yields an undefined value, hence why we have a NaN.

Additional weird detail:

printf("%f, %x", point_size, asuint(point_size)); // Prints "0, 0".
point_size = point_size + 10.f; // Yields point_size = NaN.

So this hints in the direction that the actual store could be assumed to have a hidden side-effect, and thus I shall not emit a load/store pair if the user hasn't written to the builtin.
So this means if a branch stores a value, and another doesn't, I cannot simple store in both.

Need to think how to solve this.

Signed-off-by: Nathan Gauër <[email protected]>
@Keenuts
Copy link
Collaborator Author

Keenuts commented Nov 28, 2024

Hello all!
So for the reasons outlined above (builtin store has an hidden side-effect), I changed the whole design to implement this as regular global variables, and remove the ctor/dtor/load/store builtins.

Copy link
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a SPIR-V perspective this looks good to me.

proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
proposals/NNNN-spirv-input-builtin.md Outdated Show resolved Hide resolved
Signed-off-by: Nathan Gauër <[email protected]>
Signed-off-by: Nathan Gauër <[email protected]>
@Keenuts
Copy link
Collaborator Author

Keenuts commented Dec 2, 2024

Thanks for the review, feedback applied!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants