-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPIR-V] Add proposal for I/O builtin in Clang #97
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Nathan Gauër <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that there should be a really high bar for changing the spec in a way that is not backwards compatible. For now, I don't think we should change the spec. However, I do not know the clang code very well. If others who know it better say that changing the spec is necessary, then I will go along with it.
Thanks, I have published another draft PR and updated this PR to avoid changing the spec. New draft PR is llvm/llvm-project#116393 |
@tex3d what were your concerns about this design? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. The only question is what we will do with the address space. I'll wait until that is finalized before I approve.
Some of his question were about how well it could be optimized. The two main cases are:
|
For 1 and 2, I'd assume if we optimize everything else, and just leave a load/store, driver should be able to optimize those away. However, form the Vulkan spec, an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems reasonable to me and aligns with how we're handling inputs and outputs in DirectX today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think input and output are different enough to warrant separate examples and descriptions of behavior. I'm not sure why we would be storing to an input in a destructor or loading from an output in a constructor.
This seems mainly about defining a mechanism for declaring global static variables as built-in shader inputs/outputs, rather than the wrapped entry parameter mechanism. So as a note, the example would be different wrapping a main with an output parameter, and implications for IR (pointer arg instead of value arg) and optimization opportunities are different as well. Perhaps this should only focus on the special global variable attribute, and not the standard HLSL entry parameter case.
I am a bit concerned how we specify that variables only read from or written to in control flow preserve these intended load/store locations. Unless you are always expecting unconditional load of input before shader and unconditional store of output after shader, this scheme relies on an unspoken assumption that optimizations can translate these cases:
From: unconditional load of built-in before entry and store to static global var, then conditional load from static global under control flow
To: conditional load of built-in under control flow.
From: conditional store to static global var under control flow, then unconditionally load static global after entry and store to built-in.
To: conditional store to built-in under control flow.
Additionally, if a built-in is never read or written, the load or store should be eliminated. Eliminating the store seems particularly problematic. This is one reason I think inputs and outputs need to be explored separately.
This proposal doesn't seem to discuss what's required for this to work properly.
I don't know of a language construct that expresses these semantics clearly without special-handling for these globals. Without a description of any required special language and compiler implementation implications, it's hard to evaluate whether the proposal can solve these issues when using a constructor/destructor paradigm.
I'm not a fan of the language semantics being significantly changed for a static global variable decl by using an attribute, but I realize that's the existing construct implemented in DXC, so I guess we have to support it somehow.
The never-use case is the simplest to resolve with special handling, but I think these should be guaranteed to be eliminated before SPIR-V, otherwise they could be illegal, since you're adding everything declared globally to any entry compiled from the file (you could have a built-in that isn't valid in another entry point type). I'm still not sure how you reliably move accesses to the correct control-flow locations without a special pass or something. If you are going to use a custom pass for these anyway, why bother with the constructor/destructor mechanism in the first place, since that will just obfuscate the real access locations which would be much easier and more reliable to translate directly from loads/stores of the original global variable? Adding a special constructor/destructor to initialize/store a global variable when an attribute is present feels like a weird hack from the language semantics perspective already. IMHO, this attribute should be deprecated in 202x and replaced with something cleaner for 202y. |
Agree, I'll modify the proposal to remove dtor for input as those are not useful.
Ok I think I got it. The concern is: what if a lane conditionally stores to a built-in. This specific issue can be solved by having a ctor on output builtins.
|
@tex3d answered to your concerns, and changed the spec to share more details around those specifics handling of inputs/outputs. |
@tex3d so Steven found a case which hinted as something incomplete: the PointSize BuiltIn. The answer is actually not that clear.
But what's interesting is:
Additional weird detail: printf("%f, %x", point_size, asuint(point_size)); // Prints "0, 0".
point_size = point_size + 10.f; // Yields point_size = NaN. So this hints in the direction that the actual Need to think how to solve this. |
Signed-off-by: Nathan Gauër <[email protected]>
Hello all! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a SPIR-V perspective this looks good to me.
Signed-off-by: Nathan Gauër <[email protected]>
Signed-off-by: Nathan Gauër <[email protected]>
Thanks for the review, feedback applied! |
Initial proposal to implement Input/Output built ins with both semantic & inline SPIR-V