-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support accumulating GEMMs in TileAndFuse with intrinsic without needing c promotion #19546
Comments
Here is what is causing this to fail to bufferize, after
The point being that the accumlator input slice was derived from Later after
At this stage the acc_input is a slice of
|
I spot an issue in your dump. The issue that I spot in the dump is that the output binding is ReadOnly. This is because your input program write the result to the function argument. IREE is not smart enough to create a global buffer for the output tensor, and maybe it should not happen -- I can't interpret the meaning of writing the result into input argument. It is a tensor, not a pointer. func.func @bmm(%arg0: tensor<512x128xi8>, %arg1: tensor<512x128xi8>, %arg2: tensor<512x512xi32>) -> tensor<512x512xi32> {
%0 = linalg.matmul_transpose_b ins(%arg0, %arg1 : tensor<512x128xi8>, tensor<512x128xi8>) outs(%arg2 : tensor<512x512xi32>) -> tensor<512x512xi32>
return %0 : tensor<512x512xi32>
} I think it is better to have a dump with func.func @bmm(%arg0: tensor<512x128xi8>, %arg1: tensor<512x128xi8>) -> tensor<512x512xi32> {
%arg2 = tensor.empty() : tensor<512x512xi32>
%0 = linalg.matmul_transpose_b ins(%arg0, %arg1 : tensor<512x128xi8>, tensor<512x128xi8>) outs(%arg2 : tensor<512x512xi32>) -> tensor<512x512xi32>
return %0 : tensor<512x512xi32>
} It is clearer because we explicitly ask IREE to create a global buffer for the tensor and output the result at the end. I did not run the example myself because I don't know what the compilation command is. |
If the issue comes from |
The other solution might be running something like RemoveArgOutsDependency pattern at global level, which is similar to RemoveCstOutsDependency pattern in the ConvertToDestinationPassingStylePass. iree/compiler/src/iree/compiler/Codegen/Common/ConvertToDestinationPassingStylePass.cpp Lines 492 to 527 in c7086cf
|
@MaheshRavishankar WDYT of the two suggestions from @hanhanW above. Based on our conversations previously we want to support accumulating GEMMS without transforming them to non-accumulating GEMM + elementwise add, That being said to make progress on the TileAndFuse side where we have codegen issues described above on it, we decided we could go down the elementwise add path only for that pipeline till we have a proper solution, so thats what I tried in this PR |
My memory of the IR we looked at was that conversion to DPS was not the issue, the issue was a RaW conflict arising because folders kicked in on the read that didn't happen on the writes.
I see the readwrite binding in the dump
Maybe the binding layout looks confusing because it lists the layouts of all bindings rather than just the one it is representing, so in this case it's the last entry |
ah, I see. I did not notice that it lists all the layouts.. |
This is a perfectly valid program and should be supported correctly in IREE (I think it does today). The semantics is that the initial value of the
Yeah, Like Quinn said this is just going to be garbage output (and there is a flag in IREE where we force zero-initialize undefined tensors like this which just happens to work).
I dont think those work... This is really a pipeline issue and is a local fix for the pipeline itself. We shouldnt be trying to work around this at the full program level cause those will be hard to control. Converting GEMM to fill + GEMM + accumulate is kind of a hack, but just something for us to fix in the long run. For now this is fine. |
Converts dispatches with accumulating GEMMs that are doing in place read/write to GEMM + elementwise add. This is needed for the TileAndFuse path until we find a more permanent fix for #19546 --------- Signed-off-by: Nirvedh Meshram <[email protected]>
I see what you meant. It makes sense to me, thanks! |
Currently for accumulating GEMM we fail to bufferize if we dont do c promotion in TileAndFuse pipeline when using intrinsics. See dump here . I know there are some tranforms that are still in development but I wasnt sure they will serve this case as well.
The text was updated successfully, but these errors were encountered: