Unnecessary buffer creation when enabling scf.forall distribution on the vector distribute pipeline. #19608

Open
pashu123 opened this issue Jan 6, 2025 · 1 comment

pashu123 commented Jan 6, 2025

Dump with forall distribution: https://gist.github.com/pashu123/2a162391c5212dc7351a08d0748833fd

Dump without forall distribution: https://gist.github.com/pashu123/b49299b19d14959244079d75ecc502ba

In the IR dump after the EmptyTensorToAllocTensor pass, the version with forall distribution has an extra buffer (%17):

%17 = bufferization.alloc_tensor() : tensor<64x64xf16>
%31 = vector.transfer_write %30, %17[%c0, %c0] {in_bounds = [true, true]} : vector<64x64xf16>, tensor<64x64xf16>
%extracted_slice = tensor.extract_slice %arg3[%arg0, %arg2, %arg1, 0] [1, 64, 1, 64] [1, 1, 1, 1] : tensor<2x4096x10x64xf16> to tensor<1x64x1x64xf16>
%inserted_slice = tensor.insert_slice %31 into %extracted_slice[0, 0, 0, 0] [1, 64, 1, 64] [1, 1, 1, 1] : tensor<64x64xf16> into tensor<1x64x1x64xf16>

Looking further, the allocated tensor (%17) seems to be redundant: the vector.transfer_write could target %extracted_slice directly.
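
A rough sketch of what the IR could look like without the extra allocation. This is not taken from either dump. It assumes a rank-reducing tensor.extract_slice is acceptable here, so that the slice type matches the 2-D vector; %30, %c0 and %arg0 to %arg3 are the values from the snippet above, and %slice and %written are made-up names for illustration.

```mlir
// Hypothetical target IR: no bufferization.alloc_tensor.
// Assumption: a rank-reducing extract_slice drops the two unit dims, so the
// destination is tensor<64x64xf16> and matches the vector shape.
%slice = tensor.extract_slice %arg3[%arg0, %arg2, %arg1, 0] [1, 64, 1, 64] [1, 1, 1, 1] : tensor<2x4096x10x64xf16> to tensor<64x64xf16>
// Write the vector straight into the slice instead of into %17.
%written = vector.transfer_write %30, %slice[%c0, %c0] {in_bounds = [true, true]} : vector<64x64xf16>, tensor<64x64xf16>
```

How %written is inserted back into %arg3 inside the forall is left out of this sketch, and that step likely decides whether bufferization can write in place.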

pashu123 commented Jan 6, 2025

Reference PR: #19420
