[RISCV] vp.merge of nxv16i1 tries to be unrolled #120405
@llvm/issue-subscribers-backend-risc-v Author: Luke Lau (lukel97)
After #101641, this vp merge began to be emitted from the loop vectorizer with EVL tail folding from SPEC CPU 2017:
```llvm
define <vscale x 16 x i1> @f(<vscale x 16 x i1> %m, <vscale x 16 x i1> %x, <vscale x 16 x i1> %y, i32 %evl) {
  %1 = tail call <vscale x 16 x i1> @llvm.vp.merge.nxv16i1(<vscale x 16 x i1> %m, <vscale x 16 x i1> %x, <vscale x 16 x i1> %y, i32 %evl)
  ret <vscale x 16 x i1> %1
}
```

It crashes with `llc -mtriple=riscv64 -mattr=+v`:

```
LLVM ERROR: Invalid size request on a scalable vector.
```

cc @Mel-Chen
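For context, the semantics the crashing intrinsic computes can be sketched with a scalar model. This is a hand-written Python illustration of my reading of the LangRef semantics for `llvm.vp.merge`, not code from LLVM: lanes below EVL follow the mask, and lanes at or above EVL take the false operand.

```python
def vp_merge(mask, on_true, on_false, evl):
    """Scalar model of llvm.vp.merge: for lane i < evl the result is
    on_true[i] where mask[i] is set and on_false[i] otherwise; lanes
    at or above evl always take on_false[i]."""
    return [t if (i < evl and m) else f
            for i, (m, t, f) in enumerate(zip(mask, on_true, on_false))]

# A 4-lane i1 example: the mask selects lanes 0, 2 and 3, but EVL=3
# cuts off lane 3, which therefore takes the false operand.
print(vp_merge([1, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], 3))
```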
I think we should be able to lower this in the RISC-V backend, and I think it makes sense to transform selects to VP nodes during EVL tail folding since it allows us to get rid of a mask user. But this goes to show that VP intrinsics aren't as well supported as regular, non-predicated instructions in the RISC-V backend.

Now that the VLOptimizer pass is enabled by default, do we want to reconsider emitting VP intrinsics during EVL tail folding when they're not necessary? I.e. not for selects/loads/stores, but for binary ops and other intrinsics. My gut feeling is that we'll be missing out on a lot of combines and optimisations that just aren't implemented for VP intrinsics/nodes. In that case we should probably emit regular instructions where possible in the loop vectorizer, teach DAGCombiner and friends to handle VP nodes first, and then revisit emitting VP intrinsics once the codegen is a bit better. (This is somewhat speculative: it would be good to evaluate this first! But to do that we need to fix the assertion failures 🥲)

cc @Mel-Chen @arcbbb @LiqinWeng @preames @npanchen @michaelmaitland
We probably have a fix for this in our downstream. We've been using VP intrinsics in our vectorizer for a while. I'll see if I can find it.
How can we get such issues surfaced upstream sooner? It sounds like it might be good to put a hold on future EVL patches in LoopVectorize until all issues with the current state are fixed.
Well, at least some of the current EVL patches are addressing functional issues, so pausing those would seem slightly self-defeating. However, I do agree that we need to get the EVL stuff into a correct working state before continuing to work on optimization quality. This does seem to be happening, but it needs to be more of a focus. @lukel97's recent test runs (and reporting) are a great help here, but we clearly need to be doing more of this.
The default legalization uses vmslt with a vector of XLen to compute a mask. This doesn't work if the type isn't legal. For fixed vectors it will scalarize. For scalable vectors it crashes the compiler. This patch uses an alternate strategy that promotes the i1 vector to an i8 vector and does the merge. I don't claim this to be the best lowering. I wrote it almost 3 years ago when a crash was reported in our downstream. Fixes llvm#120405.
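The promotion strategy described above can be sketched as a scalar Python model. This is an illustration of the idea (zero-extend the i1 operands to i8, merge in the wider legal type, truncate back), not the actual SelectionDAG lowering code; the function names are mine.

```python
import itertools

def vp_merge_ref(mask, on_true, on_false, evl):
    # Reference semantics of vp.merge on an i1 vector: lanes below EVL
    # follow the mask; lanes at or above EVL take on_false.
    return [t if (i < evl and m) else f
            for i, (m, t, f) in enumerate(zip(mask, on_true, on_false))]

def vp_merge_via_i8(mask, on_true, on_false, evl):
    # The promotion strategy: zero-extend the i1 operands to i8, do the
    # merge in the wider (legal) type, then truncate the result to i1.
    true8 = [int(t) for t in on_true]
    false8 = [int(f) for f in on_false]
    merged = [t if (i < evl and m) else f
              for i, (m, t, f) in enumerate(zip(mask, true8, false8))]
    return [v & 1 for v in merged]

# Exhaustively check all 2-lane i1 inputs against the reference.
lanes = list(itertools.product([0, 1], repeat=2))
for mask, t, f in itertools.product(lanes, repeat=3):
    for evl in range(3):
        assert vp_merge_via_i8(mask, t, f, evl) == vp_merge_ref(mask, t, f, evl)
print("promotion strategy matches reference")
```

The key point the model makes is that because the merge picks whole lanes from one operand or the other, promotion followed by truncation cannot change any lane's i1 value.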
Yep, I didn't mean patches addressing issues, only patches generating more
Apologies for not noticing the issue with i1 vp.merge in codegen. This appears to be the i1 vp.merge generated by AnyOf.
I believe the reason this issue wasn't discovered earlier is that AnyOf lacks test coverage in the llvm-test-suite. I've strengthened this in llvm/llvm-test-suite#195; please take a look.
After #120479 everything is working again; does it still need to be disabled? I understand that i1 vectors aren't supported in isLegalElementTypeForRVV, but presumably the loop vectorizer emits them in other places too, e.g. predication or mask tail folding. So it doesn't sound too unreasonable to expect the backend to support them.
This patch adds a runtime test case for vectorization of the AnyOf reduction idiom:

```c
int32_t Rdx = -1;
for (unsigned I = 0; I < TC; I++) {
  Rdx = A[I] > B[I] ? 3 : Rdx;
}
return Rdx;
```

This improves test coverage; please refer to issue llvm/llvm-project#120405.
After #101641, this vp.merge began to be emitted from the loop vectorizer with EVL tail folding from SPEC CPU 2017. It crashes with

```
llc -mtriple=riscv64 -mattr=+v
```

with

```
LLVM ERROR: Invalid size request on a scalable vector.
```

because it tries to unroll it when legalizing vector ops. The DAG after type legalization looks like: