
[RISCV] vp.merge of nxv16i1 tries to be unrolled #120405

Closed
lukel97 opened this issue Dec 18, 2024 · 9 comments · Fixed by #120479
Assignees

Comments

@lukel97
Contributor

lukel97 commented Dec 18, 2024

After #101641, the loop vectorizer with EVL tail folding began emitting this vp.merge on SPEC CPU 2017:

define <vscale x 16 x i1> @f(<vscale x 16 x i1> %m, <vscale x 16 x i1> %x, <vscale x 16 x i1> %y, i32 %evl) {
  %1 = tail call <vscale x 16 x i1> @llvm.vp.merge.nxv16i1(<vscale x 16 x i1> %m, <vscale x 16 x i1> %x, <vscale x 16 x i1> %y, i32 %evl)
  ret <vscale x 16 x i1> %1
}
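To make the lane-wise behavior concrete, here is a small Python model of the llvm.vp.merge semantics (illustrative only, not LLVM code): lane i takes the on-true value when i < evl and the mask bit is set, and the on-false value otherwise, including for all lanes at or beyond evl.

```python
def vp_merge(m, x, y, evl):
    """Reference model of llvm.vp.merge on lists of lane values.

    m   -- mask lanes (truthy = selected)
    x   -- on-true lanes
    y   -- on-false lanes
    evl -- explicit vector length; lanes >= evl always come from y
    """
    return [x[i] if i < evl and m[i] else y[i] for i in range(len(m))]

print(vp_merge([1, 0, 1, 1, 0, 1, 0, 1], [10] * 8, [20] * 8, 5))
# prints [10, 20, 10, 10, 20, 20, 20, 20]
```
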

It crashes under llc -mtriple=riscv64 -mattr=+v with "LLVM ERROR: Invalid size request on a scalable vector." because the node gets unrolled when legalizing vector ops. The DAG after type legalization looks like:

Type-legalized selection DAG: %bb.0 'f:'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
      t2: nxv16i1,ch = CopyFromReg t0, Register:nxv16i1 %0
      t4: nxv16i1,ch = CopyFromReg t0, Register:nxv16i1 %1
      t6: nxv16i1,ch = CopyFromReg t0, Register:nxv16i1 %2
        t8: i64,ch = CopyFromReg t0, Register:i64 %3
      t18: i64 = and t8, Constant:i64<4294967295>
    t12: nxv16i1 = vp_merge t2, t4, t6, t18
  t15: ch,glue = CopyToReg t0, Register:nxv16i1 $v0, t12
  t16: ch = RISCVISD::RET_GLUE t15, Register:nxv16i1 $v0, t15:1
@llvmbot
Member

llvmbot commented Dec 18, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

@fhahn
Contributor

fhahn commented Dec 18, 2024

cc @Mel-Chen

@lukel97
Contributor Author

lukel97 commented Dec 18, 2024

I think we should be able to lower this in the RISC-V backend, and I think it makes sense to transform selects to VP nodes during EVL tail folding since it allows us to get rid of a mask user.

But I think this goes to show that VP intrinsics aren't as well supported as regular, non-predicated instructions in the RISC-V backend.

Now that the VLOptimizer pass is enabled by default, do we want to reconsider emitting VP intrinsics during EVL tail folding where they're not strictly necessary, i.e. for binary ops and other intrinsics rather than selects/loads/stores?

My gut feeling is that we'll be missing out on a lot of combines and optimisations that just aren't implemented for VP intrinsics/nodes.

In that case we should probably emit regular instructions where possible in the loop vectorizer, then teach DAGCombiner and friends to handle VP nodes first. Then we could revisit emitting VP intrinsics once the codegen is a bit better.

(This is somewhat speculation: It would be good to evaluate this first! But to do that we need to fix the assertion failures 🥲)

cc @Mel-Chen @arcbbb @LiqinWeng @preames @npanchen @michaelmaitland

@topperc
Collaborator

topperc commented Dec 18, 2024

We probably have a fix for this in our downstream. We've been using VP intrinsics in our vectorizer for a while. I'll see if I can find it.

@fhahn
Contributor

fhahn commented Dec 18, 2024

How can we get such issues surfaced upstream sooner? It sounds like it might be good to put a hold on future EVL patches in LoopVectorize until all issues with the current state are fixed.

@preames
Collaborator

preames commented Dec 18, 2024

How can we get such issues surfaced upstream sooner? It sounds like it might be good to put a hold on future EVL patches in LoopVectorize until all issues with the current state are fixed.

Well, at least some of the current EVL patches are addressing functional issues, so pausing that would seem slightly self defeating. However, I do agree that we need to get the EVL stuff into a correct working state before continuing to work on optimization quality. This does seem to be happening, but it needs to be more of a focus. @lukel97's recent test runs (and reporting) are a great help here, but we clearly need to be doing more of this.

@topperc topperc self-assigned this Dec 18, 2024
topperc added a commit to topperc/llvm-project that referenced this issue Dec 18, 2024
The default legalization uses vmslt with a vector of XLen to compute
a mask. This doesn't work if the type isn't legal. For fixed vectors
it will scalarize. For scalable vectors it crashes the compiler.

This patch uses an alternate strategy that promotes the i1 vector
to an i8 vector and does the merge. I don't claim this to be the
best lowering. I wrote it almost 3 years ago when a crash was
reported in our downstream.

Fixes llvm#120405.
topperc added a commit to topperc/llvm-project that referenced this issue Dec 18, 2024
topperc added a commit to topperc/llvm-project that referenced this issue Dec 19, 2024
topperc added a commit that referenced this issue Dec 19, 2024
The default legalization uses vmslt with a vector of XLen to compute a
mask. This doesn't work if the type isn't legal. For fixed vectors it
will scalarize. For scalable vectors it crashes the compiler.

This patch uses an alternate strategy that promotes the i1 vector to an
i8 vector and does the merge. I don't claim this to be the best
lowering. I wrote it quickly almost 3 years ago when a crash was
reported in our downstream.

Fixes #120405.
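The promotion strategy the commit message describes can be sketched as a small Python model (illustrative only, not the actual SelectionDAG lowering): zero-extend each i1 lane to a legal i8 element type, perform the vp.merge there, then truncate the result back to i1.

```python
def vp_merge_i8(m, x, y, evl):
    # vp.merge performed at the legal i8 element type
    return [x[i] if i < evl and m[i] else y[i] for i in range(len(m))]

def vp_merge_i1_via_promote(m, x, y, evl):
    """Model of lowering an i1 vp.merge by promoting to i8."""
    x8 = [int(b) for b in x]                 # zext i1 -> i8
    y8 = [int(b) for b in y]                 # zext i1 -> i8
    merged = vp_merge_i8(m, x8, y8, evl)     # merge at the legal type
    return [bool(v & 1) for v in merged]     # trunc i8 -> i1
```
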
@fhahn
Contributor

fhahn commented Dec 19, 2024

Well, at least some of the current EVL patches are addressing functional issues, so pausing that would seem slightly self defeating. However, I do agree that we need to get the EVL stuff into a correct working state before continuing to work on optimization quality. This does seem to be happening, but it needs to be more of a focus. @lukel97's recent test runs (and reporting) is a great help here, but we clearly need to be doing more of this.

Yep, I didn't mean patches addressing issues, only patches that generate more vp.* intrinsics.

@Mel-Chen
Contributor

Mel-Chen commented Jan 3, 2025

Apologies for not noticing the issue with i1 vp.merge in codegen.

This appears to be the i1 vp.merge generated by AnyOf.
While isLegalToVectorizeReduction does check isLegalElementTypeForRVV to filter out cases where the type is i1, AnyOf is later transformed into an i1 Or reduction.
The quickest way to address this would be to disable AnyOf in isLegalToVectorizeReduction. If we still want to enable AnyOf reduction, one option is to attempt implementing an in-loop AnyOf, and another is to try widening the reduction type to the smallest legal element type. @fhahn Do you have a preferred approach?
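To illustrate why AnyOf ends up as an i1 reduction, here is a small Python model (illustrative only, not vectorizer code) showing that the select-based AnyOf recurrence is equivalent to an Or reduction over the i1 compare results, which is the form the vectorizer produces after the transform.

```python
def anyof_select_form(a, b, init=-1, val=3):
    """The scalar AnyOf idiom: Rdx = a[i] > b[i] ? val : Rdx."""
    rdx = init
    for x, y in zip(a, b):
        rdx = val if x > y else rdx
    return rdx

def anyof_or_reduction(a, b, init=-1, val=3):
    """Equivalent form: an i1 Or reduction over the compares."""
    any_true = False
    for x, y in zip(a, b):
        any_true |= (x > y)      # the i1 Or reduction
    return val if any_true else init
```
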

How can we get such issues surfaced upstream sooner?

I believe the reason this issue wasn't discovered earlier is that AnyOf lacks test coverage in the llvm test-suite. I’ve strengthened this in llvm/llvm-test-suite#195 — please take a look.

@lukel97
Contributor Author

lukel97 commented Jan 3, 2025

The quickest way to address this would be to disable AnyOf in isLegalToVectorizeReduction

After #120479 everything is working again, does it still need to be disabled? I understand that i1 vectors aren't supported in isLegalElementTypeForRVV, but presumably the loop vectorizer emits them in other places too, e.g. for predication or mask tail folding. So it doesn't sound too unreasonable to expect the backend to support them.

Mel-Chen added a commit to llvm/llvm-test-suite that referenced this issue Jan 13, 2025
This patch adds runtime test case for vectorization of AnyOf reduction idiom:
  int32_t Rdx = -1;
  for (unsigned I = 0; I < TC; I++) {
    Rdx = A[I] > B[I] ? 3 : Rdx;
  }
  return Rdx;

This improves test coverage; please refer to issue llvm/llvm-project#120405.