
[RISCV] vp.merge of nxv16i1 tries to be unrolled #120405

Closed
lukel97 opened this issue Dec 18, 2024 · 9 comments · Fixed by #120479
Assignees

Comments

@lukel97
Contributor

lukel97 commented Dec 18, 2024

After #101641, the loop vectorizer with EVL tail folding began emitting this vp.merge on SPEC CPU 2017:

define <vscale x 16 x i1> @f(<vscale x 16 x i1> %m, <vscale x 16 x i1> %x, <vscale x 16 x i1> %y, i32 %evl) {
  %1 = tail call <vscale x 16 x i1> @llvm.vp.merge.nxv16i1(<vscale x 16 x i1> %m, <vscale x 16 x i1> %x, <vscale x 16 x i1> %y, i32 %evl)
  ret <vscale x 16 x i1> %1
}
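To make the lane-wise behavior concrete, here is a small Python model of the llvm.vp.merge semantics (illustrative only, not LLVM code): lane i takes the on-true value when i < evl and the mask bit is set, and the on-false value otherwise, including for all lanes at or beyond evl.

```python
def vp_merge(m, x, y, evl):
    """Reference model of llvm.vp.merge on lists of lane values.

    m   -- mask lanes (truthy = selected)
    x   -- on-true lanes
    y   -- on-false lanes
    evl -- explicit vector length; lanes >= evl always come from y
    """
    return [x[i] if i < evl and m[i] else y[i] for i in range(len(m))]

print(vp_merge([1, 0, 1, 1, 0, 1, 0, 1], [10] * 8, [20] * 8, 5))
# prints [10, 20, 10, 10, 20, 20, 20, 20]
```
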

It crashes under llc -mtriple=riscv64 -mattr=+v with "LLVM ERROR: Invalid size request on a scalable vector." because the node gets unrolled when legalizing vector ops. The DAG after type legalization looks like:

Type-legalized selection DAG: %bb.0 'f:'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
      t2: nxv16i1,ch = CopyFromReg t0, Register:nxv16i1 %0
      t4: nxv16i1,ch = CopyFromReg t0, Register:nxv16i1 %1
      t6: nxv16i1,ch = CopyFromReg t0, Register:nxv16i1 %2
        t8: i64,ch = CopyFromReg t0, Register:i64 %3
      t18: i64 = and t8, Constant:i64<4294967295>
    t12: nxv16i1 = vp_merge t2, t4, t6, t18
  t15: ch,glue = CopyToReg t0, Register:nxv16i1 $v0, t12
  t16: ch = RISCVISD::RET_GLUE t15, Register:nxv16i1 $v0, t15:1
@llvmbot
Member

llvmbot commented Dec 18, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

@fhahn
Contributor

fhahn commented Dec 18, 2024

cc @Mel-Chen

@lukel97
Contributor Author

lukel97 commented Dec 18, 2024

I think we should be able to lower this in the RISC-V backend, and I think it makes sense to transform selects to VP nodes during EVL tail folding since it allows us to get rid of a mask user.

But I think this goes to show that VP intrinsics aren't as well supported as regular, non-predicated instructions in the RISC-V backend.

Now that the VLOptimizer pass is enabled by default, do we want to reconsider emitting VP intrinsics during EVL tail folding where they're not strictly necessary, i.e. for binary ops and other intrinsics rather than selects/loads/stores?

My gut feeling is that we'll be missing out on a lot of combines and optimisations that just aren't implemented for VP intrinsics/nodes.

In that case we should probably emit regular instructions where possible in the loop vectorizer, then teach DAGCombiner and friends to handle VP nodes first. Then we could revisit emitting VP intrinsics once the codegen is a bit better.

(This is somewhat speculation: It would be good to evaluate this first! But to do that we need to fix the assertion failures 🥲)

cc @Mel-Chen @arcbbb @LiqinWeng @preames @npanchen @michaelmaitland

@topperc
Collaborator

topperc commented Dec 18, 2024

We probably have a fix for this in our downstream. We've been using VP intrinsics in our vectorizer for a while. I'll see if I can find it.

@fhahn
Contributor

fhahn commented Dec 18, 2024

How can we get such issues surfaced upstream sooner? It sounds like it might be good to put a hold on future EVL patches in LoopVectorize until all issues with the current state are fixed.

@preames
Collaborator

preames commented Dec 18, 2024

How can we get such issues surfaced upstream sooner? It sounds like it might be good to put a hold on future EVL patches in LoopVectorize until all issues with the current state are fixed.

Well, at least some of the current EVL patches are addressing functional issues, so pausing that would seem slightly self defeating. However, I do agree that we need to get the EVL stuff into a correct working state before continuing to work on optimization quality. This does seem to be happening, but it needs to be more of a focus. @lukel97's recent test runs (and reporting) are a great help here, but we clearly need to be doing more of this.

@topperc topperc self-assigned this Dec 18, 2024
topperc added a commit to topperc/llvm-project that referenced this issue Dec 18, 2024
The default legalization uses vmslt with a vector of XLen to compute
a mask. This doesn't work if the type isn't legal. For fixed vectors
it will scalarize. For scalable vectors it crashes the compiler.

This patch uses an alternate strategy that promotes the i1 vector
to an i8 vector and does the merge. I don't claim this to be the
best lowering. I wrote it almost 3 years ago when a crash was
reported in our downstream.

Fixes llvm#120405.
topperc added a commit to topperc/llvm-project that referenced this issue Dec 18, 2024
topperc added a commit to topperc/llvm-project that referenced this issue Dec 19, 2024
topperc added a commit that referenced this issue Dec 19, 2024
The default legalization uses vmslt with a vector of XLen to compute a
mask. This doesn't work if the type isn't legal. For fixed vectors it
will scalarize. For scalable vectors it crashes the compiler.

This patch uses an alternate strategy that promotes the i1 vector to an
i8 vector and does the merge. I don't claim this to be the best
lowering. I wrote it quickly almost 3 years ago when a crash was
reported in our downstream.

Fixes #120405.
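The promotion strategy the commit message describes can be sketched as a small Python model (illustrative only, not the actual SelectionDAG lowering): zero-extend each i1 lane to a legal i8 element type, perform the vp.merge there, then truncate the result back to i1.

```python
def vp_merge_i8(m, x, y, evl):
    # vp.merge performed at the legal i8 element type
    return [x[i] if i < evl and m[i] else y[i] for i in range(len(m))]

def vp_merge_i1_via_promote(m, x, y, evl):
    """Model of lowering an i1 vp.merge by promoting to i8."""
    x8 = [int(b) for b in x]                 # zext i1 -> i8
    y8 = [int(b) for b in y]                 # zext i1 -> i8
    merged = vp_merge_i8(m, x8, y8, evl)     # merge at the legal type
    return [bool(v & 1) for v in merged]     # trunc i8 -> i1
```
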
@fhahn
Contributor

fhahn commented Dec 19, 2024

Well, at least some of the current EVL patches are addressing functional issues, so pausing that would seem slightly self defeating. However, I do agree that we need to get the EVL stuff into a correct working state before continuing to work on optimization quality. This does seem to be happening, but it needs to be more of a focus. @lukel97's recent test runs (and reporting) is a great help here, but we clearly need to be doing more of this.

Yep, I didn't mean patches addressing issues, only patches that generate more vp.* intrinsics.

@Mel-Chen
Contributor

Mel-Chen commented Jan 3, 2025

Apologies for not noticing the issue with i1 vp.merge in codegen.

This appears to be the i1 vp.merge generated by AnyOf.
While isLegalToVectorizeReduction does check isLegalElementTypeForRVV to filter out cases where the type is i1, AnyOf is later transformed into an i1 Or reduction.
The quickest way to address this would be to disable AnyOf in isLegalToVectorizeReduction. If we still want to enable AnyOf reduction, one option is to attempt implementing an in-loop AnyOf, and another is to try widening the reduction type to the smallest legal element type. @fhahn Do you have a preferred approach?
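To illustrate why AnyOf ends up as an i1 reduction, here is a small Python model (illustrative only, not vectorizer code) showing that the select-based AnyOf recurrence is equivalent to an Or reduction over the i1 compare results, which is the form the vectorizer produces after the transform.

```python
def anyof_select_form(a, b, init=-1, val=3):
    """The scalar AnyOf idiom: Rdx = a[i] > b[i] ? val : Rdx."""
    rdx = init
    for x, y in zip(a, b):
        rdx = val if x > y else rdx
    return rdx

def anyof_or_reduction(a, b, init=-1, val=3):
    """Equivalent form: an i1 Or reduction over the compares."""
    any_true = False
    for x, y in zip(a, b):
        any_true |= (x > y)      # the i1 Or reduction
    return val if any_true else init
```
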

How can we get such issues surfaced upstream sooner?

I believe the reason this issue wasn't discovered earlier is that AnyOf lacks test coverage in the llvm test-suite. I’ve strengthened this in llvm/llvm-test-suite#195 — please take a look.

@lukel97
Contributor Author

lukel97 commented Jan 3, 2025

The quickest way to address this would be to disable AnyOf in isLegalToVectorizeReduction

After #120479 everything is working again, does it still need to be disabled? I understand that i1 vectors aren't supported in isLegalElementTypeForRVV, but presumably the loop vectorizer emits them in other places too, e.g. for predication or mask tail folding. So it doesn't sound too unreasonable to expect the backend to support them.

Mel-Chen added a commit to llvm/llvm-test-suite that referenced this issue Jan 13, 2025
This patch adds runtime test case for vectorization of AnyOf reduction idiom:
  int32_t Rdx = -1;
  for (unsigned I = 0; I < TC; I++) {
    Rdx = A[I] > B[I] ? 3 : Rdx;
  }
  return Rdx;

This improves test coverage; please refer to issue llvm/llvm-project#120405.