Grok-1 model compilation slows with performance optimizing flags #19593

Open
archana-ramalingam opened this issue Jan 3, 2025 · 2 comments
Labels
bug 🐞 Something isn't working

archana-ramalingam commented Jan 3, 2025

What happened?

Grok-1 model compilation takes about 7 minutes longer and the vmfb file is 1 MB larger when using the following flags:
--iree-dispatch-creation-enable-aggressive-fusion=true --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))"

With flags:
vmfb size: 3 MB
compile time: 8 mins

Without flags:
vmfb size: 2 MB
compile time: under 1 min

Steps to reproduce your issue

  1. Fetch the latest Grok MLIR from here: https://github.com/nod-ai/shark-ai/blob/grok-2/grok-1-q4_1.mlir
  2. On the latest IREE build, run the following command. Compilation takes ~8 mins with the last two flags, versus ~1 min without them:

../iree-build/tools/iree-compile grok-1-q4_1.mlir --iree-hal-target-backends=rocm --iree-hip-target=gfx942 -o grok-1-q4_1.vmfb --iree-dispatch-creation-enable-aggressive-fusion=true --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))"
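
To measure both configurations side by side, something like the following could be used. This is only a minimal sketch, assuming a POSIX shell and the paths from the command above. The `IREE_COMPILE` and `INPUT` variables, the `time_compile` helper, and the output names `baseline.vmfb` and `flagged.vmfb` are illustrative and not part of IREE; the compiler flags are copied from the command above.

```shell
#!/bin/sh
# Sketch: compare iree-compile wall time and vmfb size with and without
# the two flags from this issue. Helper and variable names are hypothetical.
set -eu

# Assumed locations, taken from the reproduction command; override as needed.
IREE_COMPILE="${IREE_COMPILE:-../iree-build/tools/iree-compile}"
INPUT="${INPUT:-grok-1-q4_1.mlir}"

# time_compile LABEL OUTPUT [EXTRA_FLAGS...]
# Runs one compile and prints elapsed whole seconds and output size in bytes.
time_compile() {
  label=$1
  out=$2
  shift 2
  start=$(date +%s)
  "$IREE_COMPILE" "$INPUT" \
    --iree-hal-target-backends=rocm \
    --iree-hip-target=gfx942 \
    "$@" -o "$out"
  end=$(date +%s)
  size=$(wc -c < "$out" | tr -d ' ')
  echo "$label: $((end - start))s, $size bytes"
}

if [ -x "$IREE_COMPILE" ]; then
  time_compile "without flags" baseline.vmfb
  time_compile "with flags" flagged.vmfb \
    --iree-dispatch-creation-enable-aggressive-fusion=true \
    '--iree-preprocessing-pass-pipeline=builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))'
else
  echo "iree-compile not found at $IREE_COMPILE; set IREE_COMPILE" >&2
fi
```

Timing is in whole seconds, which should be enough resolution for a difference on the order of minutes. Setting `INPUT` to a different MLIR file runs the same comparison on that file.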

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

@archana-ramalingam archana-ramalingam added the bug 🐞 Something isn't working label Jan 3, 2025

IanWood1 commented Jan 3, 2025

I can take a look at this

IanWood1 commented Jan 3, 2025

It seems to be coming from a single dispatch: https://gist.githubusercontent.com/IanWood1/e1cedac8fd6692ca2d25c108f0aff890/raw/4abd53eee6a8e60e231cd276445ddfef6d8d8110/grok-slow-dispatch.mlir

It can be reproduced by running iree-compile on that file (grok-slow-dispatch). Most of the compile time is spent inside LLVM's SLPVectorizerPass.
