[Codegen][GPU] Let integer range optimization narrow GPU computations to i32 #19473

krzysz00 · 2024-12-12T01:08:10Z

Note: This PR is stacked on top of #19372, and so looks bigger than it is. The relevant changes are in the last commit.

Add an option to -iree-util-optimize-int-arithmetic that makes it perform computations in i32 where possible; the option is enabled when optimizing arithmetic for GPU codegen. This allows LLVM to correctly conclude that various computations don't need to be done at full 64-bit precision, thus saving registers and instructions. (LLVM has some rewrites for this, but they are gated on conditions such as the potentially truncated value having only one use, which means that shared math stays in an over-wide data type.)
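To illustrate the idea behind the narrowing (not the pass's actual implementation), here is a minimal standalone sketch. It assumes a hypothetical `IntRange` struct and helper names such as `canNarrowLinearIndex`; none of these are IREE or MLIR APIs. The point is only that when range analysis proves every operand and intermediate result fits in 32 bits, the 32-bit computation gives the same answer as the 64-bit one.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed interval [min, max] of values an integer may take.
// Stands in for whatever a range analysis would report; not an IREE type.
struct IntRange {
  int64_t min;
  int64_t max;
};

// True if every value in the range is representable as a signed 32-bit int.
bool fitsInI32(IntRange r) {
  return r.min >= std::numeric_limits<int32_t>::min() &&
         r.max <= std::numeric_limits<int32_t>::max();
}

// Range of a + b. Callers must ensure the bounds do not overflow int64_t.
IntRange addRange(IntRange a, IntRange b) {
  return {a.min + b.min, a.max + b.max};
}

// Range of a * b, restricted to non-negative inputs to keep the sketch short.
IntRange mulRangeNonNeg(IntRange a, IntRange b) {
  assert(a.min >= 0 && b.min >= 0 && "sketch only handles non-negative ranges");
  return {a.min * b.min, a.max * b.max};
}

// Decide whether row * stride + col can be evaluated entirely in i32.
// Conservatively refuses negative ranges and any intermediate that might
// not fit in 32 bits.
bool canNarrowLinearIndex(IntRange row, IntRange col, IntRange stride) {
  if (!fitsInI32(row) || !fitsInI32(col) || !fitsInI32(stride))
    return false;
  if (row.min < 0 || col.min < 0 || stride.min < 0)
    return false;
  // Inputs fit in i32, so these bound computations cannot overflow int64_t.
  IntRange product = mulRangeNonNeg(row, stride);
  return fitsInI32(product) && fitsInI32(addRange(product, col));
}

// Reference computation at full 64-bit width.
int64_t linearIndexWide(int64_t row, int64_t col, int64_t stride) {
  return row * stride + col;
}

// Same computation at 32 bits, extended back to 64 bits at the end.
// Only valid when canNarrowLinearIndex returned true for the operand ranges.
int64_t linearIndexNarrow(int64_t row, int64_t col, int64_t stride) {
  int32_t r = static_cast<int32_t>(row);
  int32_t c = static_cast<int32_t>(col);
  int32_t s = static_cast<int32_t>(stride);
  int32_t result = r * s + c;
  return static_cast<int64_t>(result);
}
```

Because the decision is made from value ranges rather than from how many uses a value has, a narrowed intermediate that feeds several consumers can stay narrow for all of them, which is the case the description says LLVM's own rewrites miss.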

compiler/src/iree/compiler/Dialect/Util/Transforms/Passes.td

compiler/src/iree/compiler/Codegen/Transforms/Transforms.cpp

qedawkins · 2025-01-07T01:36:04Z

compiler/src/iree/compiler/Dialect/Util/Transforms/OptimizeIntArithmetic.cpp

+    ArrayRef<ArrayAttr> castAssumptions = ArrayRef(
+        static_cast<const ArrayAttr *>(assumptions.data()), assumptions.size());
+    auto newOp = rewriter.create<Util::AssumeIntOp>(op.getLoc(), newArgs,
+                                                    castAssumptions);


Can you not just do rewriter.create<Util::AssumeIntOp>(op.getLoc(), newArgs, op.getAssumptions())?

If not, we can either just clone with new operands or add a new builder.

There's a builder missing, and there's not really room to insert one.

wdym there is no room to add a new builder? Can't we add a builder that takes ArrayRef<Value> and ArrayAttr as operands?

... Yeah, might not be a bad idea, could add one.

compiler/src/iree/compiler/Dialect/Util/Transforms/OptimizeIntArithmetic.cpp

benvanik reviewed Dec 12, 2024

compiler/src/iree/compiler/Dialect/Util/Transforms/Passes.td (outdated, resolved)

krzysz00 force-pushed the index-narrowing branch 3 times, most recently from be116ef to ffa5fc4 Compare December 13, 2024 20:21

krzysz00 marked this pull request as ready for review December 17, 2024 17:28

krzysz00 requested review from antiagainst, MaheshRavishankar, kuhar, qedawkins and Groverkss as code owners December 17, 2024 17:28

krzysz00 force-pushed the index-narrowing branch 2 times, most recently from 01736ba to a367857 Compare January 6, 2025 21:47

qedawkins reviewed Jan 7, 2025


krzysz00 added 6 commits January 7, 2025 19:27

Let integer range optimizations narrow to i32

2f7730a

Update tests

64bb39f

Add pattern for narrowing assume.int with i32 inputs

3986602

Remove stray debug print

299522e

Narrow for loops too

acdf14d

Review comments

09d7c2d

krzysz00 force-pushed the index-narrowing branch from a367857 to 09d7c2d Compare January 7, 2025 22:26

I incidentally make a bunch of averagepool tests pass

ecf0ccb


krzysz00 commented Dec 12, 2024

qedawkins Jan 7, 2025

krzysz00 Jan 7, 2025

qedawkins Jan 8, 2025

krzysz00 Jan 8, 2025
