[GlobalOptimization] Fix a silent bug in DetachElementwiseFromNamedOps pass #19356

Merged

Conversation

@zjgarvey zjgarvey commented Dec 3, 2024

This moves the match-failure checks before any modification of the linalg ops, and loosens the requirement that the output tensor be accessed through an identity indexing map.

Context:

Specific depthwise convolution ops were encountering numeric failures. See #18600 and #19339. I noticed that the bias was not affecting the output values, and tracked down where the bias was getting deleted.

The issue is that the DetachElementwiseFromNamedOps pass was modifying the depthwise_conv op to use a zero fill before checking for some match failures. This resulted in a partial application of the pattern, where the original bias never got added back to the modified linalg op's result.
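As a minimal sketch of the safe ordering (assuming MLIR's C++ rewrite-pattern API; the struct name DetachBiasSketch and the specific checks shown are illustrative placeholders, not the actual pass code), all match checks should come before any mutation of the IR:

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Sketch only: not the actual DetachElementwiseFromNamedOps pattern.
struct DetachBiasSketch : OpInterfaceRewritePattern<linalg::LinalgOp> {
  using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(linalg::LinalgOp op,
                                PatternRewriter &rewriter) const override {
    // Do *all* match checks up front; bailing out here is safe
    // because nothing has been mutated yet. The bug was the reverse
    // order: the init operand had already been swapped for a zero
    // fill, so a later bail-out silently dropped the bias.
    if (op->getNumResults() != 1)
      return rewriter.notifyMatchFailure(op, "expected a single result");
    AffineMap outMap = op.getIndexingMapsArray().back();
    // Loosened check: accept a projected permutation on the output
    // access (e.g. (d0, d3, d1, d2)), not only the identity map.
    if (!outMap.isProjectedPermutation())
      return rewriter.notifyMatchFailure(op, "unsupported output map");

    // Only now is it safe to modify the IR: swap the init for a zero
    // fill and add the original bias back to the result (elided here).
    return success();
  }
};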

Specifically, the depthwise conv ops were failing the check that the output tensor is accessed through an identity indexing map.

For example:

module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @torch_jit(%arg0: tensor<1x96x56x56xf32>, %arg1: tensor<96x1x7x7xf32>, %arg2: tensor<96xf32>) -> tensor<1x96x56x56xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %padded = tensor.pad %arg0 low[0, 0, 3, 3] high[0, 0, 3, 3] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
      tensor.yield %cst : f32
    } : tensor<1x96x56x56xf32> to tensor<1x96x62x62xf32>
    %0 = tensor.empty() : tensor<1x96x56x56xf32>
    %broadcasted = linalg.broadcast ins(%arg2 : tensor<96xf32>) outs(%0 : tensor<1x96x56x56xf32>) dimensions = [0, 2, 3] 
    %collapsed = tensor.collapse_shape %arg1 [[0, 1], [2], [3]] : tensor<96x1x7x7xf32> into tensor<96x7x7xf32>
    %1 = linalg.depthwise_conv_2d_nchw_chw {dilations = dense<1> : vector<2xi64>, strides = dense<1> : vector<2xi64>} ins(%padded, %collapsed : tensor<1x96x62x62xf32>, tensor<96x7x7xf32>) outs(%broadcasted : tensor<1x96x56x56xf32>) -> tensor<1x96x56x56xf32>
    return %1 : tensor<1x96x56x56xf32>
  }
}

generalizes to

#map = affine_map<(d0, d1, d2, d3) -> (d1)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1 + d4, d2 + d5)>
#map3 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d3, d4, d5)>
#map4 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1, d2)>
module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @torch_jit(%arg0: tensor<1x96x56x56xf32>, %arg1: tensor<96x1x7x7xf32>, %arg2: tensor<96xf32>) -> tensor<1x96x56x56xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %padded = tensor.pad %arg0 low[0, 0, 3, 3] high[0, 0, 3, 3] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
      tensor.yield %cst : f32
    } : tensor<1x96x56x56xf32> to tensor<1x96x62x62xf32>
    %0 = tensor.empty() : tensor<1x96x56x56xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2 : tensor<96xf32>) outs(%0 : tensor<1x96x56x56xf32>) {
    ^bb0(%in: f32, %out: f32):
      linalg.yield %in : f32
    } -> tensor<1x96x56x56xf32>
    %collapsed = tensor.collapse_shape %arg1 [[0, 1], [2], [3]] : tensor<96x1x7x7xf32> into tensor<96x7x7xf32>
    %2 = linalg.generic {indexing_maps = [#map2, #map3, #map4], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction"]} ins(%padded, %collapsed : tensor<1x96x62x62xf32>, tensor<96x7x7xf32>) outs(%1 : tensor<1x96x56x56xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %3 = arith.mulf %in, %in_0 : f32
      %4 = arith.addf %out, %3 : f32
      linalg.yield %4 : f32
    } -> tensor<1x96x56x56xf32>
    return %2 : tensor<1x96x56x56xf32>
  }
}

For some reason, the channel dim d3 appears after the spatial dims (d1 and d2) in the output map for this particular op, so the output access map is a (non-identity) permutation of the parallel dims.
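To make that concrete, here is a small standalone sketch (assuming an MLIR C++ development setup; this is not code from the PR) showing that #map4 fails isIdentity() but satisfies isProjectedPermutation(), which is the kind of loosened check the fix relies on:

#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/MLIRContext.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  mlir::MLIRContext ctx;
  // #map4 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1, d2)>
  auto d = [&](unsigned i) { return mlir::getAffineDimExpr(i, &ctx); };
  mlir::AffineMap map4 =
      mlir::AffineMap::get(/*dimCount=*/6, /*symbolCount=*/0,
                           {d(0), d(3), d(1), d(2)}, &ctx);
  // Prints "false": the channel dim d3 comes after the spatial dims,
  // so this is not the identity map...
  llvm::outs() << "isIdentity: "
               << (map4.isIdentity() ? "true" : "false") << "\n";
  // ...but it is a projected permutation of the iteration space.
  llvm::outs() << "isProjectedPermutation: "
               << (map4.isProjectedPermutation() ? "true" : "false") << "\n";
  return 0;
}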

…, and report possible match failures before rewriting

Signed-off-by: zjgarvey <[email protected]>
@zjgarvey zjgarvey marked this pull request as ready for review December 3, 2024 22:35
@zjgarvey zjgarvey requested a review from hanhanW as a code owner December 3, 2024 22:35
@zjgarvey zjgarvey requested review from MaheshRavishankar, qedawkins and IanWood1 and removed request for hanhanW December 3, 2024 22:35

@hanhanW hanhanW left a comment


Good catch, and thanks for the fix! I left a few style nits, and I have a question.

“loosens the check for identity map access to the output tensor.”

Why? Is there a test for it?

Signed-off-by: zjgarvey <[email protected]>
@zjgarvey zjgarvey commented Dec 4, 2024

“loosens the check for identity map access to the output tensor.”

Why? Is there a test for it?

The op linalg.depthwise_conv_2d_nchw_chw has a non-identity indexing map for the output tensor, and this example is included in the lit test I added.

See the output tensor indexing map from the initial PR comment:

#map4 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1, d2)>

Did this answer your question? I also verified that the depthwise conv numerics are fixed with this change using the SHARK-TestSuite.

@zjgarvey zjgarvey requested a review from IanWood1 December 4, 2024 18:01
Signed-off-by: zjgarvey <[email protected]>
@zjgarvey zjgarvey merged commit d48071d into iree-org:main Dec 5, 2024
38 checks passed
@ScottTodd (Member) commented:

Nice, this fixed mobilenet through onnx too: https://github.com/iree-org/iree/actions/runs/12166990284/job/33934898182#step:8:381

XPASS iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_mobilenet

(was xfail before this PR/commit: https://github.com/iree-org/iree/actions/runs/12179658638/job/33972710377#step:8:123)


Successfully merging this pull request may close these issues.

[numeric] Numeric failures with Conv operator
Incorrect Numerics for a f32 Depthwise Conv Op