[GlobalOptimization] Fix a silent bug in DetachElementwiseFromNamedOps pass #19356

Merged

Conversation

@zjgarvey zjgarvey commented Dec 3, 2024

This moves the match-failure checks before any modification of the linalg ops, and loosens the requirement that the output tensor be accessed through an identity indexing map.

Context:

Specific depthwise convolution ops were encountering numeric failures. See #18600 and #19339. I noticed that the bias was not affecting the output values, and tracked down where the bias was getting deleted.

The issue is that the DetachElementwiseFromNamedOps pass was modifying the depthwise_conv op to use a zero fill before checking for some match failures. This resulted in a partial application of the pattern, where the original bias never got added back to the modified linalg op's result.
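As a minimal sketch of the safe ordering (assuming MLIR's C++ rewrite-pattern API; the struct name DetachBiasSketch and the specific checks shown are illustrative placeholders, not the actual pass code), all match checks should come before any mutation of the IR:

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Sketch only: not the actual DetachElementwiseFromNamedOps pattern.
struct DetachBiasSketch : OpInterfaceRewritePattern<linalg::LinalgOp> {
  using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(linalg::LinalgOp op,
                                PatternRewriter &rewriter) const override {
    // Do *all* match checks up front; bailing out here is safe
    // because nothing has been mutated yet. The bug was the reverse
    // order: the init operand had already been swapped for a zero
    // fill, so a later bail-out silently dropped the bias.
    if (op->getNumResults() != 1)
      return rewriter.notifyMatchFailure(op, "expected a single result");
    AffineMap outMap = op.getIndexingMapsArray().back();
    // Loosened check: accept a projected permutation on the output
    // access (e.g. (d0, d3, d1, d2)), not only the identity map.
    if (!outMap.isProjectedPermutation())
      return rewriter.notifyMatchFailure(op, "unsupported output map");

    // Only now is it safe to modify the IR: swap the init for a zero
    // fill and add the original bias back to the result (elided here).
    return success();
  }
};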

Specifically, the depthwise conv ops were failing the check that the output tensor is accessed through an identity indexing map.

For example:

module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @torch_jit(%arg0: tensor<1x96x56x56xf32>, %arg1: tensor<96x1x7x7xf32>, %arg2: tensor<96xf32>) -> tensor<1x96x56x56xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %padded = tensor.pad %arg0 low[0, 0, 3, 3] high[0, 0, 3, 3] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
      tensor.yield %cst : f32
    } : tensor<1x96x56x56xf32> to tensor<1x96x62x62xf32>
    %0 = tensor.empty() : tensor<1x96x56x56xf32>
    %broadcasted = linalg.broadcast ins(%arg2 : tensor<96xf32>) outs(%0 : tensor<1x96x56x56xf32>) dimensions = [0, 2, 3] 
    %collapsed = tensor.collapse_shape %arg1 [[0, 1], [2], [3]] : tensor<96x1x7x7xf32> into tensor<96x7x7xf32>
    %1 = linalg.depthwise_conv_2d_nchw_chw {dilations = dense<1> : vector<2xi64>, strides = dense<1> : vector<2xi64>} ins(%padded, %collapsed : tensor<1x96x62x62xf32>, tensor<96x7x7xf32>) outs(%broadcasted : tensor<1x96x56x56xf32>) -> tensor<1x96x56x56xf32>
    return %1 : tensor<1x96x56x56xf32>
  }
}

generalizes to

#map = affine_map<(d0, d1, d2, d3) -> (d1)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1 + d4, d2 + d5)>
#map3 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d3, d4, d5)>
#map4 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1, d2)>
module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @torch_jit(%arg0: tensor<1x96x56x56xf32>, %arg1: tensor<96x1x7x7xf32>, %arg2: tensor<96xf32>) -> tensor<1x96x56x56xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %padded = tensor.pad %arg0 low[0, 0, 3, 3] high[0, 0, 3, 3] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
      tensor.yield %cst : f32
    } : tensor<1x96x56x56xf32> to tensor<1x96x62x62xf32>
    %0 = tensor.empty() : tensor<1x96x56x56xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2 : tensor<96xf32>) outs(%0 : tensor<1x96x56x56xf32>) {
    ^bb0(%in: f32, %out: f32):
      linalg.yield %in : f32
    } -> tensor<1x96x56x56xf32>
    %collapsed = tensor.collapse_shape %arg1 [[0, 1], [2], [3]] : tensor<96x1x7x7xf32> into tensor<96x7x7xf32>
    %2 = linalg.generic {indexing_maps = [#map2, #map3, #map4], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction"]} ins(%padded, %collapsed : tensor<1x96x62x62xf32>, tensor<96x7x7xf32>) outs(%1 : tensor<1x96x56x56xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %3 = arith.mulf %in, %in_0 : f32
      %4 = arith.addf %out, %3 : f32
      linalg.yield %4 : f32
    } -> tensor<1x96x56x56xf32>
    return %2 : tensor<1x96x56x56xf32>
  }
}

For some reason, the channel dim d3 appears after the spatial dims (d1 and d2) in the output map for this particular op, so the output access map is a (non-identity) permutation of the parallel dims.
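To make that concrete, here is a small standalone sketch (assuming an MLIR C++ development setup; this is not code from the PR) showing that #map4 fails isIdentity() but satisfies isProjectedPermutation(), which is the kind of loosened check the fix relies on:

#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/MLIRContext.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  mlir::MLIRContext ctx;
  // #map4 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1, d2)>
  auto d = [&](unsigned i) { return mlir::getAffineDimExpr(i, &ctx); };
  mlir::AffineMap map4 =
      mlir::AffineMap::get(/*dimCount=*/6, /*symbolCount=*/0,
                           {d(0), d(3), d(1), d(2)}, &ctx);
  // Prints "false": the channel dim d3 comes after the spatial dims,
  // so this is not the identity map...
  llvm::outs() << "isIdentity: "
               << (map4.isIdentity() ? "true" : "false") << "\n";
  // ...but it is a projected permutation of the iteration space.
  llvm::outs() << "isProjectedPermutation: "
               << (map4.isProjectedPermutation() ? "true" : "false") << "\n";
  return 0;
}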

…, and report possible match failures before rewriting

Signed-off-by: zjgarvey <[email protected]>
@zjgarvey zjgarvey marked this pull request as ready for review December 3, 2024 22:35
@zjgarvey zjgarvey requested a review from hanhanW as a code owner December 3, 2024 22:35
@zjgarvey zjgarvey requested review from MaheshRavishankar, qedawkins and IanWood1 and removed request for hanhanW December 3, 2024 22:35

@hanhanW hanhanW left a comment


Good catch, and thanks for the fix! I left a few style nits, and I have a question.

“loosens the check for identity map access to the output tensor.”

Why? Is there a test for it?

Signed-off-by: zjgarvey <[email protected]>
@zjgarvey zjgarvey commented Dec 4, 2024

“loosens the check for identity map access to the output tensor.”

Why? Is there a test for it?

The op linalg.depthwise_conv_2d_nchw_chw has a non-identity indexing map for the output tensor, and this example is included in the lit test I added.

See the output tensor indexing map from the initial PR comment:

#map4 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d3, d1, d2)>

Did this answer your question? I also verified that the depthwise conv numerics are fixed with this change using the SHARK-TestSuite.

@zjgarvey zjgarvey requested a review from IanWood1 December 4, 2024 18:01
Signed-off-by: zjgarvey <[email protected]>
@zjgarvey zjgarvey merged commit d48071d into iree-org:main Dec 5, 2024
38 checks passed
@ScottTodd (Member) commented:

Nice, this fixed mobilenet through onnx too: https://github.com/iree-org/iree/actions/runs/12166990284/job/33934898182#step:8:381

XPASS iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_mobilenet

(was xfail before this PR/commit: https://github.com/iree-org/iree/actions/runs/12179658638/job/33972710377#step:8:123)


Successfully merging this pull request may close these issues.

[numeric] Numeric failures with Conv operator
Incorrect Numerics for a f32 Depthwise Conv Op