-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NpuSyncOp generation to AIEDmaToNpu #1114
base: main
Are you sure you want to change the base?
Conversation
Coverage ReportCreated: 2024-06-04 22:05Click here for information about interpreting this report.
Generated by llvm-cov -- llvm version 14.0.0 |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
for (auto dma : dmas) { | ||
if (auto infoOp = | ||
getAllocOpForSymbol(shimDmaAllocOps, dma.getMetadata())) { | ||
if (infoOp->getChannelDir() == AIE::DMAChannelDir::S2MM) { | ||
// Found dma op copying results to host | ||
OpBuilder builder(dma); | ||
auto col = builder.getI32IntegerAttr(infoOp->getCol()); | ||
auto row = builder.getI32IntegerAttr(0); | ||
auto dir = builder.getI32IntegerAttr(0); | ||
auto chan = builder.getI32IntegerAttr(infoOp->getChannelIndex()); | ||
auto col_num = builder.getI32IntegerAttr(1); | ||
auto row_num = builder.getI32IntegerAttr(1); | ||
builder.setInsertionPoint(returnOp); | ||
builder.create<AIEX::NpuSyncOp>(dma->getLoc(), col, row, dir, chan, | ||
col_num, row_num); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to unconditionally add a sync to every outgoing memcpy. For example, what if we are collecting N output tiles at the shim and only need to sync at the end? The N-1 extraneous syncs will have a performance penalty vs. the single (manually inserted) sync at the end.
@@ -59,7 +59,6 @@ def sequence(A, B, C): | |||
npu_dma_memcpy_nd( | |||
metadata="in", bd_id=1, mem=A, sizes=[1, K, M, 1], strides=[1, 1, K] | |||
) | |||
npu_sync(column=0, row=0, direction=0, channel=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer that the tests keep the sync explicit, but using aiex.npu.dma_wait
instead of aiex.npu.sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer that the tests keep the sync explicit, but using
aiex.npu.dma_wait
instead ofaiex.npu.sync
Should we close this PR in favor of #1791 ? Or is there still a desire to insert the sync/wait automatically?
Designs that use objectfifos can rely on the
ShimDMAAllocationOps
generated by the objectfifo lowering to produce the correspondingNpuSyncOps
. Other designs will still require that the sync be added explicitly.This PR works in relation with the MLIR-AIR channel-to-objectfifo lowering to ensure that the
NpuSyncOps
are added at the MIR-AIE level, after the mapping decisions done by the objectififo lowering.