Some questions about the number of Compute tiles? #1964
Hi! If you use the device type
Hi @jgmelber, thanks. I tried it, but I get the error shown in the following figure. Could you help me with it? Is there an example I can follow? Thanks a lot!
@ngdymx can you share the entire design file, or at least the
Hi @jgmelber, sorry for the late reply. I wrote a passthrough kernel to test it. It works well when I set the dev is
Case 1: dev = AIEDevice.npu1_1col
/*************************/
ShimTile = tile(0, 0)
ComputeTile = tile(0, 2)
# To/from AIE-array data movement
@runtime_sequence(tensor_ty, tensor_ty)
def sequence(A, C):
    npu_dma_memcpy_nd(metadata=of_in, bd_id=1, mem=A, sizes=[1, 1, 1, N], issue_token=True)
    npu_dma_memcpy_nd(metadata=of_out, bd_id=0, mem=C, sizes=[1, 1, 1, N], issue_token=True)
    dma_wait(of_in, of_out)
Case 2: dev = AIEDevice.npu1
/*************************/
# Tile declarations
ShimTile = tile(1, 0)
ComputeTile = tile(1, 2)
Here is the project:
#include <aie_api/aie.hpp>
template <typename T_in, typename T_out>
void passthrough_aie(const T_in *__restrict in0, T_out *__restrict out, const int N) {
  for (int i = 0; i < N; i++) {
    out[i] = in0[i];
  }
}
extern "C" {
void passThrough(const int32_t *__restrict in0, int32_t *__restrict out, const int N) {
  passthrough_aie<int32_t, int32_t>(in0, out, N);
}
}
Then the working MLIR code:
import numpy as np
from aie.dialects.aie import *
from aie.dialects.aiex import *
from aie.helpers.dialects.ext.scf import _for as range_
from aie.extras.context import mlir_mod_ctx
import sys
N = 64
dev = AIEDevice.npu1_1col
def pass_loops():
    with mlir_mod_ctx() as ctx:
        @device(dev)
        def device_body():
            tensor_ty = np.ndarray[(N,), np.dtype[np.int32]]
            # Tile declarations
            ShimTile = tile(0, 0)
            ComputeTile = tile(0, 2)
            # AIE-array data movement with object fifos
            of_in = object_fifo("in", ShimTile, ComputeTile, 2, tensor_ty)
            of_out = object_fifo("out", ComputeTile, ShimTile, 2, tensor_ty)
            # AIE Core Function declarations
            passthrough = external_func(
                "passThrough", inputs=[tensor_ty, tensor_ty, np.int32]
            )
            # Set up compute tiles
            @core(ComputeTile, "passThrough.o")
            def core_body():
                for _ in range_(sys.maxsize):
                    elemIn = of_in.acquire(ObjectFifoPort.Consume, 1)
                    elemOut = of_out.acquire(ObjectFifoPort.Produce, 1)
                    passthrough(elemIn, elemOut, N)
                    of_out.release(ObjectFifoPort.Produce, 1)
                    of_in.release(ObjectFifoPort.Consume, 1)
            # To/from AIE-array data movement
            @runtime_sequence(tensor_ty, tensor_ty)
            def sequence(A, C):
                npu_dma_memcpy_nd(
                    metadata=of_in, bd_id=1, mem=A, sizes=[1, 1, 1, N], issue_token=True
                )
                npu_dma_memcpy_nd(metadata=of_out, bd_id=0, mem=C, sizes=[1, 1, 1, N], issue_token=True)
                dma_wait(of_in, of_out)
        print(ctx.module)
pass_loops()
The host code:
#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include "test_utils.h"
#include "xrt/xrt_bo.h"
#ifndef DATATYPES_USING_DEFINED
#define DATATYPES_USING_DEFINED
// ------------------------------------------------------
// Configure this to match your buffer data type
// ------------------------------------------------------
using DATATYPE = std::int32_t;
#endif
#define PASSTHROUGH_SIZE 64
namespace po = boost::program_options;
int main(int argc, const char *argv[]) {
  // Program arguments parsing
  po::options_description desc("Allowed options");
  po::variables_map vm;
  test_utils::add_default_options(desc);
  test_utils::parse_options(argc, argv, desc, vm);
  int verbosity = vm["verbosity"].as<int>();
  int trace_size = vm["trace_sz"].as<int>();
  std::cout << std::endl << "Running...\n";
  // Load instruction sequence
  std::vector<uint32_t> instr_v =
      test_utils::load_instr_sequence(vm["instr"].as<std::string>());
  if (verbosity >= 1)
    std::cout << "Sequence instr count: " << instr_v.size() << "\n";
  // Start the XRT context and load the kernel
  xrt::device device;
  xrt::kernel kernel;
  test_utils::init_xrt_load_kernel(device, kernel, verbosity,
                                   vm["xclbin"].as<std::string>(),
                                   vm["kernel"].as<std::string>());
  // set up the buffer objects
  auto bo_instr = xrt::bo(device, instr_v.size() * sizeof(int),
                          XCL_BO_FLAGS_CACHEABLE, kernel.group_id(1));
  auto bo_inA = xrt::bo(device, PASSTHROUGH_SIZE * sizeof(DATATYPE),
                        XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(3));
  auto bo_out =
      xrt::bo(device, PASSTHROUGH_SIZE * sizeof(DATATYPE) + trace_size,
              XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(4));
  if (verbosity >= 1)
    std::cout << "Writing data into buffer objects.\n";
  // Copy instruction stream to xrt buffer object
  void *bufInstr = bo_instr.map<void *>();
  memcpy(bufInstr, instr_v.data(), instr_v.size() * sizeof(int));
  // Initialize buffer bo_inA
  DATATYPE *bufInA = bo_inA.map<DATATYPE *>();
  printf("Input:\n");
  for (int i = 0; i < PASSTHROUGH_SIZE; i++) {
    bufInA[i] = i;
  }
  for (int i = 0; i < PASSTHROUGH_SIZE; i++) {
    printf("%d", bufInA[i]);
  }
  // Zero out buffer bo_out
  DATATYPE *bufOut = bo_out.map<DATATYPE *>();
  memset(bufOut, 0, PASSTHROUGH_SIZE * sizeof(DATATYPE) + trace_size);
  // sync host to device memories
  bo_instr.sync(XCL_BO_SYNC_BO_TO_DEVICE);
  bo_inA.sync(XCL_BO_SYNC_BO_TO_DEVICE);
  bo_out.sync(XCL_BO_SYNC_BO_TO_DEVICE);
  printf("\n");
  // Execute the kernel and wait to finish
  std::cout << "Running Kernel.\n";
  unsigned int opcode = 3;
  auto run = kernel(opcode, bo_instr, instr_v.size(), bo_inA, bo_out);
  run.wait();
  // Sync device to host memories
  bo_out.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
  // Compare out to in
  int errors = 0;
  for (int i = 0; i < PASSTHROUGH_SIZE; i++) {
    if (bufOut[i] != bufInA[i])
      errors++;
  }
  printf("Out:\n");
  for (int i = 0; i < PASSTHROUGH_SIZE; i++) {
    printf("%d", bufOut[i]);
  }
  if (trace_size > 0) {
    test_utils::write_out_trace(((char *)bufOut) +
                                    (PASSTHROUGH_SIZE * sizeof(DATATYPE)),
                                trace_size, vm["trace_file"].as<std::string>());
  }
  // Print Pass/Fail result of our test
  if (!errors) {
    std::cout << std::endl << "PASS!" << std::endl << std::endl;
    return 0;
  } else {
    std::cout << std::endl
              << errors << " mismatches." << std::endl
              << std::endl;
    std::cout << std::endl << "fail." << std::endl << std::endl;
    return 1;
  }
}
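The visible difference between the Case 1 and Case 2 snippets above is the device target and the column index of the two tiles. A minimal Python sketch, independent of the aie package, makes that explicit. `TilePlacement` and `placement_for_column` are hypothetical helper names used only for illustration; they are not part of mlir-aie, and the sketch says nothing about which columns a given device target actually exposes.

```python
from typing import NamedTuple, Tuple


class TilePlacement(NamedTuple):
    """Hypothetical container for the (column, row) pairs passed to tile()."""
    shim: Tuple[int, int]
    compute: Tuple[int, int]


def placement_for_column(col: int, compute_row: int = 2) -> TilePlacement:
    """Place the shim tile in row 0 and the compute tile in the same column."""
    if col < 0:
        raise ValueError("column index must be non-negative")
    return TilePlacement(shim=(col, 0), compute=(col, compute_row))


# Case 1 in the thread: tile(0, 0) and tile(0, 2)
case1 = placement_for_column(0)
# Case 2 in the thread: tile(1, 0) and tile(1, 2)
case2 = placement_for_column(1)

print("Case 1:", case1)
print("Case 2:", case2)
```

In a design file, the two coordinate pairs would be unpacked into the `tile(col, row)` calls, so switching columns becomes a one-argument change.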
Thanks for the code, we are looking into it.
Thanks!
Hello! Just to follow up on this: after some investigation, it turned out that there were two related issues here. One PR has been merged into the main branch; the other is well on its way but requires some last cleanups, which will happen after the holidays. Once that PR is in, your design should work!
Hi, great, thank you!
Hi team,
I read the tutorial and learned that the device present in HawkPoint (e.g., 8040HS) SoCs has 5 columns and 6 rows, as shown below:
However, I cannot control the compute tiles (CTs) in the leftmost column. I want to confirm whether that column is inaccessible. Is there any way to access and control it? Thanks a lot!
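For readers following the coordinates, here is a small Python sketch that enumerates the 5 column by 6 row grid quoted from the tutorial and lists the compute-tile coordinates in column 0. The row roles are an assumption: row 0 as shim and row 2 as compute match the `tile(0, 0)` and `tile(0, 2)` declarations in this thread, while row 1 as a memory tile and rows 3 to 5 as compute are assumed rather than stated here. The sketch only enumerates coordinates; it does not show whether a given device target exposes column 0.

```python
COLUMNS, ROWS = 5, 6  # grid size as quoted from the tutorial in this issue


def tile_kind(row: int) -> str:
    """Assumed row roles: 0 = shim, 1 = memory tile, 2..5 = compute."""
    if row == 0:
        return "shim"
    if row == 1:
        return "mem"
    return "compute"


# Map every (column, row) coordinate to its assumed tile kind.
grid = {(c, r): tile_kind(r) for c in range(COLUMNS) for r in range(ROWS)}

# Compute tiles in the leftmost column (column index 0).
leftmost_compute = sorted(
    coord for coord, kind in grid.items() if kind == "compute" and coord[0] == 0
)

print("total tiles:", len(grid))
print("leftmost-column compute tiles:", leftmost_compute)
```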
Additionally, the following is noted about device partitions: