Commit ecb8238

Refactors trace operations to be more self-contained, separates frontend/trace tensors more cleanly

- Refactors the Trace operation so that it reports how many outputs it generates
  instead of requiring the caller to know.

  The trace op is now also responsible for creating its own output trace
  tensors. Additionally, `build`/`build_internal` have been removed, meaning
  the trace does *not* create frontend tensors anymore.

  Frontend tensors no longer create trace tensors directly but instead only
  interface with ops and wrap their outputs as needed.

- Consolidates and renames some frontend Tensor constructors to better reflect
  their purpose. For example, `create_directly` -> `fast_init`.

- Temporarily removes the "how to add ops" guide. A new version of this will be
  written once we have switched to the TRT dialect, which will significantly affect
  how ops are added.
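The new division of responsibilities described above can be sketched roughly as follows. This is an illustrative mock-up, not the actual nvtripy internals: the names `TraceOp`, `TraceTensor`, `SplitOp`, `infer_num_outputs`, and `wrap_op` are assumptions made for the example; only the general shape (the op reports its output count and builds its own trace tensors, while the frontend merely wraps them) comes from the commit message.

```python
# Hedged sketch of the refactored flow (names are hypothetical, not nvtripy's):
# the trace op reports its own output count and creates its trace tensors;
# the frontend tensor only wraps whatever the op produces.

class TraceTensor:
    """Minimal stand-in for a trace-level tensor."""

    def __init__(self, producer, index):
        self.producer = producer  # the op that generated this tensor
        self.index = index        # position among the op's outputs


class TraceOp:
    """Base class: each op declares how many outputs it generates."""

    def infer_num_outputs(self):
        return 1  # default; ops with multiple outputs override this

    def create_outputs(self):
        # The op, not the caller, creates its output trace tensors.
        self.outputs = [TraceTensor(self, i) for i in range(self.infer_num_outputs())]
        return self.outputs


class SplitOp(TraceOp):
    """Example op whose output count depends on its parameters."""

    def __init__(self, num_splits):
        self.num_splits = num_splits

    def infer_num_outputs(self):
        return self.num_splits


class Tensor:
    """Frontend tensor: wraps trace tensors instead of building them itself."""

    def __init__(self, trace_tensor):
        self.trace_tensor = trace_tensor

    @staticmethod
    def wrap_op(op):
        # Frontend only interfaces with the op and wraps its outputs.
        return [Tensor(t) for t in op.create_outputs()]


outputs = Tensor.wrap_op(SplitOp(num_splits=3))
```

With this split, callers never need to know an op's output arity up front; they ask the op, which keeps that knowledge in one place.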
pranavm-nvidia committed Feb 3, 2025
1 parent 6ece9ed commit ecb8238
Showing 50 changed files with 262 additions and 638 deletions.
3 changes: 0 additions & 3 deletions tripy/CONTRIBUTING.md
@@ -74,9 +74,6 @@ We've written developer guides to help you understand the codebase:
   [architecture](https://nvidia.github.io/TensorRT-Incubator/post0_developer_guides/architecture.html)
   documentation.
 
-- If you need to add a new operation, refer to
-  [this guide](https://nvidia.github.io/TensorRT-Incubator/post0_developer_guides/how-to-add-new-ops.html).
-
 
 ### Tests
 
351 changes: 0 additions & 351 deletions tripy/docs/post0_developer_guides/how-to-add-new-ops.md

This file was deleted.

4 changes: 2 additions & 2 deletions tripy/notebooks/resnet50.ipynb
@@ -23,7 +23,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-    "!python3 -m pip install nvtripy -f https://nvidia.github.io/TensorRT-Incubator/packages.html"
+    "%pip install nvtripy -f https://nvidia.github.io/TensorRT-Incubator/packages.html"
     ]
    },
    {
@@ -39,7 +39,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-    "!pip install \"datasets==2.18.0\" \"matplotlib>=3.9.1\" \"pillow>=9.4.0\" \"transformers==4.46.2\" \"torch>=2.3.1\""
+    "%pip install \"datasets==2.18.0\" \"matplotlib>=3.9.1\" \"pillow>=9.4.0\" \"transformers==4.46.2\" \"torch>=2.3.1\""
     ]
    },
    {
2 changes: 1 addition & 1 deletion tripy/nvtripy/backend/api/executable.py
@@ -229,7 +229,7 @@ def add(a, b):
 
         raise
 
-    output_tensors = [Tensor.create_directly(output, fetch_stack_info=False) for output in executor_outputs]
+    output_tensors = [Tensor.fast_init(output) for output in executor_outputs]
     if len(output_tensors) == 1:
         output_tensors = output_tensors[0]
     return output_tensors
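The `create_directly` -> `fast_init` rename in this hunk suggests a lightweight constructor for the hot path of wrapping executable outputs, one that skips the stack-info collection the old `fetch_stack_info=False` flag disabled. A hedged sketch of that idea follows; the real `Tensor.fast_init` signature and internals in nvtripy may differ.

```python
# Hypothetical sketch of a "fast" alternate constructor (not nvtripy's actual code):
# normal construction captures a stack trace for error reporting, which is
# relatively expensive; fast_init bypasses __init__ and skips it entirely.
import traceback


class Tensor:
    def __init__(self, data):
        self.data = data
        # Regular path: capture the call stack for better error messages.
        self.stack_info = traceback.format_stack()

    @classmethod
    def fast_init(cls, data):
        # Bypass __init__ so no stack capture happens on the hot path.
        instance = cls.__new__(cls)
        instance.data = data
        instance.stack_info = None
        return instance


# Wrapping many executor outputs cheaply, as in the hunk above:
outputs = [Tensor.fast_init(x) for x in [1, 2, 3]]
```

The `cls.__new__(cls)` trick is a common Python pattern for alternate constructors that must avoid `__init__` side effects; the trade-off is that tensors built this way carry no stack info for diagnostics.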
2 changes: 1 addition & 1 deletion tripy/nvtripy/backend/mlir/executor.py
@@ -91,7 +91,7 @@ def _get_output_tensor_info(self, outputs_runtime_shape, output_devices):
 
         output_device = output_devices[index]
         if not output_device:
-            output_device = device.create_directly(
+            output_device = device.fast_init(
                 "gpu" if memref.address_space == runtime.PointerType.device else "cpu", 0
             )