Add ukernel selection logic + clean up KleidiAI integration #1652
Conversation
@@ -98,7 +98,7 @@ LinearTilingParams get_default_linear_tiling_params(
   TORCHAO_CHECK(num_threads >= 1, "num_threads must be >= 1");

   tiling_params.mc_by_mr = 1;
-  int mc = tiling_params.mc_by_mr * ukernel_config.mr;
+  int mc = tiling_params.mc_by_mr * ukernel_config.kernels[0].mr;
ukernel_config now includes an array of kernels based on mr. Still need to add mr selection logic here, for now it just selects the first one.
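A minimal sketch of what that selection could look like, assuming the kernels array is sorted by ascending mr and that unused slots have mr == 0 (both assumptions; kernels[0] stays the fallback):

// Sketch only: pick the largest registered mr that does not exceed m.
// Assumes kernels is sorted by ascending mr and unused slots have mr == 0.
inline int select_kernel_idx(const UKernelConfig& ukernel_config, int m) {
  int idx = 0;
  for (size_t i = 0; i < ukernel_config.kernels.size(); i++) {
    const auto& kernel = ukernel_config.kernels[i];
    if (kernel.mr > 0 && kernel.mr <= m) {
      idx = static_cast<int>(i);
    }
  }
  return idx;
}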
static UKernelConfigCacheType ukernel_config_cache;

// Check cache
auto it = ukernel_config_cache.find(header);
If we want uarch specific kernel per core, we can add uarch to cache key and look up uarch before looking in cache, e.g.,
auto uarch = get_current_core_uarch();
auto it = ukernel_config_cache.find({header, uarch});
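A rough sketch of that idea, reusing the existing cache type and falling back to core 0's uarch where cpuinfo cannot report the current core (the per-uarch map is illustrative, not what the PR does; needs <map>):

// Sketch only: one header-keyed cache per uarch. UKernelConfigCacheType is
// the existing cache type in this file; the outer key is the cpuinfo uarch.
static std::map<int, UKernelConfigCacheType> ukernel_config_cache_by_uarch;

const struct cpuinfo_core* core = cpuinfo_get_current_core();  // may be null off-Linux
auto uarch = core ? core->uarch : cpuinfo_get_core(0)->uarch;
auto& cache = ukernel_config_cache_by_uarch[static_cast<int>(uarch)];
auto it = cache.find(header);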
torchao/experimental/CMakeLists.txt (Outdated)

@@ -22,7 +22,7 @@ if(NOT TORCHAO_INCLUDE_DIRS)
   set(TORCHAO_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/../..)
 endif()

-option(TORCHAO_BUILD_KLEIDIAI "Download, build, and link against Arm KleidiAI library (arm64 only)" OFF)
+option(TORCHAO_BUILD_KLEIDIAI "Download, build, and link against Arm KleidiAI library (arm64 only)" ON)
TODO: nocommit
# print(f"actual_val={actual_val}, expected_val={expected_val}")
# self.assertTrue(torch.allclose(actual_val, expected_val, atol=1e-6))

self.assertTrue(torch.abs(actual_val - expected_val) < 0.05)
Do not commit this change. It is needed because kleidi uses bf16 instead of fp32.
    0});
}

struct KleidiAIPackingParams {
TODO: check if these packing params are sufficient for all kleidi.
ukernel_config_cache[key] = torchao::ops::linear_8bit_act_xbit_weight::UKernelConfig{
    /*preferred_alignment*/16,
    /*weight_packing*/
    {
We can rework the kleidiai integration to share weight packing, rather than repeat in each namespace.
It is shared in code, but exposed along with the kernel so you don't have to map it back to the kernel at call sites.
It is in shared code, but not in a way that is convenient to access with shared mr kernels because the same packing function (indexed by nr, kr, sr) is given 4 different names (based on namespace).
So we could refactor it to make one packing function in kai_matmul_clamp_f32_qai8dxp_qsi4c32p, rather than have them in further specific namespaces?
    /*kernels*/
    {{
      {
        /*mr*/static_cast<int>(uk.get_m_step()),
List of methods indexed by mr.
Good start. Please also think some more about code organization for taking on a lot more kernels, and about scalability in general.
@@ -8,13 +8,23 @@ cmake_minimum_required(VERSION 3.19)

include(${CMAKE_CURRENT_SOURCE_DIR}/../../Utils.cmake)

add_compile_options(-Wno-unused-function -Wno-unused-variable) # For some reason cpuinfo package has unused functions/variables
Fix it upstream?
assert (sr == uk.get_sr());

ukernel_config_cache[key] = torchao::ops::linear_8bit_act_xbit_weight::UKernelConfig{
    /*preferred_alignment*/16,
Nit:

Suggested change:
- /*preferred_alignment*/16,
+ /*preferred_alignment*/uk.get_preferred_alignment(),
bucket_size = get_bucket_size(uarch)
if bucket_size == 0 && cpu_info_has_i8mm() {
}
#if defined(TORCHAO_ENABLE_KLEIDI)
  if (!target || *target == "kleidi_ai") {
    if (weight_nbit == 4 && !has_weight_zeros) {
      return torchao::ops::linear_8bit_act_xbit_weight::get_packed_weights_format_kleidi_ai(weight_nbit, has_weight_zeros, /*has_bias*/true, /*nr*/8, /*kr*/16, /*sr*/2);
In the future we would have to make a choice for nr based on CPU type (or some static choice for AOT weight packing like this), and register [mr] kernels, which you are already planning.
Yes, we can use any method in cpuinfo to select packed_weights_format, including any packing params like nr. This is not entirely static because universal is only selected if cpuinfo_has_arm_neon_dot is available. We could also use fields from uarch to select things here I guess?
I wonder if we should pass n and k as params in addition to target. Implementers can then take into account matrix size when selecting nr?
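A sketch of what such a selection could look like, using n as a hint (the specific nr values and thresholds below are illustrative placeholders, not what the PR selects):

// Illustrative only: pick nr from cpuinfo features plus the weight matrix
// width n. Values and thresholds are placeholders.
int select_nr(int n) {
  if (cpuinfo_has_arm_i8mm() && n >= 8) {
    return 8;   // wider tiles pay off once n is large enough
  }
  if (cpuinfo_has_arm_neon_dot()) {
    return 4;   // placeholder value for dotprod-only cores
  }
  return 8;     // fall back to the current static choice
}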
  // ukernel must behave correctly no matter how buffers are aligned
  size_t preferred_alignment{0};
  weight_packing_config weight_packing;
  std::array<kernel_config, 4> kernels;
Nit:

Suggested change:
- std::array<kernel_config, 4> kernels;
+ std::array<kernel_config, MAX_MR_TYPES> kernels;
  weight_data_size_fn_type weight_data_size_fn{nullptr};
  prepare_weight_data_fn_type prepare_weight_data_fn{nullptr};
};
struct kernel_config {
It makes sense that you have one packing kernel and N GEMM kernels indexed by mr, but the naming makes this confusing to read, i.e. ukernel->kernel[mr].mr.
  // preferred_alignment for activation and weight data
  // Integration surfaces are not required to respect this alignment, and the
  // ukernel must behave correctly no matter how buffers are aligned
  size_t preferred_alignment{0};
We have to make sure this is the same for all mr values, i.e. document and test it.
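One way to enforce that at registration time, assuming every mr variant exposes the get_preferred_alignment() accessor suggested above (a hypothetical accessor; KleidiAI's real interface may differ; needs <array> and <cassert>):

// Hypothetical guard: all mr variants registered into one UKernelConfig must
// agree on preferred_alignment before the shared field is written once.
template <typename... Ukernels>
size_t common_preferred_alignment(const Ukernels&... uks) {
  const std::array<size_t, sizeof...(uks)> alignments{uks.get_preferred_alignment()...};
  for (size_t a : alignments) {
    assert(a == alignments[0] && "mr variants disagree on preferred_alignment");
  }
  return alignments[0];
}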
    /*kernels*/
    {{
      {
        /*mr*/static_cast<int>(uk.get_m_step()),
In the future, when querying for mr(s), we should ensure their weight packing function pointer is the same.
This goes to the comment about reworking the KleidiAI integration, I guess? Let me give some more thought to breaking some code out.
Adding @kimishpatel because he was curious about the PR. kernel_selector.h is the main code to pay attention to for runtime kernel selection.
torchao::ops::linear_8bit_act_xbit_weight::UKernelConfig select_ukernel_config(torchao::ops::PackedWeightsFormat format) {
  static UKernelConfigRegistrationTable table;

  // In future, we can populate this with the current thread's uarch
Added uarch to the kernel selection cache, although it is currently just set to unknown, so the cache is effectively keyed on format.
DEFINE_WEIGHT_DATA_FNS(/*nr*/8, /*kr*/16, /*sr*/2)
@digantdesai this file is my draft reworking of the KleidiAI integration. Weight packing and activation functions are no longer in ISA/kernel-specific namespaces because many kernels share the same routines.
Kernel functions and ukernel configs are defined using macros. I would like DEFINE_KERNEL_FNS to be parametrized by things like mr, nr, and instruction (dotprod/i8mm), but I don't fully follow the KleidiAI naming convention, so for now it is indexed by first/suffix.
This looks good. An alternative would be to code-gen these wrappers at compile time, but this is clean enough.
#define DEFINE_WEIGHT_DATA_FNS(nr, kr, sr) \
  size_t weight_data_size_nr##nr##_kr##kr##_sr##sr(int n, int k, int group_size) { \
    return weight_data_size(nr, kr, sr, n, k, group_size); \
  } \
  void prepare_weight_data_nr##nr##_kr##kr##_sr##sr( \
      void* weight_data, \
      int n, \
      int k, \
      int group_size, \
      const int8_t* weight_qvals, \
      const float* weight_scales, \
      const int8_t* weight_zeros, \
      const float* bias) { \
    prepare_weight_data(nr, kr, sr, weight_data, n, k, group_size, weight_qvals, weight_scales, weight_zeros, bias); \
  }
Torture!
Suggested change: rename DEFINE_WEIGHT_DATA_FNS to DEFINE_WEIGHT_DATA_FN; the macro body is otherwise unchanged.
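For reference, an invocation like the one further down in this file stamps out concrete nr/kr/sr-suffixed symbols; the expansion below is just how the ## pasting resolves, not new API:

DEFINE_WEIGHT_DATA_FNS(/*nr*/8, /*kr*/16, /*sr*/2)
// roughly expands to:
//   size_t weight_data_size_nr8_kr16_sr2(int n, int k, int group_size);
//   void prepare_weight_data_nr8_kr16_sr2(void* weight_data, int n, int k,
//       int group_size, const int8_t* weight_qvals, const float* weight_scales,
//       const int8_t* weight_zeros, const float* bias);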
#define DEFINE_KERNEL_FNS(first, suffix) \
  namespace impl_##suffix { \
  const Ukernel get_ukernel() { \
    return Ukernel{ \
        .get_m_step = kai_get_m_step_matmul_clamp_f32_qai8dxp##first##_qsi4c32p##suffix, \
Passing in first (i.e. the lhs) as qai8dxp1x8 instead of 1x8 is better because (1) it is more meaningful, and (2) it can also cover channel-wise 4-bit quant, i.e. QC4W. Also, suffix should be three different things: rhs + output tile x kacc + isa, where rhs is not 8x8 but qsi4c32p4x8.
Suggested change:
- #define DEFINE_KERNEL_FNS(first, suffix) \
-   namespace impl_##suffix { \
-   const Ukernel get_ukernel() { \
-     return Ukernel{ \
-         .get_m_step = kai_get_m_step_matmul_clamp_f32_qai8dxp##first##_qsi4c32p##suffix, \
+ #define DEFINE_KLEIDI_KERNEL_FN(lhs, suffix) \
+   namespace impl_##suffix { \
+   const Ukernel get_ukernel() { \
+     return Ukernel{ \
+         .get_m_step = kai_get_m_step_matmul_clamp_f32_##lhs##_##suffix, \
}

// TODO: first and suffix need to be better, e.g., parametrized by mr, nr, etc
// But I don't quite follow the naming convention for KleidiAI
Naming convention: kai_matmul_<fused_ops>_<dst_info>_<lhs_info>_<rhs_info>_<mr x nr x kacc>_<technology>_<feature>_<instruction>
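As a worked example of that convention, here is one reading of a dotprod kernel name of the form this integration builds from first/suffix (the segment split is my interpretation and is illustrative only):

kai_matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod
  fused_ops                      = clamp
  dst_info                       = f32
  lhs_info                       = qai8dxp1x8
  rhs_info                       = qsi4c32p8x8
  mr x nr x kacc                 = 1x8x32
  technology/feature/instruction = neon_dotprod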
@digantdesai I can rebase this PR on #1723 (which contains formatting changes based on the fbcode formatter). That should make it easier to review, because many of the changes here are just formatting. This PR cleans up the KleidiAI integration; the purpose of the cleanup is to make the ukernel selection logic, the main point of this PR, cleaner.
Looks good to me. Thanks Scott. Left some comments.
  sh build_and_run_tests.sh
  rm -rf /tmp/cmake-out
  popd
- name: Run torchao/experimental/ops/tests
Thank you!
size_t roundup(size_t a, size_t b) { return ((a + b - 1) / b) * b; }
namespace internal {

inline size_t roundup(size_t a, size_t b) { return ((a + b - 1) / b) * b; }
Nit: move it somewhere more shared, like a common utils header?
Created issue: #1744
roundup isn't just used here but in other places as well, and I'd rather unify them as part of one effort.
  } \
}

DEFINE_KERNEL_STRUCT(
Ideally we should wrap these in TORCHAO_ENABLE_ARM_DOTPROD
Filed issue: #1743
#endif // TORCHAO_ENABLE_ARM_I8MM

if (cpuinfo_has_arm_neon_dot()) {
  constexpr int n_step = 8;
Unfortunate that we can't do get_n_step() or get_n_step(nr) here :\
  }
}

// Not thread safe
mutex?
I think we can add thread safety when it's needed. It's currently used on the main thread only.
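If/when that changes, a minimal sketch of the guard (not part of this PR; the only new names are the mutex and lock, and it needs <mutex>):

// Sketch only: serialize cache lookup + registration so select_ukernel_config
// could be called off the main thread.
static UKernelConfigRegistrationTable table;
static std::mutex table_mutex;
std::lock_guard<std::mutex> guard(table_mutex);
// ... existing lookup / registration against `table` unchanged ...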
// Note, cpuinfo_get_current_core() is not currently implemented outside of
// Linux. XNNPACK often uses non-core-specific logic like
// cpuinfo_get_core(0)->uarch in configs.
drop or move it to commit msg?
I'd rather leave it for now, especially if we plan to add uarch differentiation soon. Otherwise, someone might try to do: cpuinfo_get_current_core()->uarch, with bad results on Apple platforms.
    (!has_weight_zeros)) { // TODO: add has_bias here
  return PackedWeightsFormat(
      torchao::ops::PackedWeightsType::kleidi_ai, weight_nbit,
      has_weight_zeros, /*has_bias*/ true, /*nr*/ 8, /*kr*/ 16, /*sr*/ 2);
check has_bias == True and use that? Re. your TODO comment, do you mean the wiring from TorchAO to the Op for the bias?
Added a comment on the bias issue to track this: #1675
has_bias is always false right now, so this would never be selected if we rely on has_bias. But if a null bias ptr is passed to KleidiAI, we construct a bias of zeros and include it in the packed weights.
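Roughly, the behavior described is the following (variable names are illustrative; the real code lives in the KleidiAI packing wrapper and needs <vector>):

// If the caller did not provide a bias, substitute zeros so the KleidiAI
// packing routine, which expects a bias pointer, still gets valid data.
std::vector<float> zero_bias;
if (bias == nullptr) {
  zero_bias.assign(n, 0.0f);  // one bias value per output channel
  bias = zero_bias.data();
}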
@@ -31,7 +31,7 @@ PackWeightDataTilingParams get_default_pack_weight_data_tiling_params(
   assert(nc >= 1);

   // Replace nc with the next number nr divides
-  nc = ((nc + ukernel_config.nr - 1) / ukernel_config.nr) * ukernel_config.nr;
+  nc = ((nc + nr - 1) / nr) * nr;
Suggested change:
- nc = ((nc + nr - 1) / nr) * nr;
+ nc = roundup(nc, nr);
I don't think roundup is defined here. Created an issue on adding shared utils where things like roundup can live.
 int weight_data_offset =
-    (n_idx / nr) * ukernel_config.weight_data_size_fn(nr, k, group_size);
+    (n_idx / nr) * ukernel_config.weight_packing_config.weight_data_size_fn(
+        nr, k, group_size);
For the future: not sure if we want to assume this, i.e. one can pack weights differently, which can break this. Ideally we should have an API for this that kernels can override.
We could make it part of the config as a follow up: e.g., ukernel_config.weight_offset_fn(n_idx, nr, k, group_size).
Created feature request: #1745
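A possible shape for that follow-up (sketch only; the weight_offset_fn typedef and field are made up for #1745, while the existing fields are from this PR):

// Hypothetical extension of weight_packing_config: let the kernel own the
// offset computation instead of assuming contiguous nr-panels.
using weight_offset_fn_type = int (*)(int n_idx, int nr, int k, int group_size);

struct weight_packing_config {
  weight_data_size_fn_type weight_data_size_fn{nullptr};
  prepare_weight_data_fn_type prepare_weight_data_fn{nullptr};
  weight_offset_fn_type weight_offset_fn{nullptr};  // new, optional
};

// The call site above would then become something like:
//   int weight_data_offset =
//       ukernel_config.weight_packing_config.weight_offset_fn(n_idx, nr, k, group_size);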
    /*bias*/ nullptr);
packed_weights_header.write(packed_weights.mutable_data_ptr<int8_t>());

// TODO: support passing in bias in future
Issue?
There's already an issue for it: #1675
This is a draft to do ukernel selection based on cpuinfo.
This relates to #1376