Improved test partitioning: keep namespaces grouped together and don't sort tests #31

camsaul · 2024-09-05T19:14:34Z

The test partitioning code is now smarter: it aims to keep partition sizes as even as possible while keeping each namespace together in a single partition and preserving the order the tests come in. It decides where each namespace goes so that partition sizes stay relatively even.

Also, I updated the GH Actions so they're more in line with how we do things for our other libs (consolidate repeated setup code and run Kondo from the JVM so we can pin a specific version in deps.edn).

crisptrutski

This solution is pretty clever, and I've just suggested minor tweaks. I didn't look at the actions changes at all, totally trust you there.

Zooming out a bit, it seems like our real goal is to minimize the size of the largest group rather than to produce the most even split, closely related as the two are. In scheduling this is the makespan-minimization problem (multiway number partitioning), with tons of well-studied algos.

There's a nice greedy solution for this, known as "longest processing time" (LPT): insert the namespaces in descending order of size, each into the currently smallest bucket. I think that approach would be simpler and might give better results. A nice property for a solution to have is that the actual assignments are not too sensitive to us adding a few tests; the greedy approach might be worse in that regard, though.
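For reference, a minimal sketch of that greedy approach. The names `lpt-partition` and `sample-sizes` are made up for illustration and nothing here is taken from `partition.clj`. It assumes at least one partition and a map of namespace to test count. Unlike the code in this PR, it reorders namespaces.

```clojure
(defn lpt-partition
  "Greedy longest-processing-time-first assignment (hypothetical helper).
  `namespace->size` maps namespace -> number of tests. Returns a vector of
  `num-partitions` buckets shaped like {:size n, :namespaces [...]}."
  [num-partitions namespace->size]
  (reduce
   (fn [buckets [nmspace size]]
     ;; pick the index of a bucket that currently has the smallest size
     (let [i (apply min-key #(:size (nth buckets %)) (range num-partitions))]
       (-> buckets
           (update-in [i :size] + size)
           (update-in [i :namespaces] conj nmspace))))
   (vec (repeat num-partitions {:size 0, :namespaces []}))
   ;; visit the largest namespaces first
   (sort-by (comp - val) namespace->size)))

;; Made-up sizes: five namespaces, 20 tests in total.
(def sample-sizes {"ns-a" 7, "ns-b" 5, "ns-c" 4, "ns-d" 3, "ns-e" 1})

(def sample-buckets (lpt-partition 2 sample-sizes))
```

With these sizes both buckets end up with 10 tests. The sensitivity concern is visible here too: changing one namespace's size can change the sort order and therefore many assignments.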

crisptrutski · 2024-09-11T08:27:04Z

src/mb/hawk/partition.clj

+  (let [test-var->sort-position (into {}
+                                      (map-indexed
+                                       (fn [i varr]
+                                         [varr i]))
+                                      test-vars)]


There's a cuter way 🐱

Suggested change

-  (let [test-var->sort-position (into {}
-                                      (map-indexed
-                                       (fn [i varr]
-                                         [varr i]))
-                                      test-vars)]
+  (let [test-var->sort-position (zipmap test-vars (range))]
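A quick REPL-style check that the two forms agree. The symbols in `test-vars` are hypothetical stand-ins for real test vars.

```clojure
;; Hypothetical stand-ins for test vars.
(def test-vars ['ns-a/test-1 'ns-a/test-2 'ns-b/test-1])

;; Original form: build [var index] pairs with a transducer.
(def via-into
  (into {} (map-indexed (fn [i varr] [varr i])) test-vars))

;; Suggested form: pair each var with 0, 1, 2, ...
(def via-zipmap
  (zipmap test-vars (range)))

(= via-into via-zipmap)
```

Both produce a map from each var to its position in `test-vars`.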

crisptrutski · 2024-09-11T08:29:45Z

src/mb/hawk/partition.clj

+  (reduce
+   (fn [m test-var]
+     (update m (namespace* test-var) (fnil inc 0)))
+   {}
+   test-vars))


Again appealing to cuteness 🎈

Suggested change

-  (reduce
-   (fn [m test-var]
-     (update m (namespace* test-var) (fnil inc 0)))
-   {}
-   test-vars))
+  (frequencies (map namespace* test-vars))
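A small check of the equivalence. `namespace*` below is a simplified stand-in that works on symbols and is only assumed to behave like the real helper for this purpose.

```clojure
;; Simplified stand-in for the real namespace* helper (assumption).
(defn namespace* [test-var]
  (namespace test-var))

;; Hypothetical stand-ins for test vars.
(def test-vars ['ns-a/test-1 'ns-a/test-2 'ns-b/test-1])

;; Original form: count by hand with reduce + fnil.
(def counts-via-reduce
  (reduce
   (fn [m test-var]
     (update m (namespace* test-var) (fnil inc 0)))
   {}
   test-vars))

;; Suggested form.
(def counts-via-frequencies
  (frequencies (map namespace* test-vars)))
```

Both evaluate to `{"ns-a" 2, "ns-b" 1}` for this input.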

crisptrutski · 2024-09-11T08:32:40Z

src/mb/hawk/partition.clj

+    (into {}
+          (map-indexed (fn [i test-var]
+                         (let [ideal-partition (long (math/floor (/ i target-partition-size)))]
+                           (assert (<= 0 ideal-partition (dec num-partitions)))


Do we need to keep this? Seems like a mathematical certainty.
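The reasoning can be spot-checked at the REPL. This sketch mirrors the expression under review in a hypothetical `ideal-partition` function (using `Math/floor` in place of the `math` alias) and brute-forces small inputs.

```clojure
(defn ideal-partition
  "Mirrors the reviewed expression for test index `i` (hypothetical helper)."
  [num-partitions num-tests i]
  (let [target-partition-size (/ num-tests num-partitions)]
    (long (Math/floor (double (/ i target-partition-size))))))

;; Since i < num-tests, i / (num-tests / n) = i * n / num-tests < n,
;; so the floor is at most n - 1 and never negative.
(def always-in-range?
  (every? true?
          (for [num-tests      (range 1 60)
                num-partitions (range 1 12)
                i              (range num-tests)]
            (<= 0
                (ideal-partition num-partitions num-tests i)
                (dec num-partitions)))))
```

The check only covers small sizes; the inequality in the comment is the general argument.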

crisptrutski · 2024-09-11T08:34:13Z

src/mb/hawk/partition.clj

+  (let [target-partition-size (/ (count test-vars) num-partitions)]
+    (into {}
+          (map-indexed (fn [i test-var]
+                         (let [ideal-partition (long (math/floor (/ i target-partition-size)))]


nit: floor is implicit when casting non-negative numbers to long, since `long` truncates toward zero.
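A few REPL values illustrating the point, including the negative case where truncation and floor differ (not relevant here, since the index and partition size are never negative).

```clojure
(def truncation-examples
  {:long-of-double       (long 2.7)               ; truncates toward zero
   :long-of-floor        (long (Math/floor 2.7))  ; same for non-negative input
   :long-of-ratio        (long 27/10)             ; ratios truncate as well
   :long-of-negative     (long -2.7)              ; truncation rounds up here
   :floor-of-negative    (long (Math/floor -2.7))}) ; floor rounds down
```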

crisptrutski · 2024-09-11T08:41:04Z

src/mb/hawk/partition.clj

+  (let [test-var->ideal-partition (test-var->ideal-partition num-partitions test-vars)]
+    (reduce
+     (fn [m test-var]
+       (update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var))))


Really getting into the quest for cuteness / minimal nesting and punctuation.

Suggested change

-       (update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var))))
+       (update m (namespace* test-var) (fnil conj #{}) (test-var->ideal-partition test-var)))
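A sketch showing that both forms build the same map of namespace to set of partitions. The data and the `namespace*` definition are hypothetical stand-ins. `update` passes its trailing argument through to the function, so `(fnil conj #{})` receives the existing set (or `#{}` when the key is missing) followed by the partition.

```clojure
;; Hypothetical stand-ins for the real helpers and data.
(def test-var->ideal-partition
  {'ns-a/test-1 0, 'ns-a/test-2 1, 'ns-b/test-1 1})

(def test-vars ['ns-a/test-1 'ns-a/test-2 'ns-b/test-1])

(defn namespace* [test-var]
  (namespace test-var))

;; Original form: coerce nil to a set inside an anonymous function.
(def with-set
  (reduce
   (fn [m test-var]
     (update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var))))
   {}
   test-vars))

;; Suggested form: let fnil supply the empty set.
(def with-fnil
  (reduce
   (fn [m test-var]
     (update m (namespace* test-var) (fnil conj #{}) (test-var->ideal-partition test-var)))
   {}
   test-vars))
```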

crisptrutski · 2024-09-11T08:49:25Z

src/mb/hawk/partition.clj

+        multiple-possible-partitions? (fn [nmspace]
+                                        (> (count (namespace->possible-partitions nmspace))
+                                           1))
+        namespaces                     (concat (remove multiple-possible-partitions? namespaces)
+                                               (filter multiple-possible-partitions? namespaces))]


It could save a bit of ceremony and give close enough semantics to just sort by the number of possible partitions.

Suggested change

-        multiple-possible-partitions? (fn [nmspace]
-                                        (> (count (namespace->possible-partitions nmspace))
-                                           1))
-        namespaces                     (concat (remove multiple-possible-partitions? namespaces)
-                                               (filter multiple-possible-partitions? namespaces))]
+        namespaces (sort-by (comp count namespace->possible-partitions) namespaces)]

Speaking of sorting though, this reminds me of the old "rocks, pebbles, and sand" approach...
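To make "close enough semantics" concrete, here is a sketch with made-up data comparing the two orderings. Both put namespaces with a single possible partition first; they differ only in how the remaining namespaces are ordered among themselves, because `sort-by` is stable but also orders by count.

```clojure
;; Made-up data for illustration.
(def namespace->possible-partitions
  {"ns-a" #{0}, "ns-b" #{0 1 2}, "ns-c" #{1}, "ns-d" #{1 2}})

(def namespaces ["ns-a" "ns-b" "ns-c" "ns-d"])

(def multiple-possible-partitions?
  (fn [nmspace]
    (> (count (namespace->possible-partitions nmspace)) 1)))

;; Original form: single-partition namespaces first, original order otherwise.
(def ordered-original
  (concat (remove multiple-possible-partitions? namespaces)
          (filter multiple-possible-partitions? namespaces)))

;; Suggested form: ascending by number of possible partitions.
(def ordered-suggested
  (sort-by (comp count namespace->possible-partitions) namespaces))
```

Here `ordered-original` is `("ns-a" "ns-c" "ns-b" "ns-d")` and `ordered-suggested` is `("ns-a" "ns-c" "ns-d" "ns-b")`.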

crisptrutski · 2024-09-11T09:01:54Z

Nothing like crafting a whole bunch of suggestions, then approving before you checked for automerge 🤦

camsaul added 2 commits September 5, 2024 19:09

Improved test partitioning

9577006

Update dox

8e474cd

camsaul requested a review from a team September 5, 2024 19:14

camsaul added 3 commits September 5, 2024 19:41

Code cleanup

828d000

Modernize GH Actions

b0d9a19

Appease Kondo

c131029

camsaul enabled auto-merge (squash) September 5, 2024 20:25

Ok partitioning does need to sort namespaces (but not vars)

8ecb623

camsaul mentioned this pull request Sep 10, 2024

🏎️🚀🏎️🚀 🏎️🚀 SHAVE 7 MINUTES OFF OF NON-CORE DRIVER TEST RUNS IN CI 🏎️🚀🏎️🚀 🏎️🚀 metabase/metabase#47681

Merged

crisptrutski approved these changes Sep 11, 2024

camsaul merged commit 199d5f5 into main Sep 11, 2024
2 checks passed

camsaul deleted the improved-test-partitioning branch September 11, 2024 09:01

camsaul mentioned this pull request Oct 7, 2024

🏎️🚀 🏎️🚀 🏎️🚀 Test partitioning for MySQL: shave ~5 minutes off of CI runs 🏎️🚀🏎️🚀 🏎️🚀 metabase/metabase#48422

Merged
