Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved test partitioning: keep namespaces grouped together and don't sort tests #31

Merged
merged 6 commits into from
Sep 11, 2024

Conversation

camsaul
Copy link
Member

@camsaul camsaul commented Sep 5, 2024

Test partition code is fancy and aims to keep test partition sizes as even as possible while keeping each namespace grouped together in the same partition and avoiding changing the order tests come in. Is smart about deciding where things go in order to keep partition sizes relatively even.

Also I updated the GH actions so it's more in line with how we do things for our other libs (consolidate repeated setup code and run Kondo from the JVM rather so we can pin a specific version in deps.edn)

@camsaul camsaul requested a review from a team September 5, 2024 19:14
@camsaul camsaul enabled auto-merge (squash) September 5, 2024 20:25
Copy link
Contributor

@crisptrutski crisptrutski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution is pretty clever, and I've just suggested minor tweaks. I didn't look at the actions changes at all, totally trust you there.

Zooming out a bit, it seems like our real goal is to minimize the size of the largest group, rather than produce the most even split, as closely related as these are. One name for this problem is the "longest processing time" in scheduling, with tons of well studied algos.

There's a nice greedy solution for this - to insert the namespaces in descending order of size into the smallest bucket. I think that approach would be simpler and might give better results. A nice property to have of the solution is that the actual assignments are not too sensitive to us adding a few tests, and it might be worse in that regard though.

Comment on lines +17 to +21
(let [test-var->sort-position (into {}
(map-indexed
(fn [i varr]
[varr i]))
test-vars)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a cuter way 🐱

Suggested change
(let [test-var->sort-position (into {}
(map-indexed
(fn [i varr]
[varr i]))
test-vars)]
(let [test-var->sort-position (zipmap test-vars (range))]

Comment on lines +30 to +34
(reduce
(fn [m test-var]
(update m (namespace* test-var) (fnil inc 0)))
{}
test-vars))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again appealing to cuteness 🎈

Suggested change
(reduce
(fn [m test-var]
(update m (namespace* test-var) (fnil inc 0)))
{}
test-vars))
(frequencies (map namespace* test-vars))

(into {}
(map-indexed (fn [i test-var]
(let [ideal-partition (long (math/floor (/ i target-partition-size)))]
(assert (<= 0 ideal-partition (dec num-partitions)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this? Seems like a mathematical certainty.

(let [target-partition-size (/ (count test-vars) num-partitions)]
(into {}
(map-indexed (fn [i test-var]
(let [ideal-partition (long (math/floor (/ i target-partition-size)))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: floor is implicit when casting positive numbers to long.

(let [test-var->ideal-partition (test-var->ideal-partition num-partitions test-vars)]
(reduce
(fn [m test-var]
(update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really getting into the quest for cuteness / minimal nesting and punctuation.

Suggested change
(update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var))))
(update m (namespace* test-var) (fnil conj #{}) (test-var->ideal-partition test-var)))

Comment on lines +79 to +83
multiple-possible-partitions? (fn [nmspace]
(> (count (namespace->possible-partitions nmspace))
1))
namespaces (concat (remove multiple-possible-partitions? namespaces)
(filter multiple-possible-partitions? namespaces))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could save a bit of ceremony and give close enough semantics to just sort by the size.

Suggested change
multiple-possible-partitions? (fn [nmspace]
(> (count (namespace->possible-partitions nmspace))
1))
namespaces (concat (remove multiple-possible-partitions? namespaces)
(filter multiple-possible-partitions? namespaces))]
namespaces (sort-by (comp count namespace->possible-partitions) namespaces)]

Speaking of sorting though, this reminds me of the old "rocks, pebbles, and sand" approach...

@camsaul camsaul merged commit 199d5f5 into main Sep 11, 2024
2 checks passed
@camsaul camsaul deleted the improved-test-partitioning branch September 11, 2024 09:01
@crisptrutski
Copy link
Contributor

Nothing like crafting a whole bunch of suggestions, then approving before you checked for automerge 🤦

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants