-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved test partitioning: keep namespaces grouped together and don't sort tests #31
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution is pretty clever, and I've just suggested minor tweaks. I didn't look at the actions changes at all, totally trust you there.
Zooming out a bit, it seems like our real goal is to minimize the size of the largest group, rather than produce the most even split, as closely related as these are. One name for this problem is the "longest processing time" in scheduling, with tons of well studied algos.
There's a nice greedy solution for this - to insert the namespaces in descending order of size into the smallest bucket. I think that approach would be simpler and might give better results. A nice property to have of the solution is that the actual assignments are not too sensitive to us adding a few tests, and it might be worse in that regard though.
(let [test-var->sort-position (into {} | ||
(map-indexed | ||
(fn [i varr] | ||
[varr i])) | ||
test-vars)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a cuter way 🐱
(let [test-var->sort-position (into {} | |
(map-indexed | |
(fn [i varr] | |
[varr i])) | |
test-vars)] | |
(let [test-var->sort-position (zipmap test-vars (range))] |
(reduce | ||
(fn [m test-var] | ||
(update m (namespace* test-var) (fnil inc 0))) | ||
{} | ||
test-vars)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again appealing to cuteness 🎈
(reduce | |
(fn [m test-var] | |
(update m (namespace* test-var) (fnil inc 0))) | |
{} | |
test-vars)) | |
(frequencies (map namespace* test-vars)) |
(into {} | ||
(map-indexed (fn [i test-var] | ||
(let [ideal-partition (long (math/floor (/ i target-partition-size)))] | ||
(assert (<= 0 ideal-partition (dec num-partitions))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep this? Seems like a mathematical certainty.
(let [target-partition-size (/ (count test-vars) num-partitions)] | ||
(into {} | ||
(map-indexed (fn [i test-var] | ||
(let [ideal-partition (long (math/floor (/ i target-partition-size)))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: floor is implicit when casting positive numbers to long.
(let [test-var->ideal-partition (test-var->ideal-partition num-partitions test-vars)] | ||
(reduce | ||
(fn [m test-var] | ||
(update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really getting into the quest for cuteness / minimal nesting and punctuation.
(update m (namespace* test-var) #(conj (set %) (test-var->ideal-partition test-var)))) | |
(update m (namespace* test-var) (fnil conj #{}) (test-var->ideal-partition test-var))) |
multiple-possible-partitions? (fn [nmspace] | ||
(> (count (namespace->possible-partitions nmspace)) | ||
1)) | ||
namespaces (concat (remove multiple-possible-partitions? namespaces) | ||
(filter multiple-possible-partitions? namespaces))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could save a bit of ceremony and give close enough semantics to just sort by the size.
multiple-possible-partitions? (fn [nmspace] | |
(> (count (namespace->possible-partitions nmspace)) | |
1)) | |
namespaces (concat (remove multiple-possible-partitions? namespaces) | |
(filter multiple-possible-partitions? namespaces))] | |
namespaces (sort-by (comp count namespace->possible-partitions) namespaces)] |
Speaking of sorting though, this reminds me of the old "rocks, pebbles, and sand" approach...
Nothing like crafting a whole bunch of suggestions, then approving before you checked for automerge 🤦 |
Test partition code is fancy and aims to keep test partition sizes as even as possible while keeping each namespace grouped together in the same partition and avoiding changing the order tests come in. Is smart about deciding where things go in order to keep partition sizes relatively even.
Also I updated the GH actions so it's more in line with how we do things for our other libs (consolidate repeated setup code and run Kondo from the JVM rather so we can pin a specific version in
deps.edn
)