[reconfigurator] Add rendezvous_debug_dataset table #7341
Conversation
Also adds datastore methods and a library crate to populate it.
This is PR 1 of 2; the second PR will add an RPW that calls this library crate and a consumer of this table.
Nicely done, this structure makes sense to me
// We want to insert any in-service datasets (according to the blueprint)
// that are also present in `inventory_datasets` but that are not already
// present in the database (described by `existing_datasets`).
let datasets_to_insert = inventory_datasets
Suggested change:
- let datasets_to_insert = inventory_datasets
+ let new_inventory_datasets = inventory_datasets
This is a nitpick, but I think it's worthwhile. We don't necessarily want to insert these debug datasets unless they're also in the blueprint as in-service.
I reworked this in 08315cf to not even construct this intermediate set.
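In case it helps to visualize the filtering being discussed, here is a minimal sketch of it as an iterator chain, using made-up names and a placeholder ID type rather than the actual code from 08315cf:

```rust
use std::collections::BTreeSet;

// Illustrative stand-in only; the real code uses typed dataset IDs.
type DatasetUuid = u128;

/// Datasets that are in service per the blueprint, observed in inventory,
/// and not already present in the database. Returning an iterator avoids
/// materializing an intermediate set.
fn datasets_to_insert<'a>(
    in_service_blueprint_datasets: &'a BTreeSet<DatasetUuid>,
    inventory_datasets: &'a BTreeSet<DatasetUuid>,
    existing_datasets: &'a BTreeSet<DatasetUuid>,
) -> impl Iterator<Item = DatasetUuid> + 'a {
    in_service_blueprint_datasets
        .iter()
        .copied()
        .filter(move |id| inventory_datasets.contains(id))
        .filter(move |id| !existing_datasets.contains(id))
}
```

Keeping this lazy (an iterator rather than a collected set) matches the spirit of not constructing the intermediate collection.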
 * Hard deletion of tombstoned datasets will require some care with respect
 * to the problem above. For now we keep tombstoned datasets around forever.
 */
time_tombstoned TIMESTAMPTZ,
It may be worth adding a comment here, or in the corresponding Rust type, that we should not rely on `time_tombstoned` being later than `time_created` for correctness. While it's highly unlikely that this won't hold, it still shouldn't be relied upon.
Hmm. Do you think I should add a `blueprint_when_tombstoned` column? That's what we're really operating on; the time is only useful to a human to narrow down when things might have happened.
I think that would indeed be useful. I don't think you want to get rid of the timestamps, just add that additional column and a comment. Thank you!
Added in 8372bb6
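For illustration, a rough sketch of what the row's Rust model might look like with both columns; the struct and field names here are guesses, not the actual model added in 8372bb6:

```rust
use chrono::{DateTime, Utc};
use uuid::Uuid;

// Sketch only; not the actual model type added by the PR.
#[allow(dead_code)]
struct RendezvousDebugDatasetRow {
    id: Uuid,
    time_created: DateTime<Utc>,
    // Useful to a human narrowing down when things happened, but
    // correctness must not rely on `time_tombstoned > time_created`.
    time_tombstoned: Option<DateTime<Utc>>,
    // The blueprint whose execution tombstoned this dataset; this is what
    // reconfigurator logic actually operates on.
    blueprint_id_when_tombstoned: Option<Uuid>,
}
```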
// This is a minor performance optimization. If we removed this fetch, the
// code below would still be correct, but it would issue a bunch of
// do-nothing inserts for already-existing datasets.
let existing_datasets = datastore
I realize we don't have multi-rack support yet, but I'm a little leery of this optimization. At our max expected scale-out of 1k racks, this could result in over 300,000 datasets being pulled each time, if there is a debug dataset on each disk in a sled. I was thinking that we could instead diff this blueprint with its parent, and then add or remove those datasets as needed. That could also end up reading in all these datasets, but that may already be true if those blueprints are loaded.
Probably nothing to worry about right now, but long term these blueprints might get huge and we may not want to access them all at once even...
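To make the alternative concrete, here is a sketch of what diffing a blueprint against its parent could look like; the types and names are invented for illustration and are not part of this PR:

```rust
use std::collections::BTreeSet;

// Illustrative stand-ins only.
type DatasetUuid = u128;

struct BlueprintDatasets {
    in_service: BTreeSet<DatasetUuid>,
}

/// Datasets newly in service relative to the parent blueprint (candidates
/// to insert) and datasets no longer in service (candidates to tombstone).
fn diff_debug_datasets(
    parent: &BlueprintDatasets,
    child: &BlueprintDatasets,
) -> (Vec<DatasetUuid>, Vec<DatasetUuid>) {
    let added = child.in_service.difference(&parent.in_service).copied().collect();
    let removed = parent.in_service.difference(&child.in_service).copied().collect();
    (added, removed)
}
```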
Oh, it looks like I'm probably completely wrong about a debug dataset per disk. In that case, ignore me (at least for debug datasets).
Every U.2 gets a Debug dataset and a Transient Zone Root dataset.
Source: `ensure_disk`, in nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs
Ah, thank you @smklein. Well, I'm still concerned, but it's probably not something we can worry about now. Scale will have more issues we can worry about later ;)
Yeah, I think this function actually has both cases, and neither of them seems good (but I'm not sure what to do about it):
- For in-service datasets, we do one (or technically "a small number", since it's paginated into batches) big query up front to list all the things, then only issue individual "insert if not exists" queries for datasets that weren't in that big list. This means we almost always only issue the one big query, since we only have to issue the small inserts when new datasets are added (which is very rare).
- For expunged datasets, we don't do that: instead we always issue a "mark this tombstoned" individual query for every single expunged dataset.
The second one seems worse to me; that means any time we expunge a debug dataset, we're now issuing 1 more query on every execution of this RPW.
After writing this down, maybe the expunge case should also have a one-big-query-up-front-to-avoid-extra-work thing?
> The second one seems worse to me; that means any time we expunge a debug dataset, we're now issuing 1 more query on every execution of this RPW.
This is only true until we prune the expunged nodes from the blueprint though, right? In theory those could be limited, while the total set size will remain at the number of disks in the system.
> This is only true until we prune the expunged nodes from the blueprint though, right?
Yes, but given pruning expunged nodes from the blueprint is a "don't need to solve any time soon" problem, in practice those will stick around for quite a while, I think.
Sure, but I assume we'll solve it before we have multirack :D
So when I wrote this function and my comment above, I was thinking about the original implementation of `debug_dataset_list_all_batched()`, which only returned non-tombstoned rows. But we know the planning input is going to care about tombstoned rows too, so in 08315cf I changed the "expunged" side of things to reuse the results of that query as well. That doesn't address the problem of "is it okay to fetch all debug datasets in a multirack world", but as you point out we can sort that out later!
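To make that shape concrete, here is a minimal sketch of the "one query up front, reused for both sides" approach described above, with invented types and names rather than the rendezvous crate's real API:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins; not the actual types from this PR.
type DatasetUuid = u128;

struct DbDebugDataset {
    tombstoned: bool,
}

/// Given one upfront fetch of all rows (tombstoned or not), decide which
/// datasets need an insert and which need a tombstone, so a steady-state
/// execution issues no per-dataset writes at all.
fn plan_writes(
    blueprint_in_service: &BTreeSet<DatasetUuid>,
    blueprint_expunged: &BTreeSet<DatasetUuid>,
    inventory_datasets: &BTreeSet<DatasetUuid>,
    db_rows: &BTreeMap<DatasetUuid, DbDebugDataset>,
) -> (Vec<DatasetUuid>, Vec<DatasetUuid>) {
    let to_insert = blueprint_in_service
        .iter()
        .copied()
        .filter(|id| inventory_datasets.contains(id) && !db_rows.contains_key(id))
        .collect();
    let to_tombstone = blueprint_expunged
        .iter()
        .copied()
        .filter(|id| db_rows.get(id).is_some_and(|row| !row.tombstoned))
        .collect();
    (to_insert, to_tombstone)
}
```

In steady state both result vectors are empty, so each execution issues only the single list query.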
I left a few small comments, but this looks really good. Thanks for getting it out so quickly @jgallagher!