
[reconfigurator] Add rendezvous_debug_dataset table #7341

Merged (5 commits) Jan 15, 2025

Conversation

jgallagher (Contributor, Author):

Also adds datastore methods and a library crate to populate it.

This is PR 1 of 2; the second PR will add an RPW that calls this library crate and a consumer of this table.

smklein (Collaborator) left a comment:

Nicely done, this structure makes sense to me

// We want to insert any in-service datasets (according to the blueprint)
// that are also present in `inventory_datasets` but that are not already
// present in the database (described by `existing_datasets`).
let datasets_to_insert = inventory_datasets
smklein (Collaborator):
Suggested change:
- let datasets_to_insert = inventory_datasets
+ let new_inventory_datasets = inventory_datasets

This is a nitpick, but I think it's worthwhile. We don't necessarily want to insert these debug datasets unless they're also in the blueprint as in-service.

jgallagher (Contributor, Author) Jan 15, 2025:

I reworked this in 08315cf to not even construct this intermediate set.
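The selection logic discussed in this thread can be sketched as a set operation. This is a hypothetical illustration (the function name, identifiers, and use of plain integer IDs are not the PR's actual API): insert a dataset only if the blueprint marks it in-service, inventory has observed it, and it is not already in the database.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch of the filtering described above; names and types
/// are illustrative, not this PR's actual datastore API.
fn datasets_to_insert(
    in_service: &BTreeSet<u64>, // in-service per the blueprint
    inventory: &BTreeSet<u64>,  // observed in inventory
    existing: &BTreeSet<u64>,   // already present in the database
) -> BTreeSet<u64> {
    in_service
        .iter()
        .filter(|id| inventory.contains(*id) && !existing.contains(*id))
        .copied()
        .collect()
}
```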

* Hard deletion of tombstoned datasets will require some care with respect
* to the problem above. For now we keep tombstoned datasets around forever.
*/
time_tombstoned TIMESTAMPTZ,
andrewjstone (Contributor):

It may be worth adding a comment here, or on the corresponding Rust type, that we should not rely on time_tombstoned being later than time_created for correctness. While it's highly unlikely to ever be false, it still shouldn't be relied upon.

jgallagher (Contributor, Author):

Hmm. Do you think I should add a blueprint_when_tombstoned column? That's what we're really operating on; the time is only useful to a human to narrow down when things might have happened.

andrewjstone (Contributor) Jan 14, 2025:

I think that would indeed be useful. I don't think you want to get rid of the timestamps, just add that additional column and a comment. Thank you!

jgallagher (Contributor, Author):

Added in 8372bb6
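The agreed-upon shape of the row can be illustrated with a small model. This is a sketch, not the PR's actual Rust type: the struct name, field types (plain epoch seconds and a numeric stand-in for a blueprint ID), and helper are all assumptions.

```rust
/// Illustrative model of a rendezvous debug-dataset row after adding
/// `blueprint_when_tombstoned` (not this PR's actual type).
struct DebugDatasetRow {
    /// For human operators only: when the row was created (epoch seconds).
    time_created: u64,
    /// For human operators only: when the dataset was tombstoned, if ever.
    /// Clock skew means this is NOT guaranteed to be >= time_created.
    time_tombstoned: Option<u64>,
    /// Source of truth for correctness: the blueprint under which this
    /// dataset was tombstoned, if any (numeric stand-in for a blueprint ID).
    blueprint_when_tombstoned: Option<u64>,
}

impl DebugDatasetRow {
    /// Correctness decisions key off the blueprint column, never the
    /// timestamps, per the discussion above.
    fn is_tombstoned(&self) -> bool {
        self.blueprint_when_tombstoned.is_some()
    }
}
```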

// This is a minor performance optimization. If we removed this fetch, the
// code below would still be correct, but it would issue a bunch of
// do-nothing inserts for already-existing datasets.
let existing_datasets = datastore
andrewjstone (Contributor):

I realize we don't have multi-rack support yet, but I'm a little leery of this optimization. At our max expected scale-out of 1k racks, this could result in over 300,000 datasets being pulled each time, if there is a debug dataset on each disk in a sled. I was thinking that we could instead diff this blueprint with its parent, and then add or remove those datasets as needed. This could also end up reading in all these datasets, but that may already be true if those blueprints are loaded.

Probably nothing to worry about right now, but long term these blueprints might get huge and we may not want to access them all at once even...

andrewjstone (Contributor) Jan 14, 2025:

Oh, it looks like I'm probably completely wrong about a debug dataset per disk. In that case, ignore me (at least for debug datasets).

smklein (Collaborator):

Every U.2 gets a Debug dataset and a Transient Zone Root dataset.

Source:
ensure_disk, in nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

andrewjstone (Contributor):

Ah, thank you @smklein. Well, I'm still concerned, but it's probably not something we can worry about now. Scale will have more issues we can worry about later ;)
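For intuition on the "over 300,000" figure above, here is a back-of-the-envelope check. The per-rack sled count and per-sled U.2 count are illustrative assumptions, not numbers stated in this PR.

```rust
fn main() {
    // Illustrative assumptions, not figures from this PR.
    let racks: u64 = 1_000; // max expected scale-out mentioned above
    let sleds_per_rack: u64 = 32;
    let u2_disks_per_sled: u64 = 10; // one Debug dataset per U.2
    let debug_datasets = racks * sleds_per_rack * u2_disks_per_sled;
    println!("{debug_datasets}"); // 320000
    assert!(debug_datasets > 300_000);
}
```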

jgallagher (Contributor, Author):

Yeah, I think this function actually has both cases, and neither of them seems good (but I'm not sure what to do about it):

  • For in-service datasets, we do one (or technically "a small number", since it's paginated into batches) big query up front to list all the things, then only issue individual "insert if not exists" queries for datasets that weren't in that big list. This means we almost always only issue the one big query, since we only have to issue the small inserts when new datasets are added (which is very rare).
  • For expunged datasets, we don't do that: instead we always issue a "mark this tombstoned" individual query for every single expunged dataset.

The second one seems worse to me; that means any time we expunge a debug dataset, we're now issuing 1 more query on every execution of this RPW.

After writing this down, maybe the expunge case should also have a one-big-query-up-front-to-avoid-extra-work thing?

andrewjstone (Contributor):

> The second one seems worse to me; that means any time we expunge a debug dataset, we're now issuing 1 more query on every execution of this RPW.

This is only true until we prune the expunged nodes from the blueprint though, right? In theory those could be limited, while the total set size will remain at the number of disks in the system.

jgallagher (Contributor, Author):

> This is only true until we prune the expunged nodes from the blueprint though, right?

Yes, but given pruning expunged nodes from the blueprint is a "don't need to solve any time soon" problem, in practice those will stick around for quite a while, I think.

andrewjstone (Contributor):

Sure, but I assume we'll solve it before we have multirack :D

jgallagher (Contributor, Author):

When I wrote this function and my comment above, I was thinking about the original implementation of debug_dataset_list_all_batched(), which only returned non-tombstoned rows. But since we know the planning input is going to care about tombstoned rows too, I changed that, and in 08315cf the "expunged" side of things now reuses the results of that query as well. That doesn't address the problem of "is it okay to fetch all debug datasets in a multirack world", but as you point out we can sort that out later!
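The reworked approach can be sketched as follows. Everything here is hypothetical (the function name, the split of existing rows into live and tombstoned sets, and the integer IDs are illustrative): one up-front listing, which now includes tombstoned rows, serves both reconciliation sides, so the RPW avoids re-issuing per-dataset queries on every execution.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch of reusing a single up-front fetch for both sides.
/// Returns (ids to insert, ids to newly tombstone); names and types are
/// illustrative, not this PR's actual API.
fn reconcile(
    in_service: &BTreeSet<u64>,          // in-service per the blueprint
    expunged: &BTreeSet<u64>,            // expunged per the blueprint
    existing_live: &BTreeSet<u64>,       // rows already present, not tombstoned
    existing_tombstoned: &BTreeSet<u64>, // rows already tombstoned
) -> (Vec<u64>, Vec<u64>) {
    // Only insert rows we don't already have in either state.
    let to_insert = in_service
        .iter()
        .filter(|id| !existing_live.contains(*id) && !existing_tombstoned.contains(*id))
        .copied()
        .collect();
    // Only issue tombstone updates for rows not already tombstoned.
    let to_tombstone = expunged
        .iter()
        .filter(|id| !existing_tombstoned.contains(*id))
        .copied()
        .collect();
    (to_insert, to_tombstone)
}
```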

andrewjstone (Contributor) left a comment:

I left a few small comments, but this looks really good. Thanks for getting it out so quickly @jgallagher!

jgallagher merged commit fbe043a into main on Jan 15, 2025 (18 checks passed).
jgallagher deleted the john/debug-dataset-rendezvous-1 branch on January 15, 2025 at 21:26.
jgallagher added a commit that referenced this pull request Jan 16, 2025
…7342)

This is PR 2 of 2 and builds on #7341; it adds an RPW that calls the
library added in that PR to actually reconcile blueprint+inventory and
update the debug dataset rendezvous table, and changes the support
bundle query that picks a debug dataset to use this new table.
3 participants