-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix firstBucketSize for CPU histogram #7554
base: master
Are you sure you want to change the base?
Conversation
Welcome @glemsom! |
Hi @glemsom. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: glemsom The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hey @glemsom thanks for the PR! A change in CheckpointVersion means that everyone using checkpoints as a history provider will lose their entire history, so wdyt @raywainman, @kwiesmueller, @adrianmoisey , should we be making this conditionally enabled behind a feature flag? I'm kind of on the edge here: for CPU, I don't consider history samples to be very useful, the confidence for scaling up (which is the most critical concern) increases pretty quickly, such that this shouldn't cause big problems. For memory, we only record one peak per half-life period, so this is definitely a bigger impact. |
/ok-to-test |
🤔 I'm wondering if there's a way to give users a clean upgrade path. |
Right, the buckets won't cleanly carryover from v3 to v4 here and cause users to silently lose history which could cause VPA to go a bit wonky. Putting this behind a flag seems like a good first step... If we wanted to change this to the default then we'd need to have some sort of "migration" code. |
What's slightly annoying is that the VerticalPodAutoscalerCheckpoint object itself doesn't seem to be able to handle multiple versions at once. ie: there is no list or similar data structure containing each version. So I guess we would need a live migration. If the VPA was set to use v4, and a v3 checkpoint existed, it would need to read that checkpoint, migrate and write to v4. May be another option is maintaining two code paths for a few releases. New checkpoints are on v4, but the recommender supports v3 and v4. |
I will have a look at putting this behind a flag, and having the |
From my side, I'd prefer a plan on what to do after the feature flag has been added. |
Wonder if you could do this "live" migration when reading the
Then subsequent checkpointing will be done using |
We could iterate over the old checkpoint and re-add all samples of Looking at what this change would do to the bucket boundaries, there would be some movement, leading to pretty much everything getting new recommendations (because the bucket boundaries have changed). See the example prints below, with start bucket values in millicores. Before this change:
After this change
In the end, we'd probably rather do this and indicate to users that they can expect a massive eviction instead of hiding this behind a feature flag forever? |
Looking at last time we lowered the starting bucket, we decreased it from I do not feel comfortable writing a migration for For what it is worth, what we did in our clusters was to simply wipe all |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Adjust
firstBucketSize
for CPU histograms to support less than 10m buckets.Which issue(s) this PR fixes:
Fixes #6415
Special notes for your reviewer:
Does this PR introduce a user-facing change?
NONE