Bottlerocket-update-operator not updating the labeled nodes #706
Hi @NishanthNaniReddy, thanks for opening this issue. May I know what your brupop setup is like?

I was not able to reproduce this. My brupop agent was able to detect the 1.29.0 version and bump to it.
Update to the above comment. I later used the same…

It would also be helpful to share your logs.
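A sketch of how those logs could be pulled with kubectl; the namespace and workload names here are assumptions based on a default brupop install, not taken from this issue:

```sh
# Default brupop namespace and workload names are assumed here;
# adjust to match your install.
kubectl logs -n brupop-bottlerocket-aws daemonset/brupop-agent
kubectl logs -n brupop-bottlerocket-aws deployment/brupop-apiserver
kubectl logs -n brupop-bottlerocket-aws deployment/brupop-controller-deployment
```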
Hi @ytsssun, my cluster is in eu-west-2 with K8s version 1.30. It has 2 nodes and I have added the label to only one node. Below are log screenshots of the agent, api-server, and controller pods.

[Screenshots of agent, api-server, and controller pod logs]
Ah, looks like it's failing to fetch the updates:

```
apiclient update check
12:02:34 [INFO] Refreshing updates...
```

It's able to make a connection too:

```
curl -kv "https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0"
```
@NishanthNaniReddy I noticed that you only have 1 node labeled for brupop. It is recommended to label at least 3 nodes if you want to use brupop. The reason is that during a node upgrade, Kubernetes evicts pods, including the controller pod that manages the update across the nodes. If you only have 1 labeled node, the controller pod cannot be rescheduled, since no other labeled node is available for it. A 2-node setup may work, but it is more likely to fail because it is more likely to hit the capacity limits of the remaining node.

The TLS failure is an interesting one that we may need to look at more closely. The TUF tooling iteratively looks for the root JSONs and should eventually find the actual root JSON. 7.root.json does not exist, which is why that request failed. This works for me.
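For instance, a lower-numbered root JSON at the same path can be fetched directly (a minimal sketch, assuming the standard TUF layout implied by the URL above, where clients walk N.root.json upward until a fetch fails):

```sh
# 1.root.json should exist in a standard TUF repo layout;
# 7.root.json simply hasn't been published yet.
curl -v "https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/1.root.json" -o 1.root.json
```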
Is your `apiclient update check` not finding any available update?
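For reference, a minimal way to run that check from the node itself, assuming you have host access through the admin container:

```sh
# From the admin container on the node, enter the host root shell:
sudo sheltie

# Then, inside the host shell, query the update API directly; the
# output should list any update the host considers available:
apiclient update check
```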
I was chatting with @ytsssun and he mentioned that this is expected output given the way the TUF protocol works: the protocol iteratively attempts to find the latest root for the repository.

@NishanthNaniReddy, are you configuring Bottlerocket using userdata or the settings API? In particular, I'm wondering if there's any chance that you set the `version-lock` update setting, which would pin the target version. If you can share any of your settings or additional logs, that would help.
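A sketch of what checking and clearing that would look like on the host, assuming `settings.updates.version-lock` is the setting in question (its default is `latest`):

```sh
# Show the current update settings, including any version pin:
apiclient get settings.updates

# If version-lock was pinned to a specific version, pointing it back
# at "latest" lets the agent pick up the newest release:
apiclient set settings.updates.version-lock=latest
```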
Image I'm using:
bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4
Issue or Feature Request:
Hi Team,
I tried installing the bottlerocket-update-operator on my cluster and I could see all the pods (agent, api-server, and controller) up and running.
I used the v1.28.0 image (bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4) when I created the cluster, hoping that brupop would update the node I labeled to v1.29, since that is the latest as of now.
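For context, opting the node in looks roughly like this (a sketch assuming the `brupop.bottlerocket.aws/enabled` label key used by recent brupop releases; the node name is taken from the logs below):

```sh
kubectl label node ip-10-168-39-166.eu-west-2.compute.internal \
  brupop.bottlerocket.aws/enabled=true
```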
But for some reason the agent is treating the target version as the same as the current version, i.e. 1.28: it says "No action detected" and waits for the next scheduled time. The target version should be the latest, i.e. 1.29, as per my understanding.
I also tried with the BR image at sub-version v1.27.0 and the issue is the same in that case. The agent logs say:

```
{ current_version: "1.27.0", target_version: "1.27.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }
```
Logs:

```
{ api_version: "v1", block_owner_deletion: None, controller: None, kind: "Node", name: "ip-10-168-39-166.eu-west-2.compute.internal", uid: "6c865f3c-310a-4fcc-bd63-491d5b89e957" }]), resource_version: Some("18763727"), self_link: None, uid: Some("66f464ca-8f05-4bbe-9fce-5d71873838e0") }, spec: BottlerocketShadowSpec { state: Idle, state_transition_timestamp: None, version: None }, status: Some(BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.28.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }, state: Idle, shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }
```
Could you please let me know if my understanding of this is correct, or am I missing something?
Thanks.