Bottlerocket-update-operator not updating the labeled nodes #706

NishanthNaniReddy · 2025-01-10T11:05:26Z

Image I'm using:
bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4

Issue or Feature Request:

Hi Team,

I tried installing bottle rocket-update-operator on my cluster and I could see all the pods ( agent, api-server and the controller) up and running.
I used the v1.28 image(bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4) when I created the cluster hoping that the brupop would update the node ( I labeled ) with v1.29 as this is the latest as of now.

But for some reason the agent is considering the target version same as the current version i.e. 1.28 and it says "No action detected" and waiting for the next scheduled time . The target version should be the latest i.e. 1.29 as per my understanding .

( I also tried with BR image with sub-version v.127 and the issue is same even in this case . agent logs says the below
"{ current_version: "1.27.0", target_version: "1.27.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }")

Logs:
{ api_version: "v1", block_owner_deletion: None, controller: None, kind: "Node", name: "ip-10-168-39-166.eu-west-2.compute.internal", uid: "6c865f3c-310a-4fcc-bd63-491d5b89e957" }]), resource_version: Some("18763727"), self_link: None, uid: Some("66f464ca-8f05-4bbe-9fce-5d71873838e0") }, spec: BottlerocketShadowSpec { state: Idle, state_transition_timestamp: None, version: None }, status: Some(BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.28.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }, state: Idle, shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }

Could you please let me know if my understanding on this is correct or am I missing something?

Thanks.

ytsssun · 2025-01-10T23:40:03Z

Hi @NishanthNaniReddy , thanks for opening this issue. May I know what your brupop setup is like?

How many bottlerocket nodes are there in the cluster that has brupop installed?
What is the output of apiclient update check when you log into the bottlerocket node?

I was not able to reproduce this with the bottlerocket-aws-k8s-1.29-x86_64-v1.28.0-0ab4fab4 AMI. My setup is on us-west-2 and I am using 1.29 k8s version. I will try the exact setup you have and report back.

My brupop agent was able to detect the 1.29.0 version and bump to it

  2025-01-10T23:35:26.434844Z  INFO agent::agentclient: Brs status has been updated., brs_name: "ip-192-168-74-140.us-west-2.compute.internal", brs_status: BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.29.0", current_state: StagedAndPerformedUpdate, crash_count: 0, state_transition_failure_timestamp: None }

kubectl get brs --namespace brupop-bottlerocket-aws       
NAME                                               STATE                      VERSION   TARGET STATE         TARGET VERSION   CRASH COUNT
brs-ip-192-168-71-160.us-west-2.compute.internal   Idle                       1.28.0    Idle                 <no value>       0
brs-ip-192-168-74-140.us-west-2.compute.internal   StagedAndPerformedUpdate   1.28.0    RebootedIntoUpdate   1.29.0           0
brs-ip-192-168-88-44.us-west-2.compute.internal    Idle                       1.29.0    Idle                 1.29.0           0

ytsssun · 2025-01-11T00:07:22Z

Update to the above comment. I later used the same bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4 in eu-west-2 and test the update. It also worked for me.

  2025-01-11T00:04:36.322804Z  INFO agent::agentclient: Brs status has been updated., brs_name: "ip-192-168-173-228.eu-west-2.compute.internal", brs_status: BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.29.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }

It would also be helpful to share your scheduler_cron_expression.

NishanthNaniReddy · 2025-01-13T10:33:23Z

Hi @ytsssun ,
Thanks for your response !

My cluster is on eu-west-2 and with 1.30 K8s version . It has got 2 nodes and I have added the label to only one node.

Below are the logs screenshots of agent, api-server and controller pods..

NishanthNaniReddy · 2025-01-13T12:15:51Z

Ah, Looks like it's failing to fetch the updates ....
There is a netpol to allow-all in brupop-bottlerocket-aws namespace and "https://updates.bottlerocket.aws" is in allowed egress list. Not sure what is blocking here to fetch the data ...

apiclient update check

12:02:34 [INFO] Refreshing updates...
Failed to check for updates: refresh attempt failed with status 'Failed' (-1): Metadata error: Failed to fetch https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json: Transport 'other' error fetching 'https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0': error sending request for url (https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0)

Its able to make connection too :

curl -kv https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0

Trying 18.165.201.101:443...
Connected to updates.bottlerocket.aws (18.165.201.101) port 443
ALPN: curl offers h2,http/1.1
Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@strength
TLSv1.2 (OUT), TLS handshake, Client hello (1):
Recv failure: Connection reset by peer
OpenSSL SSL_connect: Connection reset by peer in connection to updates.bottlerocket.aws:443
Closing connection
curl: (35) Recv failure: Connection reset by peer

ytsssun · 2025-01-17T21:43:42Z

@NishanthNaniReddy I noticed that you only have 1 node labeled for brupop. It is recommended to have at least 3 nodes labeled if you want to use brupop. The reason is during node upgrade, k8s would evict pods, which would evict the controller pod which manages the update across the nodes. If you only have 1 labeled node, the controller pod will not be able to get rescheduled since no other node would be available for it. 2-node setup may work, but it is likely to cause failure because it is more likely to hit capacity limitation for the node.

The TLS failure is an interesting one that we might need to take a further look. The tuf tool would try iteratively looking for the root JSONs and ideally it would find the actual root json. The 7.root.json does not exist, which is why it failed. This works for me.

curl -v https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/1.root.json\?seed\=1322\&version\=1.28.0

Is your apiclient update check not finding any available update?

cbgbt · 2025-02-03T22:44:50Z

The TLS failure is an interesting one that we might need to take a further look

I was chatting with @ytsssun and he mentioned that this is an expected output related to the way that the TUF protocol works -- the protocol iteratively attempts to find the latest root for the repository.

@NishanthNaniReddy are you configuring Bottlerocket using userdata or the settings API? In particular, I'm wondering if there's any chance that you set the settings.updates.version-lock field. Brupop uses these settings to determine which update to take, so that would influence the operator.

If you can share any settings.updates settings you've configured, that would be helpful!

NishanthNaniReddy changed the title ~~Bottlerocket-update-operator agent pod considering the "target version" incorrectly~~ Bottlerocket-update-operator not updating the labeled nodes Jan 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bottlerocket-update-operator not updating the labeled nodes #706

Bottlerocket-update-operator not updating the labeled nodes #706

NishanthNaniReddy commented Jan 10, 2025

ytsssun commented Jan 10, 2025

ytsssun commented Jan 11, 2025 •

edited

Loading

NishanthNaniReddy commented Jan 13, 2025

NishanthNaniReddy commented Jan 13, 2025 •

edited

Loading

ytsssun commented Jan 17, 2025

cbgbt commented Feb 3, 2025

Bottlerocket-update-operator not updating the labeled nodes #706

Bottlerocket-update-operator not updating the labeled nodes #706

Comments

NishanthNaniReddy commented Jan 10, 2025

ytsssun commented Jan 10, 2025

ytsssun commented Jan 11, 2025 • edited Loading

NishanthNaniReddy commented Jan 13, 2025

NishanthNaniReddy commented Jan 13, 2025 • edited Loading

ytsssun commented Jan 17, 2025

cbgbt commented Feb 3, 2025

ytsssun commented Jan 11, 2025 •

edited

Loading

NishanthNaniReddy commented Jan 13, 2025 •

edited

Loading