fix(aws): CleanupAlarmsAgent cycle to catch exceptions #6333

Merged
2 commits merged into spinnaker:master on Jan 22, 2025

Conversation

christosarvanitis (Member):

A bad account or any exception currently stops the CleanupAlarmsAgent.

This change catches the exception and logs it, since the purpose of this agent is to clean up stale CloudWatch alarms from the AWS/ECS accounts in Spinnaker, not to verify whether the accounts are valid.

Review thread on the new catch block in CleanupAlarmsAgent:

      }
    } catch (Exception e) {
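
A minimal sketch of the shape this catch block sits in, assuming the agent walks accounts via getAccounts().each as described later in the thread (Groovy, illustrative only; the cleanup and log calls are placeholders, not the actual CleanupAlarmsAgent diff):

    // Illustrative sketch only -- not the actual CleanupAlarmsAgent source.
    void run() {
      getAccounts().each { account ->
        try {
          // placeholder for the per-account alarm cleanup work
          cleanupAlarmsFor(account)
        } catch (Exception e) {
          // Log and keep going: one bad account should not stop cleanup of the rest.
          log.error("Error cleaning up alarms for account ${account.name}", e)
        }
      }
    }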
Contributor:

Is there a catch block to add "higher up" (where the run method is called) that would (also) help? For example, what if there are exceptions in other agents?

Member Author:

Higher up there is already a try/catch block in the initial RunnableAgent implementation, but I have also pushed a try/catch block into CleanupDetachedInstancesAgent, which implements RunnableAgent as well.

Contributor:

Seems like run is called from RunnableAgentExecution.executeAgent and all the places that call executeAgent are similar to DefaultAgentScheduler. There's a try/catch, but it only updates metrics. So what we get here is some extra logging, but I don't see how it's going to help with agent scheduling.

Member Author:

These cleanup agents go through each account and try to clean up any stale CloudWatch alarms, detached instances, etc. When a bad account is present in the credentials repository, the agent stops and doesn't go through the rest of the accounts. By adding a try/catch here, we log that there was a problem with account X and continue to the next account.
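
To illustrate the difference (purely a sketch, not the clouddriver scheduler or agent source; accounts, cleanUp, and log are placeholder names): an outer catch around the whole run stops at the first failing account, while the catch added in this PR lives inside the per-account loop, so the remaining accounts are still processed.

    // Outer catch (e.g. at the scheduler level around run()): the loop dies on
    // the first bad account and the remaining accounts are never visited.
    try {
      accounts.each { account -> cleanUp(account) }
    } catch (Exception e) {
      log.error("agent run failed", e)
    }

    // Inner catch (the approach in this PR): each failure is logged and the
    // loop continues, so the remaining accounts still get cleaned up.
    accounts.each { account ->
      try {
        cleanUp(account)
      } catch (Exception e) {
        log.error("Failed to clean up account ${account}", e)
      }
    }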

Contributor:

Aaah yes, the try/catch here and in CleanupDetachedInstancesAgent is inside getAccounts().each. Seems like the consequences of this fix are good: instead of dying on the first error, we'll continue to clean up other accounts.

@christosarvanitis force-pushed the fix-CleanupAlarmsAgentException branch from 7b366a7 to 9b0e768 on January 20, 2025 08:17
@christosarvanitis force-pushed the fix-CleanupAlarmsAgentException branch from 9b0e768 to 7d77512 on January 20, 2025 10:14
@dbyron-sf (Contributor):

@christosarvanitis if you bring this branch up to date, I think it's ready to go.

@christosarvanitis (Member Author):

> @christosarvanitis if you bring this branch up to date, I think it's ready to go.

Done! Thanks @dbyron-sf

@dbyron-sf added the "ready to merge" label (Approved and ready for a merge) on Jan 22, 2025
@mergify bot added the "auto merged" label (Merged automatically by a bot) on Jan 22, 2025
@mergify bot merged commit c4df136 into spinnaker:master on Jan 22, 2025
23 checks passed
christosarvanitis added a commit to armory-io/clouddriver that referenced this pull request Jan 22, 2025
Labels: auto merged (Merged automatically by a bot), ready to merge (Approved and ready for a merge), target-release/1.37

3 participants