fix(aws): CleanupAlarmsAgent cycle to catch exceptions #6333
Conversation
...er-aws/src/main/groovy/com/netflix/spinnaker/clouddriver/aws/agent/CleanupAlarmsAgent.groovy
  }
} catch (Exception e) {
Is there a catch block to add "higher up" (where the run method is called) that would (also) help? Like, what if there are exceptions in other agents?
Higher up there is already a try/catch block in the initial RunnableAgent implementation, but I have pushed a try/catch block into CleanupDetachedInstancesAgent as well, which also implements RunnableAgent.
Seems like run is called from RunnableAgentExecution.executeAgent, and all the places that call executeAgent are similar to DefaultAgentScheduler. There's a try/catch, but it only updates metrics. So what we get here is some extra logging, but I don't see how it's going to help with agent scheduling.
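To illustrate the point (a hypothetical sketch, not the actual DefaultAgentScheduler code; the class and metric names here are made up): a metrics-only catch around the whole agent run records the failure but can't resume work on whatever accounts the agent hadn't reached yet.

```groovy
// Hypothetical sketch of a metrics-only catch around agent execution;
// SketchScheduler and 'agent.executionFailures' are illustrative names.
class SketchScheduler {
  def registry  // assumed Spectator-style metrics registry

  void execute(Runnable agent) {
    try {
      agent.run()
    } catch (Exception e) {
      // The failure is counted, but this run is over: any accounts the
      // agent had not reached yet stay uncleaned until the next cycle.
      registry.counter('agent.executionFailures').increment()
    }
  }
}
```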
These cleanup agents go through each account and try to clean up any stale alarms in CloudWatch, any detached instances, and so on. When a bad account is present in the credentials repository, the agent stops and doesn't get through the rest of the accounts.
By adding a try/catch here we log that there was a problem with account X and continue to the next account, as sketched below.
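A minimal sketch of that pattern, assuming the getAccounts().each loop mentioned later in the thread, a @Slf4j-style log field, and a hypothetical cleanupAlarmsFor helper (the actual patch may differ in details):

```groovy
// Sketch only: per-account try/catch inside getAccounts().each, so one
// bad account no longer aborts cleanup for the accounts after it.
void run() {
  getAccounts().each { credentials ->
    try {
      cleanupAlarmsFor(credentials)  // hypothetical per-account helper
    } catch (Exception e) {
      log.error("Error cleaning up alarms in account ${credentials.name}", e)
      // fall through: .each continues with the next account
    }
  }
}
```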
Aaah yes, the try/catch here and in CleanupDetachedInstancesAgent is inside getAccounts().each. Seems like the consequences of this fix are good: instead of dying on the first error, we'll continue to clean up other accounts.
Force-pushed from 7b366a7 to 9b0e768
Force-pushed from 9b0e768 to 7d77512
@christosarvanitis if you bring this branch up to date, I think it's ready to go.
Done! Thanks @dbyron-sf
A bad account, or any exception, currently stops the CleanupAlarmsAgent.
Catch the exception and log it: the purpose of this agent is to clean up stale CloudWatch alarms from the AWS/ECS accounts in Spinnaker, not to verify whether the accounts are valid.