Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve time window to become cron expression based scheduler #351

Merged
merged 1 commit into from
Mar 2, 2023

Conversation

gthao313
Copy link
Member

@gthao313 gthao313 commented Dec 5, 2022

Right now, we only support time window scheduler to allow brupop update on specific a time slot around a day. Cron expression would be a better choice to enhance time window scheduler, so users are more flexible to schedule maintenance time window on their purpose.

Issue number:
Closes: #343

Description of changes:

Author: Tianhao Geng <[email protected]>
Date:   Thu Dec 1 01:43:17 2022 +0000

    improve time window to become cron expression based scheduler

    Right now, we only support time window scheduler to allow brupop update
    on specific a time slot around a day. Cron expression would be a better
    choice to enhance time window scheduler, so users are more flexible to
    schedule maintenance time window on their purpose.

cron expression is a time-based job scheduler user to run a specified job periodically at fixed times or periods of time (time window).
fixed times: 0 10 * * * */At 10:00 on Monday.
periods of time (time window): * 10-12 * * mon */At every minute on every Monday from 10 through 12

brupop controller needs to use different logics to deal with fixed times or periods of time (time window)
=> fixed times: trigger brupop update and complete all waitingForUpdate nodes.
=> periods of time (time window): trigger brupop update within time window. If current time isn't within the time window,
controller shouldn't have any action on it.

Testing done:

  • integration test on fix times
  • integration test on time window

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@gthao313 gthao313 force-pushed the scheduler branch 2 times, most recently from f9e8397 to 3681b7e Compare December 5, 2022 20:43
@gthao313 gthao313 marked this pull request as ready for review December 13, 2022 19:00
@gthao313
Copy link
Member Author

Push above address Sean's comments.

Todo: Still working on backward compatibility.

Copy link
Contributor

@jpmcb jpmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is amazing!! Nice work!

Were there planned changes to the bottlerocketshadow CRD? I thought I heard mention of that somewhere, but with this change (and what makes the most sense to me) is the controller handles all of it.

@cbgbt
Copy link
Contributor

cbgbt commented Jan 18, 2023

Were there planned changes to the bottlerocketshadow CRD? I thought I heard mention of that somewhere, but with this change (and what makes the most sense to me) is the controller handles all of it.

This was something I said aloud, but at the time I think I had forgotten some of the details of how this is implemented. My mistake! Shouldn't need changes to the CRD.

@gthao313
Copy link
Member Author

gthao313 commented Jan 24, 2023

Push above address backwards compatibility and deprecation notice of time window functionality.
Major update:

  • implement two paths run_with_update_time_window and run_with_cron_scheduler to run differnernt logics when users set update time window or cron expression scheduler.
  • give cron expression scheduler a higher priority if users set update time window and cron expression scheduler at same time.

Future work: deprecate update time window

@gthao313 gthao313 requested review from cbgbt, webern and jpmcb January 24, 2023 00:44
Copy link
Contributor

@jpmcb jpmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks awesome!! Getting super close!!


Testing done

Built and tested with new image using the old time window:

- name: UPDATE_WINDOW_START
  value: 22:10:0
- name: UPDATE_WINDOW_STOP
  value: 22:30:0

where my current time was 22:00:0 UTC.

I saw the following logs:

2023-01-24T22:02:49.449903Z  INFO controller::controller: Calculating if current time is within update time window.
at controller/src/controller.rs:611
in controller::controller::within_time_window with start: "23:10:0", stop: "23:30:0"

and at 22:10:0 UTC I saw updates performed:

❯ k get brs -A
NAMESPACE                 NAME                                                STATE                VERSION   TARGET STATE       TARGET VERSION   CRASH COUNT
brupop-bottlerocket-aws   brs-ip-192-168-146-85.us-west-2.compute.internal    Idle                 1.11.1    Idle                                0

Built and tested with a new image using the cron expression:

- name: SCHEDULER_CRON_HOURS
  value: '23'

where my current time was just before 23:00:0 UTC. Once it rolled over to that time, the updates started and completed:

❯ k get brs -A
NAMESPACE                 NAME                                                STATE                VERSION   TARGET STATE       TARGET VERSION   CRASH COUNT
brupop-bottlerocket-aws   brs-ip-192-168-136-18.us-west-2.compute.internal    Idle                 1.11.1    Idle                                0

@gthao313 gthao313 force-pushed the scheduler branch 2 times, most recently from f191892 to 82d2833 Compare January 30, 2023 00:34
Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and tests are happy.

Comment on lines 767 to 774
- name: UPDATE_WINDOW_START
value: 0:0:0
- name: UPDATE_WINDOW_STOP
value: 0:0:0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense for us to unset one of these. See reasoning around comparison's to the default value.

Copy link
Contributor

@webern webern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly a dumb question. A chron expression can tell us what time the windows starts, but what ends the maintenance window? i.e. what is the duration of the maintenance window?

@gthao313
Copy link
Member Author

gthao313 commented Feb 3, 2023

Possibly a dumb question. A chron expression can tell us what time the windows starts, but what ends the maintenance window? i.e. what is the duration of the maintenance window?

@webern That's a good question. A cron expression can tell us what time the windows starts and ALSO can tell a maintenance window. For example, if you specify cron expression to * * 10-12 * * MON * which means Every Monday between 10:00 and 12:00, this actually is similar to the concept of maintenance window.

@webern
Copy link
Contributor

webern commented Feb 4, 2023

and ALSO can tell a maintenance window

I didn't know that. If the cron expression doesn't specify a duration, then how do you know when the maintenance closes. Are you defaulting to e.g. a 1-hour maintenance window?

@gthao313
Copy link
Member Author

gthao313 commented Feb 4, 2023

then how do you know when the maintenance closes. Are you defaulting to e.g. a 1-hour maintenance window?

If users specify a fix time on cron expression like running brupop on 1am every monday. Brupop will start update on that time and complete all waitingForUpdate nodes.
If users specify cron expression - maintenance window, we have mechanism to find users currently are using maintenance window and then trigger the function within_time_window() to monitor when the maintenance closes. If the maintenance closes, brupop controller will stop to find new nodes to progress.

Copy link
Contributor

@jpmcb jpmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor! I think this approach makes sense. Overall, I'd like to see us get this across the line soon since the context on this PR has gotten really deep. I'm ok merging it and iterating further as we go!

("0:0:0", "5:0:0", "* * 0-5 * * * *"),
("21:0:0", "8:30:0", "* * 21-23,0-8 * * * *"),
("15:0:0", "3:30:34", "* * 15-23,0-3 * * * *"),
("0:0:0", "0:0:0", "* * 0-0 * * * *"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is "always"? The full 24 hours?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, what do you mean by "always" here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to wrap my head around 0-0 in the hour part 😬

* * 0-0 * * * * means 24/7?

Copy link
Contributor

@cbgbt cbgbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought up a suggestion on how we handle the former UPDATE_WINDOW_* values -- I think we should continue to support them at runtime.

I agree with John though, nice work on the refactor here. I'm happy to followup with fixes to that separately with a fresh context.

@gthao313
Copy link
Member Author

Push above

  • continue support UPDATE_WINDOW_* settings on runtime.

@gthao313 gthao313 requested review from jpmcb and cbgbt February 22, 2023 22:11
@gthao313
Copy link
Member Author

discuss with sean offline. we agree that We should consider .env as tool that only brupop developers work with, and we should expect users directly modify yaml file. Therefore, we should add support them at runtime. We should expect various situations like users specify the update time window and scheduler at same time, no update time window and scheduler variables are provided, and else.

Major changes

  • remove update time window from .env and yaml since we suggest to use scheduler but keep on README for guiding the users who still want to use update time window.
  • add legacywindow struct and variable handling logic on scheduler
  • address matt's comments.

Test done

  • specify update time window and scheduler at same time
- name: SCHEDULER_CRON_EXPRESSION
  value: '* * 3 * * * *'
- name: UPDATE_WINDOW_START
  value: "09:00:00"
- name: UPDATE_WINDOW_STOP
  value: "21:00:00"


  2023-02-24T08:56:40.903479Z ERROR controller: controller exited
    at controller/src/main.rs:110

Error: ControllerError { source: GetCronSchedule { source: DisallowSetTimeWindowAndScheduler } }
  • use update time window but missing one of them
- name: UPDATE_WINDOW_START
  value: "09:00:00"
Error: ControllerError { source: GetCronSchedule { source: MissingTimeWindowVarible { message: "update window start variable is provided but missing update time stop variable" } } }

- name: UPDATE_WINDOW_STOP
   value: "21:00:00"
Error: ControllerError { source: GetCronSchedule { source: MissingTimeWindowVarible { message: "update window stop variable is provided but missing update time start variable" } } }
  • only have update time window or cron expression scheduler
    those worked as expected.
brs-ip-192-168-101-132.us-west-2.compute.internal   Idle                       1.10.0    Idle                 <no value>       0
brs-ip-192-168-109-217.us-west-2.compute.internal   StagedAndPerformedUpdate   1.10.0    RebootedIntoUpdate   1.12.0           0
brs-ip-192-168-99-103.us-west-2.compute.internal    Idle                       1.10.0    Idle                 <no value>       0
  • missing both of them
    brupop did work by following default scheduler * * * * * * *

@gthao313 gthao313 force-pushed the scheduler branch 2 times, most recently from d79cea5 to 11d54ec Compare February 24, 2023 09:30
@gthao313 gthao313 requested review from webern and jpmcb February 24, 2023 09:31
Comment on lines 80 to 82
fn cron_expression_converter(&self) -> String {
let start_hour: u8 = self.start_time.split(":").next().unwrap().parse().unwrap();
let stop_hour: u8 = self.end_time.split(":").next().unwrap().parse().unwrap();
Copy link
Contributor

@cbgbt cbgbt Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be using unwrap(), it's not appropriate to panic on malformed input.

I think if the validate statements above pass, then this should never fail, but I don't see where we ever call validate(). We should probably call validate() at the start of this method. Then there are two ways we could improve the use of unwrap():

  • (preferred) Modify the function to return a Result and remove any unwraps(). This is the most future-proof -- leaving unwrap() leaves room for the function to panic if our preconditions ever change in the future.
  • Add a comment to this function explaining that the unwrap() code panicking is not possible due to the validation.

Comment on lines 37 to 39
match env::var(UPDATE_WINDOW_START_ENV_VAR) {
Ok(update_window_start) => {
match env::var(UPDATE_WINDOW_STOP_ENV_VAR) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent you some tips on how to improve the readability of this and drastically reduce nesting. To summarize here, I think it's a good idea to handle these nested match statements all at once:

let start_env_var = env::var(UPDATE_WINDOW_START_ENV_VAR);
let stop_env_var = env::var(UPDATE_WINDOW_STOP_ENV_VAR);
match (start_env_var, stop_env_var) {
    (Err(_), Ok(_)) | (Ok(_), Err(_)) => {
    },
    // etc
}

// regex format: HH:MM:SS
lazy_static! {
pub(crate) static ref VALID_UPDATE_TIME_WINDOW_VARIABLE: Regex =
Regex::new(r"^((?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$)").unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may have to continue to support the single-digit timestamps here for compatibility reasons.

README.md Outdated
Comment on lines 147 to 148
#### Set an Update Time Window - DEPRECATED
Note!! that these settings are deprecated and will be removed in a future release. You should use the Scheduler settings below. If you still decide to use these settings, please use "hour:00:00" format only instead of "HH:MM:SS".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match the code style used elsewhere in this document.

Suggested change
#### Set an Update Time Window - DEPRECATED
Note!! that these settings are deprecated and will be removed in a future release. You should use the Scheduler settings below. If you still decide to use these settings, please use "hour:00:00" format only instead of "HH:MM:SS".
#### Set an Update Time Window - DEPRECATED
Note!! that these settings are deprecated and will be removed in a future release.
You should use the Scheduler settings below.
If you still decide to use these settings, please use "hour:00:00" format only instead of "HH:MM:SS".

`SCHEDULER_CRON_EXPRESSION` can be used to specify the scheduler in which updates are permitted.
When `SCHEDULER_CRON_EXPRESSION` is "* * * * * * *" (default), the feature is disabled.

To enable this feature, go to `bottlerocket-update-operator.yaml` and change `SCHEDULER_CRON_EXPRESSION` to a valid cron expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This instruction will change once helm is merged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think so. Maybe @jpmcb has more context on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I'm planning to rebase all this on my PR once this gets in 😄

Right now, we only support time window scheduler to allow brupop update
on specific a time slot around a day. Cron expression would be a better
choice to enhance time window scheduler, so users are more flexible to
schedule maintenance time window on their purpose.
Copy link
Contributor

@cbgbt cbgbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@gthao313 gthao313 merged commit e6bd7fe into bottlerocket-os:develop Mar 2, 2023
@gthao313 gthao313 deleted the scheduler branch March 2, 2023 22:01
@gthao313 gthao313 mentioned this pull request Jul 7, 2023
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve time window to use cron expression.
5 participants