Reproducibility tests for determinism from restart (as well as cold start) #277

Open · aekiss opened this issue Feb 19, 2025 · 5 comments

@aekiss (Contributor) commented Feb 19, 2025

Before we can make sense of any other reproducibility checks we need to establish that the model is deterministic, i.e. that two runs of the same config from the same initial conditions produce identical results. This is not guaranteed: #40

As discussed in today's OSIT catchup, I think we actually need two tests here:

  1. two identical runs starting from a cold start
  2. two identical runs starting from the same restart

I think these are distinct, because cold starts and restarts exercise different parts of the code, so passing 1 does not guarantee passing 2.

I understand we're already testing 1 (#41), but we're not testing 2. I think we should also test 2 so we have a solid basis for interpreting other types of reproducibility tests (e.g. reproducibility across restarts #266).
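
For concreteness, here's a rough sketch of what the pair of determinism tests could look like as a pytest parametrisation. Everything here is hypothetical (the `launch_run` fixture, directory layout and checksum helper are placeholders, not the actual CI code); it just illustrates that tests 1 and 2 are the same comparison with different starting points.

```python
# Sketch only: `launch_run` is a hypothetical fixture standing in for whatever
# launches a model run and returns its output location.
import hashlib
from pathlib import Path

import pytest


def checksum_tree(path: Path) -> dict[str, str]:
    """md5 digest of every file under `path`, keyed by relative path."""
    return {
        str(f.relative_to(path)): hashlib.md5(f.read_bytes()).hexdigest()
        for f in sorted(path.rglob("*"))
        if f.is_file()
    }


@pytest.mark.parametrize("start", ["cold_start", "from_restart"])  # tests 1 and 2
def test_determinism(start, launch_run, tmp_path):
    """Two identical runs from the same starting point must give identical output."""
    run_a = launch_run(start=start, workdir=tmp_path / "a")
    run_b = launch_run(start=start, workdir=tmp_path / "b")
    assert checksum_tree(run_a.output_dir) == checksum_tree(run_b.output_dir)
```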

@dougiesquire (Collaborator) commented:

I've decided to expedite the overhaul of the ACCESS-OM3 CI repro tests because, despite what I said in the meeting, the current test actually isn't fit-for-purpose for restart repro. I'll try to add your test 2 as part of this.

dougiesquire self-assigned this Feb 19, 2025
@aekiss (Contributor, Author) commented Feb 19, 2025

Thanks @dougiesquire, the overhaul might also be an opportunity to see whether md5 restart hashes can be used (#278).
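
For reference, a minimal sketch of the kind of md5 comparison I have in mind for the restarts (file layout and function names are hypothetical; hashing is done in chunks so large restart files don't have to be read into memory at once):

```python
# Sketch only: compare md5 digests of two restart directories to confirm the
# stored model state is bitwise-identical. Paths and names are hypothetical.
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """md5 of a (possibly large) file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def restarts_identical(dir_a: Path, dir_b: Path) -> bool:
    """True if both restart directories hold the same files with the same md5s."""
    hashes = [
        {str(p.relative_to(d)): md5sum(p) for p in sorted(d.rglob("*")) if p.is_file()}
        for d in (dir_a, dir_b)
    ]
    return hashes[0] == hashes[1]
```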

@anton-seaice (Contributor) commented:

> 2. two identical runs starting from the same restart

There is a test that compares the output from two consecutive one-day runs with the output from a single two-day run; that test will never pass if this test 2 fails.
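
In other words, the existing restart-repro check is roughly the following (a sketch only; `launch_run`, its arguments and the attribute names are hypothetical placeholders, not the actual test code):

```python
# Sketch only: the 2x1-day vs 1x2-day restart-repro comparison.
import hashlib
from pathlib import Path


def digest_tree(path: Path) -> dict[str, str]:
    """md5 of every file under `path`, keyed by relative path."""
    return {str(p.relative_to(path)): hashlib.md5(p.read_bytes()).hexdigest()
            for p in sorted(path.rglob("*")) if p.is_file()}


def test_restart_reproducibility(launch_run, tmp_path):
    """End-of-day-2 state should match whether we run 1+1 days or 2 days."""
    day1 = launch_run(days=1, workdir=tmp_path / "split")
    day2 = launch_run(days=1, workdir=tmp_path / "split", restart=day1.restart_dir)
    twoday = launch_run(days=2, workdir=tmp_path / "single")
    assert digest_tree(day2.restart_dir) == digest_tree(twoday.restart_dir)
```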

@aekiss (Contributor, Author) commented Feb 19, 2025

I think that's a slightly different situation.

- If case 1 passes but case 2 fails, that indicates non-deterministic code in initialisation from a restart (but not from a cold start).
- Conversely, if case 1 fails but case 2 passes, that indicates non-deterministic code in initialisation from a cold start (but not from a restart).
- If both fail, the non-determinism could be anywhere in the code.

If case 1 passes but case 2 fails, you'd also expect the restart repro test (2x1 day vs 1x2 day) to fail. But that combination also eliminates the possibility that the restart repro failure was due to the restarts storing an incomplete or incorrect model state at the end of day 1, and instead points to a lack of determinism in initialising from restarts at the start of day 2. [Note: this only holds if "passing case 1" includes checking md5 hashes of the restarts to confirm an identical model state at the end of day 1.]

(Also note that cases 1 and 2 would not be covered by running two identical experiments from rest, each consisting of two 1-day runs, and checking reproducibility at the end of day 1 (case 1) and day 2 (case 2): if case 1 fails, the restarts would differ, so day 2 would not test case 2 in isolation. To test case 2 independently of case 1, the case 2 runs need to start from identical restarts.)
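
To summarise that case logic as code (a sketch only; the booleans would be the results of tests 1 and 2 above):

```python
# Sketch only: map the outcomes of the two determinism tests to where the
# non-determinism is likely to live.
def diagnose(case1_cold_start_repro: bool, case2_restart_repro: bool) -> str:
    if case1_cold_start_repro and case2_restart_repro:
        return "deterministic from both cold start and restart"
    if case1_cold_start_repro:
        return "non-deterministic code in initialisation from a restart"
    if case2_restart_repro:
        return "non-deterministic code in initialisation from a cold start"
    return "non-determinism could be anywhere in the code"
```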

@dougiesquire (Collaborator) commented:

I'm not sure 2 is something we need/want to check with every CI test since, as @anton-seaice points out, the 2x1d-vs-1x2d test should fail if we don't have 2. It is definitely something we need to check if the 2x1d-vs-1x2d test fails.
