Simplify the Tsavorite checkpoint state machine #1059

Merged Mar 7, 2025 (41 commits)

@badrishc badrishc commented Feb 28, 2025

  • Use a single task (StateMachineDriver.RunStateMachine) to drive the state machine instead of letting user threads cooperatively run it. This can also reduce tail latency of unlucky user operations that need to drive the state machine.
  • Do not let threads work on prepare and in_progress phases at once. The "checkpoint version switch barrier" option is consequently removed as well. This removes the CPR_SHIFT_DETECTED and LatchDestination.Retry code paths.
  • State machines no longer have an OnThreadState component. We also eliminate the ThreadStateMachine step of individual threads. This aspect of checkpointing was complex and error-prone.
  • Remove LightEpoch's Mark and CheckIsComplete as a result of the thread-level simplifications mentioned above.
  • Checkpoints and the state machine driver sit outside the store now, making it possible to drive multiple stores with the same state machine driver. This will allow us to have a single (v) -> (v+1) switch across the string and object stores in Garnet.
  • Removed atomic switch of session context and the maintenance of two session contexts, since sessions no longer have any work/state associated with checkpoint version switches.
  • Currently, as in main, (v) transactions have to end in prepare, before (v+1) transactions can start in in_progress. However, this PR simplifies the future optimization of allowing (v+1) transactions to start as soon as (v) transactions have acquired all their locks. This will significantly reduce the overhead of the barrier during the checkpoint.
  • Moved EPVS code to test, for use in SimpleVersionSchemeTest and SimulatedFlakyDevice
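
To make the driver and barrier ideas above concrete, here is a rough sketch in Python rather than the project's C#. It is a simplified model, not the Tsavorite implementation: `Driver`, `Store`, `begin_op`, `end_op`, and `run_checkpoint` are hypothetical names invented for illustration, and the epoch machinery is replaced by a plain counter guarded by a condition variable. It shows one driver moving a shared phase through prepare and in_progress, waiting for in-flight (v) operations to drain before any (v+1) operation starts, and applying the same version switch to several stores.

```python
import threading
from enum import Enum


class Phase(Enum):
    REST = 0
    PREPARE = 1
    IN_PROGRESS = 2


class Store:
    """Hypothetical stand-in for a store that takes part in a checkpoint."""

    def __init__(self, name):
        self.name = name
        self.checkpointed_version = None


class Driver:
    """Hypothetical driver that lives outside the stores it checkpoints."""

    def __init__(self, stores):
        self.stores = stores
        self.phase = Phase.REST
        self.version = 1
        self.active = 0          # operations currently in flight
        self.switching = False   # True while the (v) -> (v+1) barrier is up
        self.cond = threading.Condition()

    def begin_op(self):
        # New operations wait while the barrier is up, so none of them
        # starts in PREPARE while others could already be in IN_PROGRESS.
        with self.cond:
            while self.switching:
                self.cond.wait()
            self.active += 1
            return self.version

    def end_op(self):
        with self.cond:
            self.active -= 1
            self.cond.notify_all()

    def run_checkpoint(self):
        # Only this method changes the phase; user threads never do.
        with self.cond:
            self.phase = Phase.PREPARE
            self.switching = True
            while self.active > 0:       # let in-flight (v) operations finish
                self.cond.wait()
            self.version += 1            # single (v) -> (v+1) switch
            self.phase = Phase.IN_PROGRESS
            self.switching = False
            self.cond.notify_all()       # release waiting (v+1) operations
            checkpointed = self.version - 1
        for store in self.stores:        # every store records the same version
            store.checkpointed_version = checkpointed
        with self.cond:
            self.phase = Phase.REST
```

In this model a caller brackets each operation with `begin_op()` and `end_op()` and runs `run_checkpoint()` on a dedicated thread. Blocking every new operation for the whole of prepare is a deliberate simplification of the sketch; the future optimization mentioned above would lower the barrier earlier, once (v) transactions hold all their locks.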

@badrishc badrishc marked this pull request as ready for review March 1, 2025 01:22
@badrishc badrishc requested a review from TedHartMS March 2, 2025 19:13
badrishc added 6 commits March 5, 2025 16:00
…whether to spin or operate in PREPARE phase. Moved isAcquiredLockable to Ctx. This commit also removes a race, re-establishing the invariant that no threads are operating in PREPARE while any thread is operating in IN_PROGRESS phase.
@badrishc badrishc merged commit 1f58916 into main Mar 7, 2025
18 checks passed
@badrishc badrishc deleted the badrishc/state-machine-v2 branch March 7, 2025 22:16