Update Replication Docs #1055

Open. Wants to merge 7 commits into base: main
25 changes: 22 additions & 3 deletions website/docs/cluster/replication.md
@@ -48,10 +48,10 @@ In addition, newly configured replicas, added to the cluster, could face longer
The cluster operator can choose between various replication options to achieve a trade-off between performance and durability.
A summary of these options is shown below:

- - Main Memory Replication (MMR)
+ - Fast AOF Truncation (FAT)
  This option forces the primary to aggressively truncate the AOF so it does not spill into disk. It can be used in combination with aof-memory option which determines the maximum AOF memory buffer size.
- When a replica attaches to a primary with MMR turned on, the AOF is not guaranteed to be truncated which may result in writes being lost.
- To overcome this issue MMR should be used with ODC.
+ When a replica attaches to a primary with FAT turned on, the AOF is not guaranteed to be truncated which may result in writes being lost.
+ To overcome this issue FAT should be used with ODC.
- On Demand Checkpoint (ODC)
This option forces the primary to take a checkpoint if no checkpoint is available when a replica tries to attach and recover. If a checkpoint becomes or was available and the AOF has not been truncated, then
the primary will lock it to prevent truncation while a replica is recovering. In this case, the AOF log could spill to disk as the in-memory AOF buffer becomes full.
@@ -226,6 +226,25 @@ replica_announced:1
192.168.1.26:7001>
```

# Diskless Replication

When the AOF has been truncated, full synchronization requires taking a checkpoint and sending it to the attaching replica.
This operation can be expensive because it involves multiple I/O operations at both the primary and the replica.
For this reason, we added a variant of full synchronization called diskless replication.
It is implemented using a streaming checkpoint that allows clients to continue issuing reads and writes at the primary while attaching replicas synchronize.
To enable diskless replication, start the server with the following flags:

--repl-diskless-sync=true
Enables diskless replication.

--repl-diskless-sync-delay=\<seconds\>
Sets how many seconds to wait before starting the full sync, giving multiple replicas the opportunity to attach and receive the streaming checkpoint.
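
The effect of the sync delay can be illustrated with a small model. The sketch below is not the server's implementation: `batch_attach_requests` is a hypothetical helper, and it assumes that the first attach request opens a window of `delay` seconds and that every request arriving inside that window shares one streaming checkpoint. How the server treats a replica that arrives after streaming has started is not specified here, so the sketch simply opens a new window.

```python
from typing import List, Optional


def batch_attach_requests(arrival_times: List[float], delay: float) -> List[List[float]]:
    """Group replica attach times (in seconds) into full-sync sessions.

    Assumption: the first request of a session opens a window of `delay`
    seconds, and every request arriving inside that window joins it.
    """
    sessions: List[List[float]] = []
    window_end: Optional[float] = None
    for t in sorted(arrival_times):
        if window_end is None or t > window_end:
            # No open window: this request starts a new sync session.
            sessions.append([t])
            window_end = t + delay
        else:
            # Arrived before the sync started: share the same stream.
            sessions[-1].append(t)
    return sessions


if __name__ == "__main__":
    # Three replicas attach within 5 seconds, a fourth arrives much later.
    print(batch_attach_requests([0.0, 1.5, 4.0, 12.0], delay=5.0))
```

Under these assumptions, a longer delay trades attach latency for fewer full-sync sessions at the primary.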

Beyond setting the flags above, there are no additional requirements for using diskless replication.
The APIs for mapping replicas remains the same (i.e. CLUSTER REPLICATE, REPLICAOF etc.).

Comment on lines +244 to +245
Copilot AI Mar 9, 2025

Adjust the verb to 'remain' to agree with the plural subject 'APIs'.

Suggested change
The APIs for mapping replicas remains the same (i.e. CLUSTER REPLICATE, REPLICAOF etc.).
The APIs for mapping replicas remain the same (i.e. CLUSTER REPLICATE, REPLICAOF etc.).

Note that diskless replication does not take an actual checkpoint.
Hence, whenever a full sync is performed, the AOF is not automatically truncated (unless the FAT flag is used).
This happens to ensure durability in the event of a failure which will not be possible if the AOF gets truncated without a persitent checkpoint.
Copilot AI Mar 9, 2025

Correct the spelling 'persitent' to 'persistent'.

Suggested change
This happens to ensure durability in the event of a failure which will not be possible if the AOF gets truncated without a persitent checkpoint.
This happens to ensure durability in the event of a failure which will not be possible if the AOF gets truncated without a persistent checkpoint.

However, the store version gets incremented to ensure consistency across different instances that may be fully synced at different times.
Users can still use the SAVE/BGSAVE commands or --aof-size-limit to periodically take a checkpoint and safely truncate the AOF.
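
The durability argument can be made concrete with a toy model. The sketch below is an illustration only, not the server's recovery code: `recover` is a hypothetical helper, and it assumes that recovery means loading the last persistent checkpoint and then replaying the AOF on top of it.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]
Log = List[Tuple[str, str]]


def recover(checkpoint: State, aof: Log) -> State:
    """Rebuild state from the last persistent checkpoint plus AOF replay."""
    state = dict(checkpoint)
    for key, value in aof:
        state[key] = value
    return state


# Two writes are logged; no persistent checkpoint exists yet.
checkpoint: State = {}
aof: Log = [("a", "1"), ("b", "2")]

# A diskless full sync streams state to the replica but persists nothing,
# so the primary still needs the whole AOF to recover after a failure.
assert recover(checkpoint, aof) == {"a": "1", "b": "2"}

# Truncating the AOF at this point would lose both writes on recovery.
assert recover(checkpoint, []) == {}

# Once a persistent checkpoint exists (modeled here as a copy of the state),
# truncating the AOF no longer loses data.
checkpoint = recover(checkpoint, aof)
aof = []
assert recover(checkpoint, aof) == {"a": "1", "b": "2"}
```

In this model, truncation is only safe after the checkpoint step, which is the role SAVE/BGSAVE or --aof-size-limit plays in the text above.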