Use Scylla API for restore #4192

Open · wants to merge 14 commits into ml/scylla-api from ml/restore-scylla-api
Conversation

@Michal-Leszczynski (Collaborator) commented Jan 8, 2025

This PR starts using the native Scylla restore API. The main focus points are as follows:

  1. We needed to drop and re-create the restore run progress table, because we needed to include the Scylla task ID as a clustering key. We can't reuse the agent job ID field for this purpose, because it's an integer, while the Scylla task ID is a string. Changing the clustering key via an ALTER statement is not allowed, which is why the table was re-created. This will need to be clearly communicated in the release notes. See 39011de for more info.
  2. This PR introduces batch types. They aim to simplify the decision of which API should be used for restoring a given batch, as the integer and versioned SSTables are still restored with the Rclone API. See 2efd8b4 for more info.
  3. This PR also adds new restore metrics: restored_bytes and restore_duration, as counterparts to the already existing download/stream metrics. See 63d029a for more info.
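To illustrate point 1: CQL does not allow changing a table's primary key with ALTER TABLE, so the only option is a drop-and-recreate. A rough sketch of the change (column names are illustrative, not the actual Scylla Manager schema):

```cql
-- Not allowed: the clustering key cannot be changed in place.
-- ALTER TABLE restore_run_progress ...;

DROP TABLE restore_run_progress;

CREATE TABLE restore_run_progress (
    cluster_id uuid,
    task_id uuid,
    run_id timeuuid,
    remote_sstable_dir text,
    host text,
    scylla_task_id text,  -- a string, so the integer agent job ID field cannot be reused
    PRIMARY KEY ((cluster_id, task_id, run_id), remote_sstable_dir, host, scylla_task_id)
);
```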

Fixes #4144
Fixes #4137
Fixes #4142
Fixes #4194

@Michal-Leszczynski force-pushed the ml/restore-scylla-api branch 4 times, most recently from 3475e2f to c735fb2 on January 9, 2025 16:18
@Michal-Leszczynski force-pushed the ml/restore-scylla-api branch 2 times, most recently from 55cb381 to a572b27 on January 30, 2025 08:19
We need to extend run progress with the Scylla task ID.
Moreover, we need it to be a part of the clustering key.
Otherwise, we won't be able to store multiple run progresses
with the same remote SSTable dir and host in the DB.
That's why we need to re-create the table.
It also allows us to rename manifest path and versioned
progress to more suitable names.
The downside is that users will lose their restore progress
history on upgrade, but this shouldn't harm anybody
if communicated properly.

It's good to have it specified in our codebase.
E.g. the knowledge about the TOC component will
be necessary for native Scylla restore.

This commit extends the SSTable structure with
its TOC component, which is needed when using
the Scylla restore API.
Moreover, it introduces batch types, which are
also needed for deciding whether a given batch
can be restored with the Scylla restore API or
the Rclone API.
It also makes sure that all SSTables within
the same batch belong to the same batch type.

It's better to do it at the start so that:
- we don't need to do it per restore type
- we don't create and immediately return the batch
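The batch-type decision described above could be sketched roughly like this (a minimal illustration; the type, field, and function names are hypothetical, not the actual Scylla Manager code — the only assumption taken from the PR is that versioned SSTables still go through Rclone):

```go
package main

import "fmt"

// batchType decides which API restores a given batch.
type batchType int

const (
	batchTypeScyllaAPI batchType = iota // restorable with the native Scylla restore API
	batchTypeRcloneAPI                  // must still be downloaded via the Rclone API
)

// sstable is an illustrative stand-in for the dispatcher's SSTable struct.
type sstable struct {
	ID        string
	Versioned bool // versioned SSTables are still restored with Rclone
}

func typeOf(s sstable) batchType {
	if s.Versioned {
		return batchTypeRcloneAPI
	}
	return batchTypeScyllaAPI
}

// sameType reports whether all SSTables in a batch share one batch type,
// which the dispatcher guarantees by construction.
func sameType(batch []sstable) bool {
	if len(batch) == 0 {
		return true
	}
	for _, s := range batch {
		if typeOf(s) != typeOf(batch[0]) {
			return false
		}
	}
	return true
}

func main() {
	batch := []sstable{{ID: "me-1-big"}, {ID: "me-2-big"}}
	fmt.Println(sameType(batch)) // true
}
```

Deciding the type per batch up front keeps the restore path a simple switch over batchType instead of per-SSTable checks.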

This commit adds code for using the Scylla restore API.
Luckily for us, handling pause/resume
is analogous to the Rclone API handling.

Fixes #4144
Fixes #4137
With native Scylla restore, we can gradually update
the restored bytes progress.
This commit adds new metrics:
- restored_bytes
- restore_duration
which are counterparts of the similar
downloaded and streamed metrics.

It also adds a new restore state (RestoreStateNativeRestore)
which describes the usage of native Scylla restore.

This commit makes TestRestoreTablesPreparationIntegration
check that the proper API is used for restore.
ALTER TABLE backup_run_progress ADD scylla_task_id text;

DROP TABLE restore_run_progress;
Collaborator
As an alternative to dropping progress history we can try to do smth like:

  1. create temp table
  2. copy data from restore_run_progress to temp table
  3. drop restore_run_progress
  4. create restore_run_progress with new primary key
  5. copy data from temp table to restore_run_progress
  6. drop temp table

It's additional work, but then we can keep user history and save some time trying to communicate why we need to drop the table :)
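The schema part of the steps above could look roughly like this (illustrative sketch; note that CQL has no INSERT ... SELECT, so steps 2 and 5 — copying the rows — have to be done application-side by paging through the old table and re-inserting):

```cql
-- 1. create a temp table with the old schema
CREATE TABLE restore_run_progress_tmp (...);  -- same columns as restore_run_progress

-- 2. copy rows to the temp table application-side (CQL has no INSERT ... SELECT)

-- 3. + 4. re-create the table with the new primary key
DROP TABLE restore_run_progress;
CREATE TABLE restore_run_progress (
    ...,
    scylla_task_id text,
    PRIMARY KEY ((cluster_id, task_id, run_id), remote_sstable_dir, host, scylla_task_id)
);

-- 5. copy rows back application-side, then 6. drop the temp table
DROP TABLE restore_run_progress_tmp;
```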

Collaborator Author

If it was a backup/repair task, I would try harder to preserve task history, but I thought that dropping restore task history is fine for the users. Anyway, I will try to preserve it.

We even have some pkg/schema/migrate pkg.
I'm not familiar with it, but it has some functions for rewriting the table, so it should be rather easy!

)

// batchDispatcher is a tool for batching SSTables from
// Workload across different hosts during restore.
// It follows a few rules:
//
// - all SSTables within a batch have the same have the same batchType
Collaborator

typo

// ValidateAllDispatched returns error if not all SSTables were dispatched.
func (bd *batchDispatcher) ValidateAllDispatched() error {
	bd.mu.Lock()
	defer bd.mu.Unlock()

	for i, rdp := range bd.workloadProgress.remoteDir {
		if rdp.RemainingSize != 0 || len(rdp.RemainingSSTables) != 0 {
			failed := rdp.RemainingSize != 0
			for _, ssts := range rdp.RemainingSSTables {
Collaborator

If I understand correctly, we can have a situation where RemainingSize is 0, but there are still some RemainingSSTables? How can this happen?

Collaborator Author

It shouldn't. I just wanted to keep checking both the size and the SSTable count, to be on the safe side (as it was done before).

To be honest, checking size seems less robust (maybe a 0-size SSTable is not realistic, but perhaps we might encounter some bug where Rclone reports 0 size by mistake).

I think we can get rid of relying on the remaining size and use the SSTable count instead.
What do you think about this idea?

Collaborator
@VAveryanov8 VAveryanov8 commented Feb 6, 2025

Yes, I think either one of these will work. If checking the SSTable count is safer, then we can use only it.
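The idea discussed above — relying only on the SSTable count instead of also cross-checking the remaining byte size — could look like this minimal Go sketch (the types and names are illustrative, not the actual Scylla Manager batch dispatcher):

```go
package main

import "fmt"

// dirProgress is an illustrative stand-in for the per-remote-dir
// progress tracked by the batch dispatcher.
type dirProgress struct {
	RemainingSSTables []string
}

// validateAllDispatched relies only on the SSTable count: a dir is fully
// dispatched exactly when no SSTables remain, regardless of reported sizes.
func validateAllDispatched(dirs map[string]dirProgress) error {
	for dir, dp := range dirs {
		if n := len(dp.RemainingSSTables); n != 0 {
			return fmt.Errorf("%d SSTables from %q were not dispatched", n, dir)
		}
	}
	return nil
}

func main() {
	dirs := map[string]dirProgress{
		"s3://backup/node1": {},
		"s3://backup/node2": {RemainingSSTables: []string{"me-3-big-Data.db"}},
	}
	fmt.Println(validateAllDispatched(dirs) != nil) // true: node2 has leftovers
}
```

Counting SSTables sidesteps the hypothetical bug mentioned above where Rclone could report a 0 size by mistake.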

Collaborator
@karol-kokoszka karol-kokoszka left a comment

Thanks @Michal-Leszczynski !
I have no comments.

I see that the latest scylla-enterprise:nightly-build supports the backup/restore API and it's tested against all integration tests, which is very good.

@@ -31,6 +33,8 @@ func NewRestoreMetrics() RestoreMetrics {
remainingBytes: g("Remaining bytes of backup to be restored yet.", "remaining_bytes",
"cluster", "snapshot_tag", "location", "dc", "node", "keyspace", "table"),
state: g("Defines current state of the restore process (idle/download/load/error).", "state", "cluster", "location", "snapshot_tag", "host"),
restoredBytes: g("Restored bytes", "restored_bytes", "cluster", "host"),
restoreDuration: g("Restore duration in ms", "restore_duration", "cluster", "host"),
Collaborator

What do you need or want to use this metric for? What is the purpose of it?

w.metrics.IncreaseRestoreStreamDuration(w.run.ClusterID, pr.Host, timeSub(pr.RestoreStartedAt, pr.RestoreCompletedAt, timeutc.Now()))
w.metrics.IncreaseRestoreStreamDuration(w.run.ClusterID, pr.Host, timeSub(pr.DownloadCompletedAt, pr.RestoreCompletedAt, timeutc.Now()))
w.metrics.IncreaseRestoredBytes(w.run.ClusterID, pr.Host, b.Size)
w.metrics.IncreaseRestoreDuration(w.run.ClusterID, pr.Host, timeSub(pr.RestoreStartedAt, pr.RestoreCompletedAt, timeutc.Now()))
Collaborator

pr *RunProgress defines the progress of a single batch (set of SSTables), right?

w.metrics.IncreaseRestoreDuration(w.run.ClusterID, pr.Host, timeSub(pr.RestoreStartedAt, pr.RestoreCompletedAt, timeutc.Now()))

This increases the duration after every batch completion, but the context of the metric itself is the restore task executed on a given node.

Do you want to keep this metric to measure the time a single node spent on the exact restore (downloading + l&s)?

That's fine, but what information does it provide in the end? Do you want to show the "real-time" bandwidth per node?
Right now, Scylla Manager shows the bandwidth when sctool is called. What is the benefit of having it in metrics as well?

On the other hand, it doesn't harm anyone...

OK, actually it makes sense to have the possibility of monitoring the ongoing restore without the need to call sctool.

And it's done the same way for the Rclone restore.

EDIT: I think it was added for "debug" reasons. To see how much time a node spent on downloading and how much on streaming, right? Nevertheless, it's completely fine to keep them.

)

// batchDispatcher is a tool for batching SSTables from
// Workload across different hosts during restore.
// It follows a few rules:
//
// - all SSTables within a batch have the same have the same batchType
Collaborator

Suggested change
// - all SSTables within a batch have the same have the same batchType
// - all SSTables within a batch have the same batchType
