Maximum parallel replications for pull #20532

Closed
Joseph94m opened this issue May 31, 2024 · 5 comments · May be fixed by #21347

Comments

@Joseph94m

Joseph94m commented May 31, 2024

Is your feature request related to a problem? Please describe.
It's annoying, and it causes an avalanche effect, when pull-based replications for the same job start before the previous one has finished.
image

It starts off by taking 2 minutes, then it slowly starts piling up until it reaches hours...
image

Describe the solution you'd like
The ability to specify, in the replication configuration:

  • Maximum number of replications that may wait for an InProgress replication to finish
  • Maximum number of replications that can be InProgress at the same time
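
As a rough illustration of how the two limits could interact, here is a minimal sketch of an admission gate for a single replication policy. It is not Harbor code: the class `ReplicationGate` and the settings `max_in_progress` and `max_waiting` are hypothetical names invented for this example, and the sketch assumes that a trigger arriving when both limits are reached is simply skipped.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReplicationGate:
    """Hypothetical admission gate for one replication policy (not a Harbor API)."""

    max_in_progress: int  # assumed setting: cap on executions running at once
    max_waiting: int      # assumed setting: cap on executions queued behind them
    in_progress: set = field(default_factory=set)
    waiting: deque = field(default_factory=deque)

    def trigger(self, execution_id: str) -> str:
        """Decide what happens to a newly triggered execution."""
        if len(self.in_progress) < self.max_in_progress:
            self.in_progress.add(execution_id)
            return "started"
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(execution_id)
            return "queued"
        # Both limits reached: drop the trigger instead of piling it up.
        return "skipped"

    def finish(self, execution_id: str):
        """Mark an execution as done and promote the oldest waiter, if any."""
        self.in_progress.discard(execution_id)
        if self.waiting and len(self.in_progress) < self.max_in_progress:
            promoted = self.waiting.popleft()
            self.in_progress.add(promoted)
            return promoted
        return None
```

With `max_in_progress=1` and `max_waiting=1`, a third trigger that arrives while one execution runs and one waits is skipped, so the backlog can never grow beyond two executions.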

Describe the main design/architecture of your solution

image

What do you guys think, before we discuss implementation?

@stonezdj
Contributor

stonezdj commented Jun 3, 2024

The cron string is 0 */2 * * * *, which means the replication starts every 2 minutes. You should always keep the schedule interval longer than the time a single job takes to complete.
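
To make the reasoning about interval versus job duration concrete, here is a small sketch that computes how long each scheduled run would wait before it can start. It assumes, purely for illustration, that runs of the same policy execute one at a time; the function name `queue_delays` and its parameters are made up for this example.

```python
def queue_delays(interval_s, duration_s, triggers):
    """Return the start delay (in seconds) of each scheduled run.

    Assumption for this sketch: runs execute strictly one after another,
    and every run takes exactly duration_s seconds.
    """
    delays = []
    free_at = 0  # time at which the previous run finishes
    for k in range(triggers):
        fired = k * interval_s          # when the schedule triggers run k
        start = max(fired, free_at)     # it cannot start before the previous run ends
        delays.append(start - fired)
        free_at = start + duration_s
    return delays


# Interval longer than the job: no run ever waits.
print(queue_delays(120, 100, 4))
# Interval shorter than the job: each run waits 30 s longer than the one before.
print(queue_delays(120, 150, 4))
```

Under these assumptions, a 2-minute interval with a 150-second job adds 30 seconds of delay per run, which is the slow pile-up described in the issue.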

@Joseph94m
Author

Hello, that's the fallback approach indeed.
But you cannot always guarantee how long a replication will take, because of occasional bulk insertions in the source registry and network variability.

@stonezdj
Contributor

stonezdj commented Jun 3, 2024

In your case, most of the InProgress replication jobs are actually Pending. You can check them in the job service dashboard; they should be in the job queue. You can adjust the job service worker count to control the maximum number of parallel jobs.
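
The distinction between running and pending jobs can be sketched with a small simulation of a fixed-size worker pool. This is only a model, not the job service itself: the function `running_and_pending` is hypothetical, and it assumes jobs are picked up in the order they were triggered and all take the same time.

```python
import heapq


def running_and_pending(now_s, interval_s, duration_s, workers):
    """Count jobs that are running vs. still pending at time now_s.

    Assumptions for this sketch: one job is triggered every interval_s
    seconds starting at t=0, each job runs for duration_s seconds, and
    jobs are handed to the first free worker in trigger order.
    """
    free_at = [0] * workers  # min-heap: time at which each worker becomes free
    heapq.heapify(free_at)
    running = pending = 0
    fired = 0
    while fired <= now_s:
        start = max(fired, heapq.heappop(free_at))
        end = start + duration_s
        heapq.heappush(free_at, end)
        if start > now_s:
            pending += 1   # triggered, but no worker has picked it up yet
        elif end > now_s:
            running += 1   # a worker is executing it right now
        fired += interval_s
    return running, pending


# 2 workers, 10-minute jobs triggered every 2 minutes, observed after 20 minutes.
print(running_and_pending(1200, 120, 600, workers=2))
```

In this model, only as many jobs as there are workers can run at once; every other triggered job sits in the queue, even if a UI groups both states together.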

@bupd
Contributor

bupd commented Dec 21, 2024

But at this point we can clearly see that it creates a huge load on the system and a lot of wasted bandwidth.
It would be better to have a replication skip option.
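
One way such a skip option could behave is sketched below: a trigger that arrives while the previous run of the same policy is still active is dropped rather than queued. The class `SkipIfRunning` is a hypothetical illustration built on a plain non-blocking lock, not an existing Harbor feature.

```python
import threading


class SkipIfRunning:
    """Hypothetical guard: run a job only if the previous run has finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self.skipped = 0  # number of triggers dropped because a run was active

    def run(self, job):
        """Run job() and return True, or skip it and return False."""
        if not self._lock.acquire(blocking=False):
            self.skipped += 1  # a run is already in progress: skip this trigger
            return False
        try:
            job()
            return True
        finally:
            self._lock.release()
```

A scheduler would call `guard.run(replicate)` on every tick; overlapping ticks return `False` immediately, so no backlog and no duplicate transfers build up.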

@bupd
Contributor

bupd commented Jan 9, 2025

> The cron string is 0 */2 * * * *, it means that the replication is start every 2 minutes, you should always keep the schedule interval longer than a single job complete time.

The workaround of setting a longer replication interval, such as once a day, fails to address the need for timely synchronization across registries. For users who rely on Harbor to maintain identical registries at different locations, frequent replication (e.g., every 5 minutes) is necessary to keep discrepancies between registries to a minimum. With a longer interval, users may end up with outdated or inconsistent images, which undermines the core purpose of replication.

Thanks.
