
Jobs not being worked on #476

Closed
thisIsLoading opened this issue Jan 2, 2025 · 9 comments

@thisIsLoading

I have (I believe since one of the latest updates, but I can't tell) blocked jobs like this:

[screenshot: blocked jobs]

They are being scheduled like this:

production:
  predicted_funding_data:
    command: Funding::FundingRate.new.sync_all_predicted_funding_rates
    schedule: '* * * * *'

which basically just schedules a bunch of these jobs:

class Funding::SyncPredictedFundingRatesJob < ApplicationJob
  limits_concurrency to:       10,
                     duration: 1.second,
                     key:      ->(exchange_name, perp_pair) { "Exchange_#{exchange_name}_PredictedFundingRate" }

  def perform(exchange_name, perp_pair)
    exchange = Exchange.find_by(name: exchange_name)

    # Initialize the appropriate service
    funding_rates_service = Funding::FundingRate.new(exchange.name)

    # Synchronize funding rates
    funding_rates_service.sync_predicted_funding_rate(exchange, perp_pair)
  end
end
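Note that the concurrency key lambda only interpolates `exchange_name` and ignores `perp_pair`, so every pair on the same exchange maps to the same key and shares the same 10-slot limit. A quick illustration with made-up exchange/pair values:

```ruby
# The key lambda from the job above, called with hypothetical argument values.
# Only exchange_name appears in the string, so all perp_pairs for one
# exchange produce the same concurrency key (and compete for the same slots).
key = ->(exchange_name, perp_pair) { "Exchange_#{exchange_name}_PredictedFundingRate" }

key.call("binance", "BTCUSDT")  # => "Exchange_binance_PredictedFundingRate"
key.call("binance", "ETHUSDT")  # => same key: shares the same 10 slots
key.call("bybit",   "BTCUSDT")  # => "Exchange_bybit_PredictedFundingRate"
```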

The thing is, the first couple hundred jobs are executed just fine, but after a while it just stops and my workers are idling:

[screenshot: idle workers]

until the next run, which wakes the workers up to process another couple hundred jobs (not all of them) before they go idle again.

This was working just fine until about 2 weeks ago.

I am using Solid Queue v1.1.2.

Is there anything I am doing wrong?

@rosa
Member

rosa commented Jan 2, 2025

What are you setting for concurrency_maintenance_interval in your dispatchers configuration?

@thisIsLoading
Author

Hi @rosa, thank you for getting back to me so quickly!

I don't have that set; this is my queue.yml:

default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 5
      processes: 5
      polling_interval: 0.1

development:
  <<: *default
  workers:
    - queues: "*"
      threads: 5
      processes: 5 # Set to 5 processes in development
      polling_interval: 0.1

test:
  <<: *default

production:
  <<: *default

@rosa
Member

rosa commented Jan 2, 2025

Ahh, then that means you're using the default, which is 300 seconds. This means that if there are blocked jobs that didn't get unblocked by previous jobs for any reason (and the reason very well could be this race condition #456 (comment)), it might be 300 seconds until they're unblocked by the dispatcher, as explained in the docs. That's very long for a concurrency duration of 1 second, and it's even longer than each scheduled run of the recurring task, so it doesn't have any effect.

I imagine you're using concurrency controls there to throttle these jobs. I'd recommend against that, and just using the number of workers adjusted to the number of jobs you want to run simultaneously, it's going to be much more efficient in your case, with that short duration. Instead of having 5 workers with 5 threads doing everything, you could have a dedicated single worker for just those jobs, and adjust the thread number so that ~10 jobs per second are run.
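A sketch of what that split could look like in queue.yml. The queue name `funding_rates` and the thread count are assumptions, and the job class would need to be routed to that queue (e.g. with Active Job's `queue_as :funding_rates`); the general worker lists its queues explicitly here so it doesn't also drain the dedicated one:

```yaml
# Hypothetical sketch, not a drop-in config.
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    # Dedicated worker sized so roughly 10 funding-rate jobs run at a time.
    - queues: funding_rates
      threads: 10
      polling_interval: 0.1
    # General worker for everything else.
    - queues: default
      threads: 5
      processes: 5
      polling_interval: 0.1
```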

@thisIsLoading
Author

Yeah, see the job above: I had it limited to 10 per second, which was working just fine, but it might very well be the race condition you mentioned.

Not sure I'm willing to have one worker dedicated to this just yet. It feels a little hacky to limit concurrency by basically closing the tap so that it just drips. ;)

With that said, I commented out the concurrency definition for a test and can confirm that it now goes full steam ahead, so that certainly seems to be the culprit.

@rosa
Member

rosa commented Jan 2, 2025

Oh, I don't think it's hacky at all; I think it's the simplest and most efficient way to do it without all the overhead from concurrency controls, which were initially intended for cases where you really don't want jobs to overlap, normally for logic reasons, not resource allocation reasons.

If you want to use concurrency controls, you'll need to lower concurrency_maintenance_interval significantly.
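For example, building on the queue.yml shown above (the 5-second value is just an illustration, not a recommendation):

```yaml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
      # Check for expired concurrency locks every 5 seconds instead of
      # the 300-second default, so blocked jobs are released much sooner.
      concurrency_maintenance_interval: 5
```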

@thisIsLoading
Author

Could I set this all the way down to 1 second?

@rosa
Member

rosa commented Jan 2, 2025

Yes, you can set it to 1 second, but bear in mind you'll be making queries to unblock jobs of this kind every second:

SELECT
  DISTINCT `solid_queue_blocked_executions`.`concurrency_key`
FROM
  `solid_queue_blocked_executions`
WHERE
  `solid_queue_blocked_executions`.`expires_at` < '2025-01-02 13:57:03.398830'
LIMIT
  500

@thisIsLoading
Author

thisIsLoading commented Jan 2, 2025

Thanks so much @rosa, I really appreciate your help here, especially the super quick response. That's unparalleled.

Feel free to close this if you think it's basically just a dupe of #456.

Thanks again!

@rosa
Member

rosa commented Jan 2, 2025

Yes! I'll close this one to work on #456 as I'm certain the cause for the jobs not being unblocked by previous jobs is that race condition, especially with jobs being all enqueued at the same time by your recurring job.

Thank you!

@rosa rosa closed this as completed Jan 2, 2025