
Jobs not being worked on #476

Closed
thisIsLoading opened this issue Jan 2, 2025 · 9 comments

@thisIsLoading

I have (I believe since one of the latest updates, but I can't tell) blocked jobs like this:

[screenshot: blocked jobs]

They are being scheduled like this:

production:
  predicted_funding_data:
    command: Funding::FundingRate.new.sync_all_predicted_funding_rates
    schedule: '* * * * *'

which basically just schedules a bunch of these jobs:

class Funding::SyncPredictedFundingRatesJob < ApplicationJob
  limits_concurrency to:       10,
                     duration: 1.second,
                     key:      ->(exchange_name, perp_pair) { "Exchange_#{exchange_name}_PredictedFundingRate" }

  def perform(exchange_name, perp_pair)
    exchange = Exchange.find_by(name: exchange_name)

    # Initialize the appropriate service
    funding_rates_service = Funding::FundingRate.new(exchange.name)

    # Synchronize funding rates
    funding_rates_service.sync_predicted_funding_rate(exchange, perp_pair)
  end
end
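Note that the concurrency key lambda only interpolates `exchange_name` and ignores `perp_pair`, so every pair on the same exchange maps to the same key and shares the same 10-slot limit. A quick illustration with made-up exchange/pair values:

```ruby
# The key lambda from the job above, called with hypothetical argument values.
# Only exchange_name appears in the string, so all perp_pairs for one
# exchange produce the same concurrency key (and compete for the same slots).
key = ->(exchange_name, perp_pair) { "Exchange_#{exchange_name}_PredictedFundingRate" }

key.call("binance", "BTCUSDT")  # => "Exchange_binance_PredictedFundingRate"
key.call("binance", "ETHUSDT")  # => same key: shares the same 10 slots
key.call("bybit",   "BTCUSDT")  # => "Exchange_bybit_PredictedFundingRate"
```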

The thing is, the first couple hundred jobs are executed just fine, but after a while it just stops and my workers are idling:

[screenshot: idle workers]

until the next run, which wakes the workers up to process another couple hundred jobs (not all of them) before they go idle again.

This was working just fine until about 2 weeks ago.

I am using Solid Queue v1.1.2.

Is there anything I am doing wrong?

@rosa
Member

rosa commented Jan 2, 2025

What are you setting for concurrency_maintenance_interval in your dispatchers configuration?

@thisIsLoading
Author

Hi @rosa, thank you for getting back to me so quickly!

I don't have that set; this is my queue.yml:

default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 5
      processes: 5
      polling_interval: 0.1

development:
  <<: *default
  workers:
    - queues: "*"
      threads: 5
      processes: 5 # Set to 5 processes in development
      polling_interval: 0.1

test:
  <<: *default

production:
  <<: *default

@rosa
Member

rosa commented Jan 2, 2025

Ahh, then that means you're using the default, which is 300 seconds. This means that if there are blocked jobs that didn't get unblocked by previous jobs for any reason (and the reason very well could be this race condition #456 (comment)), it might be 300 seconds until they're unblocked by the dispatcher, as explained in the docs. That's very long for a concurrency duration of 1 second, and it's even longer than each scheduled run of the recurring task, so it doesn't have any effect.

I imagine you're using concurrency controls there to throttle these jobs. I'd recommend against that, and just using the number of workers adjusted to the number of jobs you want to run simultaneously, it's going to be much more efficient in your case, with that short duration. Instead of having 5 workers with 5 threads doing everything, you could have a dedicated single worker for just those jobs, and adjust the thread number so that ~10 jobs per second are run.
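A sketch of what that split could look like in queue.yml. The queue name `funding_rates` and the thread count are assumptions, and the job class would need to be routed to that queue (e.g. with Active Job's `queue_as :funding_rates`); the general worker lists its queues explicitly here so it doesn't also drain the dedicated one:

```yaml
# Hypothetical sketch, not a drop-in config.
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    # Dedicated worker sized so roughly 10 funding-rate jobs run at a time.
    - queues: funding_rates
      threads: 10
      polling_interval: 0.1
    # General worker for everything else.
    - queues: default
      threads: 5
      processes: 5
      polling_interval: 0.1
```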

@thisIsLoading
Author

Yeah, see the job above: I had it limited to 10 per second, which was working just fine, but it might very well be the race condition you mentioned.

Not sure I'm willing to have one worker dedicated to this just yet. It feels a little hacky to limit concurrency by basically closing the tap so that it just drips. ;)

With that said, I commented out the concurrency definition for a test and can confirm that it now goes full steam ahead, so that certainly seems to be the culprit.

@rosa
Member

rosa commented Jan 2, 2025

Oh, I don't think it's hacky at all; I think it's the simplest and most efficient way to do it without all the overhead from concurrency controls, which were initially intended for cases where you really don't want jobs to overlap, normally for logic reasons, not resource allocation reasons.

If you want to use concurrency controls, you'll need to lower concurrency_maintenance_interval significantly.
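For example, building on the queue.yml shown above (the 5-second value is just an illustration, not a recommendation):

```yaml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
      # Check for expired concurrency locks every 5 seconds instead of
      # the 300-second default, so blocked jobs are released much sooner.
      concurrency_maintenance_interval: 5
```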

@thisIsLoading
Author

Could I set this all the way down to 1 second?

@rosa
Member

rosa commented Jan 2, 2025

Yes, you can set it to 1 second, but bear in mind you'll be making queries to unblock jobs of this kind every second:

SELECT
  DISTINCT `solid_queue_blocked_executions`.`concurrency_key`
FROM
  `solid_queue_blocked_executions`
WHERE
  `solid_queue_blocked_executions`.`expires_at` < '2025-01-02 13:57:03.398830'
LIMIT
  500

@thisIsLoading
Author

thisIsLoading commented Jan 2, 2025

Thanks so much @rosa, I really appreciate your help here, especially the super quick response. That's unparalleled.

Feel free to close this if you think it's basically just a dupe of #456.

Thanks again!

@rosa
Member

rosa commented Jan 2, 2025

Yes! I'll close this one to work on #456 as I'm certain the cause for the jobs not being unblocked by previous jobs is that race condition, especially with jobs being all enqueued at the same time by your recurring job.

Thank you!

@rosa rosa closed this as completed Jan 2, 2025