Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runs jobs twice for a couple of times after deployment #376

Open
rctneil opened this issue Oct 9, 2024 · 11 comments
Open

Runs jobs twice for a couple of times after deployment #376

rctneil opened this issue Oct 9, 2024 · 11 comments

Comments

@rctneil
Copy link

rctneil commented Oct 9, 2024

I'm getting so frustrated with this.

EVERY single time i deploy my app and schedule a job, it adds it twice. This typically happens for the first 2 - 3 scheduled jobs after a deployment.

Please help me to get this to stop.

Sooooo annoying

Screenshot 2024-10-09 at 11 57 35

and also, only does this in production. This last deploy, it keeps scheduling twice, I can't get it to only run it once. It's causing mahyem. How do I only get it to run once!?

@rosa
Copy link
Member

rosa commented Oct 9, 2024

Could you provide more details? With only those, I have no idea what's going on. Where are the jobs enqueued, how are they enqueued, what is the code that enqueues them, how does your deployment work... have you actually try to check logs to trace down where these jobs are enqueued and by what?

@rctneil
Copy link
Author

rctneil commented Oct 9, 2024

I'll get further information to post. I admit my post was written with an angry state of mind so let me dig for further info and get back t you.

@lostboy
Copy link

lostboy commented Nov 29, 2024

@rctneil did you find the cause here? We're seeing duplicates when running in production but they're for all jobs

@rosa
Copy link
Member

rosa commented Nov 29, 2024

@lostboy could you provide more information? Do the duplicates have the same Active Job ID (this is an UUID)? Can you see where/how they are enqueued?

@lostboy
Copy link

lostboy commented Nov 29, 2024

@rosa we're running rails 7.2 on ruby 3.2 with solid_queue 1.0.2. It seems not to relate directly to queuing but will also run twice when using perform_now.

production > lwe > SystemStatusJob.perform_now
Performing SystemStatusJob (Job ID: 3e14c80b-16b4-4ada-b702-97dafce0521c) from SolidQueue(default)
ETHON: Libcurl initialized
ETHON: performed EASY effective_url=https://vpc-api-staging-789c-xxxxxx.eu-west-2.es.amazonaws.com/ response_code=200 return_code=ok total_time=0.07849
Performed SystemStatusJob (Job ID: 3e14c80b-16b4-4ada-b702-97dafce0521c) from SolidQueue(default) in 91.22ms
Performing SystemStatusJob (Job ID: 3e14c80b-16b4-4ada-b702-97dafce0521c) from SolidQueue(default)
ETHON: performed EASY effective_url=https://vpc-api-staging-789c-xxxxxx.eu-west-2.es.amazonaws.com/ response_code=200 return_code=ok total_time=0.038962
Performed SystemStatusJob (Job ID: 3e14c80b-16b4-4ada-b702-97dafce0521c) from SolidQueue(default) in 43.55ms
=> [nil, nil]

And when run as perform_later

production > lwe > SystemStatusJob.perform_later
  TRANSACTION (1.2ms)  BEGIN
  SolidQueue::Job Create (3.6ms)  INSERT INTO "solid_queue_jobs" ("queue_name", "class_name", "arguments", "priority", "active_job_id", "scheduled_at", "finished_at", "concurrency_key", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) RETURNING "id"  [["queue_name", "default"], ["class_name", "SystemStatusJob"], ["arguments", "{\"job_class\":\"SystemStatusJob\",\"job_id\":\"f467235f-f7f0-4e64-95e6-45836196f375\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[],\"executions\":0,\"exception_executions\":{},\"locale\":\"en-LWE\",\"timezone\":\"UTC\",\"enqueued_at\":\"2024-11-29T13:21:37.639045656Z\",\"scheduled_at\":\"2024-11-29T13:21:37.638973173Z\"}"], ["priority", 0], ["active_job_id", "f467235f-f7f0-4e64-95e6-45836196f375"], ["scheduled_at", "2024-11-29 13:21:37.638973"], ["finished_at", nil], ["concurrency_key", nil], ["created_at", "2024-11-29 13:21:37.724474"], ["updated_at", "2024-11-29 13:21:37.724474"]]
  TRANSACTION (0.9ms)  SAVEPOINT active_record_1
  SolidQueue::Job Load (2.2ms)  SELECT "solid_queue_jobs".* FROM "solid_queue_jobs" WHERE "solid_queue_jobs"."id" = $1 LIMIT $2  [["id", 321], ["LIMIT", 1]]
  SolidQueue::ReadyExecution Create (1.8ms)  INSERT INTO "solid_queue_ready_executions" ("job_id", "queue_name", "priority", "created_at") VALUES ($1, $2, $3, $4) RETURNING "id"  [["job_id", 321], ["queue_name", "default"], ["priority", 0], ["created_at", "2024-11-29 13:21:37.747478"]]
  TRANSACTION (1.0ms)  RELEASE SAVEPOINT active_record_1
  TRANSACTION (4.1ms)  COMMIT
Enqueued SystemStatusJob (Job ID: f467235f-f7f0-4e64-95e6-45836196f375) to SolidQueue(default)
  TRANSACTION (1.0ms)  BEGIN
  SolidQueue::Job Create (5.7ms)  INSERT INTO "solid_queue_jobs" ("queue_name", "class_name", "arguments", "priority", "active_job_id", "scheduled_at", "finished_at", "concurrency_key", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) RETURNING "id"  [["queue_name", "default"], ["class_name", "SystemStatusJob"], ["arguments", "{\"job_class\":\"SystemStatusJob\",\"job_id\":\"f467235f-f7f0-4e64-95e6-45836196f375\",\"provider_job_id\":321,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[],\"executions\":0,\"exception_executions\":{},\"locale\":\"en-LWE\",\"timezone\":\"UTC\",\"enqueued_at\":\"2024-11-29T13:21:37.758216231Z\",\"scheduled_at\":\"2024-11-29T13:21:37.638973236Z\"}"], ["priority", 0], ["active_job_id", "f467235f-f7f0-4e64-95e6-45836196f375"], ["scheduled_at", "2024-11-29 13:21:37.638973"], ["finished_at", nil], ["concurrency_key", nil], ["created_at", "2024-11-29 13:21:37.758636"], ["updated_at", "2024-11-29 13:21:37.758636"]]
  TRANSACTION (1.0ms)  SAVEPOINT active_record_1
  SolidQueue::Job Load (1.2ms)  SELECT "solid_queue_jobs".* FROM "solid_queue_jobs" WHERE "solid_queue_jobs"."id" = $1 LIMIT $2  [["id", 322], ["LIMIT", 1]]
  SolidQueue::ReadyExecution Create (1.3ms)  INSERT INTO "solid_queue_ready_executions" ("job_id", "queue_name", "priority", "created_at") VALUES ($1, $2, $3, $4) RETURNING "id"  [["job_id", 322], ["queue_name", "default"], ["priority", 0], ["created_at", "2024-11-29 13:21:37.769884"]]
  TRANSACTION (1.2ms)  RELEASE SAVEPOINT active_record_1
  TRANSACTION (4.4ms)  COMMIT
Enqueued SystemStatusJob (Job ID: f467235f-f7f0-4e64-95e6-45836196f375) to SolidQueue(default) at 2024-11-29 13:21:37 UTC
=> #<SystemStatusJob:0x00007fd18b3dc8b8
 @_halted_callback_hook_called=nil,
 @arguments=[],
 @exception_executions={},
 @executions=0,
 @job_id="f467235f-f7f0-4e64-95e6-45836196f375",
 @priority=nil,
 @provider_job_id=322,
 @queue_name="default",
 @scheduled_at=2024-11-29 13:21:37 167503/262144 UTC,
 @successfully_enqueued=true,
 @timezone="UTC">

SystemStatusJob is a very simple class that isnt throwing or anything

@lostboy
Copy link

lostboy commented Nov 29, 2024

@rosa we've narrowed it down to it being an issue with AWS Postgres Aurora. We don't know why that would be yet or whether we can work around.

@rosa
Copy link
Member

rosa commented Nov 29, 2024

Oh wow, what an odd issue! Would you be up to share more details in case it helps someone else?

@lostboy
Copy link

lostboy commented Nov 29, 2024

Oh wow, what an odd issue! Would you be up to share more details in case it helps someone else?

So we're running Postgres 15.4 on Aurora with pg 1.5.9 now on rails 8.0.0 and ruby 3.3.6. By connecting locally to the cluster we can recreate locally. As before perform_later and perform_now ends up putting 2 jobs on the queue that look almost identical. The addition of a timestamp on the 2nd maybe a clue

Enqueued SystemStatusJob (Job ID: f467235f-f7f0-4e64-95e6-45836196f375) to SolidQueue(default)
vs
Enqueued SystemStatusJob (Job ID: f467235f-f7f0-4e64-95e6-45836196f375) to SolidQueue(default) at 2024-11-29 13:21:37 UTC

@rosa
Copy link
Member

rosa commented Nov 29, 2024

Huh, that's indeed really strange! Also perform_now shouldn't touch Solid Queue. It appears in the logs, but that's just what Active Job does in its log subscriber (see the lack of Active Record logs in that case). So this might be something to do with Active Job and how you're running it? Any weird callbacks you have defined?

@lostboy
Copy link

lostboy commented Dec 2, 2024

@rosa you're right, its not solid queue. It seems to be an issue with us broadcasting to a tagged logger for appsignal in an initializer.

if Rails.env.production?
  appsignal_logger = ActiveSupport::TaggedLogging.new(Appsignal::Logger.new("rails"))
  Rails.logger.broadcast_to(appsignal_logger)
end

This caused ActiveJob::Logging#tag_logger to enter here: https://github.com/rails/rails/blob/main/activejob/lib/active_job/logging.rb#L38-L39
Which then triggers an enqueue followed by an enqueue_at. I have not made the full connection yet but I believe this is related to this open issue rails/rails#49745 where the block (in this case enqueues) are trigged as many times as the number of tagged loggers.

Obviously this has nothing to do with solid_queue so from my side this can be closed but I did hijack the thread so perhaps @rctneil is still having issues?

@rosa
Copy link
Member

rosa commented Dec 2, 2024

Oh, interesting! Thanks for letting us know 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants