Missed jobs scheduler not working for retries #21

StoneFrog · 2023-06-15T06:34:16Z

According to docs:

Missed Jobs Periodical Handler
Recommended especially when you don't use Sidekiq Pro's super_fetch. If you dequeue job from Redis and the process is killed (by OOM, for example) then good luck with having the job finished. However, if the job is stored in Postgres, this is not an issue. You can just look for the jobs that look as if they were missed and re-enqueue them. Periodically.

That said, it only covers the case where the OOM happens on the first attempt.
If the first attempt fails, failed_at is set. Suppose an OOM then kills the process during a retry. The Missed Jobs Scheduler uses:

def missed_jobs(missed_job_policy:)
  jobs_database
    .where(completed_at: nil, dropped_at: nil, failed_at: nil)
    .select { |potentially_missed_job| missed_job_policy.call(potentially_missed_job) }
end

to find missed jobs, so once failed_at has been assigned the job is never rescheduled.
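
To make the scenario concrete, here is a minimal in-memory sketch of that filter. It is not the gem's code: Job is a hypothetical Struct standing in for a Postgres row, a plain Array replaces jobs_database, and the always-true policy is only there for illustration.

```ruby
# Hypothetical stand-in for a job row; the real gem stores these in Postgres.
Job = Struct.new(:id, :completed_at, :dropped_at, :failed_at, keyword_init: true)

# Mirrors where(completed_at: nil, dropped_at: nil, failed_at: nil) followed by
# the policy check, using a plain Array instead of a database relation.
def missed_jobs(jobs, missed_job_policy:)
  jobs
    .select { |job| job.completed_at.nil? && job.dropped_at.nil? && job.failed_at.nil? }
    .select { |job| missed_job_policy.call(job) }
end

now = Time.now
jobs = [
  Job.new(id: 1),                        # process killed during the first attempt
  Job.new(id: 2, failed_at: now - 3600)  # failed once, then killed during a retry
]

# Assumed policy for illustration: treat every candidate as missed.
always_missed = ->(_job) { true }

p missed_jobs(jobs, missed_job_policy: always_missed).map(&:id)
```

Only job 1 is returned. Job 2 is just as stuck, but the failed_at: nil condition filters it out before the policy is ever consulted.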

As I understand it, the biggest challenge is figuring out which missed jobs qualify given the backoff, so perhaps this scenario should be handled in the specific application, which knows its own internals. On the other hand, whenever a job is supposed to be retried, we could bump execute_at to match the next attempt. That way we could easily find jobs that should already have been executed?
That being said, we should at least mention this in the documentation, as right now it can give false confidence in the reliability of this gem.
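
A rough sketch of the execute_at idea, under stated assumptions: RetryableJob, retry_count, record_failure, overdue_jobs, and the grace period are all hypothetical names, and backoff_seconds is an illustrative exponential backoff, not Sidekiq's actual formula. Nothing here is the gem's API.

```ruby
# Hypothetical job record; execute_at holds the time the next attempt is expected.
RetryableJob = Struct.new(:id, :completed_at, :dropped_at, :failed_at,
                          :execute_at, :retry_count, keyword_init: true)

# Illustrative exponential backoff in seconds (assumption, not Sidekiq's formula).
def backoff_seconds(retry_count)
  (2**retry_count) * 60
end

# On a failed attempt: record the failure and bump execute_at to the next attempt.
def record_failure(job, now:)
  job.failed_at = now
  job.retry_count = (job.retry_count || 0) + 1
  job.execute_at = now + backoff_seconds(job.retry_count)
  job
end

# A job counts as missed once it is unfinished and overdue by more than the
# grace period, whether or not it has failed before.
def overdue_jobs(jobs, now:, grace_seconds:)
  jobs.select do |job|
    job.completed_at.nil? && job.dropped_at.nil? &&
      job.execute_at && job.execute_at < now - grace_seconds
  end
end
```

With this shape the finder no longer needs the failed_at: nil condition: a job that failed once and was then killed mid-retry becomes visible as soon as its bumped execute_at is far enough in the past.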

Azdaroth · 2023-06-28T14:39:56Z

@StoneFrog How about explicitly assigning a "missed" label when a job is rescued for the first time, so that we could use an OR query, like jobs_database.where(completed_at: nil, dropped_at: nil, failed_at: nil).or(jobs_database.where(completed_at: nil, dropped_at: nil, missed_job: true))? The search for missed jobs runs every few hours, so the risk of conflicts, duplication, or processing the same job twice is very low.

StoneFrog · 2023-06-29T06:01:59Z

@Azdaroth What do you mean by "when rescued"? When failed_at is set for the first time?

Assuming it was actually a legitimate non-recoverable failure, won't that interfere with retries? I.e. even though regular Sidekiq would push the job to the retry queue with exponential backoff, we would be executing it every time the missed job scheduler runs?

Azdaroth · 2023-06-29T07:01:09Z

@StoneFrog By "rescued" I mean the ones that were picked up by the Missed Jobs Handler.

Maybe it would interfere, but what's the risk here? Some jobs might be executed a couple of times and fail each time, I guess (otherwise they would be executed just once), and it would happen only to the ones that were labeled as such.

This of course depends on the frequency of the OOMs, but they should not be common enough to affect more than a few % of all jobs (which would already be massive, but we could probably live with that).

StoneFrog · 2023-06-29T07:12:22Z

@Azdaroth But the problem is that the Missed Jobs Handler never picked them up.

Azdaroth · 2023-06-29T07:31:55Z

@StoneFrog That's why we would need to introduce some extra labelling and change the query. Wouldn't that be enough?

StoneFrog · 2023-06-29T07:37:42Z

@Azdaroth Maybe I don't fully get your proposal, but you have said that
you want to label a job when it is "rescued", and by rescued you mean the ones that were picked up by the Missed Jobs Handler.

But in the described case they will never be picked up by the handler, so they will never be labeled and this query won't change a thing 😛
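
The objection can be checked with a small in-memory simulation. This is only a sketch: LabeledJob, candidates, and handler_pass are hypothetical names, the missed_job flag comes from the proposed query, and re-enqueueing is left out.

```ruby
# Hypothetical job record carrying the proposed missed_job label.
LabeledJob = Struct.new(:id, :completed_at, :dropped_at, :failed_at, :missed_job,
                        keyword_init: true)

def unfinished?(job)
  job.completed_at.nil? && job.dropped_at.nil?
end

# In-memory equivalent of the proposed OR query:
# never-failed jobs, or jobs already labeled as missed.
def candidates(jobs)
  jobs.select { |job| unfinished?(job) && (job.failed_at.nil? || job.missed_job) }
end

# One handler pass: label every picked job and return the picked ids.
def handler_pass(jobs)
  candidates(jobs).each { |job| job.missed_job = true }.map(&:id)
end

jobs = [
  LabeledJob.new(id: 1),                             # killed on the first attempt
  LabeledJob.new(id: 2, failed_at: Time.now - 3600)  # failed once, killed on retry
]

2.times { p handler_pass(jobs) }
```

Both passes pick only job 1. Job 2 never matches the first branch of the query, so it never receives the label that the second branch depends on.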

StoneFrog added the bug and documentation labels on Jun 15, 2023
