Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kill messages from cron #265

Open
oddnoc opened this issue Oct 7, 2019 · 10 comments
Open

kill messages from cron #265

oddnoc opened this issue Oct 7, 2019 · 10 comments

Comments

@oddnoc
Copy link

oddnoc commented Oct 7, 2019

The ProcessJobQueueTask (run through cron) is occasionally but persistently sending emails saying kill: 47765: No such process (different process ID each time). This seems to happen only for the immediate queue. Sorry for the lack of details; I don't see in the code where the task even tries to invoke kill.

Version info

  • symbiote/silverstripe-queuedjobs 4.4.0
  • silverstripe/recipe-cms 4.4.4
@Cheddam
Copy link

Cheddam commented Oct 8, 2019

Hi @oddnoc, thanks for raising this issue! Can you confirm whether this has been occurring for a while, in previous versions of SilverStripe CMS / Queued Jobs, or if it has just started after an update?

@oddnoc
Copy link
Author

oddnoc commented Oct 11, 2019

I'm not sure when this started, sorry. But I did manage to capture a snapshot of what's happening. I have a vagrant box with zfs, so I can roll the database back at will. Using this platform, I captured the output of ps as fast as possible whilst running the queue.

The command I ran, with its output:

./vendor/bin/sake dev/tasks/ProcessJobQueueTask queue=immediate                                    vagrant@diodes
Running Task Symbiote\QueuedJobs\Tasks\ProcessJobQueueTask

[2019-10-11 10:31:57] Running Clean up old jobs from the database and others from queue 1.
[2019-10-11 10:32:00] No new jobs on queue 1
[2019-10-11 10:32:01] No new jobs on queue 1
[2019-10-11 10:32:02] No new jobs on queue 1
[2019-10-11 10:32:03] No new jobs on queue 1
kill: 20615: No such process

I was capturing ps like so:

while true; do  ps aux >> processes;  done

grep 20615 processes yielded many lines, ending with these:

vagrant       20615 12.0  1.7  119584   70808  0  R+   17:32     0:00.79 /usr/local/bin/php /usr/home/vagrant/Project/docroot/vendor/silverstripe/framework/cli-script.php dev/tasks/ProcessJobQue                                                                                                     
vagrant       21338  0.0  0.0     440     352  0  R+   17:32     0:00.00 sh -c kill -9 '20615'

Additional version info:

  • OS: FreeBSD 11.3-RELEASE-p3
  • PHP: 7.2.23

In this environment, the message from kill happens only when a very specific set of QueuedJobDescriptor records exists in the database. Deleting any one of these records prevents the message from appearing. On this box with this set of records, the issue is 100% reproducible. I suspect this is a race condition, because it's happening intermittently in production with quite a different set of records.

+------+-----------------------------------------------------------------------------+---------+---------------------------------------------------------------+---------------------+
| ID   | JobTitle                                                                    | JobType | Implementation                                                | StartAfter          |
+------+-----------------------------------------------------------------------------+---------+---------------------------------------------------------------+---------------------+
| 5249 | Export a CSV of a Gridfield                                                 | 1       | CLIENT\Web\Report\CustomReportSupport\GenerateCSVImmediateJob | NULL                |
| 5250 | Upload parts for /products/power-management/led-drivers/flash-led-drivers/. | 1       | CLIENT\Web\Queuedjob\ProductFolderUploadJob                   | NULL                |
| 5251 | Clean up old jobs from the database                                         | 1       | Symbiote\QueuedJobs\Jobs\CleanupJob                           | NULL                |
| 5252 | Clean up old jobs from the database                                         | 1       | Symbiote\QueuedJobs\Jobs\CleanupJob                           | 2019-08-20 13:07:55 |
+------+-----------------------------------------------------------------------------+---------+---------------------------------------------------------------+---------------------+

@oddnoc
Copy link
Author

oddnoc commented Feb 3, 2020

We've started occasionally receiving more verbose output from cron. I hope this is helpful in tracking this issue down.

kill: 25326: No such process
[2020-01-23 00:05:03] error-log.ERROR: Uncaught Exception Error: "Call to a member function hasMethod() on null" at /var/silverstripe/live/releases/A/docroot/vendor/symbiote/silverstripe-queuedjobs/src/Jobs/DoormanQueuedJobTask.php line 161 {"exception":"[object] (Error(code: 0): Call to a member function hasMethod() on null at /var/silverstripe/live/releases/A/docroot/vendor/symbiote/silverstripe-queuedjobs/src/Jobs/DoormanQueuedJobTask.php:161)"} []

@mfendeksilverstripe
Copy link
Contributor

mfendeksilverstripe commented Jun 14, 2020

As a first thing, I recommend upgrading to symbiote/silverstripe-queuedjobs 4.5 latest.

I couldn't reproduce the issue after applying these two fixes:

@oddnoc Could you please try reproduce the issue after applying those two fixes?

@oddnoc
Copy link
Author

oddnoc commented Jun 16, 2020

@mfendeksilverstripe When I try to merge both fixes, I get conflicts. Could you resolve those into an integration branch?

@mfendeksilverstripe
Copy link
Contributor

That's because one goes to 4.5 and the other one goes to 4. I suggest to cherry pick the commits instead of merging.

@oddnoc
Copy link
Author

oddnoc commented Jun 16, 2020

I had to do more than just cherry-pick. See the pu branch on my fork. I have deployed that branch and will know whether it works in a few days, to allow enough time for some jobs to be run from cron.

@oddnoc
Copy link
Author

oddnoc commented Jun 17, 2020

We got a result sooner than expected:

[2020-06-16 22:40:01] error-log.DEBUG: [2020-06-16 22:40:01] - Queued Jobs - Failed to acquire job lock Failed to read job lock 854 {"file":"/var/silverstripe/live/releases/6/vendor/symbiote/silverstripe-queuedjobs/src/Services/QueuedJobService.php","line":762} []

@mfendeksilverstripe
Copy link
Contributor

This isn't necessarily an issue - see comments near the line where the exception gets throw:

// note that error here may not be an issue as failing to acquire a job lock is a valid state
// which happens when other process claimed the job lock first

This is why this message is debug level. Apart from this, have you noticed any issues?

@oddnoc
Copy link
Author

oddnoc commented Jun 30, 2020

Since that one message, cron has been silent for days and queued jobs have been running normally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants