
OCR timeout can cause infinite loop? #75

Open
patdunlavey opened this issue Jan 11, 2023 · 8 comments
Labels: bug (Something isn't working), Post processor Plugins (The ones with a ->run() method), queue (FIFO)

Comments

@patdunlavey
Contributor

We have found that if an OCR process fails, it throws new RequeueException('I am not done yet. Will re-enqueu myself'); and then continues to retry and fail, creating an infinite loop, as seen in this sample watchdog output:
strawberry_runner_ocr_looping.csv

In our case, I believe the issue is that the command is timing out, and it may be solved by increasing the OCR processor's timeout setting.

Might the general solution, in part at least, be to switch to using DelayedRequeueException?
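
For reference, a rough sketch of what that switch could look like in a queue worker's processItem() — both exception classes are part of Drupal core's queue API (DelayedRequeueException since 9.1); the runOcrProcessor() helper and the 300-second delay are made up here purely for illustration:

use Drupal\Core\Queue\DelayedRequeueException;
use Drupal\Core\Queue\RequeueException;

public function processItem($data) {
  // Hypothetical helper standing in for the actual post processor call.
  $finished = $this->runOcrProcessor($data);
  if (!$finished) {
    // Current behaviour: immediate re-enqueue, which loops forever if the
    // command keeps timing out.
    // throw new RequeueException('I am not done yet. Will re-enqueu myself');

    // Alternative: hold the item back so other queue items can run first.
    throw new DelayedRequeueException(300, 'OCR not finished yet, retry later.');
  }
}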

@patdunlavey
Contributor Author

I was able to get around this by increasing the OCR strawberry runner's timeout (to 180 seconds). First I had to kill all the duplicate tesseract processes (which I did by restarting the containers).

I see that the timed-out processes are supposed to be killed here, so I'm not sure why I was seeing a huge stack of identical tesseract processes. Might there be a bug in the kill function?
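
For context, this is roughly how the timeout/kill path works if the runner shells out via Symfony Process (an assumption on my part — the linked kill code may do this differently); the tesseract arguments and $image_path are placeholders:

use Symfony\Component\Process\Exception\ProcessTimedOutException;
use Symfony\Component\Process\Process;

$process = new Process(['/usr/local/bin/tesseract', $image_path, 'stdout', '-l', 'eng']);
// The value exposed as the processor's timeout setting, in seconds.
$process->setTimeout(180);
try {
  $process->run();
}
catch (ProcessTimedOutException $e) {
  // Process stops the child itself when the timeout trips (SIGTERM, then
  // SIGKILL); an explicit stop(0) is belt and braces. Any grandchild
  // processes the command spawned on its own could still survive, which
  // might explain the leftover tesseract processes described above.
  $process->stop(0);
}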

@DiegoPino
Member

@patdunlavey are you using SBR 0.5.0?

See https://github.com/esmero/strawberry_runners/commits/0.5.0
A lot of this was fixed there (see the open/closed issues).

The number of attempts for failed OCR probably needs to be revisited, but the killing etc. should work.

@patdunlavey
Contributor Author

We are running 0.5.0. I agree that it seems like killing the old process should work, so I'll have to see if I can sort out what was going on with all those unkilled tesseract processes. As for endlessly retrying the failed process, that's clearly something to fix. Does DelayedRequeueException seem like it might help? As I understand it (my understanding is very limited), it simply schedules the retry for a later point rather than immediately. That is, it doesn't stop the endless retrying; it just lets other queue items run while the item is delayed.

For now, setting the timeout generously gets us out of this problem, so it's not a huge priority from our vantage point.
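
To confirm that reading: core's cron queue processing handles the two exceptions roughly like this (paraphrased from memory, so worth double-checking against the core version in use) — a delayed item is still retried eventually, it just doesn't block the rest of the queue in the meantime:

catch (RequeueException $e) {
  // The item goes straight back into the queue and is retried immediately.
  $queue->releaseItem($item);
}
catch (DelayedRequeueException $e) {
  // The item is hidden for the requested delay, then retried; the number
  // of attempts is not capped by this on its own.
  if ($queue instanceof DelayableQueueInterface) {
    $queue->delayItem($item, $e->getDelay());
  }
}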

@DiegoPino
Member

Because 0.5.0 has been evolving, please double-check that you are on the latest commit. Also, can you make sure you are using /usr/local/bin/tesseract and not /usr/bin/tesseract? (Tesseract 5 is better; 4 is slower.)

Also notice that reducing the size of the exported PDF might help a lot (see the gs options there).

[screenshot: processor configuration showing the ghostscript (gs) options]
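
For anyone following along, the resolution flag being referred to works roughly like this (illustrative only — the actual device and flags used by the processor configuration may differ):

use Symfony\Component\Process\Process;

// Rasterize the first PDF page at 150 dpi before handing it to tesseract;
// a lower resolution means a much smaller image and faster OCR.
$gs = new Process([
  'gs', '-dBATCH', '-dNOPAUSE', '-dQUIET',
  '-sDEVICE=pnggray', '-r150',
  '-dFirstPage=1', '-dLastPage=1',
  '-sOutputFile=page_1.png',
  'input.pdf',
]);
$gs->run();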

@patdunlavey
Contributor Author

Ah, we are using /usr/bin/tesseract, not /usr/local/bin/tesseract. I'll make that change. And the -r150 argument to ghostscript should help a lot.

I think we're on the most recent commit of strawberry_runners:0.5.0.x-dev (dbac9cf07d910dcef3f26d115ced1d5bd774e377)

Thanks!

@DiegoPino
Member

@patdunlavey a solution would be to add a catch for the exception here:
https://github.com/esmero/strawberry_runners/blob/0.5.0/src/Plugin/QueueWorker/AbstractPostProcessorQueueWorker.php#L420

Similar to:

catch (\Exception $exception) {
  $message_params = [
    '@file_id' => $data->fid,
    '@entity_id' => $data->nid,
    '@message' => $exception->getMessage(),
  ];
  if (!isset($data->extract_attempts)) {
    // First failure: start counting attempts and log the underlying error.
    $data->extract_attempts = 0;
    $this->logger->log(LogLevel::ERROR, 'Strawberry Runners Processing failed with message: @message File id @file_id at ADO Node ID @entity_id.', $message_params);
  }
  if ($data->extract_attempts < 3) {
    // Re-enqueue explicitly, up to three attempts.
    $data->extract_attempts++;
    \Drupal::queue('strawberryrunners_process_index', TRUE)
      ->createItem($data);
  }
  else {
    $message_params = [
      '@file_id' => $data->fid,
      '@entity_id' => $data->nid,
    ];
    $this->logger->log(LogLevel::ERROR, 'Strawberry Runners Processing failed after 3 attempts File Id @file_id at ADO Node ID @entity_id.', $message_params);
  }
}

That would avoid the queue itself automatically re-enqueuing the item on every exception.

An even better solution would be to have a "failed" queue and re-enqueue the item there. That way that queue could be run / inspected manually, and the item would not be lost after three attempts.
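
A minimal sketch of that idea, continuing the catch block above — the 'strawberryrunner_failed_ocr' queue name is invented here purely to illustrate, and it would need its own queue worker (or a manual drush queue:run) to be processed:

else {
  // After the attempt limit, park the item in a separate "failed" queue
  // instead of dropping it, so it can be inspected or re-run later, e.g.
  // via: drush queue:run strawberryrunner_failed_ocr
  \Drupal::queue('strawberryrunner_failed_ocr', TRUE)->createItem($data);
  $this->logger->log(LogLevel::ERROR, 'Moved File Id @file_id at ADO Node ID @entity_id to the failed queue after 3 attempts.', $message_params);
}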

@patdunlavey
Contributor Author

That's very helpful @DiegoPino. Would you like to assign this task to me to take a gander at it?

@DiegoPino DiegoPino added bug Something isn't working queue FIFO Post processor Plugins The ones with a ->run() method labels Jan 12, 2023
@DiegoPino DiegoPino added this to the 0.5.0 milestone Jan 12, 2023
@DiegoPino
Member

Done, thanks!
