Too restrictive file mode for mets.xml #403
Comments
The problem is caused by |
https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s25.html shows a portable alternative solution. Do we need atomic writes here at all? Just using I doubt that
|
You can check the code here, it's really short https://github.com/untitaker/python-atomicwrites/blob/master/atomicwrites/__init__.py I don't want to move away from
There's some entry points in atomicwrites to influence this, I'll have a look. See also untitaker/python-atomicwrites#42 |
What is required? Example: Should it be possible to run several Without trying it I would say it cannot work: Task 1 reads the data, task 2 reads / modifies / writes the data, task 1 modifies / writes the data and overwrites the data which was written by task 2. I did not see locks to prevent that scenario, and atomic_write would not help here. |
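For illustration, the interleaving described above can be replayed sequentially. This is only a sketch: a JSON list stands in for the METS data, and `read_state`, `write_state_atomically` and `demonstrate_lost_update` are hypothetical helpers, not part of ocrd or atomicwrites. Each individual write is atomic, yet the update from task 2 is still lost.

```python
import json
import os
import tempfile


def read_state(path):
    """Load the shared state (a JSON list standing in for the METS data)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def write_state_atomically(path, state):
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def demonstrate_lost_update(path):
    """Replay the interleaving of two tasks and return the final state."""
    write_state_atomically(path, [])
    task1_view = read_state(path)              # task 1 reads
    task2_view = read_state(path)              # task 2 reads ...
    task2_view.append("added-by-task-2")       # ... modifies ...
    write_state_atomically(path, task2_view)   # ... and writes
    task1_view.append("added-by-task-1")       # task 1 modifies its stale copy
    write_state_atomically(path, task1_view)   # and overwrites task 2's result
    return read_state(path)
```

Preventing this would need a lock held across the whole read / modify / write cycle, which an atomic rename alone does not provide.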
I checked it before writing that I have doubts. |
One of the reasons for this was safe SIGINT (ctrl+c) IIRC. |
SIGINT handling could be done with Maybe it would even make sense to block SIGINT while processing a page and handle it only after a page was processed. |
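A minimal sketch of the "handle it only after a page was processed" idea, using only the standard `signal` module. The name `defer_sigint` is hypothetical, and this is not how ocrd or atomicwrites handle interrupts; it also assumes it runs in the main thread, since `signal.signal` cannot be called elsewhere.

```python
import signal
from contextlib import contextmanager


@contextmanager
def defer_sigint():
    """Postpone KeyboardInterrupt until the protected block has finished."""
    received = []

    def remember(signum, frame):
        # Only record the signal; do not interrupt the running block.
        received.append(signum)

    previous = signal.signal(signal.SIGINT, remember)
    try:
        yield received
    finally:
        # Always restore the previous handler, even if the block raised.
        signal.signal(signal.SIGINT, previous)
    # Reached only if the block completed normally: deliver the interrupt now.
    if received:
        raise KeyboardInterrupt
```

Wrapping the processing of one page in `with defer_sigint():` would let a ctrl+c take effect only at the page boundary. The obvious drawback is that a processor that hangs inside the block can no longer be interrupted this way.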
That was my initial approach but I was happy to delegate this to a third-party library that handles the nitty-gritty details. |
I see. So we have several requirements to address:
|
Could also be SIGABRT, SIGTERM, SIGSEGV, SIGIO, SIGPIPE etc.
No, ctrl+c should have immediate effect! Think of a badly written processor... I don't think parallel access is really the issue here – as you said yourself, with the current API, which allows reading METS separate/independent from the modifications to be made, this is impossible, and we could make use of that kind of parallelization only in non-linear workflows anyway. |
Of course only few of those can occur during a file write.
If we agree that parallel access is not an issue but simply unsupported for the moment, that simplifies things a lot and we only have to document it clearly. But some day people will ask whether on |
Yes, page pipelining is another desideratum for a better API. But for certain kinds of GPU-dependent processors, document/book-parallel processing (with a data generator that queues entire books into batches) might be the only efficient option. Right now all that processors can do is parallelize across the line or page (which is not possible for stateful/contiguous processors). I would also like to see this recurring theme documented and accessible (perhaps as a permanent issue in spec)... |
This is still an issue:
The strace protocol shows that first |
See pull request #608 for a possible fix. |
As the discussion on untitaker/python-atomicwrites#42 shows, that might create another race IIUC. |
As far as I can see, that race condition is only relevant for massively parallel writes to the METS file. I currently don't have such use cases, and I am afraid that |
Ah, right, I remember now. We only use this for atomic signal handling, not against potential races. |
Fixed by #625 |
When mets.xml is created or modified by ocrd, the new file mode only allows access by the owner, which is typically not sufficient for practical use:
So it is always necessary to add a chmod command after processing with OCR-D.
The code should use the default file mode which respects umask.