6.13.16 scour attempts to lstat a file deleted via another thread, race condition
#103
Comments
Hi Daryl, This does indeed look like a race condition. Your scour.conf has overlapping directory entries (lines 3 and 4 in this case). The scour program launches a thread for each line, so by the time one thread reads a file under one directory, that file may already have been seen and deleted by the other thread. That leaves the first thread unable to find the file, and it displays the ERROR above. Note that the ages for the two directory entries are 2 and 8. Therefore, the missing file under /data/gempak/nexrad must have been age 8 or older. One way to prevent this rare case is to lock the resource. Best regards,
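To make the race above concrete, here is a minimal Python sketch (not the LDM C code) of a per-file scour step that tolerates a file vanishing between the directory listing and the lstat call. The function name `remove_if_older`, the logger name, and the return values are hypothetical and only for illustration.

```python
import logging
import os
import time

log = logging.getLogger("scour_sketch")  # hypothetical logger name


def remove_if_older(path, max_age_days, now=None):
    """Delete `path` if its mtime is older than `max_age_days`.

    Returns "deleted", "kept", or "vanished" (another worker removed it first).
    """
    now = time.time() if now is None else now
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # The entry was listed earlier but another worker removed it since.
        log.debug("skipping %s: already removed", path)
        return "vanished"
    if now - st.st_mtime < max_age_days * 86400:
        return "kept"
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Lost the race between lstat and unlink; the file is gone either way.
        log.debug("skipping %s: removed by another worker", path)
        return "vanished"
    return "deleted"
```

With this shape, a second worker that reaches the same path after the first has deleted it gets "vanished" instead of raising an error.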
Greetings, thanks for the response. The age of the file is less than 8, as you can see from the timestamp in the filename. So yeah, the
I have seen this occur in the "old" way of scouring as well in LDM 6.13.10 and earlier, but it's now a moot point.
Perhaps a command line switch could be offered to disable threaded scouring? Or maybe this particular error could be sent to a lower priority log level?
The new scour program spawns as many threads as there are directory entries in scour.conf. Therefore, to make it single-threaded without a code change, it suffices to provide one directory entry at a time, which ensures non-concurrency. It is also possible to enforce sequential processing with a minor code change and a switch, if warranted. Setting this error to a lower priority log level is also possible and requires only a minimal code change.
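As a rough illustration of the "one thread per entry" model and of what a sequential switch could look like, here is a short Python sketch. It is not how scour itself is implemented; `scour_all`, `scour_one`, and the `sequential` flag are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def scour_all(entries, scour_one, sequential=False):
    """Apply scour_one(directory, max_age_days) to every (directory, age) entry.

    sequential=True is a hypothetical switch: a single worker handles the
    entries one after another, in order, so overlapping directories are
    never scanned concurrently.
    """
    workers = 1 if sequential else max(1, len(entries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scour_one, d, age) for d, age in entries]
        # result() re-raises any exception from the worker.
        return [f.result() for f in futures]
```

The trade-off is the usual one: sequential mode avoids two workers touching the same tree at once, but the total run time becomes the sum of the per-entry times.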
@akrherz Or you could modify your scour(1) configuration file to avoid overlapping entries.
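For anyone who wants to check a configuration for overlaps, a small Python helper along these lines could flag entries where one directory contains another. It is a sketch that assumes the directory paths have already been extracted from scour.conf and are all absolute; `find_overlaps` is a hypothetical name, and the example paths other than /data/gempak/nexrad are made up.

```python
import os


def find_overlaps(directories):
    """Return (parent, child) pairs where one entry contains another.

    Assumes every path is absolute; os.path.commonpath raises ValueError
    if absolute and relative paths are mixed.
    """
    norm = [os.path.normpath(d) for d in directories]
    pairs = []
    for parent in norm:
        for child in norm:
            # commonpath compares whole path components, so /data/gem
            # is not treated as a parent of /data/gempak.
            if parent != child and os.path.commonpath([parent, child]) == parent:
                pairs.append((parent, child))
    return pairs


# Example with one real path from this thread and two invented ones.
print(find_overlaps(["/data/gempak", "/data/gempak/nexrad", "/data/other"]))
```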
Agreed, but that is brittle, as I may add a new folder and forget to add a custom entry for it, and very annoying, as I have to add one entry for each sub-folder. Additionally, overlapping entries make total sense in my mind. I have a blanket policy for anything in
I am using LDM 6.13.16 on CentOS 8 Stream 64-bit. I've noticed that since the upgrade to this release, I sometimes get errors like the following from ldmadmin scour
Out of thousands of files deleted, I see only one or two errors reported on some days, but not all. I know that you recently updated scour to use C code rather than Perl; perhaps there is a threading / race condition in how files are deleted?
The /data path is NFS mounted, so perhaps there is trouble there. I verified that I am running only one scour process from cron, and this is my scour.conf
Thanks.