Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock likely being released on close of other FDs (Unix) #6

Open
brunoczim opened this issue Oct 17, 2021 · 6 comments
Open

Lock likely being released on close of other FDs (Unix) #6

brunoczim opened this issue Oct 17, 2021 · 6 comments

Comments

@brunoczim
Copy link
Owner

Test other_process_but_curr_reads fails.

It attempts to read the locked file into a string (in the same process, with std::fs::read_to_string) and then spawns another process to make it try to lock the file. It would be expected that the process would fail to lock, but it actually succeeds.

I've made other tests, and without opening the file again and reading it, the lock works. I believe the bug is not in reading nor opening, but closing the file.

Manual page for lock seems to specify that it is implemented on top of fnctl on Linux. Also, manual page for close seems to specify that close will revoke any locks via fnctl on the file, no matter if a different FD to that file was closed.

@brunoczim
Copy link
Owner Author

It seems like we could have some control over other LockFiles, but not on other APIs, such as std::fs.

@brunoczim
Copy link
Owner Author

Perhaps we could use flock?

The problem is that is seems to be non-POSIX. However, libc exports it unconditionally.

@brunoczim
Copy link
Owner Author

I'm likely publishing a new version of this library with flock instead of lockf. However, I'd like to mature this idea before.

Is 'flock` not-by-process in every platform? Or is it just Linux and MacOS?

Does libc supported platforms really support flock?

@soulmachine
Copy link

soulmachine commented Oct 19, 2021

lockf is POSIX compliant and works on NFS, see https://gavv.github.io/articles/file-locks/#lockf-function

@brunoczim
Copy link
Owner Author

Yeah, but lockf has this "per-process issue", where I lock the file "foo" with one FD and loses it because someone else in the same process closed the same file but with another (unrelated) FD.

I intend to replace lockf by flock because flock does not seem to have such problem.

What makes me think twice about flock is that it's not POSIX, but rather implemented by quite a few Unix-like systems (like Linux, BSDs, MacOS, etc).

But yeah as I said earlier, libc crate exposes flock unconditionally for Unix targets. So, I believe flock is supported by all Unix targets supported by Rust. I am likely using it.

@polarathene
Copy link

polarathene commented Nov 16, 2021

I don't know this stuff too well to contribute much to the discussion, but while searching for Rust crates I did come across a resource on the topic that might be helpful for this issue/decision.

Maybe you've already seen it? 😅 (EDIT: I'm only now looking over the earlier link (https://gavv.github.io/articles/file-locks/#lockf-function) and noticing it covers similar details pretty well, so the information below probably doesn't contribute much, apologies!)


Expand for previous content

This Dec 2010 article (with 2014 and 2015/2019 updates) details 3 types of locking and their pro/cons as well as cross-platform issues (some since resolved):

Other than simple lockfiles, which I won't go into (but which you might just want to use from now on, after reading this article :)), there are three Unix file locking APIs: flock(), fcntl(), and lockf().

POSIX is also, apparently, unclear on whether lockf() locks are the same thing as fcntl() locks or not.
On Linux and MacOS, they are documented to be the same thing. (In fact, lockf() is a libc call on Linux, not a system call, so we can assume it makes the same system calls as fcntl().)
I recommend you avoid lockf().

fcntl() locks are much more powerful than flock() locks.

fcntl() locks have two very strange behaviours. The first is merely an inconvenience:
fcntl() locks are not shared across fork().

The second strange behaviour of fcntl() locks is this:
The lock doesn't belong to a file descriptor, it belongs to a (pid,inode) pair.
Worse, if you close any fd referring to that inode, it releases all your locks on that inode.

Also beware of flock(), which on some systems is implemented as a call to fcntl().
Note that since flock() doesn't work on NFS, and fcntl() doesn't work on SMB fs, that there is no locking method that works reliably on all remote filesystems.

Using the logic above, I settled on fcntl() locks.
...
Super Short Summary:

  • Don't use flock() (python: fcntl.flock()) because it's not in POSIX and it doesn't work over NFS.
  • Don't use lockf() (python: does not exist) because it's not in BSD, and probably just wraps fcntl().
  • Don't use fcntl() (python: fcntl.lockf()) because it doesn't work over SMB on MacOS, and actually, on MacOS it doesn't work right at all. (see 2019 update below)
    ...
    Are we having fun yet? I guess lockfiles are the answer after all.

Update 2019/01/02:

  • Progress! In 2015, Linux gained a new kind of fcntl() lock, F_OFD_SETLK ("Open File Description" locks), which work the way you might have expected fcntl() locks to work in the first place.
    Hopefully other OSes will copy this feature eventually. You can read about it here: https://lwn.net/Articles/640404/.
  • Meanwhile, Windows 10 WSL (Windows Subsystem for Linux) has come out, and it "implements" fcntl() locks by just always returning success, leading to file corruption everywhere.
  • And, some time after this article was written, MacOS was fixed, so now fcntl() locks seem to work as expected.

Those are the highlights that stood out to me, but there's plenty of info on gotchas the author experienced between C and Python versions of their code across platforms. Things may have improved since then however (eg: Python 3 and Windows 10 WSL). The SMB gotcha cited was specific to macOS IIRC.


So while switching from fcntl to flock may resolve some issues, you might be trading them for others.

It seems you might want to instead stick with fcntl() and use F_OFD_SETLK like the article update at the end mentions. Where the caveat about compatibility with older systems is probably less of a concern these days and can be clearly stated upfront in docs/README? I don't know how well that works with macOS and Windows though.

The referenced LWN article also provided some good insights, and describes the OFD addition as blending the benefits of fcntl()(byte range locks, not whole file locks) with flock() (lock released when all open references / FDs are closed, not by any single reference alone). When that's not an option, I guess flock() is fine, especially if you only need/expect to perform a whole file lock anyway?


EDIT: The earlier comment providing a different detailed resource on the topic is also quite nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants