-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Fix dentry of open file handles related to a rename #1874
Conversation
a524a85
to
abf2ff7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @g2flyer)
-- commits
line 2 at r1:
You should add the prefix [LibOS]
.
libos/src/sys/libos_file.c
line 361 at r1 (raw file):
new_dent->inode = old_dent->inode; old_dent->inode = NULL; // also update dentry of any potentially open fd pointing to old_dent
I would suggest adding a new line before the comment.
libos/src/sys/libos_file.c
line 362 at r1 (raw file):
old_dent->inode = NULL; // also update dentry of any potentially open fd pointing to old_dent struct libos_handle_map* handle_map = get_thread_handle_map(NULL);
When we acquire the handle_map we actually assert most of the time that the handle_map was returned. Do we want to add assert here as well?
libos/test/regression/rename_unlink.c
line 185 at r1 (raw file):
int fd = create_file(path1, message1, message1_len);
No need for extra space here.
libos/test/regression/rename_unlink.c
line 224 at r1 (raw file):
// NOTE: below will _not_ run correctly when directly executed unless you run as root. // But it should run properly in gramine when executed as normal user.
Gramine.
Code quote:
gramine
libos/test/regression/rename_unlink.c
line 228 at r1 (raw file):
printf("%s...\n", __func__); int fd = create_file(path1, message1, message1_len);
We don't need to check fd
?
libos/test/regression/rename_unlink.c
line 233 at r1 (raw file):
err(1, "fchown before rename"); if (fchmod(fd, S_IRWXU | S_IRWXG) != 0) // Note: no other! err(1, "fchmod before rename");
I would suggest new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 22 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @g2flyer and @oshogbo)
a discussion (no related file):
For other reviewers: fchown()
fails with ENOENT because it operates through the dentry of the handle, and if it doesn't find the inode associated with the dentry, it returns with ENOENT.
And since during rename()
, the inode of the old dentry is NULLified (see the surrounding code in this PR), we end up in this error path:
gramine/libos/src/sys/libos_file.c
Line 297 in 80377ca
ret = -ENOENT; |
a discussion (no related file):
While I consider this PR generally correct (we must indeed rewire the dentry of the handle), I think the problem with fchmod()
lies elsewhere.
In particular, fchmod()
operates on an FD, i.e. on the LibOS handle, and it's not supposed to be associated with the dentry. I created a PR that fixes this: #1875
Moreover, as explained in the comments, the hdl->dentry
field is a historic artifact which should be actually completely removed, in favor of hdl->inode
. That's also how the Linux kernel works (and how any sane kernel should work). So in general, if our Gramine codebase is ready to get rid of hdl->dentry
, I would very gladly do that instead, and this PR won't be necessary.
libos/src/sys/libos_file.c
line 361 at r1 (raw file):
new_dent->inode = old_dent->inode; old_dent->inode = NULL; // also update dentry of any potentially open fd pointing to old_dent
Please use /*
comments style.
libos/src/sys/libos_file.c
line 362 at r1 (raw file):
Previously, oshogbo (Mariusz Zaborski) wrote…
When we acquire the handle_map we actually assert most of the time that the handle_map was returned. Do we want to add assert here as well?
+1
libos/src/sys/libos_file.c
line 363 at r1 (raw file):
// also update dentry of any potentially open fd pointing to old_dent struct libos_handle_map* handle_map = get_thread_handle_map(NULL); rwlock_write_lock(&handle_map->lock);
Why write lock? Looks like we can do a read lock here.
libos/src/sys/libos_file.c
line 370 at r1 (raw file):
continue; struct libos_handle* handle = fd_handle->handle; if ((handle->dentry != old_dent) || (handle->inode != new_dent->inode))
handle->dentry
must be under the handle->lock
. Otherwise two parallel renames will have a data race: T1 will be preempted right after this check, T2 will update handle->dentry
to a new dentry, and then T1 is scheduled back and updates again. So we loose one update.
But generally, I have a feeling that our codebase was written under the implicit assumption that handle->dentry
is immutable. So there can be more places that don't grab a lock but simply read handle->dentry
field, and these also must be modified -- since this PR introduces a new property that the dentry field is mutable.
libos/test/regression/rename_unlink.c
line 3 at r1 (raw file):
/* SPDX-License-Identifier: LGPL-3.0-or-later */ /* Copyright (C) 2021 Intel Corporation * Paweł Marczewski <pawel@invisiblethingslab.com>
If you want to, add your name to this copyright (and update the year to 2024)
libos/test/regression/rename_unlink.c
line 159 at r1 (raw file):
should_not_exist(path1);
Looks like a random change. Please revert.
libos/test/regression/rename_unlink.c
line 181 at r1 (raw file):
} static void test_rename_follow(const char* path1, const char* path2) {
Why _follow
? Wouldn't it be better to name it _write
or _overwrite
?
libos/test/regression/rename_unlink.c
line 224 at r1 (raw file):
// NOTE: below will _not_ run correctly when directly executed unless you run as root. // But it should run properly in gramine when executed as normal user.
Can you replace //
comments with /*
comments?
We mainly use /*
comment style, but in some files you may see the old convention... But new files should be written with /*
, and new code in files that previously used /*
should also continue using this style.
libos/test/regression/rename_unlink.c
line 228 at r1 (raw file):
Previously, oshogbo (Mariusz Zaborski) wrote…
We don't need to check
fd
?
+1, please check the creation of the file
libos/test/regression/rename_unlink.c
line 230 at r1 (raw file):
int fd = create_file(path1, message1, message1_len); if (fchown(fd, 1, 1))
You should compare it with != 0
, for uniformity with the rest of the code
libos/test/regression/rename_unlink.c
line 230 at r1 (raw file):
int fd = create_file(path1, message1, message1_len); if (fchown(fd, 1, 1))
Can you add more info here:
if (fchown(fd, /*owner=*/1, /*group=*/1) != 0) /* dummy owner/group just for testing */
libos/test/regression/rename_unlink.c
line 232 at r1 (raw file):
if (fchown(fd, 1, 1)) err(1, "fchown before rename"); if (fchmod(fd, S_IRWXU | S_IRWXG) != 0) // Note: no other!
/* note: no "other users" mode bits */
libos/test/regression/rename_unlink.c
line 243 at r1 (raw file):
if (fd < 0) err(1, "open %s", path1);
Looks like you accidentally moved this check way after create_file()
to which it belongs.
libos/test/regression/rename_unlink.c
line 251 at r1 (raw file):
should_exist(path2, message1_len); if (fchown(fd, 2, 2))
ditto (add more comments)
libos/test/regression/rename_unlink.c
line 253 at r1 (raw file):
if (fchown(fd, 2, 2)) err(1, "fchown after rename"); if (fchmod(fd, S_IRWXU | S_IRWXG | S_IRWXO) != 0) // Note: with other now!
/* note: now with "other users" mode bits */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 22 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @g2flyer and @oshogbo)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
While I consider this PR generally correct (we must indeed rewire the dentry of the handle), I think the problem with
fchmod()
lies elsewhere.In particular,
fchmod()
operates on an FD, i.e. on the LibOS handle, and it's not supposed to be associated with the dentry. I created a PR that fixes this: #1875Moreover, as explained in the comments, the
hdl->dentry
field is a historic artifact which should be actually completely removed, in favor ofhdl->inode
. That's also how the Linux kernel works (and how any sane kernel should work). So in general, if our Gramine codebase is ready to get rid ofhdl->dentry
, I would very gladly do that instead, and this PR won't be necessary.
@g2flyer Can you check whether #1875 fixes your issues? Then we could extract your two sub-tests in a separate PR, and then think what to do with your change of "update dentry of handle" separately -- since this is very non-trivial (as should be clear from my comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the good comments; hope next time i sleep over pushing a PR to already fix a number of things a night and a clear head in the morning would have fixed
Reviewable status: 0 of 2 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @oshogbo)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@g2flyer Can you check whether #1875 fixes your issues? Then we could extract your two sub-tests in a separate PR, and then think what to do with your change of "update dentry of handle" separately -- since this is very non-trivial (as should be clear from my comments).
@dimakuv Oh, i didn't see any issue with fchmod
. I just added that test case also for fchmod
as it seemed somewhat conceptually related to fchown
in that it changes metadata of a renamed file with an open handle. More importantly, to (a) test first whether it really works or not in old code-- which from looking at code i thought and indeed it worked -- and (b) to make sure that my changes didn't screw up anything related to it.
In any case, i did successfully run and tested #1875 as-is and then merged my new test-vectors with your version and that also did pass all tests. So your change certainly solves the fchown
rename issue more minimally (although for my (PR #1856 / issue #1835) case i probably still will have to do the dentry fixup (but as you say needs much more thorough update regarding locking). Regarding pulling my new tests in a separate PR, assuming your PR #1875 is merged first, wouldn't it be better to move them (after cleanup) to that PR? Would seem to me best to keep the vectors identifying the problem together with the (immediate) fix?
libos/src/sys/libos_file.c
line 361 at r1 (raw file):
Previously, oshogbo (Mariusz Zaborski) wrote…
I would suggest adding a new line before the comment.
Done.
libos/src/sys/libos_file.c
line 361 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Please use
/*
comments style.
Done.
libos/src/sys/libos_file.c
line 362 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
+1
Done.
libos/src/sys/libos_file.c
line 363 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why write lock? Looks like we can do a read lock here.
copy/paste from walk_.. but using weaker lock definitely better
libos/src/sys/libos_file.c
line 370 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
handle->dentry
must be under thehandle->lock
. Otherwise two parallel renames will have a data race: T1 will be preempted right after this check, T2 will updatehandle->dentry
to a new dentry, and then T1 is scheduled back and updates again. So we loose one update.But generally, I have a feeling that our codebase was written under the implicit assumption that
handle->dentry
is immutable. So there can be more places that don't grab a lock but simply readhandle->dentry
field, and these also must be modified -- since this PR introduces a new property that the dentry field is mutable.
Yep, i realized that also during the night that (a) the scope of lock here would have to cover also the checks and (b) i would have to find other cases of use of handle->dentry and potentially add locks there as well. I guess would have been a good idea to sleep over it first instead of eagerly pushing a PR in the evening ;-)
For now i just due (a), add a TODO for myself for (b) and will downgrade the PR to a draft PR ...
libos/test/regression/rename_unlink.c
line 3 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
If you want to, add your name to this copyright (and update the year to 2024)
Done.
libos/test/regression/rename_unlink.c
line 159 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Looks like a random change. Please revert.
Aargh, relied too much on color highlights in git diff
and overlooked this single '+'. I guess note to self to more carefully look through git diffs. Anyway should be fixed
libos/test/regression/rename_unlink.c
line 181 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why
_follow
? Wouldn't it be better to name it_write
or_overwrite
?
The thought that instead of on open (destination) fd being replaced by the rename (as the earlier test, here an open (source) fd "follows" the rename. Follow is certainly not the most intuitive name but i have to admit the generic (over)write seems even less so to me. So far now leave it as-is to hear other opinions/suggestions.
libos/test/regression/rename_unlink.c
line 185 at r1 (raw file):
Previously, oshogbo (Mariusz Zaborski) wrote…
No need for extra space here.
i kept the format from test_rename_replace
but i agree removing the space seems more intuitive. Fixed now.
Should i also remove the space in test_rename_replace? (That said, i guess once you start making stuff consistent one would also have to add the test to
test_ranme_same_file` and alike which seems probably beyond what should be done in this PR).
libos/test/regression/rename_unlink.c
line 224 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Can you replace
//
comments with/*
comments?We mainly use
/*
comment style, but in some files you may see the old convention... But new files should be written with/*
, and new code in files that previously used/*
should also continue using this style.
Done.
libos/test/regression/rename_unlink.c
line 224 at r1 (raw file):
Previously, oshogbo (Mariusz Zaborski) wrote…
Gramine.
Done.
libos/test/regression/rename_unlink.c
line 228 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
+1, please check the creation of the file
Done.
libos/test/regression/rename_unlink.c
line 230 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
You should compare it with
!= 0
, for uniformity with the rest of the code
Done.
libos/test/regression/rename_unlink.c
line 230 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Can you add more info here:
if (fchown(fd, /*owner=*/1, /*group=*/1) != 0) /* dummy owner/group just for testing */
Done.
libos/test/regression/rename_unlink.c
line 232 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
/* note: no "other users" mode bits */
Done.
libos/test/regression/rename_unlink.c
line 243 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Looks like you accidentally moved this check way after
create_file()
to which it belongs.
yep :-(
libos/test/regression/rename_unlink.c
line 251 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto (add more comments)
Done.
libos/test/regression/rename_unlink.c
line 253 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
/* note: now with "other users" mode bits */
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @g2flyer and @oshogbo)
a discussion (no related file):
Previously, g2flyer (Michael Steiner) wrote…
@dimakuv Oh, i didn't see any issue with
fchmod
. I just added that test case also forfchmod
as it seemed somewhat conceptually related tofchown
in that it changes metadata of a renamed file with an open handle. More importantly, to (a) test first whether it really works or not in old code-- which from looking at code i thought and indeed it worked -- and (b) to make sure that my changes didn't screw up anything related to it.In any case, i did successfully run and tested #1875 as-is and then merged my new test-vectors with your version and that also did pass all tests. So your change certainly solves the
fchown
rename issue more minimally (although for my (PR #1856 / issue #1835) case i probably still will have to do the dentry fixup (but as you say needs much more thorough update regarding locking). Regarding pulling my new tests in a separate PR, assuming your PR #1875 is merged first, wouldn't it be better to move them (after cleanup) to that PR? Would seem to me best to keep the vectors identifying the problem together with the (immediate) fix?
- It was a typo when I mentioned
fchmod()
in my previous comment. I meantfchown()
. - I agree that I can move your sub-tests from this PR into [common] Encrypted Files: keep root hash for whole runtime of Gramine instance #1835. I will do it now.
- After that, those tests in your PR can be reverted, keeping only the logic of
hdl->dentry
updates.
- After that, those tests in your PR can be reverted, keeping only the logic of
libos/src/sys/libos_file.c
line 362 at r1 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
Done.
The assert is now on the handle_libos_call
which seems unrelated?
libos/src/sys/libos_file.c
line 370 at r1 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
Yep, i realized that also during the night that (a) the scope of lock here would have to cover also the checks and (b) i would have to find other cases of use of handle->dentry and potentially add locks there as well. I guess would have been a good idea to sleep over it first instead of eagerly pushing a PR in the evening ;-)
For now i just due (a), add a TODO for myself for (b) and will downgrade the PR to a draft PR ...
I'll keep this as blocking comment then.
libos/test/regression/rename_unlink.c
line 185 at r1 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
i kept the format from
test_rename_replace
but i agree removing the space seems more intuitive. Fixed now.
Should i also remove the space intest_rename_replace? (That said, i guess once you start making stuff consistent one would also have to add the test to
test_ranme_same_file` and alike which seems probably beyond what should be done in this PR).
I don't have a strong opinion, but our general practice for tests is:
- Try to use the same style as in the core Gramine codebase
- But if the test is already written in another style, conform to that style
- Or first submit a PR that refactors the test to a better style, and only then modify the test
- But generally -- these are just tests, we don't care that much. If you would say "No, there is an empty line in other already existing sub-tests, so I'll do that as well", no one would object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @g2flyer and @oshogbo)
libos/test/regression/rename_unlink.c
line 2 at r2 (raw file):
/* SPDX-License-Identifier: LGPL-3.0-or-later */ /* Copyright (C) 2024 Intel Corporation
These changes were copied to #1875
e36a532
to
b25266a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed the embarrassing copy/paste bug, removed test files as now in #1875 but also rebased on that to retain the tests and fix from there for further evolution of this (draft) PR to get locking right
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @oshogbo)
Previously, oshogbo (Mariusz Zaborski) wrote…
You should add the prefix
[LibOS]
.
Done.
libos/src/sys/libos_file.c
line 362 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
The assert is now on the
handle_libos_call
which seems unrelated?
gosh, what was i thinking (or more likely, i guess i was thinking nothing and some copy/paste got wrong although hard to see why this would have been in the copy-buffer)
libos/test/regression/rename_unlink.c
line 185 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I don't have a strong opinion, but our general practice for tests is:
- Try to use the same style as in the core Gramine codebase
- But if the test is already written in another style, conform to that style
- Or first submit a PR that refactors the test to a better style, and only then modify the test
- But generally -- these are just tests, we don't care that much. If you would say "No, there is an empty line in other already existing sub-tests, so I'll do that as well", no one would object.
Done.
libos/test/regression/rename_unlink.c
line 233 at r1 (raw file):
Previously, oshogbo (Mariusz Zaborski) wrote…
I would suggest new line.
forgot to mark as done in my previous update; but now moved (in intended form) in #1875 (still showing up in this PR though through the rebase on #1875)
libos/test/regression/rename_unlink.c
line 2 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
These changes were copied to #1875
removed here now from my commits (although still visible as i rebased on top of 1875 so i still have the tests (as well as your fix which also affects locking strategy by removing some cases where otherwise there would have had to be additional locks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @g2flyer and @oshogbo)
libos/include/libos_handle.h
line 168 at r3 (raw file):
struct libos_fs* fs; /* dentry can change due to rename, so to secure read-access, aquire g_dcache_lock and/or
acquire
libos/include/libos_handle.h
line 170 at r3 (raw file):
/* dentry can change due to rename, so to secure read-access, aquire g_dcache_lock and/or * handle->lock; to protect updates (unless during creation and deletion where unique use is * guaranteed), acquire first g_dcache_lock and then handle-Lock */
Last word: I guess you meant handle->lock
libos/include/libos_handle.h
line 170 at r3 (raw file):
/* dentry can change due to rename, so to secure read-access, aquire g_dcache_lock and/or * handle->lock; to protect updates (unless during creation and deletion where unique use is * guaranteed), acquire first g_dcache_lock and then handle-Lock */
Is such an order easy to guarantee in our codebase? I would imagine that we have many funcs that operate on the handle and thus perform handle->lock
, and then in some helper funcs we may want to operate on this dentry
. This would mean that we have to check all callers of such helper funcs and make sure they take (perhaps unnecessarily) the g_dcache_lock
...
libos/src/fs/libos_namei.c
line 747 at r3 (raw file):
get_dentry(hdl->dentry); *dir = hdl->dentry; unlock(&hdl->lock);
While this additional locking doesn't seem too bad (it's only three cases uncovered in this PR -- I wonder if that's really all), this feels very brittle.
I really think it's time to try to remove hdl->dentry
completely and only keep the hdl->inode
. Maybe with one exception of flocks/POSIX locks -- where we can do something hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after this update the locking should be fine -- main exception is that i'm not 100% sure of semantics of g_process.fs_lock and depending on it some additional handle locks might not be necessary ... - As part of that, though, i wonder whether dentry->inode also wouldn't have to be lock protected given that (at least) rename and unlink do modify it but we have code like if (!dent->inode || dent->inode->type != S_IFDIR)
in libos_syscall_fchndir
? Also, i guess fchdir
has the same issue as fcntl/flock
that it is here path-based whereas in posix it is inode based, so wouldn't work with renames? (not that my changes should have really changed anything in what works or does not work afaict)
Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @oshogbo)
libos/include/libos_handle.h
line 168 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
acquire
Done.
libos/include/libos_handle.h
line 170 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Last word: I guess you meant
handle->lock
Done.
libos/include/libos_handle.h
line 170 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Is such an order easy to guarantee in our codebase? I would imagine that we have many funcs that operate on the handle and thus perform
handle->lock
, and then in some helper funcs we may want to operate on thisdentry
. This would mean that we have to check all callers of such helper funcs and make sure they take (perhaps unnecessarily) theg_dcache_lock
...
i want through the dereferences of dentry
(via vscode's intellisense) and then looked at caller and callees of these locations. Turns out intellisense was leaky and missing a few cases which i found now with help of good old grep -r
and alike. So there are a few additional locks required.
The order g_cache_lock > hdl->lock is actually already assumed by (callers of) populate_directory_handle
in libnos_namei.c
as that functions requires both locks to be held and both callers in libos_open.c
lock g_dcache_lock
before hdl->lock
. So what the comment says (and what the code changes still should garantuee) should be safe
libos/src/fs/libos_namei.c
line 747 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
While this additional locking doesn't seem too bad (it's only three cases uncovered in this PR -- I wonder if that's really all), this feels very brittle.
I really think it's time to try to remove
hdl->dentry
completely and only keep thehdl->inode
. Maybe with one exception of flocks/POSIX locks -- where we can do something hacky.
I didn't find too many places where hdl->dentry
was used, so probably could be removed without too much changes. That said, with this fix here, i can get #1856 working in the sense that it is a very simple test to see whether a closed file is still visible in the filesystem (directory) or not; if `hdl->dentry' disappears the only alternative i see is to keep in the valid-root-hash-map an additional list with all open hdls pointing to a particular path and that seems more messy and complex than the current approach (and definitely requires overall more changes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r4, all commit messages.
Reviewable status: all files reviewed, 16 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @g2flyer and @oshogbo)
libos/include/libos_handle.h
line 170 at r3 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
i want through the dereferences of
dentry
(via vscode's intellisense) and then looked at caller and callees of these locations. Turns out intellisense was leaky and missing a few cases which i found now with help of good oldgrep -r
and alike. So there are a few additional locks required.The order g_cache_lock > hdl->lock is actually already assumed by (callers of)
populate_directory_handle
inlibnos_namei.c
as that functions requires both locks to be held and both callers inlibos_open.c
lockg_dcache_lock
beforehdl->lock
. So what the comment says (and what the code changes still should garantuee) should be safe
Yes, I agree, the caller of populate_directory_handle()
already imposes such an order:
gramine/libos/src/sys/libos_open.c
Lines 181 to 204 in 25056d0
lock(&g_dcache_lock); | |
maybe_lock_pos_handle(hdl); | |
lock(&hdl->lock); | |
file_off_t ret; | |
/* Refresh the directory handle, so that after `lseek` the user sees an updated listing. */ | |
clear_directory_handle(hdl); | |
if ((ret = populate_directory_handle(hdl)) < 0) | |
goto out; | |
struct libos_dir_handle* dirhdl = &hdl->dir_info; | |
file_off_t pos; | |
ret = generic_seek(hdl->pos, dirhdl->count, offset, origin, &pos); | |
if (ret < 0) | |
goto out; | |
hdl->pos = pos; | |
ret = pos; | |
out: | |
unlock(&hdl->lock); | |
maybe_unlock_pos_handle(hdl); | |
unlock(&g_dcache_lock); |
libos/include/libos_handle.h
line 169 at r4 (raw file):
struct libos_fs* fs; /* dentry can change due to rename, so to secure read-access, acquire g_dcache_lock and/or * handle->lock; to protect updates (unless during creation and deletion where unique use is
Why, for read access, you recommend to acquire any of the two locks?
I would be specific and ask to acquire handle->lock
. I understand that both locks work, but this creates ambiguity for future devs. And technically handle->lock
is an appropriate one in this case. So I would just change to acquire handle->lock
libos/src/fs/libos_fs_pseudo.c
line 271 at r4 (raw file):
lock(&handle->lock); struct libos_dentry* dent = handle->dentry; unlock(&handle->lock);
This looks wrong. If the dent
becomes unused, it may be garbage-collected before we call pseudo_istat()
, and that function will work on a dangling pointer.
You should do this instead:
int ret = -ENOENT;
lock(&handle->lock);
if (handle->dentry)
ret = pseudo_istat(handle->dentry, handle->inode, buf);
unlock(&handle->lock);
return ret;
Looks like this function assumes that handle->dentry
is initialized here, so the IF in my code snippet may be useless. One needs to analyze the callers of this func to understand if we can remove that.
libos/src/fs/libos_namei.c
line 747 at r3 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
I didn't find too many places where
hdl->dentry
was used, so probably could be removed without too much changes. That said, with this fix here, i can get #1856 working in the sense that it is a very simple test to see whether a closed file is still visible in the filesystem (directory) or not; if `hdl->dentry' disappears the only alternative i see is to keep in the valid-root-hash-map an additional list with all open hdls pointing to a particular path and that seems more messy and complex than the current approach (and definitely requires overall more changes)
Ok, I agree that removing hdl->dentry
will be a big task, and we should take small steps. I'm unblocking this discussion, as we seem to have fixed all hdl->dentry
places in the codebase.
libos/src/fs/proc/thread.c
line 34 at r4 (raw file):
lock(&g_process.exec->lock); dent = g_process.exec->dentry; unlock(&g_process.exec->lock);
Same problem as discussed in another comment -- dent
may be garbage-collected. So you need to move the unlock()
two lines below. This is because if get_dentry(dent)
happens under the lock, then we are guaranteed that dentry won't be freed -- Gramine will know that there is at least one caller of this object still processing.
libos/src/sys/libos_fcntl.c
line 204 at r4 (raw file):
lock(&hdl->lock); struct libos_dentry* dent = hdl->dentry; unlock(&hdl->lock);
That's wrong because file_lock_set(dent ...)
may work on a dangling pointer.
This whole code snippet looks wrong, it doesn't have any locking or get_dentry()
. Hmm, or no, this code snippet is fine -- it takes a reference of hdl
and assumes that hdl->dentry
is immutable.
So yes, in this PR, we must either keep the hdl->lock
until the very end of this case, or get_dentry()
.
libos/src/sys/libos_fcntl.c
line 243 at r4 (raw file):
lock(&hdl->lock); struct libos_dentry* dent = hdl->dentry; unlock(&hdl->lock);
That's wrong because file_lock_get(dent ...)
may work on a dangling pointer.
Same notes as in the comment above.
libos/src/sys/libos_fcntl.c
line 353 at r4 (raw file):
.handle_id = hdl->id, };
Please remove spaces
libos/src/sys/libos_fcntl.c
line 356 at r4 (raw file):
lock(&hdl->lock); struct libos_dentry* dent = hdl->dentry; unlock(&hdl->lock);
ditto, cannot unlock here, otherwise dangling pointer.
libos/src/sys/libos_getcwd.c
line 87 at r4 (raw file):
lock(&hdl->lock); struct libos_dentry* dent = hdl->dentry; unlock(&hdl->lock);
ditto. Here you should get_dentry()
under the lock and at the end of the func put_dentry()
libos/src/sys/libos_stat.c
line 217 at r4 (raw file):
lock(&hdl->lock); struct libos_mount* mount = hdl->dentry ? hdl->dentry->mount : NULL; unlock(&hdl->lock);
FYI: here we are fine because mount
objects are immutable and are never freed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 16 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @oshogbo)
libos/include/libos_handle.h
line 169 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why, for read access, you recommend to acquire any of the two locks?
I would be specific and ask to acquire
handle->lock
. I understand that both locks work, but this creates ambiguity for future devs. And technicallyhandle->lock
is an appropriate one in this case. So I would just change toacquire handle->lock
From a safety (& liveness) perspective either should be enough and while handle->lock would be the natural one, there were places such as in libos_syscall_ioctl which already had the g_dcache_lock
or libsys_syscal_mknodat
, so to minimize changes seemed to allow this also? But of course i could add handle->locks there as well if you prefer ...
libos/src/fs/libos_fs_pseudo.c
line 271 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This looks wrong. If the
dent
becomes unused, it may be garbage-collected before we callpseudo_istat()
, and that function will work on a dangling pointer.You should do this instead:
int ret = -ENOENT; lock(&handle->lock); if (handle->dentry) ret = pseudo_istat(handle->dentry, handle->inode, buf); unlock(&handle->lock); return ret;
Looks like this function assumes that
handle->dentry
is initialized here, so the IF in my code snippet may be useless. One needs to analyze the callers of this func to understand if we can remove that.
I see what you mean. I thought this is only to deal with cached meta-data as get_handle just increases a reference counter and put_handle doesn't invalidate (NULL) the pointers even inside or on the outside of the cases i've seen. Maybe beyond this PR but wouldn't it be better that put_handle would get a reference to the pointer which could be NULLed iff ref count is really zero? But short term i guess the better approach is what you hinted below is two wrap the derefence by get_dentry/put_dentry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @g2flyer and @oshogbo)
libos/include/libos_handle.h
line 169 at r4 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
From a safety (& liveness) perspective either should be enough and while handle->lock would be the natural one, there were places such as in libos_syscall_ioctl which already had the
g_dcache_lock
orlibsys_syscal_mknodat
, so to minimize changes seemed to allow this also? But of course i could add handle->locks there as well if you prefer ...
Ok, let's keep like this.
libos/src/fs/libos_fs_pseudo.c
line 271 at r4 (raw file):
Maybe beyond this PR but wouldn't it be better that put_handle would get a reference to the pointer which could be NULLed iff ref count is really zero?
This sounds pretty uncommon? I don't remember semantics like this in any reference-counting implementations I've seen. Anyway, this is something orthogonal to this PR.
But short term i guess the better approach is what you hinted below is two wrap the derefence by get_dentry/put_dentry?
Yes, well, two approaches (I described both in the PR):
- In some cases, just extend the
handle->lock
to cover all related lines of code (like in this particular example here). No need to complicate with getting the reference and putting it back, just to getstat
buffer. - In other cases, where more stuff happens to dentry, it's better to get the reference and then drop it at func end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, next try :-)
Reviewable status: 7 of 13 files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @oshogbo)
libos/src/fs/libos_fs_pseudo.c
line 271 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Maybe beyond this PR but wouldn't it be better that put_handle would get a reference to the pointer which could be NULLed iff ref count is really zero?
This sounds pretty uncommon? I don't remember semantics like this in any reference-counting implementations I've seen. Anyway, this is something orthogonal to this PR.
But short term i guess the better approach is what you hinted below is two wrap the derefence by get_dentry/put_dentry?
Yes, well, two approaches (I described both in the PR):
- In some cases, just extend the
handle->lock
to cover all related lines of code (like in this particular example here). No need to complicate with getting the reference and putting it back, just to getstat
buffer.- In other cases, where more stuff happens to dentry, it's better to get the reference and then drop it at func end.
Moved locks, although it is a bit more complicated as do_hstat
from libos_stat.c
requires a lock before calling fs_ops->hstat
for consistency of dentry
yet other callers of fs_ops->hstat
do not have the lock, so have to do it conditionally.
Regarding whether we need the if or not, it looks currently a bit inconsistent: generic_inode_hstat
has an assert(handle->inode) and is used by a number of fs as fs_ops.hstat. Yet others, e.g., the hstat
in libos_fs_path.c
checks (as only action before redirecting to generic_inode_hstat
. Based on this i added the if as you had it to be on the safe side ....
libos/src/fs/proc/thread.c
line 34 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Same problem as discussed in another comment --
dent
may be garbage-collected. So you need to move theunlock()
two lines below. This is because ifget_dentry(dent)
happens under the lock, then we are guaranteed that dentry won't be freed -- Gramine will know that there is at least one caller of this object still processing.
Yep, i release lock too early. Moving it to after the get_dentry
should make it safe
libos/src/sys/libos_fcntl.c
line 204 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
That's wrong because
file_lock_set(dent ...)
may work on a dangling pointer.This whole code snippet looks wrong, it doesn't have any locking or
get_dentry()
. Hmm, or no, this code snippet is fine -- it takes a reference ofhdl
and assumes thathdl->dentry
is immutable.So yes, in this PR, we must either keep the
hdl->lock
until the very end of this case, orget_dentry()
.
Done.
libos/src/sys/libos_fcntl.c
line 243 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
That's wrong because
file_lock_get(dent ...)
may work on a dangling pointer.Same notes as in the comment above.
Done.
libos/src/sys/libos_fcntl.c
line 353 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Please remove spaces
Done.
libos/src/sys/libos_fcntl.c
line 356 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto, cannot unlock here, otherwise dangling pointer.
Done.
libos/src/sys/libos_file.c
line 370 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I'll keep this as blocking comment then.
this should be hopefully fixed now (and hence i removed the TODO, although for now leave the 'draft' as it is still rebased on #1875 which has the test cases but is not yet merged
libos/src/sys/libos_getcwd.c
line 87 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto. Here you should
get_dentry()
under the lock and at the end of the funcput_dentry()
for other reason (discussed bug around if (!dent\->inode || dent\->inode\->type != S\_IFDIR) {
i switched to g_dcache_lock
and across the whole function.
libos/src/sys/libos_stat.c
line 217 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
FYI: here we are fine because
mount
objects are immutable and are never freed.
Rename doesn't work across mount-points, so true that mount should everywhere be the same. But you still need look as there are two dereferences from hdl->dentry?
Signed-off-by: g2flyer <[email protected]>
[LibOS] Fix dentry of open files after rename Signed-off-by: g2flyer <[email protected]>
Signed-off-by: g2flyer <[email protected]>
Signed-off-by: g2flyer <[email protected]>
Signed-off-by: g2flyer <[email protected]>
Signed-off-by: g2flyer <[email protected]>
6a0f5b3
to
b980834
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As PR #1875 (which containing parts originally here) is merged, i rebased this PR to a clean state (retaining, though, the fixups which were already discussed) and moved the PR back out of draft into active state
Reviewable status: 3 of 13 files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @oshogbo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r5, 3 of 4 files at r6, all commit messages.
Reviewable status: 7 of 13 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @g2flyer and @oshogbo)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
For other reviewers:
fchown()
fails with ENOENT because it operates through the dentry of the handle, and if it doesn't find the inode associated with the dentry, it returns with ENOENT.And since during
rename()
, the inode of the old dentry is NULLified (see the surrounding code in this PR), we end up in this error path:gramine/libos/src/sys/libos_file.c
Line 297 in 80377ca
ret = -ENOENT;
This was fixed by db1d16d, already in master and in this PR. Resolving.
a discussion (no related file):
Sorry, these libos_handle::dentry
locking fixes are getting too complicated.
Ideally, we must choose one lock to acquire; the possibility to use any of the two locks (dentry-global or handle-specific) is plain weird.
The current locks are sprinkled around functions, so it's very hard to notice and reason about. I think we should rework that to acquire the locks at the beginning of funcs and release at the end of funcs.
We definitely must not introduce conditional locking. Instead, if such situations arise, we need to rework the functions and their callers, possibly splitting/duplicating the funcs.
Finally, I would split this PR into a series of PRs: each PR fixing one usage of hdl->dentry
.
I suggest we discuss what to do here in the next Gramine community meeting. I added to the agenda. I can drive that discussion, if @g2flyer won't be able to attend. See #1906
libos/src/fs/libos_fs_pseudo.c
line 271 at r4 (raw file):
Previously, g2flyer (Michael Steiner) wrote…
Moved locks, although it is a bit more complicated as
do_hstat
fromlibos_stat.c
requires a lock before callingfs_ops->hstat
for consistency ofdentry
yet other callers offs_ops->hstat
do not have the lock, so have to do it conditionally.
Regarding whether we need the if or not, it looks currently a bit inconsistent:generic_inode_hstat
has an assert(handle->inode) and is used by a number of fs as fs_ops.hstat. Yet others, e.g., thehstat
inlibos_fs_path.c
checks (as only action before redirecting togeneric_inode_hstat
. Based on this i added the if as you had it to be on the safe side ....
Sorry, this is a hard no. Conditional locking is a bad smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 13 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @oshogbo)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Sorry, these
libos_handle::dentry
locking fixes are getting too complicated.Ideally, we must choose one lock to acquire; the possibility to use any of the two locks (dentry-global or handle-specific) is plain weird.
The current locks are sprinkled around functions, so it's very hard to notice and reason about. I think we should rework that to acquire the locks at the beginning of funcs and release at the end of funcs.
We definitely must not introduce conditional locking. Instead, if such situations arise, we need to rework the functions and their callers, possibly splitting/duplicating the funcs.
Finally, I would split this PR into a series of PRs: each PR fixing one usage of
hdl->dentry
.I suggest we discuss what to do here in the next Gramine community meeting. I added to the agenda. I can drive that discussion, if @g2flyer won't be able to attend. See #1906
It's a while back and i haven't "swapped back in" all the context i have in my notes. But thinking right now i think there is probably a bit a mixup between locks which protect the dentry pointer in the handle itself (for which the handle's lock should be the right lock) and a lock you need for reading changeable fields in the handle dereferenced dentry itself (only inode i think, which requires g_dcache_lock). I vaguely remember i was worried that requiring both locks would give issue with circular lock-dependencies but i'm actually not sure anymore why (and how) the current strategy should have overcome that perceived problem. And neither looking at my notes during the call/lock analysis nor looking at the code right now i see why the strategy above (dentry dereference via hdl-lock and dentry->inode dereference via additional g_dcache_lock (as already required) wouldn't work.
Regarding splitting in different PRs, i certainly can see a separate PR for the refactoring around hstat to remove the need for conditional locking. But for the rest i'm a bit worried that splitting risks getting the big picture lost. But then i probably don't understand how exactly you wanted to split
libos/src/fs/libos_fs_pseudo.c
line 271 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Sorry, this is a hard no. Conditional locking is a bad smell.
I've seen conditional locking elsewhere and it was the solution which touched fewest places. That said, not conditional locking is not great and consistently acquiring (and documenting corresponding requirement) locking for all fs_ops->hstat definitely looks cleaner; as part probably the hstat top-level handlers also can be made more consistent/"standardized"
Closing this PR as a revised (simplified and cleaned-up) version of the fix is available as PR #1979 |
During
rename
, thedentry
of the corresponding handle is not updated, which e.g., makesfchown
not work properly. I did discover that during work on issue #1835 in Draft PR #1856 where i have to be able to figure out stale handles. A corresponding test-case is added astest_rename_fchown_fchmod
inrename_unlink
. An additional rename related test casetest_rename_follow
test the "flip-side" oftest_rename_replace
(the one which originally triggered me finding the bug).BTW: the original (in)validation tests from here are moved now to PR #1875. To still have these tests, this PR is currently rebased on the that PRs branch rather than master and currently marked as draft. Will remove the draft once #1875 is merged and i can rebase to master
Description of the changes
To fix that,
do_rename
iterates over open file handles and fixes any which was was pointing to the source-dentry of the rename.PS: I'm not very "fluent" on the overall locking strategy. I didn't find anything obvious where my code violates invarients but definitely good to cross-check that (a) using sufficents locks and (b) the implied lock prioties
g_dcache.lock
>libos_handle_map.lock
>libos_handle.lock
does not violate anything.How to test this PR?
I tested by adding the two new test-cases and then running the usual regression tests.
This change is