Fix shared memory permission issue in a shared pod environment #813
this seems fine to me, but are you able to reproduce this with an example where this used to fail, and then installing streaming off this branch now succeeds?
closing after offline discussion with @XiaohanZhangCMU
4e0d001 to 98ac253
seems fine to me...i think. would like to see testing
@karan6181 @snarayan21 @bigning testing is done. PR is ready for review and then merge. PTAL.
couple questions, but seems fine to me overall. Deferring to @karan6181 and @bigning on approval
thank you for the fix! Will let @karan6181 stamp!
LGTM. Thank You!
Description of changes:
We run into read permission issues when shared memory is created in a shared computer environment. We relax the retry condition so that a permission error also leads to the creation of a new shared memory file. When creating a shared memory file, we make sure that neither LOCALS nor FILELOCKS overlaps with an existing one and that both share the same integer prefix in their names.
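The relaxed retry rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `shm_name_taken` and `pick_prefix_int`, the `{prefix:06}_{suffix}` naming pattern, and the `max_tries` limit are all assumptions made for the example. The point is that a `PermissionError` is treated the same as "name already taken", and a prefix int is accepted only if every name that uses it is free.

```python
from multiprocessing.shared_memory import SharedMemory
from typing import Callable, Sequence


def shm_name_taken(name: str) -> bool:
    """Report whether a shared memory block called `name` already exists."""
    try:
        shm = SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return False  # nothing there, so the name is free
    except PermissionError:
        return True  # exists but is owned by another user: skip this name
    shm.close()
    return True


def pick_prefix_int(suffixes: Sequence[str],
                    taken: Callable[[str], bool] = shm_name_taken,
                    max_tries: int = 100) -> int:
    """Return the smallest prefix int for which none of the names is taken.

    The naming pattern below is hypothetical and only serves the example.
    """
    for prefix in range(max_tries):
        names = [f'{prefix:06}_{suffix}' for suffix in suffixes]
        if not any(taken(name) for name in names):
            return prefix  # every name with this prefix is free
    raise RuntimeError(f'no free prefix int found in {max_tries} tries')
```

Calling `pick_prefix_int(['locals', 'filelock'])` would then return one prefix int that is safe for both names. On Python versions before 3.13, attaching to an existing block also registers it with the resource tracker, so a production probe may need extra care.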
What makes it more complicated is that when a program exits abnormally, some of the shared memory files are cleaned up but others are not. To address that, we make sure the files in SHM_TO_CLEAN contain no duplicates.
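A rough sketch of duplicate-free cleanup, under stated assumptions: `unique_names` and `clean_stale_shm` are hypothetical helpers, and the list of names passed in stands in for whatever SHM_TO_CLEAN holds. Names are deduplicated first so that no block is unlinked twice, and blocks that are already gone or owned by another user are skipped rather than treated as errors.

```python
from multiprocessing.shared_memory import SharedMemory
from typing import Iterable, List


def unique_names(names: Iterable[str]) -> List[str]:
    """Drop duplicate names while keeping their first-seen order."""
    return list(dict.fromkeys(names))


def clean_stale_shm(names: Iterable[str]) -> List[str]:
    """Unlink each named shared memory block once; return the names removed."""
    removed = []
    for name in unique_names(names):
        try:
            shm = SharedMemory(name=name, create=False)
        except (FileNotFoundError, PermissionError):
            # Already cleaned up, or not ours to remove: nothing to do.
            continue
        shm.close()
        shm.unlink()
        removed.append(name)
    return removed
```

Deduplicating before the loop keeps a partially cleaned state (some files gone, some left behind) from turning into a second unlink attempt on the same name.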
Issue #, if available:
Merge Checklist:
Put an x without space in the boxes that apply. If you are unsure about any checklist, please don't hesitate to ask. We are here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
pre-commit on my change. (check out the pre-commit section of prerequisites)
A complete list of the testing done for this PR is in this doc.