my_dodrio_quota command is not robust against filesystems unmounting #96

Open
boegel opened this issue Aug 13, 2024 · 0 comments
boegel commented Aug 13, 2024

Every now and then, my_dodrio_quota crashes with:

2024-08-09 13:47:45,085 ERROR      LustreOperations MainThread  Failed to create the list of current mounted filesystems
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vsc/filesystem/posix.py", line 245, in _local_filesystems
    self.localfilesystems = [[y[2], y[1], os.stat(y[1]).st_dev, y[0]] for y in currentmounts]
  File "/usr/lib/python3.6/site-packages/vsc/filesystem/posix.py", line 245, in <listcomp>
    self.localfilesystems = [[y[2], y[1], os.stat(y[1]).st_dev, y[0]] for y in currentmounts]
FileNotFoundError: [Errno 2] No such file or directory: '/run/user/0'
Traceback (most recent call last):
  File "/usr/bin/my_dodrio_quota", line 80, in <module>
    main()
  File "/usr/bin/my_dodrio_quota", line 70, in main
    filesystems = lustop.list_filesystems()
  File "/usr/lib/python3.6/site-packages/vsc/filesystem/lustre.py", line 214, in list_filesystems
    self._local_filesystems()
  File "/usr/lib/python3.6/site-packages/vsc/filesystem/posix.py", line 245, in _local_filesystems
    self.localfilesystems = [[y[2], y[1], os.stat(y[1]).st_dev, y[0]] for y in currentmounts]
  File "/usr/lib/python3.6/site-packages/vsc/filesystem/posix.py", line 245, in <listcomp>
    self.localfilesystems = [[y[2], y[1], os.stat(y[1]).st_dev, y[0]] for y in currentmounts]
FileNotFoundError: [Errno 2] No such file or directory: '/run/user/0'

It looks like it's looping over all mounted filesystems and statting them one by one, but some may be unmounted by the time the stat is done, leading to the crash.

$ mount
...
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6542136k,mode=700)
[vsc40023@login55 ~]$ stat /run/user/0
stat: cannot statx '/run/user/0': No such file or directory
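
One possible way to make the loop tolerant of this is sketched below. This is only an illustration, not the actual vsc.filesystem code: the helper name stat_mounts_safely is hypothetical, and it assumes each entry of currentmounts is laid out as (device, mountpoint, fstype), which is a guess based on the list comprehension in the traceback. The idea is to stat each mount point inside a try/except and skip entries that can no longer be stat'ed, instead of letting one failure abort the whole listing.

```python
import logging
import os


def stat_mounts_safely(currentmounts):
    """Build [fstype, mountpoint, st_dev, device] entries, skipping mounts that cannot be stat'ed.

    Hypothetical helper; assumes each entry is (device, mountpoint, fstype).
    """
    filesystems = []
    for mount in currentmounts:
        device, mountpoint, fstype = mount[0], mount[1], mount[2]
        try:
            st_dev = os.stat(mountpoint).st_dev
        except OSError as err:
            # FileNotFoundError is a subclass of OSError, so a mount point
            # that vanished between listing and stat ends up here.
            logging.warning("Skipping %s: %s", mountpoint, err)
            continue
        filesystems.append([fstype, mountpoint, st_dev, device])
    return filesystems
```

With something like this, a mount point such as /run/user/0 that is listed but cannot be stat'ed would be logged and left out, rather than crashing my_dodrio_quota. Whether silently skipping is the right behaviour for all callers of _local_filesystems is a separate question.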

@kwaegema Can you take a look at this?
