[BUG] Random medperf local server failures: os.getcwd(), FileNotFoundError: No such file or directory #501

VukW · 2023-11-17T16:24:14Z

Issue description

I'm running the medperf tutorials in WSL and am seeing strange behaviour: both the client and the server start to fail at random. Since I initially thought this was an internal medperf issue, I'm documenting the details here.
While working through the tutorials at https://docs.medperf.org/getting_started/benchmark_owner_demo/ (this one and others), I use a local medperf server. When running some commands, seemingly at random but usually heavy ones that perform a lot of I/O, I get the following error:

Client side:

Traceback (most recent call last):
  File "/home/vukw/anaconda3/envs/env39_medperf/bin/mlcube", line 5, in <module>
    from mlcube.__main__ import cli
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/mlcube/__main__.py", line 66, in <module>
    default=os.getcwd(),
FileNotFoundError: [Errno 2] No such file or directory

Interestingly, it affects not only the client side but also the server side (which runs in a separate bash terminal):

Traceback (most recent call last):
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/db/backends/base/base.py", line 219, in ensure_connection
    self.connect()
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/db/backends/base/base.py", line 200, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 209, in get_new_connection
    conn = Database.connect(**conn_params)
sqlite3.OperationalError: unable to open database file

Moreover, rerunning the server doesn't help:

$ sh setup-dev-server.sh
realpath: cert.crt: No such file or directory
realpath: cert.key: No such file or directory


1
1

0
CERT FILE must not be empty

And it's not just medperf that breaks; pip does too:

$ pip list
The folder you are executing pip from can no longer be found.

Workarounds and solutions

Workarounds

First of all, rerunning the server and the client in a new bash terminal fixes the issue, but only for a while: after a few commands the error is raised again.
Running cd . also helps, almost like magic. It seems to reset the working directory path, but again only for a while.
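The cd . trick most likely works because the shell re-resolves the directory by its path instead of reusing the process's stale reference to it. A minimal Python sketch of the same idea, for a script that wants to recover on its own: ensure_cwd is a hypothetical helper (not part of medperf or mlcube), and it assumes the PWD environment variable still holds the logical path set by the launching shell when no fallback is given.

```python
import os


def ensure_cwd(fallback=None):
    """Return the current working directory, re-entering it by path if it went stale.

    Hypothetical helper, not part of medperf or mlcube. It mimics `cd .` by
    looking the directory up again by its path name.
    """
    try:
        return os.getcwd()
    except FileNotFoundError:
        # The process still references a directory that no longer resolves,
        # e.g. after the underlying storage was detached and re-attached.
        # Assumption: PWD still holds the path the launching shell started in.
        path = fallback or os.environ.get("PWD")
        if not path:
            raise  # nothing to re-enter; surface the original error
        # chdir by path performs a fresh lookup, like `cd .` in bash.
        os.chdir(path)
        return os.getcwd()
```

This only hides the symptom for the calling process; tools such as mlcube that call os.getcwd() at import time will still fail if the directory goes stale before they start.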

Solution debugging

Together with @hasan7n, we found that this kind of behavior is sometimes observed on external encrypted storage: stackoverflow discussion. In my case, I had checked out the repo in the Windows environment, so all the files are located under /mnt/c/Users/vykuk/repos/mlc/medperf, which is actually an external, encrypted drive. We also found a WSL issue describing similar behavior and a workaround, but without any mention of drive encryption. So it looks like the main problem (at least in my case) is how WSL mounts the drive: external drives can sometimes be locked and unlocked, which breaks the working directory of every script running from that storage.

Solution

Thus, a reasonable solution (which also worked in my case) is to move the whole medperf repository from the Windows-host mounted drive /mnt/c/.... to the internal WSL filesystem. Moving the repo folder to /home/medperf made the issue go away.
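Before moving the repo, it can help to confirm which filesystem it actually lives on. The sketch below reads /proc/mounts and finds the longest mount point containing a given path. It assumes (this is not confirmed in the issue) that WSL reports Windows drives with filesystem type 9p under WSL2 or drvfs under WSL1, while the internal WSL filesystem is usually ext4; mount_for and is_on_windows_mount are hypothetical helpers.

```python
import os

# Assumption: WSL lists Windows drives in /proc/mounts with these fstypes.
WINDOWS_MOUNT_TYPES = {"9p", "drvfs"}


def mount_for(path, mounts_text):
    """Return (mount_point, fstype) of the longest mount point containing `path`.

    Uses abspath, so symlinks are not resolved, and mount points with spaces
    (octal-escaped as \\040 in /proc/mounts) are not decoded.
    """
    path = os.path.abspath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # The mount contains `path` if it is the path itself or a parent of it.
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        if contains and len(mount_point) > len(best[0]):
            best = (mount_point, fstype)
    return best


def is_on_windows_mount(path, mounts_file="/proc/mounts"):
    """True if `path` sits on a filesystem type assumed to be a Windows drive."""
    with open(mounts_file) as f:
        _, fstype = mount_for(path, f.read())
    return fstype in WINDOWS_MOUNT_TYPES
```

On a setup like the one described here, is_on_windows_mount("/mnt/c/Users/vykuk/repos/mlc/medperf") would be expected to return True, and is_on_windows_mount("/home/medperf") False.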

Future explorations

I still don't know exactly why the mounted storage gets locked, which conditions lead to it, or who is responsible (the Windows host or Ubuntu itself). Also, I haven't run into this issue with other projects located on the mounted drive; medperf is the first one that reproduces this behavior. Finally, the nature of the issue makes it extremely hard to find a way to reproduce it reliably: the same commands can pass successfully one time and fail with the error the next.
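Since the failure is intermittent, one way to gather more data is to leave a small watcher running from the repo directory and record the moment its working directory stops resolving; that timestamp can then be compared with what was happening on the Windows host at the time. A minimal sketch, assuming polling once per second is acceptable; cwd_is_stale and watch_cwd are hypothetical names, not medperf tooling.

```python
import os
import time
from datetime import datetime


def cwd_is_stale():
    """True if this process's working directory can no longer be resolved."""
    try:
        os.getcwd()
        return False
    except FileNotFoundError:
        return True


def watch_cwd(interval=1.0, max_checks=None):
    """Poll the working directory; return when it first went stale, or None.

    With max_checks=None the loop runs until the directory goes stale.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if cwd_is_stale():
            return datetime.now()
        checks += 1
        time.sleep(interval)
    return None
```

For example, start a Python session from /mnt/c/Users/vykuk/repos/mlc/medperf in a spare terminal, call watch_cwd(), and note the returned timestamp when the client or server starts failing.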

The same issue may well arise on other systems and setups whenever the medperf repo is located on external storage.

Environment

Host system: Windows 11, 22H2, OS build 22623.891
WSL 1.2.5.0
WSL image ($ uname -r): 5.15.90.1-microsoft-standard-WSL2
Guest system: (lsb_release -a): Ubuntu 22.04.1 LTS


VukW added the type: bug Something isn't working label Nov 17, 2023

hasan7n added the project: Core label Nov 22, 2023
