[Errno 28] No space left on device when trying to install a package with pip inside a VM #192

Open
shadeofblue opened this issue Feb 22, 2024 · 2 comments

Comments

@shadeofblue

steps to reproduce:

  1. start a cluster normally with ray up golem-cluster.dev.yaml
  2. ray attach golem-cluster.dev.yaml
  3. pip install numba
Collecting numba
  Downloading https://pypi.dev.golem.network/packages/73/d5/d359cece32302442c8ea9742b1324c4eda689fd54281eb3144f520c81f6d/numba-0.59.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata
     - 2.7 kB ? 0:00:00
Collecting llvmlite<0.43,>=0.42.0dev0 (from numba)
  Downloading https://pypi.dev.golem.network/packages/2b/01/764489e364948f52aa7cb958a91a8dafd489357d2401f66946542bbc1764/llvmlite-0.42.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
     - 4.8 kB ? 0:00:00
Collecting numpy<1.27,>=1.22 (from numba)
  Downloading https://pypi.dev.golem.network/packages/4b/d7/ecf66c1cd12dc28b4040b15ab4d17b773b87fa9d29ca16125de01adb36cd/numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
     - 61.0 kB 271.3 MB/s 0:00:00
Downloading https://pypi.dev.golem.network/packages/73/d5/d359cece32302442c8ea9742b1324c4eda689fd54281eb3144f520c81f6d/numba-0.59.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
   \ 3.7 MB 25.9 MB/s 0:00:00
Downloading https://pypi.dev.golem.network/packages/2b/01/764489e364948f52aa7cb958a91a8dafd489357d2401f66946542bbc1764/llvmlite-0.42.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   - 43.8 MB 24.4 MB/s 0:00:02
Downloading https://pypi.dev.golem.network/packages/4b/d7/ecf66c1cd12dc28b4040b15ab4d17b773b87fa9d29ca16125de01adb36cd/numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   / 18.2 MB 31.4 MB/s 0:00:00
Installing collected packages: numpy, llvmlite, numba
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

suggested solution: we should change the VM image to use a VOLUME and configure pip to keep at least its temporary files, and preferably all the installed packages too, on an external volume
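A minimal sketch of what the pip side of this could look like, assuming a volume is mounted somewhere in the VM. The mount point `/mnt/volume` and both helper names are hypothetical, not taken from the project. The sketch relies only on standard pip behaviour: pip honours `TMPDIR` for its temporary files, `PIP_CACHE_DIR` for its cache, and `--target` for the install location.

```python
import os
import sys
from pathlib import PurePosixPath


def pip_env_on_volume(volume_root, base_env=None):
    """Return an environment that points pip's temp and cache dirs at the volume.

    volume_root is a hypothetical mount point such as "/mnt/volume".
    """
    root = PurePosixPath(volume_root)
    env = dict(os.environ if base_env is None else base_env)
    env["TMPDIR"] = str(root / "tmp")               # temporary build/unpack files
    env["PIP_CACHE_DIR"] = str(root / "pip-cache")  # downloaded wheels
    return env


def pip_install_command(package, volume_root):
    """Build a pip command that installs the package itself onto the volume."""
    target = PurePosixPath(volume_root) / "site-packages"
    return [sys.executable, "-m", "pip", "install", "--target", str(target), package]
```

To actually run it, the directories have to exist first (Python falls back to another temp directory if `TMPDIR` is missing), and the `--target` directory has to be added to `PYTHONPATH` for the packages to be importable, e.g. `subprocess.run(pip_install_command("numba", "/mnt/volume"), env=pip_env_on_volume("/mnt/volume"), check=True)`.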

@mateuszsrebrny

Let's talk, @shadeofblue. Kamil and I didn't know enough about these VOLUMES.

@shadeofblue

okay, so there are three facets to this issue:

  • when running any payloads on ray on golem, any artifacts written to the current working directory will pretty quickly fill up the remaining space
  • when one installs packages inside the VM, the downloaded package files and then the packages themselves also take up space on the current filesystem, which also means that you cannot install anything that is itself larger than the available space
  • when anything happens to the ray node on the VM, to pull the logs we currently need to log in through SSH; it would be easier to just grab those logs from the machine with regular exescript transfers, but as far as I remember/know, those require the files being transferred to reside on a volume...

so, it all boils down to the fact that we need to rework the images so that they use VOLUMES appropriately to alleviate the above issues...
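To see which of these facets is biting on a given node, a quick check of free space per path can help. This is only a sketch using the standard library's `shutil.disk_usage`; the function names are made up for illustration, and which paths are worth checking depends on the image.

```python
import shutil


def free_space_report(paths):
    """Map each path to its free bytes, or None if the path cannot be inspected."""
    report = {}
    for path in paths:
        try:
            report[path] = shutil.disk_usage(path).free
        except OSError:
            # Missing or inaccessible path, e.g. a volume that is not mounted
            report[path] = None
    return report


def has_room(path, needed_bytes):
    """True if the filesystem holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

For example, `free_space_report([".", "/tmp"])` run from the ray working directory shows whether the working directory and the temp directory share the same nearly full filesystem.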
