
pynvml: supposed to work? #298

Open

FlorianHeigl opened this issue Nov 11, 2024 · 0 comments

FlorianHeigl commented Nov 11, 2024

Hi,

I'm trying to use ZLUDA with tinygrad via CUDA=1 or GPU=1.
I installed the release version of ZLUDA (v3) and am loading my libraries like this:

# export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm/lib:/gpu/zluda"

I've also added a symlink so the library resolves under the version name that is expected:

/opt/rocm/lib# ln -s librocm_smi64.so.7.2.60103 librocm_smi64.so.5

To be honest, I've just been trying to do this as well as possible after many other, more direct approaches had failed, so I'm not sure whether this is even supposed to work!

I can see it never gets to doing any actual work; it fails while detecting device features. I would assume that is a pretty normal CUDA call, but maybe not...
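To isolate this outside of exo, here is a minimal probe (my own sketch, assuming only that the pynvml package is installed) that mirrors the call sequence from the traceback below:

    import pynvml

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as e:
        raise SystemExit(f"nvmlInit failed: {e}")

    count = pynvml.nvmlDeviceGetCount()
    print(f"NVML reports {count} device(s)")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        try:
            # The exact call that raises NVMLError_NotSupported in exo below.
            print(f"device {i} name:", pynvml.nvmlDeviceGetName(handle))
        except pynvml.NVMLError as e:
            print(f"device {i}: nvmlDeviceGetName failed: {e}")
    pynvml.nvmlShutdown()

Here is the full output: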

(.venv) floh@beast-lnx:/gpu/exo$ CUDA=1 exo --inference-engine=tinygrad
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: tinygrad

  _____  _____  
 / _ \ \/ / _ \ 
|  __/>  < (_) |
 \___/_/\_\___/ 
    
Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[52570, 62927, 52223, 53164, 51600, 53762, 59607, 53664, 54527, 52711, 53927, 53528, 55031, 58034, 50307, 55299]
Chat interface started:
 - http://127.0.0.1:8000
 - http://172.17.0.1:8000
 - http://192.168.86.30:8000
 - http://172.18.0.1:8000
ChatGPT API endpoint served at:
 - http://127.0.0.1:8000/v1/chat/completions
 - http://172.17.0.1:8000/v1/chat/completions
 - http://192.168.86.30:8000/v1/chat/completions
 - http://172.18.0.1:8000/v1/chat/completions
Traceback (most recent call last):
  File "/gpu/exo/.venv/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
  File "/gpu/exo/.venv/bin/exo", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/gpu/exo/exo/main.py", line 108, in <module>
    node = StandardNode(
  File "/gpu/exo/exo/orchestration/standard_node.py", line 41, in __init__
    self.device_capabilities = device_capabilities()
  File "/gpu/exo/exo/topology/device_capabilities.py", line 148, in device_capabilities
    return linux_device_capabilities()
  File "/gpu/exo/exo/topology/device_capabilities.py", line 188, in linux_device_capabilities
    gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
  File "/gpu/exo/.venv/lib/python3.10/site-packages/pynvml.py", line 2175, in wrapper
    res = func(*args, **kwargs)
  File "/gpu/exo/.venv/lib/python3.10/site-packages/pynvml.py", line 2472, in nvmlDeviceGetName
    _nvmlCheckReturn(ret)
  File "/gpu/exo/.venv/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
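A possible stopgap, purely as a sketch of a local patch (not exo's actual behavior; the "UNKNOWN" fallback is a placeholder I made up), would be to let device detection degrade gracefully around the failing line in exo/topology/device_capabilities.py:

    try:
        gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
    except pynvml.NVMLError_NotSupported:
        # NVML initialized fine, but this particular query is unimplemented
        # (e.g. by a stub NVML library); fall back to a placeholder name.
        gpu_raw_name = "UNKNOWN"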

Edit: I looked at this again today; the library that is acting up is pynvml.
The question remains why ZLUDA doesn't convince it that everything must surely be fine with this 'NVIDIA device'.
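As far as I can tell, pynvml never goes through libcuda at all; on Linux it dlopens libnvidia-ml.so.1 directly, so whichever NVML library the dynamic loader finds first (the driver's, or anything on LD_LIBRARY_PATH) answers these calls. A quick Linux-only sketch to see which file actually gets mapped:

    import ctypes

    # Open the same soname pynvml opens on Linux, then check the process
    # maps to see which file on disk actually satisfied it.
    ctypes.CDLL("libnvidia-ml.so.1")
    with open("/proc/self/maps") as maps:
        paths = {line.split()[-1] for line in maps if "nvidia-ml" in line}
    print("\n".join(sorted(paths)))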
