
pynvml: supposed to work? #298

Open

FlorianHeigl opened this issue Nov 11, 2024 · 0 comments

FlorianHeigl commented Nov 11, 2024

Hi,

I'm trying to use ZLUDA with tinygrad via CUDA=1 or GPU=1.
I installed the release version of ZLUDA (v3) and am loading my libraries like this:

# export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm/lib:/gpu/zluda"

I've also added a symlink so the library resolves under the version name that is expected:

/opt/rocm/lib# ln -s librocm_smi64.so.7.2.60103 librocm_smi64.so.5

To be honest, I've just been trying to do this as well as possible after many other, more direct approaches had failed, so I'm not sure whether this is even supposed to work!

I can see it never gets to doing any actual work; it fails while detecting device features. I would assume that is a pretty normal CUDA call, but maybe not...
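To isolate this outside of exo, here is a minimal probe (my own sketch, assuming only that the pynvml package is installed) that mirrors the call sequence from the traceback below:

    import pynvml

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as e:
        raise SystemExit(f"nvmlInit failed: {e}")

    count = pynvml.nvmlDeviceGetCount()
    print(f"NVML reports {count} device(s)")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        try:
            # The exact call that raises NVMLError_NotSupported in exo below.
            print(f"device {i} name:", pynvml.nvmlDeviceGetName(handle))
        except pynvml.NVMLError as e:
            print(f"device {i}: nvmlDeviceGetName failed: {e}")
    pynvml.nvmlShutdown()

Here is the full output: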

(.venv) floh@beast-lnx:/gpu/exo$ CUDA=1 exo --inference-engine=tinygrad
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: tinygrad

  _____  _____  
 / _ \ \/ / _ \ 
|  __/>  < (_) |
 \___/_/\_\___/ 
    
Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[52570, 62927, 52223, 53164, 51600, 53762, 59607, 53664, 54527, 52711, 53927, 53528, 55031, 58034, 50307, 55299]
Chat interface started:
 - http://127.0.0.1:8000
 - http://172.17.0.1:8000
 - http://192.168.86.30:8000
 - http://172.18.0.1:8000
ChatGPT API endpoint served at:
 - http://127.0.0.1:8000/v1/chat/completions
 - http://172.17.0.1:8000/v1/chat/completions
 - http://192.168.86.30:8000/v1/chat/completions
 - http://172.18.0.1:8000/v1/chat/completions
Traceback (most recent call last):
  File "/gpu/exo/.venv/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
  File "/gpu/exo/.venv/bin/exo", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/gpu/exo/exo/main.py", line 108, in <module>
    node = StandardNode(
  File "/gpu/exo/exo/orchestration/standard_node.py", line 41, in __init__
    self.device_capabilities = device_capabilities()
  File "/gpu/exo/exo/topology/device_capabilities.py", line 148, in device_capabilities
    return linux_device_capabilities()
  File "/gpu/exo/exo/topology/device_capabilities.py", line 188, in linux_device_capabilities
    gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
  File "/gpu/exo/.venv/lib/python3.10/site-packages/pynvml.py", line 2175, in wrapper
    res = func(*args, **kwargs)
  File "/gpu/exo/.venv/lib/python3.10/site-packages/pynvml.py", line 2472, in nvmlDeviceGetName
    _nvmlCheckReturn(ret)
  File "/gpu/exo/.venv/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
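A possible stopgap, purely as a sketch of a local patch (not exo's actual behavior; the "UNKNOWN" fallback is a placeholder I made up), would be to let device detection degrade gracefully around the failing line in exo/topology/device_capabilities.py:

    try:
        gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
    except pynvml.NVMLError_NotSupported:
        # NVML initialized fine, but this particular query is unimplemented
        # (e.g. by a stub NVML library); fall back to a placeholder name.
        gpu_raw_name = "UNKNOWN"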

Edit: I looked at this again today; the library that is acting up is pynvml.
The question remains why ZLUDA doesn't convince it that everything must surely be fine with this 'NVIDIA device'.
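As far as I can tell, pynvml never goes through libcuda at all; on Linux it dlopens libnvidia-ml.so.1 directly, so whichever NVML library the dynamic loader finds first (the driver's, or anything on LD_LIBRARY_PATH) answers these calls. A quick Linux-only sketch to see which file actually gets mapped:

    import ctypes

    # Open the same soname pynvml opens on Linux, then check the process
    # maps to see which file on disk actually satisfied it.
    ctypes.CDLL("libnvidia-ml.so.1")
    with open("/proc/self/maps") as maps:
        paths = {line.split()[-1] for line in maps if "nvidia-ml" in line}
    print("\n".join(sorted(paths)))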
