Reproducing Steps and Traceback
~/Desktop/Code/text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
2024-11-10 22:26:52.790 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Error when initializing model
Traceback (most recent call last):
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/cli.py", line 373, in <module>
    app()
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/cli.py", line 116, in serve
    server.serve(
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/server.py", line 315, in serve
    asyncio.run(
  File "/opt/conda/envs/tgi/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/opt/conda/envs/tgi/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/opt/conda/envs/tgi/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/opt/conda/envs/tgi/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/opt/conda/envs/tgi/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/opt/conda/envs/tgi/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/server.py", line 268, in serve_inner
    model = get_model_with_lora_adapters(
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/models/__init__.py", line 1358, in get_model_with_lora_adapters
    model = get_model(
  File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/models/__init__.py", line 631, in get_model
    raise RuntimeError(
RuntimeError: ssm models have been deprecated in favor of mamba models, which follow standard HF formats. Check out a list here: https://huggingface.co/models?search=mamba%20-hf
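Separately from the RuntimeError, the earlier "Could not import Flash Attention enabled models" warning reports undefined symbol _ZNK3c105Error4whatEv. That symbol demangles to a PyTorch c10::Error method, which usually indicates the moe_kernels extension was built against a different libtorch ABI than the installed torch. A quick diagnostic sketch (assuming c++filt from binutils is available):

```shell
# Demangle the missing symbol from the moe_kernels import failure.
# The result names a PyTorch C++ symbol, pointing at a torch/extension ABI mismatch.
echo '_ZNK3c105Error4whatEv' | c++filt
# prints: c10::Error::what() const
```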
System Info
2024-11-10T21:20:44.880890Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 97f7a22
Docker label: N/A
nvidia-smi:
Sun Nov 10 21:20:43 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:9E:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:A0:00.0 Off | 0 |
| N/A 25C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:A2:00.0 Off | 0 |
| N/A 27C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:A4:00.0 Off | 0 |
| N/A 27C P8 31W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA L40S On | 00000000:C6:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA L40S On | 00000000:C8:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA L40S On | 00000000:CA:00.0 Off | 0 |
| N/A 29C P8 33W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA L40S On | 00000000:CC:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Reproduction
SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
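The RuntimeError itself points to HF-format mamba checkpoints as the supported replacement for the deprecated ssm format. A sketch of the workaround, assuming the state-spaces/mamba-130m-hf checkpoint from the linked Hub search is the intended equivalent of state-spaces/mamba-130m:

```shell
# Serve the HF-format mamba checkpoint instead of the deprecated ssm-format one.
# Requires a CUDA GPU and a working TGI server environment.
SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m-hf
```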
Expected behavior
The web server starts and serves the model.