
ssm models have been deprecated in favor of mamba models #2739

Open

mokeddembillel opened this issue Nov 10, 2024 · 1 comment · May be fixed by #2740
Comments

@mokeddembillel

System Info

System Specifications

2024-11-10T21:20:44.880890Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 97f7a22
Docker label: N/A
nvidia-smi:
Sun Nov 10 21:20:43 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:9E:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:A0:00.0 Off | 0 |
| N/A 25C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L40S On | 00000000:A2:00.0 Off | 0 |
| N/A 27C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L40S On | 00000000:A4:00.0 Off | 0 |
| N/A 27C P8 31W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA L40S On | 00000000:C6:00.0 Off | 0 |
| N/A 26C P8 32W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA L40S On | 00000000:C8:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA L40S On | 00000000:CA:00.0 Off | 0 |
| N/A 29C P8 33W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA L40S On | 00000000:CC:00.0 Off | 0 |
| N/A 26C P8 30W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Reproducing Steps and Traceback

~/Desktop/Code/text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
2024-11-10 22:26:52.790 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Error when initializing model
Traceback (most recent call last):
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/cli.py", line 373, in <module>
app()
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/envs/tgi/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/cli.py", line 116, in serve
server.serve(
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/opt/conda/envs/tgi/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/opt/conda/envs/tgi/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/opt/conda/envs/tgi/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/opt/conda/envs/tgi/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/opt/conda/envs/tgi/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/opt/conda/envs/tgi/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)

File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/server.py", line 268, in serve_inner
model = get_model_with_lora_adapters(
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/models/__init__.py", line 1358, in get_model_with_lora_adapters
model = get_model(
File "/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/models/__init__.py", line 631, in get_model
raise RuntimeError(
RuntimeError: ssm models have been deprecated in favor of mamba models, which follow standard HF formats. Check out a list here: https://huggingface.co/models?search=mamba%20-hf
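The RuntimeError above is raised by a model-type guard in `get_model` (`models/__init__.py`). A minimal sketch of that guard, for illustration only: the function signature and the pass-through return are assumptions, not TGI's actual dispatch API; only the error message is taken from the traceback.

```python
def get_model(model_type: str) -> str:
    """Hypothetical sketch of TGI's model-type guard in get_model.

    Only the RuntimeError message is taken from the real code; everything
    else here is an illustrative assumption.
    """
    if model_type == "ssm":
        # Legacy state-spaces checkpoints are rejected up front.
        raise RuntimeError(
            "ssm models have been deprecated in favor of mamba models, "
            "which follow standard HF formats. Check out a list here: "
            "https://huggingface.co/models?search=mamba%20-hf"
        )
    # In the real code this would dispatch to the matching model class.
    return model_type
```

With this guard, `state-spaces/mamba-130m` (whose config resolves to the legacy `ssm` type) fails before any weights are loaded, which is why the traceback stops inside `get_model`.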

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m
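Per the error message's own suggestion, pointing `serve` at an HF-format checkpoint should avoid the guard. A sketch of that workaround; `state-spaces/mamba-130m-hf` is an assumed repo id here, so substitute any result from the linked Hub search:

```shell
# Workaround sketch: serve an HF-format mamba port instead of the legacy
# state-spaces checkpoint. The repo id below is an assumption; pick any
# model from https://huggingface.co/models?search=mamba%20-hf
SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130m-hf
```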

Expected behavior

The web server starts and serves the model.

@mokeddembillel
Author

Solved the issue. Will submit a pull request.

@mokeddembillel mokeddembillel linked a pull request Nov 10, 2024 that will close this issue