
Support for Falcon-Mamba-7B #2736

Open
1 of 2 tasks
mokeddembillel opened this issue Nov 10, 2024 · 1 comment
Comments

@mokeddembillel

Model description

Hi, I'm interested in adding support for Falcon-Mamba-7B to TGI. Here are some links for this model:

paper: https://arxiv.org/abs/2410.05355
model: https://huggingface.co/tiiuae/falcon-mamba-7b

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

No response

@mokeddembillel
Author

I'm new to TGI and open source. After working through a lot of bugs in the local installation, I managed to get to this point:
text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python -m torch.distributed.run --nproc_per_node=1 text_generation_server/cli.py serve tiiuae/falcon-7b-instruct

2024-11-10 15:35:51.475 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
  warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.16s/it]
Using experimental prefill chunking = False
Server started at unix:///tmp/text-generation-server-0

After running TGI in dev mode, it gets stuck at Server started at unix:///tmp/text-generation-server-0. I'm not sure what the issue is. Does anyone know how to solve this?

Thanks
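A note on what might be happening (my understanding of TGI's architecture, so please correct me if I'm wrong): the `serve` subcommand only starts one shard's gRPC server on a unix socket and then waits for the router/launcher to connect, so "Server started at unix:///..." may be the expected final log line rather than a hang. A quick way to check that the shard is actually listening is to try connecting to that socket; the sketch below is a minimal, hypothetical check that only assumes the socket path printed in the log:

```python
import socket

# Path taken from the log line "Server started at unix:///tmp/text-generation-server-0".
SOCKET_PATH = "/tmp/text-generation-server-0"

def socket_is_listening(path: str) -> bool:
    """Return True if a unix-domain socket at `path` accepts a connection."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return True
    except OSError:
        # No socket file, or nothing accepting connections on it.
        return False
    finally:
        s.close()

if __name__ == "__main__":
    print(socket_is_listening(SOCKET_PATH))
```

If this prints True, the shard is up and the next step would be pointing the router/launcher at it; if False, the server really did fail to bind.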
