I'm new to TGI and open source. After a lot of bugs with the local installation, I managed to get to this point:

text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python -m torch.distributed.run --nproc_per_node=1 text_generation_server/cli.py serve tiiuae/falcon-7b-instruct
2024-11-10 15:35:51.475 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
  warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.16s/it]
Using experimental prefill chunking = False
Server started at unix:///tmp/text-generation-server-0
After running TGI in dev mode, it gets stuck at "Server started at unix:///tmp/text-generation-server-0" and I'm not sure what the issue is. Does anyone know how to solve this?
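For context: "cli.py serve" only starts the Python shard, which listens on a unix-domain socket and then waits for TGI's Rust router/launcher to connect, so sitting quietly at "Server started at unix:///tmp/text-generation-server-0" is often expected rather than a hang. A minimal sketch (assuming the default socket path; the helper name is illustrative, not part of TGI) to confirm from another terminal that the shard really is up and accepting connections:

```python
import os
import socket

def shard_socket_ready(path="/tmp/text-generation-server-0"):
    """Return True if a process is listening on the shard's unix socket.

    The shard serves gRPC over a unix-domain socket; if connect()
    succeeds, the server is alive and merely waiting for a client
    (normally the TGI router) to attach.
    """
    if not os.path.exists(path):
        return False
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return True
    except OSError:
        # Socket file exists but nothing is accepting on it
        return False
    finally:
        s.close()

if __name__ == "__main__":
    print("shard listening:", shard_socket_ready())
```

If this reports the socket as listening, the next step is to start the launcher/router side so requests actually reach the shard.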
Model description
Hi, I'm interested in adding support for Falcon-Mamba 7B to TGI. Here are some links for this model:
paper: https://arxiv.org/abs/2410.05355
model: https://huggingface.co/tiiuae/falcon-mamba-7b
Open source status
Provide useful links for the implementation
No response