You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Using prefix caching = True
Using Attention = flashinfer
WARNING 11-10 11:16:48 ray_utils.py:46] Failed to import Ray with ModuleNotFoundError("No module named 'ray'"). For distributed inference, please install Ray with pip install ray.
/usr/src/server/text_generation_server/layers/gptq/triton.py:242: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd(cast_inputs=torch.float16)
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
@custom_bwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
@custom_bwd
/opt/conda/lib/python3.11/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
return func(*args, **kwargs)
Using experimental prefill chunking = False
/usr/src/server/text_generation_server/server.py(278)serve_inner()
-> server = aio.server(
(Pdb) c
Server started at unix:///tmp/text-generation-server-0
System Info
Using prefix caching = True
Using Attention = flashinfer
WARNING 11-10 11:16:48 ray_utils.py:46] Failed to import Ray with ModuleNotFoundError("No module named 'ray'"). For distributed inference, please install Ray with
pip install ray
./usr/src/server/text_generation_server/layers/gptq/triton.py:242: FutureWarning:
torch.cuda.amp.custom_fwd(args...)
is deprecated. Please usetorch.amp.custom_fwd(args..., device_type='cuda')
instead.@custom_fwd(cast_inputs=torch.float16)
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning:
torch.cuda.amp.custom_fwd(args...)
is deprecated. Please usetorch.amp.custom_fwd(args..., device_type='cuda')
instead.@custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning:
torch.cuda.amp.custom_bwd(args...)
is deprecated. Please usetorch.amp.custom_bwd(args..., device_type='cuda')
instead.@custom_bwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning:
torch.cuda.amp.custom_fwd(args...)
is deprecated. Please usetorch.amp.custom_fwd(args..., device_type='cuda')
instead.@custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning:
torch.cuda.amp.custom_bwd(args...)
is deprecated. Please usetorch.amp.custom_bwd(args..., device_type='cuda')
instead.@custom_bwd
/opt/conda/lib/python3.11/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
return func(*args, **kwargs)
Using experimental prefill chunking = False
Information
Tasks
Reproduction
SAFETENSORS_FAST_GPU=1 python text_generation_server/cli.py serve state-spaces/mamba-130
Expected behavior
launching webserver
The text was updated successfully, but these errors were encountered: