CPU information (lscpu output):
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7742 64-Core Processor
Stepping: 0
CPU MHz: 2250.000
CPU max MHz: 2250.0000
CPU min MHz: 1500.0000
Rictus changed the title from "Complexe regex or schema can lead the model to run forever on CPU" to "Complexe response format lead the container to run forever on CPU" on Oct 25, 2024.
System Info
System:
Linux 4.18.0-553.22.1.el8_10.x86_64 #1 SMP Wed Sep 25 09:20:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Rocky Linux 8.10
Model:
mistralai/Mistral-Nemo-Instruct-2407
Hardware:
NVIDIA A100-SXM4-80GB
Using the text-generation-inference Docker containers, the issue reproduces with TGI 2.3.1 and 2.2.0.
Reproduction
2. Make a chat completion call with a complex regex in the response format:
JSON schema for the above regex:
Note that the issue also occurs when passing the JSON schema instead of the regex; for various reasons, we compute the regex ourselves using outlines.build_regex_from_schema.
When running the above curl command, TGI accepts the request and starts processing. CPU usage climbs to 99.7% and stays there for a very long time, until I kill the process. Meanwhile, the container no longer accepts new requests and becomes unusable.
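One generic way to keep a runaway CPU-bound step from pinning a core indefinitely is to run it in a separate process with a wall-clock budget and kill it when the budget expires. The sketch below shows that technique with the standard library only; it is not how TGI works internally. `finishes_within` and `SLOW_STEP` are hypothetical names, and the placeholder snippet stands in for whatever expensive work (for example, compiling the regex for a schema) you might want to vet locally before sending a request.

```python
import subprocess
import sys


def finishes_within(snippet: str, timeout_s: float) -> bool:
    """Run `snippet` in a fresh Python interpreter and report whether it
    completes within `timeout_s` seconds.

    subprocess.run() kills the child process and raises TimeoutExpired when
    the budget is exceeded, so a runaway computation cannot pin a CPU forever.
    A snippet that crashes raises CalledProcessError (because check=True).
    """
    try:
        subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout_s,
            check=True,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical placeholder for the expensive step; replace with your own code.
SLOW_STEP = "import time; time.sleep(5)"

if __name__ == "__main__":
    print(finishes_within("pass", timeout_s=10))
    print(finishes_within(SLOW_STEP, timeout_s=0.5))
```

A separate process is used rather than a thread because Python offers no way to forcibly stop a thread stuck in a CPU-bound loop, whereas a child process can always be killed.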
Expected behavior
A timeout that stops processing before it runs for too long.
I'm not sure which kind of timeout is most appropriate, but since no token is ever produced, a time-to-first-token timeout would make sense to me.
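The time-to-first-token semantics requested above can be sketched in plain Python, independent of TGI (nothing here is a TGI option or API): the first item of a token stream must arrive within `timeout_s`, while later items are not time-limited. `with_first_token_timeout` is a hypothetical helper. Note its limitation: it only stops *waiting*; like any client-side timeout, it cannot stop the producer (or a server) from continuing to burn CPU, which is why the limit is wanted inside TGI itself.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_DONE = object()  # sentinel marking the end of the token stream


def with_first_token_timeout(tokens: Iterable[T], timeout_s: float) -> Iterator[T]:
    """Yield items from `tokens`, raising TimeoutError if the FIRST item does
    not arrive within `timeout_s` seconds (the clock starts when iteration
    begins). Later items are not time-limited.

    The producer runs in a daemon thread; on timeout we stop waiting for it,
    but Python cannot forcibly stop that thread.
    """
    q: "queue.Queue[object]" = queue.Queue()

    def produce() -> None:
        try:
            for tok in tokens:
                q.put(tok)
        finally:
            # Always signal completion, even if the producer raised
            # (the exception itself is reported by threading's excepthook).
            q.put(_DONE)

    threading.Thread(target=produce, daemon=True).start()

    try:
        first = q.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(f"no token within {timeout_s} s") from None
    if first is _DONE:
        return
    yield first
    while (item := q.get()) is not _DONE:
        yield item
```

For a streaming client, the iterator of streamed chunks would be wrapped in `with_first_token_timeout(...)` so that a request stuck before its first token is abandoned instead of waited on forever.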
Here is another, more common schema that also triggers the issue: