Everything here was tested against a vLLM deployment of Llama 3 8B Instruct on an A100 80GB.
I was running Guardrails on a C3 machine with 8 CPUs and 16 GB RAM.
I have done some tests, and here is what I found:
The base model responds in hundreds of milliseconds (even if I fetch the vector DB data myself and include it in the query).
NeMo Guardrails (bare bones: no KB, no Colang, nothing else) takes around 3.5 s.
NeMo Guardrails (Qdrant vector DB, no Colang, no input, output, or dialog rails) takes around 10 to 11 s.
Now my issue is: why does it take so much time to generate a response to a query with a vector DB? The total latency of all calls (LLM, vector DB) shouldn't be more than 500 ms, so there shouldn't be such a difference.
Is there something that I am missing?
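To narrow down where the extra seconds go, one option is to time each stage of a request separately and count how many times each stage runs. The following is only a minimal sketch using the Python standard library: `StageTimer`, the stage names, and `fake_call` are hypothetical placeholders, and the `time.sleep` calls stand in for the real embedding, Qdrant, and LLM calls, so the durations are illustrative, not measurements.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time and call counts per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of calls

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Slowest stage first, as (name, calls, total_seconds) tuples.
        names = sorted(self.totals, key=self.totals.get, reverse=True)
        return [(n, self.counts[n], self.totals[n]) for n in names]


def fake_call(seconds):
    # Placeholder for a real call (embedding, vector search, LLM request).
    time.sleep(seconds)


timer = StageTimer()

with timer.stage("embedding"):
    fake_call(0.02)

with timer.stage("vector_search"):
    fake_call(0.01)

# A single user query may trigger more than one LLM call; count them.
for _ in range(3):
    with timer.stage("llm_call"):
        fake_call(0.01)

for name, calls, total in timer.report():
    print(f"{name}: {calls} call(s), {total * 1000:.1f} ms total")
```

If the per-stage totals add up to far less than the end-to-end time, the remainder is being spent outside the LLM and vector DB calls, which is the gap I am trying to explain.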