I've tried both offline batch inference and server inference. With the same dataset and the same model, server inference is more than twice as slow as offline batch inference.
My guess is that the main reason is batching: offline inference batches requests internally, since the default value of max_num_seqs is 256. (Please correct me if my understanding is wrong.)
If I change the offline inference to feed prompts one by one, it also becomes very slow.
However, I don't know how to get the same kind of batching with server inference.
I'm also wondering whether there are other settings that could be slowing the server down. If so, please let me know. I'd be really grateful!
The offline batch inference script looks roughly like this:
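(The model name, prompt file, and sampling settings below are placeholders for my real ones.)

```python
from vllm import LLM, SamplingParams

# Load the model once; with the defaults, max_num_seqs is 256, so vLLM can
# schedule up to 256 sequences per step.
llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Pass the whole dataset in one call so vLLM can batch internally.
prompts = [line.strip() for line in open("prompts.txt")]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```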
The server script:
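(Just the launch command for the OpenAI-compatible API server, with default engine settings; the model and port are placeholders. I haven't changed --max-num-seqs here either.)

```bash
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --port 8000
```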
And the client script:
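(Endpoint, model name, and file paths are placeholders; the relevant part is that requests go out one at a time.)

```python
import requests

API_URL = "http://localhost:8000/v1/completions"  # placeholder host/port
MODEL = "meta-llama/Llama-2-7b-hf"                # placeholder model name

prompts = [line.strip() for line in open("prompts.txt")]

# Each request is sent and waited on before the next one goes out, so the
# server only ever sees one sequence in flight from this client.
for prompt in prompts:
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 256,
        "temperature": 0.0,
    }
    resp = requests.post(API_URL, json=payload, timeout=600)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["text"])
```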