@pefortin messaged to say he's seeing roughly a 5x increase in total output token rate with continuous batching enabled, although it's unclear which tools/models he was using. Nonetheless, it looks like this is now supported in ollama:
ollama/ollama#1396
The work item is to enable this in ollama, then test simultaneous queries to make sure everything works and that throughput actually improves. This will apparently require additional memory to hold the input tokens of all in-flight requests, so memory usage should be monitored during testing.
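A rough sketch of how the simultaneous-query test could look, comparing aggregate output tokens per second with one request at a time against several in flight. Assumptions, not confirmed anywhere in this issue: ollama is listening on its default local port, the model name `llama2` is a placeholder for whatever is actually loaded, and the non-streaming `/api/generate` reply carries an `eval_count` field with the number of generated tokens. `generate` and `aggregate_rate` are hypothetical helper names.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "llama2"  # placeholder; substitute the model under test


def generate(prompt, url=OLLAMA_URL, model=MODEL):
    """Send one non-streaming request; return the number of generated tokens."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # eval_count is assumed to be the output token count; default to 0 if absent
    return reply.get("eval_count", 0)


def aggregate_rate(send, prompts, workers):
    """Run send() over all prompts using `workers` threads.

    Returns (total_tokens, tokens_per_second), measured over wall-clock time
    so that overlapping requests count toward the aggregate rate.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(send, prompts))
    elapsed = time.perf_counter() - start
    total = sum(counts)
    rate = (total / elapsed) if elapsed > 0 else 0.0
    return total, rate
```

Calling `aggregate_rate(generate, prompts, workers=1)` and then again with, say, `workers=8` on the same list of prompts gives a before/after comparison of total output token rate. Memory usage would need to be watched separately while the concurrent run is in progress.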