Fixes #183: Preload models during server initialization to prevent request timeouts #227
base: main
Conversation
Can you update the PR title?
What is the behavior when a client sends a request while the server is still starting up?
Thank you for your feedback! I've updated the PR title as requested. Regarding your question about the behavior when a client sends a request while the server is starting up: the server currently initializes and preloads the models before accepting any incoming requests.
See my comment on #183
We thought long and hard about this and went through a series of changes. Where we landed is the following:
- a user should be loading and running the ollama server before running the llama stack. this is how it works for any other provider (e.g., if you have a remote provider like Fireworks, surely we can't be in the business of taking over their lifecycle.)
- when the stack starts, it consults the provider through a list_models() method to know which models it is running. for ollama, we invoke ollama.client.ps() for this. only these models are registered and available for serving (naturally!) — see the sketch below.
- registering new models dynamically is not permitted right now; we could make that work easily enough (by calling ollama.client.run()), but then it would have to be done by the user explicitly through the models/register llama stack API.
You can see this here: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/ollama/ollama.py#L61-L85
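A minimal sketch of the discovery flow from the second bullet above (purely illustrative: the adapter shape and the response field names are assumptions; the linked ollama.py shows the real implementation):

```python
# Illustrative sketch: discover which models a running Ollama server has
# loaded via ps() and expose only those. Not the actual llama-stack code.
from ollama import AsyncClient


class OllamaDiscoverySketch:
    def __init__(self, url: str = "http://localhost:11434") -> None:
        # The stack talks to an Ollama server the user has already started;
        # it never starts or manages that server itself.
        self.client = AsyncClient(host=url)

    async def list_models(self) -> list[str]:
        # ps() reports the models the Ollama server currently has loaded;
        # only these get registered and become available for serving.
        # (The exact response shape and field names depend on the ollama
        # client version; dict-style access and the "model" key are assumed.)
        response = await self.client.ps()
        return [m["model"] for m in response["models"]]
```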
The most important part of the comment above is: the Llama Stack cannot and will not take responsibility for managing the "operations" or "lifecycle" of a provider (which we consider to be remote.) Preloading is like that. We have introduced …
Hello, thank you for your detailed feedback and for directing me to your comment on #183. I understand that Llama Stack should not manage the operations or lifecycle of providers like Ollama, and that preloading models is considered part of that lifecycle management. Based on your guidance, I've updated the PR accordingly:
- Removed model preloading during server initialization: the adapter no longer attempts to preload models or manage the provider's lifecycle.
- The register_model() method in ollama.py now calls client.pull() so that users can explicitly register and load models via the /models/register API (see the sketch below).
- The adapter assumes that the Ollama server is already running and only interacts with it through its defined APIs.
Thank you for your guidance and support!
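A rough sketch of what the updated register_model() flow could look like (the method shape and parameter names are illustrative, not the exact llama-stack signature):

```python
# Illustrative sketch: pull a model only when the user explicitly registers
# it through the /models/register API, instead of preloading at startup.
async def register_model(self, model_id: str) -> None:
    # self.client is assumed to be an ollama.AsyncClient pointed at the
    # user's already-running server; pull() downloads the model if it is
    # not already present locally.
    await self.client.pull(model_id)
```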
Fix issue #183: Pre-download models during server initialization to prevent HTTP timeouts
This commit moves the model downloading logic from the chat_completion method to the initialize method in OllamaInferenceAdapter. By pre-loading required models during server startup, we ensure that large models (e.g., 16GB) are downloaded before serving requests, thus preventing HTTP request timeouts and aborted downloads during the first inference request.
Closes #183
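For context, a minimal sketch of the pre-download approach this commit message describes (the approach that was later removed per the review above; the adapter shape, model list, and names are assumptions, not llama-stack code):

```python
# Illustrative sketch only: pre-downloading models in initialize() so the
# first chat_completion call doesn't time out waiting on a multi-GB pull.
from ollama import AsyncClient

# Hypothetical list of models to pre-download at startup.
PRELOAD_MODELS = ["llama3.1:8b"]


class OllamaInferenceAdapterSketch:
    def __init__(self, url: str = "http://localhost:11434") -> None:
        self.client = AsyncClient(host=url)

    async def initialize(self) -> None:
        # Download required models up front, before the server starts
        # accepting inference requests.
        for model in PRELOAD_MODELS:
            await self.client.pull(model)
```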