The large size of foundation models raises several resource and cost questions around deploying them in production. This EPIC will focus on creating experiments and showing results around some of the following questions:
What is the relationship between a model's parameter count and its memory consumption? Create a "rosetta stone" document of the GPU memory required by models of different parameter sizes. Create a notebook that captures the GPU memory footprint.
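A possible starting point for the footprint notebook, as a minimal sketch: it assumes a CUDA-capable host with PyTorch and transformers installed, and uses `"gpt2"` as a placeholder model id for whichever model is being profiled.

```python
# Sketch: measure GPU memory used by a loaded model (model id is a placeholder).
import torch
from transformers import AutoModelForCausalLM

model_id = "gpt2"  # swap in the foundation model under test

torch.cuda.reset_peak_memory_stats()
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

allocated_gb = torch.cuda.memory_allocated() / 1024**3   # weights currently resident
peak_gb = torch.cuda.max_memory_allocated() / 1024**3     # peak during the load
n_params = sum(p.numel() for p in model.parameters())

print(f"{n_params / 1e9:.2f}B parameters -> {allocated_gb:.2f} GB allocated, {peak_gb:.2f} GB peak")
```

Repeating this across several parameter sizes would produce the raw numbers for the "rosetta stone" table.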
How are the models loaded into GPU memory? Do they stream directly from S3, or do we also require significant host RAM? If so, capture the RAM requirements in a notebook. What about CPU usage? Update the cost document with RAM and CPU information. Are there ways to optimize this?
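One way to capture the host-RAM side of this in a notebook is to compare process RSS before and after materializing the checkpoint. This is only a sketch and assumes psutil and transformers are installed; the model id is again a placeholder.

```python
# Sketch: host RAM consumed while loading a checkpoint (default path goes through CPU RAM).
import os
import psutil
from transformers import AutoModelForCausalLM

proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model id

rss_after = proc.memory_info().rss
print(f"Host RAM delta during load: {(rss_after - rss_before) / 1024**3:.2f} GB")
```

One optimization worth measuring here is transformers' `low_cpu_mem_usage=True` option, which is meant to avoid holding a second full copy of the weights in RAM during loading.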
What happens when we load the models in a lower-precision format like INT8? How are accuracy, CPU, and memory performance affected? Explain this theoretically and show results in a notebook. Touch on the challenges of using frameworks like bitsandbytes in production.
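For the INT8 experiments, a minimal sketch of 8-bit loading through bitsandbytes via transformers' `BitsAndBytesConfig` could look like the following. It assumes transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available; the model id is a placeholder.

```python
# Sketch: load a model in 8-bit with bitsandbytes and report its memory footprint.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_int8 = AutoModelForCausalLM.from_pretrained(
    "gpt2",                      # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",           # requires accelerate
)

print(f"Footprint: {model_int8.get_memory_footprint() / 1024**3:.2f} GB")
```

Comparing this footprint and the downstream accuracy against the FP16 baseline from the earlier notebook would cover the memory and quality sides of the question.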
Is distributed training and inference across many cheap instances more efficient per dollar than a single instance with a large GPU? If we have just one GPU with 16 GB of memory, how much can be done with it in the LLM space? Design experiments and share the results in a notebook.
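Before running experiments, a back-of-envelope calculation can bound what a single 16 GB GPU can hold at different precisions. The sketch below counts weights only and ignores activations, KV cache, and framework overhead, which shrink the usable budget further in practice.

```python
# Sketch: rough maximum parameter count that fits in 16 GB of GPU memory, weights only.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}
GPU_MEMORY_GB = 16

for precision, nbytes in BYTES_PER_PARAM.items():
    max_params_b = GPU_MEMORY_GB * 1024**3 / nbytes / 1e9
    print(f"{precision:>9}: ~{max_params_b:.1f}B parameters (weights only)")
```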
What are the options for running these models on CPU only? Are there ways to optimize beyond the rule of thumb of roughly 1 GB of GPU memory per 1B parameters at INT8 precision?
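As one baseline for the CPU-only question, the weights can simply be kept on the host and generation run there. This is a minimal sketch assuming only transformers and PyTorch are installed; no GPU is required, the model id is a placeholder, and FP32 is used as the safe default since CPU bfloat16 support varies.

```python
# Sketch: CPU-only inference baseline (model id is a placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)  # stays on CPU

inputs = tokenizer("Foundation models on CPU:", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```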