Foundation models need to be adapted for specific use cases and domains. There are several open questions around how to target different use cases. As part of this epic, we will find answers to the following questions:
How do different LLM variants compare with each other in terms of architecture (input token limits, hidden and attention layers, parameter counts, encoder/decoder variations), licenses, hardware utilization, etc.?
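To make the architecture comparison concrete, one starting point could be pulling each candidate's config from the Hugging Face Hub. A rough sketch; the checkpoint names below are only examples, and the attribute fallbacks cover the naming differences between model families:

```python
# Compare basic architectural facts across candidate checkpoints (examples only).
from transformers import AutoConfig

for name in ["bigscience/bloom-3b", "bigscience/bloom-7b1"]:
    cfg = AutoConfig.from_pretrained(name)
    layers = getattr(cfg, "num_hidden_layers", getattr(cfg, "n_layer", None))
    heads = getattr(cfg, "num_attention_heads", getattr(cfg, "n_head", None))
    print(f"{name}: hidden={cfg.hidden_size}, layers={layers}, heads={heads}, vocab={cfg.vocab_size}")
```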
What is the difference between small FMs (<15B parameters) and large FMs (>50B parameters)?
How does performance compare between few-shot prompting large models and fine-tuning smaller models?
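For the few-shot side of that comparison, a minimal sketch could look like the following, assuming a Hugging Face causal LM such as `bigscience/bloom-3b`; the classification task is purely illustrative:

```python
# Few-shot prompting: task examples go into the prompt, model weights stay frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-3b"  # placeholder; swap for the model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: The battery lasts all day. Sentiment: positive\n"
    "Review: It broke after a week. Sentiment: negative\n"
    "Review: Setup was painless and fast. Sentiment:"
)

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The fine-tuning side of the experiment would instead train the smaller model on the same labelled examples and compare accuracy, latency, and cost.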
Do we need a hierarchy of models for specific tasks? For example, one large base model for text generation and two smaller models, one each for code generation and documentation QA? What's the difference between Bloom 13B and Bloom 3B?
Do smaller models have a smaller context window or token limit, and is that a limitation? How is context used by the models; in other words, how is what the model has learned complemented by the supplied context when generating a response?
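One way to probe this is to check how much retrieved context actually fits in the prompt. A small sketch, where the token limit and the strings are assumptions:

```python
# The model never saw our internal docs during pretraining; the prompt supplies them,
# and the context window caps how much can be supplied.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")  # placeholder model
MAX_CONTEXT_TOKENS = 2048  # assumed limit for the model under test; check per model

question = "What does the 'epic' field mean in our issue tracker?"
retrieved_passage = "An epic groups related issues that contribute to one larger goal."

prompt = f"Context: {retrieved_passage}\n\nQuestion: {question}\nAnswer:"

n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > MAX_CONTEXT_TOKENS:
    # Smaller context windows force trimming or summarising the retrieved text.
    raise ValueError(f"Prompt uses {n_tokens} tokens, over the {MAX_CONTEXT_TOKENS} limit")
print(f"Prompt fits: {n_tokens}/{MAX_CONTEXT_TOKENS} tokens")
```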
What is the relevance of vector databases in these solutions? Are they still relevant when using smaller fine-tuned models with smaller context windows?
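For reference, the piece a vector database would replace is the embedding plus nearest-neighbour search sketched below; this uses an in-memory numpy search and an example sentence-transformers model rather than any particular store:

```python
# Embed documents, then fetch the closest one for a query; a vector DB scales this step.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, commonly used embedder

documents = [
    "Bloom 3B is a multilingual decoder-only model.",
    "Fine-tuning updates model weights on domain data.",
    "Vector stores index embeddings for similarity search.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

query = "Why would I need a vector database?"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]

# Cosine similarity; vectors are normalized, so a dot product suffices.
scores = doc_vectors @ query_vector
best = int(np.argmax(scores))
print(f"Top match ({scores[best]:.2f}): {documents[best]}")
```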
How do these approaches compare on production cost and performance? Design experiments to demonstrate some of these comparisons.
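A first experiment could simply time generation and convert throughput into a cost per 1k tokens; the model name and GPU-hour price below are placeholders:

```python
# Bare-bones latency/throughput measurement for the cost-performance comparison.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-3b"   # placeholder
GPU_HOUR_USD = 1.50                  # assumed instance price; replace with a real quote

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

prompt = "Summarise the warranty terms in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=64)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
tokens_per_sec = new_tokens / elapsed
cost_per_1k_tokens = (GPU_HOUR_USD / 3600) * (1000 / tokens_per_sec)
print(f"{tokens_per_sec:.1f} tok/s, ~${cost_per_1k_tokens:.4f} per 1k generated tokens")
```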
What is the role of datasets in fine-tuning? Does fine-tuning for a domain require a QA-format dataset, or a self-supervised dataset where words in a sentence are masked (recheck)? Can we try BERT-based models, which have a different architecture?
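To clarify the two dataset shapes being contrasted here, a sketch; the field names for the QA/instruction format are an assumed convention, not a standard:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# 1) Supervised QA / instruction format, typical for fine-tuning decoder-only models.
qa_example = {
    "prompt": "Question: What does the warranty cover?\nAnswer:",
    "completion": " Parts and labour for twelve months.",
}

# 2) Self-supervised masked-language-modeling format, used by BERT-style encoders:
# plain domain text, with tokens masked on the fly by the collator.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

encoded = tokenizer("The warranty covers parts and labour.", return_tensors="pt")
batch = collator([{"input_ids": encoded["input_ids"][0]}])
print(batch["input_ids"])  # some tokens replaced by [MASK]; labels hold the originals
```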