Merge pull request #745 from sujee/rag-example6-granite
Updating RAG example to use the IBM Granite model. Missing final test after changes per review.
touma-I authored Oct 31, 2024
2 parents a725112 + 8155bb7 commit aa013d2
Showing 4 changed files with 46 additions and 32 deletions.
2 changes: 1 addition & 1 deletion examples/notebooks/rag/README.md
@@ -76,7 +76,7 @@ REPLICATE_API_TOKEN=your REPLICATE token goes here

### 5.2 - Run the query code

-Code: [rag_1D_query_llama_replicate.ipynb](rag_1D_query_llama_replicate.ipynb)
+Code: [rag_1D_query_replicate.ipynb](rag_1D_query_replicate.ipynb)



6 changes: 4 additions & 2 deletions examples/notebooks/rag/my_config.py
@@ -23,8 +23,10 @@ class MyConfig:
MY_CONFIG.EMBEDDING_LENGTH = 384

## LLM Model
-MY_CONFIG.LLM_MODEL = "meta/meta-llama-3-8b-instruct"
-
+# MY_CONFIG.LLM_MODEL = "meta/meta-llama-3-8b-instruct"
+# MY_CONFIG.LLM_MODEL = "meta/meta-llama-3-70b-instruct"
+# MY_CONFIG.LLM_MODEL = "ibm-granite/granite-3.0-2b-instruct"
+MY_CONFIG.LLM_MODEL = "ibm-granite/granite-3.0-8b-instruct"


## RAY CONFIGURATION
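Note (not part of the diff): the change above only swaps which Replicate model id the notebooks read from MY_CONFIG. Below is a minimal sketch of how that id might be exercised, assuming the replicate Python client is installed, that MY_CONFIG is importable from my_config.py the way the notebooks use it, and that MY_CONFIG.REPLICATE_API_TOKEN holds a valid Replicate token; it is an illustration, not the notebook's exact code.

import os

import replicate  # assumes `pip install replicate`

from my_config import MY_CONFIG  # assumes MY_CONFIG is the module-level config object

# The query notebook exports the token the same way before calling Replicate.
os.environ["REPLICATE_API_TOKEN"] = MY_CONFIG.REPLICATE_API_TOKEN

# For hosted language models, replicate.run returns the completion in chunks;
# joining them yields the full answer.
output = replicate.run(
    MY_CONFIG.LLM_MODEL,  # "ibm-granite/granite-3.0-8b-instruct" by default
    input={"prompt": "In one sentence, what is retrieval-augmented generation?"},
)
print("".join(output))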
8 changes: 4 additions & 4 deletions examples/notebooks/rag/rag_1A_dpk_process_ray.ipynb
@@ -222,7 +222,7 @@
},
{
"cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
"id": "b0cd8ebd-bf71-42d6-a397-8df0c7b66a26",
"metadata": {},
"outputs": [
@@ -303,7 +303,7 @@
" \"data_files_to_use\": ast.literal_eval(\"['.pdf']\"),\n",
" # orchestrator\n",
" \"runtime_worker_options\": ParamsUtils.convert_to_ast(worker_options),\n",
-" \"runtime_num_workers\": MY_CONFIG.RAY_RUNTIME_WORKERS,\n",
+" \"runtime_num_workers\": 1, # so model download to cleanup works properly\n",
" \"runtime_pipeline_id\": \"pipeline_id\",\n",
" \"runtime_job_id\": \"job_id\",\n",
" \"runtime_code_location\": ParamsUtils.convert_to_ast(code_location),\n",
@@ -2159,7 +2159,7 @@
],
"metadata": {
"kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "data-prep-kit-3-py312",
"language": "python",
"name": "python3"
},
@@ -2173,7 +2173,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.11.9"
+"version": "3.12.7"
}
},
"nbformat": 4,
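Note (not part of the diff): pinning runtime_num_workers to 1 is explained by the changed line's own comment ("so model download to cleanup works properly"). Below is a condensed sketch of the launcher parameters that hunk touches, assuming data-prep-kit's ParamsUtils import path and placeholder worker_options / code_location values; the notebook builds its own.

import ast

from data_processing.utils import ParamsUtils  # assumed import path

worker_options = {"num_cpus": 1}  # placeholder; the notebook defines its own
code_location = {"github": "github", "commit_hash": "12345", "path": "path"}  # placeholder

params = {
    "data_files_to_use": ast.literal_eval("['.pdf']"),
    # orchestrator
    "runtime_worker_options": ParamsUtils.convert_to_ast(worker_options),
    "runtime_num_workers": 1,  # single worker so the model download/cleanup happens in one place
    "runtime_pipeline_id": "pipeline_id",
    "runtime_job_id": "job_id",
    "runtime_code_location": ParamsUtils.convert_to_ast(code_location),
}
# The notebook then hands these params to its Ray transform launcher (not reproduced here).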
examples/notebooks/rag/rag_1D_query_replicate.ipynb
@@ -249,33 +249,45 @@
"\n",
"### LLM Choices at Replicate\n",
"\n",
-"- llama 3.1 : Latest\n",
-" - **meta/meta-llama-3.1-405b-instruct** : Meta's flagship 405 billion parameter language model, fine-tuned for chat completions\n",
-"- Base version of llama-3 from meta\n",
-" - [meta/meta-llama-3-8b](https://replicate.com/meta/meta-llama-3-8b) : Base version of Llama 3, an 8 billion parameter language model from Meta.\n",
-" - **meta/meta-llama-3-70b** : 70 billion\n",
-"- Instruct versions of llama-3 from meta, fine tuned for chat completions\n",
-" - **meta/meta-llama-3-8b-instruct** : An 8 billion parameter language model from Meta, \n",
-" - **meta/meta-llama-3-70b-instruct** : 70 billion\n",
+"\n",
+"| Model | Publisher | Params | Description |\n",
+"|-------------------------------------|-----------|--------|------------------------------------------------------|\n",
+"| ibm-granite/granite-3.0-8b-instruct | IBM | 8 B | IBM's newest Granite Model v3.0 (default) |\n",
+"| ibm-granite/granite-3.0-2b-instruct | IBM | 2 B | IBM's newest Granite Model v3.0 |\n",
+"| meta/meta-llama-3.1-405b-instruct | Meta | 405 B | Meta's flagship 405 billion parameter language model |\n",
+"| meta/meta-llama-3-8b-instruct | Meta | 8 B | Meta's 8 billion parameter language model |\n",
+"| meta/meta-llama-3-70b-instruct | Meta | 70 B | Meta's 70 billion parameter language model |\n",
"\n",
"References \n",
"\n",
-"- https://docs.llamaindex.ai/en/stable/examples/llm/llama_2/?h=replicate"
+"- https://www.ibm.com/granite\n",
+"- https://www.llama.com/\n",
+"- https://replicate.com/ "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Using model: ibm-granite/granite-3.0-8b-instruct\n"
+]
+}
+],
"source": [
"import os\n",
-"os.environ[\"REPLICATE_API_TOKEN\"] = MY_CONFIG.REPLICATE_API_TOKEN"
+"os.environ[\"REPLICATE_API_TOKEN\"] = MY_CONFIG.REPLICATE_API_TOKEN\n",
+"\n",
+"print ('Using model:', MY_CONFIG.LLM_MODEL)"
]
},
{
"cell_type": "code",
-"execution_count": 10,
+"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
@@ -335,7 +347,7 @@
},
{
"cell_type": "code",
-"execution_count": 11,
+"execution_count": 10,
"metadata": {},
"outputs": [
{
@@ -351,11 +363,11 @@
"Mayank Mishra ⋆ Matt Stallone ⋆ Gaoyuan Zhang ⋆ Yikang Shen Aditya Prasad Adriana Meza Soria Michele Merler Parameswaran Selvam Saptha Surendran Shivdeep Singh Manish Sethi Xuan-Hong Dang Pengyuan Li Kun-Lung Wu Syed Zawad Andrew Coleman Matthew White Mark Lewis Raju Pavuluri Yan Koyfman Boris Lublinsky Maximilien de Bayser Ibrahim Abdelaziz Kinjal Basu Mayank Agarwal Yi Zhou Chris Johnson Aanchal Goyal Hima Patel Yousaf Shah Petros Zerfos Heiko Ludwig Asim Munawar Maxwell Crouse Pavan Kapanipathi Shweta Salaria Bob Calio Sophia Wen Seetharami Seelam Brian Belgodere Carlos Fonseca Amith Singhee Nirmit Desai David D. Cox Ruchir Puri † Rameswar Panda †\n",
"============ end context ============\n",
"============ here is the answer from LLM... STREAMING... =====\n",
-"Based on the provided context, the training data used to train Granite models is not explicitly mentioned. However, it is mentioned that the 20B model was used after 1.6T tokens to start training of 34B model with the same code pretraining data without any changes to the training and inference framework. This implies that the same code pretraining data was used for both models, but the exact nature of this data is not specified.\n",
+"The context does not provide specific details about the training data used to train the Granite models. It only mentions that the 20B model was trained after 1.6T tokens and then used to start training the 34B model with the same code pretraining data. However, it does not specify what this code pretraining data is.\n",
"====== end LLM answer ======\n",
"\n",
-"CPU times: user 75.3 ms, sys: 37.8 ms, total: 113 ms\n",
-"Wall time: 1.95 s\n"
+"CPU times: user 63.6 ms, sys: 12 ms, total: 75.6 ms\n",
+"Wall time: 1.43 s\n"
]
}
],
@@ -369,7 +381,7 @@
},
{
"cell_type": "code",
-"execution_count": 12,
+"execution_count": 11,
"metadata": {},
"outputs": [
{
@@ -385,11 +397,11 @@
"We are excited about the future of attention-based models and plan to apply them to other tasks. We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video. Making generation less sequential is another research goals of ours.\n",
"============ end context ============\n",
"============ here is the answer from LLM... STREAMING... =====\n",
-"Based on the provided context, an attention mechanism can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum.\n",
+"An attention mechanism is a method used in sequence modeling and transduction models to model dependencies between elements in input or output sequences, regardless of their distance. It maps a query and a set of key-value pairs to an output, which is computed as a weighted sum.\n",
"====== end LLM answer ======\n",
"\n",
-"CPU times: user 41.1 ms, sys: 28.7 ms, total: 69.8 ms\n",
-"Wall time: 1.58 s\n"
+"CPU times: user 30.6 ms, sys: 17.3 ms, total: 47.9 ms\n",
+"Wall time: 880 ms\n"
]
}
],
@@ -403,7 +415,7 @@
},
{
"cell_type": "code",
-"execution_count": 13,
+"execution_count": 12,
"metadata": {},
"outputs": [
{
@@ -419,11 +431,11 @@
"The Granite Code models achieve relatively high accuracy across all sizes (e.g., outperforming CodeGemma at 2B-3B scale, StarCoder2 at 7B-8B scale and CodeLlama models with half of the sizes). This shows that our Granite Code models are not only capable of generating good code but also of using libraries more accurately in real data science workflows.\n",
"============ end context ============\n",
"============ here is the answer from LLM... STREAMING... =====\n",
-"I apologize, but the provided context does not mention the moon landing. The context appears to be about code generation and evaluation benchmarks, specifically discussing the MBPP and MBPP+ benchmarks, and the performance of different code models. There is no mention of the moon landing. If you provide a different context or question, I'll be happy to help.\n",
+"I'm sorry, the provided context does not contain information about the moon landing.\n",
"====== end LLM answer ======\n",
"\n",
-"CPU times: user 41.5 ms, sys: 21 ms, total: 62.5 ms\n",
-"Wall time: 2.13 s\n"
+"CPU times: user 45 ms, sys: 3.19 ms, total: 48.2 ms\n",
+"Wall time: 412 ms\n"
]
}
],
@@ -445,7 +457,7 @@
],
"metadata": {
"kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "data-prep-kit-4-021",
"language": "python",
"name": "python3"
},
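Note (not part of the diff): the streamed answers in the output cells above come from the newly selected Granite model. Below is a minimal sketch of the kind of call that produces such output, assuming the replicate Python client; the notebook itself appears to route the query through llama-index (its old reference pointed at the llama-index Replicate example), which is not reproduced here, and the question and context strings are placeholders.

import os

import replicate  # assumes `pip install replicate`

from my_config import MY_CONFIG  # assumes the config shown in my_config.py above

os.environ["REPLICATE_API_TOKEN"] = MY_CONFIG.REPLICATE_API_TOKEN

question = "What was used to train the Granite models?"  # placeholder
context = "...chunks retrieved from the vector database would go here..."  # placeholder

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

# replicate.run yields the completion in chunks for language models, so the
# answer can be printed as it streams in, much like the notebook's output cells.
for chunk in replicate.run(MY_CONFIG.LLM_MODEL, input={"prompt": prompt}):
    print(chunk, end="", flush=True)
print()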
