demos/rag: Introduce GPU-based Question-Answering
Introduce a new GPU variant of the Question-Answering demo, mirroring its CPU counterpart but leveraging GPUs for both the Embeddings model and the LLM model Inference Services (ISVCs). Using KServe with the Triton Inference Server backend significantly boosts performance. Triton itself supports a range of backends, from simple Python scripts to TensorRT engines. The LLM ISVC runs the Llama 2 7B variant on the TensorRT-LLM backend. The Embeddings model ISVC runs BGE-M3, fine-tuned with data from the EzUA, EzDF, MLDE, and MLDM docs, ensuring optimized response accuracy and speed.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Dimitris Poulopoulos committed Feb 29, 2024
1 parent b7b9e64 · commit 6a77763
Showing 55 changed files with 77,368 additions and 0 deletions.
406 changes: 406 additions & 0 deletions
demos/rag-demos/question-answering-gpu/01.create-vectorstore.ipynb
Large diffs are not rendered by default.
486 changes: 486 additions & 0 deletions
demos/rag-demos/question-answering-gpu/02.serve-vectorstore.ipynb
Large diffs are not rendered by default.
225 changes: 225 additions & 0 deletions
demos/rag-demos/question-answering-gpu/03.document-prediction.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "972db2df-307f-4492-80c6-e84082d778f2",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Invoking and Testing the Vector Store Inference Service (Optional)\n",
    "\n",
    "Welcome to the third part of the tutorial series on building a question-answering application over a corpus of private\n",
    "documents using Large Language Models (LLMs). In the previous Notebooks, you've transformed unstructured text data into\n",
    "structured vector embeddings, stored them in a Vector Store, deployed an Inference Service (ISVC) to serve the Vector Store,\n",
    "and deployed the fine-tuned embeddings model using KServe and Triton.\n",
    "\n",
    "In this Notebook, you focus on invoking the Vector Store ISVC you've created and testing its performance. This\n",
    "is an essential step, as it allows you to verify the functionality of your service and observe how it performs in\n",
    "practice. Throughout this Notebook, you construct suitable requests, communicate with the service, and interpret the\n",
    "responses.\n",
    "\n",
    "By the end of this Notebook, you will have gained practical insights into the workings of the Vector Store ISVC and will be\n",
    "well-prepared to integrate it into a larger system, alongside the LLM ISVC that you create in the subsequent Notebook.\n",
    "\n",
    "## Table of Contents\n",
    "\n",
    "1. [Invoke the Inference Service](#invoke-the-inference-service)\n",
    "1. [Conclusion and Next Steps](#conclusion-and-next-steps)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "428fd850-d35a-476f-ba05-b11763ddec68",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "import getpass\n",
    "import requests\n",
    "import ipywidgets as widgets\n",
    "\n",
    "from IPython.display import display"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1f8bb43-af00-4dae-bf22-dec236bcafe7",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Invoke the Inference Service\n",
    "\n",
    "First, you need to construct the URL you use in the POST request. For this example, you use the V1 inference protocol,\n",
    "described below:\n",
    "\n",
    "| API | Verb | Path | Request Payload | Response Payload |\n",
    "|--------------|------|-------------------------------|-------------------|-----------------------------------|\n",
    "| List Models | GET | /v1/models | | {\"models\": [<model_name>]} |\n",
    "| Model Ready | GET | /v1/models/<model_name> | | {\"name\": <model_name>,\"ready\": $bool} |\n",
    "| Predict | POST | /v1/models/<model_name>:predict | {\"instances\": []}* | {\"predictions\": []} |\n",
    "| Explain | POST | /v1/models/<model_name>:explain | {\"instances\": []}* | {\"predictions\": [], \"explanations\": []} |\n",
    "\n",
    "\\* Payload is optional\n",
    "\n",
    "You want to invoke the `predict` API, so let's use a simple query to test the service. (You also try the `Model Ready`\n",
    "endpoint as a quick health check after constructing the service URL below.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e060904d-a7b8-4835-99d6-f9890b6afbf9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Render a heading for the credentials form\n",
    "heading = widgets.HTML(\"<h2>Credentials</h2>\")\n",
    "display(heading)\n",
    "\n",
    "domain_input = widgets.Text(description='Domain:', placeholder=\"i001ua.tryezmeral.com\")\n",
    "username_input = widgets.Text(description='Username:')\n",
    "password_input = widgets.Password(description='Password:')\n",
    "submit_button = widgets.Button(description='Submit')\n",
    "success_message = widgets.Output()\n",
    "\n",
    "domain = None\n",
    "username = None\n",
    "password = None\n",
    "\n",
    "def submit_button_clicked(b):\n",
    "    global domain, username, password\n",
    "    domain = domain_input.value\n",
    "    username = username_input.value\n",
    "    password = password_input.value\n",
    "    with success_message:\n",
    "        success_message.clear_output()\n",
    "        print(\"Credentials submitted successfully!\")\n",
    "    submit_button.disabled = True\n",
    "\n",
    "submit_button.on_click(submit_button_clicked)\n",
    "\n",
    "# Set margin on the submit button\n",
    "submit_button.layout.margin = '20px 0 20px 0'\n",
    "\n",
    "# Display inputs and button\n",
    "display(domain_input, username_input, password_input, submit_button, success_message)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c546186-ac65-457b-b4fd-bb86767cc3d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "token_url = f\"https://keycloak.{domain}/realms/UA/protocol/openid-connect/token\"\n",
    "\n",
    "data = {\n",
    "    \"username\": username,\n",
    "    \"password\": password,\n",
    "    \"grant_type\": \"password\",\n",
    "    \"client_id\": \"ua-grant\",\n",
    "}\n",
    "\n",
    "# verify=False disables TLS certificate verification, as in the rest of this notebook.\n",
    "token_response = requests.post(token_url, data=data, allow_redirects=True, verify=False)\n",
    "token_response.raise_for_status()  # fail fast if authentication was rejected\n",
    "\n",
    "token = token_response.json()[\"access_token\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "173e2ebd-5e3b-4289-8358-9406ba816921",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "DOMAIN_NAME = \"svc.cluster.local\"\n",
    "NAMESPACE = \"bob\"\n",
    "DEPLOYMENT_NAME = \"vectorstore-predictor\"\n",
    "MODEL_NAME = \"vectorstore\"\n",
    "SVC = f'{DEPLOYMENT_NAME}.{NAMESPACE}.{DOMAIN_NAME}'\n",
    "URL = f\"https://{SVC}/v1/models/{MODEL_NAME}:predict\"\n",
    "\n",
    "print(URL)"
   ]
  },
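  {
   "cell_type": "markdown",
   "id": "5b8c1d2e-7a4f-4e3b-9c60-1f2a3b4c5d6e",
   "metadata": {
    "tags": []
   },
   "source": [
    "Before querying the service, you can optionally confirm that the model reports ready. The next cell is a minimal\n",
    "sketch of the V1 `Model Ready` endpoint from the table above; it reuses the `SVC`, `MODEL_NAME`, and `token`\n",
    "variables defined in the previous cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c9d2e3f-8b50-4f4c-ad71-2a3b4c5d6e7f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Optional health check: GET /v1/models/<model_name> reports model readiness.\n",
    "ready_url = f\"https://{SVC}/v1/models/{MODEL_NAME}\"\n",
    "ready_response = requests.get(\n",
    "    ready_url, headers={\"Authorization\": f\"Bearer {token}\"}, verify=False\n",
    ")\n",
    "print(ready_response.json())  # e.g. {\"name\": \"vectorstore\", \"ready\": true}"
   ]
  },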
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78da091c-9fce-4f91-8382-e5c785bdf24f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "data = {\n",
    "    \"instances\": [{\n",
    "        \"input\": \"How can I get started with HPE Ezmeral Unified Analytics?\",\n",
    "        \"num_docs\": 4  # number of documents to retrieve\n",
    "    }]\n",
    "}\n",
    "\n",
    "headers = {\"Authorization\": f\"Bearer {token}\"}\n",
    "\n",
    "response = requests.post(URL, json=data, headers=headers, verify=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "461fdac2-cacb-40cc-bf2d-d1548072bb90",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "response.raise_for_status()  # surface HTTP errors before parsing\n",
    "result = json.loads(response.text)[\"predictions\"]\n",
    "result"
   ]
  },
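  {
   "cell_type": "markdown",
   "id": "7d0e3f40-9c61-4a5d-be82-3b4c5d6e7f80",
   "metadata": {
    "tags": []
   },
   "source": [
    "To eyeball the retrieved context, enumerate the predictions. This sketch makes no assumption about the exact\n",
    "schema of each entry, since that depends on the Vector Store predictor; it simply prints each item in order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e1f4051-ad72-4b6e-9f93-4c5d6e7f8091",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Print each retrieved item; the entry schema depends on the Vector Store\n",
    "# predictor, so this dumps whatever the service returned.\n",
    "for i, doc in enumerate(result):\n",
    "    print(f\"--- Document {i + 1} ---\")\n",
    "    print(doc)"
   ]
  },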
  {
   "cell_type": "markdown",
   "id": "0e6c9e0e-6d17-4d15-ba4e-e353cc1cd3c2",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Conclusion and Next Steps\n",
    "\n",
    "Well done! Through this Notebook, you've successfully interacted with and tested the Vector Store ISVC. You've learned\n",
    "how to construct and send requests to the service and how to interpret the responses. This hands-on experience is\n",
    "crucial as it provides a practical understanding of the service's operation, preparing you for real-world applications.\n",
    "\n",
    "In the next Notebook, you extend your question-answering system by creating an ISVC for the LLM. The LLM ISVC works in\n",
    "conjunction with the Vector Store ISVC to provide comprehensive and accurate answers to user queries."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}