demos/rag: Introduce GPU-based Question-Answering
Introduce a new GPU variant of the Question-Answering demo, mirroring
its CPU counterpart but leveraging GPUs for both the Embeddings model
and the LLM Inference Services (ISVCs).

Using KServe with the Triton Inference Server backend significantly
boosts performance. Triton itself supports a range of backends, from
simple Python scripts to TensorRT engines.

The LLM ISVC runs the Llama 2 7B variant on the TensorRT-LLM backend.
The Embeddings model ISVC runs BGE-M3, fine-tuned on data from the
EzUA, EzDF, MLDE, and MLDM docs to improve response accuracy and
speed.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Dimitris Poulopoulos committed Feb 29, 2024
1 parent b7b9e64 commit 6a77763
Showing 55 changed files with 77,368 additions and 0 deletions.
406 changes: 406 additions & 0 deletions demos/rag-demos/question-answering-gpu/01.create-vectorstore.ipynb

Large diffs are not rendered by default.

486 changes: 486 additions & 0 deletions demos/rag-demos/question-answering-gpu/02.serve-vectorstore.ipynb

Large diffs are not rendered by default.

225 changes: 225 additions & 0 deletions demos/rag-demos/question-answering-gpu/03.document-prediction.ipynb
@@ -0,0 +1,225 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "972db2df-307f-4492-80c6-e84082d778f2",
"metadata": {
"tags": []
},
"source": [
"# Invoking and Testing the Vector Store Inference Service (Optional)\n",
"\n",
"Welcome to the third part of the tutorial series on building a question-answering application over a corpus of private\n",
"documents using Large Language Models (LLMs). In the previous Notebooks, you've transformed unstructured text data into\n",
"structured vector embeddings, stored them in a Vector Store, deployed an Inference Service (ISVC) to serve the Vector Store,\n",
"and deploy the fine-tuned embeddings model using KServe and Triton.\n",
"\n",
"In this Notebook, you focus on invoking the Vector Store ISVC you've created and testing its performance. This\n",
"is an essential step, as it allows you to verify the functionality of your service and observe how it performs in\n",
"practice. Throughout this Notebook, you construct suitable requests, communicate with the service, and interpret the\n",
"responses.\n",
"\n",
"By the end of this Notebook, you will gain practical insights into the workings of the Vector Store ISVC and will be\n",
"well-prepared to integrate it into a larger system, alongside the LLM ISVC that you create in the subsequent Notebook.\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Invoke the Inference Service](#invoke-the-inference-service)\n",
"1. [Conclusion and Next Steps](#conclusion-and-next-steps)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "428fd850-d35a-476f-ba05-b11763ddec68",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import getpass\n",
"import requests\n",
"import ipywidgets as widgets\n",
"\n",
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"id": "f1f8bb43-af00-4dae-bf22-dec236bcafe7",
"metadata": {
"tags": []
},
"source": [
"# Invoke the Inference Service\n",
"\n",
"First, you need to construct the URL you use in POST request. For this example, you use the V1 inference protocol,\n",
"described below:\n",
"\n",
"| API | Verb | Path | Request Payload | Response Payload |\n",
"|--------------|------|-------------------------------|-------------------|-----------------------------------|\n",
"| List Models | GET | /v1/models | | {\"models\": [<model_name>]} |\n",
"| Model Ready | GET | /v1/models/<model_name> | | {\"name\": <model_name>,\"ready\": $bool} |\n",
"| Predict | POST | /v1/models/<model_name>:predict | {\"instances\": []}* | {\"predictions\": []} |\n",
"| Explain | POST | /v1/models/<model_name>:explain | {\"instances\": []}* | {\"predictions\": [], \"explanations\": []} |\n",
"\n",
"\\* Payload is optional\n",
"\n",
"You want to invoke the `predict` API. So let's use a simple query to test the service:"
]
},
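{
"cell_type": "markdown",
"id": "v1-predict-payload-note",
"metadata": {
"tags": []
},
"source": [
"For reference, the `predict` request body for this Vector Store service follows the `{\"instances\": []}` shape from the table above. An illustrative payload (the values are placeholders) looks like this:\n",
"\n",
"```json\n",
"{\n",
"  \"instances\": [{\"input\": \"<your question>\", \"num_docs\": 4}]\n",
"}\n",
"```\n",
"\n",
"The service responds with a `{\"predictions\": []}` object, which you parse later in this Notebook. Before sending the request, you first need your credentials, an access token, and the service URL:"
]
},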
{
"cell_type": "code",
"execution_count": null,
"id": "e060904d-a7b8-4835-99d6-f9890b6afbf9",
"metadata": {},
"outputs": [],
"source": [
"# Add heading\n",
"heading = widgets.HTML(\"<h2>Credentials</h2>\")\n",
"display(heading)\n",
"\n",
"domain_input = widgets.Text(description='Username:', placeholder=\"i001ua.tryezmeral.com\")\n",
"username_input = widgets.Text(description='Username:')\n",
"password_input = widgets.Password(description='Password:')\n",
"submit_button = widgets.Button(description='Submit')\n",
"success_message = widgets.Output()\n",
"\n",
"domain = None\n",
"username = None\n",
"password = None\n",
"\n",
"def submit_button_clicked(b):\n",
" global domain, username, password\n",
" domain = domain_input.value\n",
" username = username_input.value\n",
" password = password_input.value\n",
" with success_message:\n",
" success_message.clear_output()\n",
" print(\"Credentials submitted successfully!\")\n",
" submit_button.disabled = True\n",
"\n",
"submit_button.on_click(submit_button_clicked)\n",
"\n",
"# Set margin on the submit button\n",
"submit_button.layout.margin = '20px 0 20px 0'\n",
"\n",
"# Display inputs and button\n",
"display(domain_input, username_input, password_input, submit_button, success_message)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c546186-ac65-457b-b4fd-bb86767cc3d4",
"metadata": {},
"outputs": [],
"source": [
"token_url = f\"https://keycloak.{domain}/realms/UA/protocol/openid-connect/token\"\n",
"\n",
"data = {\n",
" \"username\" : username,\n",
" \"password\" : password,\n",
" \"grant_type\" : \"password\",\n",
" \"client_id\" : \"ua-grant\",\n",
"}\n",
"\n",
"token_responce = requests.post(token_url, data=data, allow_redirects=True, verify=False)\n",
"\n",
"token = token_responce.json()[\"access_token\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "173e2ebd-5e3b-4289-8358-9406ba816921",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"DOMAIN_NAME = \"svc.cluster.local\"\n",
"NAMESPACE = \"bob\"\n",
"DEPLOYMENT_NAME = \"vectorstore-predictor\"\n",
"MODEL_NAME = \"vectorstore\"\n",
"SVC = f'{DEPLOYMENT_NAME}.{NAMESPACE}.{DOMAIN_NAME}'\n",
"URL = f\"https://{SVC}/v1/models/{MODEL_NAME}:predict\"\n",
"\n",
"print(URL)"
]
},
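{
"cell_type": "markdown",
"id": "model-ready-check-note",
"metadata": {
"tags": []
},
"source": [
"Optionally, before invoking `predict`, you can verify that the model is loaded and ready using the V1 `Model Ready` API from the table above (`GET /v1/models/<model_name>`). The sketch below simply reuses the token and the service URL components you just defined:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "model-ready-check",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Optional readiness check via the V1 protocol (GET /v1/models/<model_name>).\n",
"# Expects a response like {\"name\": \"vectorstore\", \"ready\": true}.\n",
"ready_url = f\"https://{SVC}/v1/models/{MODEL_NAME}\"\n",
"ready_response = requests.get(ready_url, headers={\"Authorization\": f\"Bearer {token}\"}, verify=False)\n",
"\n",
"print(ready_response.json())"
]
},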
{
"cell_type": "code",
"execution_count": null,
"id": "78da091c-9fce-4f91-8382-e5c785bdf24f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"data = {\n",
" \"instances\": [{\n",
" \"input\": \"How can I get started with HPE Ezmeral Unified Anaytics?\",\n",
" \"num_docs\": 4 # number of documents to retrieve\n",
" }]\n",
"}\n",
"\n",
"headers = {\"Authorization\": f\"Bearer {token}\"}\n",
"\n",
"response = requests.post(URL, json=data, headers=headers, verify=False)"
]
},
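{
"cell_type": "markdown",
"id": "response-status-note",
"metadata": {
"tags": []
},
"source": [
"Before parsing the body, it is worth confirming that the request actually succeeded. A minimal check is to let `requests` raise on any error status (for example, a 401 if the token has expired):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "response-status-check",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Fail fast if the service returned an error status (e.g., 401 for an expired token).\n",
"response.raise_for_status()\n",
"print(response.status_code)"
]
},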
{
"cell_type": "code",
"execution_count": null,
"id": "461fdac2-cacb-40cc-bf2d-d1548072bb90",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"result = json.loads(response.text)[\"predictions\"]; result"
]
},
{
"cell_type": "markdown",
"id": "0e6c9e0e-6d17-4d15-ba4e-e353cc1cd3c2",
"metadata": {
"tags": []
},
"source": [
"# Conclusion and Next Steps\n",
"\n",
"Well done! Through this Notebook, you've successfully interacted with and tested the Vector Store ISVC. You've learned\n",
"how to construct and send requests to the service and how to interpret the responses. This hands-on experience is\n",
"crucial as it provides a practical understanding of the service's operation, preparing you for real-world applications.\n",
"\n",
"In the next Notebook, you extend your question-answering system by creating an ISVC for the LLM. The LLM ISVC works in\n",
"conjunction with the Vector Store ISVC to provide comprehensive and accurate answers to user queries."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}