diff --git a/.github/workflows/publish_release.yml b/.github/workflows/publish_release.yml
index fdb07cb5aae36..62e7457da3760 100644
--- a/.github/workflows/publish_release.yml
+++ b/.github/workflows/publish_release.yml
@@ -14,6 +14,7 @@ env:
jobs:
build-n-publish:
name: Build and publish to PyPI
+ if: github.repository == 'run-llama/llama_index'
runs-on: ubuntu-latest
steps:
diff --git a/.github/workflows/publish_sub_package.yml b/.github/workflows/publish_sub_package.yml
new file mode 100644
index 0000000000000..3dd6ae4bc18cb
--- /dev/null
+++ b/.github/workflows/publish_sub_package.yml
@@ -0,0 +1,43 @@
+name: Publish Sub-Package to PyPI if Needed
+
+on:
+ push:
+ branches:
+ - main
+
+env:
+ POETRY_VERSION: "1.6.1"
+ PYTHON_VERSION: "3.10"
+
+jobs:
+ publish_subpackage_if_needed:
+ if: github.repository == 'run-llama/llama_index'
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v3
+ with:
+ fetch-depth: 0
+ - name: Set up python ${{ env.PYTHON_VERSION }}
+ uses: actions/setup-python@v4
+ with:
+ python-version: ${{ env.PYTHON_VERSION }}
+ - name: Install Poetry
+ uses: snok/install-poetry@v1
+ with:
+ version: ${{ env.POETRY_VERSION }}
+ - name: Get changed pyproject files
+ id: changed-files
+ run: |
+ echo "changed_files=$(git diff --name-only ${{ github.event.before }} ${{ github.event.after }} | grep -v llama-index-core | grep llama-index | grep pyproject | xargs)" >> $GITHUB_OUTPUT
+ - name: Publish changed packages
+ env:
+ PYPI_TOKEN: ${{ secrets.LLAMA_INDEX_PYPI_TOKEN }}
+ run: |
+ for file in ${{ steps.changed-files.outputs.changed_files }}; do
+ cd `echo $file | sed 's/\/pyproject.toml//g'`
+ poetry lock
+ pip install -e .
+ poetry config pypi-token.pypi $PYPI_TOKEN
+ poetry publish --build
+ cd -
+ done
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 275302e0025e8..b2676ba77f4da 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,43 @@
# ChangeLog
+## [0.10.16] - 2024-03-05
+
+### New Features
+
+- Anthropic support for new models (#11623, #11612)
+- Easier creation of chat prompts (#11583)
+- Added a raptor retriever llama-pack (#11527)
+- Improve batch cohere embeddings through bedrock (#11572)
+- Added support for vertex AI embeddings (#11561)
+
+### Bug Fixes / Nits
+
+- Ensure order in async embeddings generation (#11562)
+- Fixed empty metadata for csv reader (#11563)
+- Serializable fix for composable retrievers (#11617)
+- Fixed milvus metadata filter support (#11566)
+- Fixed pydantic import in clickhouse vector store (#11631)
+- Fixed system prompts for gemini/vertex-gemini (#11511)
+
+## [0.10.15] - 2024-03-01
+
+### New Features
+
+- Added FeishuWikiReader (#11491)
+- Added videodb retriever integration (#11463)
+- Added async to opensearch vector store (#11513)
+- New LangFuse one-click callback handler (#11324)
+
+### Bug Fixes / Nits
+
+- Fixed deadlock issue with async chat streaming (#11548)
+- Improved hidden file check in SimpleDirectoryReader (#11496)
+- Fixed null values in document metadata when using SimpleDirectoryReader (#11501)
+- Fix for sqlite utils in jsonalyze query engine (#11519)
+- Added base url and timeout to ollama multimodal LLM (#11526)
+- Updated duplicate handling in query fusion retriever (#11542)
+- Fixed bug in kg index struct updating (#11475)
+
## [0.10.14] - 2024-02-28
### New Features
diff --git a/docs/BUILD b/docs/BUILD
new file mode 100644
index 0000000000000..db46e8d6c978c
--- /dev/null
+++ b/docs/BUILD
@@ -0,0 +1 @@
+python_sources()
diff --git a/docs/community/integrations/uptrain.md b/docs/community/integrations/uptrain.md
index d07e25fcbf1b5..da808ae3db271 100644
--- a/docs/community/integrations/uptrain.md
+++ b/docs/community/integrations/uptrain.md
@@ -1,89 +1,121 @@
# Perform Evaluations on LlamaIndex with UpTrain
-**Overview**: In this example, we will see how to use UpTrain with LlamaIndex.
+**Overview**: In this example, we will see how to use UpTrain with LlamaIndex. UpTrain ([github](https://github.com/uptrain-ai/uptrain) || [website](https://uptrain.ai/) || [docs](https://docs.uptrain.ai/)) is an open-source platform to evaluate and improve GenAI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root cause analysis on failure cases, and gives insights on how to resolve them. More details on UpTrain's evaluations can be found [here](https://github.com/uptrain-ai/uptrain?tab=readme-ov-file#pre-built-evaluations-we-offer-).
-**Problem**: There are two main problems:
+**Problem**: As an increasing number of companies are graduating their LLM prototypes to production-ready applications, their RAG pipelines are also getting complex. Developers are utilising modules like QueryRewrite, Context ReRank, etc., to enhance the accuracy of their RAG systems.
-1. The data that most Large Language Models are trained on is not representative of the data that they are used on. This leads to a mismatch between the training and test distributions, which can lead to poor performance.
-2. The results generated by Large Language Models are not always reliable. The responses might not be relevant to the prompt, not align with the desired tone or the context, or might be offensive etc.
+With increasing complexity come more points of failure.
-**Solution**: The above two problems are solved by two different tools and we will show you how to use them together:
+1. Advanced Evals are needed to evaluate the quality of these newer modules and determine if they actually improve the system's accuracy.
+2. A robust experimentation framework is needed to systematically test different modules and make data-driven decisions.
-1. LlamaIndex solves the first problem by allowing you to perform Retrieval Augmented Generation (RAG) with a retriever that is fine-tuned on your own data. This allows you to use your own data to fine-tune a retriever, and then use that retriever to perform RAG.
-2. UpTrain solves the second problem by allowing you to perform evaluations on the generated responses. This helps you to ensure that the responses are relevant to the prompt, align with the desired tone or the context, and are not offensive etc.
+**Solution**: UpTrain helps to solve for both:
+
+1. UpTrain provides a series of checks to evaluate the quality of the generated response, the retrieved context, and all the interim steps. The relevant checks are ContextRelevance, SubQueryCompleteness, ContextReranking, ContextConciseness, FactualAccuracy, ContextUtilization, ResponseCompleteness, ResponseConciseness, etc.
+2. UpTrain also lets you experiment with different embedding models and provides an `evaluate_experiments` method to compare different RAG configurations.
# How to go about it?
-There two ways you can use UpTrain with LlamaIndex:
+There are two ways you can use UpTrain with LlamaIndex:
-1. **Using the UpTrain Callback Handler**: This method allows you to seamlessly integrate UpTrain with LlamaIndex. You can simply add UpTrainCallbackHandler to your existing LlamaIndex pipeline and it will take care of sending the generated responses to the UpTrain Managed Service for evaluations. This is the recommended method as it is the easiest to use and provides you with dashboards and insights with minimal effort.
+1. **Using the UpTrain Callback Handler**: This method allows you to seamlessly integrate UpTrain with LlamaIndex. You can simply add UpTrainCallbackHandler to your existing LlamaIndex pipeline and it will evaluate all components of your RAG pipeline. This is the recommended method as it is the easiest to use and provides you with dashboards and insights with minimal effort.
2. **Using UpTrain's EvalLlamaIndex**: This method allows you to use UpTrain to perform evaluations on the generated responses. You can use the EvalLlamaIndex object to generate responses for the queries and then perform evaluations on the responses. You can find a detailed tutorial on how to do this below. This method offers more flexibility and control over the evaluations, but requires more effort to set up and use.
# 1. Using the UpTrain Callback Handler
-Three additional evaluations for Llamaindex have been introduced, complementing existing ones. These evaluations run automatically, with results displayed in the output. More details on UpTrain's evaluations can be found [here](https://github.com/uptrain-ai/uptrain?tab=readme-ov-file#pre-built-evaluations-we-offer-).
+Below is how to use the UpTrain Callback Handler to evaluate different components of your RAG pipelines.
+
+## 1. **RAG Query Engine Evaluations**:
+
+The RAG query engine plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:
+
+- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.
+- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.
+- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively.
+
+## 2. **Sub-Question Query Generation Evaluation**:
+
+The SubQuestionQueryGeneration operator decomposes a question into sub-questions, generating responses for each using a RAG query engine. To measure its accuracy, we use:
+
+- **[Sub Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness)**: Assures that the sub-questions accurately and comprehensively cover the original query.
+
+## 3. **Re-Ranking Evaluations**:
+
+Re-ranking involves reordering nodes based on relevance to the query and choosing the top nodes. Different evaluations are performed based on the number of nodes returned after re-ranking.
+
+a. Same Number of Nodes
+
+- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
+
+b. Different Number of Nodes:
+
+- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Examines whether the reduced number of nodes still provides all the required information.
+
+These evaluations collectively ensure the robustness and effectiveness of the RAG query engine, SubQuestionQueryGeneration operator, and the re-ranking process in the LlamaIndex pipeline.
+
+#### **Note:**
-Selected operators from the LlamaIndex pipeline are highlighted for demonstration:
+- We have performed evaluations using a basic RAG query engine; the same evaluations can be performed using the advanced RAG query engine as well.
+- The same is true for re-ranking evaluations: we used SentenceTransformerRerank, but the same evaluations can be performed with other re-rankers as well.
## 1. **RAG Query Engine Evaluations**:
The RAG query engine plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:
-- **Context Relevance**: Determines if the context extracted from the query is relevant to the response.
-- **Factual Accuracy**: Assesses if the LLM is hallcuinating or providing incorrect information.
-- **Response Completeness**: Checks if the response contains all the information requested by the query.
+- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.
+- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.
+- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively.
## 2. **Sub-Question Query Generation Evaluation**:
-The SubQuestionQueryGeneration operator decomposes a question into sub-questions, generating responses for each using a RAG query engine. Given the complexity, we include the previous evaluations and add:
+The SubQuestionQueryGeneration operator decomposes a question into sub-questions, generating responses for each using a RAG query engine. To measure its accuracy, we use:
-- **Sub Query Completeness**: Assures that the sub-questions accurately and comprehensively cover the original query.
+- **[Sub Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness)**: Assures that the sub-questions accurately and comprehensively cover the original query.
## 3. **Re-Ranking Evaluations**:
-Re-ranking involves reordering nodes based on relevance to the query and choosing top n nodes. Different evaluations are performed based on the number of nodes returned after re-ranking.
+Re-ranking involves reordering nodes based on relevance to the query and choosing the top nodes. Different evaluations are performed based on the number of nodes returned after re-ranking.
a. Same Number of Nodes
-- **Context Reranking**: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
+- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
b. Different Number of Nodes:
-- **Context Conciseness**: Examines whether the reduced number of nodes still provides all the required information.
+- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Examines whether the reduced number of nodes still provides all the required information.
These evaluations collectively ensure the robustness and effectiveness of the RAG query engine, SubQuestionQueryGeneration operator, and the re-ranking process in the LlamaIndex pipeline.
#### **Note:**
- We have performed evaluations using basic RAG query engine, the same evaluations can be performed using the advanced RAG query engine as well.
-- Same is true for Re-Ranking evaluations, we have performed evaluations using CohereRerank, the same evaluations can be performed using other re-rankers as well.
+- The same is true for re-ranking evaluations: we used SentenceTransformerRerank, but the same evaluations can be performed with other re-rankers as well.
## Install Dependencies and Import Libraries
Install notebook dependencies.
```bash
-pip install -q html2text llama-index pandas tqdm uptrain cohere
+%pip install llama-index-readers-web
+%pip install llama-index-callbacks-uptrain
+%pip install -q html2text llama-index pandas tqdm uptrain torch sentence-transformers
```
Import libraries.
```python
-from llama_index import (
- ServiceContext,
- VectorStoreIndex,
-)
-from llama_index.node_parser import SentenceSplitter
-from llama_index.readers import SimpleWebPageReader
-from llama_index.callbacks import CallbackManager, UpTrainCallbackHandler
-from llama_index.postprocessor.cohere_rerank import CohereRerank
-from llama_index.service_context import set_global_service_context
-from llama_index.query_engine.sub_question_query_engine import (
- SubQuestionQueryEngine,
-)
-from llama_index.tools.query_engine import QueryEngineTool
-from llama_index.tools.types import ToolMetadata
+from llama_index.core import Settings, VectorStoreIndex
+from llama_index.core.node_parser import SentenceSplitter
+from llama_index.readers.web import SimpleWebPageReader
+from llama_index.core.callbacks import CallbackManager
+from llama_index.callbacks.uptrain.base import UpTrainCallbackHandler
+from llama_index.core.query_engine import SubQuestionQueryEngine
+from llama_index.core.tools import QueryEngineTool, ToolMetadata
+from llama_index.core.postprocessor import SentenceTransformerRerank
+from llama_index.llms.openai import OpenAI
+
+import os
```
## Setup
@@ -123,16 +155,17 @@ Parameters:
**Note:** The `project_name_prefix` will be used as prefix for the project names in the UpTrain dashboard. These will be different for different types of evals. For example, if you set project_name_prefix="llama" and perform the sub_question evaluation, the project name will be "llama_sub_question_answering".
```python
+os.environ[
+ "OPENAI_API_KEY"
+] = "sk-***********" # Replace with your OpenAI API key
+
callback_handler = UpTrainCallbackHandler(
key_type="openai",
- api_key="sk-******************************",
+ api_key=os.environ["OPENAI_API_KEY"],
project_name_prefix="llama",
)
-callback_manager = CallbackManager([callback_handler])
-service_context = ServiceContext.from_defaults(
- callback_manager=callback_manager
-)
-set_global_service_context(service_context)
+
+Settings.callback_manager = CallbackManager([callback_handler])
```
## Load and Parse Documents
@@ -158,13 +191,13 @@ nodes = parser.get_nodes_from_documents(documents)
UpTrain callback handler will automatically capture the query, context and response once generated and will run the following three evaluations _(Graded from 0 to 1)_ on the response:
-- **Context Relevance**: Check if the context extractedfrom the query is relevant to the response.
-- **Factual Accuracy**: Check how factually accurate the response is.
-- **Response Completeness**: Check if the response contains all the information that the query is asking for.
+- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.
+- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.
+- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively.
```python
index = VectorStoreIndex.from_documents(
- documents, service_context=service_context
+ documents,
)
query_engine = index.as_query_engine()
@@ -181,55 +214,66 @@ for query in queries:
```
Question: What did Paul Graham do growing up?
+ Response: Paul Graham wrote short stories and started programming on the IBM 1401 in 9th grade using an early version of Fortran. Later, he convinced his father to buy a TRS-80, where he wrote simple games, a program to predict rocket heights, and a word processor.
+
Context Relevance Score: 0.0
Factual Accuracy Score: 1.0
- Response Completeness Score: 0.0
+ Response Completeness Score: 1.0
Question: When and how did Paul Graham's mother die?
+ Response: Paul Graham's mother died when he was 18 years old, from a brain tumor.
+
Context Relevance Score: 0.0
- Factual Accuracy Score: 1.0
- Response Completeness Score: 0.0
+ Factual Accuracy Score: 0.0
+ Response Completeness Score: 1.0
Question: What, in Paul Graham's opinion, is the most distinctive thing about YC?
- Context Relevance Score: 1.0
- Factual Accuracy Score: 1.0
+ Response: The most distinctive thing about Y Combinator, according to Paul Graham, is that instead of deciding for himself what to work on, the problems come to him. Every 6 months, a new batch of startups brings their problems, which then become the focus of YC's work.
+
+ Context Relevance Score: 0.0
+ Factual Accuracy Score: 0.5
Response Completeness Score: 1.0
Question: When and how did Paul Graham meet Jessica Livingston?
+ Response: Paul Graham met Jessica Livingston at a big party at his house in October 2003.
+
Context Relevance Score: 1.0
- Factual Accuracy Score: 1.0
- Response Completeness Score: 0.5
+ Factual Accuracy Score: 0.5
+ Response Completeness Score: 1.0
Question: What is Bel, and when and where was it written?
+ Response: Bel is a new Lisp that was written in Arc. It was developed over a period of 4 years, from March 26, 2015 to October 12, 2019. Most of the work on Bel was done in England, where the author had moved to in the summer of 2016.
+
Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
- Response Completeness Score: 0.0
+ Response Completeness Score: 1.0
Here's an example of the dashboard showing how you can filter and drill down to the failing cases and get insights on the failing cases:
![image-2.png](https://uptrain-assets.s3.ap-south-1.amazonaws.com/images/llamaindex/image-2.png)
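
The same kind of filtering can also be tried locally on the scores printed above. The snippet below is only a sketch: the dictionary keys, the `failing_checks` helper, and the threshold of 1.0 are assumptions made for illustration and are not part of the UpTrain API.

```python
# Scores copied from the run shown above; the dictionary keys are hypothetical names.
results = [
    {"question": "What did Paul Graham do growing up?",
     "context_relevance": 0.0, "factual_accuracy": 1.0, "response_completeness": 1.0},
    {"question": "When and how did Paul Graham's mother die?",
     "context_relevance": 0.0, "factual_accuracy": 0.0, "response_completeness": 1.0},
    {"question": "What, in Paul Graham's opinion, is the most distinctive thing about YC?",
     "context_relevance": 0.0, "factual_accuracy": 0.5, "response_completeness": 1.0},
    {"question": "When and how did Paul Graham meet Jessica Livingston?",
     "context_relevance": 1.0, "factual_accuracy": 0.5, "response_completeness": 1.0},
    {"question": "What is Bel, and when and where was it written?",
     "context_relevance": 1.0, "factual_accuracy": 1.0, "response_completeness": 1.0},
]

SCORE_KEYS = ("context_relevance", "factual_accuracy", "response_completeness")
THRESHOLD = 1.0  # assumption: any score below 1.0 counts as a failing check


def failing_checks(row, threshold=THRESHOLD):
    """Return the names of the checks whose score is below the threshold."""
    return [key for key in SCORE_KEYS if row[key] < threshold]


# Keep only the questions with at least one failing check.
failing = {
    row["question"]: failing_checks(row) for row in results if failing_checks(row)
}

for question, checks in failing.items():
    print(f"{question} -> {', '.join(checks)}")
```

Grouping by failing check rather than by question is an equally reasonable design; the point is only that the per-question scores are plain numbers that can be filtered like any other data.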
# 2. Sub-Question Query Engine Evaluation
-The **sub question query engine** is used to tackle the problem of answering a complex query using multiple data sources. It first breaks down the complex query into sub questions for each relevant data source, then gather all the intermediate responses and synthesizes a final response.
+The **sub-question query engine** is used to tackle the problem of answering a complex query using multiple data sources. It first breaks down the complex query into sub-questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response.
UpTrain callback handler will automatically capture the sub-question and the responses for each of them once generated and will run the following three evaluations _(Graded from 0 to 1)_ on the response:
-- **Context Relevance**: Check if the context extractedfrom the query is relevant to the response.
-- **Factual Accuracy**: Check how factually accurate the response is.
-- **Response Completeness**: Check if the response contains all the information that the query is asking for.
+- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.
+- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.
+- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively.
In addition to the above evaluations, the callback handler will also run the following evaluation:
-- **Sub Query Completeness**: Checks if the sub-questions accurately and completely cover the original query.
+- **[Sub Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness)**: Assures that the sub-questions accurately and comprehensively cover the original query.
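
The decompose, answer, and synthesize flow described at the start of this section can be sketched in plain Python. This is a simplified illustration, not the `SubQuestionQueryEngine` implementation: `run_sub_question_flow` and the three stand-in callables are hypothetical, and in the real engine an LLM and the query engine tools do this work.

```python
def run_sub_question_flow(query, decompose, answer, synthesize):
    """Decompose the query, answer each sub-question, then combine the answers."""
    sub_questions = decompose(query)
    intermediate = [(sub_q, answer(sub_q)) for sub_q in sub_questions]
    return synthesize(query, intermediate)


# Stand-in callables (hypothetical) so the sketch runs without an LLM.
def decompose(query):
    return [
        f"What did Paul Graham work on {phase} Y Combinator?"
        for phase in ("before", "during", "after")
    ]


def answer(sub_question):
    return f"<answer to: {sub_question}>"


def synthesize(query, intermediate):
    # Join the intermediate answers; a real synthesizer would summarize them.
    return " ".join(ans for _, ans in intermediate)


final = run_sub_question_flow(
    "How was Paul Grahams life different before, during, and after YC?",
    decompose,
    answer,
    synthesize,
)
print(final)
```

Each `(sub_question, answer)` pair in `intermediate` corresponds to one row that the callback handler scores, while Sub Query Completeness looks at the list returned by `decompose` as a whole.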
```python
# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
- documents=documents, use_async=True, service_context=service_context
+ documents=documents,
+ use_async=True,
).as_query_engine()
query_engine_tools = [
@@ -244,7 +288,6 @@ query_engine_tools = [
query_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
- service_context=service_context,
use_async=True,
)
@@ -253,22 +296,38 @@ response = query_engine.query(
)
```
- Question: What did Paul Graham work on during YC?
- Context Relevance Score: 0.5
+ Generated 3 sub questions.
+ [documents] Q: What did Paul Graham work on before Y Combinator?
+ [documents] Q: What did Paul Graham work on during Y Combinator?
+ [documents] Q: What did Paul Graham work on after Y Combinator?
+ [documents] A: Paul Graham worked on a project with Robert and Trevor after Y Combinator.
+ [documents] A: Paul Graham worked on projects with his colleagues Robert and Trevor before Y Combinator.
+ [documents] A: Paul Graham worked on writing essays and working on Y Combinator during his time at Y Combinator.
+
+
+
+ Question: What did Paul Graham work on after Y Combinator?
+ Response: Paul Graham worked on a project with Robert and Trevor after Y Combinator.
+
+ Context Relevance Score: 0.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5
- Question: What did Paul Graham work on after YC?
- Context Relevance Score: 0.5
+ Question: What did Paul Graham work on before Y Combinator?
+ Response: Paul Graham worked on projects with his colleagues Robert and Trevor before Y Combinator.
+
+ Context Relevance Score: 0.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5
- Question: What did Paul Graham work on before YC?
- Context Relevance Score: 1.0
- Factual Accuracy Score: 1.0
- Response Completeness Score: 0.0
+ Question: What did Paul Graham work on during Y Combinator?
+ Response: Paul Graham worked on writing essays and working on Y Combinator during his time at Y Combinator.
+
+ Context Relevance Score: 0.0
+ Factual Accuracy Score: 0.5
+ Response Completeness Score: 0.5
Question: How was Paul Grahams life different before, during, and after YC?
@@ -280,7 +339,7 @@ Here's an example of the dashboard visualizing the scores of the sub-questions i
# 3. Re-ranking
-Re-ranking is the process of reordering the nodes based on their relevance to the query. There are multiple classes of re-ranking algorithms offered by Llamaindex. We have used CohereRerank for this example.
+Re-ranking is the process of reordering the nodes based on their relevance to the query. There are multiple classes of re-ranking algorithms offered by LlamaIndex. We have used SentenceTransformerRerank for this example.
The re-ranker allows you to enter the number of top n nodes that will be returned after re-ranking. If this value remains the same as the original number of nodes, the re-ranker will only re-rank the nodes and not change the number of nodes. Otherwise, it will re-rank the nodes and return the top n nodes.
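
The effect of the top n setting can be sketched in plain Python. This is a minimal illustration, not LlamaIndex's implementation: `rerank` and `evaluation_for` are hypothetical helpers, and the scores are made-up numbers standing in for a re-ranker's relevance scores.

```python
def rerank(nodes, scores, top_n):
    """Order nodes by descending score and keep the first top_n (hypothetical helper)."""
    ranked = sorted(zip(nodes, scores), key=lambda pair: pair[1], reverse=True)
    return [node for node, _ in ranked[:top_n]]


def evaluation_for(num_before, num_after):
    """Name the UpTrain check this guide applies for a given pair of node counts."""
    if num_after == num_before:
        return "Context Reranking"
    return "Context Conciseness"


nodes = ["a", "b", "c"]
scores = [0.2, 0.9, 0.5]  # made-up relevance scores

same = rerank(nodes, scores, top_n=3)   # reordered only, count unchanged
fewer = rerank(nodes, scores, top_n=2)  # reordered and truncated

print(same, evaluation_for(len(nodes), len(same)))
print(fewer, evaluation_for(len(nodes), len(fewer)))
```

With `top_n` equal to the number of retrieved nodes only the order changes, so the ordering itself is what gets evaluated; with a smaller `top_n` some nodes are dropped, so the question becomes whether the remaining ones still carry the required information.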
@@ -290,22 +349,28 @@ We will perform different evaluations based on the number of nodes returned afte
If the number of nodes returned after re-ranking is the same as the original number of nodes, the following evaluation will be performed:
-- **Context Reranking**: Check if the order of the re-ranked nodes is more relevant to the query than the original order.
+- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
```python
-api_key = "**********************************" # Insert cohere API key here
-cohere_rerank = CohereRerank(
- api_key=api_key, top_n=5
-) # In this example, the number of nodes before re-ranking is 5 and after re-ranking is also 5.
+callback_handler = UpTrainCallbackHandler(
+ key_type="openai",
+ api_key=os.environ["OPENAI_API_KEY"],
+ project_name_prefix="llama",
+)
+Settings.callback_manager = CallbackManager([callback_handler])
+
+rerank_postprocessor = SentenceTransformerRerank(
+ top_n=3, # number of nodes after reranking
+ keep_retrieval_score=True,
+)
index = VectorStoreIndex.from_documents(
- documents=documents, service_context=service_context
+ documents=documents,
)
query_engine = index.as_query_engine(
- similarity_top_k=10,
- node_postprocessors=[cohere_rerank],
- service_context=service_context,
+ similarity_top_k=3, # number of nodes before reranking
+ node_postprocessors=[rerank_postprocessor],
)
response = query_engine.query(
@@ -316,25 +381,39 @@ response = query_engine.query(
Question: What did Sam Altman do in this essay?
Context Reranking Score: 0.0
+
+ Question: What did Sam Altman do in this essay?
+ Response: Sam Altman was asked to become the president of Y Combinator after the original founders decided to step back and reorganize the company for long-term sustainability.
+
+ Context Relevance Score: 1.0
+ Factual Accuracy Score: 1.0
+ Response Completeness Score: 0.5
+
# 3b. Re-ranking (With different number of nodes)
If the number of nodes returned after re-ranking is less than the original number of nodes, the following evaluation will be performed:
-- **Context Conciseness**: If the re-ranked nodes are able to provide all the information required by the query.
+- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Examines whether the reduced number of nodes still provides all the required information.
```python
-api_key = "**********************************" # insert cohere API key here
-cohere_rerank = CohereRerank(
- api_key=api_key, top_n=2
-) # In this example, the number of nodes before re-ranking is 5 and after re-ranking is 2.
+callback_handler = UpTrainCallbackHandler(
+ key_type="openai",
+ api_key=os.environ["OPENAI_API_KEY"],
+ project_name_prefix="llama",
+)
+Settings.callback_manager = CallbackManager([callback_handler])
+
+rerank_postprocessor = SentenceTransformerRerank(
+ top_n=2, # Number of nodes after re-ranking
+ keep_retrieval_score=True,
+)
index = VectorStoreIndex.from_documents(
- documents=documents, service_context=service_context
+ documents=documents,
)
query_engine = index.as_query_engine(
- similarity_top_k=10,
- node_postprocessors=[cohere_rerank],
- service_context=service_context,
+ similarity_top_k=5, # Number of nodes before re-ranking
+ node_postprocessors=[rerank_postprocessor],
)
# Use your advanced RAG
@@ -344,18 +423,20 @@ response = query_engine.query(
```
Question: What did Sam Altman do in this essay?
- Context Conciseness Score: 1.0
+ Context Conciseness Score: 0.0
-# UpTrain's Managed Service Dashboard and Insights
-The UpTrain Managed Service offers the following features:
+ Question: What did Sam Altman do in this essay?
+ Response: Sam Altman offered unsolicited advice to the author during a visit to California for interviews.
+
+
+ Context Relevance Score: 1.0
+ Factual Accuracy Score: 1.0
+ Response Completeness Score: 0.5
-1. Advanced dashboards with drill-down and filtering options.
-1. Identification of insights and common themes among unsuccessful cases.
-1. Real-time observability and monitoring of production data.
-1. Integration with CI/CD pipelines for seamless regression testing.
+# UpTrain's Managed Service Dashboard and Insights
-To define the UpTrain callback handler, the only change required is to set the `key_type` and `api_key` parameters. The rest of the code remains the same.
+To use UpTrain's managed service via the UpTrain callback handler, the only change required is to set the `key_type` and `api_key` parameters. The rest of the code remains the same.
```python
callback_handler = UpTrainCallbackHandler(
@@ -380,12 +461,13 @@ pip install uptrain llama_index
## Import required libraries
```python
+import httpx
import os
import openai
import pandas as pd
-from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
-from uptrain import Evals, EvalLlamaIndex, Settings
+from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
+from uptrain import Evals, EvalLlamaIndex, Settings as UpTrainSettings
```
## Create the dataset folder for the query engine
@@ -399,8 +481,6 @@ if not os.path.exists("nyc_wikipedia"):
dataset_path = os.path.join("./nyc_wikipedia", "nyc_text.txt")
if not os.path.exists(dataset_path):
- import httpx
-
r = httpx.get(url)
with open(dataset_path, "wb") as f:
f.write(r.content)
@@ -436,8 +516,6 @@ openai.api_key = "sk-************************" # your OpenAI API key
Let's create a vector store index using LlamaIndex and then use that as a query engine to retrieve relevant sections from the documentation.
```python
-from llama_index.core import Settings
-
Settings.chunk_size = 512
documents = SimpleDirectoryReader("./nyc_wikipedia/").load_data()
@@ -452,7 +530,7 @@ query_engine = vector_index.as_query_engine()
# Alternative 1: Evaluate using UpTrain's Open-Source Software (OSS)
```python
-settings = Settings(
+settings = UpTrainSettings(
openai_api_key=openai.api_key,
)
```
@@ -502,7 +580,7 @@ You can create a free UpTrain account [here](https://uptrain.ai/) and get free t
UPTRAIN_API_KEY = "up-**********************" # your UpTrain API key
# We use `uptrain_access_token` parameter instead of 'openai_api_key' in settings in this case
-settings = Settings(
+settings = UpTrainSettings(
uptrain_access_token=UPTRAIN_API_KEY,
)
```
diff --git a/docs/cookbooks/mixedbread_reranker.ipynb b/docs/cookbooks/mixedbread_reranker.ipynb
new file mode 100644
index 0000000000000..1f95a31e234ba
--- /dev/null
+++ b/docs/cookbooks/mixedbread_reranker.ipynb
@@ -0,0 +1,280 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "964030f7-40e4-4398-a5ab-668aabcf3bad",
+ "metadata": {},
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "360313ab-9393-430e-9647-e0d5545809b9",
+ "metadata": {},
+ "source": [
+ "# mixedbread Rerank Cookbook\n",
+ "\n",
+ "mixedbread.ai has released three fully open-source reranker models under the Apache 2.0 license. For more in-depth information, you can check out their detailed [blog post](https://www.mixedbread.ai/blog/mxbai-rerank-v1). The following are the three models:\n",
+ "\n",
+ "1. `mxbai-rerank-xsmall-v1`\n",
+ "2. `mxbai-rerank-base-v1`\n",
+ "3. `mxbai-rerank-large-v1`\n",
+ "\n",
+ "In this notebook, we'll demonstrate how to use the `mxbai-rerank-base-v1` model with the `SentenceTransformerRerank` module in LlamaIndex. This setup allows you to seamlessly swap in any reranker model of your choice using the `SentenceTransformerRerank` module to enhance your RAG pipeline."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "856ecfdc-04fa-4fe9-a81c-9a5858cd4a6d",
+ "metadata": {},
+ "source": [
+ "### Installation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bfb5314f-e6c7-409c-86df-8e1a5ca59adb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install llama-index\n",
+ "!pip install sentence-transformers"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "5f5393fb-b410-4769-9380-0ef90a33b82e",
+ "metadata": {},
+ "source": [
+ "### Set API Keys"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a9782acf-b0ab-4933-bb41-27cd2a02b5dd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ[\"OPENAI_API_KEY\"] = \"YOUR OPENAI API KEY\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b7596ddf-e1de-4098-81f3-fce504d2da94",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.core import (\n",
+ " VectorStoreIndex,\n",
+ " SimpleDirectoryReader,\n",
+ ")\n",
+ "\n",
+ "from llama_index.core.postprocessor import SentenceTransformerRerank"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "8011ff9c-2b82-47b4-983f-4fafc29e3127",
+ "metadata": {},
+ "source": [
+ "### Download Data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6dd335cb-900b-462f-987a-d4af2aac88fa",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--2024-03-01 09:52:09-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
+ "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...\n",
+ "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 75042 (73K) [text/plain]\n",
+ "Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
+ "\n",
+ "data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.007s \n",
+ "\n",
+ "2024-03-01 09:52:09 (9.86 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "!mkdir -p 'data/paul_graham/'\n",
+ "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "e482b09c-a0df-4788-a75b-a33ade7001d1",
+ "metadata": {},
+ "source": [
+ "### Load Documents"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "342c91b8-301f-40ed-9d09-9acdb1bbdc44",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "8afdfeb1-57ae-4d2b-ae73-683db205be32",
+ "metadata": {},
+ "source": [
+ "### Build Index"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "47c335e9-dd4d-475c-bade-e2a588e33294",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "index = VectorStoreIndex.from_documents(documents=documents)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "f1ab8157-dbcb-4588-9b3c-5bd2fc4a721e",
+ "metadata": {},
+ "source": [
+ "### Define postprocessor for `mxbai-rerank-base-v1` reranker"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3fcc5590-2e58-4a7e-8b18-a7153c06d1ff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.core.postprocessor import SentenceTransformerRerank\n",
+ "\n",
+ "postprocessor = SentenceTransformerRerank(\n",
+ " model=\"mixedbread-ai/mxbai-rerank-base-v1\", top_n=2\n",
+ ")"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "c7c81b0d-0449-4092-80cb-88080e69f980",
+ "metadata": {},
+ "source": [
+ "### Create Query Engine\n",
+ "\n",
+ "We first retrieve the 10 most relevant nodes, then keep the top 2 using the postprocessor defined above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e1b23700-15ae-4f1a-9443-43eb1eecab5f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query_engine = index.as_query_engine(\n",
+ " similarity_top_k=10,\n",
+ " node_postprocessors=[postprocessor],\n",
+ ")"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "93871f9c-8871-4f43-8ee9-b3ca4e403d86",
+ "metadata": {},
+ "source": [
+ "### Test Queries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "658d3092-7d86-4520-83a2-c3e630dc02b6",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Sam Altman initially declined the offer of becoming president of Y Combinator because he wanted to start a startup focused on making nuclear reactors.\n"
+ ]
+ }
+ ],
+ "source": [
+ "response = query_engine.query(\n",
+ " \"Why did Sam Altman decline the offer of becoming president of Y Combinator?\",\n",
+ ")\n",
+ "\n",
+ "print(response)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "497e715e-3f7a-4140-a3ba-34356e473702",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Paul Graham started YC because he and his partners wanted to create an investment firm where they could implement their own ideas and provide the kind of support to startups that they felt was lacking when they were founders themselves. They aimed to not only make seed investments but also assist startups with various aspects of setting up a company, similar to the help they had received from others in the past.\n"
+ ]
+ }
+ ],
+ "source": [
+ "response = query_engine.query(\n",
+ " \"Why did Paul Graham start YC?\",\n",
+ ")\n",
+ "\n",
+ "print(response)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/examples/agent/custom_agent.ipynb b/docs/examples/agent/custom_agent.ipynb
index 7a26db4db1cfe..729d6cec4dd41 100644
--- a/docs/examples/agent/custom_agent.ipynb
+++ b/docs/examples/agent/custom_agent.ipynb
@@ -79,7 +79,7 @@
" Task,\n",
" AgentChatResponse,\n",
")\n",
- "from typing import Dict, Any, List, Tuple\n",
+ "from typing import Dict, Any, List, Tuple, Optional\n",
"from llama_index.core.tools import BaseTool, QueryEngineTool\n",
"from llama_index.core.program import LLMTextCompletionProgram\n",
"from llama_index.core.output_parsers import PydanticOutputParser\n",
@@ -200,7 +200,7 @@
" return {\"count\": 0, \"current_reasoning\": []}\n",
"\n",
" def _run_step(\n",
- " self, state: Dict[str, Any], task: Task\n",
+ " self, state: Dict[str, Any], task: Task, input: Optional[str] = None\n",
" ) -> Tuple[AgentChatResponse, bool]:\n",
" \"\"\"Run step.\n",
"\n",
diff --git a/docs/examples/callbacks/LangfuseCallbackHandler.ipynb b/docs/examples/callbacks/LangfuseCallbackHandler.ipynb
new file mode 100644
index 0000000000000..639178110583d
--- /dev/null
+++ b/docs/examples/callbacks/LangfuseCallbackHandler.ipynb
@@ -0,0 +1,288 @@
+{
+ "cells": [
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "d6509c3a",
+ "metadata": {},
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0d8b66c",
+ "metadata": {},
+ "source": [
+ "# Langfuse Callback Handler\n",
+ "\n",
+ "[Langfuse](https://langfuse.com/docs) is an open source LLM engineering platform to help teams collaboratively debug, analyze and iterate on their LLM Applications.\n",
+ "\n",
+ "The `LangfuseCallbackHandler` is integrated with Langfuse and empowers you to seamlessly track and monitor performance, traces, and metrics of your LlamaIndex application. Detailed traces of the LlamaIndex context augmentation and the LLM querying processes are captured and can be inspected directly in the Langfuse UI."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4a59a00e",
+ "metadata": {},
+ "source": [
+ "![langfuse-tracing](https://static.langfuse.com/llamaindex-langfuse-docs.gif)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3b9057da",
+ "metadata": {},
+ "source": [
+ "## Setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5d9dfc7f",
+ "metadata": {},
+ "source": [
+ "### Install packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "49c3527e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install llama-index llama-index-callbacks-langfuse"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bc10630b",
+ "metadata": {},
+ "source": [
+ "### Configure environment"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4c256817",
+ "metadata": {},
+ "source": [
+ "If you haven't done so yet, [sign up on Langfuse](https://cloud.langfuse.com/auth/sign-up) and obtain your API keys from the project settings."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "787e836d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "# Langfuse\n",
+ "os.environ[\"LANGFUSE_SECRET_KEY\"] = \"sk-lf-...\"\n",
+ "os.environ[\"LANGFUSE_PUBLIC_KEY\"] = \"pk-lf-...\"\n",
+ "os.environ[\n",
+ " \"LANGFUSE_HOST\"\n",
+ "] = \"https://cloud.langfuse.com\" # 🇪🇺 EU region, 🇺🇸 US region: \"https://us.cloud.langfuse.com\"\n",
+ "\n",
+ "# OpenAI\n",
+ "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1fe2ba01",
+ "metadata": {},
+ "source": [
+ "### Register the Langfuse callback handler"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cfef9ddc",
+ "metadata": {},
+ "source": [
+ "#### Option 1: Set global LlamaIndex handler"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "72afb2b9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.core import global_handler, set_global_handler\n",
+ "\n",
+ "set_global_handler(\"langfuse\")\n",
+ "langfuse_callback_handler = global_handler"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0e6557d2",
+ "metadata": {},
+ "source": [
+ "#### Option 2: Use Langfuse callback directly"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4bdd95bf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.core import Settings\n",
+ "from llama_index.core.callbacks import CallbackManager\n",
+ "from langfuse.llama_index import LlamaIndexCallbackHandler\n",
+ "\n",
+ "langfuse_callback_handler = LlamaIndexCallbackHandler()\n",
+ "Settings.callback_manager = CallbackManager([langfuse_callback_handler])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e3e03ce7",
+ "metadata": {},
+ "source": [
+ "### Flush events to Langfuse"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e2c811ec",
+ "metadata": {},
+ "source": [
+ "The Langfuse SDKs queue and batch events in the background to reduce the number of network requests and improve overall performance. Before exiting your application, make sure all queued events have been flushed to the Langfuse servers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4e28876c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ... your LlamaIndex calls here ...\n",
+ "\n",
+ "langfuse_callback_handler.flush()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6b86f1b5",
+ "metadata": {},
+ "source": [
+ "Done!✨ Traces and metrics from your LlamaIndex application are now automatically tracked in Langfuse. If you construct a new index or query an LLM with your documents in context, your traces and metrics are immediately visible in the Langfuse UI. Next, let's take a look at how traces will look in Langfuse."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1f0d4465",
+ "metadata": {},
+ "source": [
+ "## Example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8a9f3428",
+ "metadata": {},
+ "source": [
+ "Fetch and save example data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aa303ae3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!mkdir -p 'data/'\n",
+ "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9f053996",
+ "metadata": {},
+ "source": [
+ "Run an example index construction, query, and chat."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "983cbedd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.core import SimpleDirectoryReader, VectorStoreIndex\n",
+ "\n",
+ "# Create index\n",
+ "documents = SimpleDirectoryReader(\"data\").load_data()\n",
+ "index = VectorStoreIndex.from_documents(documents)\n",
+ "\n",
+ "# Execute query\n",
+ "query_engine = index.as_query_engine()\n",
+ "query_response = query_engine.query(\"What did the author do growing up?\")\n",
+ "print(query_response)\n",
+ "\n",
+ "# Execute chat query\n",
+ "chat_engine = index.as_chat_engine()\n",
+ "chat_response = chat_engine.chat(\"What did the author do growing up?\")\n",
+ "print(chat_response)\n",
+ "\n",
+ "# Flush the callback handler so the results show up in Langfuse immediately\n",
+ "langfuse_callback_handler.flush()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d5cdd88f",
+ "metadata": {},
+ "source": [
+ "Done!✨ You will now see traces of your index and query in your Langfuse project.\n",
+ "\n",
+ "Example traces (public links):\n",
+ "1. [Index construction](https://cloud.langfuse.com/project/clsuh9o2y0000mbztvdptt1mh/traces/1294ed01-8193-40a5-bb4e-2f0723d2c827)\n",
+ "2. [Query Engine](https://cloud.langfuse.com/project/clsuh9o2y0000mbztvdptt1mh/traces/eaa4ea74-78e0-42ef-ace0-7aa02c6fbbc6)\n",
+ "3. [Chat Engine](https://cloud.langfuse.com/project/clsuh9o2y0000mbztvdptt1mh/traces/d95914f5-66eb-4520-b996-49e84fd7f323)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0b50845f",
+ "metadata": {},
+ "source": [
+ "## 📚 More details\n",
+ "\n",
+ "Check out the full [Langfuse documentation](https://langfuse.com/docs) for more details on Langfuse's tracing and analytics capabilities and how to make the most of this integration."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/examples/callbacks/UpTrainCallback.ipynb b/docs/examples/callbacks/UpTrainCallback.ipynb
index 7ccc72565c127..d4fb0947a7a31 100644
--- a/docs/examples/callbacks/UpTrainCallback.ipynb
+++ b/docs/examples/callbacks/UpTrainCallback.ipynb
@@ -13,30 +13,30 @@
"source": [
"# UpTrain Callback Handler\n",
"\n",
- "This notebook showcases the UpTrain callback handler seamlessly integrating into your pipeline, facilitating diverse evaluations. Three additional evaluations for Llamaindex have been introduced, complementing existing ones. These evaluations run automatically, with results displayed in the output. More details on UpTrain's evaluations can be found [here](https://github.com/uptrain-ai/uptrain?tab=readme-ov-file#pre-built-evaluations-we-offer-). \n",
+ "UpTrain ([GitHub](https://github.com/uptrain-ai/uptrain) || [website](https://uptrain.ai/) || [docs](https://docs.uptrain.ai/)) is an open-source platform to evaluate and improve GenAI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root cause analysis on failure cases, and gives insights on how to resolve them. \n",
"\n",
- "Selected operators from the LlamaIndex pipeline are highlighted for demonstration:\n",
+ "This notebook shows how to use the UpTrain callback handler to evaluate different components of your RAG pipeline.\n",
"\n",
"## 1. **RAG Query Engine Evaluations**:\n",
"The RAG query engine plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:\n",
"\n",
- "- **Context Relevance**: Determines if the context extracted from the query is relevant to the response.\n",
- "- **Factual Accuracy**: Assesses if the LLM is hallcuinating or providing incorrect information.\n",
- "- **Response Completeness**: Checks if the response contains all the information requested by the query.\n",
+ "- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.\n",
+ "- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.\n",
+ "- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively.\n",
"\n",
"## 2. **Sub-Question Query Generation Evaluation**:\n",
- "The SubQuestionQueryGeneration operator decomposes a question into sub-questions, generating responses for each using a RAG query engine. Given the complexity, we include the previous evaluations and add:\n",
+ "The SubQuestionQueryGeneration operator decomposes a question into sub-questions, generating responses for each using a RAG query engine. To measure its accuracy, we use:\n",
"\n",
- "- **Sub Query Completeness**: Assures that the sub-questions accurately and comprehensively cover the original query.\n",
+ "- **[Sub Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness)**: Assures that the sub-questions accurately and comprehensively cover the original query.\n",
"\n",
"## 3. **Re-Ranking Evaluations**:\n",
- "Re-ranking involves reordering nodes based on relevance to the query and chosing top n nodes. Different evaluations are performed based on the number of nodes returned after re-ranking.\n",
+ "Re-ranking involves reordering nodes based on relevance to the query and choosing the top nodes. Different evaluations are performed based on the number of nodes returned after re-ranking.\n",
"\n",
"a. Same Number of Nodes\n",
- "- **Context Reranking**: Checks if the order of re-ranked nodes is more relevant to the query than the original order.\n",
+ "- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks if the order of re-ranked nodes is more relevant to the query than the original order.\n",
"\n",
"b. Different Number of Nodes:\n",
- "- **Context Conciseness**: Examines whether the reduced number of nodes still provides all the required information.\n",
+ "- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Examines whether the reduced number of nodes still provides all the required information.\n",
"\n",
"These evaluations collectively ensure the robustness and effectiveness of the RAG query engine, SubQuestionQueryGeneration operator, and the re-ranking process in the LlamaIndex pipeline."
]
@@ -47,7 +47,7 @@
"source": [
"#### **Note:** \n",
"- We have performed evaluations using basic RAG query engine, the same evaluations can be performed using the advanced RAG query engine as well.\n",
- "- Same is true for Re-Ranking evaluations, we have performed evaluations using CohereRerank, the same evaluations can be performed using other re-rankers as well."
+ "- The same is true for the re-ranking evaluations: we used SentenceTransformerRerank here, but the same evaluations can be performed with other re-rankers as well."
]
},
{
@@ -65,10 +65,9 @@
"metadata": {},
"outputs": [],
"source": [
- "%pip install llama-index-postprocessor-cohere-rerank\n",
"%pip install llama-index-readers-web\n",
- "%pip install llama-index-callback-uptrain\n",
- "%pip install -q html2text llama-index pandas tqdm uptrain cohere"
+ "%pip install llama-index-callbacks-uptrain\n",
+ "%pip install -q html2text llama-index pandas tqdm uptrain torch sentence-transformers"
]
},
{
@@ -84,15 +83,14 @@
"metadata": {},
"outputs": [],
"source": [
- "from llama_index.core.settings import Settings\n",
- "from llama_index.core import VectorStoreIndex\n",
+ "from llama_index.core import Settings, VectorStoreIndex\n",
"from llama_index.core.node_parser import SentenceSplitter\n",
"from llama_index.readers.web import SimpleWebPageReader\n",
"from llama_index.core.callbacks import CallbackManager\n",
"from llama_index.callbacks.uptrain.base import UpTrainCallbackHandler\n",
"from llama_index.core.query_engine import SubQuestionQueryEngine\n",
"from llama_index.core.tools import QueryEngineTool, ToolMetadata\n",
- "from llama_index.core.postprocessor.llm_rerank import LLMRerank\n",
+ "from llama_index.core.postprocessor import SentenceTransformerRerank\n",
"from llama_index.llms.openai import OpenAI\n",
"\n",
"import os"
@@ -141,11 +139,16 @@
"metadata": {},
"outputs": [],
"source": [
+ "os.environ[\n",
+ " \"OPENAI_API_KEY\"\n",
+ "] = \"sk-************\" # Replace with your OpenAI API key\n",
+ "\n",
"callback_handler = UpTrainCallbackHandler(\n",
" key_type=\"openai\",\n",
- " api_key=\"sk-...\", # replace with your OpenAI API key\n",
+ " api_key=os.environ[\"OPENAI_API_KEY\"],\n",
" project_name_prefix=\"llama\",\n",
")\n",
+ "\n",
"Settings.callback_manager = CallbackManager([callback_handler])"
]
},
@@ -200,9 +203,9 @@
"metadata": {},
"source": [
"UpTrain callback handler will automatically capture the query, context and response once generated and will run the following three evaluations *(Graded from 0 to 1)* on the response:\n",
- "- **Context Relevance**: Check if the context extractedfrom the query is relevant to the response.\n",
- "- **Factual Accuracy**: Check how factually accurate the response is.\n",
- "- **Response Completeness**: Check if the response contains all the information that the query is asking for."
+ "- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.\n",
+ "- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.\n",
+ "- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively."
]
},
{
@@ -214,7 +217,10 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-14 16:04:09.869\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:01<00:00, 1.33s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.36s/it]\n",
+ "100%|██████████| 1/1 [00:03<00:00, 3.50s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.32s/it]\n"
]
},
{
@@ -223,8 +229,9 @@
"text": [
"\n",
"Question: What did Paul Graham do growing up?\n",
- "Response: Growing up, Paul Graham worked on writing and programming. He wrote short stories and also tried his hand at programming on the IBM 1401 computer that his school district had. He later got a microcomputer, a TRS-80, and started programming more extensively, creating simple games and even a word processor.\n",
- "Context Relevance Score: 0.5\n",
+ "Response: Growing up, Paul Graham worked on writing short stories and programming. He started programming on an IBM 1401 in 9th grade using an early version of Fortran. Later, he got a TRS-80 computer and wrote simple games, a rocket prediction program, and a word processor. Despite his interest in programming, he initially planned to study philosophy in college before eventually switching to AI.\n",
+ "\n",
+ "Context Relevance Score: 0.0\n",
"Factual Accuracy Score: 1.0\n",
"Response Completeness Score: 1.0\n",
"\n"
@@ -234,7 +241,10 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-14 16:04:36.895\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:01<00:00, 1.59s/it]\n",
+ "100%|██████████| 1/1 [00:00<00:00, 1.01it/s]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.76s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.28s/it]\n"
]
},
{
@@ -243,10 +253,11 @@
"text": [
"\n",
"Question: When and how did Paul Graham's mother die?\n",
- "Response: The context information does not provide any information about Paul Graham's mother or her death.\n",
+ "Response: Paul Graham's mother died when he was 18 years old, from a brain tumor.\n",
+ "\n",
"Context Relevance Score: 0.0\n",
"Factual Accuracy Score: 0.0\n",
- "Response Completeness Score: 0.0\n",
+ "Response Completeness Score: 0.5\n",
"\n"
]
},
@@ -254,7 +265,10 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-14 16:04:55.245\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:01<00:00, 1.75s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.55s/it]\n",
+ "100%|██████████| 1/1 [00:03<00:00, 3.39s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.48s/it]\n"
]
},
{
@@ -263,10 +277,11 @@
"text": [
"\n",
"Question: What, in Paul Graham's opinion, is the most distinctive thing about YC?\n",
- "Response: The most distinctive thing about YC, according to Paul Graham's opinion, is that it provides a sense of community and support for startup founders. It solves the problem of isolation that founders often face by connecting them with colleagues who understand the challenges they are going through and can offer guidance and support. Additionally, YC fosters a tight-knit alumni community where startups can help each other and even become each other's customers.\n",
- "Context Relevance Score: 0.0\n",
- "Factual Accuracy Score: 1.0\n",
- "Response Completeness Score: 0.5\n",
+ "Response: The most distinctive thing about Y Combinator, according to Paul Graham, is that instead of deciding for himself what to work on, the problems come to him. Every 6 months, a new batch of startups brings their problems, which then become the focus of YC. This engagement with a variety of startup problems and the direct involvement in solving them is what Graham finds most unique about Y Combinator.\n",
+ "\n",
+ "Context Relevance Score: 1.0\n",
+ "Factual Accuracy Score: 0.3333333333333333\n",
+ "Response Completeness Score: 1.0\n",
"\n"
]
},
@@ -274,7 +289,10 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-14 16:05:24.705\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:01<00:00, 1.92s/it]\n",
+ "100%|██████████| 1/1 [00:00<00:00, 1.20it/s]\n",
+ "100%|██████████| 1/1 [00:02<00:00, 2.15s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.08s/it]\n"
]
},
{
@@ -283,9 +301,10 @@
"text": [
"\n",
"Question: When and how did Paul Graham meet Jessica Livingston?\n",
- "Response: Paul Graham met Jessica Livingston at a party at his house in October 2003. They were introduced to each other by a mutual friend named Maria Daniels. A couple of days later, Paul asked Jessica out and they started dating.\n",
- "Context Relevance Score: 0.5\n",
- "Factual Accuracy Score: 1.0\n",
+ "Response: Paul Graham met Jessica Livingston at a big party at his house in October 2003.\n",
+ "\n",
+ "Context Relevance Score: 1.0\n",
+ "Factual Accuracy Score: 0.5\n",
"Response Completeness Score: 1.0\n",
"\n"
]
@@ -294,7 +313,10 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-14 16:05:52.062\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:01<00:00, 1.82s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.14s/it]\n",
+ "100%|██████████| 1/1 [00:03<00:00, 3.19s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.50s/it]"
]
},
{
@@ -303,10 +325,18 @@
"text": [
"\n",
"Question: What is Bel, and when and where was it written?\n",
- "Response: Bel is a new Lisp that was written in Arc. It was written over a period of 4 years, from March 26, 2015, to October 12, 2019. The majority of Bel was written in England, as the author moved there in the summer of 2016.\n",
+ "Response: Bel is a new Lisp that was written in Arc. It was developed over a period of 4 years, from March 26, 2015 to October 12, 2019. The majority of Bel was written in England.\n",
+ "\n",
"Context Relevance Score: 1.0\n",
"Factual Accuracy Score: 1.0\n",
- "Response Completeness Score: 0.5\n",
+ "Response Completeness Score: 1.0\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
"\n"
]
}
@@ -348,15 +378,15 @@
"source": [
"# 2. Sub-Question Query Engine Evaluation\n",
"\n",
- "The **sub question query engine** is used to tackle the problem of answering a complex query using multiple data sources. It first breaks down the complex query into sub questions for each relevant data source, then gather all the intermediate reponses and synthesizes a final response.\n",
+ "The **sub-question query engine** is used to tackle the problem of answering a complex query using multiple data sources. It first breaks down the complex query into sub-questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response.\n",
"\n",
"UpTrain callback handler will automatically capture the sub-question and the responses for each of them once generated and will run the following three evaluations *(Graded from 0 to 1)* on the response:\n",
- "- **Context Relevance**: Check if the context extractedfrom the query is relevant to the response.\n",
- "- **Factual Accuracy**: Check how factually accurate the response is.\n",
- "- **Response Completeness**: Check if the response contains all the information that the query is asking for.\n",
+ "- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines if the retrieved context has sufficient information to answer the user query or not.\n",
+ "- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses if the LLM's response can be verified via the retrieved context.\n",
+ "- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks if the response contains all the information required to answer the user query comprehensively.\n",
"\n",
"In addition to the above evaluations, the callback handler will also run the following evaluation:\n",
- "- **Sub Query Completeness**: Checks if the sub-questions accurately and completely cover the original query."
+ "- **[Sub Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness)**: Assures that the sub-questions accurately and comprehensively cover the original query."
]
},
{
@@ -372,9 +402,9 @@
"\u001b[1;3;38;2;237;90;200m[documents] Q: What did Paul Graham work on before YC?\n",
"\u001b[0m\u001b[1;3;38;2;90;149;237m[documents] Q: What did Paul Graham work on during YC?\n",
"\u001b[0m\u001b[1;3;38;2;11;159;203m[documents] Q: What did Paul Graham work on after YC?\n",
- "\u001b[0m\u001b[1;3;38;2;237;90;200m[documents] A: Before Y Combinator (YC), Paul Graham worked on a startup called Viaweb.\n",
- "\u001b[0m\u001b[1;3;38;2;11;159;203m[documents] A: After leaving Y Combinator, Paul Graham focused on painting. He wanted to see how good he could get at painting if he dedicated his time and effort to it. He spent most of 2014 working on his painting skills, but eventually ran out of steam in November.\n",
- "\u001b[0m\u001b[1;3;38;2;90;149;237m[documents] A: During his time at Y Combinator (YC), Paul Graham worked on various projects. He initially intended to work on three things: hacking, writing essays, and working on YC. However, as YC grew and he became more excited about it, it started to take up a lot more of his attention. He also worked on writing essays and was responsible for writing all of YC's internal software in Arc.\n",
+ "\u001b[0m\u001b[1;3;38;2;11;159;203m[documents] A: After Y Combinator, Paul Graham decided to focus on painting as his next endeavor.\n",
+ "\u001b[0m\u001b[1;3;38;2;90;149;237m[documents] A: Paul Graham worked on writing essays and working on Y Combinator during YC.\n",
+ "\u001b[0m\u001b[1;3;38;2;237;90;200m[documents] A: Before Y Combinator, Paul Graham worked on projects with his colleagues Robert and Trevor.\n",
"\u001b[0m"
]
},
@@ -382,40 +412,65 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-14 08:24:08.958\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n",
- "\u001b[32m2024-02-14 08:24:34.450\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 3/3 [00:02<00:00, 1.47it/s]\n",
+ "100%|██████████| 3/3 [00:00<00:00, 3.28it/s]\n",
+ "100%|██████████| 3/3 [00:01<00:00, 1.68it/s]\n",
+ "100%|██████████| 3/3 [00:01<00:00, 2.28it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
- "\n",
- "Question: What did Paul Graham work on before YC?\n",
- "Response: Before Y Combinator (YC), Paul Graham worked on a startup called Viaweb.\n",
- "Context Relevance Score: 0.0\n",
- "Factual Accuracy Score: 1.0\n",
- "Response Completeness Score: 0.5\n",
- "\n",
"\n",
"Question: What did Paul Graham work on after YC?\n",
- "Response: After leaving Y Combinator, Paul Graham focused on painting. He wanted to see how good he could get at painting if he dedicated his time and effort to it. He spent most of 2014 working on his painting skills, but eventually ran out of steam in November.\n",
- "Context Relevance Score: 1.0\n",
+ "Response: After Y Combinator, Paul Graham decided to focus on painting as his next endeavor.\n",
+ "\n",
+ "Context Relevance Score: 0.0\n",
"Factual Accuracy Score: 0.0\n",
- "Response Completeness Score: 0.0\n",
+ "Response Completeness Score: 0.5\n",
"\n",
"\n",
"Question: What did Paul Graham work on during YC?\n",
- "Response: During his time at Y Combinator (YC), Paul Graham worked on various projects. He initially intended to work on three things: hacking, writing essays, and working on YC. However, as YC grew and he became more excited about it, it started to take up a lot more of his attention. He also worked on writing essays and was responsible for writing all of YC's internal software in Arc.\n",
- "Context Relevance Score: 0.5\n",
+ "Response: Paul Graham worked on writing essays and working on Y Combinator during YC.\n",
+ "\n",
+ "Context Relevance Score: 0.0\n",
"Factual Accuracy Score: 1.0\n",
"Response Completeness Score: 0.5\n",
"\n",
+ "\n",
+ "Question: What did Paul Graham work on before YC?\n",
+ "Response: Before Y Combinator, Paul Graham worked on projects with his colleagues Robert and Trevor.\n",
+ "\n",
+ "Context Relevance Score: 0.0\n",
+ "Factual Accuracy Score: 0.0\n",
+ "Response Completeness Score: 0.5\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 1/1 [00:01<00:00, 1.24s/it]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
"\n",
"Question: How was Paul Grahams life different before, during, and after YC?\n",
"Sub Query Completeness Score: 1.0\n",
"\n"
]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
}
],
"source": [
@@ -465,7 +520,7 @@
"source": [
"# 3. Re-ranking \n",
"\n",
- "Re-ranking is the process of reordering the nodes based on their relevance to the query. There are multiple classes of re-ranking algorithms offered by Llamaindex. We have used CohereRerank for this example.\n",
+ "Re-ranking is the process of reordering the nodes based on their relevance to the query. There are multiple classes of re-ranking algorithms offered by LlamaIndex. We have used SentenceTransformerRerank for this example.\n",
"\n",
"The re-ranker allows you to enter the number of top n nodes that will be returned after re-ranking. If this value remains the same as the original number of nodes, the re-ranker will only re-rank the nodes and not change the number of nodes. Otherwise, it will re-rank the nodes and return the top n nodes.\n",
"\n",
@@ -479,7 +534,8 @@
"## 3a. Re-ranking (With same number of nodes)\n",
"\n",
"If the number of nodes returned after re-ranking is the same as the original number of nodes, the following evaluation will be performed:\n",
- "- **Context Reranking**: Check if the order of the re-ranked nodes is more relevant to the query than the original order."
+ "\n",
+ "- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks if the order of re-ranked nodes is more relevant to the query than the original order."
]
},
{
@@ -491,7 +547,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-13 20:00:17.459\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:01<00:00, 1.89s/it]\n"
]
},
{
@@ -500,28 +556,62 @@
"text": [
"\n",
"Question: What did Sam Altman do in this essay?\n",
+ "Context Reranking Score: 1.0\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 1/1 [00:01<00:00, 1.88s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.44s/it]\n",
+ "100%|██████████| 1/1 [00:02<00:00, 2.77s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.45s/it]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Question: What did Sam Altman do in this essay?\n",
+ "Response: Sam Altman was asked to become the president of Y Combinator after the original founders decided to step down and reorganize the company for long-term sustainability.\n",
+ "\n",
"Context Relevance Score: 1.0\n",
"Factual Accuracy Score: 1.0\n",
- "Response Completeness Score: 1.0\n",
+ "Response Completeness Score: 0.5\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
"\n"
]
}
],
"source": [
- "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # Replace with your OpenAI API key\n",
- "llm = OpenAI(model=\"gpt-4-turbo-preview\")\n",
+ "callback_handler = UpTrainCallbackHandler(\n",
+ " key_type=\"openai\",\n",
+ " api_key=os.environ[\"OPENAI_API_KEY\"],\n",
+ " project_name_prefix=\"llama\",\n",
+ ")\n",
+ "Settings.callback_manager = CallbackManager([callback_handler])\n",
"\n",
- "cohere_rerank = LLMRerank(\n",
- " llm=llm, top_n=5\n",
- ") # In this example, the number of nodes before re-ranking is 5 and after re-ranking is also 5.\n",
+ "rerank_postprocessor = SentenceTransformerRerank(\n",
+ " top_n=3, # number of nodes after reranking\n",
+ " keep_retrieval_score=True,\n",
+ ")\n",
"\n",
"index = VectorStoreIndex.from_documents(\n",
" documents=documents,\n",
")\n",
"\n",
"query_engine = index.as_query_engine(\n",
- " similarity_top_k=10,\n",
- " node_postprocessors=[cohere_rerank],\n",
+ " similarity_top_k=3, # number of nodes before reranking\n",
+ " node_postprocessors=[rerank_postprocessor],\n",
")\n",
"\n",
"response = query_engine.query(\n",
@@ -536,7 +626,8 @@
"# 3b. Re-ranking (With different number of nodes)\n",
"\n",
"If the number of nodes returned after re-ranking is the lesser as the original number of nodes, the following evaluation will be performed:\n",
- "- **Context Conciseness**: If the re-ranked nodes are able to provide all the information required by the query."
+ "\n",
+ "- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Examines whether the reduced number of nodes still provides all the required information."
]
},
{
@@ -548,7 +639,27 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "\u001b[32m2024-02-13 20:01:39.343\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36muptrain.framework.evalllm\u001b[0m:\u001b[36mevaluate\u001b[0m:\u001b[36m110\u001b[0m - \u001b[1mSending evaluation request for rows 0 to <50 to the Uptrain\u001b[0m\n"
+ "100%|██████████| 1/1 [00:02<00:00, 2.22s/it]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Question: What did Sam Altman do in this essay?\n",
+ "Context Conciseness Score: 0.0\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 1/1 [00:01<00:00, 1.58s/it]\n",
+ "100%|██████████| 1/1 [00:00<00:00, 1.19it/s]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.62s/it]\n",
+ "100%|██████████| 1/1 [00:01<00:00, 1.42s/it]"
]
},
{
@@ -557,27 +668,41 @@
"text": [
"\n",
"Question: What did Sam Altman do in this essay?\n",
- "Context Relevance Score: 0.5\n",
+ "Response: Sam Altman offered unsolicited advice to the author during a visit to California for interviews.\n",
+ "\n",
+ "Context Relevance Score: 0.0\n",
"Factual Accuracy Score: 1.0\n",
- "Response Completeness Score: 1.0\n",
+ "Response Completeness Score: 0.5\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
"\n"
]
}
],
"source": [
- "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # Replace with your OpenAI API key\n",
- "llm = OpenAI(model=\"gpt-4-turbo-preview\")\n",
+ "callback_handler = UpTrainCallbackHandler(\n",
+ " key_type=\"openai\",\n",
+ " api_key=os.environ[\"OPENAI_API_KEY\"],\n",
+ " project_name_prefix=\"llama\",\n",
+ ")\n",
+ "Settings.callback_manager = CallbackManager([callback_handler])\n",
"\n",
- "cohere_rerank = LLMRerank(\n",
- " llm=llm, top_n=2\n",
- ") # In this example, the number of nodes before re-ranking is 5 and after re-ranking is 2.\n",
+ "rerank_postprocessor = SentenceTransformerRerank(\n",
+ " top_n=2, # Number of nodes after re-ranking\n",
+ " keep_retrieval_score=True,\n",
+ ")\n",
"\n",
"index = VectorStoreIndex.from_documents(\n",
" documents=documents,\n",
")\n",
"query_engine = index.as_query_engine(\n",
- " similarity_top_k=10,\n",
- " node_postprocessors=[cohere_rerank],\n",
+ " similarity_top_k=5, # Number of nodes before re-ranking\n",
+ " node_postprocessors=[rerank_postprocessor],\n",
")\n",
"\n",
"# Use your advanced RAG\n",
@@ -592,14 +717,7 @@
"source": [
"# UpTrain's Managed Service Dashboard and Insights\n",
"\n",
- "The UpTrain Managed Service offers the following features:\n",
- "\n",
- "1. Advanced dashboards with drill-down and filtering options.\n",
- "1. Identification of insights and common themes among unsuccessful cases.\n",
- "1. Real-time observability and monitoring of production data.\n",
- "1. Integration with CI/CD pipelines for seamless regression testing.\n",
- "\n",
- "To define the UpTrain callback handler, the only change required is to set the `key_type` and `api_key` parameters. The rest of the code remains the same.\n",
+ "To use UpTrain's managed service via the UpTrain callback handler, the only change required is to set the `key_type` and `api_key` parameters. The rest of the code remains the same.\n",
"\n",
"```python\n",
"callback_handler = UpTrainCallbackHandler(\n",
@@ -622,7 +740,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "phoenixdev",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -639,5 +757,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}
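Note on the two re-ranking cells in the notebook diff above: they differ only in how `top_n` compares with `similarity_top_k`, and the notebook text says that this comparison decides which evaluation is run. The rule can be sketched as below. `pick_rerank_eval` and the two returned labels are hypothetical names used only for illustration; they are not part of UpTrain or LlamaIndex, and the guard for a growing node count is an assumption of this sketch.

```python
def pick_rerank_eval(nodes_before: int, nodes_after: int) -> str:
    """Hypothetical helper mirroring the rule stated in the notebook text."""
    if nodes_after > nodes_before:
        # Assumption of this sketch: a re-ranker only reorders or drops nodes.
        raise ValueError("re-ranking should not increase the number of nodes")
    if nodes_after == nodes_before:
        # Same count: only the order changed, so the order is what gets judged.
        return "context_reranking"
    # Fewer nodes: check that the reduced set still covers the query.
    return "context_conciseness"


print(pick_rerank_eval(3, 3))  # similarity_top_k=3, top_n=3 (section 3a)
print(pick_rerank_eval(5, 2))  # similarity_top_k=5, top_n=2 (section 3b)
```

With the values used in the notebook, section 3a maps to the first label and section 3b to the second.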
diff --git a/docs/examples/customization/prompts/chat_prompts.ipynb b/docs/examples/customization/prompts/chat_prompts.ipynb
index 8efc743c68d8c..18a4408e92d4c 100644
--- a/docs/examples/customization/prompts/chat_prompts.ipynb
+++ b/docs/examples/customization/prompts/chat_prompts.ipynb
@@ -8,7 +8,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -41,13 +40,51 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prompt Setup\n",
"\n",
- "Below, we take the default prompts and customize them to always answer, even if the context is not helpful."
+ "Below, we take the default prompts and customize them to always answer, even if the context is not helpful.\n",
+ "\n",
+ "We show two ways of setting up the prompts:\n",
+ "1. Explicitly define ChatMessage and MessageRole objects.\n",
+ "2. Call ChatPromptTemplate.from_messages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "qa_prompt_str = (\n",
+ " \"Context information is below.\\n\"\n",
+ " \"---------------------\\n\"\n",
+ " \"{context_str}\\n\"\n",
+ " \"---------------------\\n\"\n",
+ " \"Given the context information and not prior knowledge, \"\n",
+ " \"answer the question: {query_str}\\n\"\n",
+ ")\n",
+ "\n",
+ "refine_prompt_str = (\n",
+ " \"We have the opportunity to refine the original answer \"\n",
+ " \"(only if needed) with some more context below.\\n\"\n",
+ " \"------------\\n\"\n",
+ " \"{context_msg}\\n\"\n",
+ " \"------------\\n\"\n",
+ " \"Given the new context, refine the original answer to better \"\n",
+ " \"answer the question: {query_str}. \"\n",
+ " \"If the context isn't useful, output the original answer again.\\n\"\n",
+ " \"Original Answer: {existing_answer}\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 1. Explicitly Define `ChatMessage` and `MessageRole` objects"
]
},
{
@@ -67,17 +104,7 @@
" \"Always answer the question, even if the context isn't helpful.\"\n",
" ),\n",
" ),\n",
- " ChatMessage(\n",
- " role=MessageRole.USER,\n",
- " content=(\n",
- " \"Context information is below.\\n\"\n",
- " \"---------------------\\n\"\n",
- " \"{context_str}\\n\"\n",
- " \"---------------------\\n\"\n",
- " \"Given the context information and not prior knowledge, \"\n",
- " \"answer the question: {query_str}\\n\"\n",
- " ),\n",
- " ),\n",
+ " ChatMessage(role=MessageRole.USER, content=qa_prompt_str),\n",
"]\n",
"text_qa_template = ChatPromptTemplate(chat_text_qa_msgs)\n",
"\n",
@@ -89,26 +116,50 @@
" \"Always answer the question, even if the context isn't helpful.\"\n",
" ),\n",
" ),\n",
- " ChatMessage(\n",
- " role=MessageRole.USER,\n",
- " content=(\n",
- " \"We have the opportunity to refine the original answer \"\n",
- " \"(only if needed) with some more context below.\\n\"\n",
- " \"------------\\n\"\n",
- " \"{context_msg}\\n\"\n",
- " \"------------\\n\"\n",
- " \"Given the new context, refine the original answer to better \"\n",
- " \"answer the question: {query_str}. \"\n",
- " \"If the context isn't useful, output the original answer again.\\n\"\n",
- " \"Original Answer: {existing_answer}\"\n",
- " ),\n",
- " ),\n",
+ " ChatMessage(role=MessageRole.USER, content=refine_prompt_str),\n",
"]\n",
"refine_template = ChatPromptTemplate(chat_refine_msgs)"
]
},
{
- "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2. Call `ChatPromptTemplate.from_messages`\n",
+ "\n",
+ "`from_messages` is syntactic sugar that allows you to define a chat prompt template as a list of tuples, with each tuple corresponding to a chat message (\"role\", \"message\")."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.core import ChatPromptTemplate\n",
+ "\n",
+ "# Text QA Prompt\n",
+ "chat_text_qa_msgs = [\n",
+ " (\n",
+ " \"system\",\n",
+ " \"Always answer the question, even if the context isn't helpful.\",\n",
+ " ),\n",
+ " (\"user\", qa_prompt_str),\n",
+ "]\n",
+ "text_qa_template = ChatPromptTemplate.from_messages(chat_text_qa_msgs)\n",
+ "\n",
+ "# Refine Prompt\n",
+ "chat_refine_msgs = [\n",
+ " (\n",
+ " \"system\",\n",
+ " \"Always answer the question, even if the context isn't helpful.\",\n",
+ " ),\n",
+ " (\"user\", refine_prompt_str),\n",
+ "]\n",
+ "refine_template = ChatPromptTemplate.from_messages(chat_refine_msgs)"
+ ]
+ },
+ {
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -165,7 +216,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -181,7 +231,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "I'm sorry, but the given context does not provide any information about Joe Biden.\n"
+ "I'm unable to provide an answer to that question based on the context information provided.\n"
]
}
],
@@ -190,7 +240,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -206,7 +255,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "Joe Biden is the 46th President of the United States.\n"
+ "Joe Biden is the current President of the United States, having taken office in January 2021. He previously served as Vice President under President Barack Obama from 2009 to 2017.\n"
]
}
],
@@ -223,9 +272,9 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "llama_index_v3",
"language": "python",
- "name": "python3"
+ "name": "llama_index_v3"
},
"language_info": {
"codemirror_mode": {
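Note on the `from_messages` cell added above: the tuple form and the explicit `ChatMessage` form describe the same list of messages. The sketch below illustrates that equivalence with stand-in types, so it runs without LlamaIndex installed. `Role`, `Message`, `from_tuples`, and `render` are hypothetical names for illustration only; the real `ChatPromptTemplate.from_messages` may handle more cases than this.

```python
from dataclasses import dataclass
from enum import Enum


class Role(str, Enum):
    """Stand-in for MessageRole (illustration only)."""
    SYSTEM = "system"
    USER = "user"


@dataclass
class Message:
    """Stand-in for ChatMessage (illustration only)."""
    role: Role
    content: str


def from_tuples(pairs):
    """Turn ("role", "message") tuples into Message objects."""
    return [Message(role=Role(role), content=content) for role, content in pairs]


def render(messages, **kwargs):
    """Fill the {placeholders} in every message with the given values."""
    return [Message(m.role, m.content.format(**kwargs)) for m in messages]


msgs = from_tuples(
    [
        ("system", "Always answer the question, even if the context isn't helpful."),
        ("user", "Context:\n{context_str}\nQuestion: {query_str}"),
    ]
)
filled = render(msgs, context_str="(empty)", query_str="Who is Joe Biden?")
print(filled[1].content)
```

The tuple form is shorter to write; the explicit form makes the role type visible, which can help when messages are built programmatically.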
diff --git a/docs/examples/discover_llamaindex/document_management/BUILD b/docs/examples/discover_llamaindex/document_management/BUILD
new file mode 100644
index 0000000000000..db46e8d6c978c
--- /dev/null
+++ b/docs/examples/discover_llamaindex/document_management/BUILD
@@ -0,0 +1 @@
+python_sources()
diff --git a/docs/examples/embeddings/bedrock.ipynb b/docs/examples/embeddings/bedrock.ipynb
index ef727d08147c5..45bcc63e8328e 100644
--- a/docs/examples/embeddings/bedrock.ipynb
+++ b/docs/examples/embeddings/bedrock.ipynb
@@ -41,12 +41,12 @@
"metadata": {},
"outputs": [],
"source": [
- "embed_model = BedrockEmbedding.from_credentials(\n",
+ "embed_model = BedrockEmbedding(\n",
" aws_access_key_id=os.getenv(\"AWS_ACCESS_KEY_ID\"),\n",
" aws_secret_access_key=os.getenv(\"AWS_SECRET_ACCESS_KEY\"),\n",
" aws_session_token=os.getenv(\"AWS_SESSION_TOKEN\"),\n",
- " aws_region=\"\",\n",
- " aws_profile=\"\",\n",
+ " region_name=\"\",\n",
+ " profile_name=\"\",\n",
")"
]
},
@@ -97,9 +97,7 @@
"source": [
"from llama_index.embeddings.bedrock import BedrockEmbedding\n",
"\n",
- "model = BedrockEmbedding().from_credentials(\n",
- " model_name=\"amazon.titan-embed-g1-text-02\"\n",
- ")\n",
+ "model = BedrockEmbedding(model=\"amazon.titan-embed-g1-text-02\")\n",
"embeddings = model.get_text_embedding(\"hello world\")\n",
"print(embeddings)"
]
@@ -119,15 +117,13 @@
"metadata": {},
"outputs": [],
"source": [
- "model = BedrockEmbedding().from_credentials(\n",
- " model_name=\"cohere.embed-english-v3\"\n",
- ")\n",
- "coherePayload = {\n",
- " \"texts\": [\"This is a test document\", \"This is another test document\"],\n",
- " \"input_type\": \"search_document\",\n",
- " \"truncate\": \"NONE\",\n",
- "}\n",
- "embeddings = model.get_text_embedding(coherePayload)\n",
+ "model = BedrockEmbedding(model=\"cohere.embed-english-v3\")\n",
+ "coherePayload = [\"This is a test document\", \"This is another test document\"]\n",
+ "\n",
+ "embed1 = model.get_text_embedding(\"This is a test document\")\n",
+ "print(embed1)\n",
+ "\n",
+ "embeddings = model.get_text_embedding_batch(coherePayload)\n",
"print(embeddings)"
]
},
@@ -144,18 +140,16 @@
"metadata": {},
"outputs": [],
"source": [
- "model = BedrockEmbedding().from_credentials(\n",
- " model_name=\"cohere.embed-multilingual-v3\"\n",
- ")\n",
- "coherePayload = {\n",
- " \"texts\": [\n",
- " \"This is a test document\",\n",
- " \"తెలుగు అనేది ద్రావిడ భాషల కుటుంబానికి చెందిన భాష.\",\n",
- " ],\n",
- " \"input_type\": \"search_document\",\n",
- " \"truncate\": \"NONE\",\n",
- "}\n",
- "embeddings = model.get_text_embedding(coherePayload)\n",
+ "model = BedrockEmbedding(model=\"cohere.embed-multilingual-v3\")\n",
+ "coherePayload = [\n",
+ " \"This is a test document\",\n",
+ " \"తెలుగు అనేది ద్రావిడ భాషల కుటుంబానికి చెందిన భాష.\",\n",
+ " \"Esto es una prueba de documento multilingüe.\",\n",
+ " \"攻殻機動隊\",\n",
+ " \"Combien de temps ça va prendre ?\",\n",
+ " \"Документ проверен\",\n",
+ "]\n",
+ "embeddings = model.get_text_embedding_batch(coherePayload)\n",
"print(embeddings)"
]
}
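Note on the Bedrock cells above: the updated code passes a plain string to `get_text_embedding` and a list of strings to `get_text_embedding_batch`, instead of a provider-specific dict payload. The sketch below shows only the call shapes, using a stub so it runs without AWS credentials. `StubEmbedding` is hypothetical and its vectors are meaningless hash bytes; it is not `BedrockEmbedding`.

```python
import hashlib
from typing import List


class StubEmbedding:
    """Hypothetical stand-in with the same two method names (illustration only)."""

    def get_text_embedding(self, text: str) -> List[float]:
        # Derive a tiny deterministic vector from the text; not a real embedding.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[:4]]

    def get_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in the same order as the inputs.
        return [self.get_text_embedding(text) for text in texts]


model = StubEmbedding()
payload = ["This is a test document", "This is another test document"]

single = model.get_text_embedding(payload[0])
batch = model.get_text_embedding_batch(payload)
print(len(single), len(batch))
```

In this stub the batch call is just a loop over the single call; a real client may instead send the texts in grouped requests.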
diff --git a/docs/examples/evaluation/UpTrain.ipynb b/docs/examples/evaluation/UpTrain.ipynb
index 6cb85c2340c9e..ba718a41234dc 100644
--- a/docs/examples/evaluation/UpTrain.ipynb
+++ b/docs/examples/evaluation/UpTrain.ipynb
@@ -21,7 +21,7 @@
"id": "0958c248",
"metadata": {},
"source": [
- "**Overview**: In this example, we will see how to use UpTrain with LlamaIndex. "
+ "**Overview**: In this example, we will see how to use UpTrain with LlamaIndex. UpTrain ([github](https://github.com/uptrain-ai/uptrain) || [website](https://github.com/uptrain-ai/uptrain/) || [docs](https://docs.uptrain.ai/)) is an open-source platform to evaluate and improve GenAI applications. It provides grades for 20+ preconfigured checks (covering language, code, embedding use cases), performs root cause analysis on failure cases and gives insights on how to resolve them. More details on UpTrain's evaluations can be found [here](https://github.com/uptrain-ai/uptrain?tab=readme-ov-file#pre-built-evaluations-we-offer-).\n"
]
},
{
@@ -49,12 +49,25 @@
"id": "0b101745",
"metadata": {},
"source": [
- "## Install UpTrain and LlamaIndex\n",
- "\n",
- "\n",
- "```bash\n",
- "pip install uptrain llama_index\n",
- "```"
+ "## Install UpTrain and LlamaIndex"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a6734276",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install -q uptrain llama-index"
]
},
{
@@ -70,14 +83,24 @@
"execution_count": null,
"id": "6c6e7a1d",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/dhruvchawla/Work/llama_index/venv/lib/python3.11/site-packages/lazy_loader/__init__.py:185: RuntimeWarning: subpackages can technically be lazily loaded, but it causes the package to be eagerly loaded even if it is already lazily loaded.So, you probably shouldn't use subpackages with this lazy feature.\n",
+ " warnings.warn(msg, RuntimeWarning)\n"
+ ]
+ }
+ ],
"source": [
+ "import httpx\n",
"import os\n",
"import openai\n",
"import pandas as pd\n",
"\n",
- "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
- "from uptrain import Evals, EvalLlamaIndex, Settings"
+ "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings\n",
+ "from uptrain import Evals, EvalLlamaIndex, Settings as UpTrainSettings"
]
},
{
@@ -103,8 +126,6 @@
"dataset_path = os.path.join(\"./nyc_wikipedia\", \"nyc_text.txt\")\n",
"\n",
"if not os.path.exists(dataset_path):\n",
- " import httpx\n",
- "\n",
" r = httpx.get(url)\n",
" with open(dataset_path, \"wb\") as f:\n",
" f.write(r.content)"
@@ -176,8 +197,6 @@
"metadata": {},
"outputs": [],
"source": [
- "from llama_index.core import Settings\n",
- "\n",
"Settings.chunk_size = 512\n",
"\n",
"documents = SimpleDirectoryReader(\"./nyc_wikipedia/\").load_data()\n",
@@ -204,7 +223,7 @@
"metadata": {},
"outputs": [],
"source": [
- "settings = Settings(\n",
+ "settings = UpTrainSettings(\n",
" openai_api_key=openai.api_key,\n",
")"
]
@@ -306,101 +325,101 @@
"      <th>0</th>\n",
"      <td>What is the population of New York City?</td>\n",
"      <td>The population of New York City is 8,804,190 a...</td>\n",
- "      <td>New York, often called New York City or NYC, i...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question is asking for the population of N...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the population of New Yo...</td>\n",
+ "      <td>=== Population density ===\\n\\nIn 2020, the cit...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>What is the area of New York City?</td>\n",
"      <td>New York City has a total area of 468.484 squa...</td>\n",
- "      <td>New York, often called New York City or NYC, i...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>Step 1: The question asks for the area of New ...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the area of New York Cit...</td>\n",
+ "      <td>Some of the natural relief in topography has b...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>What is the largest borough in New York City?</td>\n",
"      <td>Queens is the largest borough in New York City.</td>\n",
"      <td>==== Brooklyn ====\\nBrooklyn (Kings County), o...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question is asking for the largest borough...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the largest borough in N...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>What is the average temperature in New York City?</td>\n",
- "      <td>The average temperature in New York City is 57...</td>\n",
+ "      <td>The average temperature in New York City is 33...</td>\n",
"      <td>Similarly, readings of 0 °F (−18 °C) are also ...</td>\n",
- "      <td>0.5</td>\n",
- "      <td>The question is asking for the average tempera...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the average temperature ...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>What is the main airport in New York City?</td>\n",
- "      <td>The main airport in New York City is John F. K...</td>\n",
+ "      <td>John F. Kennedy International Airport</td>\n",
"      <td>along the Northeast Corridor, and long-distanc...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question is asking for the main airport in...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the main airport in New ...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>What is the famous landmark in New York City?</td>\n",
- "      <td>The famous landmark in New York City is the Em...</td>\n",
- "      <td>A record 66.6 million tourists visited New Yor...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question is asking for the famous landmark...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the famous landmark in N...</td>\n",
+ "      <td>The famous landmark in New York City is the St...</td>\n",
+ "      <td>The settlement was named New Amsterdam (Dutch:...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>6</th>\n",
"      <td>What is the official language of New York City?</td>\n",
- "      <td>The official language of New York City is not ...</td>\n",
+ "      <td>As many as 800 languages are spoken in New Yor...</td>\n",
"      <td>=== Accent and dialect ===\\n\\nThe New York are...</td>\n",
- "      <td>0.0</td>\n",
- "      <td>The question is asking for the official langua...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the official language of...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>7</th>\n",
"      <td>What is the currency used in New York City?</td>\n",
- "      <td>The currency used in New York City is the Unit...</td>\n",
+ "      <td>The currency used in New York City is the US D...</td>\n",
"      <td>=== Real estate ===\\n\\nReal estate is a major ...</td>\n",
- "      <td>0.0</td>\n",
- "      <td>The question is asking for the currency used i...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks specifically for the currenc...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>8</th>\n",
"      <td>What is the time zone of New York City?</td>\n",
"      <td>Eastern Standard Time (EST)</td>\n",
"      <td>According to the New York City Comptroller, wo...</td>\n",
- "      <td>0.0</td>\n",
- "      <td>The question is \"What is the time zone of New ...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the time zone of New Yor...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>9</th>\n",
"      <td>What is the famous sports team in New York City?</td>\n",
"      <td>The famous sports team in New York City is the...</td>\n",
- "      <td>==== Baseball ====\\nNew York has been describe...</td>\n",
- "      <td>0.5</td>\n",
- "      <td>The question is asking for the famous sports t...</td>\n",
- "      <td>1.0</td>\n",
- "      <td>The question asks for the famous sports team i...</td>\n",
+ "      <td>==== Soccer ====\\nIn soccer, New York City is ...</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
+ "      <td>None</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
@@ -423,61 +442,49 @@
"0 The population of New York City is 8,804,190 a... \n",
"1 New York City has a total area of 468.484 squa... \n",
"2 Queens is the largest borough in New York City. \n",
- "3 The average temperature in New York City is 57... \n",
- "4 The main airport in New York City is John F. K... \n",
- "5 The famous landmark in New York City is the Em... \n",
- "6 The official language of New York City is not ... \n",
- "7 The currency used in New York City is the Unit... \n",
+ "3 The average temperature in New York City is 33... \n",
+ "4 John F. Kennedy International Airport \n",
+ "5 The famous landmark in New York City is the St... \n",
+ "6 As many as 800 languages are spoken in New Yor... \n",
+ "7 The currency used in New York City is the US D... \n",
"8 Eastern Standard Time (EST) \n",
"9 The famous sports team in New York City is the... \n",
"\n",
- " context score_context_relevance \\\n",
- "0 New York, often called New York City or NYC, i... 1.0 \n",
- "1 New York, often called New York City or NYC, i... 1.0 \n",
- "2 ==== Brooklyn ====\\nBrooklyn (Kings County), o... 1.0 \n",
- "3 Similarly, readings of 0 °F (−18 °C) are also ... 0.5 \n",
- "4 along the Northeast Corridor, and long-distanc... 1.0 \n",
- "5 A record 66.6 million tourists visited New Yor... 1.0 \n",
- "6 === Accent and dialect ===\\n\\nThe New York are... 0.0 \n",
- "7 === Real estate ===\\n\\nReal estate is a major ... 0.0 \n",
- "8 According to the New York City Comptroller, wo... 0.0 \n",
- "9 ==== Baseball ====\\nNew York has been describe... 0.5 \n",
- "\n",
- " explanation_context_relevance \\\n",
- "0 The question is asking for the population of N... \n",
- "1 Step 1: The question asks for the area of New ... \n",
- "2 The question is asking for the largest borough... \n",
- "3 The question is asking for the average tempera... \n",
- "4 The question is asking for the main airport in... \n",
- "5 The question is asking for the famous landmark... \n",
- "6 The question is asking for the official langua... \n",
- "7 The question is asking for the currency used i... \n",
- "8 The question is \"What is the time zone of New ... \n",
- "9 The question is asking for the famous sports t... \n",
+ " context score_context_relevance \\\n",
+ "0 === Population density ===\\n\\nIn 2020, the cit... None \n",
+ "1 Some of the natural relief in topography has b... None \n",
+ "2 ==== Brooklyn ====\\nBrooklyn (Kings County), o... None \n",
+ "3 Similarly, readings of 0 °F (−18 °C) are also ... None \n",
+ "4 along the Northeast Corridor, and long-distanc... None \n",
+ "5 The settlement was named New Amsterdam (Dutch:... None \n",
+ "6 === Accent and dialect ===\\n\\nThe New York are... None \n",
+ "7 === Real estate ===\\n\\nReal estate is a major ... None \n",
+ "8 According to the New York City Comptroller, wo... None \n",
+ "9 ==== Soccer ====\\nIn soccer, New York City is ... None \n",
"\n",
- " score_response_conciseness \\\n",
- "0 1.0 \n",
- "1 1.0 \n",
- "2 1.0 \n",
- "3 1.0 \n",
- "4 1.0 \n",
- "5 1.0 \n",
- "6 1.0 \n",
- "7 1.0 \n",
- "8 1.0 \n",
- "9 1.0 \n",
+ " explanation_context_relevance score_response_conciseness \\\n",
+ "0 None None \n",
+ "1 None None \n",
+ "2 None None \n",
+ "3 None None \n",
+ "4 None None \n",
+ "5 None None \n",
+ "6 None None \n",
+ "7 None None \n",
+ "8 None None \n",
+ "9 None None \n",
"\n",
- " explanation_response_conciseness \n",
- "0 The question asks for the population of New Yo... \n",
- "1 The question asks for the area of New York Cit... \n",
- "2 The question asks for the largest borough in N... \n",
- "3 The question asks for the average temperature ... \n",
- "4 The question asks for the main airport in New ... \n",
- "5 The question asks for the famous landmark in N... \n",
- "6 The question asks for the official language of... \n",
- "7 The question asks specifically for the currenc... \n",
- "8 The question asks for the time zone of New Yor... \n",
- "9 The question asks for the famous sports team i... "
+ " explanation_response_conciseness \n",
+ "0 None \n",
+ "1 None \n",
+ "2 None \n",
+ "3 None \n",
+ "4 None \n",
+ "5 None \n",
+ "6 None \n",
+ "7 None \n",
+ "8 None \n",
+ "9 None "
]
},
"execution_count": null,
@@ -530,7 +537,7 @@
"UPTRAIN_API_KEY = \"up-**********************\" # your UpTrain API key\n",
"\n",
"# We use `uptrain_access_token` parameter instead of 'openai_api_key' in settings in this case\n",
- "settings = Settings(\n",
+ "settings = UpTrainSettings(\n",
" uptrain_access_token=UPTRAIN_API_KEY,\n",
")"
]
diff --git a/docs/examples/finetuning/embeddings/BUILD b/docs/examples/finetuning/embeddings/BUILD
new file mode 100644
index 0000000000000..db46e8d6c978c
--- /dev/null
+++ b/docs/examples/finetuning/embeddings/BUILD
@@ -0,0 +1 @@
+python_sources()
diff --git a/docs/examples/llama_hub/llama_pack_ollama.ipynb b/docs/examples/llama_hub/llama_pack_ollama.ipynb
index 630e0fa88791b..73bf384ef4094 100644
--- a/docs/examples/llama_hub/llama_pack_ollama.ipynb
+++ b/docs/examples/llama_hub/llama_pack_ollama.ipynb
@@ -123,8 +123,6 @@
"metadata": {},
"outputs": [],
"source": [
- "from ollama_pack.base import OllamaQueryEnginePack\n",
- "\n",
"# You can use any llama-hub loader to get documents!\n",
"ollama_pack = OllamaQueryEnginePack(model=\"llama2\", documents=documents)"
]
diff --git a/docs/examples/llm/anthropic.ipynb b/docs/examples/llm/anthropic.ipynb
index b4bc70c55886d..7ae2e54511a76 100644
--- a/docs/examples/llm/anthropic.ipynb
+++ b/docs/examples/llm/anthropic.ipynb
@@ -13,7 +13,13 @@
"id": "72ed6f61-28a7-4f90-8a45-e3f452f95dbd",
"metadata": {},
"source": [
- "# Anthropic"
+ "# Anthropic\n",
+ "\n",
+ "Anthropic has recently released its latest models: `Claude 3 Opus`, `Claude 3 Sonnet`, and `Claude 3 Haiku` (which will be available soon). By default, the `claude-2.1 model` is used. This notebook provides guidance on how to utilize these new models.\n",
+ "\n",
+ "1. Claude 3 Opus - claude-3-opus-20240229\n",
+ "2. Claude 3 Sonnet\t- claude-3-sonnet-20240229\n",
+ "3. Claude 3 Haiku - Coming soon"
]
},
{
@@ -44,6 +50,32 @@
"!pip install llama-index"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "3cbf8694-ad53-459a-84c1-64de2dadeaf5",
+ "metadata": {},
+ "source": [
+ "#### Set Tokenizer\n",
+ "\n",
+ "First we want to set the tokenizer, which is slightly different than TikToken.\n",
+ "\n",
+ "**NOTE**: The Claude 3 tokenizer has not been updated yet; using the existing Anthropic tokenizer leads to context overflow errors for 200k tokens. We've temporarily set the max tokens for Claude 3 to 180k."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c6ac37cb-b588-44c7-8fd9-8eab454900a5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_index.llms.anthropic import Anthropic\n",
+ "from llama_index.core import Settings\n",
+ "\n",
+ "tokenizer = Anthropic().tokenizer\n",
+ "Settings.tokenizer = tokenizer"
+ ]
+ },
{
"cell_type": "markdown",
"id": "b81a3ef6-2ee5-460d-9aa4-f73708774014",
@@ -52,6 +84,18 @@
"#### Call `complete` with a prompt"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85fbba23",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ[\"ANTHROPIC_API_KEY\"] = \"YOUR ANTHROPIC API KEY\""
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -64,7 +108,7 @@
"# To customize your API key, do this\n",
"# otherwise it will lookup ANTHROPIC_API_KEY from your env variable\n",
"# llm = Anthropic(api_key=\"\")\n",
- "llm = Anthropic()\n",
+ "llm = Anthropic(model=\"claude-3-opus-20240229\")\n",
"\n",
"resp = llm.complete(\"Paul Graham is \")"
]
@@ -79,21 +123,21 @@
"name": "stdout",
"output_type": "stream",
"text": [
- " Here are some key facts about Paul Graham:\n",
+ "Paul Graham is a well-known entrepreneur, programmer, venture capitalist, and essayist. He is best known for co-founding Viaweb, one of the first web application companies, which was later sold to Yahoo! in 1998 and became Yahoo! Store. Graham is also the co-founder of Y Combinator, a highly successful startup accelerator that has helped launch numerous successful companies, such as Dropbox, Airbnb, and Reddit.\n",
"\n",
- "- Paul Graham is an American computer scientist, venture capitalist, and essayist. He is known for co-founding Viaweb, one of the first web-based application companies, which was acquired by Yahoo in 1998.\n",
+ "Some key points about Paul Graham:\n",
"\n",
- "- In 1995, Graham co-founded Viaweb with Robert Morris, Trevor Blackwell, and Jessica Livingston. The company helped popularize the business model of applying software as a service.\n",
+ "1. Programming: Graham is a skilled programmer and has written extensively on the subject, including his book \"Hackers & Painters: Big Ideas from the Computer Age.\"\n",
"\n",
- "- After selling Viaweb to Yahoo, Graham became a venture capitalist. He co-founded Y Combinator in 2005 with Jessica Livingston, Trevor Blackwell, and Robert Morris. Y Combinator is an influential startup accelerator that provides seed funding and advice to startups.\n",
+ "2. Essays: He is a prolific essayist, writing on various topics related to technology, startups, and entrepreneurship. His essays have been influential in the tech startup community.\n",
"\n",
- "- Graham has written several influential essays on startups, technology, and programming. Some of his most well-known essays include \"How to Start a Startup\", \"Do Things that Don't Scale\", and \"Beating the Averages\" about Lisp programming. \n",
+ "3. Lisp: Graham is an advocate for the Lisp programming language and has written several essays on its advantages.\n",
"\n",
- "- He pioneered the concept of using online essays to attract startup founders to apply to Y Combinator's program. His essays are often required reading in Silicon Valley.\n",
+ "4. Y Combinator: As a co-founder of Y Combinator, Graham has played a significant role in shaping the startup ecosystem and has mentored and invested in numerous successful companies.\n",
"\n",
- "- Graham has a Bachelor's degree in philosophy from Cornell University and a PhD in computer science from Harvard University. His doctoral thesis focused on Lisp compilers.\n",
+ "5. Wealth and inequality: In recent years, Graham has written about income inequality and the concentration of wealth, sparking discussions and debates within the tech community.\n",
"\n",
- "- He is considered an influential figure in the tech and startup worlds, known for his insights on startups, programming languages, and technology trends. His writings have shaped the strategies of many founders building startups.\n"
+ "Overall, Paul Graham is a significant figure in the technology and startup world, known for his contributions as a programmer, investor, and thought leader.\n"
]
}
],
@@ -125,7 +169,7 @@
" ),\n",
" ChatMessage(role=\"user\", content=\"Tell me a story\"),\n",
"]\n",
- "resp = Anthropic().chat(messages)"
+ "resp = Anthropic(model=\"claude-3-opus-20240229\").chat(messages)"
]
},
{
@@ -138,19 +182,19 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "assistant: Here is a fun pirate story for you:\n",
+ "assistant: *clears throat and speaks in a pirate accent* Aye, gather 'round me hearties and I'll spin ye a yarn of adventure on the high seas!\n",
"\n",
- "Yarrr matey! Me name be Captain Redbeard, the most fearsome pirate to sail the seven seas. I be the captain of the good ship Salty Dog, and we be lookin' fer treasure! \n",
+ "T'was a dark and stormy night when the Black Pearl set sail from Tortuga. The salty sea spray stung me eyes as I stood at the helm, guidin' me beloved ship through the roilin' waves. Me loyal crew scurried about, securin' the riggin' and battening down the hatches. \n",
"\n",
- "I lost me leg in a battle with the evil Captain Bluebeard years ago. That scallywag got the better of me that time, but I'll have me revenge! Now I got me a peg leg that I can use to stomp the deck or kick me enemies right in the rear! \n",
+ "Suddenly, the lookout cried \"Ship ahoy!\" and pointed off the starboard bow. I raised me spyglass and spied a Spanish galleon, her decks heavily laden with treasure. The crew gave a hearty cheer - we'd be feastin' and drinkin' well tonight!\n",
"\n",
- "Me first mate Scurvy Sam be my best friend. We go way back to when we were just lads dreamin' of a pirate's life. He may only have one good eye after losin' the other one to a seagull, but he can still spot treasure from a league away! \n",
+ "I ordered the crew to ready the cannons as we drew alongside the galleon. \"Fire all!\" I bellowed and the Pearl shook as the guns unleashed a barrage. The Spaniards returned fire but they were no match for me skilled gunners.\n",
"\n",
- "Today we be sailin' for the fabled Treasure Island, in search of the loot buried long ago by the notorious Captain Flint. Flint was the most ruthless pirate ever to live, but he buried his treasure and no one ever found it. But I have a map, given to me by a dying sailor. I just know it'll lead us right to Flint's trove of rubies, diamonds and mountains of gold! \n",
+ "We boarded the galleon, swords flashin' and pistols blazin'. The fight was fast and bloody but in the end, the Pearl was victorious! We claimed the treasure as our own - mountains of gold and jewels glintin' in the moonlight.\n",
"\n",
- "It won't be easy. We may have to fight off Flint's ghost, or deal with tribes of cannibals, or outwit double-crossing thieves. But that's all part of a pirate's life! And when we finally get our hands on that treasure, we'll live like kings. We'll party all night and sleep all day in our fancy pirate cove. \n",
+ "As we sailed away, I couldn't help but grin. T'was a fine night of piratin' and I knew many more adventures lay ahead for me and me crew. No matter the danger, the Black Pearl would always prevail! Yo ho ho!\n",
"\n",
- "So hoist the mainsail me hearties, and let's set sail for adventure! Keep a weather eye on the horizon, mateys. Treasure awaits!\n"
+ "*laughs heartily* And that, me friends, is a taste of the pirate's life. May yer sails always be full and yer horizons bright. Fare thee well!\n"
]
}
],
@@ -183,7 +227,7 @@
"source": [
"from llama_index.llms.anthropic import Anthropic\n",
"\n",
- "llm = Anthropic()\n",
+ "llm = Anthropic(model=\"claude-3-opus-20240229\", max_tokens=100)\n",
"resp = llm.stream_complete(\"Paul Graham is \")"
]
},
@@ -197,21 +241,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
- " Here are some key points about Paul Graham:\n",
- "\n",
- "- Paul Graham is an American computer scientist, venture capitalist, and essayist. He is known for co-founding Viaweb, one of the first web-based applications, which was acquired by Yahoo in 1998.\n",
- "\n",
- "- In 2005, Graham co-founded Y Combinator, a startup accelerator that provides seed funding and advice to startups. Y Combinator has backed over 2000 companies including Dropbox, Airbnb, Stripe, and Reddit. \n",
- "\n",
- "- Graham has written extensively about startups, programming, and technology. Some of his most popular essays include \"How to Start a Startup\", \"The Age of the Essay\", and \"Beating the Averages\" about his experiences with Viaweb.\n",
+ "Paul Graham is a well-known entrepreneur, programmer, venture capitalist, and essayist. He is best known for co-founding Viaweb, one of the first web application companies, which was later sold to Yahoo! in 1998 and became Yahoo! Store. \n",
"\n",
- "- As an essayist, Graham has a very analytical and insightful writing style. He is skilled at breaking down complex concepts and explaining ideas clearly. His essays cover a wide range of topics including startups, programming, economics, and philosophy.\n",
- "\n",
- "- In addition to his work with startups, Graham previously worked as a programmer at Yahoo and was also a professor of computer science at Harvard University. He studied mathematics at Cornell University and obtained a PhD in Computer Science from Harvard.\n",
- "\n",
- "- Graham has advocated for funding and supporting startup founders who may lack traditional credentials like college degrees. He has argued that intelligence, determination, and flexibility are more important than formal education for succeeding in startups.\n",
- "\n",
- "In summary, Paul Graham is a prominent figure in the tech industry known for his work with startups, programming, and influential writing and perspectives on technology. His ideas have had a major impact on the startup ecosystem."
+ "After the sale of Viaweb, Graham and his wife Jessica Livingston co-founded Y Combinator in 2005, a highly successful startup accelerator that has helped launch"
]
}
],
@@ -229,7 +261,7 @@
"source": [
"from llama_index.llms.anthropic import Anthropic\n",
"\n",
- "llm = Anthropic()\n",
+ "llm = Anthropic(model=\"claude-3-opus-20240229\")\n",
"messages = [\n",
" ChatMessage(\n",
" role=\"system\", content=\"You are a pirate with a colorful personality\"\n",
@@ -249,15 +281,23 @@
"name": "stdout",
"output_type": "stream",
"text": [
- " Here is a fun pirate story for you:\n",
+ "*clears throat and speaks in a gruff, piratey voice* \n",
+ "\n",
+ "Aye, gather 'round me hearties and I'll spin ye a yarn of adventure on the high seas! \n",
+ "\n",
+ "'Twas a dark and stormy night, the kind where the wind howls like a banshee and the waves crash over the deck. Me and me crew were sailin' the Caribbean, searchin' for treasure and glory.\n",
"\n",
- "Yarrr matey! Me name be Captain Redbeard, the most fearsome pirate to sail the seven seas. I be the captain of the good ship Salty Dog, and we be lookin' fer treasure! \n",
+ "Suddenly, the lookout cried \"Ship ahoy!\" and sure enough, a Spanish galleon was bearin' down on us, her decks bristlin' with cannons. The scurvy dogs wanted our gold, but I'd sooner walk the plank than surrender!\n",
"\n",
- "I lost me leg in a battle with the evil Captain Bluebeard years ago. That scallywag got the better of me that time, but I'll have me revenge! Now I got me a peg leg that I can use to kick me enemies right in the behind! Har har!\n",
+ "\"All hands to battle stations!\" I bellowed. \"Ready the cannons and prepare to board!\" \n",
"\n",
- "Just last week me crew and I found a map leading to the lost treasure of the island of Rundoon. We set sail right away, braving storms and sea creatures the size of ships! When we got to the island, it were guarded by angry natives with spears and poison darts. Me crew fought 'em off while I snuck into the temple and grabbed the treasure chest.\n",
+ "A mighty battle erupted, cannons boomin' and swords clashin'. We swung over on ropes and fought the Spaniards hand-to-hand on the pitchin' and rollin' deck. Me cutlass was a blur as I dueled their captain, a big brute with a wicked scar.\n",
"\n",
- "Now we be rich with dubloons and jewels! I plan to stash me loot on a remote island, then find a tavern and drink grog until I can't stand up straight. Being a pirate captain be a tough life, but someone's got to sail the high seas in search of adventure! Maybe one day I'll get enough treasure to retire and open up a little beach shack...but probably not, cause I love me pirate life too much! Har har har!"
+ "Finally, I drove me blade into that bilge rat's black heart and he fell dead at me feet. His crew surrendered and we took their ship as a prize. In the hold, we found chests overflowing with gold doubloons and jewels - a king's ransom! \n",
+ "\n",
+ "We sailed off into the sunset, our pirate flag snappin' in the breeze, flush with coin and the thrill of victory. And that, me buckos, is a taste of the pirate life! Now who wants some grog?\n",
+ "\n",
+ "*laughs heartily*"
]
}
],
@@ -283,7 +323,7 @@
"source": [
"from llama_index.llms.anthropic import Anthropic\n",
"\n",
- "llm = Anthropic(model=\"claude-instant-1\")"
+ "llm = Anthropic(model=\"claude-3-sonnet-20240229\")"
]
},
{
@@ -306,23 +346,21 @@
"name": "stdout",
"output_type": "stream",
"text": [
- " Here are a few key facts about Paul Graham:\n",
- "\n",
- "- Paul Graham is an American computer scientist, venture capitalist, and essayist. He is known for co-founding Viaweb, one of the first web-based application companies, which was acquired by Yahoo in 1998.\n",
+ "Paul Graham is a computer scientist, entrepreneur, venture capitalist, and author. He is best known for the following:\n",
"\n",
- "- In 2005, Graham co-founded Y Combinator, a startup accelerator that provides seed funding and advice to startups. Y Combinator has backed over 3,000 startups including Dropbox, Airbnb, Stripe, and Reddit. \n",
+ "1. Co-founding Y Combinator: Y Combinator is a prominent startup accelerator based in Silicon Valley. It has funded and helped launch thousands of startups, including Airbnb, Dropbox, Stripe, and Reddit.\n",
"\n",
- "- Graham has written several influential essays on startups, programming languages, and other technology topics. Some of his most well-known essays include \"Beating the Averages\", \"The Refragmentation\", and \"How to Start a Startup\".\n",
+ "2. Writing essays on startups and technology: Graham has written numerous influential essays on topics related to startups, programming, and entrepreneurship. His essays are widely read and have helped shape the thinking of many entrepreneurs and technologists.\n",
"\n",
- "- He pioneered and popularized the idea of using Lisp as a web programming language via his company Viaweb. This helped inspire interest in functional programming languages for web development.\n",
+ "3. Developing the programming language Arc: In the early 2000s, Graham developed a new programming language called Arc, which was designed to be a more powerful and expressive dialect of Lisp.\n",
"\n",
- "- Graham has a Bachelor's degree in philosophy from Cornell University and a PhD in computer science from Harvard University. \n",
+ "4. Advocating for the use of Lisp and functional programming: Graham is a strong proponent of the Lisp programming language and functional programming paradigms. He has written extensively about the benefits of these approaches and has influenced many programmers to explore them.\n",
"\n",
- "- He was inducted into the American Academy of Arts and Sciences in 2020 for his contributions to computer science and entrepreneurship.\n",
+ "5. Authoring books: Graham has authored several books, including \"Hackers & Painters: Big Ideas from the Computer Age\" (2004), \"On Lisp\" (1993), and \"ANSI Common Lisp\" (1995).\n",
"\n",
- "- In addition to his work in technology and startups, Graham is also known for his essays on topics like education, productivity, and economics. Many consider him an influential writer and thinker in the tech industry.\n",
+ "6. Investing in startups: Through Y Combinator and his own investments, Graham has invested in and advised numerous successful startups, helping to shape the technology industry.\n",
"\n",
- "In summary, Paul Graham is a prominent computer scientist, entrepreneur, investor and writer who has made significant contributions to the web, startups and programming languages. He continues to share his insights through his writings and his work with Y Combinator."
+ "Overall, Paul Graham is widely respected in the technology and startup communities for his contributions as a programmer, writer, investor, and advocate for innovative ideas and approaches."
]
}
],
@@ -348,7 +386,7 @@
"source": [
"from llama_index.llms.anthropic import Anthropic\n",
"\n",
- "llm = Anthropic()\n",
+ "llm = Anthropic(\"claude-3-sonnet-20240229\")\n",
"resp = await llm.acomplete(\"Paul Graham is \")"
]
},
@@ -362,21 +400,19 @@
"name": "stdout",
"output_type": "stream",
"text": [
- " Here are some key facts about Paul Graham:\n",
+ "Paul Graham is a computer scientist, entrepreneur, venture capitalist, and author. He is best known for the following:\n",
"\n",
- "- Paul Graham is an American computer scientist, venture capitalist, and essayist. He is known for co-founding Viaweb, one of the first web-based application companies, which was acquired by Yahoo in 1998.\n",
+ "1. Co-founding Y Combinator: Y Combinator is a prominent startup accelerator based in Silicon Valley. It has funded and helped launch many successful startups, including Airbnb, Dropbox, Stripe, and Reddit.\n",
"\n",
- "- In 1995, Graham co-founded Viaweb with Robert Morris, Trevor Blackwell, and Jessica Livingston. The company helped popularize the business model of applying software as a service.\n",
+ "2. Writing essays on startups and technology: Graham has written numerous influential essays on topics related to startups, programming, and entrepreneurship. His essays are widely read and have helped shape the thinking of many entrepreneurs and technologists.\n",
"\n",
- "- After selling Viaweb to Yahoo, Graham became a venture capitalist. He co-founded Y Combinator in 2005 with Jessica Livingston, Trevor Blackwell, and Robert Morris. Y Combinator is an influential startup accelerator that provides seed funding and advice to startups.\n",
+ "3. Developing the programming language Arc: Graham designed and developed the programming language Arc, which was intended to be a more powerful and expressive dialect of Lisp.\n",
"\n",
- "- Graham has written several influential essays on startups, technology, and programming. Some of his most well-known essays include \"How to Start a Startup\", \"Do Things that Don't Scale\", and \"Beating the Averages\" about Lisp programming. \n",
+ "4. Authoring books: He has written several books, including \"Hackers & Painters: Big Ideas from the Computer Age,\" \"ANSI Common Lisp,\" and \"On Lisp.\"\n",
"\n",
- "- He pioneered the concept of using online essays to attract startup founders to apply to Y Combinator's program. His essays are often required reading in Silicon Valley.\n",
+ "5. Founding Viaweb: In the 1990s, Graham co-founded Viaweb, one of the earliest web-based application software companies. Viaweb was later acquired by Yahoo! in 1998.\n",
"\n",
- "- Graham has a Bachelor's degree in philosophy from Cornell University and a PhD in computer science from Harvard University. His doctoral thesis focused on Lisp compilers.\n",
- "\n",
- "- He is considered an influential figure in the tech and startup worlds, known for his insights on startups, programming languages, and technology trends. His writings have shaped the strategies of many founders building startups.\n"
+ "Graham is widely respected in the technology and startup communities for his insights, writings, and contributions to the field of computer science and entrepreneurship.\n"
]
}
],
@@ -401,6 +437,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
+ }
}
},
"nbformat": 4,
diff --git a/docs/examples/multi_modal/anthropic_multi_modal.ipynb b/docs/examples/multi_modal/anthropic_multi_modal.ipynb
new file mode 100644
index 0000000000000..21d78461bca1e
--- /dev/null
+++ b/docs/examples/multi_modal/anthropic_multi_modal.ipynb
@@ -0,0 +1,621 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "368686b4-f487-4dd4-aeff-37823976529d",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "# Multi-Modal LLM using Anthropic model for image reasoning\n",
+ "\n",
+ "Anthropic has recently released its latest Multi modal models: Claude 3 Opus, Claude 3 Sonnet.\n",
+ "\n",
+ "1. Claude 3 Opus - claude-3-opus-20240229\n",
+ "\n",
+ "2. Claude 3 Sonnet - claude-3-sonnet-20240229\n",
+ "\n",
+ "In this notebook, we show how to use Anthropic MultiModal LLM class/abstraction for image understanding/reasoning.\n",
+ "\n",
+ "We also show several functions we are now supporting for Anthropic MultiModal LLM:\n",
+ "* `complete` (both sync and async): for a single prompt and list of images\n",
+ "* `chat` (both sync and async): for multiple chat messages\n",
+ "* `stream complete` (both sync and async): for steaming output of complete\n",
+ "* `stream chat` (both sync and async): for steaming output of chat"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "396d319e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install llama-index-multi-modal-llms-anthropic\n",
+ "!pip install llama-index-vector-stores-qdrant\n",
+ "!pip install matplotlib"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4479bf64",
+ "metadata": {},
+ "source": [
+ "## Use Anthropic to understand Images from Local directory"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5455d8c6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ[\"ANTHROPIC_API_KEY\"] = \"\" # Your ANTHROPIC API key here"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4990a807",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "