
Commit 44f1b7a

Merge branch 'run-llama:main' into master
raghavdixit99 authored Mar 6, 2024
2 parents 8b5fd2f + dca3099 commit 44f1b7a
Showing 418 changed files with 11,451 additions and 1,878 deletions.
1 change: 1 addition & 0 deletions .github/workflows/publish_release.yml
@@ -14,6 +14,7 @@ env:
 jobs:
   build-n-publish:
     name: Build and publish to PyPI
+    if: github.repository == 'run-llama/llama_index'
     runs-on: ubuntu-latest
 
     steps:
43 changes: 43 additions & 0 deletions .github/workflows/publish_sub_package.yml
@@ -0,0 +1,43 @@
name: Publish Sub-Package to PyPI if Needed

on:
  push:
    branches:
      - main

env:
  POETRY_VERSION: "1.6.1"
  PYTHON_VERSION: "3.10"

jobs:
  publish_subpackage_if_needed:
    if: github.repository == 'run-llama/llama_index'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Set up python ${{ env.PYTHON_VERSION }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}
      - name: Get changed pyproject files
        id: changed-files
        run: |
          echo "changed_files=$(git diff --name-only ${{ github.event.before }} ${{ github.event.after }} | grep -v llama-index-core | grep llama-index | grep pyproject | xargs)" >> $GITHUB_OUTPUT
      - name: Publish changed packages
        env:
          PYPI_TOKEN: ${{ secrets.LLAMA_INDEX_PYPI_TOKEN }}
        run: |
          for file in ${{ steps.changed-files.outputs.changed_files }}; do
            cd `echo $file | sed 's/\/pyproject.toml//g'`
            poetry lock
            pip install -e .
            poetry config pypi-token.pypi $PYPI_TOKEN
            poetry publish --build
            cd -
          done
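
The publish loop above is driven entirely by the "Get changed pyproject files" step. To sanity-check locally which sub-packages a given push would trigger, the same filter can be reproduced in plain Python — a minimal sketch, assuming git is on PATH and the before/after commit SHAs are supplied by hand (the SHAs below are just this commit's two parents, for illustration):

import subprocess

def changed_subpackages(before: str, after: str) -> list[str]:
    """Mirror the workflow's filter: changed pyproject.toml files under
    llama-index-* packages, excluding llama-index-core."""
    files = subprocess.run(
        ["git", "diff", "--name-only", before, after],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return [
        path for path in files
        if "pyproject" in path
        and "llama-index" in path
        and "llama-index-core" not in path
    ]

# Print the package directories the workflow's publish loop would cd into.
for path in changed_subpackages("8b5fd2f", "dca3099"):
    print(path.rsplit("/pyproject.toml", 1)[0])

Note that this only works if both commits exist in local history — the same reason the workflow checks out with fetch-depth: 0.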
38 changes: 38 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,43 @@
# ChangeLog

## [0.10.16] - 2024-03-05

### New Features

- Anthropic support for new models (#11623, #11612)
- Easier creation of chat prompts (#11583)
- Added a raptor retriever llama-pack (#11527)
- Improved batch cohere embeddings through bedrock (#11572)
- Added support for vertex AI embeddings (#11561)

### Bug Fixes / Nits

- Ensure order in async embeddings generation (#11562)
- Fixed empty metadata for csv reader (#11563)
- Serializable fix for composable retrievers (#11617)
- Fixed milvus metadata filter support (#11566)
- Fixed pydantic import in clickhouse vector store (#11631)
- Fixed system prompts for gemini/vertex-gemini (#11511)

## [0.10.15] - 2024-03-01

### New Features

- Added FeishuWikiReader (#11491)
- Added videodb retriever integration (#11463)
- Added async to opensearch vector store (#11513)
- New LangFuse one-click callback handler (#11324)

### Bug Fixes / Nits

- Fixed deadlock issue with async chat streaming (#11548)
- Improved hidden file check in SimpleDirectoryReader (#11496)
- Fixed null values in document metadata when using SimpleDirectoryReader (#11501)
- Fix for sqlite utils in jsonalyze query engine (#11519)
- Added base url and timeout to ollama multimodal LLM (#11526)
- Updated duplicate handling in query fusion retriever (#11542)
- Fixed bug in kg index struct updating (#11475)

## [0.10.14] - 2024-02-28

### New Features
1 change: 1 addition & 0 deletions docs/BUILD
@@ -0,0 +1 @@
python_sources()
284 changes: 181 additions & 103 deletions docs/community/integrations/uptrain.md

Large diffs are not rendered by default.

280 changes: 280 additions & 0 deletions docs/cookbooks/mixedbread_reranker.ipynb
@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "964030f7-40e4-4398-a5ab-668aabcf3bad",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/cookbooks/mixedbread_reranker.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "360313ab-9393-430e-9647-e0d5545809b9",
"metadata": {},
"source": [
"# mixedbread Rerank Cookbook\n",
"\n",
"mixedbread.ai has released three fully open-source reranker models under the Apache 2.0 license. For more in-depth information, you can check out their detailed [blog post](https://www.mixedbread.ai/blog/mxbai-rerank-v1). The following are the three models:\n",
"\n",
"1. `mxbai-rerank-xsmall-v1`\n",
"2. `mxbai-rerank-base-v1`\n",
"3. `mxbai-rerank-large-v1`\n",
"\n",
"In this notebook, we'll demonstrate how to use the `mxbai-rerank-base-v1` model with the `SentenceTransformerRerank` module in LlamaIndex. This setup allows you to seamlessly swap in any reranker model of your choice using the `SentenceTransformerRerank` module to enhance your RAG pipeline."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "856ecfdc-04fa-4fe9-a81c-9a5858cd4a6d",
"metadata": {},
"source": [
"### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bfb5314f-e6c7-409c-86df-8e1a5ca59adb",
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-index\n",
"!pip install sentence-transformers"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5f5393fb-b410-4769-9380-0ef90a33b82e",
"metadata": {},
"source": [
"### Set API Keys"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9782acf-b0ab-4933-bb41-27cd2a02b5dd",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"YOUR OPENAI API KEY\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7596ddf-e1de-4098-81f3-fce504d2da94",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import (\n",
" VectorStoreIndex,\n",
" SimpleDirectoryReader,\n",
")\n",
"\n",
"from llama_index.core.postprocessor import SentenceTransformerRerank"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8011ff9c-2b82-47b4-983f-4fafc29e3127",
"metadata": {},
"source": [
"### Download Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dd335cb-900b-462f-987a-d4af2aac88fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-03-01 09:52:09-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 75042 (73K) [text/plain]\n",
"Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
"\n",
"data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.007s \n",
"\n",
"2024-03-01 09:52:09 (9.86 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
"\n"
]
}
],
"source": [
"!mkdir -p 'data/paul_graham/'\n",
"!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e482b09c-a0df-4788-a75b-a33ade7001d1",
"metadata": {},
"source": [
"### Load Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "342c91b8-301f-40ed-9d09-9acdb1bbdc44",
"metadata": {},
"outputs": [],
"source": [
"documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8afdfeb1-57ae-4d2b-ae73-683db205be32",
"metadata": {},
"source": [
"### Build Index"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47c335e9-dd4d-475c-bade-e2a588e33294",
"metadata": {},
"outputs": [],
"source": [
"index = VectorStoreIndex.from_documents(documents=documents)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f1ab8157-dbcb-4588-9b3c-5bd2fc4a721e",
"metadata": {},
"source": [
"### Define postprocessor for `mxbai-rerank-base-v1` reranker"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fcc5590-2e58-4a7e-8b18-a7153c06d1ff",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.postprocessor import SentenceTransformerRerank\n",
"\n",
"postprocessor = SentenceTransformerRerank(\n",
" model=\"mixedbread-ai/mxbai-rerank-base-v1\", top_n=2\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c7c81b0d-0449-4092-80cb-88080e69f980",
"metadata": {},
"source": [
"### Create Query Engine\n",
"\n",
"We will first retrieve 10 relevant nodes and pick top-2 nodes using the defined postprocessor."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1b23700-15ae-4f1a-9443-43eb1eecab5f",
"metadata": {},
"outputs": [],
"source": [
"query_engine = index.as_query_engine(\n",
" similarity_top_k=10,\n",
" node_postprocessors=[postprocessor],\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "93871f9c-8871-4f43-8ee9-b3ca4e403d86",
"metadata": {},
"source": [
"### Test Queries"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "658d3092-7d86-4520-83a2-c3e630dc02b6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sam Altman initially declined the offer of becoming president of Y Combinator because he wanted to start a startup focused on making nuclear reactors.\n"
]
}
],
"source": [
"response = query_engine.query(\n",
" \"Why did Sam Altman decline the offer of becoming president of Y Combinator?\",\n",
")\n",
"\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "497e715e-3f7a-4140-a3ba-34356e473702",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Paul Graham started YC because he and his partners wanted to create an investment firm where they could implement their own ideas and provide the kind of support to startups that they felt was lacking when they were founders themselves. They aimed to not only make seed investments but also assist startups with various aspects of setting up a company, similar to the help they had received from others in the past.\n"
]
}
],
"source": [
"response = query_engine.query(\n",
" \"Why did Paul Graham start YC?\",\n",
")\n",
"\n",
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
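
Condensed, the notebook's cells amount to the following short script — a sketch assuming the essay has already been downloaded to ./data/paul_graham/ and that OPENAI_API_KEY is set to a valid key:

import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import SentenceTransformerRerank

os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"  # replace with a real key

# Load the essay and build an in-memory vector index over it.
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents=documents)

# Rerank retrieved candidates with the open-source mixedbread model.
postprocessor = SentenceTransformerRerank(
    model="mixedbread-ai/mxbai-rerank-base-v1", top_n=2
)

# Retrieve 10 candidates per query, keep the top 2 after reranking.
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[postprocessor],
)

print(query_engine.query("Why did Paul Graham start YC?"))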
4 changes: 2 additions & 2 deletions docs/examples/agent/custom_agent.ipynb
@@ -79,7 +79,7 @@
     "    Task,\n",
     "    AgentChatResponse,\n",
     ")\n",
-    "from typing import Dict, Any, List, Tuple\n",
+    "from typing import Dict, Any, List, Tuple, Optional\n",
     "from llama_index.core.tools import BaseTool, QueryEngineTool\n",
     "from llama_index.core.program import LLMTextCompletionProgram\n",
     "from llama_index.core.output_parsers import PydanticOutputParser\n",
@@ -200,7 +200,7 @@
     "        return {\"count\": 0, \"current_reasoning\": []}\n",
     "\n",
     "    def _run_step(\n",
-    "        self, state: Dict[str, Any], task: Task\n",
+    "        self, state: Dict[str, Any], task: Task, input: Optional[str] = None\n",
     "    ) -> Tuple[AgentChatResponse, bool]:\n",
     "        \"\"\"Run step.\n",
     "\n",
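
The substantive change in this notebook is threading an optional input argument through _run_step. Below is a self-contained toy sketch of that pattern — the classes are stand-ins, not the real llama_index types — illustrating why the Optional[str] = None default keeps existing two-argument call sites working:

from typing import Any, Dict, Optional, Tuple


class Task:
    """Stand-in for llama_index.core.agent.types.Task (illustrative only)."""

    def __init__(self, input: str) -> None:
        self.input = input


class AgentChatResponse:
    """Stand-in for the real AgentChatResponse type (illustrative only)."""

    def __init__(self, response: str) -> None:
        self.response = response


class ToyAgentWorker:
    def _initialize_state(self) -> Dict[str, Any]:
        return {"count": 0, "current_reasoning": []}

    def _run_step(
        self, state: Dict[str, Any], task: Task, input: Optional[str] = None
    ) -> Tuple[AgentChatResponse, bool]:
        # `input` lets a caller inject fresh user input mid-task; when absent,
        # fall back to the task's original input, exactly as before the change.
        message = input if input is not None else task.input
        state["count"] += 1
        return AgentChatResponse(f"step {state['count']}: {message}"), True


worker = ToyAgentWorker()
state = worker._initialize_state()
old_style, _ = worker._run_step(state, Task("hello"))              # still valid
new_style, _ = worker._run_step(state, Task("hello"), input="hi")  # new argument
print(old_style.response, "|", new_style.response)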