rebase #12

Merged
merged 81 commits into from
Feb 7, 2024
Changes from all commits
Commits
81 commits
9561505
Revert "An alternative approach that doesnt use nest_asyncio"
ryanpeach Oct 27, 2023
98743af
add namespaced vector stores to storage context (#8753)
logan-markewich Nov 7, 2023
61c13c4
GPT4V load images from local directory and some refactoring (#8758)
hatianzhang Nov 7, 2023
0ee9a52
[version] bump version to 0.8.64.post1 (#8751)
jerryjliu Nov 7, 2023
0d55c7c
JSON mode vs. function calling notebook (#8739)
Disiok Nov 7, 2023
de46879
change log to print (#8760)
logan-markewich Nov 8, 2023
ba04c93
CogniSwitch Integration (#8466)
CogniJT Nov 8, 2023
55a3a1a
add demo showing assistant agent with vector store (#8766)
jerryjliu Nov 8, 2023
dbe19e1
add back rag from scratch (#8764)
jerryjliu Nov 8, 2023
6d77eee
Bugfix and naming for QueryFusionIndex (#8757)
ryanpeach Nov 8, 2023
6b3f1ea
Initialize Multi Modal Embedding Base (#8762)
hatianzhang Nov 8, 2023
e0c2288
docs: openllmetry integration (#8774)
nirga Nov 8, 2023
b37c2d5
Use tokenizer apply chat template for HuggingFaceLLM (#8755)
geodavic Nov 8, 2023
078272b
Multi Modal Vector Index (#8709)
hatianzhang Nov 8, 2023
114acfd
Fixed embeddings `__all__` (#8779)
jamesbraza Nov 8, 2023
148011c
Llama 633 refactor openaiagent (#8738)
nerdai Nov 8, 2023
113a3ae
Add truncate option to TEI (#8778)
logan-markewich Nov 8, 2023
0f2c703
nit: add parallel fn calling nb to docs (#8783)
jerryjliu Nov 8, 2023
de133bf
[version] bump version to 0.8.65 (#8781)
nerdai Nov 8, 2023
2507be4
improve multimodal retrieval and delete (#8782)
logan-markewich Nov 8, 2023
810e654
Update building_a_chatbot.md (#8792)
jeffxtang Nov 9, 2023
32793c0
Support parallel function calling in pydantic program (#8793)
Disiok Nov 9, 2023
18b15b6
Add Multi Modal Retriever (#8787)
hatianzhang Nov 9, 2023
6a0beb4
Astra DB documentation updates, and ordering of indices (#8777)
erichare Nov 9, 2023
dc19134
Minor fix for _build_image_vector_store_query (#8796)
hatianzhang Nov 9, 2023
e5c767f
support un-listable fsspecs (#8795)
logan-markewich Nov 9, 2023
830baf4
Include other powerpoint extension in SimpleDirectoryReader (#8736)
yashodeepdeshmukh Nov 9, 2023
eac1bd9
Fix SQL commit using enging.begin(). (#8801)
Nov 9, 2023
9fdedc7
Latest `huggingface_hub`'s recommended model (#8784)
jamesbraza Nov 9, 2023
72adbfa
Add Google PaLM Embeddings. (#8763)
ravi03071991 Nov 9, 2023
b400c6a
notebook: stress test gpt-4/claude w/ long docs with hidden context (…
jerryjliu Nov 9, 2023
9e51e0b
nit fix to multimodalretriever name (#8811)
jerryjliu Nov 9, 2023
49323d5
fix retreiver name (#8813)
hatianzhang Nov 9, 2023
7448ea3
[version] bump to v0.8.66 (#8816)
logan-markewich Nov 9, 2023
2e5b5c6
fix retriever in citation query engine (#8818)
logan-markewich Nov 9, 2023
4d58671
BUG fix: read image url field directly from ImageDocument field (#8821)
nerdai Nov 9, 2023
76f9153
Accumulate response synthesizer: use SelectorPromptTemplate for chat …
bmax Nov 10, 2023
cf46113
make PydanticProgramExtractor importable from extractors (#8773)
smowden Nov 10, 2023
8cc3825
Parallel func calling cleanup work (#8808)
nerdai Nov 10, 2023
f4429f8
Cassandra vector-store notebook updates (#8805)
hemidactylus Nov 10, 2023
ebeb499
Updated Cogniswitch docs (#8809)
saiCogniswitch Nov 10, 2023
636d302
Bump pyarrow from 14.0.0 to 14.0.1 (#8820)
dependabot[bot] Nov 10, 2023
95e022f
Adding routers to high level concepts (#8810)
aaronjimv Nov 10, 2023
a72c0e0
Change mutli_modal_base naming (#8840)
hatianzhang Nov 10, 2023
2feaa09
remove commonly unpicklable entries (#8838)
logan-markewich Nov 10, 2023
5300f52
Advanced Multi Modal Retrieval Example (#8822)
hatianzhang Nov 10, 2023
806b1f6
Add multi-modal use case section (#8823)
jerryjliu Nov 10, 2023
95c5e69
[version] bump version to 0.8.67 (#8841)
jerryjliu Nov 10, 2023
9b22588
Fix ImageNode type from NodeWithScore for SimpleMultiModalQueryEngine…
hatianzhang Nov 10, 2023
36cae40
Feature/citation metadata (#8722)
jordanparker6 Nov 10, 2023
746c4bf
Update MM retrieval notebook using new MM framework (#8845)
hatianzhang Nov 10, 2023
7d1df06
Disable legacy github actions (#8848)
Disiok Nov 10, 2023
a7de36f
Add JinaEmbedding class (#8704)
JoanFM Nov 10, 2023
1115e2a
add retrieval API benchmark (#8850)
jerryjliu Nov 11, 2023
12804a2
fix Replicate multi-modal LLM + notebook (#8854)
jerryjliu Nov 11, 2023
7960a3a
fix: paths treated as hidden (#8860)
EmanuelCampos Nov 11, 2023
4b123fd
Quick fix Replicate MultiModal example (#8861)
hatianzhang Nov 11, 2023
bb92a99
one line fix for wrapping of custom function tools to create OpenAI a…
JohannesHa Nov 11, 2023
2c6c39b
fix openai assistant tool creation + retrieval notebook (#8862)
logan-markewich Nov 11, 2023
2ccd935
OpenAI client improvements (#8819)
logan-markewich Nov 11, 2023
8df6806
openai assistant agent + advanced retrieval cookbook (#8863)
jerryjliu Nov 11, 2023
bb2346c
[version] bump version to 0.8.68 (#8864)
jerryjliu Nov 11, 2023
63b90c6
fix error that occurs when defining custom_path in download_loader. (…
Dilbarjot Nov 11, 2023
316b298
[Docs] Fixed broken links in the Customization tutorial docs (#8882)
anupj Nov 13, 2023
7d32b1f
Subclass getstate from pydantic's BaseModel to fix cannot pickle and …
trducng Nov 13, 2023
f5f5e56
fix grammar in SQLIndexDemo.ipynb (#8877)
ziliangpeng Nov 13, 2023
1cadc26
fix pickling again (#8889)
logan-markewich Nov 13, 2023
0f94f2f
Added retry policy for LLMRails Embeddings (#8872)
anar2706 Nov 13, 2023
2579176
Update evaluation.ipynb fixed refine missing service_context (#8870)
FarisHijazi Nov 13, 2023
9206452
remove stray print (#8891)
logan-markewich Nov 13, 2023
1122324
fix: Qdrant class_name property (#8892)
Anush008 Nov 13, 2023
98a024b
Create fleet_libraries_context.md (#8849)
adrwz Nov 13, 2023
55a45a8
Init Chroma and LLamaIndex Multi-Modal Demo (#8897)
hatianzhang Nov 14, 2023
dfb6828
[version] bump to v0.8.69 (#8899)
logan-markewich Nov 14, 2023
3f721a8
Updated limit for deleting weaviate objects (#8887)
omkargadute Nov 14, 2023
36384dc
bugfix: added entry for callback to clear _node_data_buffer (#8867)
azurewtl Nov 14, 2023
b2ad171
Remove remnant of Pydantic's state in getstate (#8902)
trducng Nov 14, 2023
e25aef9
[version] bump to 0.8.69.post1 (#8903)
logan-markewich Nov 14, 2023
a0263f2
[version] bump to v0.8.69.post2 (#8915)
logan-markewich Nov 14, 2023
603612a
Update base.py (#8917)
dennyglee Nov 14, 2023
1bbe589
Syncing changelog
Feb 7, 2024
21 changes: 0 additions & 21 deletions .github/workflows/dev_docs.yml

This file was deleted.

22 changes: 0 additions & 22 deletions .github/workflows/publish_release_gpt_index.yml

This file was deleted.

Binary file added docs/_static/integrations/openllmetry.png
1 change: 1 addition & 0 deletions docs/community/integrations.md
@@ -83,5 +83,6 @@ maxdepth: 1
integrations/chatgpt_plugins.md
Poe <https://github.com/poe-platform/poe-protocol/tree/main/llama_poe>
Airbyte <https://airbyte.com/tutorials/airbyte-and-llamaindex-elt-and-chat-with-your-data-warehouse-without-writing-sql>
integrations/fleet_libraries_context.md

```
194 changes: 194 additions & 0 deletions docs/community/integrations/fleet_libraries_context.md
@@ -0,0 +1,194 @@
# Fleet Context Embeddings - Building a Hybrid Search Engine for the LlamaIndex Library

In this guide, we will be using Fleet Context to download the embeddings for LlamaIndex's documentation and build a hybrid dense/sparse vector retrieval engine on top of it.

<br><br>

## Pre-requisites

```shell
!pip install llama-index
!pip install --upgrade fleet-context
```

```python
import os
import openai

os.environ["OPENAI_API_KEY"] = "sk-..." # add your API key here!
openai.api_key = os.environ["OPENAI_API_KEY"]
```

<br><br>

## Download Embeddings from Fleet Context

We will be using Fleet Context to download the embeddings for the
entirety of LlamaIndex's documentation (~12k chunks, ~100 MB of
content). You can download embeddings for any of the top 1220 libraries
by specifying the library name as a parameter. You can view the full
list of supported libraries at the bottom of
[this page](https://fleet.so/context).

We do this because Fleet has built an embeddings pipeline that preserves
important information useful for retrieval and generation, including
each chunk's position on the page (for re-ranking), the chunk type
(class/function/attribute/etc.), the parent section, and more. You can
read more about this on their [GitHub
page](https://github.com/fleet-ai/context/tree/main).

```python
from context import download_embeddings

df = download_embeddings("llamaindex")
```

**Output**:

```shell
100%|██████████| 83.7M/83.7M [00:03<00:00, 27.4MiB/s]
id \
0 e268e2a1-9193-4e7b-bb9b-7a4cb88fc735
1 e495514b-1378-4696-aaf9-44af948de1a1
2 e804f616-7db0-4455-9a06-49dd275f3139
3 eb85c854-78f1-4116-ae08-53b2a2a9fa41
4 edfc116e-cf58-4118-bad4-c4bc0ca1495e
```

```python
from IPython.display import Markdown, display

# Show some examples of the metadata
df["metadata"][0]
display(Markdown(f"{df['metadata'][8000]['text']}"))
```

**Output**:

```shell
classmethod from_dict(data: Dict[str, Any], kwargs: Any) → Self classmethod from_json(data_str: str, kwargs: Any) → Self classmethod from_orm(obj: Any) → Model json(, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True*, dumps_kwargs: Any) → unicode Generate a JSON representation of the model, include and exclude arguments as per dict().
```
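To make the row layout concrete, here is a tiny mock frame (hypothetical values; the column names mirror those used in the upsert step below: `id`, `values`, `metadata`, `sparse_values`):

```python
import pandas as pd

# Hypothetical two-row frame mimicking the downloaded schema; real rows
# hold 1536-dim dense vectors and richer metadata.
df_demo = pd.DataFrame(
    {
        "id": ["e268e2a1", "e495514b"],
        "values": [[0.01] * 4, [0.02] * 4],  # truncated dense vectors
        "metadata": [
            {"text": "class Foo", "type": "class"},
            {"text": "def bar()", "type": "function"},
        ],
        "sparse_values": [
            {"indices": [3], "values": [0.5]},
            {"indices": [7], "values": [0.9]},
        ],
    }
)

print(df_demo.shape)  # (2, 4)
print(df_demo["metadata"][0]["text"])  # class Foo
```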

<br><br>

## Create Pinecone Index for Hybrid Search in LlamaIndex

We're going to create a Pinecone index and upsert our vectors there so
that we can do hybrid retrieval with both sparse vectors and dense
vectors. Make sure you have a [Pinecone account](https://pinecone.io)
before you proceed.

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
```

```python
import pinecone

api_key = "..." # Add your Pinecone API key here
pinecone.init(
    api_key=api_key, environment="us-east-1-aws"
)  # Add your DB region here
```

```python
# Fleet Context uses the text-embedding-ada-002 model from OpenAI with 1536 dimensions.

# NOTE: Pinecone requires dotproduct similarity for hybrid search
pinecone.create_index(
    "quickstart-fleet-context",
    dimension=1536,
    metric="dotproduct",
    pod_type="p1",
)

pinecone.describe_index(
    "quickstart-fleet-context"
)  # Make sure the index was created in Pinecone
```
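Pinecone needs the `dotproduct` metric here because hybrid search scores each document by combining dense and sparse dot products. As a rough illustration (the `alpha` weighting below is an assumption mirroring the common convention where `alpha=1.0` is pure dense, not Pinecone's exact internals):

```python
def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.5):
    """Convex combination of dense and sparse dot products.

    Sparse vectors are {index: weight} dicts; alpha=1.0 is pure
    dense retrieval, alpha=0.0 is pure sparse.
    """
    dense = sum(q * d for q, d in zip(dense_q, dense_d))
    shared = set(sparse_q) & set(sparse_d)
    sparse = sum(sparse_q[i] * sparse_d[i] for i in shared)
    return alpha * dense + (1 - alpha) * sparse


score = hybrid_score(
    [1.0, 0.0], [0.5, 0.5],        # dense query vs. document
    {3: 2.0}, {3: 0.5, 9: 1.0},    # sparse query vs. document
    alpha=0.5,
)
print(score)  # 0.75
```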

<br>

```python
from llama_index.vector_stores import PineconeVectorStore

pinecone_index = pinecone.Index("quickstart-fleet-context")
vector_store = PineconeVectorStore(pinecone_index, add_sparse_vector=True)
```

<br><br>

## Batch upsert vectors into Pinecone

Pinecone recommends upserting 100 vectors at a time. We're going to do that after we modify the format of the data a bit.

```python
import itertools


def chunks(iterable, batch_size=100):
    """A helper function to break an iterable into chunks of size batch_size."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))


# Generator that yields (id, values, metadata, sparse_values) dicts
data_generator = map(
    lambda row: {
        "id": row[1]["id"],
        "values": row[1]["values"],
        "metadata": row[1]["metadata"],
        "sparse_values": row[1]["sparse_values"],
    },
    df.iterrows(),
)

# Upsert data with 100 vectors per upsert request
for ids_vectors_chunk in chunks(data_generator, batch_size=100):
    print(f"Upserting {len(ids_vectors_chunk)} vectors...")
    pinecone_index.upsert(vectors=ids_vectors_chunk)
```
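The batching behavior of the `chunks` helper above can be sanity-checked on a small range:

```python
import itertools


def chunks(iterable, batch_size=100):
    """Break an iterable into tuples of at most batch_size items."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))


# 250 items split into batches of 100 leaves a final partial batch of 50
sizes = [len(batch) for batch in chunks(range(250), batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Because the generator is consumed lazily, the full DataFrame never needs to be materialized as a list before upserting.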

<br><br>

## Build Pinecone Vector Store in LlamaIndex

Finally, we're going to build the Pinecone vector store via LlamaIndex
and query it to get results.

```python
from llama_index import VectorStoreIndex
from IPython.display import Markdown, display
```

```python
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
```

<br><br>

## Query Your Index!

```python
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid", similarity_top_k=8
)
response = query_engine.query("How do I use llama_index SimpleDirectoryReader")
```

```python
display(Markdown(f"<b>{response}</b>"))
```

**Output**:

```shell
<b>To use the SimpleDirectoryReader in llama_index, you need to import it from the llama_index library. Once imported, you can create an instance of the SimpleDirectoryReader class by providing the directory path as an argument. Then, you can use the `load_data()` method on the SimpleDirectoryReader instance to load the documents from the specified directory.</b>
```