I found a similar solved discussion that might be helpful: How I store my Vectors in ChromaDB for later use via RetriverQuery Engine? [1]. It walks through storing vectors in ChromaDB and retrieving them for later queries. Here is a step-by-step guide to your objective using Azure OpenAI for embedding and an open-source vector database such as Weaviate, Pinecone, or Milvus.

1. Install Required Libraries

First, make sure the necessary libraries are installed. On recent LlamaIndex versions the embedding model and each vector store ship as separate integration packages:

```bash
pip install llama-index llama-index-embeddings-azure-openai \
    llama-index-vector-stores-weaviate llama-index-vector-stores-pinecone \
    llama-index-vector-stores-milvus weaviate-client pinecone-client pymilvus
```

2. Embed JSON Data Using Azure OpenAI

Embed your JSON data with Azure OpenAI before storing it in a vector database.

```python
import json

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# Azure OpenAI configuration
api_key = "<api-key>"
azure_endpoint = "https://<your-resource-name>.openai.azure.com/"
api_version = "2023-07-01-preview"
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-custom-embedding",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)
# Sample JSON data
json_list = [
    {
        "name": "John Doe",
        "age": 25,
        "major": "Computer Science",
        "email": "[email protected]",
        "address": "123 Main St",
        "city": "New York",
        "state": "NY",
        "country": "USA",
        "phone": "+1 123-456-7890",
        "occupation": "Software Engineer",
    },
    # ... other entries ...
]
# Convert JSON data to embeddings (AzureOpenAIEmbedding's method is get_text_embedding)
embeddings = [embed_model.get_text_embedding(json.dumps(entry)) for entry in json_list]

# Wrap each entry in a Document so the index can embed and store it in step 3
documents = [Document(text=json.dumps(entry)) for entry in json_list]
```
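If you want to sanity-check the output before wiring up a database, each embedding is a plain Python list of floats whose length is fixed by the model (a quick check; 1536 is specific to text-embedding-ada-002):

```python
vec = embeddings[0]
print(type(vec), len(vec))  # <class 'list'> 1536
```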
3. Store Embeddings in a Vector Database

Choose a vector database to hold the embeddings. The snippets below create a LlamaIndex vector store for Weaviate, Pinecone, and Milvus; the shared indexing step after the three snippets then writes the documents into whichever store you picked.

Weaviate

```python
import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.Client("https://<cluster-id>.semi.network/")
vector_store = WeaviateVectorStore(weaviate_client=client)
```
Pinecone

```python
from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Assumes an index already exists in Pinecone whose dimension
# matches the embedding model (1536 for text-embedding-ada-002)
pc = Pinecone(api_key="<api-key>")
pinecone_index = pc.Index("<index-name>")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```
Milvus

```python
from llama_index.vector_stores.milvus import MilvusVectorStore

# dim must match your embedding model (1536 for text-embedding-ada-002)
vector_store = MilvusVectorStore(
    uri="https://foo.vectordb.zillizcloud.com:19530",
    collection_name="your_collection_name",
    dim=1536,
)
```
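Whichever store you chose, the actual writes are handled by LlamaIndex rather than per-vector add calls: wrap the store in a `StorageContext` and build the index, which embeds the documents from step 2 with the Azure model and persists the vectors. A minimal sketch using the `documents` and `embed_model` defined above:

```python
# Works with any of the three vector_store objects created above
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
)
```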
4. Retrieve and Query Using LlamaIndex

After storing the embeddings you have two query paths. For analytic questions over the JSON itself (such as the average-age example below), a JSONalyze-style engine works directly on the list of dicts; depending on your LlamaIndex version it is available as JSONalyzeQueryEngine, and newer releases rebuild it as the workflow shown in the page you linked. Querying the stored vectors is shown after this example.

```python
from llama_index.core.query_engine import JSONalyzeQueryEngine
from llama_index.llms.openai import OpenAI

# Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo")

# Create the JSONalyze query engine over the raw JSON list
json_query_engine = JSONalyzeQueryEngine(
    list_of_dict=json_list,
    llm=llm,
    verbose=True,
)
# Example question
question = "What is the average age of the individuals in the dataset?"
# Query the engine
answer = json_query_engine.query(question)
# Display the answer
print(f"Question: {question}")
print(f"Answer: {answer}") This setup allows you to embed JSON data using Azure OpenAI, store it in a vector database, and query it using LlamaIndex. Make sure to replace placeholders with your actual data and configurations. For more details on integrating with specific vector databases, refer to the LlamaIndex documentation [2]. |
I have a json_list like the example shown below. My objective is to embed it via Azure OpenAI and store it in a vector DB. Once it is stored, I want to recall it and use it for queries.
Can you share some sample code, please?
https://docs.llamaindex.ai/en/stable/examples/workflow/JSONalyze_query_engine/#create-json-list