I found a similar solved discussion that might be helpful: How I store my Vectors in ChromaDB for later use via RetriverQuery Engine? [1]. It walks through storing vectors in ChromaDB and retrieving them for later queries. Here is a step-by-step guide to your objective using Azure OpenAI for embedding and an open-source vector database such as Weaviate, Pinecone, or Milvus.

1. Install Required Libraries

First, make sure the necessary libraries are installed. On recent LlamaIndex versions the embedding model and each vector store ship as separate integration packages:

```bash
pip install llama-index llama-index-embeddings-azure-openai \
    llama-index-vector-stores-weaviate llama-index-vector-stores-pinecone \
    llama-index-vector-stores-milvus weaviate-client pinecone-client pymilvus
```

2. Embed JSON Data Using Azure OpenAI

Embed your JSON data with Azure OpenAI before storing it in a vector database.

```python
import json

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# Azure OpenAI configuration
api_key = "<api-key>"
azure_endpoint = "https://<your-resource-name>.openai.azure.com/"
api_version = "2023-07-01-preview"
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-custom-embedding",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)
# Sample JSON data
json_list = [
    {
        "name": "John Doe",
        "age": 25,
        "major": "Computer Science",
        "email": "[email protected]",
        "address": "123 Main St",
        "city": "New York",
        "state": "NY",
        "country": "USA",
        "phone": "+1 123-456-7890",
        "occupation": "Software Engineer",
    },
    # ... other entries ...
]
# Convert JSON data to embeddings (AzureOpenAIEmbedding's method is get_text_embedding)
embeddings = [embed_model.get_text_embedding(json.dumps(entry)) for entry in json_list]

# Wrap each entry in a Document so the index can embed and store it in step 3
documents = [Document(text=json.dumps(entry)) for entry in json_list]
```
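If you want to sanity-check the output before wiring up a database, each embedding is a plain Python list of floats whose length is fixed by the model (a quick check; 1536 is specific to text-embedding-ada-002):

```python
vec = embeddings[0]
print(type(vec), len(vec))  # <class 'list'> 1536
```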
3. Store Embeddings in a Vector Database

Choose a vector database to hold the embeddings. The snippets below create a LlamaIndex vector store for Weaviate, Pinecone, and Milvus; the shared indexing step after the three snippets then writes the documents into whichever store you picked.

Weaviate

```python
import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.Client("https://<cluster-id>.semi.network/")
vector_store = WeaviateVectorStore(weaviate_client=client)
```
Pinecone

```python
from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Assumes an index already exists in Pinecone whose dimension
# matches the embedding model (1536 for text-embedding-ada-002)
pc = Pinecone(api_key="<api-key>")
pinecone_index = pc.Index("<index-name>")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```
Milvus

```python
from llama_index.vector_stores.milvus import MilvusVectorStore

# dim must match your embedding model (1536 for text-embedding-ada-002)
vector_store = MilvusVectorStore(
    uri="https://foo.vectordb.zillizcloud.com:19530",
    collection_name="your_collection_name",
    dim=1536,
)
```
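Whichever store you chose, the actual writes are handled by LlamaIndex rather than per-vector add calls: wrap the store in a `StorageContext` and build the index, which embeds the documents from step 2 with the Azure model and persists the vectors. A minimal sketch using the `documents` and `embed_model` defined above:

```python
# Works with any of the three vector_store objects created above
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
)
```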
4. Retrieve and Query Using LlamaIndex

After storing the embeddings you have two query paths. For analytic questions over the JSON itself (such as the average-age example below), a JSONalyze-style engine works directly on the list of dicts; depending on your LlamaIndex version it is available as JSONalyzeQueryEngine, and newer releases rebuild it as the workflow shown in the page you linked. Querying the stored vectors is shown after this example.

```python
from llama_index.core.query_engine import JSONalyzeQueryEngine
from llama_index.llms.openai import OpenAI

# Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo")

# Create the JSONalyze query engine over the raw JSON list
json_query_engine = JSONalyzeQueryEngine(
    list_of_dict=json_list,
    llm=llm,
    verbose=True,
)
# Example question
question = "What is the average age of the individuals in the dataset?"
# Query the engine
answer = json_query_engine.query(question)
# Display the answer
print(f"Question: {question}")
print(f"Answer: {answer}") This setup allows you to embed JSON data using Azure OpenAI, store it in a vector database, and query it using LlamaIndex. Make sure to replace placeholders with your actual data and configurations. For more details on integrating with specific vector databases, refer to the LlamaIndex documentation [2]. |
I have a json_list like the example shown below. My objective is to embed it via Azure OpenAI and store it in a vector DB. Once it is stored, I want to recall it and use it for queries.
Can you share some sample code, please?
https://docs.llamaindex.ai/en/stable/examples/workflow/JSONalyze_query_engine/#create-json-list