I need help preparing the documents to be used and then querying the vector index for answers.
Checked other resources
Commit to Help
Example Code
The embeddings are created in the vector index (Databricks), but when I query it I get the error below.
import requests
from bs4 import BeautifulSoup
from langchain_core.documents import Document

# Function to fetch webpage content
def fetch_filtered_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract only paragraphs
    paragraphs = [p.get_text() for p in soup.find_all('p')]
    return "\n".join(paragraphs)

# Define URLs
urls = [
    # URL list not included in the post
]
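
# Fetch content and create documents
documents = []
for url in urls:
    # Call the function under the name it is defined with above
    page_content = fetch_filtered_content(url)
    document = Document(page_content=page_content, metadata={"source": url})
    # Collect each document so the list is not empty when it is added to the store
    documents.append(document)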

# Add documents to vector store (example)
# vector_store is assumed to be an already initialized vector store; its setup is not shown here
vector_store.add_documents(documents=documents, ids=[str(i) for i in range(1, len(documents) + 1)])
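
One thing that may be worth checking before calling `add_documents` is that every page actually produced text and that the `ids` list lines up one-to-one with the documents. This is not a diagnosis of the error, only a minimal sketch of that check. The helper name `prepare_batch` and the `example.com` URLs are hypothetical, and plain dicts stand in for `Document` objects so the snippet runs without LangChain installed.

```python
def prepare_batch(pages):
    """Turn (url, text) pairs into parallel lists of records and string ids.

    Hypothetical helper: pages whose text is empty or whitespace-only are
    skipped, and ids are numbered from 1 over the pages that remain.
    """
    records = []
    skipped = []
    for url, text in pages:
        if text and text.strip():
            # Same shape as Document(page_content=..., metadata={"source": url})
            records.append({"page_content": text, "metadata": {"source": url}})
        else:
            # Keep track of pages that yielded no paragraph text
            skipped.append(url)
    # One id per kept record, so the two lists always have equal length
    ids = [str(i) for i in range(1, len(records) + 1)]
    return records, ids, skipped


if __name__ == "__main__":
    sample = [
        ("https://example.com/a", "First paragraph.\nSecond paragraph."),
        ("https://example.com/b", "   "),
    ]
    records, ids, skipped = prepare_batch(sample)
    print(len(records), ids, skipped)
```

Each record could then be passed as `Document(page_content=record["page_content"], metadata=record["metadata"])`, and the `skipped` list shows which URLs returned no `<p>` text.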
System Info
