I need help with the documents to be used, and with querying the vector index for answers.
Example Code
Description
The embeddings are created in the vector index (Databricks), but when I query the index I get the error below:
vector_store.add_documents(documents=documents, ids=[str(i) for i in range(1, len(documents) + 1)])
#vector_store.add_documents(documents=documents, ids=["2"])
results = vector_store.similarity_search(  # <-- the error is raised here (line 84 in the traceback)
    query="Capital of India?", k=1, filter={"source": "https://en.wikipedia.org/wiki/India"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
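To narrow down whether the filter itself is the problem, it may help to recall what a metadata filter on `source` is expected to do. The sketch below is a toy, in-memory model, not the Databricks or LangChain implementation: `embed`, `cosine` and `similarity_search` are hypothetical helpers, and the "embedding" is just a word count. It applies exact-equality filtering on metadata before ranking, which shows that the filter value has to match the stored `source` string character for character; a value with leftover HTML, or even an extra trailing slash, matches nothing. Whether the real index behaves the same way depends on how `source` was stored in the index schema, which is an assumption worth checking.

```python
import math
import re
from collections import Counter

# Toy stand-in for a vector store, for illustration only (NOT Databricks Vector Search).
# It mimics one observable behaviour: a metadata filter is an exact-equality match.

def embed(text):
    """Hypothetical 'embedding': counts of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(docs, query, k=1, filter=None):
    """Return the k most similar (text, metadata) pairs whose metadata
    equals every key/value in `filter` (named to mirror the call in the post)."""
    filter = filter or {}
    candidates = [
        (text, meta) for text, meta in docs
        if all(meta.get(key) == value for key, value in filter.items())
    ]
    q = embed(query)
    ranked = sorted(candidates, key=lambda d: cosine(q, embed(d[0])), reverse=True)
    return ranked[:k]

docs = [
    ("New Delhi is the capital of India.", {"source": "https://en.wikipedia.org/wiki/India"}),
    ("Tokyo is the capital of Japan.", {"source": "https://en.wikipedia.org/wiki/Japan"}),
]

# Filter value identical to the stored metadata: one hit.
print(similarity_search(docs, "Capital of India?", k=1,
                        filter={"source": "https://en.wikipedia.org/wiki/India"}))
# Filter value that differs by one character (trailing slash): no hits.
print(similarity_search(docs, "Capital of India?", k=1,
                        filter={"source": "https://en.wikipedia.org/wiki/India/"}))
```

If the real query returns nothing or errors only when `filter` is passed, comparing the stored `source` values with the filter string is a cheap first check.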
Sample:
import requests
from bs4 import BeautifulSoup
from langchain_core.documents import Document

# Function to fetch webpage content
def fetch_filtered_content(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract only paragraphs
    paragraphs = [p.get_text() for p in soup.find_all('p')]
    return "\n".join(paragraphs)

# Define URLs
urls = [
    "https://en.wikipedia.org/wiki/India",
    "https://en.wikipedia.org/wiki/Japan",
    "https://en.wikipedia.org/wiki/Australia",
]

# Fetch content and create documents (one Document per page, tagged with its URL)
documents = []
for url in urls:
    page_content = fetch_filtered_content(url)
    document = Document(page_content=page_content, metadata={"source": url})
    documents.append(document)

# Add documents to vector store (example); `vector_store` is the Databricks
# vector store created earlier (not shown here)
vector_store.add_documents(documents=documents, ids=[str(i) for i in range(1, len(documents) + 1)])
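The IDs above are positional (`"1"`, `"2"`, `"3"`), so reordering `urls` would attach a different page to the same ID on the next run. One option, sketched below under the assumption that the index's primary key is a string column and that the store upserts by ID, is to derive each ID from the document's `source` URL. `stable_id` is a hypothetical helper, not part of LangChain or Databricks.

```python
import hashlib

def stable_id(source: str) -> str:
    """Hypothetical helper: a deterministic 16-hex-character ID derived from a URL.
    The same URL always yields the same ID, independent of list order."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]

urls = [
    "https://en.wikipedia.org/wiki/India",
    "https://en.wikipedia.org/wiki/Japan",
    "https://en.wikipedia.org/wiki/Australia",
]

ids = [stable_id(url) for url in urls]
print(ids)
```

In the ingestion code this would become `ids=[stable_id(d.metadata["source"]) for d in documents]`; truncating the hash to 16 characters keeps IDs short at a negligible collision risk for a handful of pages.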
System Info
Package Version
aiohappyeyeballs 2.4.4
aiohttp 3.11.10
aiosignal 1.3.1
alembic 1.14.0
annotated-types 0.7.0
anyio 4.7.0
asttokens 2.0.5
astunparse 1.6.3
attrs 24.2.0
azure-core 1.30.2
azure-storage-blob 12.19.1
azure-storage-file-datalake 12.14.0
backcall 0.2.0
beautifulsoup4 4.12.3
black 23.3.0
blinker 1.9.0
boto3 1.34.39
botocore 1.34.39
bs4 0.0.2
cachetools 5.3.3
certifi 2023.7.22
cffi 1.15.1
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 2.2.1
comm 0.1.2
contourpy 1.0.5
cryptography 41.0.3
cycler 0.11.0
Cython 0.29.32
databricks-ai-bridge 0.0.3
databricks-langchain 0.0.3
databricks-sdk 0.38.0
databricks-vectorsearch 0.43
dataclasses-json 0.6.7
dbus-python 1.2.18
debugpy 1.6.7
decorator 5.1.1
Deprecated 1.2.15
deprecation 2.1.0
distlib 0.3.8
distro 1.7.0
distro-info 1.1+ubuntu0.2
docker 7.1.0
entrypoints 0.4
executing 0.8.3
facets-overview 1.1.1
filelock 3.13.4
Flask 3.1.0
fonttools 4.25.0
frozenlist 1.5.0
gitdb 4.0.11
GitPython 3.1.43
google-api-core 2.18.0
google-auth 2.31.0
google-cloud-core 2.4.1
google-cloud-storage 2.17.0
google-crc32c 1.5.0
google-resumable-media 2.7.1
googleapis-common-protos 1.63.2
graphene 3.4.3
graphql-core 3.2.5
graphql-relay 3.2.0
greenlet 3.1.1
grpcio 1.60.0
grpcio-status 1.60.0
gunicorn 23.0.0
h11 0.14.0
httpcore 1.0.7
httplib2 0.20.2
httpx 0.28.1
httpx-sse 0.4.0
idna 3.4
importlib-metadata 6.0.0
ipyflow-core 0.0.198
ipykernel 6.25.1
ipython 8.15.0
ipython-genutils 0.2.0
ipywidgets 7.7.2
isodate 0.6.1
itsdangerous 2.2.0
jedi 0.18.1
jeepney 0.7.1
Jinja2 3.1.4
jiter 0.8.2
jmespath 0.10.0
joblib 1.2.0
jsonpatch 1.33
jsonpointer 3.0.0
jupyter_client 7.4.9
jupyter_core 5.3.0
keyring 23.5.0
kiwisolver 1.4.4
langchain 0.3.10
langchain-community 0.3.10
langchain-core 0.3.23
langchain-databricks 0.1.1
langchain-openai 0.2.12
langchain-text-splitters 0.3.2
langsmith 0.1.147
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
Mako 1.3.8
Markdown 3.7
MarkupSafe 3.0.2
marshmallow 3.23.1
matplotlib 3.7.2
matplotlib-inline 0.1.6
mlflow 2.18.0
mlflow-skinny 2.18.0
more-itertools 8.10.0
multidict 6.1.0
mypy-extensions 0.4.3
nest-asyncio 1.5.6
numpy 1.26.4
oauthlib 3.2.0
openai 1.57.1
opentelemetry-api 1.28.2
opentelemetry-sdk 1.28.2
opentelemetry-semantic-conventions 0.49b2
orjson 3.10.12
packaging 23.2
pandas 1.5.3
parso 0.8.3
pathspec 0.10.3
patsy 0.5.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.2.1
platformdirs 3.10.0
plotly 5.9.0
prompt-toolkit 3.0.36
propcache 0.2.1
proto-plus 1.24.0
protobuf 4.24.1
psutil 5.9.0
psycopg2 2.9.3
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyccolo 0.0.52
pycparser 2.21
pydantic 2.10.3
pydantic_core 2.27.1
pydantic-settings 2.6.1
Pygments 2.15.1
PyGObject 3.42.1
PyJWT 2.3.0
pyodbc 4.0.38
pyparsing 3.0.9
python-apt 2.4.0+ubuntu3
python-dateutil 2.8.2
python-dotenv 1.0.1
python-lsp-jsonrpc 1.1.1
pytz 2022.7
PyYAML 6.0
pyzmq 23.2.0
regex 2024.11.6
requests 2.31.0
requests-toolbelt 1.0.0
rsa 4.9
s3transfer 0.10.2
scikit-learn 1.3.0
scipy 1.11.1
seaborn 0.12.2
SecretStorage 3.3.1
setuptools 68.0.0
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
soupsieve 2.6
SQLAlchemy 2.0.36
sqlparse 0.5.0
ssh-import-id 5.11
stack-data 0.2.0
statsmodels 0.14.0
tabulate 0.9.0
tenacity 8.2.2
threadpoolctl 2.2.0
tiktoken 0.8.0
tokenize-rt 4.2.1
tornado 6.3.2
tqdm 4.67.1
traitlets 5.7.1
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2022.1
ujson 5.4.0
unattended-upgrades 0.1
urllib3 1.26.16
virtualenv 20.24.2
wadllib 1.3.6
wcwidth 0.2.5
Werkzeug 3.1.3
wheel 0.38.4
wrapt 1.17.0
yarl 1.18.3
zipp 3.11.0