Created Python Code to set up the entire pipeline for benchmarking faiss-index with DuckDB and PGVector #1984
base: master
adopted hnsw index and fixed incorrect sql queries
…/pyserini into performance_benchmark
Modified the benchmark files to record performance
Refactored Benchmark Scripts + Added Faiss Dense Vector Extractor
Please organize the files better, e.g., scripts should go in `scripts/`, docs should go in `docs/`, some files should be checked in, etc.
collections/.gitkeep
@@ -1 +0,0 @@
# This is the default directory for document collections. Placeholder so that directory is kept in git.
This file shouldn't be removed.
The entire process may take over a day to complete, depending on your hardware setup. This code will download the index, extract the embedded vectors of the index, build the table in DuckDB, and run the benchmark.

## PGVector
PGVector is an extension of PostgreSQL, so you will need to install both PostgreSQL and PGVector. Here, it is assumed that you have a PostgreSQL server running on your local machine, with the PGVector extension installed and enabled. Make sure you supply the correct database configuration in the `db_config.txt` file. For example:
Can we install Postgres using conda? e.g., https://anaconda.org/anaconda/postgresql
If so, please provide instructions.
Performance benchmark
Performance benchmark: readded content of gitkeep
filename change
This PR adds several Python classes and utilities that make it easier to benchmark DuckDB and PGVector on HNSW indexes. `faiss_index_adaptor.py` defines the base class for `duckdb_faiss_index_adaptor` and `pgvector_faiss_index_adaptor`. It uses the `faiss_index_extractor` class to download and extract a 768-dimensional Faiss index, then creates a database table in DuckDB/PGVector. Currently, `run_benchmark.py` invokes the benchmarking process; see `experiment_vectordbs.md` for detailed instructions.
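
For readers skimming the design, here is a minimal sketch of the base-class/subclass layout described above. It is not the code in this PR: `IndexAdaptor`, `InMemoryAdaptor`, and the method names are hypothetical, and the stand-in backend uses brute-force inner product in plain Python rather than DuckDB, PGVector, or an HNSW index.

```python
from abc import ABC, abstractmethod
import time


class IndexAdaptor(ABC):
    """Hypothetical base class: shared benchmark loop, backend-specific storage."""

    def __init__(self, dimension=768):
        # 768 matches the index dimension mentioned in the PR description.
        self.dimension = dimension

    @abstractmethod
    def create_table(self, vectors):
        """Store (doc_id, vector) pairs in the backend."""

    @abstractmethod
    def search(self, query, k):
        """Return the k best (doc_id, score) pairs for a query vector."""

    def run_benchmark(self, queries, k=10):
        """Run every query and return (results, total_seconds)."""
        start = time.perf_counter()
        results = [self.search(q, k) for q in queries]
        return results, time.perf_counter() - start


class InMemoryAdaptor(IndexAdaptor):
    """Stand-in backend (assumption): brute-force inner product, no database."""

    def create_table(self, vectors):
        rows = list(vectors)
        for doc_id, vec in rows:
            if len(vec) != self.dimension:
                raise ValueError(f"{doc_id}: expected {self.dimension} dimensions")
        self.rows = rows

    def search(self, query, k):
        scored = [
            (doc_id, sum(a * b for a, b in zip(query, vec)))
            for doc_id, vec in self.rows
        ]
        # Highest inner product first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A database-backed subclass would override only `create_table` and `search`, so the timing loop in `run_benchmark` stays identical across backends and the measurements remain comparable.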