Search the NIV and NKJV semantically here.
Generates OpenAI Ada v2 and all-mpnet-base-v2
embeddings for the NIV and NKJV Bible translations, and makes it searchable with a simple API.
make init
make embeddings
Important: You must have an OPENAI_API_KEY
in a .env
file in the project directory when running this command.
make embeddings
does two things:
- Generates OpenAI
text-embedding-ada-002
embeddings using the OpenAI API. - Generates
SentenceTransformer
embeddings using theST_EMBEDDING_MODEL
defined inconfig.py
. You can change this to any encoder model from their docs. Default:all-mpnet-base-v2
.
This takes around 20-25 minutes.
You can also download the embeddings I've generated for NIV, NKJV.
To build a Pinecone index, you will need a PINECONE_API_KEY
and PINECONE_ENV
in your .env
file. Then, simply run:
make index
After make embeddings
,
from search import SearchEngine, EmbeddingType
engine = SearchEngine({
"NKJV": "data/NKJV_clean.parquet",
"NIV": "data/NIV_clean.parquet"
})
top_verses = engine.search(
"what is the meaning of life?",
translation="NKJV",
emb_type=EmbeddingType.SentenceTransfomer,
only_text=True
)
# top_verses output
[
('For what has man for all his labor and for the striving of his heart with which he has toiled under the sun?',),
('to cast out all your enemies from before you as the LORD has spoken.',),
('to know the love of Christ which passes knowledge; that you may be filled with all the fullness of God.',),
('to do whatever Your hand and Your purpose determined before to be done.',),
('For to me to live is Christ and to die is gain.',),
('and to make all see what is the fellowship of the mystery which from the beginning of the ages has been hidden in God who created all things through Jesus Christ;',),
('knowing that from the Lord you will receive the reward of the inheritance; for you serve the Lord Christ.',),
('To do justice to the fatherless and the oppressed That the man of the earth may oppress no more.',),
('That Your way may be known on earth Your salvation among all nations.',),
('being filled with the fruits of righteousness which are by Jesus Christ to the glory and praise of God.',)
]
If you are self-hosting the app on a somewhat powerful machine, you can set PINECONE_DEPLOYMENT = False
in app.py
, This uses the parquet files as the embedding store. Otherwise, if you're running on a tiny pod with limited computed (e.g. Streamlit Cloud), then you'll want to make use of a vector database, in this case, Pinecone.
- Bible versions (NIV, NKJV) are sourced from my-bible-study
- Inspired by Christoffer Rehn's
bible-search
.
Soli Deo Gloria.