PDF to Text Chroma Search

Python scripts that convert PDF files to text, split the text into chunks, and store the chunks' vector representations in a Chroma DB using GPT4All embeddings. A second script queries the Chroma DB, running a similarity search based on user input.

Requirements

Python 3.x
PyPDF2
chromadb
langchain

Installation

Clone the repository:

git clone https://github.com/your-username/pdf-to-text-chroma-search.git

Install the required dependencies:

pip install PyPDF2 chromadb langchain

Usage

Script 1: Convert PDFs to text, split into chunks, and store in Chroma DB

Place your PDF files in the input directory.
Run the following command to convert the PDFs to text, split them into chunks, and store their vector representations in the Chroma DB:

python write_script.py
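
The contents of write_script.py are not reproduced in this README, so the following is only a minimal sketch of the first two steps (PDF to text, then text to chunks). The function names, the chunk size of 1000 characters, and the overlap of 100 characters are assumptions made for illustration, and the splitter is a plain-Python stand-in rather than the langchain text splitter the script may use. PyPDF2 is imported only inside pdf_to_text, so the splitter can be tried on its own.

```python
from pathlib import Path


def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into fixed-size character chunks that overlap slightly.

    chunk_size and overlap are illustrative defaults, not values taken
    from write_script.py.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def pdf_to_text(pdf_path):
    """Extract the text of every page of one PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # lazy import: only needed for real PDFs

    reader = PdfReader(str(pdf_path))
    # extract_text() may return an empty value for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunks_for_directory(input_dir="input"):
    """Yield (chunk_id, chunk_text) pairs for every PDF in input_dir."""
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        text = pdf_to_text(pdf_path)
        for i, chunk in enumerate(split_into_chunks(text)):
            # hypothetical id scheme: file name plus chunk index
            yield f"{pdf_path.name}-{i}", chunk
```

Each (chunk_id, chunk_text) pair would then be embedded and written to the Chroma DB. Overlapping chunks are a common choice because a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.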

Script 2: Load Chroma DB and query user input

Run the following command to load the Chroma DB and run a similarity search against it:

python read_script.py

Enter your query when prompted.
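
To make "similarity search" concrete, here is a small sketch of the idea in plain Python: the query is turned into a vector, compared against every stored chunk vector, and the closest chunks are returned. This is an illustration only. In the actual scripts the vectors come from GPT4All embeddings and the comparison is done by Chroma, whose distance metric and index depend on its configuration; the function names and the toy two-dimensional vectors below are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=3):
    """Return the texts of the k stored chunks most similar to the query.

    stored is a list of (chunk_text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data standing in for embedded PDF chunks (hypothetical values).
stored_chunks = [
    ("chunk about invoices", [1.0, 0.0]),
    ("chunk about weather", [0.0, 1.0]),
    ("chunk about both", [1.0, 1.0]),
]
best = top_k([1.0, 0.1], stored_chunks, k=2)
```

With these toy vectors, best holds the invoices chunk first and the mixed chunk second, because the query vector points almost entirely along the first axis.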

Repository: Govind-S-B/pdf-to-text-chroma-search