PaperScribe is an LLM-powered chatbot for reading and understanding PDF documents. It uses Streamlit for the user interface, LangChain for text processing, and OpenAI for the language model and embeddings.
This application offers the following features:
- Upload PDF: Users can upload a PDF document.
- Text Extraction: The uploaded PDF is processed to extract text.
- Text Splitting: Text is split into smaller chunks for efficient processing.
- Embeddings: Text embeddings are generated using OpenAI's embeddings model.
- Vector Storage: Embeddings are stored and indexed using FAISS for quick retrieval.
- Question Answering: Users can ask questions about the PDF content, and PaperScribe provides relevant answers.
- Interactive Interface: The user interacts with PaperScribe through an intuitive web interface powered by Streamlit.
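
The Text Splitting step above can be illustrated with a minimal, standard-library sketch of fixed-size chunking with overlap. This is not PaperScribe's actual code, which relies on LangChain for splitting; the function name `split_text` and the default `chunk_size` and `chunk_overlap` values are assumptions chosen for illustration.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Hypothetical helper: the sizes are illustrative defaults, not values
    taken from PaperScribe.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval later in the pipeline.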
Created by Madhumitha Kolkar, 2024.
- Upload a PDF document.
- Ask questions about the content of the PDF.
- PaperScribe will provide relevant answers based on the text extracted from the PDF.
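
To make "relevant answers based on the text extracted" more concrete, here is a toy sketch of the retrieval idea: rank chunks by similarity to the question and keep the best ones. It deliberately swaps in word-count vectors and a plain Python sort for OpenAI embeddings and FAISS, so it only illustrates the principle. The names `embed`, `cosine`, and `top_k_chunks` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for a FAISS search)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real application, the selected chunks would be handed to the language model together with the question so that the answer is grounded in the PDF's content.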
To run PaperScribe locally, follow these steps:
- Clone the repository.
- Install the required dependencies:
  - Streamlit
  - PyPDF2
  - LangChain
  - FAISS
  - dotenv
  - streamlit-extras
  - openai
- Create a .env file containing your OPENAI_API_KEY, then load the key at startup and pass it to both your LLM instance and your OpenAIEmbeddings call.
- Run the Streamlit app:
streamlit run main.py
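
The .env step above might look roughly like the following sketch. It assumes the `dotenv` module from the python-dotenv package, and falls back to the process environment if that package is not installed. The helper name `get_openai_api_key` is hypothetical and not taken from PaperScribe's source.

```python
import os


def get_openai_api_key() -> str:
    """Load OPENAI_API_KEY from a .env file (if python-dotenv is available) or the environment."""
    try:
        # python-dotenv exposes the `dotenv` module; load_dotenv() reads a .env
        # file and, by default, does not override variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # Fall back to whatever is already in the environment.

    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

The returned key could then be passed to the LLM and to `OpenAIEmbeddings`, for example as an `openai_api_key=` keyword argument, assuming the LangChain version in use accepts that parameter.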