Document Information Extraction using OCR and LLMs

This project set out to build a flexible, efficient way to extract information from a variety of document formats. By combining Optical Character Recognition (OCR) with Large Language Models (LLMs) such as Llama 3, we explored how the contextual understanding of LLMs can improve the accuracy and adaptability of information extraction.

Key Features

Multi-format Document Support: The approach adapts to numerous document types (pdf, png, jpg, docs), whether they are scanned images, PDFs, or other formats, without requiring format-specific pre-training or rule-based configuration.
Optical Character Recognition (OCR) with PaddleOCR: Extracts text from scanned documents and images and converts it into machine-readable content.
Large Language Models (LLMs): Interpret and analyze the extracted text, adding the contextual understanding needed to pull out the right information.
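
The features above describe a two-stage flow: OCR produces text lines, and an LLM turns them into structured fields. The following is a minimal sketch of that second stage, not the project's actual code. The function names (`build_prompt`, `parse_llm_json`, `extract_fields`), the prompt wording, and the JSON reply format are all assumptions made for illustration. The model is passed in as a plain callable so the sketch stays independent of any particular OCR or LLM library.

```python
import json
from typing import Any, Callable, Dict, List


def build_prompt(ocr_lines: List[str], fields: List[str]) -> str:
    """Combine OCR text lines and the requested field names into one instruction."""
    # Drop empty lines that OCR engines often emit for blank regions.
    document_text = "\n".join(line.strip() for line in ocr_lines if line.strip())
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the document text below and "
        f"answer with a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"Document text:\n{document_text}"
    )


def parse_llm_json(response: str, fields: List[str]) -> Dict[str, Any]:
    """Read a JSON object out of a model reply and keep only the requested fields."""
    empty = {name: None for name in fields}
    # Models often wrap JSON in extra prose, so take the span from the
    # first '{' to the last '}' and try to parse that.
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return empty
    # Missing keys become None; unexpected keys are ignored.
    return {name: data.get(name) for name in fields}


def extract_fields(
    ocr_lines: List[str],
    fields: List[str],
    llm: Callable[[str], str],
) -> Dict[str, Any]:
    """Run the hypothetical OCR-text -> prompt -> LLM -> fields flow."""
    return parse_llm_json(llm(build_prompt(ocr_lines, fields)), fields)
```

In use, `ocr_lines` would hold the text recognized by the OCR step and `llm` would wrap a call to the local model. Because no per-format rules appear anywhere in the sketch, the same function serves an invoice, a receipt, or a form: only the `fields` list changes.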

Innovation

The innovation of this method lies in its adaptability. Unlike traditional systems that require extensive rule-based settings or format-specific training, our solution can process various document formats with minimal configuration.
Privacy is a second benefit: the approach runs LLMs locally, so document contents are never sent to a third party.
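
To make the privacy point concrete, here is a rough sketch of querying a model that runs on the same machine. The source does not say which local runtime the project uses. This example assumes an Ollama server on its default port with a model tagged `llama3`, and the helper names `build_local_request` and `ask_local_llm` are hypothetical.

```python
import json
import urllib.request

# Assumption: an Ollama server listening on its default local port.
# The project's actual local runtime is not stated in the source.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_local_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a POST request addressed to the local model server only."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llm(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_local_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # With streaming disabled, the reply is one JSON object whose
        # "response" key holds the generated text.
        return json.loads(reply.read().decode("utf-8"))["response"]
```

The request targets `localhost`, so under these assumptions the extracted document text stays on the machine that runs the application.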

Documentation

For more detailed documentation, please refer to the official project documentation at the following link:

Project Documentation


ITSAIDI/Textra_App

Folders and files

Latest commit

History

Repository files navigation

Document Information Extraction using OCR and LLMs

Key Features

Innovation

Documentation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages