POC to extract information from a physical identity document into structured JSON #9

Open
HarishGangula opened this issue Apr 15, 2024 · 0 comments

We are exploring the feasibility of automating the extraction of information from physical identity proofs such as Aadhaar card, PAN card, voter ID card, and driving license. The goal of this proof-of-concept (POC) is to develop a solution that can accurately extract relevant information from these documents and represent it in JSON format for further processing.

Objective:

Develop a POC to extract information from physical identity proofs.
Extract key fields, including name, address, date of birth, photograph, and document number.
Represent the extracted information in JSON format.
Ensure accuracy and reliability of the extraction process.
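
To make the JSON objective concrete, here is a minimal sketch of how one extracted record could be represented. The class name `IdentityRecord`, every field name, and the choice to store the photograph as a file path are assumptions made for illustration; the issue does not define a schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class IdentityRecord:
    # Hypothetical schema: field names are illustrative, not a settled format.
    document_type: str                   # e.g. "aadhaar", "pan", "voter_id", "driving_license"
    document_number: Optional[str] = None
    name: Optional[str] = None
    date_of_birth: Optional[str] = None  # kept as a string, e.g. "1990-01-01"
    address: Optional[str] = None
    photo_path: Optional[str] = None     # assumed: path to a cropped photograph file

    def to_json(self) -> str:
        # Fields that could not be extracted stay present as null,
        # so every record carries the same set of keys.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Invented sample values, not real document data.
record = IdentityRecord(document_type="pan", document_number="ABCDE1234F", name="SAMPLE NAME")
print(record.to_json())
```

Keeping missing fields as `null` rather than dropping them is one possible way to keep the output consistent across document types.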

Scope:

The POC will focus on extracting information from Aadhaar card, PAN card, voter ID card, and driving license.
Initially, we will target documents issued within a specific region or format to keep the scope manageable.
The POC will not cover non-standard documents.

Approach:

Research and identify suitable libraries or APIs for document processing and OCR.
Develop scripts or applications to process the documents and extract relevant information.
Validate the accuracy of the extracted information against sample data.
Generate JSON output containing the extracted fields in a structured format.
Conduct thorough testing and validation to ensure reliability and accuracy.

Deliverables:

Script or application for extracting information from physical identity proofs.
JSON output files containing extracted information for sample documents.
Documentation detailing the extraction process, dependencies, and usage instructions.

Success Criteria:

The extracted information matches the data on the physical identity proofs with a high degree of accuracy.
JSON output is well-structured and contains all relevant fields in a consistent format.
The extraction process is reliable, efficient, and can handle variations in document formats.
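
One way the accuracy criterion could be measured is field-level accuracy: the share of expected fields whose extracted value matches a hand-verified reference. The sketch below assumes exact matching after trimming whitespace and ignoring case; the function names and the matching rule are illustrative choices, not an agreed metric.

```python
from typing import Dict, Optional


def normalize(value: Optional[str]) -> str:
    # Collapse runs of whitespace and ignore case so trivial OCR spacing
    # differences are not counted as errors.
    if value is None:
        return ""
    return " ".join(value.split()).casefold()


def field_accuracy(extracted: Dict[str, Optional[str]],
                   expected: Dict[str, Optional[str]]) -> float:
    """Fraction of expected fields that the extraction reproduced."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    matches = sum(
        1 for key, value in expected.items()
        if normalize(extracted.get(key)) == normalize(value)
    )
    return matches / len(expected)


# Invented sample values: two of the three fields match.
expected = {"name": "Sample Name", "document_number": "ABCDE1234F", "date_of_birth": "1990-01-01"}
extracted = {"name": "SAMPLE  NAME", "document_number": "ABCDE1234F", "date_of_birth": "1990-01-10"}
print(field_accuracy(extracted, expected))
```

Averaging this score over a labeled sample set would give a single number to compare against whatever accuracy threshold the team agrees on.
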

Next Steps:

Review and refine the POC based on feedback and test results.
Explore scalability and performance considerations for handling large volumes of documents.
Plan for further development and implementation based on the success of the POC.

Projects
Status: In progress