Fast MRZ

This repository extracts the Machine Readable Zone (MRZ) from document images. The MRZ typically contains important information such as the document holder's name, nationality, document number, date of birth, etc.

️Features:

Detects and extracts the MRZ region from document images
Contour detection to accurately identify the MRZ area
Custom trained models using ONNX
Contains checksum logics for data validation
Outputs the extracted MRZ region as text/json for further processing or analysis

🛠️Built With

⚙️Installation

Install Tesseract OCR engine. And set PATH variable with the executable.

Install fastmrz

pip install fastmrz

This can be done through conda too if you prefer.

conda create -n fastmrz tesseract -c conda-forge
conda activate fastmrz

Copy the mrz.traineddata file from the tessdata folder of the repository into the tessdata folder of the Tesseract installation on YOUR MACHINE

💡Example

from fastmrz import FastMRZ
import json

fast_mrz = FastMRZ()
# Pass file path of installed Tesseract OCR, incase if not added to PATH variable
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract') # Default path in Mac
# fast_mrz = FastMRZ(tesseract_path=r'C:\\Program Files\\Tesseract-OCR\\tesseract.exe') # Default path in Windows
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg")
print("JSON:")
print(json.dumps(passport_mrz, indent=4))

print("\n")

passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg", raw=True)
print("TEXT:")
print(passport_mrz)

OUTPUT:

JSON:
{
    "mrz_type": "TD3",
    "document_type": "P",
    "country_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "nationality": "GBR",
    "date_of_birth": "1995-05-20",
    "sex": "M",
    "date_of_expiry": "2017-04-22",
    "status": "SUCCESS"
}


TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00

📃MRZ Wiki

MRZ Types & Format

The standard for MRZ code is strictly regulated and has to comply with Doc 9303. Machine Readable Travel Documents published by the International Civil Aviation Organization.

There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:

TD-1 (e.g. citizen’s identification card, EU ID card, US Green Card): consists of 3 lines, 30 characters each.
TD-2 (e.g. Romania ID, old type of German ID), and MRV-B (machine-readable visas type B — e.g. Schengen visa): consists of 2 lines, 36 characters each.
TD-3 (all international passports, also known as MRP), and MRV-A (machine-readable visas type A — issued by the USA, Japan, China, and others): consist of 2 lines, 44 characters each.

Now, based on the example of a national passport, let us take a closer look at the MRZ composition.

🗹ToDo

⚖️License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

⭐Show your support

Give a ⭐️ if this project helped you!

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
.github/workflows		.github/workflows
data		data
docs		docs
fastmrz		fastmrz
tessdata		tessdata
tests		tests
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fast MRZ

🛠️Built With

⚙️Installation

💡Example

📃MRZ Wiki

🗹ToDo

⚖️License

⭐Show your support

About

Releases 4

Contributors 2

Languages

License

sivakumar-mahalingam/fastmrz

Folders and files

Latest commit

History

Repository files navigation

Fast MRZ

🛠️Built With

⚙️Installation

💡Example

📃MRZ Wiki

🗹ToDo

⚖️License

⭐Show your support

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases 4

Contributors 2

Languages