This is a simple implementation of a MADALINE network that identifies letters and digits rendered from a TTF font file. The network uses a simple single-layer architecture and is implemented in Python, with the numpy library handling the matrix operations.
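To make the single-layer idea concrete, here is a minimal sketch of one common way to set up such a network for character recognition: one linear neuron per training pattern, with weights equal to the normalized pattern, and the neuron with the highest output wins. This is an assumption about the general approach, not a description of what `madaline_ocr.py` does. The names `normalize` and `TemplateLayer` are hypothetical, and the sketch uses plain Python lists instead of numpy so that it runs without dependencies.

```python
import math


def normalize(pixels):
    """Scale a flat pixel vector to unit Euclidean length."""
    norm = math.sqrt(sum(p * p for p in pixels))
    if norm == 0:
        return [0.0] * len(pixels)
    return [p / norm for p in pixels]


class TemplateLayer:
    """Single layer with one linear neuron per training pattern (assumed design)."""

    def __init__(self):
        self.labels = []
        self.weights = []

    def train(self, patterns):
        # patterns maps a label to a flat list of 0/1 pixels.
        # Each neuron's weight vector is the normalized training pattern.
        for label, pixels in patterns.items():
            self.labels.append(label)
            self.weights.append(normalize(pixels))

    def predict(self, pixels):
        # Each neuron outputs the dot product of its weights with the
        # normalized input, i.e. the cosine similarity to its pattern.
        x = normalize(pixels)
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.labels[best], scores[best]


# Tiny 3x3 "glyphs" used only for illustration.
patterns = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}
layer = TemplateLayer()
layer.train(patterns)

noisy_L = [1, 0, 0, 1, 0, 0, 1, 1, 0]  # an "L" with one pixel missing
label, score = layer.predict(noisy_L)
print(label, round(score, 3))  # L 0.894
```

A clean input scores 1.0 against its own pattern, so the winning score can also serve as a rough confidence value.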
- Create the conda environment using the `environment.yml` file: `conda env create -f environment.yml`
- Activate the environment: `conda activate madaline`
- Download a TTF font file (for example, from a font website) and put it in the `data/` directory. A sample Times Roman font is already included in the repository.
- Generate the training and test data from the TTF file. Examples:
  - `python font_generator.py --width 32 --height 32 --x_position 16 --y_position 16 --font_file data/times-ro.ttf --noise_level 0 --output_dir data/train --overwrite --all_letters`
  - `python font_generator.py --width 32 --height 32 --x_position 16 --y_position 16 --font_file data/times-ro.ttf --noise_level 30 --output_dir data/test_noise_30 --overwrite --all_digits`

  You can use `--all_letters` or `--all_digits` as above, or pass a multi-character string such as `ABCDEFabcdef` to `--letter` to generate only the specified letters or digits.
- Train and test the network: `python madaline_ocr.py --train_path data/train/ --test_path data/test_90`
- Enjoy the results!
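If you want test sets at several noise levels, the generation step can be scripted. The sketch below only builds and prints the `font_generator.py` command lines, reusing the flags from the usage steps; the helper name `font_generator_cmd`, the chosen noise levels, and the `data/test_noise_<level>` directory pattern are illustrative assumptions. Uncomment the `subprocess.run` line to actually run the commands.

```python
import shlex
import subprocess  # only needed if the run line below is uncommented


def font_generator_cmd(noise_level, output_dir, chars_flag="--all_digits"):
    """Build the argument list for one font_generator.py run (hypothetical helper)."""
    return [
        "python", "font_generator.py",
        "--width", "32", "--height", "32",
        "--x_position", "16", "--y_position", "16",
        "--font_file", "data/times-ro.ttf",
        "--noise_level", str(noise_level),
        "--output_dir", output_dir,
        "--overwrite", chars_flag,
    ]


# Noise levels chosen for illustration only.
for level in (10, 30, 50):
    cmd = font_generator_cmd(level, f"data/test_noise_{level}")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # run from the repository root
```

Passing the arguments as a list keeps the call free of shell quoting issues; `shlex.join` is used only to display the command.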
python font_generator.py --width 128 --height 128 --x_position 0 --y_position 0 --font_size 32 --font_file data/times-ro.ttf --noise_level 50 --output_dir data/test --overwrite --letter atyukjxZCVF1678
python3 font_generator.py --width 128 --height 128 --x_position 0 --y_position 0 --font_size 32 --font_file data/times-ro.ttf --noise_level 0 --output_dir data/train --overwrite --letter atyukjxZCVF1678
python3 madaline_ocr.py --train_path data/train --test_path data/test --plot_results
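For intuition about what a noise level of 50 could mean for a 128x128 image, here is a small sketch that flips a given percentage of pixels in a binary image. It assumes that `--noise_level` is a percentage of flipped pixels, which the repository may define differently; `add_noise` is a hypothetical helper and not part of `font_generator.py`.

```python
import random


def add_noise(pixels, noise_level, seed=None):
    """Flip noise_level percent of the pixels in a flat 0/1 image (assumed meaning)."""
    if not 0 <= noise_level <= 100:
        raise ValueError("noise_level must be between 0 and 100")
    rng = random.Random(seed)
    n_flip = round(len(pixels) * noise_level / 100)
    # Sample distinct positions so exactly n_flip pixels change.
    flip_idx = rng.sample(range(len(pixels)), n_flip)
    noisy = list(pixels)
    for i in flip_idx:
        noisy[i] = 1 - noisy[i]
    return noisy


clean = [0] * (128 * 128)  # blank 128x128 canvas, flattened
noisy = add_noise(clean, noise_level=50, seed=0)
changed = sum(a != b for a, b in zip(clean, noisy))
print(changed)  # 8192
```

With a noise level of 0 the image is returned unchanged, which matches the clean training set generated above.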
The sample letters (noisy and clean) are shown below (for the same font and noise level as in the example above):
- Add all tests