Tried to fine-tune CRNN_VGG16_BN with custom fonts, but the inference was very bad #1285

dhifafaz · 2023-08-23T04:03:10Z

What I want to do

I want to fine-tune the model mentioned in the title on custom fonts, because the documents I want to extract text from use fonts that look like typewriter output.
The goal is near-perfect character-level OCR with as few misrecognized characters as possible, because the documents will be used as a source of digital knowledge.

What I have done and the problem

I used the Text Recognition Data Generator (TRDG) package to produce the dataset, around 150,000 samples in total. This is the command I ran:

trdg -c 108000 -dt id-2.txt -fd /doc-ext/docTR-finetune/fonts -w 3 -b 0 -k 3 -rk -rs -tc '#000000,#888888' --output_dir /doc-ext/docTR-finetune/dataset-text-reco-v3/train_set/images -f 32 -t 20 -sw 0 -na 2 --margins 3,3,3,3

I did the same for the val_set with a much smaller number of samples. Then I fine-tuned the model with this docTR command:

python references/recognition/train_pytorch.py crnn_vgg16_bn --train_path /doc-ext/docTR-finetune/dataset-v2/train_set --val_path /doc-ext/docTR-finetune/dataset-v2/val_set --epochs 450 -b 64 --device 0 --input_size 32 --pretrained --wb --name text-uu-ocr-custom-data-v2
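As far as I know, docTR's recognition training script expects each split folder to contain an `images/` directory plus a `labels.json` mapping image file names to their text. Below is a minimal sketch for converting TRDG output into that layout while dropping entries whose image was never written. It assumes TRDG was run with `-na 2`, which (to my understanding) writes a `labels.txt` of `<file name> <label>` lines next to the images; please check the generated file before relying on it. `trdg_to_doctr` is a hypothetical helper, not part of docTR or TRDG.

```python
import json
from pathlib import Path


def trdg_to_doctr(split_dir: str) -> int:
    """Write <split_dir>/labels.json from TRDG's <split_dir>/images/labels.txt.

    Assumption: TRDG was run with `-na 2`, so each labels.txt line reads
    "<file name> <label>". Entries whose image file does not exist (TRDG can
    fail to render some strings) or whose label is empty are skipped.
    Returns the number of samples written.
    """
    split = Path(split_dir)
    images = split / "images"
    labels = {}
    with open(images / "labels.txt", encoding="utf-8") as f:
        for line in f:
            # split on the first space only; the label itself may contain spaces
            name, _, label = line.rstrip("\n").partition(" ")
            if label and (images / name).is_file():
                labels[name] = label
    # assumed docTR format: {image file name: text}
    with open(split / "labels.json", "w", encoding="utf-8") as f:
        json.dump(labels, f, ensure_ascii=False, indent=2)
    return len(labels)
```

Running it on `/doc-ext/docTR-finetune/dataset-text-reco-v3/train_set` (and the matching `val_set`) would give folders to pass as `--train_path` / `--val_path`, with no entry pointing at a missing image.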

Because of a missing image (so it's TRDG's fault), the training process stopped in the middle, but it managed to save the best model. When I used my fine-tuned model.pt in an end-to-end docTR extraction, the results were very different from what they should be. This is how I load the fine-tuned model:

import torch
from doctr.datasets import VOCABS
from doctr.models import crnn_vgg16_bn, ocr_predictor
# vocab must match the one used for fine-tuning
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False, vocab=VOCABS["french"])
reco_params = torch.load(self.reco_arch_ocr, map_location="cuda:1")
reco_model.load_state_dict(reco_params)
self.doc_ocr = ocr_predictor(det_arch=self.det_arch_ocr, reco_arch=reco_model, pretrained=True)

PS: I also tried changing the vocab parameter, and even leaving it out. It still gives me bad results, even though exact and partial match reached almost 90% with a loss of 0.05 during training.
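A CTC-based CRNN has one output per vocab character plus one blank, so the output layer's shape in a saved checkpoint shows which vocab size it was trained with. A small sketch for checking that before loading; the `linear.weight` key is an assumption about how docTR names the CRNN classification layer (print `reco_params.keys()` to confirm), and `head_matches_vocab` is a hypothetical helper.

```python
from typing import Mapping, Sequence


def head_matches_vocab(
    shapes: Mapping[str, Sequence[int]], vocab: str, key: str = "linear.weight"
) -> bool:
    """Check that the checkpoint's output layer has len(vocab) + 1 rows.

    The extra row is the CTC blank. `key` is an assumed name for the final
    linear layer; pass a different key if the checkpoint uses another one.
    """
    return shapes[key][0] == len(vocab) + 1


# Usage (assumes torch and doctr are installed):
#   params = torch.load("model.pt", map_location="cpu")
#   shapes = {k: tuple(v.shape) for k, v in params.items()}
#   head_matches_vocab(shapes, VOCABS["latin"])
```

A matching size only rules out a length mismatch, which `load_state_dict` would reject anyway; a different vocab of the same length, or the same characters in another order, would still load and decode to the wrong characters.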

Because of the issue with TRDG, I tried docTR's own word generation (WordGenerator) from the training command instead, with a list of custom fonts. The training completed with exact and partial match also at almost 90% and a loss of 0.04. This is the training command:

python references/recognition/train_pytorch.py crnn_vgg16_bn --pretrained --name text-ocr-uu-font-v2-1 --epochs 350 --train-samples 945 --val-samples 105 --device 0 --input_size 32 --max-chars 32 -b 64 --vocab latin --wb --font 'BKMNOS.ttf,OpenSans-SemiBold.ttf,OpenSans_SemiCondensed-Light.ttf,OpenSans-Italic.ttf,OpenSans_Condensed-LightItalic.ttf,OpenSans_Condensed-ExtraBold.ttf,OpenSans_Condensed-Italic.ttf,OpenSans_SemiCondensed-SemiBold.ttf,OpenSans_Condensed-SemiBold.ttf,OpenSans_SemiCondensed-ExtraBold.ttf,OpenSans-Regular.ttf,OpenSans_SemiCondensed-SemiBoldItalic.ttf,OpenSans_Condensed-Medium.ttf,OpenSans-Medium.ttf,times new roman bold italic.ttf,OpenSans_SemiCondensed-BoldItalic.ttf,OpenSans-MediumItalic.ttf,CourierPrime-Bold.ttf,OpenSans-ExtraBold.ttf,OpenSans_SemiCondensed-ExtraBoldItalic.ttf,OpenSans_Condensed-Regular.ttf,CourierPrime-Italic.ttf,OpenSans_Condensed-ExtraBoldItalic.ttf,OpenSans_SemiCondensed-Bold.ttf,bookman old style fett kursiv.ttf,times new roman.ttf,OpenSans_Condensed-BoldItalic.ttf,OpenSans_SemiCondensed-Italic.ttf,OpenSans_Condensed-MediumItalic.ttf,OpenSans-Bold.ttf,OpenSans_SemiCondensed-LightItalic.ttf,OpenSans_SemiCondensed-Regular.ttf,OpenSans_SemiCondensed-MediumItalic.ttf,OpenSans_Condensed-Bold.ttf,OpenSans-LightItalic.ttf,OpenSans-SemiBoldItalic.ttf,OpenSans-BoldItalic.ttf,CourierPrime-Regular.ttf,bookman old style.ttf,times new roman italic.ttf,OpenSans_SemiCondensed-Medium.ttf,CourierPrime-BoldItalic.ttf,OpenSans-Light.ttf,times new roman bold.ttf,OpenSans-ExtraBoldItalic.ttf,OpenSans_Condensed-Light.ttf,OpenSans_Condensed-SemiBoldItalic.ttf'
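The `--font` value is one long comma-separated string, so a single missing quote or mistyped file name breaks it. A sketch that builds the argument from a font folder instead, assuming the training script splits `--font` on commas and that each entry may be a full path to a `.ttf` file (worth confirming with a short `--show-samples` run); `font_arg` is a hypothetical helper.

```python
import shlex
from pathlib import Path


def font_arg(font_dir: str) -> str:
    """Build a shell-safe `--font` argument from every .ttf file in `font_dir`.

    Assumes the training script accepts a comma-separated list of font paths.
    """
    fonts = sorted(str(p) for p in Path(font_dir).iterdir() if p.suffix.lower() == ".ttf")
    if any("," in f for f in fonts):
        # the list itself is comma-separated, so a comma in a file name cannot be expressed
        raise ValueError("font file names must not contain commas")
    # shlex.quote keeps names with spaces (e.g. "times new roman.ttf") in one shell argument
    return "--font " + shlex.quote(",".join(fonts))
```

Printing `font_arg("/doc-ext/docTR-finetune/fonts")` and pasting the result into the command avoids typing the font list by hand.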

Loading the fine-tuned model the same way as above, the results are pretty much the same: very different from the PDF I tested. This is the result that I get:

This is the PDF document:

What confuses me is that when I run inference with only the text recognition model on my generated data, which looks like this:

using this code:

import torch
from doctr.models import crnn_vgg16_bn, recognition_predictor
from doctr.datasets import VOCABS
from doctr.io import DocumentFile
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False, vocab=VOCABS["latin"])
reco_params = torch.load('/data-model/doc-ext/text_reco_doctr/text-uu-ocr-custom-data-v3-1.pt', map_location="cuda:1")
reco_model.load_state_dict(reco_params)
predictor = recognition_predictor(arch=reco_model, pretrained=True)
# predictor(crops) takes a list of word-crop images (numpy arrays) and returns (text, confidence) pairs

and compute the CER (Character Error Rate) on 1,000 samples with the evaluate package from Hugging Face, the error is only 0.05.
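For reference, CER is the number of character edits (substitutions, deletions, insertions) needed to turn the predictions into the references, divided by the total number of reference characters. A dependency-free sketch of that corpus-level score; I believe this is how the Hugging Face `evaluate` `cer` metric aggregates over a list of pairs, up to its whitespace normalization, but treat the two as approximately, not exactly, equivalent.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def corpus_cer(predictions: List[str], references: List[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references contain no characters")
    edits = sum(levenshtein(r, p) for p, r in zip(predictions, references))
    return edits / total
```

For example, reading WALIKOTA as WALKOTA is one deletion over eight reference characters, so `corpus_cer(["WALKOTA"], ["WALIKOTA"])` is 0.125.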

My question is: what is happening here? How can I reproduce the current CRNN_VGG16_BN capability with my custom fonts? I need to know where my mistake is.

Thank you so much!

felixT2K · 2023-08-23T08:22:03Z

Hi @dhifafaz 👋,

First, I suggest using crnn_mobilenet_v3_large instead of crnn_vgg16_bn; it is faster and more accurate.
The WordGenerator is still, let's say, experimental and is currently used mostly for internal debugging.

The best option would be to provide some real samples for fine-tuning (~2K train / 400 val) while keeping the french vocab (so the classifier head is not reset).
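Keeping the french vocab only helps if it contains every character the documents use. A quick sketch for checking that on the transcriptions of the real samples; `missing_chars` is a hypothetical helper, and in practice `vocab` would be `VOCABS["french"]` from `doctr.datasets`. Spaces are skipped on the assumption that recognition runs on single-word crops.

```python
from collections import Counter
from typing import Iterable


def missing_chars(texts: Iterable[str], vocab: str) -> Counter:
    """Count characters that appear in `texts` but are not in `vocab`.

    Spaces are ignored (assumes one word per recognition crop).
    An empty Counter means the vocab covers every character.
    """
    allowed = set(vocab)
    return Counter(c for t in texts for c in t if c not in allowed and c != " ")
```

Anything this reports cannot be predicted by a model trained on that vocab, no matter how long it is fine-tuned.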

Could you please attach some samples from your WordGenerator run? You can do this by passing --show-samples to the command.

BTW: there is no need to augment your images beforehand; we do this internally (please check that you are on the latest docTR version from the main branch, i.e. 0.7.0):

[
    T.Resize((args.input_size, 4 * args.input_size), preserve_aspect_ratio=True),
    # Augmentations
    T.RandomApply(T.ColorInversion(), 0.1),
    RandomGrayscale(p=0.1),
    RandomPhotometricDistort(p=0.1),
    T.RandomApply(T.RandomShadow(), p=0.4),
    T.RandomApply(T.GaussianNoise(mean=0, std=0.1), 0.1),
    T.RandomApply(GaussianBlur(3), 0.3),
    RandomPerspective(distortion_scale=0.2, p=0.3),
]
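The first transform resizes every crop to `input_size x 4 * input_size` (32 x 128 with `--input_size 32`) while keeping its aspect ratio, then pads the rest. A rough sketch of the size a crop is scaled to before padding, based on my reading of docTR's `Resize` (the exact rounding in `doctr/transforms` may differ); `fit_size` is a hypothetical helper.

```python
from typing import Tuple


def fit_size(h: int, w: int, target_h: int = 32, target_w: int = 128) -> Tuple[int, int]:
    """(height, width) a crop is scaled to before padding, keeping its aspect ratio.

    Approximates docTR's Resize(..., preserve_aspect_ratio=True); the exact
    rounding in the library may differ.
    """
    actual_ratio = h / w
    target_ratio = target_h / target_w
    if actual_ratio > target_ratio:
        # relatively tall crop: the height reaches the target first
        return target_h, max(int(target_h / actual_ratio), 1)
    # relatively wide crop: the width reaches the target first
    return max(int(target_w * actual_ratio), 1), target_w
```

For instance, a 32 x 512 crop (roughly several words on one line) comes out as 8 x 128, leaving only 8 pixels of text height.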

dhifafaz Aug 23, 2023
Author

Yes, I have torch in my conda environment, installed through pip.

felixT2K Aug 23, 2023

You mean PyTorch is installed with conda install ..., correct?

dhifafaz Aug 23, 2023
Author

Hmm, for this I used pip install torch inside the conda environment.

felixT2K Aug 23, 2023

Definitely not recommended :)

Please uninstall all the pip-installed PyTorch libraries and reinstall them following:

https://pytorch.org/get-started/locally/
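After reinstalling, a few standard `torch` attributes show which PyTorch build the environment actually picks up. A minimal sketch; it does not distinguish pip from conda installs, and docTR's `collect_env.py` script gives a fuller report.

```python
import importlib.util


def torch_env_report() -> dict:
    """Summarize the PyTorch build visible to this interpreter, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
```

If `cuda_build` is set but `cuda_available` is False, the driver or the selected device (for example `cuda:1`) is worth checking before retraining.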

dhifafaz Aug 23, 2023
Author

Okay, I will try that and come back as soon as possible if I find any errors. Thanks, btw.

dhifafaz · 2023-08-23T14:40:13Z
Author

Hi, thank you for your answer.

This is my current docTR version; I installed it from the main branch.
I am still not able to show the samples using this command:

python references/recognition/train_pytorch.py crnn_vgg16_bn --show-samples --train-samples 10 --val-samples 3 --epochs 5 --font 'BKMNOS.ttf,OpenSans-SemiBold.ttf,OpenSans_SemiCondensed-Light.ttf,OpenSans-Italic.ttf,OpenSans_Condensed-LightItalic.ttf,OpenSans_Condensed-ExtraBold.ttf,OpenSans_Condensed-Italic.ttf,OpenSans_SemiCondensed-SemiBold.ttf,OpenSans_Condensed-SemiBold.ttf,OpenSans_SemiCondensed-ExtraBold.ttf,OpenSans-Regular.ttf,OpenSans_SemiCondensed-SemiBoldItalic.ttf,OpenSans_Condensed-Medium.ttf,OpenSans-Medium.ttf,times new roman bold italic.ttf,OpenSans_SemiCondensed-BoldItalic.ttf,OpenSans-MediumItalic.ttf,CourierPrime-Bold.ttf,OpenSans-ExtraBold.ttf,OpenSans_SemiCondensed-ExtraBoldItalic.ttf,OpenSans_Condensed-Regular.ttf,CourierPrime-Italic.ttf,OpenSans_Condensed-ExtraBoldItalic.ttf,OpenSans_SemiCondensed-Bold.ttf,bookman old style fett kursiv.ttf,times new roman.ttf,OpenSans_Condensed-BoldItalic.ttf,OpenSans_SemiCondensed-Italic.ttf,OpenSans_Condensed-MediumItalic.ttf,OpenSans-Bold.ttf,OpenSans_SemiCondensed-LightItalic.ttf,OpenSans_SemiCondensed-Regular.ttf,OpenSans_SemiCondensed-MediumItalic.ttf,OpenSans_Condensed-Bold.ttf,OpenSans-LightItalic.ttf,OpenSans-SemiBoldItalic.ttf,OpenSans-BoldItalic.ttf,CourierPrime-Regular.ttf,bookman old style.ttf,times new roman italic.ttf,OpenSans_SemiCondensed-Medium.ttf,CourierPrime-BoldItalic.ttf,OpenSans-Light.ttf,times new roman bold.ttf,OpenSans-ExtraBoldItalic.ttf,OpenSans_Condensed-Light.ttf,OpenSans_Condensed-SemiBoldItalic.ttf'

This is what I get; the process just stops there.

@felixT2K Hi, I'm back. Sorry, I still get the same problem, even though CUDA and everything else is freshly installed, since I'm using a different environment...
And this is the output of collect_env.py:

dhifafaz · 2023-08-23T15:04:11Z
Author

I'm currently trying your suggestion to "provide some real samples". I wonder what training command would reproduce the pretrained docTR text recognition capability using my generated dataset with custom fonts,
for example how many epochs, which arguments I need to pass, or even what image size the dataset should have.
Thanks a lot, I really appreciate your response.
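One way to prepare the ~2K train / 400 val real samples suggested earlier in the folder layout the training script reads (`images/` plus `labels.json`, as I understand docTR's recognition dataset format) is sketched below. `split_real_samples` is a hypothetical helper, and it assumes you already have a mapping from each cropped word image to its transcription.

```python
import json
import random
import shutil
from pathlib import Path
from typing import Dict


def split_real_samples(
    labels: Dict[str, str], src_dir: str, out_dir: str, n_val: int = 400, seed: int = 0
) -> Dict[str, int]:
    """Copy labeled word crops into <out_dir>/train_set and <out_dir>/val_set.

    `labels` maps image file names (inside `src_dir`) to their text. Each split
    gets an images/ folder and a labels.json in the same {name: text} format.
    """
    names = sorted(labels)
    random.Random(seed).shuffle(names)  # reproducible shuffle before splitting
    splits = {"val_set": names[:n_val], "train_set": names[n_val:]}
    for split, split_names in splits.items():
        img_dir = Path(out_dir) / split / "images"
        img_dir.mkdir(parents=True, exist_ok=True)
        for name in split_names:
            shutil.copy2(Path(src_dir) / name, img_dir / name)
        with open(Path(out_dir) / split / "labels.json", "w", encoding="utf-8") as f:
            json.dump({n: labels[n] for n in split_names}, f, ensure_ascii=False, indent=2)
    return {split: len(v) for split, v in splits.items()}
```

The two resulting folders would then go to `--train_path` and `--val_path`, with the vocab left at french as recommended.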

felixT2K · 2023-09-27T06:25:10Z

@dhifafaz short update:

You should try the pretrained parseq recognition model (currently only available on the main branch).
I have tested it with your provided image and it was nearly perfect.

Only "b." was not detected correctly, and WALIKOTA was recognized as WALKOTA. Everything else looks correct.

WALIKOTA SUBULUSSALAM
PROVINSI ACEH

PERATURAN WALIKOTA SUBULUSSALAM

NOMOR 12 TAHUN 2022

TENTANG

PERUBAHAN ATAS PERATURAN WALIKOTA SUBULUSSALAM NOMOR 124
TABUN 2020 TENTANG HASIL ANALISIS JABATAN STRUKTURAL DAN NON
STRUKTURAL UMUM DI SEKRETARIAT DAERAH KOTA SUBULUSSALAM

ATAS RAHMAT ALLAH YANG MAHA KUASA

WALKOTA SUBULUSSALAM,

Menimbang

: a. bahwa dalam rangka mendukung kelancaran
pelaksanaan tugas dan fungsi di Satuan Kerja
Perangkat Daerah di Lingkungan Pemerintah Kota
Subulussalam diperlukan penetapan kembali analisis
jabatan pelaksana pada Sekretariat Daerah Kota

Subulussalam;

. bahwa untuk menindaklanjuti Peraturan Menteri
Peraturan Menteri Pendayagunaan Aparatur Negara
dan Reformasi Birokrasi Nomor 41 Tahun 2018 tentang
Nomenklatur Jabatan Pelaksana Bagi Pegawai Negeri
Sipil Di Lingkungan Instansi Pemerintah, perlu

ditetapkan analisis jabatan pelaksana;

C. bahwa berdasarkan pertimbangan sebagaimana
dimaksud dalam huruf a dan huruf b perlu
menetapkan Peraturan Walikota tentang Hasil Analisis
Jabatan Struktural dan Non Struktural Umum pada

Sekretariat Daerah Kota Subulussalam;
