Poor text detection performance after training fast_base on custom dataset #1704

GrayChan813 · 2024-08-25T08:08:00Z

GrayChan813
Aug 25, 2024

I prepared my custom data according to the training tutorial and trained the fast_base model using the following script:

python references/detection/train_pytorch.py data/train/ data/test/ fast_base --name fast_base_20240819 --epochs 100 --batch_size 16 --device 0 --lr 0.0001 --workers 16 --rotation

After iterating for multiple epochs, I evaluated the performance of the model and the indicators obtained are as follows：
（Validation loss: 0.3952 (Recall: 41.16% | Precision: 1.34% | Mean IoU: 2.00%)）

I don’t know where the problem lies. I followed the training instructions completely when preparing the dataset. I also visualized the dataset in text lines's bounding box to confirm that there was no problem with the data format. In addition, I also used the evaluation script "references/detection/evaluate_pytorch.py" to evaluate the performance of my trained model on the FUNSD dataset, and also got a very poor detection performance.

Therefore, I would like to ask if I have overlooked any key steps in the training process, or if some training parameters will affect the performance, such as --ratation, etc. I really hope someone can answer my doubts, thank you!

GrayChan813 · 2024-08-25T08:14:46Z

GrayChan813
Aug 25, 2024
Author

Another additional detail is that during the training process, the loss will drop from 0.8 to 0.4 after 2-3 epochs, and then stop decreasing.

0 replies

felixdittrich92 · 2024-08-25T08:56:16Z

felixdittrich92
Aug 25, 2024
Maintainer

Hi @GrayChan813 👋🏼,
With the current command you train the model from scratch (--pretrained missing if you want to fine tune) is this planned ?
How much data samples are in your train / val sets ?
Real or synth data ?
With text line bounding boxes you mean your dataset contains annotations on text line level and not bounding box coords for each word ?

Best regards
Felix :)

1 reply

GrayChan813 Aug 25, 2024
Author

Hi @felixdittrich92,
Is --pretrained required? I ignored this parameter in the previous training.

My custom dataset has about 100,000 samples, including real scene and synthetic data. Is it important to load pre-trained weights for datasets of this magnitude? The ratio of the validation set to the training set is 1:9.

My annotations are all text line-level annotations. Each text line uses the coordinates of four points to annotate a polygonal text line, rather than word-level annotations.

GrayChan813 · 2024-08-26T03:35:54Z

GrayChan813
Aug 26, 2024
Author

I run inference on some samples using the detection model I trained, and their results are weird. It seems that the model is able to detect the text lines, but there are many small boxes in the detection results that should not appear.

3 replies

felixdittrich92 Aug 26, 2024
Maintainer

Hi @GrayChan813 👋,

In general docTR (train + inference) is designed for word level detection and recognition, so we have never tried to train a model on text line level. 😅

You plan to use only the detection_predictor from docTR afterwards ? Because i think the rest of the pipe will not work as expected with an text line detection model.

Your shared training results doesn't looks good: （Validation loss: 0.3952 (Recall: 41.16% | Precision: 1.34% | Mean IoU: 2.00%)）
About the data: 100K samples should be enough we have pretrained the models on ~250K real samples

I would switch to a StepLR scheduler to get a first feeling if you say it stagnant after 3 epochs.
I would also try to train with --pretrained.

This needs a bit experimentation. But i agree the small boxes looks super weird.

GrayChan813 Aug 27, 2024
Author

Hi, @felixdittrich92,
Thank you for your patience!

I am currently trying to train a text line detection model based on FAST. In fact, I really only plan to use the detector. I also read the code of the project, and I think the detector can be decoupled from the entire pipeline.

I read the source code of FAST in the project and did some visual inspections, and did not find any problems. Maybe I should try to check my data carefully and try your suggestions. I will do some experiments later and update if there is progress.

Thank you again for your help!

felixT2K Aug 27, 2024

Hi, @felixdittrich92, Thank you for your patience!

I am currently trying to train a text line detection model based on FAST. In fact, I really only plan to use the detector. I also read the code of the project, and I think the detector can be decoupled from the entire pipeline.

I read the source code of FAST in the project and did some visual inspections, and did not find any problems. Maybe I should try to check my data carefully and try your suggestions. I will do some experiments later and update if there is progress.

Thank you again for your help!

Yeah sure you can use both the detection_predictor and recognition_predictor standalone, it's not required to use the whole pipeline that's the initial design decision behind docTR :)

Good luck with the experiments happy about any update or feedback 👍🏼

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poor text detection performance after training fast_base on custom dataset #1704

{{title}}

Replies: 3 comments 4 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Poor text detection performance after training fast_base on custom dataset #1704

GrayChan813 Aug 25, 2024

Replies: 3 comments · 4 replies

GrayChan813 Aug 25, 2024 Author

felixdittrich92 Aug 25, 2024 Maintainer

GrayChan813 Aug 25, 2024 Author

GrayChan813 Aug 26, 2024 Author

felixdittrich92 Aug 26, 2024 Maintainer

GrayChan813 Aug 27, 2024 Author

felixT2K Aug 27, 2024

GrayChan813
Aug 25, 2024

Replies: 3 comments 4 replies

GrayChan813
Aug 25, 2024
Author

felixdittrich92
Aug 25, 2024
Maintainer

GrayChan813 Aug 25, 2024
Author

GrayChan813
Aug 26, 2024
Author

felixdittrich92 Aug 26, 2024
Maintainer

GrayChan813 Aug 27, 2024
Author