-
Hello, I have fine-tuned a Doctr model for text recognition, which you can find here (model). During training and evaluation, I achieved very high validation scores, with near-perfect metrics, as shown below:

To further validate the model's performance, I generated a PDF file containing words solely from the validation dataset. However, when I run inference on this PDF, the model's performance is significantly worse than expected. Based on the high validation scores, I anticipated full recognition of the words, but this was not the case. For example:

1. Single word case:
2. Multiple words case:

These detections are the most accurate I could achieve by setting the […]. I also noticed the model adds extra punctuation marks. Initially, I thought this was due to overlapping detection boxes, but the issue persisted even when testing with a PDF containing one word per line.

Am I wrong to expect 99% accuracy during inference on the validation dataset, given the near-perfect validation scores achieved during training?
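For context, here is a minimal sketch of how inference on such a PDF could be run with a fine-tuned recognition model plugged into the end-to-end predictor. The weight path, architecture names, and vocab string below are assumptions rather than my exact setup, and recent docTR versions are assumed to accept a model instance for `reco_arch`:

```python
import torch

from doctr.io import DocumentFile
from doctr.models import crnn_vgg16_bn, ocr_predictor

# Rebuild the recognition model with the vocab used for fine-tuning
# (architecture, vocab, and checkpoint path are placeholders).
reco_model = crnn_vgg16_bn(pretrained=False, vocab="0123456789abcdefghijklmnopqrstuvwxyz")
reco_model.load_state_dict(torch.load("my_finetuned_reco.pt", map_location="cpu"))

# Pair the fine-tuned recognizer with a pretrained detection model.
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch=reco_model, pretrained=True)

# Run the full pipeline on the PDF built from validation words.
doc = DocumentFile.from_pdf("validation_words.pdf")
result = predictor(doc)
print(result.render())
```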
-
Hi @ep0p 👋, Yeah that's correct. In the meantime, you can check out the dataset I used to fine-tune the multilingual model: synth_multilingual_dataset
@ep0p Update :)
I tweaked around with your dataset a bit, and the model seems to be heavily biased toward the punctuation samples. Additionally, the dataset is really easy, so it seems to overfit slightly with too many samples.
It's still not 100% solved, but it's going in the right direction.
I trained only for one epoch to check.
Dirty code I used to clean only the train data (val data unchanged):
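The snippet itself didn't make it into this thread, so below is only a rough sketch of the kind of cleanup described above (dropping punctuation-heavy samples and downsampling the train split), assuming docTR's recognition format of an image folder plus a `labels.json` mapping image filename to label string. The paths, the 0.5 punctuation ratio, and the 50% keep fraction are placeholder values, not the original settings:

```python
import json
import random
import string

# Assumed docTR recognition layout: train/images/ + train/labels.json ({"img.png": "label"}).
LABELS_PATH = "train/labels.json"
CLEANED_PATH = "train/labels_cleaned.json"

with open(LABELS_PATH, "r", encoding="utf-8") as f:
    labels = json.load(f)

def punct_ratio(text: str) -> float:
    """Fraction of characters in the label that are punctuation."""
    if not text:
        return 1.0
    return sum(c in string.punctuation for c in text) / len(text)

# Drop samples that are mostly punctuation (threshold is a placeholder).
cleaned = {name: label for name, label in labels.items() if punct_ratio(label) < 0.5}

# Downsample the remaining (easy) samples to curb overfitting (fraction is a placeholder).
kept = random.sample(list(cleaned.items()), k=int(0.5 * len(cleaned)))
cleaned = dict(kept)

with open(CLEANED_PATH, "w", encoding="utf-8") as f:
    json.dump(cleaned, f, ensure_ascii=False)

print(f"Kept {len(cleaned)} of {len(labels)} train samples (val data untouched)")
```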