Unexpected Poor Inference Despite High Evaluation Scores: Validation Loss 0.0031 (Exact: 99.70% | Partial: 99.74%) #1677
@ep0p Quick question: does the dataset contain "real" word crops or synthetically generated ones?
Hi @felixdittrich92,

After fixing my dataset and fine-tuning parseq again, I'm still not getting good results: words come out garbled, with a lot of spurious punctuation marks.

I have also created a character map for my dataset; the only custom parameter I've used is the vocab. I was wondering whether the models are cached somewhere and I might be tuning a bad model (one that was badly tuned before), or whether there is some other explanation for why this is not working for me. My next, and last, step would be to create a synthetic dataset to test with.

Edit: Or could it be the size of the images in the dataset (example attached)? At this stage I am doubting everything...
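A minimal sketch of the kind of vocab registration involved, assuming docTR's `VOCABS` registry and a hypothetical character-map file `my_charmap.txt` (both the file name and the vocab key are placeholders). Note that the reference training script resolves `--vocab` against `doctr.datasets.VOCABS` at import time, so in practice the entry usually has to be added to `doctr/datasets/vocabs.py` (or patched in before the script imports it):

```python
from doctr.datasets import VOCABS

# Build the vocab string from a character-map file (hypothetical name);
# docTR expects a single string of unique characters.
with open("my_charmap.txt", encoding="utf-8") as f:
    chars = "".join(sorted(set(f.read()) - {"\n"}))

# Register it under a custom key so it can be referenced by name,
# e.g. via --vocab my_vocab on the training script.
VOCABS["my_vocab"] = chars
print(f"Registered {len(chars)} characters")
```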
Hi @ep0p 👋,
What I see right away is `--lr 0.01`. What was the reason for using such a high value? 😅

Mh.. we pretrained the models from scratch on ~10M samples with the default args (only the epoch count increased to 20; with less data maybe set it to 40 or 50). 3M samples are definitely enough, especially for fine-tuning.

I'm a bit puzzled, because I have also fine-tuned `parseq` (https://huggingface.co/Felix92/doctr-torch-parseq-multilingual-v1) on only ~1M synth samples without trouble (with `--pretrained` and `50 epochs`, other args unchanged).
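Something like this, as a minimal sketch: paths and the vocab name are placeholders, and the flag names follow docTR's `references/recognition/train_pytorch.py` script, so double-check `--help` on your checkout since the exact flag set may differ between versions. The key point is that no `--lr` override is passed, so the default learning rate is kept instead of 0.01:

```python
import subprocess

# Fine-tune parseq from the released checkpoint with default args,
# only raising the epoch count as suggested above.
subprocess.run(
    [
        "python", "references/recognition/train_pytorch.py", "parseq",
        "--train_path", "path/to/train",  # placeholder
        "--val_path", "path/to/val",      # placeholder
        "--vocab", "my_vocab",            # custom vocab registered in VOCABS
        "--epochs", "50",
        "--pretrained",                   # start from the pretrained weights
    ],
    check=True,
)
```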