
Question Regarding Model Evaluation vs. Inference Performance on the Validation Dataset #1718

Answered by felixT2K
ep0p asked this question in Q&A

@ep0p Update :)
I experimented a bit with your dataset: the model is strongly biased by the punctuation samples, and since the dataset is quite easy overall, it also starts to overfit slightly when there are too many samples.

It's still not 100% solved, but it goes in the right direction.
I trained for only one epoch to check.

Quick-and-dirty code I used to clean only the train data (val data unchanged):

import json
from string import punctuation
from collections import defaultdict
from tqdm import tqdm

# Collect cleaning statistics over the training labels
new = {}                       # cleaned label mapping
unique_words = set()           # distinct words seen so far
word_count = defaultdict(int)  # occurrences per word
count_with_ending_dot = 0      # samples whose label ends with a dot

with open("/home/felix/Desktop/doctr_test_data/X_TEST/Recognition_ds/train_val_archive/train/original_labels.json", "r", encoding=…
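Since the snippet above is truncated, here is a minimal, self-contained sketch of the kind of cleaning it describes: capping how often each unique word appears in the training labels and dropping punctuation-only samples, so the punctuation bias and the easy duplicates are reduced. The `clean_labels` helper, the `max_per_word` parameter, and the `{image_name: word}` label format are assumptions for illustration, not the exact code used.

```python
from collections import defaultdict
from string import punctuation


def clean_labels(labels: dict, max_per_word: int = 3) -> dict:
    """Return a cleaned copy of a {image_name: word} label mapping.

    Drops samples whose label is punctuation-only and keeps at most
    `max_per_word` samples per unique word.
    """
    word_count = defaultdict(int)
    cleaned = {}
    for img_name, word in labels.items():
        # Drop samples that consist purely of punctuation characters
        if word and all(ch in punctuation for ch in word):
            continue
        # Cap duplicates of easy, frequently repeated words
        if word_count[word] >= max_per_word:
            continue
        word_count[word] += 1
        cleaned[img_name] = word
    return cleaned


labels = {"a.png": "hello.", "b.png": "...", "c.png": "hello.", "d.png": "world"}
print(clean_labels(labels, max_per_word=1))
```

With `max_per_word=1`, the duplicate `"hello."` sample and the punctuation-only `"..."` sample are removed, while the validation set is left untouched so evaluation stays comparable.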

Replies: 1 comment, 13 replies
Answer selected by ep0p
Category: Q&A