Lack of validation set? #13

Open
guozixunnicolas opened this issue May 1, 2022 · 3 comments

Comments

@guozixunnicolas

Hi there,

Thanks for the implementation! I'd appreciate it if you could share more insight on why there's no validation/test set involved during training?

Best,

@wayne391
Member

wayne391 commented May 3, 2022

Hi,

It's an interesting question. We did have this kind of discussion in the early stages. We used to run validation during training and found that the validation loss would be extremely high and did not reflect the quality of the generated results.

Our conclusion was that "overfitting" is, to some extent, an important or even necessary ingredient of a good generative LM. A model with higher validation loss might generate better results because it has a higher chance of "remembering" good sentences written by humans. I recall that a paper mentioned this phenomenon as well (but I forget its title...).

Furthermore, the quality hugely depends on another factor: "sampling" at the inference stage. Combining the two factors, we decided that the runtime validation loss was not very useful, so we discarded it in all of our following work.
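To be concrete, "sampling" here refers to things like temperature / top-k / top-p at generation time. A minimal sketch of temperature plus nucleus (top-p) sampling, just for illustration (the names and defaults are illustrative, not the exact code in this repo):

```python
import numpy as np

def sample_from_logits(logits, temperature=1.0, top_p=0.9, rng=None):
    """Temperature + nucleus (top-p) sampling over a 1-D logits vector.

    Illustrative sketch only -- not the repo's actual sampling code.
    """
    rng = rng or np.random.default_rng()

    # Temperature rescaling: T > 1 flattens the distribution, T < 1 sharpens it.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                        # for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()

    # Nucleus filtering: keep the smallest set of top tokens whose mass >= top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]

    kept_probs = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=kept_probs)
```

With a heavily peaked distribution, the nucleus collapses to a single token unless the temperature is raised, so these knobs end up shaping the character of the generated results much more than the validation loss does.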

@guozixunnicolas
Author

Hi,

Thanks for the detailed reply.

I remember a beginner course project where I supervised some students training a CNN on the Bach chorale dataset. The results turned out to be pretty good, with proper voice leading and contrapuntal movement, and I was a bit surprised that a CNN could produce such good results. After diving into the code I realized that no validation set was involved, and after some exploration it became clear that the generated results were basically "copying" whatever the model had seen in the training set, which doesn't reflect its generative and generalization ability. Have you checked for such a "plagiarism" effect in the generated results?
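A quick way to check for this kind of copying is to measure verbatim n-gram overlap between generated token sequences and the training set. A minimal sketch of what I mean (the function names are just illustrative, not from this repo):

```python
def ngrams(tokens, n):
    """All contiguous length-n windows of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def copy_rate(generated, training_corpus, n=8):
    """Fraction of length-n windows of `generated` that occur verbatim
    somewhere in `training_corpus` (a list of token sequences).

    A rate close to 1.0 for a large n suggests the model is mostly copying.
    """
    train_ngrams = set()
    for song in training_corpus:
        train_ngrams.update(ngrams(song, n))

    gen_ngrams = ngrams(generated, n)
    if not gen_ngrams:
        return 0.0
    return sum(g in train_ngrams for g in gen_ngrams) / len(gen_ngrams)
```

With a reasonably long window (say 8 or 16 events), chance collisions are rare, so a high copy rate is a decent proxy for memorization.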

I still believe a validation/test set is needed during training. Otherwise, why bother using a SOTA model (i.e. a Transformer)? Why not just use a heavily overfitting CNN with far more parameters, which would give equally good results?

Regarding sampling, I believe you only used top-k/top-p/temperature-regularized sampling, right (correct me if I'm wrong)? Given the overfitting behavior, the logits would tend to be heavily concentrated on the overfit token (e.g. [1e4, 1e1, 1e-1, 1e-2]), so top-p/top-k wouldn't change much, I believe, unless you apply a very high temperature?
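To make that concrete, here is what temperature alone does to logits peaked like the ones above (numbers purely illustrative):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                     # for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [1e4, 1e1, 1e-1, 1e-2]

for t in (1.0, 10.0, 100.0, 5000.0):
    print(t, np.round(softmax_with_temperature(logits, t), 4))

# T = 1, 10, 100: the first token still gets essentially all of the probability
# mass, so top-k / top-p filtering has nothing left to cut.
# Only an extreme temperature (here around 5000) starts to flatten things out.
```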

Happy to discuss!

@dedededefo

Hi, how do you generate validation_songs.json? There seems to be no mention of it in the description of the dataset file. I would appreciate it if you could answer.
