Bad quality of generated speech after training #5
Should I preprocess the f0 features with high- and low-frequency cutoffs, linear interpolation over zero-valued segments, and normalization?
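For what it's worth, the three steps asked about here can be sketched in a few lines of NumPy. This is a generic sketch of the idea, not the preprocessing actually used in this repo; the `fmin`/`fmax` defaults and the log/z-score normalization are assumptions.

```python
import numpy as np

def preprocess_f0(f0, fmin=80.0, fmax=800.0):
    """Sketch of the f0 preprocessing asked about above:
    frequency cutoffs, linear interpolation over unvoiced (zero)
    segments, and log-scale normalization. Parameter names and
    defaults are assumptions, not taken from the EA-SVC code."""
    f0 = np.asarray(f0, dtype=np.float64).copy()
    voiced = f0 > 0
    # Cut off values outside the plausible pitch range.
    f0[voiced] = np.clip(f0[voiced], fmin, fmax)
    # Linearly interpolate over zero-valued (unvoiced) frames.
    if voiced.any():
        idx = np.arange(len(f0))
        f0 = np.interp(idx, idx[voiced], f0[voiced])
    # Normalize: log scale, then zero mean / unit variance.
    logf0 = np.log(f0 + 1e-8)
    return (logf0 - logf0.mean()) / (logf0.std() + 1e-8)
```

Interpolating before normalizing matters: z-scoring a contour that still contains hard zeros would let the unvoiced frames dominate the mean and variance.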
Hi, which dataset did you use for training?
I tried both LibriSpeech dev and NUS-48E.
@SolomidHero Hi, have you solved this problem? I also got bad quality: my STFT loss doesn't converge (stays near 3), and the model can't even fit a single song...
@leerumor Hi. For this repository in particular, I couldn't train a good model on the datasets above, so I moved on.
Hello! I preprocessed the wavs in the dataset to extract the features needed for training EA-SVC; among them, speaker embeddings extracted with `pyannote.audio`.
I tried training for the first two stages (i.e., without adversarial generator training, then with it) on both LibriSpeech dev-clean and NUS-48E singing data. The disentanglement loss wasn't used in the experiment. For the 1st stage, `loss_g` (`g_mag` + `g_sc`) is about 1.0; for the 2nd, `loss_g` increased to 5.0 (`g_mag` + `g_sc` + `g_adv` + `g_feat`) and `loss_d` is about 3.0e-01 (`d_real` + `d_fake`). The model wasn't trained for the 3rd stage. Results are much the same in both dataset experiments. Because the generated audio is poor at both stages, I wonder if I made a mistake in the training process. I believe the loss values above will give you a better view of the situation.
P.S. The stage number refers to these parameters in the config:
- Stage 1: `"adv_ag": false`, `"adv_fd": false`
- Stage 2: `"adv_ag": true`, `"adv_fd": false`
- Stage 3: `"adv_ag": true`, `"adv_fd": true`