
Bad quality of generated speech after training #5

Open
SolomidHero opened this issue Jan 29, 2021 · 5 comments

Comments

@SolomidHero

Hello! I did some preprocessing to extract features from the wavs in the dataset for training EA-SVC. Specifically, I use the following features:

  • PPG from hidden state of model trained on TIMIT dataset (768 dim)
  • f0 with WORLD by direct use of pyworld (1 dim, zeros in f0 are not processed)
  • speaker embeddings extracted with pyannote.audio

I tried training the first two stages (i.e. without adversarial generator training, then with it) on both LibriSpeech dev-clean and NUS-48E singing. The disentanglement loss wasn't used in these experiments. In the 1st stage, loss_g (g_mag + g_sc) is about 1.0; in the 2nd, loss_g increases to 5.0 (g_mag + g_sc + g_adv + g_feat) and loss_d is about 3.0e-01 (d_real + d_fake). The model wasn't trained for the 3rd stage. Results are much the same in both dataset experiments.

Because the generated audio is not good at either stage, I wonder whether I made a mistake somewhere in the training process. I hope the loss values above give you a better view of the situation.
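For reference, the two reconstruction terms named above (g_sc and g_mag) are usually the spectral-convergence and log-magnitude parts of a multi-resolution STFT loss. A minimal NumPy sketch of one resolution follows; the FFT size, hop size, and epsilon are illustrative assumptions, not values from this repo:

```python
# Hedged sketch of the g_sc / g_mag terms of an STFT loss (one resolution).
# n_fft, hop, and the 1e-7 floor are assumptions for illustration only.
import numpy as np

def stft_mag(x: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed framed FFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def stft_loss(y: np.ndarray, y_hat: np.ndarray) -> tuple:
    """Return (g_sc, g_mag) between a reference and a generated waveform."""
    S, S_hat = stft_mag(y), stft_mag(y_hat)
    g_sc = np.linalg.norm(S - S_hat) / np.linalg.norm(S)               # spectral convergence
    g_mag = np.mean(np.abs(np.log(S + 1e-7) - np.log(S_hat + 1e-7)))  # log-magnitude L1
    return float(g_sc), float(g_mag)
```

Identical waveforms give (0.0, 0.0); in practice several (n_fft, hop) pairs are summed.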

P.S. The stage numbers refer to these parameters in the config:

  1. "adv_ag": false, "adv_fd": false
  2. "adv_ag": true, "adv_fd": false
  3. "adv_ag": true, "adv_fd": true
@SolomidHero SolomidHero changed the title Bad quality result after training on LibriSpeech Bad quality result after training Jan 29, 2021
@SolomidHero SolomidHero changed the title Bad quality result after training Bad quality of generated speech after training Jan 29, 2021
@SolomidHero
Author

SolomidHero commented Feb 1, 2021

Should I preprocess the f0 features with high- and low-frequency cutoffs, linear interpolation over zero-valued segments, and normalization?
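A minimal sketch of two of those steps (interpolating across unvoiced zero frames, then z-normalizing log-f0). This is purely illustrative; the function names and epsilon values are assumptions, not code from this repo:

```python
# Hedged sketch: fill unvoiced (zero) f0 frames by linear interpolation,
# then z-normalize in the log domain. Epsilons are illustrative assumptions.
import numpy as np

def interp_f0(f0: np.ndarray) -> np.ndarray:
    """Fill zero (unvoiced) frames by linear interpolation over voiced ones."""
    voiced = f0 > 0
    if not voiced.any():
        return f0.copy()
    idx = np.arange(len(f0))
    # np.interp clamps to the edge voiced values outside the voiced range
    return np.interp(idx, idx[voiced], f0[voiced])

def normalize_log_f0(f0: np.ndarray) -> np.ndarray:
    """Z-normalize log-f0 using statistics of the interpolated contour."""
    lf0 = np.log(interp_f0(f0) + 1e-8)
    return (lf0 - lf0.mean()) / (lf0.std() + 1e-8)
```

Frequency cutoffs would be a separate filtering step on the waveform before f0 extraction.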

@980202006

Hi, which dataset did you use for training?

@SolomidHero
Author

I tried both LibriSpeech dev-clean and NUS-48E.

@leerumor

@SolomidHero hi, have you solved this problem? I also get bad quality: my STFT loss doesn't converge (it stays near 3), and the model can't even fit a single song...

@SolomidHero
Author

@leerumor, hi.
Sorry for the late answer. I found that wav2wav conversion is a hard task to learn. It needs many epochs of training and a large enough dataset (50+ hours at minimum), and if a GAN model is used it might still not converge :(

As for this repository specifically, I couldn't train a good model on the datasets above and moved on.
