Replies: 19 comments
[Reply bodies from georroussos, julian.weber, and erogol were not preserved in the archive.]
-
>>> julian.weber
[August 9, 2020, 10:21am]
Hello,
Since training a vocoder takes time and compute, I'd like to train and
contribute a universal vocoder that works for most use cases.
I have compute, but I'm no expert on TTS and I'd like help choosing
hyperparameters and tuning the config file. This has been done with
WaveRNN before and it worked very well.
I'd like to do the same with faster inference speed, to cover more use
cases, by using either MelGAN or PWGAN on the same LibriTTS dataset.
According to my understanding, the sample rate of the dataset used to
train Tacotron doesn't really matter, because it shouldn't affect the mel
spectrogram (I'm not so sure about that); the only parameters that should
affect it are: [parameter list lost in archiving] (…should be fixable
without retraining).
And so, still according to my understanding, these are the parameters
that must be shared by all models that use the same vocoder.
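The idea of shared audio parameters can be sketched as a small compatibility check. This is only an illustration: the field names below (`sample_rate`, `num_mels`, `fft_size`, `hop_length`, `win_length`, `mel_fmin`, `mel_fmax`) are assumptions modelled on typical TTS audio configs, not taken from any particular release.

```python
# Sketch: verify that a TTS model and a vocoder agree on the audio
# parameters that shape the mel spectrogram. Field names are assumed.

SHARED_AUDIO_KEYS = [
    "sample_rate", "num_mels", "fft_size",
    "hop_length", "win_length", "mel_fmin", "mel_fmax",
]

def mismatched_audio_params(tts_audio: dict, vocoder_audio: dict) -> list:
    """Return the shared keys on which the two configs disagree."""
    return [
        key for key in SHARED_AUDIO_KEYS
        if tts_audio.get(key) != vocoder_audio.get(key)
    ]

tts_cfg = {"sample_rate": 22050, "num_mels": 80, "fft_size": 1024,
           "hop_length": 256, "win_length": 1024,
           "mel_fmin": 0.0, "mel_fmax": 8000.0}
vocoder_cfg = dict(tts_cfg, sample_rate=24000)  # deliberate mismatch

print(mismatched_audio_params(tts_cfg, vocoder_cfg))  # -> ['sample_rate']
```

A universal vocoder would then be usable with any TTS model whose config passes this check without retraining either side.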
The vocoder's output sample rate shouldn't matter too much, but I think
that 16 kHz instead of LibriTTS's 24 kHz should give a 33% boost in
inference performance. (I'm not so sure about that either, since PWGAN
and MelGAN are far more parallelised than WaveRNN.)
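The 33% figure comes straight from the sample counts: one second of audio at 16 kHz is a third fewer samples than at 24 kHz, which translates into a proportional speedup only under the rough assumption that inference cost scales linearly with the number of output samples (less true for parallel models like MelGAN/PWGAN). A quick sanity check:

```python
# Back-of-envelope: fewer output samples per second of speech.
sr_high = 24000  # LibriTTS native sample rate
sr_low = 16000   # proposed vocoder output rate

samples_saved = 1 - sr_low / sr_high
print(f"{samples_saved:.0%} fewer samples per second")  # prints "33% fewer samples per second"
```

For a strictly autoregressive vocoder like WaveRNN this maps almost directly to wall-clock time; for MelGAN/PWGAN the gain would likely be smaller.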
What do you think? Is it a good idea? Am I off in my understanding of
the TTS process?
[This is an archived TTS discussion thread from discourse.mozilla.org/t/training-a-universal-vocoder]