Taking Tacotron2 output to wavenet vocoder #30
At the very least, you should adjust the upsample layers to match your hop_size or frame_shift_ms; in my experience, just freezing the other layers and retraining works.
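As a rough sketch of the constraint being described, assuming the `upsample_scales`-style hyperparameter used by r9y9/wavenet_vocoder (names and values here are illustrative, not the exact upstream defaults): the conditioning mel gets stretched by the product of the upsample factors, so that product has to equal the hop size.

```python
import numpy as np

hop_size = 256                   # audio frame shift in samples (11.61 ms at 22050 Hz)
upsample_scales = [4, 4, 4, 4]   # per-layer upsampling factors for the conditioning features

# The mel conditioning is upsampled by the product of the factors, so the
# product must match hop_size or the time axes will not line up.
assert int(np.prod(upsample_scales)) == hop_size
```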
@danshirron: I just did that. I used the same training data for both and did NOT change much in hparams (apart from the sample rate in my case, and the batch size). I generated a mel in T2, reshaped it (there is a bug in T2) and fed it to wavenet_vocoder. It works very nicely, though my results are not yet really good (incomprehensible audio from T2 as well as from wavenet). But the incomprehensible audio sounded identical, and the output of wavenet is significantly smoother :)
Good news: I just tried this and found it works nicely. I'm planning to make a notebook for a Tacotron2 + WaveNet text-to-speech demo that can be run on Google Colab. Two samples attached.
@r9y9 The samples sound amazing. Can you please share the recipe for how to train these models? This doesn't look like the default hparams.
Samples sound great! Would love to see the Taco2 and WaveNet params.
finally...! ref: #30 ref: r9y9/deepvoice3_pytorch#11 ref: Rayhane-mamah/Tacotron-2#30 (comment)
https://r9y9.github.io/wavenet_vocoder/ Uploaded Tacotron2 samples. You can find links to the hyperparameters on the page, but here you are:
The WaveNet params are the same as the defaults except for the training recipe.
Synthesis recipe: combine them sequentially. Does this help you? :)
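For what "combine them sequentially" looks like in practice, here is a minimal sketch with placeholder objects and function names (neither repo exposes exactly this API): Tacotron 2 predicts a mel spectrogram from text, and the WaveNet vocoder then generates the waveform conditioned on that mel.

```python
import numpy as np

# Placeholder pipeline, not the actual API of either repository; `tacotron2`
# and `wavenet_vocoder` stand in for whatever synthesis entry points your
# checkouts expose.
def text_to_speech(text, tacotron2, wavenet_vocoder):
    mel = tacotron2.synthesize(text)         # predicted mel spectrogram, shape (T, num_mels)
    mel = np.interp(mel, (0, 4), (0, 1))     # value-range fix if the two models normalize mels differently (see below)
    wav = wavenet_vocoder.synthesize(c=mel)  # autoregressive waveform generation conditioned on the mel
    return wav
```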
@r9y9 Great, thanks! I will try to reproduce it. Do you mean WaveNet fine-tuning with teacher-forcing like you did in the r9y9/deepvoice3_pytorch#21 pull, or simple fine-tuning on some dataset?
I meant just continuing training with the same dataset (LJSpeech) that the pretrained model was trained on. This is because there was a bug (#33) when I trained the pretrained model. I didn't use predicted mel-spectrograms for training WaveNet like I did in r9y9/deepvoice3_pytorch#21. That should improve quality, but I wanted to try the simpler case first.
@r9y9 Did you train the Taco 2 model with ARPAbet?
@r9y9 What was your loss at the end of training?
@rafaelvalle I used https://github.com/Rayhane-mamah/Tacotron-2. I believe this uses ARPAbet for the network input.
@PetrochukM The last training curve (~300k steps) for the WaveNet used in my demo.
@r9y9 Do you use …
@neverjoe Check out https://colab.research.google.com/github/r9y9/Colaboratory/blob/master/Tacotron2_and_WaveNet_text_to_speech_demo.ipynb. It's the complete recipe used for my experiment. From that you will find:

```python
# Range [0, 4] was used for training Tacotron2 but WaveNet vocoder assumes [0, 1]
c = np.interp(c, (0, 4), (0, 1))
```
FYI, it's also possible to denormalize and normalize back again...
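A minimal sketch of that denormalize/renormalize route, assuming the usual min_level_db = -100 convention and Tacotron-2's max_abs_value = 4 normalization range; the helper names are illustrative, not the exact functions in either repo, and under these assumptions the composition reduces to the same linear rescaling as the np.interp line above.

```python
import numpy as np

min_level_db = -100.0   # assumed; the common default in both codebases
max_abs_value = 4.0     # Tacotron-2 style normalization range [0, max_abs_value]

def t2_denormalize(c):
    # Undo Tacotron-2's [0, max_abs_value] normalization back to dB scale.
    return (np.clip(c, 0, max_abs_value) / max_abs_value) * -min_level_db + min_level_db

def wavenet_normalize(c_db):
    # Re-normalize the dB-scale mel to [0, 1] as the wavenet_vocoder preprocessing expects.
    return np.clip((c_db - min_level_db) / -min_level_db, 0, 1)

c = np.random.uniform(0, 4, size=(100, 80))   # stand-in for a Tacotron-2 mel in [0, 4]
c01 = wavenet_normalize(t2_denormalize(c))
assert np.allclose(c01, np.interp(c, (0, 4), (0, 1)))
```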
@r9y9 Is it required to use that line in the case of copy-synthesis too?
Not required.
@r9y9 I am trying to use the WaveNet vocoder for copy-synthesis, but I am still not able to generate the target signal correctly; the steps for using the pretrained model are not as clear as in the TTS colab notebook. I used the …
BTW: I inspected the value range of the mel obtained from my LJ signal and it was in the range [-5.0.1]. Does this mean that the mel is not normalized, and that I should use another preprocessing script to get it into [0, 1]?
Has anyone had experience with the above?
I guess the audio hparams need to be the same for both. My intuition for using LJSpeech:
Settings for the Tacotron 2 implementation (https://github.com/Rayhane-mamah/Tacotron-2); a quick sanity check of the conversions is sketched after this list:
num_mels=80
num_freq=1025; in the wavenet code fft_size=1024, while in T2 fft_size=(1025-1)*2=2048. As far as I understand I can keep this as is, since both end up projected onto the same mel bands anyway.
sample_rate=22050 (the native rate of the LJSpeech dataset)
frame_length_ms=46.44 (corresponds to wavenet's fft_size: 1024/22050 ≈ 46.44 ms).
frame_shift_ms=11.61 (corresponds to wavenet's hop_size=256: 256/22050 ≈ 11.61 ms).
preemphasis: not available in r9y9's wavenet implementation.
Others: in T2 I don't have fmin (125 in wavenet) or fmax (7600 in wavenet). Looking into the T2 code, the spectrogram fmin is set to 0 and fmax to sample_rate/2 = 22050/2 = 11025 Hz. Since I'm using a pre-trained wavenet model, I guess I'll need to change these params in the T2 code.
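As mentioned above, here is a small sanity check of these conversions; the values are taken from this thread and are assumptions about the two hparams files rather than the real upstream defaults.

```python
sample_rate = 22050                      # LJSpeech native rate

# T2 expresses window/shift in ms, wavenet_vocoder in samples.
def ms_to_samples(ms):
    return int(round(ms / 1000.0 * sample_rate))

assert ms_to_samples(11.61) == 256       # frame_shift_ms -> wavenet hop_size
assert ms_to_samples(46.44) == 1024      # frame_length_ms -> wavenet fft_size

# T2 expresses the FFT via num_freq bins, wavenet_vocoder via fft_size.
num_freq = 1025
t2_fft_size = (num_freq - 1) * 2         # 2048, vs. wavenet's fft_size = 1024

# Mel filterbank limits: T2 defaults vs. what the pretrained wavenet model assumes.
t2_fmin, t2_fmax = 0, sample_rate // 2   # 0 .. 11025 Hz
wavenet_fmin, wavenet_fmax = 125, 7600   # from the wavenet hparams quoted above
```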
Any remarks, suggestions?