Add Tacotron2 demo samples
r9y9 committed May 10, 2018
1 parent 2f6b569 commit 87008a3
Showing 3 changed files with 181 additions and 12 deletions.
1 change: 1 addition & 0 deletions docs/config.toml
@@ -5,6 +5,7 @@ author = "Ryuichi YAMAMOTO"

[params]
author = "Ryuichi YAMAMOTO"
project = "wavenet_vocoder"
logo = "/images/r9y9.jpg"
twitter = "r9y9"
github = "r9y9"
180 changes: 174 additions & 6 deletions docs/content/index.md
@@ -13,11 +13,12 @@ type = "index"
- Github: https://github.com/r9y9/wavenet_vocoder

This page provides audio samples for the open source implementation of the **WaveNet (WN)** vocoder.
Text-to-speech samples are in the last section.

1. WN conditioned on mel-spectrogram (16-bit linear PCM, 22.5kHz)
2. WN conditioned on mel-spectrogram (8-bit mu-law, 16kHz)
3. WN conditioned on mel-spectrogram and speaker-embedding (16-bit linear PCM, 16kHz)
3. (Not yet) DeepVoice3 + WaveNet vocoder
- WN conditioned on mel-spectrogram (16-bit linear PCM, 22.5kHz)
- WN conditioned on mel-spectrogram (8-bit mu-law, 16kHz)
- WN conditioned on mel-spectrogram and speaker-embedding (16-bit linear PCM, 16kHz)
- Tacotron2 + WN text-to-speech (**New!**)

## WN conditioned on mel-spectrogram (16-bit linear PCM, 22.5kHz)

@@ -402,9 +403,175 @@ Your browser does not support the audio element.

[^1]: Note that the mel-spectrogram used for local conditioning depends on speaker characteristics, so we cannot simply change the speaker identity of the generated audio samples with this model. The model should work without speaker embedding, but the embedding might have helped training speed.

## DeepVoice3 + WaveNet vocoder
## Tacotron2 + WN text-to-speech

- Tacotron2: trained 189k steps on LJSpeech dataset ([Pre-trained model](https://www.dropbox.com/s/vx7y4qqs732sqgg/pretrained.tar.gz?dl=0), [Hyper params](https://github.com/r9y9/Tacotron-2/blob/9ce1a0e65b9217cdc19599c192c5cd68b4cece5b/hparams.py)). The work has been done by [@Rayhane-mamah](https://github.com/Rayhane-mamah). See https://github.com/Rayhane-mamah/Tacotron-2 for details.
- WaveNet: trained over 1000k steps on LJSpeech dataset ([Pre-trained model](https://www.dropbox.com/s/zdbfprugbagfp2w/20180510_mixture_lj_checkpoint_step000320000_ema.pth?dl=0), [Hyper params](https://www.dropbox.com/s/0vsd7973w20eskz/20180510_mixture_lj_checkpoint_step000320000_ema.json?dl=0))


Scientists at the CERN laboratory say they have discovered a new particle.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00001.wav" autoplay/>
Your browser does not support the audio element.
</audio>


There's a way to measure the acute emotional intelligence that has never gone out of style.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00002.wav" autoplay/>
Your browser does not support the audio element.
</audio>


President Trump met with other leaders at the Group of 20 conference.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00003.wav" autoplay/>
Your browser does not support the audio element.
</audio>


The Senate's bill to repeal and replace the Affordable Care Act is now imperiled.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00004.wav" autoplay/>
Your browser does not support the audio element.
</audio>


Generative adversarial network or variational auto-encoder.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00005.wav" autoplay/>
Your browser does not support the audio element.
</audio>


Basilar membrane and otolaryngology are not auto-correlations.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00006.wav" autoplay/>
Your browser does not support the audio element.
</audio>


He has read the whole thing.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00007.wav" autoplay/>
Your browser does not support the audio element.
</audio>


He reads books.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00008.wav" autoplay/>
Your browser does not support the audio element.
</audio>


Don't desert me here in the desert!

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00009.wav" autoplay/>
Your browser does not support the audio element.
</audio>


He thought it was time to present the present.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00010.wav" autoplay/>
Your browser does not support the audio element.
</audio>

Thisss isrealy awhsome.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00011.wav" autoplay/>
Your browser does not support the audio element.
</audio>


Punctuation sensitivity, is working.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00012.wav" autoplay/>
Your browser does not support the audio element.
</audio>


Punctuation sensitivity is working.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00013.wav" autoplay/>
Your browser does not support the audio element.
</audio>


The buses aren't the problem, they actually provide a solution.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00014.wav" autoplay/>
Your browser does not support the audio element.
</audio>


The buses aren't the PROBLEM, they actually provide a SOLUTION.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00015.wav" autoplay/>
Your browser does not support the audio element.
</audio>


The quick brown fox jumps over the lazy dog.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00016.wav" autoplay/>
Your browser does not support the audio element.
</audio>

Does the quick brown fox jump over the lazy dog?

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00017.wav" autoplay/>
Your browser does not support the audio element.
</audio>


Peter Piper picked a peck of pickled peppers. How many pickled peppers did Peter Piper pick?

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00018.wav" autoplay/>
Your browser does not support the audio element.
</audio>


She sells sea-shells on the sea-shore. The shells she sells are sea-shells I'm sure.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00019.wav" autoplay/>
Your browser does not support the audio element.
</audio>


The blue lagoon is a nineteen eighty American romance adventure film.

<audio controls="controls" >
<source src="/wavenet_vocoder/audio/tacotron2/20180510_mixture_lj_checkpoint_step000320000_ema_speech-mel-00020.wav" autoplay/>
Your browser does not support the audio element.
</audio>
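
The sample blocks above all follow one pattern: the input sentence, then an `<audio>` element whose file name ends in a zero-padded, five-digit index. As a minimal sketch of that pattern, the markup could be generated from a list of sentences. The helper name `audio_block` and the two constants are hypothetical and are not part of the repository; the directory and checkpoint prefix are copied from the paths shown above.

```python
# Hypothetical helper for illustration only; not part of wavenet_vocoder.
# The directory and checkpoint prefix are copied from the sample paths above.
AUDIO_DIR = "/wavenet_vocoder/audio/tacotron2"
CHECKPOINT = "20180510_mixture_lj_checkpoint_step000320000_ema"


def audio_block(index: int, sentence: str) -> str:
    """Return the sentence followed by its <audio> element, as laid out above."""
    # File names use a five-digit, zero-padded index: 00001, 00002, ...
    src = f"{AUDIO_DIR}/{CHECKPOINT}_speech-mel-{index:05d}.wav"
    return (
        f"{sentence}\n\n"
        '<audio controls="controls" >\n'
        f'<source src="{src}" autoplay/>\n'
        "Your browser does not support the audio element.\n"
        "</audio>\n"
    )


if __name__ == "__main__":
    sentences = [
        "He has read the whole thing.",
        "He reads books.",
    ]
    # Indices start at 7 here only to match these two sentences' positions above.
    for i, text in enumerate(sentences, start=7):
        print(audio_block(i, text))
```

Generating the blocks this way would keep the file index and the sentence order in step, which is easy to get wrong when twenty near-identical blocks are edited by hand.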


### On-line demo

A demonstration notebook intended to be run on Google Colab is available at [Tacotron2 + WaveNet text-to-speech demo](https://colab.research.google.com/github/r9y9/Colaboratory/blob/master/Tacotron2_and_WaveNet_text_to_speech_demo.ipynb).

TODO

## References

@@ -413,3 +580,4 @@ TODO
- [Tamamori, Akira, et al. "Speaker-dependent WaveNet vocoder." Proceedings of Interspeech. 2017.](http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0314.PDF)
- [Jonathan Shen, Ruoming Pang, Ron J. Weiss, et al, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", arXiv:1712.05884, Dec 2017.](https://arxiv.org/abs/1712.05884)
- [Wei Ping, Kainan Peng, Andrew Gibiansky, et al, "Deep Voice 3: 2000-Speaker Neural Text-to-Speech", arXiv:1710.07654, Oct. 2017.](https://arxiv.org/abs/1710.07654)
- [Jonathan Shen, Ruoming Pang, Ron J. Weiss, et al, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", arXiv:1712.05884, Dec 2017.](https://arxiv.org/abs/1712.05884)
12 changes: 6 additions & 6 deletions docs/layouts/partials/header.html
@@ -6,11 +6,11 @@
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<link rel="stylesheet" href="/css/normalize.css">
<link rel="stylesheet" href="/css/skeleton.css">
<link rel="stylesheet" href="/css/custom.css">
<link rel="alternate" href="/index.xml" type="application/rss+xml" title="{{ .Site.Title }}">
<link rel="shortcut icon" href="/favicon.png" type="image/x-icon" />
<link rel="stylesheet" href="/{{ .Site.Params.Project }}/css/normalize.css">
<link rel="stylesheet" href="/{{ .Site.Params.Project }}/css/skeleton.css">
<link rel="stylesheet" href="/{{ .Site.Params.Project }}/css/custom.css">
<link rel="alternate" href="/{{ .Site.Params.Project }}/index.xml" type="application/rss+xml" title="{{ .Site.Title }}">
<link rel="shortcut icon" href="/{{ .Site.Params.Project }}/favicon.png" type="image/x-icon" />
<title>{{ $isHomePage := eq .Title .Site.Title }}{{ .Title }}{{ if eq $isHomePage false }} - {{ .Site.Title }}{{ end }}</title>
</head>
<body>
@@ -19,7 +19,7 @@

<header role="banner">
<div class="header-logo">
<a href="/"><img src="{{ .Site.Params.logo }}" width="70" height="70"></a>
<a href="/"><img src="/{{ .Site.Params.Project }}/{{ .Site.Params.logo }}" width="70" height="70"></a>
</div>
{{ if eq $isHomePage true }}<h1 class="site-title">{{ .Site.Title }}</h1>{{ end }}
</header>
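
The header change prefixes every asset path with the new `project` param from `docs/config.toml`. The sketch below mimics that string substitution in plain Python to show which URLs result. It assumes Hugo resolves `.Site.Params.Project` to the `project` key and emits the surrounding text literally. The function names are hypothetical; the parameter values are copied from the config hunk above.

```python
# Values copied from docs/config.toml as shown in this commit.
PARAMS = {"project": "wavenet_vocoder", "logo": "/images/r9y9.jpg"}


def asset_url(params: dict, path: str) -> str:
    """Mimic href="/{{ .Site.Params.Project }}/<path>" from header.html."""
    return f"/{params['project']}/{path}"


def logo_url(params: dict) -> str:
    """Mimic src="/{{ .Site.Params.Project }}/{{ .Site.Params.logo }}"."""
    return f"/{params['project']}/{params['logo']}"


if __name__ == "__main__":
    print(asset_url(PARAMS, "css/normalize.css"))  # /wavenet_vocoder/css/normalize.css
    print(logo_url(PARAMS))  # /wavenet_vocoder//images/r9y9.jpg
```

Under these assumptions the logo URL contains a double slash, because `logo` already starts with `/`. Many servers tolerate this, but it may be worth checking on the deployed site.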
