Home

Welcome to the Tacotron-2 wiki!

While browsing the Internet, I noticed many people claiming that Tacotron-2 is either not reproducible or not robust enough to work on datasets other than Google's internal speech corpus. Although some open-source works (1, 2) have given good results with the original Tacotron, or even with WaveNet, reproducing the Tacotron-2 results with high fidelity to the descriptions in the Tacotron-2 (T2) paper still seemed a little harder.

In this complementary documentation, I will mostly try to cover ambiguities where interpretations might differ, and in the process show that T2 actually works with open-source speech corpora such as the LJSpeech dataset. Also, because of the paper's size limit, the authors cannot go into much detail and often refer to previous works instead; in this documentation I have extracted the relevant information from those references to make life a bit easier. Last but not least, although it is only being released now, this documentation was mostly written in parallel with development, so pardon the disorder. I did my best to make it clear enough.

Also, feel free to correct any mistakes you encounter, or to contribute anything of added value (experiment results, plots, etc.).
