
Ask about Phoneme Segmentation and Phoneme Duration #12

Closed
toannhu opened this issue Jan 30, 2018 · 9 comments


toannhu commented Jan 30, 2018

Hi, @r9y9. First of all, thank you for such a brilliant implementation of WaveNet. I'm now studying how to detect phoneme durations (the start and end time of each phoneme) extracted from audio and how to align them with linguistic features, but I don't know how to do this. Can you share the idea you use to solve this problem, and point me to the code in this repo that does this job? Thanks!


P.S.: Btw, is it possible to train this repo on another language? I'm currently working on Vietnamese with my own dataset (7 hours of audio and ARPABET linguistic features extracted from text).


r9y9 commented Jan 30, 2018

The repository focuses on the WaveNet vocoder, as the name says. It doesn't provide phoneme duration estimation or linguistic feature extraction, which are needed to replicate the original WaveNet-based TTS. The vocoder can take an arbitrary type of input, though, as long as the time resolution is adjusted to match the audio.
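To illustrate the time-resolution adjustment, here is a minimal numpy sketch (not the repository's own code, which may instead use learned upsampling layers): frame-level conditional features are repeated `hop_size` times so that one feature vector lines up with each audio sample.

```python
import numpy as np

def upsample_conditional_features(features: np.ndarray, hop_size: int) -> np.ndarray:
    """Repeat each frame-level feature vector hop_size times so the
    conditional feature sequence matches the audio sample resolution.

    features: (num_frames, num_channels) frame-level features (e.g. mel frames)
    returns:  (num_frames * hop_size, num_channels)
    """
    return np.repeat(features, hop_size, axis=0)

# 4 frames of 2-channel features, 3 audio samples per frame
frames = np.arange(8, dtype=np.float32).reshape(4, 2)
upsampled = upsample_conditional_features(frames, hop_size=3)
print(upsampled.shape)  # (12, 2)
```

Nearest-neighbor repetition like this is the simplest scheme; smoother interpolation or transposed convolutions are common alternatives.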

Linguistic feature extraction (a.k.a. the text processing frontend) is the hard part of TTS, and it often requires deep knowledge of the target language. The WaveNet vocoder itself is language-independent, but you will have to implement a text processing frontend if you want to condition the model on linguistic features.
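As an illustration of what frame-level linguistic features could look like, here is a hypothetical sketch: a forced-alignment result, given as (phoneme, duration) pairs, is expanded into one-hot frame-level features. The phoneme inventory, frame shift, and function name are invented for illustration; a real frontend (e.g. HTS-style full-context labels) produces much richer features.

```python
import numpy as np

# Hypothetical phoneme inventory; a real frontend would use the full
# phone set of the target language plus contextual features.
PHONEMES = ["sil", "a", "b", "c"]

def durations_to_frame_features(alignment, frame_shift_ms=5.0):
    """Expand (phoneme, duration_ms) pairs into frame-level one-hot features."""
    rows = []
    for phoneme, duration_ms in alignment:
        one_hot = np.zeros(len(PHONEMES), dtype=np.float32)
        one_hot[PHONEMES.index(phoneme)] = 1.0
        num_frames = int(round(duration_ms / frame_shift_ms))
        rows.extend([one_hot] * num_frames)
    return np.stack(rows)

# 10 ms of silence, then 25 ms of "a", at a 5 ms frame shift
feats = durations_to_frame_features([("sil", 10.0), ("a", 25.0)])
print(feats.shape)  # (7, 4): 2 + 5 frames, 4 phoneme classes
```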

@imdatceleste

@toannhu, you might be interested in Aeneas if you are only looking for phoneme detection. It is not designed for phonemes, but by adjusting various parameters it might help you understand how to do what you want.

@r9y9 r9y9 added the question label Feb 3, 2018

toannhu commented Feb 6, 2018

@r9y9 @imdatsolak Thanks for the support. I found the Montreal Forced Aligner tool, which helps me with this problem. As far as I can see in this repo, @r9y9 uses another library, nnmnkwii, for the frontend work. Please excuse my ignorance, but can you explain to me what input (after the frontend processing) is fed to the WaveNet vocoder for local conditioning? It would be very helpful to know the basic idea of how the WaveNet vocoder works. I'm really confused when reading this repo's code. Once again, thank you!


r9y9 commented Feb 6, 2018

There's no text processing frontend used in this repository. nnmnkwii does have functionality to extract linguistic features from HTS-style context labels, though. In this repository, nnmnkwii is mostly used for preprocessing, e.g., mulaw or inv_mulaw: https://r9y9.github.io/nnmnkwii/latest/references/preprocessing.html
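For reference, mulaw and inv_mulaw implement the standard mu-law companding transform. Here is a minimal numpy sketch written from the standard formulas (not copied from nnmnkwii): the signal is assumed to be in [-1, 1].

```python
import numpy as np

def mulaw(x: np.ndarray, mu: int = 255) -> np.ndarray:
    """Mu-law companding of a signal in [-1, 1] (ITU-T G.711 style)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def inv_mulaw(y: np.ndarray, mu: int = 255) -> np.ndarray:
    """Inverse mu-law companding."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

# Round trip recovers the original signal
x = np.linspace(-1, 1, 11)
assert np.allclose(inv_mulaw(mulaw(x)), x, atol=1e-6)
```

In practice the companded value is additionally quantized to 256 discrete levels before being fed to a categorical-output WaveNet.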

The WaveNet class in the repository doesn't assume any particular domain for the conditional features, but the training / preprocessing scripts are written assuming a mel-spectrogram is used as the conditional feature.


toannhu commented Feb 6, 2018

@r9y9 Thanks for enlightening me. I finally got the key idea. One more question: is it possible to use this WaveNet vocoder repo with Tacotron? Do you plan to do this in the future? Any ideas or suggestions?


r9y9 commented Feb 6, 2018

Definitely it's possible. A Tacotron2-like WaveNet vocoder is WIP at r9y9/deepvoice3_pytorch#21.

See also #1 (comment).


r9y9 commented May 16, 2018

Tacotron + WaveNet is done.

@r9y9 r9y9 closed this as completed May 16, 2018

toannhu commented May 22, 2018

@r9y9 Thanks. I succeeded in training Rayhane-mamah's Tacotron 2 repo on my own corpus. Going to try generating with your WaveNet vocoder next. I'm eager to hear the result. This is very exciting!


toannhu commented May 27, 2018

@r9y9 I tried to integrate Rayhane-mamah's Tacotron 2 with the WaveNet vocoder using your Google Colab code, but it failed: the pitch has been lost. Btw, I use batch_size = 16 and r = 2 in Tacotron 2 and batch_size = 1 in your repo; everything else is default. The WaveNet repo was trained on the original audio, not on GTA (ground-truth-aligned) features. Here are the results.
sound.zip
