This repository has been archived by the owner on Dec 8, 2023. It is now read-only.

How to train and infer without using forced alignment? #81

Open
Cardroid opened this issue Mar 3, 2022 · 3 comments

Comments

@Cardroid
Contributor

Cardroid commented Mar 3, 2022

First of all, I would like to express my gratitude for creating a wonderful project.👍

I saw that there are various tokenizer implementations under the text folder.
However, I couldn't find a recipe using these options.

My own dataset has no phoneme labels.
I could create them, but it would be nice to be able to use the toolkit without doing so.

If possible, could you tell me how to train models and run inference without phoneme labels?

@ftshijt
Member

ftshijt commented Mar 3, 2022

Hi, many thanks for your interest in our project!

We have not tested the text modules intensively so far. Also, given the limited data available for SVS, we haven't found a working solution that directly removes the dependency on phoneme information.

One possible workaround is to divide each word's duration equally among its phonemes and let the seq2seq model decide the actual durations during training. It still requires alignment at the word level, though. We have tried this on the CSD corpus and it works well with Korean syllables. I'm preparing a PR now and will update here shortly.
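As a rough illustration of the equal-split idea above, here is a minimal Python sketch. It assumes you already have word-level (or syllable-level) start and end times in seconds plus a phoneme sequence for each word. The function name `split_word_evenly` and the tuple layout are hypothetical and are not taken from this repository or from the PR mentioned above.

```python
from typing import List, Tuple


def split_word_evenly(
    start: float, end: float, phonemes: List[str]
) -> List[Tuple[str, float, float]]:
    """Divide the word interval [start, end] into equal-length phoneme segments.

    Hypothetical helper for illustration only; not part of the toolkit.
    """
    if not phonemes:
        return []
    if end <= start:
        raise ValueError("word end must be later than word start")

    step = (end - start) / len(phonemes)
    segments = []
    for i, ph in enumerate(phonemes):
        seg_start = start + i * step
        # Pin the last segment to the exact word end to avoid float drift.
        seg_end = end if i == len(phonemes) - 1 else start + (i + 1) * step
        segments.append((ph, seg_start, seg_end))
    return segments


# Example: one word spanning 1.20 s to 1.80 s with three phonemes.
example = split_word_evenly(1.20, 1.80, ["k", "a", "m"])
```

The resulting segments are only an initial guess. As described above, the model is expected to learn the actual durations during training.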

@Cardroid
Contributor Author

Cardroid commented Mar 4, 2022

Thank you👍
I'm looking forward to it!

@Cardroid
Contributor Author

Cardroid commented Mar 6, 2022

I tried MFA (Montreal Forced Aligner), based on my experience building TTS systems.
Perhaps because the pitch is not constant in singing, the alignment is poor.

I used the CSD dataset, with KoG2P as the G2P module.

[image]

I'm looking forward to seeing your solution soon. 😅
