AlignerNet instead of MAS #81
Comments
Check out pflow repo for guidance. |
It might be that it is still learning the durations. I think MAS is good enough. What is important is the duration_predictor module. |
The graph above is from after several days of training and thousands of steps. It looks like a bug, maybe in the tensor shapes or something similar. In the output, the first word is legible but the rest is basically gibberish. |
I see. Maybe it needs some fixing. If you can start a PR, we can debug this together; I am busy with other stuff. |
Probably the same problem as we have in pflow. |
@Tera2Space What was the problem? |
Code Geass, nice. The problem was that it generated a wrong alignment; I just thought of a reason: p0p4k/pflowtts_pytorch#24 (comment). In your model, what shape is the input to AlignerNet? |
@Tera2Space @p0p4k I added a basic PR of the changes I have so far: #82 |
For a single batch - shapes look something like this:
|
Hm, then my guess was wrong, or there are more problems. |
Iirc, I had yanked the alignernet from there. |
Yeah, I am currently trying out various inputs to the aligner; it is possibly an input issue. It might also be an issue with the masks. |
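On the masks: a minimal pure-Python sketch of how padding masks are usually built from sequence lengths and combined into a per-example attention mask, in case it helps when checking what the aligner receives. The function names and the `[frame][token]` axis order are assumptions for illustration, not taken from the repo; the actual code works on tensors and may order the axes differently.

```python
def sequence_mask(lengths, max_len=None):
    """Return one 0/1 row per example: 1 for real positions, 0 for padding."""
    if max_len is None:
        max_len = max(lengths)
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


def attention_mask(text_mask, mel_mask):
    """Outer product per example: mask[b][j][i] is 1 only if mel frame j
    and text token i are both real (non-padded) positions."""
    return [
        [[m * t for t in text_row] for m in mel_row]
        for text_row, mel_row in zip(text_mask, mel_mask)
    ]


# Hypothetical batch of two utterances.
text_lengths = [3, 2]
mel_lengths = [4, 5]
text_mask = sequence_mask(text_lengths)   # 2 rows of length 3
mel_mask = sequence_mask(mel_lengths)     # 2 rows of length 5
attn_mask = attention_mask(text_mask, mel_mask)  # 2 x 5 x 3
```

If the alignment is computed over padded positions, or the mask is applied along the wrong axis, the aligner can latch onto padding, so comparing the number of ones in each mask against `text_length * mel_length` is a cheap sanity check.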
Still not convinced about putting effort into AlignerNet; we must focus on a better |
do you think that's a bottleneck right now? |
For VITS2 it should be the duration predictor, and for pflow it should be both the text encoder and the duration predictor. MAS gives good alignments during training; it is during inference that these models perform worse. |
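To make the training-versus-inference point concrete, here is a simplified pure-Python sketch of monotonic alignment search (MAS) and of how per-token durations fall out of the resulting path. It is an illustration only, not the optimized implementation used in the repo; the function names and the toy `log_p` matrix are made up for the example, and it assumes there are at least as many mel frames as text tokens.

```python
import math


def monotonic_alignment_search(log_p):
    """log_p[j][i] is the log-likelihood of mel frame j under text token i.
    Returns path[j], the token index assigned to frame j. The path starts
    at token 0, ends at the last token, and never skips or goes back."""
    n_frames, n_tokens = len(log_p), len(log_p[0])
    neg_inf = -math.inf
    # q[j][i]: best cumulative score of any valid path ending at (frame j, token i)
    q = [[neg_inf] * n_tokens for _ in range(n_frames)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = q[j - 1][i]                          # same token as previous frame
            advance = q[j - 1][i - 1] if i > 0 else neg_inf  # moved on from token i-1
            q[j][i] = max(stay, advance) + log_p[j][i]
    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and q[j - 1][i - 1] >= q[j - 1][i]:
            i -= 1
    return path


def durations_from_path(path, n_tokens):
    """Count how many frames each token received."""
    durations = [0] * n_tokens
    for i in path:
        durations[i] += 1
    return durations


# Toy example: 5 mel frames, 3 text tokens.
log_p = [
    [0, -5, -5],
    [0, -5, -5],
    [-5, 0, -5],
    [-5, -5, 0],
    [-5, -5, 0],
]
path = monotonic_alignment_search(log_p)
durations = durations_from_path(path, 3)
print(path, durations)
```

During training, these per-token frame counts serve as the targets for the duration predictor. At inference there is no mel to align against, so the durations come entirely from the predictor, which is why its quality matters more than the choice of aligner.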
how do you think we can improve pflow's encoder? |
Is it possible to use AlignerNet (aligner.py in pflow-tts repo) instead of MAS in VITS2?
What should be changed in the code? I am a bit confused on what the inputs should be.