Training multiple speakers - what is the point? #679
Pentalimbed asked this question in Q&A (Unanswered)

I have attempted to train multiple speakers at the same time, but the resulting voices are less distinct from each other. Is this what people call voice leaking (音色泄漏, "timbre leakage")? Is it because I need to train for more epochs, even though all losses seemed to have converged by then?

With such a disadvantage, what is the point of training multiple speakers together? Does it improve pitch or generalization when there is less training material per speaker, compared to training each one separately?

Replies: 2 comments
-
I think it's a result of averaging some weights (the speakers share some layers). In the original VITS repo this was an interesting feature that allowed voice conversion between model speakers. Here it seems to be a rudimentary option, since the model itself does any-to-one conversion.
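For context, here is a minimal sketch (illustrative only, not this repo's actual code) of the weight sharing described above: in VITS-style multi-speaker models, all speakers share the encoder/decoder layers and differ only in a learned per-speaker embedding that conditions the shared path. The module name, channel sizes, and `sid` interface below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpeakerConditionedDecoder(nn.Module):
    """Toy VITS-style decoder: the conv layers are shared by all
    speakers; only the speaker embedding differs per speaker."""

    def __init__(self, n_speakers: int, gin_channels: int = 256, hidden: int = 192):
        super().__init__()
        self.emb_g = nn.Embedding(n_speakers, gin_channels)     # one learned vector per speaker
        self.cond = nn.Conv1d(gin_channels, hidden, 1)          # project speaker vector into the shared path
        self.shared = nn.Conv1d(hidden, hidden, 5, padding=2)   # weights shared across all speakers

    def forward(self, z: torch.Tensor, sid: torch.Tensor) -> torch.Tensor:
        # z: (batch, hidden, time) latent from a shared encoder; sid: (batch,) speaker ids
        g = self.emb_g(sid).unsqueeze(-1)      # (batch, gin_channels, 1)
        return self.shared(z + self.cond(g))   # shared weights, speaker-specific conditioning

# Conversion between model speakers works precisely because the layers are
# shared: decode the same latent with a different speaker id.
dec = SpeakerConditionedDecoder(n_speakers=10)
z = torch.randn(1, 192, 100)
as_speaker_0 = dec(z, torch.tensor([0]))
as_speaker_3 = dec(z, torch.tensor([3]))  # same content, different timbre
```

Note how each voice's distinctness rests entirely on that one embedding: if the shared layers absorb timbre information during training, the speakers start to blur together, which matches the leakage described in the question.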
-
Yep, after training for 36 epochs on the VCTK dataset it looks like it's no better than QuickVC.