The BERT model is validated in the PL-BERT paper and repo, so you can refer to https://github.com/yl4579/PL-BERT for more details. The diffusion model is used in a way more akin to Stable Diffusion (a latent diffusion model), except that the latent variable is a style vector. You can think of StyleTTS as an autoencoder: the style encoder encodes the speech into a latent space (the style), and the speech is then reconstructed from that latent style. The diffusion model turns StyleTTS into a probabilistic generative model that samples the style directly. The style has two parts, one is acoustic (
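To make "samples the style directly" concrete, here is a toy sketch, not the StyleTTS sampler. It makes several assumptions. The "style" is a single scalar. For a given text, styles are Gaussian with a mean set by a hypothetical `text_to_mean` function, which stands in for the text/BERT conditioning. Because the styles are Gaussian, the ideal denoiser has a closed form and no network is needed. Sampling runs a plain Euler integration of the probability-flow ODE in the sigma parameterisation used by Karras et al. (EDM), with a simple geometric noise schedule chosen for brevity. All function names here are hypothetical.

```python
import math
import random


def denoise(x, sigma, mu, sigma_data):
    """Exact denoiser E[y | x] when y ~ N(mu, sigma_data^2) and x = y + sigma * noise.

    In a real model this is the trained network; here it is closed-form (toy assumption).
    """
    return (sigma_data**2 * x + sigma**2 * mu) / (sigma_data**2 + sigma**2)


def text_to_mean(text_embedding):
    """Hypothetical conditioning: map a text embedding to the mean of the style distribution."""
    return sum(text_embedding) / len(text_embedding)


def sample_style(mu, sigma_data, rng, num_steps=50, sigma_max=80.0, sigma_min=0.002):
    """Draw one scalar 'style' by Euler-integrating dx/dsigma = (x - D(x; sigma)) / sigma."""
    # Geometric schedule from sigma_max down to sigma_min, then a final step to sigma = 0.
    ratio = (sigma_min / sigma_max) ** (1.0 / (num_steps - 1))
    sigmas = [sigma_max * ratio**i for i in range(num_steps)] + [0.0]
    x = rng.gauss(0.0, sigma_max)  # start from (almost) pure noise, no reference audio needed
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma, mu, sigma_data)) / sigma  # slope of the ODE at this sigma
        x = x + (sigma_next - sigma) * d  # Euler step toward lower noise
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    mu = text_to_mean([0.5, 1.5, 1.0])
    samples = [sample_style(mu, 0.5, rng) for _ in range(2000)]
    mean = sum(samples) / len(samples)
    std = math.sqrt(sum((v - mean) ** 2 for v in samples) / len(samples))
    print(f"target mean {mu}, sample mean {mean:.3f}, sample std {std:.3f}")
```

Each call returns a different style for the same text, which is what makes the model probabilistic. In the autoencoder view, `sample_style` replaces running the style encoder on reference speech.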
When comparing the code of the original distro with yl4579's, I noticed that the inference method has changed: a diffusion sampler and the BERT model were added. I have been puzzling over it. If you could, please shed some light on why they were added and what they achieve. Much appreciated; it is difficult to learn about these subjects.
Looking closer, I can see that the sampler function produces a tensor, which is then split into two parts, s and ref, but ref seems to be unused. Could you comment on that part of the code?
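Here is a minimal sketch of the kind of split being asked about. It works on a single plain Python list rather than a batched tensor. Several details are assumptions for illustration and were not read from the repo: the total size of 256, the even split, and which half is named `ref`. `split_style` is a hypothetical helper. The point it shows is only that both names are slices of one sampled vector. If `ref` is never passed on to another module, that half of the sample has no effect on the output.

```python
import random

STYLE_DIM = 256          # assumed total size of the sampled style vector
HALF = STYLE_DIM // 2    # assumed even split into two parts


def split_style(s_pred):
    """Split one sampled style vector into (s, ref).

    Unbatched analogue of writing ref = s_pred[:, :HALF] and s = s_pred[:, HALF:]
    on a tensor; the order of the halves is an assumption here.
    """
    if len(s_pred) != STYLE_DIM:
        raise ValueError(f"expected {STYLE_DIM} values, got {len(s_pred)}")
    ref = s_pred[:HALF]  # hypothetically the acoustic half (e.g. for a decoder)
    s = s_pred[HALF:]    # hypothetically the other half (e.g. for a predictor)
    return s, ref


if __name__ == "__main__":
    rng = random.Random(0)
    s_pred = [rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]  # stand-in for the sampler output
    s, ref = split_style(s_pred)
    print(len(s), len(ref))
```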