More documentation? Long format audio? #73

Open
fivestones opened this issue May 31, 2023 · 1 comment

Comments

@fivestones

Hey! I like what you're doing with this! I was following your comments on Hacker News a couple of weeks ago, and you said that when you had a chance you'd make a YouTube video or do something else to explain how you got some things to work in Bark (cloning voices, etc.). I'd especially like to know if there's a good way to do long-format audio, like making an audiobook. I read somewhere that you aren't making audiobooks, but maybe you know some settings that would be good for this? When I've tried to use Bark Infinity to do TTS on any longer piece of text, the final product sounds very choppy: there are lots of extra or cut-off bits where the individual audio files were joined together. The individual audio files also sound different enough from each other that it sometimes feels like a different person is reading every sentence or so. Do you know how to make this better? I'd love to learn more from you if you have time to share this somewhere. Thanks!

@yatesdr

yatesdr commented May 31, 2023

Bark is pretty much brand new, and so far I haven't found any way to increase the consistency. It's showing huge promise, but voice cloning seems nearly random to me, and as you've noted, the output isn't always coherent from one generation to the next. This is just the nature of generative AI, but you can try to tame it somewhat and then refine the result afterwards.

Have you played with the temperature? I'm not really clear on the difference between the waveform and text temperature controls, but I'd recommend cooling both of them off a bit if you're looking for consistency.
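To illustrate why cooling the temperature tends to help consistency, here is a minimal sketch of temperature-scaled sampling in general. It is not Bark's actual code; the function name and the scores are made up for illustration, and it assumes Bark's text and waveform temperatures behave like the usual sampling temperature. In upstream Bark these controls appear, as far as I know, as the `text_temp` and `waveform_temp` arguments of `generate_audio`; Bark Infinity may expose them differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]
warm = softmax_with_temperature(logits, 1.0)
cool = softmax_with_temperature(logits, 0.5)
```

At the lower temperature the top candidate takes a larger share of the probability, so repeated generations make the same choice more often. The likely trade-off is less expressive variation.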

The other thing you can do is run the full (joined) audio output through a speech encode/decode cycle in another tool, which will help it mesh a bit better. I'd recommend looking into so-vits-svc or RVC as a place to start. You can also apply voice filtering and similar processing there, which would be useful for refining audio for audiobooks.
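Before handing the joined audio to a speech-to-speech tool, the seams themselves can be softened. The sketch below uses a short linear crossfade, which is a common audio-editing technique and not something proposed in this thread or built into Bark as far as I know. `crossfade_join` and its `overlap` parameter are hypothetical names, and plain Python lists of float samples stand in for the NumPy arrays Bark returns, to keep the example dependency-free.

```python
def crossfade_join(chunks, overlap):
    """Concatenate lists of float samples, blending `overlap` samples at each seam."""
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(chunk))
        cut = len(out) - n
        tail = out[cut:]   # end of the audio so far, to be faded out
        del out[cut:]
        for i in range(n):
            fade_in = (i + 1) / (n + 1)  # weight of the incoming chunk
            out.append(tail[i] * (1.0 - fade_in) + chunk[i] * fade_in)
        out.extend(chunk[n:])
    return out

# Two tiny stand-in "clips": one constant at 1.0, one constant at 0.0.
joined = crossfade_join([[1.0] * 4, [0.0] * 4], overlap=2)
```

Each seam shortens the result by `overlap` samples, and the blend only hides clicks and abrupt cuts. It does nothing about the voice drifting between chunks, which is what the encode/decode pass is meant to address.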

Bark's expressiveness is really, really good, but without fine-tuning I think a very consistent voice will be difficult to achieve. Try to find a voice that's as consistent as possible, use Bark for the natural-sounding output, and then look at speech-to-speech encoding to even everything out. That's the path I'm currently on, anyway.
