
Support voice synthesis to Vec<u8> #30

Draft: wants to merge 6 commits into master
Conversation

@Bear-03 (Contributor) commented Jul 22, 2022:

This aims to solve #12.

@Bear-03 (Contributor, Author) commented Jul 22, 2022:

I've only implemented WinRT for now, but I'll look into how to implement it for the other backends.

@ndarilek (Owner) commented:
Neat, thanks! More backends would be great--what often happens is folks do one and I end up having to do the rest. :) You won't be able to do tolk, but if you can at least cover web, I'll look into the others.

Also, I wonder if we should use Vec<u8> or some other, slightly smarter container for audio? I'd like to be sure there's a known output format for whatever audio data we get, and I'm concerned that each synth might have its own concept of what format to use for synthesized audio. So we might end up with a situation where different platforms output different formats and the crate is no longer cross-platform.

Thanks again.

@Bear-03 (Contributor, Author) commented Jul 22, 2022:

> I'd like to be sure there's a known output format for whatever audio data we get

For my current project (where I'm going to be using tts-rs) I use PCM, specifically signed 16-bit PCM. Signed 16-bit, unsigned 16-bit, and 32-bit floating point are the three sample formats that cpal supports, and since it's a popular crate, I'd assume supporting those would be more than enough.

I remember reading that the bytes WinRT returns are already PCM, but I'm not too sure; we could do some research. Of course, keeping the library cross-platform is a priority.
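
For reference, this is roughly what consuming 16-bit PCM from a `Vec<u8>` looks like on the caller's side (just a sketch; the actual sample format and byte order would depend on the backend):

```rust
// Sketch: reading a Vec<u8> of signed 16-bit PCM as samples.
// Assumes little-endian byte order, which would need to be confirmed per backend.
fn pcm16_samples(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}
```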

@ndarilek (Owner) commented Jul 22, 2022 via email

@Bear-03 (Contributor, Author) commented Jul 22, 2022:

Yes, now that you mention it, that's true: you're often required to provide a lot of parameters to play audio or save it to a file. I'm pretty sure those are constant for a given backend, so it would be a matter of creating something like a Spec struct that holds that data for each backend.
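
Something along these lines (all names here are placeholders, nothing that exists in tts-rs yet):

```rust
/// Hypothetical audio spec a backend could expose; fields are illustrative.
pub struct Spec {
    pub sample_rate: u32,      // e.g. 22_050 or 48_000 Hz
    pub channels: u16,         // 1 = mono, 2 = stereo
    pub format: SampleFormat,  // sample encoding of the returned bytes
}

/// The three formats mentioned above.
pub enum SampleFormat {
    I16,
    U16,
    F32,
}
```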

@ndarilek (Owner) commented:
Does cpal not have some sort of audio container with all this data that we can return directly? I'm a bit hesitant to have the audio parameters be a separate thing you need access to--I'd rather the return value include everything necessary, if possible.

@Bear-03 (Contributor, Author) commented Jul 22, 2022:

cpal uses SupportedStreamConfig, which holds the stream configuration for an input/output device.

Returning the audio metadata every time you synthesize would be wasteful, in my opinion, as you'd be using resources for something that isn't really needed. The audio metadata won't change during runtime, so generating it once and letting the developer store it is far more efficient.
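
For example, a backend's fixed spec could be turned into a cpal config once and reused for playback (a sketch against the cpal 0.13-era API; the constructor signature and values are assumptions that should be double-checked):

```rust
use cpal::{SampleFormat, SampleRate, SupportedBufferSize, SupportedStreamConfig};

// Build the playback config once from the backend's (hypothetical) fixed spec,
// so synthesize() itself only needs to return the raw bytes.
fn stream_config() -> SupportedStreamConfig {
    SupportedStreamConfig::new(
        1,                            // channels: mono
        SampleRate(22_050),           // sample rate in Hz (illustrative value)
        SupportedBufferSize::Unknown, // buffer size not known up front
        SampleFormat::I16,            // signed 16-bit PCM
    )
}
```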

@ndarilek (Owner) commented Jul 22, 2022 via email

@Bear-03 (Contributor, Author) commented Jul 22, 2022:

> if there's some way we can autogenerate it once and cache it, that might be useful

And then return it in the container?

> I'm a bit concerned about these formats changing, and of having to maintain/sync hard-coded structs

I'm fairly sure it's impossible to retrieve the audio metadata from the audio bytes themselves, as you need the metadata first in order to interpret them.

AFAIK WinRT doesn't have any way to get the audio spec (I'll have a look), so the only alternative is hard-coding it. My idea is to have something analogous to min_rate(), normal_rate(), and max_rate(): a method that returns the Spec for each backend.
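
Roughly this shape, reusing the Spec idea from above (the method and type names are hypothetical, and the WinRT values are placeholders until the actual output format is confirmed):

```rust
/// Hypothetical spec type, as sketched earlier.
pub struct Spec {
    pub sample_rate: u32,
    pub channels: u16,
}

/// Each backend would implement this alongside min_rate()/normal_rate()/max_rate().
pub trait Backend {
    fn synthesize(&self, text: &str) -> Vec<u8>;
    fn audio_spec(&self) -> Spec;
}

struct WinRtBackend;

impl Backend for WinRtBackend {
    fn synthesize(&self, _text: &str) -> Vec<u8> {
        unimplemented!("call into the WinRT synthesizer here")
    }

    // Placeholder values; assumes WinRT emits 16-bit mono PCM, which still needs verifying.
    fn audio_spec(&self) -> Spec {
        Spec {
            sample_rate: 22_050,
            channels: 1,
        }
    }
}
```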
