Support voice synthesis to Vec<u8> #30
base: master
I've only implemented WinRT for now, but I'll look into how to implement it for the other backends.
Neat, thanks! More backends would be great--what often happens is folks do one and I end up having to do the rest. :) You won't be able to do tolk, but if you can at least cover web, I'll look into the others. Also, I wonder if we should use Thanks again.
For my current project (where I'm going to be using tts-rs) I use PCM, specifically PCM 16-bit. PCM 16-bit, floating point and unsigned 16-bit are the three formats that I remember reading that the bytes WinRT returns are already PCM, but I'm not too sure, so we could do some research. Of course, keeping the library cross-platform is a priority.
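For reference, here is a minimal sketch of how the three sample formats mentioned above (signed 16-bit, unsigned 16-bit, and floating point) relate to one another. The function names are hypothetical and not part of tts-rs, the little-endian byte order is an assumption, and the scaling by 32768 is just one common convention. Whether WinRT actually hands back bare PCM is still unverified in this thread.

```rust
/// Convert a signed 16-bit PCM sample to a float in the range [-1.0, 1.0).
/// Dividing by 32768 is a common convention, assumed here.
pub fn i16_to_f32(sample: i16) -> f32 {
    sample as f32 / 32768.0
}

/// Convert a signed 16-bit PCM sample to unsigned 16-bit,
/// where silence sits at 32768 instead of 0.
pub fn i16_to_u16(sample: i16) -> u16 {
    (sample as i32 + 32768) as u16
}

/// Decode raw bytes into signed 16-bit samples, assuming little-endian order.
/// A trailing odd byte, if any, is ignored.
pub fn bytes_to_i16_le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}
```

If every backend turned out to produce signed 16-bit PCM, helpers like these would be enough for callers to feed the other two formats to an audio library.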
Gotcha. I'm guessing the way forward is to synthesize to something other
than a raw vec but which indicates its format. Then, if we discover
everything just happens to use the same format, we can drop that
requirement and just send raw bytes. I feel like whenever I have to pipe
bytes to an audio library, I'm often required to know things about them
(i.e. sample rate, bit depth, etc.). I want to make sure we're giving
folks that information if it's going to differ from engine to engine so
they don't have to figure it out themselves.
Thanks again.
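To make the "something other than a raw vec" idea concrete, one possible shape is sketched below. Everything here is an assumption for illustration: `SampleFormat`, `SynthesizedAudio`, and their fields are hypothetical names, not types that exist in tts-rs or that this PR adds.

```rust
/// Hypothetical sample encodings a backend might produce.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SampleFormat {
    I16,
    U16,
    F32,
}

impl SampleFormat {
    /// Size in bytes of one sample in this format.
    pub fn bytes_per_sample(self) -> usize {
        match self {
            SampleFormat::I16 | SampleFormat::U16 => 2,
            SampleFormat::F32 => 4,
        }
    }
}

/// Hypothetical return type: the raw bytes plus what is needed to interpret them.
#[derive(Debug, Clone, PartialEq)]
pub struct SynthesizedAudio {
    pub sample_rate: u32,
    pub channels: u16,
    pub format: SampleFormat,
    pub data: Vec<u8>,
}

impl SynthesizedAudio {
    /// Number of complete frames (one sample per channel) in `data`.
    pub fn frame_count(&self) -> usize {
        let frame_size = self.format.bytes_per_sample() * self.channels as usize;
        if frame_size == 0 {
            0
        } else {
            self.data.len() / frame_size
        }
    }

    /// Playback duration in seconds, derived from the frame count and sample rate.
    pub fn duration_secs(&self) -> f64 {
        if self.sample_rate == 0 {
            0.0
        } else {
            self.frame_count() as f64 / self.sample_rate as f64
        }
    }
}
```

If all engines ended up sharing one format, the wrapper could later be dropped in favour of plain bytes, as suggested above.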
Yes, now that you said it, that's true, you're often required to provide a lot of parameters to play audio or save it to a file. I'm pretty sure that those are constant for a given backend, so it would be a matter of creating something like a
Does cpal not have some sort of audio container with all this data that we can return directly? I'm a bit hesitant to have the audio parameters be a separate thing you need access to--I'd rather the return value include everything necessary, if possible.
cpal uses Returning the audio metadata every time you synthesize would be wasteful, in my opinion, as you're using resources for things that aren't really needed. The audio metadata won't change during runtime, so generating it once and letting the developer store it is far more efficient.
Gotcha, I'd hoped it'd be part of the returned container. Anyhow, if
there's some way we can autogenerate it once and cache it, that might be
useful. I'm a bit concerned about these formats changing, and of having
to maintain/sync hard-coded structs. But maybe that's not warranted.
I'll see what you come up with.
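One way to reconcile "generate it once" with "return it in the container" could look like the sketch below. It is only an illustration under stated assumptions: `AudioSpec`, `SynthBackend`, `CachedSynth`, and `FakeBackend` are hypothetical names, and the spec values in the fake backend are made up. The only real API used is `std::sync::OnceLock` from the standard library (Rust 1.70+).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical description of a backend's output format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AudioSpec {
    pub sample_rate: u32,
    pub channels: u16,
    pub bits_per_sample: u16,
}

/// Hypothetical backend interface: querying the spec may be costly.
pub trait SynthBackend {
    fn query_spec(&self) -> AudioSpec;
    fn synthesize(&self, text: &str) -> Vec<u8>;
}

/// Wrapper that asks the backend for its spec at most once and reuses it.
pub struct CachedSynth<B: SynthBackend> {
    backend: B,
    spec: OnceLock<AudioSpec>,
}

impl<B: SynthBackend> CachedSynth<B> {
    pub fn new(backend: B) -> Self {
        Self { backend, spec: OnceLock::new() }
    }

    pub fn backend(&self) -> &B {
        &self.backend
    }

    /// Returns the cached spec, querying the backend only on first use.
    pub fn spec(&self) -> AudioSpec {
        *self.spec.get_or_init(|| self.backend.query_spec())
    }

    /// Returns the bytes together with a copy of the cached spec.
    pub fn synthesize(&self, text: &str) -> (AudioSpec, Vec<u8>) {
        (self.spec(), self.backend.synthesize(text))
    }
}

/// Stand-in backend with made-up values, counting how often the spec is queried.
#[derive(Default)]
pub struct FakeBackend {
    queries: AtomicUsize,
}

impl FakeBackend {
    pub fn query_count(&self) -> usize {
        self.queries.load(Ordering::SeqCst)
    }
}

impl SynthBackend for FakeBackend {
    fn query_spec(&self) -> AudioSpec {
        self.queries.fetch_add(1, Ordering::SeqCst);
        AudioSpec { sample_rate: 22_050, channels: 1, bits_per_sample: 16 }
    }

    fn synthesize(&self, text: &str) -> Vec<u8> {
        // Two bytes of silence per input byte, purely as placeholder output.
        vec![0u8; text.len() * 2]
    }
}
```

Because `AudioSpec` is a small `Copy` struct, attaching it to every result costs little once the lookup itself is cached, so the hard-coded-versus-queried question becomes a per-backend detail inside `query_spec`.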
And then return it in the container?
I am sure that it is impossible to retrieve the audio metadata from the audio bytes themselves, as you need the metadata first to then interpret them. AFAIK WinRT doesn't have any way to get the audio spec (I'll have a look), so the only alternative is hard-coding it. My idea is to have something like
This aims to solve #12.