[Feature Request]: Package streaming End-to-End STT to TTS #218
Comments
Hi @Katehuuh, I have been considering adding Whisper into AllTalk in a couple of ways, so this could fit into that quite well :) So let me just ask a couple of questions on this:
Where this all gets complicated is multi-threading requests within Python and access to the GPU cores. That is, if the LLM is using all the tensor cores of a GPU, it may not be happy with TTS also trying to generate on those cores at the same time... I'll have to think on this and look at it when we can play with the streaming generation. I guess I'm mostly just putting this number 3 here for my own reference/thoughts when I get to look at this again.
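A common way to keep the LLM and the TTS engine from contending for the same GPU is to serialize inference behind one shared lock, so only one model runs at a time. The sketch below is a hypothetical illustration, not how AllTalk or text-generation-webui actually schedule work; `llm_generate` and `tts_generate` are placeholder functions that only simulate GPU time.

```python
import threading
import time

# A single lock shared by every component that runs inference on the GPU.
GPU_LOCK = threading.Lock()

_active = 0           # jobs currently holding the lock (demonstration only)
peak_concurrency = 0  # highest number of overlapping GPU jobs observed

def run_on_gpu(fn, *args):
    """Run fn(*args) while holding GPU_LOCK, so GPU jobs never overlap."""
    global _active, peak_concurrency
    with GPU_LOCK:
        _active += 1
        peak_concurrency = max(peak_concurrency, _active)
        try:
            return fn(*args)
        finally:
            _active -= 1

# Hypothetical stand-ins for the real LLM and TTS calls.
def llm_generate(prompt):
    time.sleep(0.01)  # simulate GPU work
    return f"reply to {prompt!r}"

def tts_generate(text):
    time.sleep(0.01)  # simulate GPU work
    return f"<audio for {text!r}>"

results = []

def worker(fn, arg):
    results.append(run_on_gpu(fn, arg))

threads = [threading.Thread(target=worker, args=(llm_generate, f"q{i}")) for i in range(3)]
threads += [threading.Thread(target=worker, args=(tts_generate, f"s{i}")) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), peak_concurrency)  # 6 1
```

Serializing like this trades throughput for stability; with streaming generation, the lock could instead be taken per chunk so TTS jobs interleave with LLM token batches.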
Was that the only way to commit the generated STT text to the chat? I know that text-gen-webui has recently moved from Gradio 3.5.2 to 4.28 (I think), so maybe there are some better options within that version. What would be the benefit of going to "Generate JS in Gradio streaming"? Sorry for the questions, I'm just trying to get this fixed in my head! And thanks for offering your code! :)
I've modified the default whisper_stt extension to create ooba-insanely-fast-whisper, which «loops» over multiple repeating steps:

```python
audio.stop_recording(
    auto_transcribe, [audio, auto_submit, whipser_model, whipser_language], [shared.gradio['textbox'], audio]).then(
    None, auto_submit, None, js="(check) => {if (check) { document.getElementById('Generate').click() }}")
```

by using:

```python
audio.stop_recording(
    auto_transcribe, [audio, auto_submit, whipser_model, whipser_language], [shared.gradio['textbox'], audio]).then(
    None, auto_submit, None, _js="(check) => { console.log('Check:', check); if (check) { document.getElementById('Generate').click(); }}")
```

so for
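The handler above runs in two stages: Python transcribes the recording into the chat textbox, then a JS callback clicks `Generate` only if the `auto_submit` checkbox (passed in as `check`) is ticked. The plain-Python sketch below mirrors that control flow without Gradio, just to make the ordering explicit; `transcribe_step`, `maybe_submit`, `on_stop_recording` and the `transcribe`/`submit` callbacks are hypothetical names, not part of the whisper_stt extension.

```python
from typing import Callable, Optional, Tuple

def transcribe_step(audio: bytes, transcribe: Callable[[bytes], str]) -> Tuple[str, Optional[bytes]]:
    """Stage 1: turn the recording into text and clear the audio widget (None)."""
    return transcribe(audio), None

def maybe_submit(check: bool, text: str, submit: Callable[[str], str]) -> Optional[str]:
    """Stage 2: 'click Generate' only when the auto-submit checkbox is ticked."""
    return submit(text) if check else None

def on_stop_recording(audio: bytes, check: bool,
                      transcribe: Callable[[bytes], str],
                      submit: Callable[[str], str]) -> Tuple[str, Optional[str]]:
    """Run stage 2 strictly after stage 1, like a .then() chain."""
    text, _cleared_audio = transcribe_step(audio, transcribe)
    return text, maybe_submit(check, text, submit)

# Hypothetical fakes for the Whisper call and the Generate click.
def fake_transcribe(audio):
    return "hello world"

def fake_submit(text):
    return f"Generate clicked with: {text}"

print(on_stop_recording(b"...", True, fake_transcribe, fake_submit))
print(on_stop_recording(b"...", False, fake_transcribe, fake_submit))
```

Because stage 2 only receives the checkbox value, the decision to submit stays a one-line condition; the JS version does the same thing client-side by clicking the existing `Generate` button.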
Hi @Katehuuh, thanks for the reply. What I'm going to do is put a link to this in the Feature Requests. I'm so deep into working on v2 of AllTalk that I think it's something I will try to put in there, as I'm hoping to have a beta out soon. I may well get back to you if I get stuck somewhere along the line. Thanks
I've noticed HF (Hugging Face) attempting the same: https://github.com/huggingface/speech-to-speech. Though doing it manually will give more control.
I've seen the streaming TTS PR, and I like the simple STT-to-TTS loop available in SillyTavern: it doesn't require any action from the user.
I thought you could add the whisper script I've made: a fast STT that works alongside TTS (alltalk_tts), either combined with or optional to my ooba extension's fast STT script. Together they would make a streaming end-to-end STT-to-TTS package, so that the user can answer naturally without having to press Record/Enter as with the default whisper_stt extension.
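A streaming end-to-end loop like this is typically built as a pipeline: transcribe the user's speech, stream the LLM reply token by token, cut the stream into sentences, and hand each sentence to TTS as soon as it is complete, so playback can begin before the LLM finishes. The sketch below assumes hypothetical `stt`, `llm_stream` and `tts` callables (faked here) rather than the actual whisper or alltalk_tts APIs, and uses a deliberately simple punctuation-based sentence splitter.

```python
import queue
import re
import threading

# Split after ., ! or ? followed by whitespace (a simple heuristic).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_tokens(tokens):
    """Group a stream of LLM tokens into complete sentences as they arrive."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:  # every part except the last is complete
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():  # flush whatever is left when the stream ends
        yield buffer.strip()

def speak_pipeline(audio, stt, llm_stream, tts):
    """STT -> streaming LLM -> sentence-level TTS, with TTS on its own thread."""
    text = stt(audio)
    jobs = queue.Queue()
    spoken = []

    def tts_worker():
        while True:
            sentence = jobs.get()
            if sentence is None:  # sentinel: the LLM stream has finished
                break
            spoken.append(tts(sentence))

    worker = threading.Thread(target=tts_worker)
    worker.start()
    for sentence in sentences_from_tokens(llm_stream(text)):
        jobs.put(sentence)  # TTS can start before the LLM is done
    jobs.put(None)
    worker.join()
    return text, spoken

# Hypothetical fakes standing in for Whisper, the LLM and alltalk_tts.
def fake_stt(audio):
    return "what is the weather"

def fake_llm_stream(prompt):
    yield from ["It is ", "sunny. ", "Enjoy ", "your day!"]

def fake_tts(sentence):
    return f"<wav:{sentence}>"

heard, clips = speak_pipeline(b"...", fake_stt, fake_llm_stream, fake_tts)
print(heard, clips)
```

On a single GPU the LLM and TTS would still be competing for the same cores here, so the contention concern about multi-threaded GPU access applies to this design as well.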
While it works fine with the auto-Enter-key workaround, I did not find a way to trigger the Generate click through JS in Gradio streaming.
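Answering "without Record/Press Enter" requires the app to decide on its own when the user has stopped speaking. A common approach is voice activity detection (VAD); the sketch below swaps in the simplest version, an RMS energy threshold with a short silence hangover, rather than a trained VAD model. Frames are plain lists of float samples, and the `threshold` and `silence_frames` defaults are illustrative assumptions, not tuned values.

```python
import math
from typing import Iterable, Iterator, List, Sequence

def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def utterances(frames: Iterable[Sequence[float]],
               threshold: float = 0.02,
               silence_frames: int = 3) -> Iterator[List[Sequence[float]]]:
    """Yield each utterance once `silence_frames` quiet frames follow speech."""
    current: List[Sequence[float]] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)
            quiet = 0
        elif current:
            current.append(frame)  # keep short pauses inside the utterance
            quiet += 1
            if quiet >= silence_frames:
                yield current[:-quiet]  # drop the trailing silence
                current, quiet = [], 0
    if current:  # stream ended mid-utterance
        yield current[:len(current) - quiet]

speech = [0.5] * 160   # loud frame
silence = [0.0] * 160  # quiet frame
frames = [silence, speech, speech, silence, speech,
          silence, silence, silence, speech, silence, silence, silence]
found = list(utterances(frames))
print([len(u) for u in found])  # [4, 1]
```

Each yielded utterance would then go to Whisper and on into the STT-to-TTS loop; a real setup would read frames from the microphone stream and would likely need a more robust VAD for noisy rooms.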