Support for real-time TTS! #5994

czuzu · 2024-05-08T09:55:48Z

Hello,

I've setup TGUI with the alltalk_tts extension locally, modified the setup to allow for passing LLM replies as they're being generated (stream mode) to the extension, and to subsequently do real-time TTS (aka "incremental" TTS).

PR for the extension is in the backlog too, streaming TTS is working as expected locally, this one is for the parts in TGUI I needed to adjust/extend to allow this to work smoothly.

Mainly, 2 changes were needed:

Add an output_modifier_stream handler for extensions (works only for chat-mode currently) as the enabler for streaming the LLM text to extensions
Do the chat HTML updates structurally and "incrementally" ("diff" mode) - only update what's needed using JS, this was needed because "audio" elements in the chat HTML were previously continuously re-rendered and made audio streaming not possible

(the rest are miscellaneous changes - adding a llama3 instruction template and a commented line to allow remotely debugging TGUI)

Let me know what you think and btw, nice project!
Thanks!

Checklist:

I have read the Contributing guidelines.

1. Workaround gradio's limitation that doesn't directly allow passing data from Python -> JS (only indirectly, through components) - see create_dataholder_gradio 2. Update the chat HTML structure-wise and incrementally ("diff" mode) - see js_chat_html_update This significantly makes the updates more efficient (no redundant HTML rendering) and additionally allows for stable components in the chat in streaming mode (e.g. important for example when extensions add <audio> elements in the chat - e.g. alltalk_TTS).

Allow extensions to modify the output when in chat mode and bot replies are streamed. This is useful, for example, for extensions that need access to the bot replies while they are streamed (e.g. incremental/streaming TTS).

hypersniper05 · 2024-05-17T03:55:25Z

Please support instruct mode

hypersniper05 · 2024-05-17T03:57:38Z

@oobabooga please consider this 🙏

bobcate · 2024-05-25T14:52:49Z

Hey @czuzu
Would you consider making this for SillyTavern?
Given that you listed only 2 things for it, I thought I'd just ask, if it's no trouble.

RandomInternetPreson · 2024-08-30T12:18:57Z

I gotta check the PR list more often, this is something I've needed for a while. Thank God textgen is open source and I can implement these changes on my own rig. Ty❤️❤️

czuzu added 4 commits May 8, 2024 11:12

Misc: shorthand for enabling remote debugging

2c43e02

Misc: add llama3 instruction template

ec716a3

czuzu mentioned this pull request May 8, 2024

TGUI: add support for XTTSv2 local streaming (including sentences streaming) erew123/alltalk_tts#208

Open

czuzu added 2 commits May 9, 2024 14:37

Remove redundant check from js_chat_html_update

4eb64a3

Chat: fix HTML changes observing in main.js

a6420c4

Katehuuh mentioned this pull request May 11, 2024

[Feature Request]: Package streaming End-to-End STT to TTS erew123/alltalk_tts#218

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for real-time TTS! #5994

Support for real-time TTS! #5994

czuzu commented May 8, 2024

hypersniper05 commented May 17, 2024

hypersniper05 commented May 17, 2024

bobcate commented May 25, 2024

RandomInternetPreson commented Aug 30, 2024

Support for real-time TTS! #5994

Are you sure you want to change the base?

Support for real-time TTS! #5994

Conversation

czuzu commented May 8, 2024

Checklist:

hypersniper05 commented May 17, 2024

hypersniper05 commented May 17, 2024

bobcate commented May 25, 2024

RandomInternetPreson commented Aug 30, 2024