SSML generation – retain pronunciation context #1098

g-30 · 2023-03-15T12:22:40Z

Would it be possible to support generating SSML (Speech Synthesis Markup Language) tags that indicate the "prosody" of every word or sentence, i.e. whether the person talks fast or slow, loudly, or with a high or low pitch? With SSML integration, that information would be recorded alongside the transcript.

This would be very useful when translating from one language to another while retaining the audio context, so that the subsequent TTS output of the translation wouldn't sound like a soulless wall of text read by a machine. Is this possible with Whisper?
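
Part of this can perhaps be approximated outside Whisper. Recent versions of openai-whisper can return word-level timestamps (`model.transcribe(audio, word_timestamps=True)`), and speaking rate can be estimated from those timings; pitch and loudness are not part of Whisper's output and would need separate audio analysis. Below is a minimal sketch, assuming per-word dicts with `word`, `start` and `end` keys (in seconds) and a hypothetical baseline of 2.5 words per second. `words_to_ssml` and `BASELINE_WPS` are illustrative names, not part of Whisper. The sketch wraps each sentence in an SSML 1.1 `<prosody rate="...%">` tag.

```python
from xml.sax.saxutils import escape

# Hypothetical baseline speaking rate in words per second (an assumption,
# not something Whisper reports). 100% in SSML means "default speed".
BASELINE_WPS = 2.5


def words_to_ssml(words, baseline_wps=BASELINE_WPS):
    """Wrap each sentence in an SSML <prosody rate="..."> tag.

    `words` is assumed to be a list of dicts with "word", "start" and "end"
    keys (seconds), like the per-word entries Whisper returns when
    word_timestamps=True. Sentences end at words ending in '.', '?' or '!'.
    """
    # Group words into sentences based on trailing punctuation.
    sentences, current = [], []
    for w in words:
        current.append(w)
        if w["word"].strip().endswith((".", "?", "!")):
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)

    parts = []
    for sent in sentences:
        duration = sent[-1]["end"] - sent[0]["start"]
        # Whisper words usually carry a leading space; strip and re-join,
        # escaping &, < and > so the result stays valid XML.
        text = escape(" ".join(w["word"].strip() for w in sent))
        if duration <= 0:
            parts.append(text)  # no usable timing: emit plain text
            continue
        wps = len(sent) / duration
        # SSML 1.1 accepts a non-negative percentage for the rate attribute.
        rate = round(100 * wps / baseline_wps)
        parts.append(f'<prosody rate="{rate}%">{text}</prosody>')
    return "<speak>" + " ".join(parts) + "</speak>"


words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " there.", "start": 0.4, "end": 0.8},
    {"word": " Slow", "start": 1.0, "end": 2.0},
    {"word": " down.", "start": 2.0, "end": 3.0},
]
print(words_to_ssml(words))
```

Here the two sentences are spoken at 2.5 and 1.0 words per second, so they come out as `rate="100%"` and `rate="40%"`. Rate relative to a fixed baseline is a crude proxy; a per-speaker baseline (for example the speaker's median rate) would likely transfer better across languages.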

curiousyuvi · 2024-05-18T07:09:28Z

Yeah, that would be a great use case for Whisper.

wanghanlele12345 · 2024-10-23T05:58:38Z

+1, it would be great.
