Help debugging Seamless Streaming. Getting a weird translation on initialization and then echos of the correct translation (translates it twice) #515

Jonovono · 2024-09-30T23:25:36Z

I have a sample repo here

https://github.com/StreamUI/QuicSeamlessStreaming/tree/master

I am using QUIC to transport audio from client -> server -> client. I borrowed most of my approach from the Hugging Face example and the Colab example, but I must have something messed up. I know it partly works because I can sometimes get the correct translation.

For testing, I have a simple Spanish audio file saying "how are you". For some reason, the first output I get is this weird repetition of "all of it, all of it". I believe I tested this with a different initial audio file and the output was different, so it's not always spitting out the same text. Now, if I stop the client and restart it, I seem to get the correct translation, but it's echoed / duplicated: it repeats "How are you" twice.

2024-09-30 18:34:19,130 INFO -- server: Connection made.
2024-09-30 18:34:19,132 INFO -- quic: [066b4886080a7db7] Negotiated protocol version 0x00000001 (VERSION_1)
2024-09-30 18:34:21,248 INFO -- server: [send_output] Got text segment: All of it, all
2024-09-30 18:34:21,248 INFO -- server: [send_output] Got speech segment
2024-09-30 18:34:23,104 INFO -- server: [send_output] Got text segment: of it, all of it, all of it, all of it, all of it.
2024-09-30 18:34:23,104 INFO -- server: [send_output] Got speech segment
2024-09-30 18:34:26,575 INFO -- server: [send_output] Got text segment: How
2024-09-30 18:34:26,576 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:26,739 INFO -- server: Connection terminated
2024-09-30 18:34:26,739 INFO -- server: Cancelling tasks
2024-09-30 18:34:26,740 INFO -- server: [send_output] Sender task cancelled
2024-09-30 18:34:28,468 INFO -- server: Connection made.
2024-09-30 18:34:28,469 INFO -- quic: [2618da09a7676177] Negotiated protocol version 0x00000001 (VERSION_1)
2024-09-30 18:34:28,857 INFO -- server: [send_output] Got text segment: are
2024-09-30 18:34:28,857 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:31,501 INFO -- server: [send_output] Got text segment: you
2024-09-30 18:34:31,501 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:33,224 INFO -- server: [send_output] Got text segment: ?
2024-09-30 18:34:33,224 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:35,365 INFO -- server: [send_output] Got text segment: How
2024-09-30 18:34:35,365 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:39,059 INFO -- server: [send_output] Got text segment: are you?
2024-09-30 18:34:39,059 INFO -- server: [send_output] Got speech segment

Perhaps just by looking at my code someone will be able to spot what I did wrong (under 300 lines really).

My guess is it's something with the model config, the state reset / state configuration, or how I send audio in. The HF example does a lot of preprocessing, which I am ignoring right now.

It's also possible it's unrelated to the model and has something to do with how I am processing events asynchronously with QUIC (which might explain the repeats...).

Any help would be great, so close yet so far!


Jonovono · 2024-10-01T22:43:46Z

Figured it out, mostly. I was chunking the audio way too small:

2024-10-01 22:20:35,651 INFO -- server: [ingest_audio] Got input of length 623
2024-10-01 22:20:35,651 INFO -- server: [ingest_audio] Got input of length 622
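
For anyone hitting the same thing, here is a minimal sketch of one way to regroup tiny transport packets into larger chunks before handing them to the model. It is not the code from the repo. `ChunkAggregator` and the sizing constants are hypothetical names, and the 16 kHz, 16-bit mono, 320 ms figures are assumptions to tune for your own setup (the logs above don't say whether "length" is bytes or samples).

```python
class ChunkAggregator:
    """Accumulate small audio packets and release fixed-size chunks.

    Hypothetical helper for illustration; not part of Seamless or any QUIC library.
    """

    def __init__(self, min_bytes: int) -> None:
        if min_bytes <= 0:
            raise ValueError("min_bytes must be positive")
        self.min_bytes = min_bytes
        self._buf = bytearray()

    def push(self, packet: bytes) -> list[bytes]:
        """Add one packet; return zero or more full chunks that are now ready."""
        self._buf.extend(packet)
        ready = []
        while len(self._buf) >= self.min_bytes:
            ready.append(bytes(self._buf[: self.min_bytes]))
            del self._buf[: self.min_bytes]
        return ready

    def flush(self) -> bytes:
        """Return whatever is left (possibly empty), e.g. when the stream ends."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest


# Assumed sizing: 16 kHz, 16-bit mono PCM, 320 ms per chunk.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 320
MIN_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000
```

In the receive handler you would call `push()` on each incoming packet and forward only the chunks it returns, then call `flush()` once when the connection closes so the tail of the audio is not dropped.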

Still lots of issues to debug, but not related to seamless ;p

Jonovono closed this as completed Oct 1, 2024
