Help debugging Seamless Streaming. Getting a weird translation on initialization and then echos of the correct translation (translates it twice) #515

Closed
Jonovono opened this issue Sep 30, 2024 · 1 comment

Comments

Jonovono commented Sep 30, 2024

I have a sample repo here

https://github.com/StreamUI/QuicSeamlessStreaming/tree/master

I am using QUIC to transport audio from client -> server -> client. I borrowed most of my inspiration from the Hugging Face example and the Colab example, but I must have messed something up. I know it's partly working because I sometimes get the correct translation.

For testing, I have a simple Spanish audio file saying "how are you". For some reason, the first output I get is this weird repetition of "all of it, all of it". I believe I tested this with a different initial audio file and the output wasn't the same, so it's not always spitting out the same thing. If I then stop the client and restart it, I seem to get the correct translation, but it's echoed / duplicated: "How are you" comes out twice.

2024-09-30 18:34:19,130 INFO -- server: Connection made.
2024-09-30 18:34:19,132 INFO -- quic: [066b4886080a7db7] Negotiated protocol version 0x00000001 (VERSION_1)
2024-09-30 18:34:21,248 INFO -- server: [send_output] Got text segment: All of it, all
2024-09-30 18:34:21,248 INFO -- server: [send_output] Got speech segment
2024-09-30 18:34:23,104 INFO -- server: [send_output] Got text segment: of it, all of it, all of it, all of it, all of it.
2024-09-30 18:34:23,104 INFO -- server: [send_output] Got speech segment
2024-09-30 18:34:26,575 INFO -- server: [send_output] Got text segment: How
2024-09-30 18:34:26,576 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:26,739 INFO -- server: Connection terminated
2024-09-30 18:34:26,739 INFO -- server: Cancelling tasks
2024-09-30 18:34:26,740 INFO -- server: [send_output] Sender task cancelled
2024-09-30 18:34:28,468 INFO -- server: Connection made.
2024-09-30 18:34:28,469 INFO -- quic: [2618da09a7676177] Negotiated protocol version 0x00000001 (VERSION_1)
2024-09-30 18:34:28,857 INFO -- server: [send_output] Got text segment: are
2024-09-30 18:34:28,857 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:31,501 INFO -- server: [send_output] Got text segment: you
2024-09-30 18:34:31,501 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:33,224 INFO -- server: [send_output] Got text segment: ?
2024-09-30 18:34:33,224 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:35,365 INFO -- server: [send_output] Got text segment: How
2024-09-30 18:34:35,365 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:39,059 INFO -- server: [send_output] Got text segment: are you?
2024-09-30 18:34:39,059 INFO -- server: [send_output] Got speech segment

Perhaps just by looking at my code someone will be able to spot what I did wrong (under 300 lines really).

My guess is it's something with the model config, the state reset / state configuration, or how I send audio in. The HF example does a lot of preprocessing, which I am skipping right now.

It's also possible it's unrelated to the model and has something to do with how I am processing events asynchronously with QUIC (which might explain the repeats...)

Any help would be great, so close yet so far!

Jonovono commented Oct 1, 2024

Figured it out, mostly. I was sending the audio in chunks that were way too small:

2024-10-01 22:20:35,651 INFO -- server: [ingest_audio] Got input of length 623
2024-10-01 22:20:35,651 INFO -- server: [ingest_audio] Got input of length 622
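
For anyone hitting the same thing, here is a minimal sketch of one way to avoid feeding tiny inputs: buffer the incoming packets and only hand over fixed-size chunks. This is not the code from the repo. The `AudioChunker` name, the 16 kHz / 16-bit mono PCM format, and the 320 ms chunk size are all assumptions for illustration, and the logged lengths above may be samples rather than bytes.

```python
class AudioChunker:
    """Accumulate small audio packets and emit fixed-size chunks.

    Hypothetical helper: the sample rate, sample width, and chunk
    duration are assumed values, not taken from the repo or the model.
    """

    def __init__(self, sample_rate=16_000, bytes_per_sample=2, chunk_ms=320):
        # Size of one emitted chunk in bytes (10240 with the defaults).
        self.chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
        self._buffer = bytearray()

    def push(self, packet: bytes) -> list[bytes]:
        """Add one incoming packet; return every full chunk now available."""
        self._buffer.extend(packet)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[:self.chunk_bytes]))
            del self._buffer[:self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left (possibly empty) and clear the buffer."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

On the server side, each received packet would go through `push()`, and only the returned chunks would be passed on to the model; `flush()` would be called once when the stream ends.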

Still lots of issues to debug, but not related to seamless ;p

Jonovono closed this as completed Oct 1, 2024