I am using QUIC to transport audio from client -> server -> client. I borrowed most of my approach from the Hugging Face example and the Colab example, but I must have something wrong. I know it partly works because I sometimes get the correct translation.
For testing, I have a short Spanish audio file saying "how are you". On the first run, the output is a strange repetition of "all of it, all of it". I believe I tested this with a different initial audio file and the output was different, so it is not always producing the same text. If I then stop the client and restart it, I get the correct translation, but it is echoed/duplicated: "How are you" comes out twice.
2024-09-30 18:34:19,130 INFO -- server: Connection made.
2024-09-30 18:34:19,132 INFO -- quic: [066b4886080a7db7] Negotiated protocol version 0x00000001 (VERSION_1)
2024-09-30 18:34:21,248 INFO -- server: [send_output] Got text segment: All of it, all
2024-09-30 18:34:21,248 INFO -- server: [send_output] Got speech segment
2024-09-30 18:34:23,104 INFO -- server: [send_output] Got text segment: of it, all of it, all of it, all of it, all of it.
2024-09-30 18:34:23,104 INFO -- server: [send_output] Got speech segment
2024-09-30 18:34:26,575 INFO -- server: [send_output] Got text segment: How
2024-09-30 18:34:26,576 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:26,739 INFO -- server: Connection terminated
2024-09-30 18:34:26,739 INFO -- server: Cancelling tasks
2024-09-30 18:34:26,740 INFO -- server: [send_output] Sender task cancelled
2024-09-30 18:34:28,468 INFO -- server: Connection made.
2024-09-30 18:34:28,469 INFO -- quic: [2618da09a7676177] Negotiated protocol version 0x00000001 (VERSION_1)
2024-09-30 18:34:28,857 INFO -- server: [send_output] Got text segment: are
2024-09-30 18:34:28,857 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:31,501 INFO -- server: [send_output] Got text segment: you
2024-09-30 18:34:31,501 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:33,224 INFO -- server: [send_output] Got text segment: ?
2024-09-30 18:34:33,224 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:35,365 INFO -- server: [send_output] Got text segment: How
2024-09-30 18:34:35,365 WARNING -- server: [send_output] Received non-speech segment.
2024-09-30 18:34:39,059 INFO -- server: [send_output] Got text segment: are you?
2024-09-30 18:34:39,059 INFO -- server: [send_output] Got speech segment
Perhaps someone can spot what I did wrong just by looking at my code (it is under 300 lines).
My guess is that it's something with the model config, resetting states / state configuration, or how I send the audio in. The HF example does a lot of preprocessing, which I am skipping for now.
It's also possible that it's unrelated to the model and is caused by how I process events asynchronously with QUIC (which might explain the repeats).
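For reference, here is a rough sketch of the kind of preprocessing I am skipping. It assumes the client sends little-endian 16-bit PCM and that the model wants 16 kHz mono float samples; I believe that is what the examples feed it, but I have not confirmed it for my setup. The function names `pcm16_to_mono_float` and `resample_linear` are made up for this sketch and do not come from any library.

```python
import array
import sys

TARGET_RATE = 16_000  # assumption: the sample rate the model expects


def pcm16_to_mono_float(pcm: bytes, channels: int) -> list[float]:
    """Decode little-endian 16-bit PCM and average channels into mono floats in [-1, 1)."""
    samples = array.array("h")
    samples.frombytes(pcm)  # raises ValueError if len(pcm) is not a multiple of 2
    if sys.byteorder == "big":
        samples.byteswap()  # the wire format is assumed little-endian
    frames = len(samples) // channels
    return [
        sum(samples[i * channels + c] for c in range(channels)) / (channels * 32768.0)
        for i in range(frames)
    ]


def resample_linear(samples: list[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> list[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate      # fractional index into the source
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out
```

This is only good enough to check whether a sample-rate or channel mismatch is behind the garbage output; a proper resampler would low-pass filter before downsampling.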
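In the log above, the first connection ends right after "How" and the second connection then receives "are", "you", "?", which makes me suspect that something (a queue or the model state) is shared across connections. A minimal sketch of what I think per-connection isolation would look like, using only `asyncio`; `ConnectionSession` and its attributes are hypothetical names, not part of aioquic or the model code, and the model state reset would belong in the same place but is not shown:

```python
import asyncio


class ConnectionSession:
    """Hypothetical per-connection container: its own queue and its own sender task."""

    def __init__(self, conn_id: str):
        self.conn_id = conn_id
        self.outgoing: asyncio.Queue = asyncio.Queue()  # never shared between connections
        self.sent: list = []
        self._sender = None

    def start(self) -> None:
        self._sender = asyncio.create_task(self._send_output())

    async def _send_output(self) -> None:
        while True:
            segment = await self.outgoing.get()
            self.sent.append(segment)  # stand-in for writing to the QUIC stream

    async def close(self) -> None:
        """Cancel the sender and drop anything still queued for this connection."""
        if self._sender is not None:
            self._sender.cancel()
            try:
                await self._sender
            except asyncio.CancelledError:
                pass
        while not self.outgoing.empty():
            self.outgoing.get_nowait()


async def demo():
    first = ConnectionSession("first")
    first.start()
    await first.outgoing.put("How")
    await first.outgoing.put("are")
    await first.close()  # client disconnects mid-sentence

    second = ConnectionSession("second")
    second.start()
    await second.outgoing.put("Hello")
    await asyncio.sleep(0.01)  # give the sender task time to run
    await second.close()
    return first, second
```

Running `asyncio.run(demo())` shows the second session only ever sees its own segments, because nothing queued for the first connection survives `close()`.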
Any help would be great, so close yet so far!
I have a sample repo here
https://github.com/StreamUI/QuicSeamlessStreaming/tree/master