Possible to use for real-time / streaming tasks? #2
-
Is it possible to use
Replies: 22 comments 63 replies
-
It doesn't support real-time per se, but you could build something similar by, e.g., incrementally transcribing the audio every second.
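Here is a minimal sketch of that idea in Python. It assumes 16 kHz mono float samples arrive from some microphone source. The `transcribe` hook is a hypothetical stand-in for any function that turns samples into text. The class keeps a rolling buffer, capped at Whisper's 30 s window, and re-runs the hook on the whole buffer about once per second.

```python
from typing import Callable, List, Optional

SAMPLE_RATE = 16_000      # Whisper models expect 16 kHz mono audio
MAX_WINDOW_SECONDS = 30   # one Whisper window covers at most 30 s


class IncrementalTranscriber:
    """Re-transcribe a growing audio buffer roughly once per step (sketch)."""

    def __init__(self, transcribe: Callable[[List[float]], str], step_seconds: float = 1.0):
        self.transcribe = transcribe  # hypothetical hook: samples -> text
        self.step = int(step_seconds * SAMPLE_RATE)
        self.buffer: List[float] = []
        self.pending = 0  # samples received since the last transcription

    def feed(self, samples: List[float]) -> Optional[str]:
        """Append new samples; return a fresh transcript once per step, else None."""
        self.buffer.extend(samples)
        self.pending += len(samples)
        # Keep only the most recent 30 s, since that is all one window can see.
        max_len = MAX_WINDOW_SECONDS * SAMPLE_RATE
        if len(self.buffer) > max_len:
            del self.buffer[: len(self.buffer) - max_len]
        if self.pending < self.step:
            return None
        self.pending = 0
        # Re-run on the whole buffer so earlier words can be corrected.
        return self.transcribe(self.buffer)
```

With openai-whisper installed, the hook might be something like `lambda s: model.transcribe(np.asarray(s, dtype=np.float32), fp16=False)["text"]` after `model = whisper.load_model("base")`. Because the buffer is simply capped at 30 s, text from older audio eventually drops out of the window. A fuller version would commit finished text and reset the buffer.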
-
Hello everybody, I came across this API with exactly the same thing in mind, and I created an organization for it: iInterpret. The vision is an app that can translate spoken language in real time, for phone calls or in-person conversations, maybe with a Bluetooth earpiece. This would be a blessing for some of my business in China, going from Mandarin to English seamlessly. If this interests you or anyone else reading this, please join the organization and feel free to reach out to me via [email protected]
-
Thank you, colleagues. Working at the International Telecommunication Union (United Nations Specialized Agency for ICT), we would really like this for our meetings. P.S. The organization's name is quite hard for me to read ('l' then 'i').
-
Here is another attempt at real-time streaming:
rt_esl_csgo_1.mp4
This is using
-
I've built Whisper Playground for developers to easily build real-time speech2text web apps.
Whisper.Playground.mp4
-
I installed Whisper with 'pip install git+https://github.com/openai/whisper.git'.
-
You can check out my project: https://github.com/appvoid/vosper
-
Please see my project below, which uses the Whisper Tiny TFLite model to implement audio streaming.
-
If you want to reduce transcription time when using Whisper for streaming, you can call the Whisper decoder directly to get only the transcript tokens and then decode them with the tokenizer. A streaming chunk's audio buffer is shorter than 30 seconds, and Whisper's transcribe applies temperature fallback, log-probability checks, and other thresholds to pick the best result. On such short chunks that takes more iterations, which means longer processing time.
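To illustrate why the full transcribe path can cost more, here is a simplified, stdlib-only model of the temperature-fallback loop that openai-whisper's transcribe() runs. Its defaults are temperatures 0.0 to 1.0 in steps of 0.2, compression_ratio_threshold=2.4 and logprob_threshold=-1.0; the no-speech check is omitted here. The `decode` callable and the `Hypothesis` type are hypothetical stand-ins for one decoder pass, not whisper's API.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Hypothesis:
    text: str
    avg_logprob: float


def compression_ratio(text: str) -> float:
    """High values indicate repetitive text, a common failure on short or padded audio."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], Hypothesis],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> Tuple[Hypothesis, int]:
    """Retry at rising temperatures until a hypothesis passes both checks."""
    if not temperatures:
        raise ValueError("need at least one temperature")
    calls = 0
    for temperature in temperatures:
        result = decode(temperature)
        calls += 1
        repetitive = compression_ratio(result.text) > compression_ratio_threshold
        unlikely = result.avg_logprob < logprob_threshold
        if not (repetitive or unlikely):
            break
    # If nothing passed, the last attempt is returned, as in the fallback loop.
    return result, calls


def decode_once(decode: Callable[[float], Hypothesis]) -> Tuple[Hypothesis, int]:
    """Single greedy pass: roughly what calling the decoder directly amounts to."""
    return decode(0.0), 1
```

With a stub decoder that always returns a low-confidence hypothesis, `decode_with_fallback` makes six decoder calls where `decode_once` makes one. In openai-whisper itself, the single-pass route corresponds roughly to the lower-level `whisper.decode(model, mel, whisper.DecodingOptions())` call shown in its README, which skips these retries.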
-
This is my take for React.js: the useWhisper React hook can now do real-time transcription. Repo: Demo: (Whisper can't seem to understand my accent 😅)
use-whisper-real-time-transcription.mp4
-
In essence, what we ultimately need is a real-time syllable recognition engine with mechanical-keyboard precision. For example, for Mandarin, I would get the syllables "ni3 hao3 ren2 men2 zong1 guo2 han4 zi4" and send them to ChatGPT-4. If we had an accurate real-time syllable recognition engine, LLMs would replace the entire speech recognition industry. Ref: "Transcribe to IPA" is very important for realtime interaction application #318 (comment)
-
Hi guys, I implemented real-time Whisper streaming for long audio in Python. I'm going to share it soon.
-
Has anyone seen or implemented a solution that can transcribe and translate from English into another language in real time or with a slight delay? Many of the projects here are great, but I'm not seeing English-to-other-language functionality anywhere.
-
Hi, I have made a small wrapper around the OpenAI Whisper API that adds a kind of "streaming" capability to it. It can be useful if you want to use the existing API instead of running your own Whisper instance. It splits the input audio into 30 s chunks and sends them one by one to the API, which gives a much faster initial response and a streaming experience for use cases where speed matters. It can be extended fairly easily to audio-streaming applications as well, though it will not be real-time (expect around 40 s of latency with this approach, or maybe less if you reduce the chunk size).
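The chunking step described above can be sketched with Python's standard `wave` module. This is not the poster's wrapper, just an illustration under the assumption that the input is an uncompressed WAV file. Each 30 s slice is re-wrapped as its own WAV file so it can be uploaded independently.

```python
import io
import wave
from typing import Iterator


def split_wav(wav_bytes: bytes, chunk_seconds: float = 30.0) -> Iterator[bytes]:
    """Yield consecutive chunks of a WAV file, each re-wrapped as a standalone WAV."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of audio
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # Copy the source format so every chunk decodes on its own.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            yield out.getvalue()
```

Each yielded blob could then be uploaded to the transcription endpoint in order, printing text as each response arrives. The upload call itself is left out here rather than guessing at a specific SDK signature. Smaller `chunk_seconds` values lower the latency but cut more words at chunk boundaries.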
-
I've created a streaming whisper_server that sends audio from your mic through Whisper and streams the results as Server-Sent Events or over gRPC.
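For anyone unfamiliar with the Server-Sent Events side, here is a small sketch of the text/event-stream wire format a server like this would emit. It is not taken from whisper_server. The event name "transcript" and the use of a running index as the event id are hypothetical choices.

```python
from typing import Iterable, Iterator, List, Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Event; each line of `data` gets its own 'data:' field."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def transcript_stream(segments: Iterable[str]) -> Iterator[str]:
    """Turn transcribed text segments into a text/event-stream body (hypothetical event name)."""
    for i, text in enumerate(segments):
        yield format_sse(text, event="transcript", event_id=str(i))
```

A server would write these strings to a response served with `Content-Type: text/event-stream`, and a browser could consume them with `EventSource`.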
-
Has anyone compared the accuracy of Whisper vs. wav2vec2 for live transcription? From my understanding, Whisper pads audio to 30 s, so 1-2 s chunks may not be suitable; maybe wav2vec2 offers better accuracy for short chunks.
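For a sense of scale: openai-whisper works on 30 s windows of 16 kHz audio (480,000 samples), and its `whisper.pad_or_trim` helper zero-pads shorter input up to that length. The pure-Python re-implementation below is an illustration, not whisper's own code. It shows that a 2 s chunk fills only 32,000 / 480,000 = 1/15, about 6.7%, of the window. That by itself says nothing about accuracy relative to wav2vec2.

```python
from typing import List

SAMPLE_RATE = 16_000
N_SAMPLES = 30 * SAMPLE_RATE  # 480,000 samples: one 30 s Whisper window


def pad_or_trim(samples: List[float], length: int = N_SAMPLES) -> List[float]:
    """Zero-pad or cut audio to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def real_audio_fraction(chunk_seconds: float) -> float:
    """Share of the padded 30 s window that holds actual audio."""
    real = min(int(chunk_seconds * SAMPLE_RATE), N_SAMPLES)
    return real / N_SAMPLES
```

For a 1 s chunk the fraction drops to 1/30, so almost the entire window the model sees is silence.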
-
If you need real-time Whisper transcription in the browser, check out my TypeScript package 📦
Install with: npm install whisper-live
More details here: https://github.com/Alireza29675/whisper-live
Happy to help if you have any questions!
-
I found https://arxiv.org/abs/2406.10052 to be a nice solution for streaming Whisper.
-
Hi! Is there any repo with real-time transcription using any model (English or non-English) that uses VAD to split the audio into chunks? I can't seem to find one with Python as the core language.
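As a starting point, here is a minimal VAD-based splitter in plain Python. To be clear, it uses a simple RMS-energy threshold, not a trained VAD such as webrtcvad or Silero VAD, which would be drop-in replacements for the `voiced` test. The frame size, threshold and hangover (how many silent frames end a segment) are assumed values to tune.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def vad_segments(
    samples: Sequence[float],
    sample_rate: int = 16_000,
    frame_ms: int = 30,
    threshold: float = 0.02,   # assumed energy threshold for float audio in [-1, 1]
    hangover_frames: int = 10,  # silent frames needed to close a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech segments found by an energy VAD."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame too short")
    segments: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced = rms(frame) >= threshold
        if voiced:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                # Segment ends after the last voiced frame.
                segments.append((start, (i - silent_run + 1) * frame_len))
                start, silent_run = None, 0
    if start is not None:  # speech still running at the end of the audio
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

Each `samples[start:end]` slice could then be passed to Whisper as one utterance, which avoids cutting words in half at fixed chunk boundaries.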
-
It doesn't. Use the server, and read the README.
Kishlay Kisu ***@***.***> wrote on Thu, 15 Aug 2024 at 18:28:
… this requires an input WAV file. I want to do it using my laptop's
microphone in real time.
-
You can try this with Transformers.js. It works in browsers that support WebGPU (e.g. Chrome): https://github.com/xenova/transformers.js/tree/v3/examples/webgpu-whisper
-
Hello, guys. Does anyone use Whisper in a project that transcribes small chunks of audio per turn? I was using the speech_recognition library to do something like this, but I need a Whisper model trained for this because it involves Portuguese medical jargon, and the default Whisper does not work well even with the large model.