Feature request: use wit.ai speech-to-text and DeepL/OpenAI to translate it #11

Open
Zhen-Bo opened this issue Sep 11, 2023 · 2 comments


Zhen-Bo commented Sep 11, 2023

Feature Request

Description of the feature you'd like:

I would like to use my own wit.ai and DeepL API keys for real-time speech-to-text and translation.

Feature Background:

After using it for a while, I found that translations are often delayed when using the medium model (interval=3~5).
The output is also frequently blank.

I don't know whether the failed translations are caused by slow speech recognition or by incorrect language identification.

Also, English is not my native language, so after receiving the English output I need extra time to convert it into my native language. I would therefore like more target languages to be supported.

Proposed Solution

  • Speech-to-text: use wit.ai to convert audio into text (see the wit.ai docs)

    • Free to use
    • Each API token is bound to a single language chosen by the user, which avoids incorrect language identification.
    • Recognition is very fast and accurate.
      (I use it to solve Google reCAPTCHA voice verification, where it is very fast and accurate.)
  • Translate: use DeepL or ChatGPT to translate into the user's target language

    • The DeepL free API and GPT-3.5 Turbo are free to use
    • The user can set the target language (for me: KO (text from wit.ai) -> ZH)
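
The two steps proposed above could be sketched roughly as follows. This is only an illustration, not code from this repo: the helper names are made up, and the endpoint URLs, header formats, and the DeepL response shape reflect my understanding of the public wit.ai and DeepL docs and should be checked against them. The requests are only built here, not sent, and parsing the wit.ai response is left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Assumed endpoints; verify against the current wit.ai and DeepL docs.
WIT_SPEECH_URL = "https://api.wit.ai/speech"
DEEPL_FREE_URL = "https://api-free.deepl.com/v2/translate"


def build_wit_request(wav_bytes: bytes, wit_token: str) -> urllib.request.Request:
    """Build (but do not send) a speech-to-text request for one WAV chunk."""
    return urllib.request.Request(
        WIT_SPEECH_URL,
        data=wav_bytes,
        headers={
            "Authorization": f"Bearer {wit_token}",  # user's own wit.ai token
            "Content-Type": "audio/wav",
        },
        method="POST",
    )


def build_deepl_request(text: str, target_lang: str, deepl_key: str) -> urllib.request.Request:
    """Build (but do not send) a translation request, e.g. target_lang="ZH"."""
    body = urllib.parse.urlencode({"text": text, "target_lang": target_lang}).encode()
    return urllib.request.Request(
        DEEPL_FREE_URL,
        data=body,
        headers={
            "Authorization": f"DeepL-Auth-Key {deepl_key}",  # user's own DeepL key
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def parse_deepl_response(raw: bytes) -> str:
    """Extract the first translated string from a DeepL JSON response body."""
    return json.loads(raw)["translations"][0]["text"]


def translate_chunk(
    wav_bytes: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> str:
    """Run speech-to-text, then translation; skip translation for blank text."""
    text = transcribe(wav_bytes)
    return translate(text) if text.strip() else ""
```

Passing `transcribe` and `translate` in as callables keeps the pipeline independent of any one provider, so ChatGPT could stand in for DeepL without touching `translate_chunk`.
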
@fortypercnt
Owner

Sorry for the delayed response. You should be able to fix the incorrect language identification by setting the --language flag to the language spoken in the stream. The model only tries to identify the language if you leave the flag at its default ("auto").
The point of the repo is that you can run OpenAI's Whisper model locally, so I don't want to replace it with wit.ai.
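
For illustration, the --language behaviour described above could be wired up roughly like this. It is a minimal sketch, not the repo's actual code: `build_parser` and `resolve_language` are hypothetical names, and the only thing assumed about Whisper is that passing `language=None` to `transcribe` lets the model detect the language itself.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI exposing only the --language flag."""
    parser = argparse.ArgumentParser(description="Illustrative stream translator CLI")
    parser.add_argument(
        "--language",
        default="auto",
        help='Language spoken in the stream, or "auto" to let the model detect it.',
    )
    return parser


def resolve_language(flag_value: str) -> Optional[str]:
    """Map "auto" to None (auto-detect); otherwise return the normalized language."""
    value = flag_value.strip().lower()
    return None if value == "auto" else value


# Assumed usage with openai-whisper (not executed here):
#   args = build_parser().parse_args()
#   model.transcribe(audio, language=resolve_language(args.language))
```
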

Regarding adding an additional API call for translation into non-English languages: I like the idea, and maybe I will add it when I get some free time. Note that OpenAI's APIs are not free to use; only the web version of GPT-3.5 Turbo is free.

Author

Zhen-Bo commented Nov 13, 2023

I have used the --language flag to specify the language, but there are still cases where the speech is not recognized correctly.
As for using an additional API for translation, I suggest letting users supply their own API key (if they are using OpenAI's or DeepL's API).
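
One possible way to let users supply their own key, sketched under assumptions: the function name and the `DEEPL_API_KEY` variable below are hypothetical, not existing options of this repo. The key is taken from a command-line value if given, otherwise from an environment variable.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    cli_value: Optional[str],
    env_var: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the user's key from the CLI value, else from env_var, else None."""
    key = (cli_value or env.get(env_var, "")).strip()
    return key or None


# Hypothetical usage: translation stays disabled when no key is configured.
#   deepl_key = resolve_api_key(args.deepl_key, "DEEPL_API_KEY")
#   if deepl_key is None: fall back to Whisper's built-in English translation
```
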
