Adding support for Google STT #49

Open
pietrop opened this issue Jun 19, 2020 · 2 comments
Labels
enhancement New feature or request help wanted Extra attention is needed New Feature STT

pietrop commented Jun 19, 2020

As requested by @will0225, it would be great to add support for Google STT.

Disclaimer

I personally don't have a need for Google STT at the moment (quite happy with the other STT options), but I'm happy to provide guidance if anyone else wants to have a go and make a PR, as it would open up options for adding more languages etc.

Previous attempt in autoEdit 2

In autoEdit V2 I made an attempt, see OpenNewsLabs/autoEdit_2#40 (comment) and PR OpenNewsLabs/autoEdit_2#97.

Possible Issues

But I ran into a few issues:

  • The Google Node SDK did not seem to be compatible with Electron (without re-building Electron?).
  • For longer recognition, e.g. up to 80 min, Google STT requires the files to be stored in Google Cloud Storage. This means you'd first have to upload the file there and then transcribe it, possibly requiring extra credentials and setup for the storage service. You'd also need to decide how to handle that logic, e.g. do you delete a file from Google Cloud Storage once it has been transcribed? Keeping it there might get expensive over time.
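
One possible answer to the storage question is to treat the bucket as a temporary staging area: upload, transcribe, then always delete. Below is a minimal sketch of that flow. The function name and the three injected callbacks (`upload`, `transcribe`, `remove`) are hypothetical stand-ins for real Cloud Storage and Speech-to-Text SDK calls, not part of autoEdit or of the Google SDKs.

```javascript
// Hypothetical helper: the three callbacks are assumptions standing in for
// real calls to the Cloud Storage and Speech-to-Text SDKs.
async function transcribeViaBucket(localPath, { upload, transcribe, remove }) {
  // 1. Upload the local media file; `upload` resolves to a gs:// URI.
  const gcsUri = await upload(localPath);
  try {
    // 2. Ask the STT service to transcribe the stored object.
    return await transcribe(gcsUri);
  } finally {
    // 3. Always delete the object afterwards, even if transcription failed,
    //    so storage costs do not accumulate.
    await remove(gcsUri);
  }
}
```

Injecting the callbacks keeps the clean-up policy separate from the SDK wiring, so the "delete or keep" decision can be changed in one place.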

Before getting started

There are a few things to figure out / investigate:

  1. Can you get the Google STT Node SDK to work inside Electron? To try this, you can either work in a fork of this repo, or build a simple demo project with Electron and the Google STT Node SDK to test it out.
  2. Decide how to handle the Google Cloud Storage logic (see the comment above).
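
For the first point, a quick smoke test in a demo Electron project could look something like the sketch below. `tryLoad` is a hypothetical helper; in the Electron main process the loader passed in would be something like `() => require('@google-cloud/speech')`. It is injected here so the check itself has no dependencies.

```javascript
// Runs a loader callback and reports whether the module loaded.
// The loader is injected; in Electron's main process it would wrap the
// require() call for the SDK under test.
function tryLoad(loader) {
  try {
    const mod = loader();
    // List the exported names so it is easy to see what was loaded.
    return { ok: true, exports: Object.keys(mod || {}) };
  } catch (err) {
    // A failure to load surfaces here instead of crashing the app.
    return { ok: false, error: err.message };
  }
}
```

Logging the result from the main process on startup would show whether the SDK loads without rebuilding anything.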

To give it a go

You can look at how it has been done for AssemblyAI.

dev setup

First things first, get the app running locally, see README#setup.

add it as an option in transcriber module

/src/ElectronWrapper/lib/transcriber

In the index.js file, find the switch statement at /src/ElectronWrapper/lib/transcriber/index.js#L54. Look at the case for AssemblyAI and create a similar one for GoogleSTT.
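
As a rough illustration of the shape of that change (not the actual contents of index.js), the dispatch could look like this. The provider strings and the `transcribers` object are assumptions made for the sketch; the real switch may use different identifiers and arguments.

```javascript
// Hypothetical provider names and transcriber functions; the real switch in
// index.js may use different identifiers.
async function transcribe(provider, filePath, transcribers) {
  switch (provider) {
    case 'AssemblyAI':
      return transcribers.assemblyai(filePath);
    case 'GoogleSTT':
      // New case, mirroring the AssemblyAI one.
      return transcribers.googleStt(filePath);
    default:
      throw new Error(`Unsupported STT provider: ${provider}`);
  }
}
```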

create a GCP STT module in the transcriber module

Back in the transcriber module, create a folder/module for Google Cloud STT, e.g. google-stt, similar to the AssemblyAI one.
This module will use the Google Cloud Node STT SDK to talk to Google, and will have a module to convert the result into the DPE format used by autoEdit. (You can use the gcp-to-dpe module to do the conversion.)
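
To give a feel for what the conversion step involves, here is a minimal sketch that flattens word timings from a GCP response. It is not the gcp-to-dpe module's API. It assumes the response has `results[].alternatives[].words[]` with `startTime`/`endTime` as `{ seconds, nanos }`, and that a DPE-style word is `{ id, start, end, text }`; paragraphs and speakers are left out.

```javascript
// Converts a GCP duration ({ seconds, nanos }) to seconds as a number.
// `seconds` can arrive as a string or number, so it is coerced.
function toSeconds(duration = {}) {
  return Number(duration.seconds || 0) + Number(duration.nanos || 0) / 1e9;
}

// Flattens results[].alternatives[0].words[] into a DPE-style word list.
// The output shape ({ words: [{ id, start, end, text }] }) is an assumption
// about the DPE format, and paragraphs are omitted for brevity.
function gcpWordsToDpe(response) {
  const words = [];
  for (const result of response.results || []) {
    const best = (result.alternatives || [])[0];
    if (!best) continue; // skip results with no alternatives
    for (const w of best.words || []) {
      words.push({
        id: words.length,
        start: toSeconds(w.startTime),
        end: toSeconds(w.endTime),
        text: w.word,
      });
    }
  }
  return { words };
}
```

Word timings are only present in the response if word time offsets were requested in the recognition config.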

Credentials

You'll notice that the AssemblyAI transcriber module requires another module to get the credentials: src/ElectronWrapper/lib/transcriber/assemblyai/index.js#L3.

This means that credentials module needs to be modified to support GCP STT as well: src/stt-settings/credentials.js#L88
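
One way to think about that change is as a per-provider list of required settings. The sketch below is illustrative only: the provider names, field names and `missingCredentials` helper are all hypothetical and do not reflect what credentials.js currently does.

```javascript
// Hypothetical settings shape; field names are assumptions, not the ones
// used in credentials.js.
const REQUIRED_FIELDS = {
  AssemblyAI: ['apiKey'],
  // GCP clients commonly authenticate with a service account JSON key file.
  GoogleSTT: ['keyFilePath', 'languageCode'],
};

// Returns the names of required fields that are missing or empty.
function missingCredentials(provider, settings = {}) {
  const required = REQUIRED_FIELDS[provider];
  if (!required) throw new Error(`Unknown STT provider: ${provider}`);
  return required.filter((field) => !settings[field]);
}
```

An empty array would mean the provider is fully configured; anything else could be surfaced in the settings window.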

Credentials UI

Now we need to change the UI of the settings window to add the option for GCP STT, and allow the user to both add credentials and choose the language. This corresponds to the initial setup page in the user manual.

Which means modifying these two react components

Apologies that the window view is all in one file and not modularized; that was for ease of development, as it would have been laborious to add another bundling step for the settings window, etc.

Anyway, I can provide more info on how to modify those components if needed, once/if you get to it. This seems plenty for now.


pietrop commented Feb 26, 2021

Some good news, this seems to be a new thing:

You can retrieve the results of the operation using the google.longrunning.Operations method. Results remain available for retrieval for 5 days (120 hours). Audio content can be sent directly to Speech-to-Text from a local file, or the API can process audio content stored in Google Cloud Storage. Audio files longer than 1 minute must be stored in a Cloud Storage bucket in order to be transcribed by Speech-to-Text. Performing asynchronous speech recognition on a local file longer than 1 minute will result in either an error or an incomplete transcription.

Transcribing long audio files using a local file

So in theory, with this in mind, it could be possible to integrate with the Electron app after all. Although we'd need to revisit how to get and add GCP STT credentials to run this locally.
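
Going by the quoted passage, the choice between sending a local file and pointing at a bucket depends on the audio's duration. A minimal sketch of that decision, building the `audio` part of a recognition request (inline `content` or a `uri`), might look like this. The function name, its parameters and the `MAX_INLINE_SECONDS` constant are hypothetical; the 60-second threshold is taken from the quote above.

```javascript
// Threshold from the documentation quoted above: audio longer than
// 1 minute has to be read from a Cloud Storage bucket.
const MAX_INLINE_SECONDS = 60;

// Builds the `audio` part of a recognition request: inline base64 content
// for short clips, a gs:// URI otherwise. Names are hypothetical.
function buildAudioSource({ durationSeconds, bytes, gcsUri }) {
  if (durationSeconds <= MAX_INLINE_SECONDS) {
    return { content: Buffer.from(bytes).toString('base64') };
  }
  if (!gcsUri) {
    throw new Error('Audio longer than 1 minute needs a Cloud Storage URI');
  }
  return { uri: gcsUri };
}
```

Short clips could then be transcribed without any bucket setup, while longer media would still need the upload step.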

@pietrop pietrop mentioned this issue Feb 26, 2021

pietrop commented Feb 27, 2021

@pietrop pietrop added the STT label Apr 15, 2021