turn old gateway smart assistant into extension add-on #1040

Open
kgiori opened this issue Sep 23, 2020 · 7 comments

@kgiori

kgiori commented Sep 23, 2020

In an earlier version of the gateway, there was an integrated voice assistant experiment that was taken out of the UI. I'd like to see it come back as an add-on. Also, does anyone know how to make a "talking home" animated GIF (to replace the fox)?
[Image: gw-smart-assist-dialog]

@mrstegeman
Contributor

Just FYI, @lissyx had initially done this with the voice-addon when he was rewriting it to use DeepSpeech. However, getting voice data from a (remote) browser across an IPC connection to the add-on is non-trivial. He worked around that by using a separate, hard-coded local WebSocket, which prevented it from working on outside networks.

@kgiori
Author

kgiori commented Sep 24, 2020

That's a step ahead of what I was imagining as a starting point. I was thinking of simply using the same smart assistant UI (except change the fox to a talking house), same intent-parser, and same STT back-end as before (processed in the Cloud, like Firefox Voice), except putting it all into an optional add-on.

For step 2, it would be great to enable the user to configure an option of processing the STT locally on the gateway, no cloud required. If both options are available, we'd need a way to be clear to users where the STT command is being processed. Perhaps the smart assistant page could be split in half. One side = cloud, the other side = local. And if that wouldn't be technically feasible, the user could simply pick one or the other (and ideally switch at any time).
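As a rough illustration of the "pick one or the other, and be clear about it" idea, here is a minimal sketch. The setting name `sttMode`, the type names, and the notice wording are all hypothetical; nothing here reflects an existing gateway or add-on API.

```typescript
// Hypothetical setting: where speech-to-text is processed.
type SttMode = "cloud" | "local";

interface AssistantSettings {
  sttMode: SttMode; // assumed setting name, for illustration only
}

// Text the assistant page could display so the user always knows
// where a spoken command is being processed.
function sttNotice(settings: AssistantSettings): string {
  return settings.sttMode === "cloud"
    ? "Speech is sent to a cloud service for recognition."
    : "Speech is recognised locally on the gateway.";
}

// Return a copy of the settings with the other mode selected,
// so the user can switch at any time.
function switchSttMode(settings: AssistantSettings): AssistantSettings {
  return {
    ...settings,
    sttMode: settings.sttMode === "cloud" ? "local" : "cloud",
  };
}
```

Keeping the notice text derived from the setting means the UI cannot show one mode while using the other.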

@madb1lly

Hi all,

Could we have the assistant running on the gateway with a web UI like before, but without the STT engine, so that it just accepts written text?

STT can be run on whatever device is being used to access the web UI. That way the STT runs locally on the client and the assistant runs on the gateway.

Everyone can use whichever STT engine they prefer, and it's not something the WebThings project needs to worry about; we only need to maintain the assistant.

Cheers 🙂

@kgiori
Author

kgiori commented Sep 28, 2020

Great idea. Would the assistant then have an option to configure which STT add-on (text input) ties into it? Or would it accept text from any of several STT engines? I could even imagine a configuration that lets the user select the old cloud-based STT engine of the original assistant, in case they don't have a local STT option.

@benfrancis
Member

benfrancis commented Sep 28, 2020

@madb1lly Yes, actually, we considered that architecture when we first built it, by using the Web Speech API in the browser. Unfortunately, the STT (speech recognition) part of the Web Speech API is still only supported in Chromium-based browsers as far as I know, despite years of work to try to get it turned on in Firefox by default with a choice of back ends.

I would personally support using that approach in a smart assistant extension, supporting only text-based commands on browsers which don't support speech input. That also shifts the hardware requirements of speech recognition away from the gateway to the client, which may have better-suited hardware. One side effect of also relying on the browser for the TTS (speech synthesis) part (which we never implemented) is that the assistant will have a different voice depending on which browser you're using, but that could be OK.

@kgiori Running the STT on the client side would mean relying on the browser for speech recognition. It might be feasible to send audio directly to a cloud service from the client as a fallback, rather than going via the gateway.
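The "speech input where the browser supports it, text commands elsewhere" approach comes down to feature detection. A minimal sketch follows, assuming the recognition constructor is exposed as either `SpeechRecognition` or the prefixed `webkitSpeechRecognition`; the helper names `findRecognizer` and `chooseInputMode` and the local `RecognizerLike` interface are made up for this example.

```typescript
// Minimal shape of a speech recognizer, declared locally so the
// sketch does not depend on DOM typings being available.
interface RecognizerLike {
  start(): void;
}
type RecognizerCtor = new () => RecognizerLike;

type InputMode = "speech" | "text";

// Look up the recognition constructor on a global-like object.
// Chromium-based browsers expose it under the `webkit` prefix.
function findRecognizer(scope: Record<string, unknown>): RecognizerCtor | null {
  const candidate =
    scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognizerCtor) : null;
}

// Offer speech input only when a recognizer exists; otherwise the
// assistant page falls back to the plain text field.
function chooseInputMode(scope: Record<string, unknown>): InputMode {
  return findRecognizer(scope) !== null ? "speech" : "text";
}
```

In a browser, the page would pass its global object, for example `chooseInputMode(window as unknown as Record<string, unknown>)`. Taking the scope as a parameter keeps the check testable outside a browser.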

@madb1lly

Hi @kgiori,

Well, the user would have the option of which STT engine to use, but the options wouldn't necessarily be presented by the Gateway, e.g. they can use voice typing on a smartphone, for example, or perhaps Firefox Voice on desktop. These options would be completely for the user to decide, independently of the gateway, and the gateway would not care.

The way I see it working is:

  • I load the Gateway UI
  • I go to the Assistant page
  • I select the text entry field
  • I use voice typing to enter the command I want to use, e.g. "Turn Heaters On"
  • The local voice typing app I have recognises my speech and converts it into text in the Assistant UI's webpage text entry field and virtually presses the return key
  • The Assistant interprets the text string and executes the command (if it understands)
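The last step above, where the Assistant interprets a plain text string regardless of how it was entered, might look something like the following sketch. The tiny "turn ... on/off" grammar and the `Command` shape are assumptions made for illustration; this is not the original intent parser.

```typescript
// Hypothetical result of interpreting a command.
interface Command {
  thing: string; // device name as entered, lower-cased
  on: boolean;   // desired state
}

// Accepts "turn <thing> on|off" or "turn on|off <thing>".
// Returns null when the text is not understood.
function parseCommand(text: string): Command | null {
  const words = text.trim().toLowerCase().split(/\s+/);
  if (words.length < 3 || words[0] !== "turn") {
    return null;
  }
  const rest = words.slice(1);
  const first = rest[0];
  const last = rest[rest.length - 1];
  if (last === "on" || last === "off") {
    // e.g. "Turn Heaters On"
    return { thing: rest.slice(0, -1).join(" "), on: last === "on" };
  }
  if (first === "on" || first === "off") {
    // e.g. "turn off the kitchen light"
    return { thing: rest.slice(1).join(" "), on: first === "on" };
  }
  return null;
}
```

Because the function only sees text, it behaves the same whether the string came from a keyboard, a phone's voice typing, or any other STT engine.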

As an option, the Gateway admin could opt to use a cloud-hosted STT engine, but the only way I can see this working easily is for the WebThings infrastructure to host that and make its optional use part of the add-on... which is how I think it worked before, isn't it? Configuring the gateway to use any arbitrary cloud-hosted STT engine would probably be too difficult to be worth it.

Cheers 🙂

PS - I see that @benfrancis gave a far more eloquent answer whilst I was typing! 😆

@kgiori
Author

kgiori commented Sep 28, 2020

Count me in. I'd love to have STT feed a smart assistant add-on (extension add-on with its own GUI page). And if I have to use Chrome, or manually enable the Web Speech API in Firefox, for STT processing, that's fine too.

With respect to my local gateway, I currently use the Voice-controller add-on, and I've tried Voco too. Would it be possible to connect the Voice-controller add-on or the Voco add-on to this smart assistant extension add-on? And if so, would it be as an additional STT input, or would it only be possible as a replacement for the STT processing done by the Web Speech API?
