diff --git a/README.md b/README.md
index 9bc02f1..46d62e4 100644
--- a/README.md
+++ b/README.md
@@ -45,12 +45,12 @@ Answers might be slow depending on your setup, you may want to try the third par
 
 - [Installation](#installation-)
 - [Usage](#usage)
-- [Voice](#voice)
 - [A few examples to get started 🐈‍⬛](#a-few-examples-to-get-started-)
 - [Integrating with editors](#integrating-with-editors)
 - [Example workflows](#example-workflows)
 - [Configuration](#configuration) ← please read this carefully
 - [Ollama setup](#ollama-setup) ← easiest way to get running for free
+- [Voice](#voice)
 - [How to help?](./CONTRIBUTING.md)
 
 ## Installation
@@ -112,37 +112,6 @@ You can use it to **accomplish tasks in the CLI** but **also in your editors** (
 
 The key to make this work seamlessly is a good default prompt that tells the model to behave like a CLI tool an not write any unwanted text like markdown formatting or explanations.
 
-# Voice
-
-⚠️ **Testing in progress** I only have a linux system and wasn't able to test the recording commands for other OS. The good news is you can make up your own that works and then plug it in the config.
-
-Use the `-v` flag to ask for voice input then press space to end it. It will replace the prompt customization arg.
-
-- uses openai whisper
-- make sure your `recording_command` field works in your termimal command, it should create a wav file
-- requires you to have an openai key in your `.api_keys.toml`
-- you can still use any prompt template or text model to get your output
-
-```
-sc -v
-
-sc test -v
-
-sc test -v -c src/**/*
-```
-
-## How does it work?
-
-`smartcat` call an external program that handles the voice recording and instructs it to save the result in a wav file. It then listens to keyboard inputs and stops the recording when space is pressed.
-
-The recording is then sent to a speech to text model, the resulting transcript is finally added to the prompt and sent to the text model to get an answer.
-
-On linux:
-On Mac:
-On windows:
-
-To debug, you can check the `conversation.toml` file or listen to the `audio.wav` in the smart config home and see what the model heard and transcripted.
-
 ## A few examples to get started 🐈‍⬛
 
 ```
@@ -273,6 +242,7 @@ Three files are used:
 [ollama] # local API, no key required
 url = "http://localhost:11434/api/chat"
 default_model = "phi3"
+timeout_seconds = 30
 
 [openai] # each supported api has their own config section with api and url
 api_key = ""
@@ -364,7 +334,39 @@ see [the config setup file](./src/config/mod.rs) for more details.
 
 4. Make sure the serving is available `curl http://localhost:11434` which should say "Ollama is running", else you might need to run `ollama serve`
 5. `smartcat` will now be able to reach your local ollama, enjoy!
 
-⚠️ Answers might be slow depending on your setup, you may want to try the third party APIs for an optimal workflow.
+⚠️ Answers might be slow depending on your setup; you may want to try the third party APIs for an optimal workflow. The timeout is configurable and set to 30s by default.
+
+# Voice
+
+⚠️ **Testing in progress**: I only have a Linux system and wasn't able to test the recording commands on other OSes. The good news is that you can make up your own command that works and then plug it into the config.
+
+Use the `-v` flag to ask for voice input, then press space to end it. It replaces the prompt customization arg.
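+
+Recording itself is delegated to whatever command you put in the `recording_command` config field. As a rough, untested sketch (only the field name comes from this README; the commands below are just plausible examples, anything that writes a wav file will do):
+
+```toml
+# hypothetical config excerpt, adapt the command to your system
+recording_command = "arecord -f cd audio.wav"  # Linux/ALSA: records until smartcat ends it when you press space
+# recording_command = "sox -d audio.wav"       # SoX alternative (e.g. on macOS), if installed
+```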
+
+- uses OpenAI Whisper
+- make sure the command in your `recording_command` field works in your terminal, it should create a wav file
+- requires you to have an OpenAI key in your `.api_keys.toml`
+- you can still use any prompt template or text model to get your output
+
+```
+sc -v
+
+sc test -v
+
+sc test -v -c src/**/*
+```
+
+## How does it work?
+
+`smartcat` calls an external program that handles the voice recording and instructs it to save the result in a wav file. It then listens to keyboard inputs and stops the recording when space is pressed.
+
+The recording is then sent to a speech-to-text model; the resulting transcript is added to the prompt and sent to the text model to get an answer.
+
+On Linux:
+On macOS:
+On Windows:
+
+To debug, you can check the `conversation.toml` file or listen to the `audio.wav` in the smartcat config home to see what the model heard and transcribed.
+
 
 ## How to help?