
- [Installation](#installation-)
- [Usage](#usage)
- [A few examples to get started 🐈‍⬛](#a-few-examples-to-get-started-)
- [Integrating with editors](#integrating-with-editors)
- [Example workflows](#example-workflows)
- [Configuration](#configuration) ← please read this carefully
- [Ollama setup](#ollama-setup) ← easiest way to get running for free
- [Voice](#voice)
- [How to help?](./CONTRIBUTING.md)

## Installation
You can use it to **accomplish tasks in the CLI** but **also in your editors**.

The key to making this work seamlessly is a good default prompt that tells the model to behave like a CLI tool and not write any unwanted text like markdown formatting or explanations.
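
For instance, a pipe-through edit should produce only the transformed text, ready to redirect (this prompt and file name are illustrative, and assume free-form prompt text is accepted as the positional argument):

```
cat main.py | sc "add a docstring to every function" > main_documented.py
```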

## A few examples to get started 🐈‍⬛
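
A rough sketch of the kinds of invocations to try (the prompts and file names here are illustrative, not from the docs):

```
sc "shell command to list files by size"
cat Cargo.toml | sc "which dependencies does this project pull in?"
sc "write a test for this function" -c src/main.rs
```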

## Configuration

Three files are used:

```
[ollama] # local API, no key required
url = "http://localhost:11434/api/chat"
default_model = "phi3"
timeout_seconds = 30

[openai] # each supported API has its own config section with an api_key and url
api_key = "<your_api_key>"
```

See [the config setup file](./src/config/mod.rs) for more details.

### Ollama setup
4. Make sure the server is reachable: `curl http://localhost:11434` should reply "Ollama is running"; otherwise run `ollama serve`.
5. `smartcat` will now be able to reach your local Ollama, enjoy!
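
In other words, the check from step 4 looks like this:

```
curl http://localhost:11434
# → "Ollama is running"; if not, start the server:
ollama serve
```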

⚠️ Answers might be slow depending on your setup; you may want to try the third-party APIs for an optimal workflow. The request timeout is configurable and set to 30 seconds by default.

## Voice

⚠️ **Testing in progress.** I only have a Linux system and wasn't able to test the recording commands on other OSes. The good news is you can make up your own command that works and plug it into the config.

Use the `-v` flag to ask for voice input, then press space to end the recording. It replaces the prompt customization argument.

- uses OpenAI Whisper
- make sure the `recording_command` field works in your terminal; it should create a wav file
- requires an OpenAI API key in your `.api_keys.toml` (see the sketch after the examples below)
- you can still use any prompt template or text model to get your output

```
sc -v
sc test -v
sc test -v -c src/**/*
```
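
For reference, the `.api_keys.toml` entry might look like this (the exact file layout is an assumption, based on the `[openai]` section of the main config):

```
[openai]
api_key = "<your_openai_api_key>"
```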

### How does it work?

`smartcat` calls an external program that handles the voice recording and instructs it to save the result in a wav file. It then listens to keyboard inputs and stops the recording when space is pressed.

The recording is then sent to a speech-to-text model; the resulting transcript is added to the prompt and sent to the text model to get an answer.

Default recording commands for Linux, macOS, and Windows are still being tested (see the warning above); in the meantime, set a `recording_command` that works on your system.
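
As an illustration, on Linux such a command could be built on `arecord` from alsa-utils (an assumption, not a shipped default):

```
# records 16 kHz, 16-bit mono audio until smartcat stops the process
arecord --format=S16_LE --rate=16000 --quiet audio.wav
```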

To debug, you can check the `conversation.toml` file or listen to `audio.wav` in the smartcat config home to see what the model heard and transcribed.
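
For example, assuming the config home is `~/.config/smartcat` (the path is an assumption; adjust to your setup):

```
aplay ~/.config/smartcat/audio.wav        # replay the recording
cat ~/.config/smartcat/conversation.toml  # inspect what was sent
```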


## How to help?
