A system for 2-way, interruptible voice interactions between a human and an LLM
This is an open-source example implementation of the Crosstalk method for voice interaction between a human and an AI. The method addresses the lack of 2-way interruptions in traditional turn-based AI voice assistants. Normally, a turn-based voice assistant runs speech recognition, LLM text completion, and speech synthesis in distinct stages: while the AI speaks, the user's speech is not being recognized; only once the AI finishes speaking is the user's speech recognized, a text response generated, and spoken back. This is not how humans interact with each other. Humans interrupt each other all the time, and in the absence of natural interruptions, both parties are unable to adequately model each other socially and spend a lot of time waiting for the other to finish speaking.
The Crosstalk method is a simple way to add 2-way interruptions to a turn-based AI voice assistant. It runs speech recognition, speech synthesis, and LLM text completion over a single stream: the AI's and the user's speech are recognized simultaneously, and diarization separates the two. While the AI is speaking, its words are added to the dialog until the user interrupts. On an interruption, diarization recognizes the change of speaker, the AI stops speaking, and text completion continues to run on the dialog until a change of speaker is predicted. If the predicted next speaker is the AI, the AI continues speaking; if it is the user, the AI stays silent and the user's speech is added to the dialog. This repeats until the user ends the conversation.
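A minimal sketch of that loop in JavaScript. Every helper here (onDiarizedSpeech, predictNextSpeaker, completeAiTurn, speak, stopSpeaking) is a hypothetical stand-in to show the control flow, not this repository's actual API:

```js
// Crosstalk control flow: one shared dialog, interruptible in both directions.
const dialog = []; // entries like { speaker: 'user' | 'ai', text: '...' }
let aiSpeaking = false;

// Diarized recognition runs continuously, even while the AI is speaking.
onDiarizedSpeech(async ({ speaker, text }) => {
  if (aiSpeaking && speaker === 'user') {
    stopSpeaking(); // the user spoke over the AI: stop synthesis immediately
    aiSpeaking = false;
  }
  dialog.push({ speaker, text });

  // Keep completing the dialog until a change of speaker is predicted.
  if ((await predictNextSpeaker(dialog)) === 'ai') {
    aiSpeaking = true;
    const reply = await completeAiTurn(dialog); // LLM text completion
    speak(reply, () => { aiSpeaking = false; });
  }
  // If the predicted next speaker is the user, stay silent and keep listening.
});
```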
Setup:
- Clone the repository
- Install the dependencies
npm install
- Create a config.js file in the src directory with the following content:
const config = {
  deepgram: {
    apiKey: "",
    apiUrl: "",
  },
  openai: {
    apiKey: "",
    dangerouslyAllowBrowser: true,
    baseUrl: "http://localhost:1234/v1", // or "https://api.openai.com/v1"
  },
};

export default config;
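For reference, here is a sketch of how a config like this might be consumed, assuming the openai (v4) and @deepgram/sdk (v3) npm packages; the actual wiring in this repo may differ:

```js
import OpenAI from 'openai';
import { createClient } from '@deepgram/sdk';
import config from './config';

// OpenAI-compatible client: works against api.openai.com or a local server.
const openai = new OpenAI({
  apiKey: config.openai.apiKey,
  baseURL: config.openai.baseUrl,
  dangerouslyAllowBrowser: config.openai.dangerouslyAllowBrowser,
});

// Deepgram client for streaming recognition and diarization.
const deepgram = createClient(config.deepgram.apiKey);
```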
- Start the development server
npm start
- Open http://localhost:3000 to view it in your browser.
Features:
- Speech recognition
- Speaker diarization
- Browser speech synthesis
- LLM text completion
- Automatically start speaking when the change of speaker is predicted to be the AI
- Automatically stop speaking when the user speaks over the AI (see the sketch after this list)
- Add the AI completion to the dialog if it is actually spoken
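The start/stop behavior maps directly onto the browser's SpeechSynthesis API. A minimal sketch (onUserSpeech is a hypothetical hook for diarized user speech, not this repo's actual API):

```js
// Speak an AI reply with the browser's built-in speech synthesis.
function speak(text, onDone) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onend = onDone;
  window.speechSynthesis.speak(utterance);
}

// Cut the AI off as soon as diarization attributes new speech to the user.
// `onUserSpeech` is a hypothetical hook, not this repository's actual API.
onUserSpeech(() => {
  if (window.speechSynthesis.speaking) {
    window.speechSynthesis.cancel(); // stops the current utterance mid-word
  }
});
```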
TODO:
- Add back interim transcripts to improve streaming capability; it looks like echo cancellation is already active for browser speech recognition
- Add AI completion to the dialog as it is generated, so that the user's interruption includes what the AI said
- Right now I'm only adding AI completions to the dialog once the AI is done speaking. Because the user's speech interrupts that, the user's words end up in the dialog before the AI's completion. Ideally I'd add the AI's speech as it is spoken.
- Additionally, interruptions are not currently tracking what is being said at the moment of the interruption. Neither charIndex from the error event nor the onBoundary event is actually being fired properly. What gives?? (See the sketch after this list for the intended approach.)
- Publish simple demo
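For that last item, the intended approach looks roughly like the following sketch of the standard SpeechSynthesisUtterance boundary event; as the note above says, browsers do not always fire it reliably:

```js
// Track how far into the AI's utterance synthesis has gotten, so the dialog
// can be truncated to what was actually said when the user interrupts.
let spokenCharIndex = 0;

function speakTracked(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (event) => {
    spokenCharIndex = event.charIndex; // start of the word being spoken
  };
  window.speechSynthesis.speak(utterance);
}

// On interruption, keep only the portion that was actually spoken.
function truncateAtInterruption(text) {
  window.speechSynthesis.cancel();
  return text.slice(0, spokenCharIndex);
}
```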
Plan:
- Functionality
  - 2-way interruptions
    - User interrupts AI
      - AI is speaking for a while; the user can speak over it
      - the AI will stop speaking
      - the AI text will show as truncated
    - AI interrupts user
      - User is speaking for a while; before the user is done, the prediction shows the user continuing to speak
      - when the user is about to finish, the prediction shows a change of speaker (see the sketch after this outline)
      - the AI will start speaking
      - the user text will show as truncated
- Aesthetics
  - transcript shows speakers and indicates what is being spoken
  - UI has a voice button so it's clear that it's an audio interface
  - completion is shown with low opacity
  - scroll to bottom of transcript as new text is added
  - stretch: highlight words as they are spoken
  - render transcript + diarization in a clear, pleasant way
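The change-of-speaker prediction driving the flows above can be phrased as a plain completion over the running transcript. A sketch, assuming an OpenAI-compatible client and a hypothetical "User:"/"AI:" transcript format (not necessarily how this repo formats its dialog):

```js
// Let the model continue the transcript and check which speaker tag it
// emits next. The transcript format and model choice are assumptions.
async function predictNextSpeaker(openai, dialog) {
  const transcript = dialog
    .map(({ speaker, text }) => `${speaker === 'ai' ? 'AI' : 'User'}: ${text}`)
    .join('\n');
  const res = await openai.completions.create({
    model: 'gpt-3.5-turbo-instruct',
    prompt: transcript + '\n',
    max_tokens: 4,
    temperature: 0,
  });
  return res.choices[0].text.trimStart().startsWith('AI') ? 'ai' : 'user';
}
```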
Sources:
- https://codersblock.com/blog/javascript-text-to-speech-and-its-many-quirks/
- https://www.smashingmagazine.com/2017/02/experimenting-with-speechsynthesis/
This project was bootstrapped with Create React App.
In the project directory, you can run:
npm start
Runs the app in development mode.
Open http://localhost:3000 to view it in your browser.
The page will reload when you make changes.
You may also see any lint errors in the console.
npm run build
Builds the app for production to the build folder.
It correctly bundles React in production mode and optimizes the build for the best performance.
The build is minified and the filenames include the hashes.
Your app is ready to be deployed!
See the section about deployment for more information.