Pronouncer

A Rust-based text-to-speech synthesizer that uses the CMU phonetic dictionary and pre-recorded phonemes to generate funny-sounding speech in my voice (or in your own: replace the samples in pronouncer_lib/audio and rebuild to compile them into the program).

Features

Text-to-speech synthesis using CMU phonetic dictionary
Pre-recorded phoneme samples, one WAV file per phoneme
Crossfading between adjacent phonemes for smoother transitions
Outputs standard WAV audio files (44.1kHz, 16-bit, mono)
Audio data compiled into the binary, so the executable runs standalone

Installation

Ensure you have Rust installed (https://rustup.rs/)
Clone this repository
Build the project:

cargo build --release

Usage

Run the program with words as arguments:

cargo run --release -- "hello world"

Or run it without arguments and type the text at the prompt:

cargo run --release
Enter a string: hello world

The program will generate an output.wav file containing the synthesized speech.
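
The project writes output.wav with the hound crate. For reference, the sketch below writes the same format (44.1kHz, 16-bit, mono PCM) by hand using only the standard library. It assumes the canonical 44-byte RIFF/WAVE header. write_wav is a hypothetical name, not the project's API.

```rust
use std::io::{self, Write};

// Output format listed in this README.
const SAMPLE_RATE: u32 = 44_100;
const CHANNELS: u16 = 1;
const BITS_PER_SAMPLE: u16 = 16;

/// Writes 16-bit mono PCM samples with a canonical 44-byte WAV header.
/// Hypothetical std-only stand-in for what hound does in the real project.
fn write_wav<W: Write>(out: &mut W, samples: &[i16]) -> io::Result<()> {
    let block_align = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per sample frame
    let byte_rate = SAMPLE_RATE * block_align as u32;
    let data_len = (samples.len() * 2) as u32;

    // RIFF header: the size field counts everything after its own 8 bytes.
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?;
    out.write_all(b"WAVE")?;

    // "fmt " chunk for uncompressed PCM (format tag 1).
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?;
    out.write_all(&1u16.to_le_bytes())?;
    out.write_all(&CHANNELS.to_le_bytes())?;
    out.write_all(&SAMPLE_RATE.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&BITS_PER_SAMPLE.to_le_bytes())?;

    // "data" chunk: little-endian signed 16-bit samples.
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        out.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

To produce a file, pass a std::io::BufWriter wrapping std::fs::File::create("output.wav").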

Project Structure

The project is organized as a Rust workspace containing two main crates:

pronouncer_lib

Core library containing the text-to-speech engine:

src/lib.rs - Main library interface and audio processing
src/phoneme.rs - Phoneme enum and conversion functions
build.rs - Build script for processing dictionary and audio files
audio/ - Pre-recorded WAV files for each phoneme
build/ - Build-time resources including CMU dictionary
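
src/phoneme.rs holds the Phoneme enum and its conversion functions. The sketch below shows one way to write it. It assumes the standard 39-symbol ARPAbet set used by the CMU dictionary, and a FromStr conversion that drops the stress digit that dictionary entries attach to vowels. The macro and the ALL and as_str names are hypothetical, not taken from the project.

```rust
use std::str::FromStr;

// Hypothetical helper: declares the enum, a list of every variant and the
// variant-to-string mapping in one place so they cannot drift apart.
macro_rules! phonemes {
    ($($name:ident),* $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        pub enum Phoneme { $($name),* }

        impl Phoneme {
            pub const ALL: &'static [Phoneme] = &[$(Phoneme::$name),*];

            pub fn as_str(self) -> &'static str {
                match self { $(Phoneme::$name => stringify!($name)),* }
            }
        }
    };
}

// The 39 ARPAbet phonemes of the CMU dictionary (15 vowels, 24 consonants).
phonemes!(
    AA, AE, AH, AO, AW, AY, B, CH, D, DH, EH, ER, EY, F, G, HH, IH, IY, JH, K,
    L, M, N, NG, OW, OY, P, R, S, SH, T, TH, UH, UW, V, W, Y, Z, ZH,
);

impl FromStr for Phoneme {
    type Err = String;

    /// Parses a CMU dictionary symbol, ignoring the trailing stress digit
    /// (for example "AH0" -> AH, "OW1" -> OW).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let base = s.trim_end_matches(|c: char| c.is_ascii_digit());
        Phoneme::ALL
            .iter()
            .copied()
            .find(|p| p.as_str() == base)
            .ok_or_else(|| format!("unknown phoneme: {s}"))
    }
}
```

Because as_str returns the bare symbol, it could also double as the file stem for each recording in audio/. That naming scheme is an assumption and is not documented here.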

pronouncer_bin

Command-line interface executable:

src/main.rs - CLI implementation: argument parsing and file I/O
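
The Usage section describes two input modes: words passed as arguments, or a line typed after an "Enter a string: " prompt. The sketch below shows one way to cover both. read_input is a hypothetical name. It takes generic reader and writer parameters so the same code can run against stdin/stdout or against in-memory buffers.

```rust
use std::io::{self, BufRead, Write};

/// Returns the text to synthesize. If arguments were given, they are joined
/// with spaces. Otherwise the function prompts and reads one line of input.
fn read_input<R: BufRead, W: Write>(
    args: &[String],
    input: &mut R,
    prompt_out: &mut W,
) -> io::Result<String> {
    if !args.is_empty() {
        return Ok(args.join(" "));
    }
    write!(prompt_out, "Enter a string: ")?;
    prompt_out.flush()?; // show the prompt before blocking on input
    let mut line = String::new();
    input.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

In main, this could be called as read_input(&std::env::args().skip(1).collect::<Vec<_>>(), &mut io::stdin().lock(), &mut io::stdout()).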

Key Components

Build System
- Processes CMU dictionary at compile time
- Serializes phoneme WAV files into binary data
- Generates lookup tables that are embedded into the binary
Phoneme System
- 39 distinct phonemes based on CMU dictionary
- Each phoneme has a corresponding WAV recording
- Phonemes are represented as a Rust enum
Audio Processing
- 44.1kHz 16-bit mono WAV output
- Crossfading algorithm for smooth transitions
- Fileless audio storage - phoneme WAV data is serialized and embedded directly into the binary
Dictionary System
- CMU dictionary-based word to phoneme conversion
- Fallback to character-by-character pronunciation
- HashMap-based lookups (via hashbrown)
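
The dictionary system described above can be sketched as follows. The parser reads the plain-text CMU format ("WORD  PH1 PH2 ..."), skips ";;;" comment lines, and keeps only the first pronunciation of words that have alternates such as "READ(1)". The README does not say how the character-by-character fallback works. This sketch assumes it spells the word out using each letter's own dictionary entry. Both function names are hypothetical. std's HashMap stands in for hashbrown, which backs std's implementation.

```rust
use std::collections::HashMap;

/// Parses CMU dictionary text into a word -> phoneme-symbols map.
fn parse_cmudict(text: &str) -> HashMap<String, Vec<String>> {
    let mut dict = HashMap::new();
    for line in text.lines() {
        if line.starts_with(";;;") || line.trim().is_empty() {
            continue; // comment or blank line
        }
        let mut parts = line.split_whitespace();
        let Some(word) = parts.next() else { continue };
        if word.ends_with(')') {
            continue; // alternate pronunciation, e.g. "READ(1)"
        }
        let phones: Vec<String> = parts.map(str::to_string).collect();
        dict.entry(word.to_string()).or_insert(phones);
    }
    dict
}

/// Normalizes each word (letters and apostrophes only, uppercased) and maps
/// it to phonemes. Unknown words are spelled out letter by letter
/// (assumed fallback strategy).
fn text_to_phonemes(text: &str, dict: &HashMap<String, Vec<String>>) -> Vec<String> {
    let mut out = Vec::new();
    for raw in text.split_whitespace() {
        let word: String = raw
            .chars()
            .filter(|c| c.is_ascii_alphabetic() || *c == '\'')
            .map(|c| c.to_ascii_uppercase())
            .collect();
        if word.is_empty() {
            continue;
        }
        if let Some(phones) = dict.get(&word) {
            out.extend(phones.iter().cloned());
        } else {
            for letter in word.chars().filter(char::is_ascii_alphabetic) {
                if let Some(phones) = dict.get(&letter.to_string()) {
                    out.extend(phones.iter().cloned());
                }
            }
        }
    }
    out
}
```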

Technical Details

Build Process

The build script (build.rs) processes the CMU dictionary and WAV files
Dictionary is converted to a binary lookup table using bincode serialization
WAV files are serialized and embedded directly into the binary
Static initialization (lazy_static) makes the embedded data available at runtime without reading any files
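
The idea behind the build step is to turn the dictionary into bytes at build time and decode those bytes at runtime. The round trip below is a self-contained sketch of that idea. It uses a hand-rolled, length-prefixed encoding in place of bincode, so it does not reproduce bincode's actual wire format. encode and decode are hypothetical names.

```rust
use std::collections::HashMap;

// Illustrative length-prefixed format (not bincode's): every string is a
// little-endian u32 byte length followed by its UTF-8 bytes.
fn put_str(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// Build-time side: entry count, then per entry the word, the phoneme count
/// and the phonemes.
fn encode(dict: &HashMap<String, Vec<String>>) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(dict.len() as u32).to_le_bytes());
    for (word, phones) in dict {
        put_str(&mut buf, word);
        buf.extend_from_slice(&(phones.len() as u32).to_le_bytes());
        for p in phones {
            put_str(&mut buf, p);
        }
    }
    buf
}

/// Reads a little-endian u32 and advances the cursor; None if truncated.
fn get_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let chunk = bytes.get(*pos..*pos + 4)?;
    *pos += 4;
    Some(u32::from_le_bytes(chunk.try_into().ok()?))
}

/// Reads one length-prefixed UTF-8 string; None if truncated or invalid.
fn get_str(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = get_u32(bytes, pos)? as usize;
    let chunk = bytes.get(*pos..*pos + len)?;
    *pos += len;
    String::from_utf8(chunk.to_vec()).ok()
}

/// Runtime side: the inverse of `encode`, applied to the embedded bytes.
fn decode(bytes: &[u8]) -> Option<HashMap<String, Vec<String>>> {
    let mut pos = 0;
    let count = get_u32(bytes, &mut pos)?;
    let mut dict = HashMap::new();
    for _ in 0..count {
        let word = get_str(bytes, &mut pos)?;
        let n = get_u32(bytes, &mut pos)?;
        let phones = (0..n)
            .map(|_| get_str(bytes, &mut pos))
            .collect::<Option<Vec<_>>>()?;
        dict.insert(word, phones);
    }
    Some(dict)
}
```

In a real build script the encoded bytes would be written under OUT_DIR and pulled into the binary with include_bytes!. The project itself delegates the encoding to bincode.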

Audio Synthesis Process

Input text is normalized and split into words
Words are looked up in the CMU dictionary
Unknown words fall back to character-by-character pronunciation
Phoneme sequences are converted to audio samples
Adjacent phonemes are crossfaded to smooth the transitions
Final audio is written to WAV file
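
The README does not specify the crossfade curve or overlap length. The sketch below assumes a linear ramp over a fixed number of overlapping samples: the tail of the audio so far fades out while the head of the next phoneme fades in. crossfade_concat is a hypothetical name. An equal-power (sine/cosine) curve is a common alternative when linear fades sound like a dip in loudness.

```rust
/// Concatenates phoneme clips, overlapping each adjacent pair by up to
/// `overlap` samples and blending the overlap with a linear crossfade.
fn crossfade_concat(clips: &[Vec<i16>], overlap: usize) -> Vec<i16> {
    let mut out: Vec<i16> = Vec::new();
    for clip in clips {
        // Never overlap more samples than either side actually has.
        let n = overlap.min(out.len()).min(clip.len());
        let start = out.len() - n;
        for i in 0..n {
            // Weight of the incoming clip rises from 0 toward 1 across the overlap.
            let t = (i + 1) as f32 / (n + 1) as f32;
            let mixed = out[start + i] as f32 * (1.0 - t) + clip[i] as f32 * t;
            out[start + i] = mixed.round() as i16;
        }
        // The rest of the clip is appended unchanged.
        out.extend_from_slice(&clip[n..]);
    }
    out
}
```

Each overlap shortens the result by n samples. For example, two 4-sample clips with overlap 2 produce 6 samples.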

Performance Considerations

Audio data is compiled directly into the binary, eliminating runtime file I/O
bincode serialization keeps the embedded data compact
Dictionary lookups use hashbrown hashmaps
Optimized crossfading algorithm for smooth transitions

Dependencies

Core dependencies:

bincode: Fast serialization
hashbrown: High-performance hashmaps
hound: WAV file handling
lazy_static: Efficient static initialization
serde: Serialization framework

License

MIT
