Releases · janhq/ichigo
Ichigo Whisper v0.1
Ichigo Whisper v0.1 Release Notes:
- Introducing Ichigo Whisper v0.1
We are thrilled to announce our very first speech tokenizer, built upon the Whisper-medium model!
Ichigo Whisper is a lightweight (22M parameters), open-source speech tokenizer designed to optimize multilingual performance while maintaining strong English capabilities. Unlike continuous embedding models, Ichigo Whisper compresses speech into discrete tokens, enabling seamless integration with large language models (LLMs) for advanced speech understanding.
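To make "discrete tokens" concrete, the sketch below shows the core vector-quantization step: each continuous encoder frame is replaced by the index of its nearest codebook vector. This is a minimal illustration, not the Ichigo Whisper API; the `quantize` helper, the tensor shapes, and the randomly initialized codebook are assumptions, with the codebook size taken from the benchmark tables below.

```python
import torch

# Minimal vector-quantization sketch; illustrative only, not the Ichigo Whisper API.
CODEBOOK_SIZE = 2561   # codebook size reported in the benchmark tables below
EMBED_DIM = 1024       # Whisper-medium encoder hidden size

# A (here randomly initialized) codebook: one row per discrete speech token.
codebook = torch.randn(CODEBOOK_SIZE, EMBED_DIM)

def quantize(encoder_frames: torch.Tensor) -> torch.Tensor:
    """Map continuous encoder frames (T, D) to discrete token ids (T,)
    via nearest-neighbour lookup in the codebook."""
    distances = torch.cdist(encoder_frames, codebook)  # (T, CODEBOOK_SIZE)
    return distances.argmin(dim=-1)

# Example: 150 encoder frames become 150 sound-token ids an LLM can consume.
frames = torch.randn(150, EMBED_DIM)
sound_tokens = quantize(frames)
print(sound_tokens.shape, int(sound_tokens.min()), int(sound_tokens.max()))
```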
Performance Highlights:
1. Vietnamese
| Model | Codebook Size | Test Dataset | Test Samples | WER |
|---|---|---|---|---|
| Ichigo Whisper | 2561 | viVoice | 1000 | 11.36 |
| Whisper Medium | - | viVoice | 1000 | 18.64 |
2. English
| Model | Codebook Size | Test Dataset | Test Samples | WER |
|---|---|---|---|---|
| Ichigo Whisper | 2561 | LibriTTS-R | 1000 | 12.96 |
| Whisper Medium | - | LibriTTS-R | 1000 | 12.99 |
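For reference, the WER figures above are standard word error rates. Below is a minimal sketch of how such a score can be computed with the `jiwer` package; this only illustrates the metric and is not the evaluation script used for these tables.

```python
import jiwer

# Reference transcripts vs. model transcriptions (toy examples).
references = [
    "the quick brown fox jumps over the lazy dog",
    "speech tokenizers compress audio into discrete tokens",
]
hypotheses = [
    "the quick brown fox jumped over the lazy dog",
    "speech tokenizers compress audio into discrete tokens",
]

# WER = (substitutions + deletions + insertions) / reference word count.
error_rate = jiwer.wer(references, hypotheses)
print(f"WER: {error_rate:.2%}")
```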
Resources:
- Model Weights: Hugging Face: Ichigo Whisper v0.1
- Live Demo: Ichigo Whisper Online
v0.4 Ichigo!
Change log for Ichigo v0.4:
- Unified Training Pipeline: Consolidated Phase 2 and Phase 3 into a single-phase training approach.
- Training data enhancements:
  - Migrated speech noise data and speech multi-turn data from Phase 3 into Phase 2.
  - Introduced noise-augmented multi-turn conversations, synthesized by injecting noise turns into speech and text-only multi-turn datasets (see the sketch after this list).
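As a rough illustration of the noise-augmented synthesis above, the sketch below splices a distractor "noise" turn (random sound tokens) plus a short rejection reply into an existing conversation. The `<|sound_N|>` token format, the helper names, and the reply texts are assumptions for illustration, not the actual data pipeline.

```python
import random

# Illustrative sketch of noise-turn injection; not the actual Ichigo pipeline.
NOISE_REPLIES = [
    "I couldn't make that out, could you repeat it?",
    "That sounded like background noise rather than a question.",
]

def random_sound_tokens(length: int, codebook_size: int = 2561) -> str:
    """Fabricate a 'noise' user turn as a string of random sound-token ids."""
    return " ".join(f"<|sound_{random.randrange(codebook_size)}|>" for _ in range(length))

def inject_noise_turn(conversation: list[dict]) -> list[dict]:
    """Insert a noisy user turn plus a rejection reply at a random position
    between existing (user, assistant) pairs."""
    insert_at = random.randrange(0, len(conversation) // 2 + 1) * 2
    noise_pair = [
        {"role": "user", "content": random_sound_tokens(length=random.randint(20, 120))},
        {"role": "assistant", "content": random.choice(NOISE_REPLIES)},
    ]
    return conversation[:insert_at] + noise_pair + conversation[insert_at:]

# Example: a clean one-exchange conversation becomes a noise-augmented two-exchange one.
clean = [
    {"role": "user", "content": "<|sound_12|> <|sound_873|> <|sound_5|>"},
    {"role": "assistant", "content": "Sure, here's a summary of that clip."},
]
print(inject_noise_turn(clean))
```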
Performance Improvements vs v0.3:
- Enhanced Intelligence: Improved benchmark scores on MMLU (64.66).
- Extended Context Handling.
- Advanced Noise Management: More robust rejection of noisy environmental inputs.
- Improved Multi-turn Capabilities.
Model weights: https://huggingface.co/collections/homebrewltd/ichigo-v04-67317bde6dfdfdd55dddbc6e
Live demo at: https://ichigo.homebrew.ltd/
First release of Ichigo!
Model weights can be downloaded at:
Changelog: v0.2 vs v0.3
Overall Comparison
| Phase | Aspect | v0.2 | v0.3 |
|---|---|---|---|
| Pretraining | Data Size | 2.42M | 3.87M |
| | Data Source | parler-tts/mls_eng_10k | facebook/multilingual_librispeech |
| | Data Synthetic Pipeline | Using WhisperVQ (old checkpoint: whisper-vq-stoks-medium-en+pl.model) to tokenize English-only audio. | Using the latest checkpoint whisper-vq-stoks-v3-7lang.model for audio in 8 languages. |
| | Epoch | 1 | 1 |
| | Global batch size | 480 | 480 |
| | Learning Rate | 2e-4 | 2e-4 |
| | Warmup Steps | 80 | 50 |
| | Weight Decay | 0.005 | 0.005 |
| | Max length | 512 | 512 |
| | Precision | bf16 | bf16 |
| Instruction Phase | Data Size | 929K | 1.89M + 165k (phase 3) |
| | Preprocessing | Using rule-based methods to remove hard-to-pronounce prompts. | Using rule-based methods to filter out hard-to-pronounce prompts, and rephrasing certain LLM-generated responses to sound more natural and human-like. |
| | Data Synthetic Pipeline | Using the old text-to-speech checkpoint t2s-small-yt.model to generate audio, then whisper-vq-stoks-medium-en+pl.model to tokenize it. | Changed the t2s checkpoint to t2s-v1.1-small-en+pl.model and the WhisperVQ checkpoint to whisper-vq-stoks-v3-7lang.model (see the sketch after this table). |
| | Epoch | 5 | 1 |
| | Global batch size | 128 | 256 |
| | Gradient Acc Steps per device | 1 | 8 |
| | Learning Rate | 1e-4 | 7e-5, and 1.5e-5 for phase 3 |
| | Warmup Steps | 80 | 73, and 8 for phase 3 |
| | Weight Decay | 0.005 | 0.005 |
| | Max length | 1024 | 4096 |
| | Precision | bf16 | bf16 |
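As a reading aid for the "Data Synthetic Pipeline" rows above, here is a hedged sketch of the two-stage flow they describe: a text prompt is synthesized to audio with the text-to-speech checkpoint, and the audio is then quantized into sound tokens with the WhisperVQ checkpoint. The helper signatures and the sample format below are placeholders, not the repository's actual scripts.

```python
# Hedged sketch of the two-stage synthetic data flow from the table:
# text prompt -> TTS audio -> WhisperVQ sound tokens -> instruction sample.
# The helper callables and sample format are placeholders, not the repo's scripts.
from typing import Callable

def build_instruction_sample(
    prompt_text: str,
    answer_text: str,
    synthesize: Callable[[str], bytes],       # e.g. the t2s-v1.1-small-en+pl.model checkpoint
    tokenize: Callable[[bytes], list[int]],   # e.g. the whisper-vq-stoks-v3-7lang.model checkpoint
) -> dict:
    """Turn a (prompt, answer) text pair into a speech-instruction training sample."""
    audio = synthesize(prompt_text)                 # stage 1: text-to-speech
    sound_ids = tokenize(audio)                     # stage 2: WhisperVQ tokenization
    sound_str = "".join(f"<|sound_{i:04d}|>" for i in sound_ids)
    return {
        "messages": [
            {"role": "user", "content": sound_str},        # spoken prompt as discrete tokens
            {"role": "assistant", "content": answer_text}, # original text answer
        ]
    }
```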
Instruction Phase Data Task Types
| Task Type | v0.2 | v0.3 |
|---|---|---|
| Speech Multi-turn | None | 150k samples (mostly 2 turns; around 10k with >=4 turns) |
| Speech QA | 679k samples | 1.332M samples |
| Transcription | 250k samples (using a special token to denote a transcription task) | 400k samples (using 6 different prompts) |
| Noise Audio | None | 8k samples (using Qwen2.5-72B to generate diverse synthetic answers for randomly generated sound tokens, with lengths matching the distribution of the Speech QA prompts; the length-matched sampling is sketched after this table) |
| Text-only | None | 150k samples, including 100k multi-turn + 50k single-turn |
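The "Noise Audio" row describes single-turn samples whose inputs are random sound tokens with lengths drawn to match the Speech QA prompt length distribution, paired with LLM-generated rejection answers. A minimal sketch of the length-matched sampling follows; the answer-generation call is omitted and the helper names are assumptions.

```python
import random

# Hedged sketch of length-matched noise sampling for the "Noise Audio" row above.
# The rejection answers would come from an LLM (Qwen2.5-72B per the table); this
# only shows how random sound-token inputs can mirror the Speech QA prompt lengths.

def sample_noise_inputs(
    speech_qa_token_lengths: list[int],   # lengths of real Speech QA prompts
    n_samples: int,
    codebook_size: int = 2561,
) -> list[list[int]]:
    """Draw n_samples random sound-token sequences whose lengths follow the
    empirical length distribution of the Speech QA prompts."""
    inputs = []
    for _ in range(n_samples):
        length = random.choice(speech_qa_token_lengths)   # empirical length draw
        inputs.append([random.randrange(codebook_size) for _ in range(length)])
    return inputs

# Example: mimic a tiny length distribution and draw 3 noise inputs.
noise_inputs = sample_noise_inputs([40, 55, 120, 80], n_samples=3)
print([len(seq) for seq in noise_inputs])
```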