milestone: Ichigo v0.5 Multi-lingual #116

Open
1 of 7 tasks
hahuyhoang411 opened this issue Nov 19, 2024 · 0 comments
Labels
type: epic A major feature or initiative

hahuyhoang411 (Contributor) commented Nov 19, 2024

Goal

Expand the Ichigo model's capabilities to support multiple Asian languages (and beyond), with an initial focus on Vietnamese and Singlish, and reach production-grade speech recognition and synthesis quality.

Description

Current State

  • The Ichigo model currently supports English only
  • Limited ability to handle the phonemes and tonal variations of Asian languages

Proposed Solution

  • Develop a multi-lingual speech foundation model based on the WhisperSpeech approach
  • Train a speech encoder for Asian languages

Resources

Multi-lingual Base

  • LibriSpeech (English baseline)
  • Common Voice (Multi-language)
  • GigaSpeech (Vietnamese, Thai & Indonesian)
  • FLEURS (1000+ hours, 102 languages)
  • ... etc

Language-Specific

  • Vietnamese

    • ViVoice (100+ hours)
    • BUD500 (500 hours)
    • ... etc
  • Singlish

    • National Speech Corpus (NSC)

Reference:

Experiment Log

  • Add Google Sheet here (public view-only)

Tasklist

hahuyhoang411 added the type: epic (A major feature or initiative) label Nov 19, 2024
bachvudinh self-assigned this Nov 20, 2024
tuanlda78202 pinned this issue Nov 20, 2024
github-project-automation (bot) moved this to Investigating in Jan & Cortex Nov 22, 2024
tikikun moved this from Investigating to In Progress in Jan & Cortex Nov 25, 2024
dan-homebrew changed the title from "epic: Multi-lingual Ichigo" to "epic: Ichigo v0.5 Multi-lingual" Nov 25, 2024
dan-homebrew changed the title from "epic: Ichigo v0.5 Multi-lingual" to "milestone: Ichigo v0.5 Multi-lingual" Nov 27, 2024
Projects
Status: In Progress
Development

No branches or pull requests

5 participants