
Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models


darongliu/GAN_Harmonized_with_HMMs

 
 


GAN_Harmonized_with_HMMs

This is the implementation of our paper. In the paper, we propose an unsupervised speech (phoneme) recognition system that achieves a 33.1% phoneme error rate on TIMIT. The method uses a GAN-based model to perform unsupervised phoneme recognition, together with a set of HMMs that work in harmony with the GAN.
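The 33.1% figure is a phoneme error rate: the number of substitutions, deletions and insertions needed to turn the recognized phoneme sequence into the reference, divided by the number of reference phonemes. The following is a minimal plain-Python illustration of that metric, not the scoring script this repository uses. It assumes both sequences already use the same phone set; `edit_distance` and `phoneme_error_rate` are hypothetical helper names, and the phoneme sequences are toy data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences
    (each substitution, deletion or insertion costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Corpus-level PER: total edits / total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# Toy sequences (hypothetical), assumed to share one phone set.
refs = [["sil", "hh", "ah", "l", "ow", "sil"]]
hyps = [["sil", "hh", "eh", "l", "sil"]]  # one substitution, one deletion
print(f"PER = {phoneme_error_rate(refs, hyps):.1%}")  # PER = 33.3%
```

Summing edits over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would.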

How to use

Dependencies

  1. TensorFlow 1.13

  2. Kaldi

  3. SRILM (can be built with kaldi/tools/install_srilm.sh)

  4. librosa

Preprocess data

  • Usage:
  1. Set the paths to Kaldi and SRILM in path.sh.
  2. Set your code path and TIMIT path in config.sh.
  3. Run $ bash preprocess.sh
  • This script extracts features and splits the dataset into training and test sets.

  • It also generates the data needed by the WFST decoder.

Train model

  • Usage:
  1. Modify the experimental settings in config.sh.
  2. Modify the GAN-based model's parameters in src/GAN-based-model/config.yaml.
  3. Run $ bash run.sh
  • This script runs the whole training flow for both the GAN-based model and the HMMs.

  • The GAN-based model generates the transcriptions used to train the HMMs.

  • The HMMs refine the phoneme boundaries used to train the GAN-based model.
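The alternation between the two models can be sketched as a simple loop. This is a hypothetical outline, not the code in run.sh: `harmonize`, `train_gan` and `train_hmm` are illustrative names, the data shapes are assumptions, and the toy stand-ins return dummy outputs instead of training anything.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data shapes: utterance id -> segment end frames / phoneme labels.
Boundaries = Dict[str, List[int]]
Transcripts = Dict[str, List[str]]


def harmonize(
    init_boundaries: Boundaries,
    train_gan: Callable[[Boundaries], Transcripts],
    train_hmm: Callable[[Transcripts], Boundaries],
    iterations: int,
) -> Tuple[Transcripts, Boundaries]:
    """Alternate the two stages for a fixed number of rounds.

    Each round, the GAN-based model is trained on the current boundaries and
    transcribes the training audio; the HMMs are then trained on those
    transcriptions and return refined boundaries for the next round.
    """
    boundaries = init_boundaries  # first round: boundaries chosen by bnd_type
    transcripts: Transcripts = {}
    for _ in range(iterations):
        transcripts = train_gan(boundaries)
        boundaries = train_hmm(transcripts)
    return transcripts, boundaries


# Toy stand-ins (hypothetical) that record the call order and return dummy outputs.
calls: List[str] = []


def toy_gan(bnd: Boundaries) -> Transcripts:
    calls.append("gan")
    return {utt: ["p"] * len(ends) for utt, ends in bnd.items()}


def toy_hmm(trans: Transcripts) -> Boundaries:
    calls.append("hmm")
    # Pretend the HMMs split each utterance into one more segment.
    return {utt: list(range(1, len(phones) + 2)) for utt, phones in trans.items()}


final_trans, final_bnd = harmonize({"utt1": [3, 7]}, toy_gan, toy_hmm, iterations=2)
print(calls)  # ['gan', 'hmm', 'gan', 'hmm']
```

Passing the two stages in as functions keeps the loop itself independent of how each stage is actually launched.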

Note

  • Training with boundaries generated by GAS (bnd_type=uns) is unstable and may need several training attempts to reach satisfactory performance.

Hyperparameters in config.sh

bnd_type : type of initial phoneme boundaries (orc: oracle boundaries / uns: unsupervised boundaries generated by GAS).

setting : the matched or nonmatched case from our paper (match/nonmatch).

jobs : number of parallel jobs (depends on your machine).
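The three hyperparameters can be sanity-checked before launching a long run. This sketch assumes config.sh sets them as plain shell assignments such as `bnd_type=uns`; the real file layout may differ, and `read_config` and `check_hyperparameters` are hypothetical helpers, not part of the repository.

```python
import re

# Assumption: config.sh sets these as plain NAME=VALUE shell assignments.
ALLOWED = {"bnd_type": {"orc", "uns"}, "setting": {"match", "nonmatch"}}
ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_config(text: str) -> dict:
    """Collect NAME=VALUE assignments, dropping '#' comments and quotes."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # naive: also cuts a '#' inside a value
        match = ASSIGN.match(line)
        if match:
            values[match.group(1)] = match.group(2).strip().strip("\"'")
    return values


def check_hyperparameters(values: dict) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    for key, allowed in ALLOWED.items():
        if values.get(key) not in allowed:
            problems.append(f"{key}={values.get(key)!r}, expected one of {sorted(allowed)}")
    jobs = values.get("jobs", "")
    if not jobs.isdigit() or int(jobs) < 1:
        problems.append(f"jobs={jobs!r}, expected a positive integer")
    return problems


# Hypothetical config.sh contents.
example = """
bnd_type=uns   # orc or uns
setting=match
jobs=8
"""
print(check_hyperparameters(read_config(example)))  # []
```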

Reference

Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models, Kuan-Yu Chen, Che-Ping Tsai et al.

Links

  1. The WFST decoder for the phoneme classifier.
  2. The training scripts for the unsupervised HMMs.

Acknowledgement

Special thanks to Che-Ping Tsai (jackyyy0228) for the Kaldi parts! Special thanks to Sung-Feng Huang (b02901071) for the PyTorch version!


Languages

  • Shell 52.8%
  • Python 38.1%
  • Perl 9.1%