Pretrained models

Pretrained HiFi-GAN with SourceFilter features

HiFi-GAN-based synthesis modules that synthesize waveforms from source-filter vocoder features, trained on JVS, VCTK, or both.
Training scripts are available in another repo.
hifigan_jvs_40d_600k is used in the default configuration.

| Name | Feature | Dataset | Iteration | Link |
| --- | --- | --- | --- | --- |
| hifigan_jvs_40d_600k | 40-D Melcep. + F0 (WORLD) | JVS | 600K | Download |
| hifigan_jvs_40d_1000k | 40-D Melcep. + F0 (WORLD) | JVS | 1000K | Download |
| hifigan_vctk_40d_600k | 40-D Melcep. + F0 (WORLD) | VCTK | 600K | Download |
| hifigan_vctk-jvs_40d_400k | 40-D Melcep. + F0 (WORLD) | JVS+VCTK | 400K | Download |
| hifigan_vctk-jvs_60d_400k | 60-D Melcep. + F0 (WORLD) | JVS+VCTK | 400K | Download |
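
The model names in this table appear to follow the pattern `hifigan_<dataset(s)>_<dim>d_<iterations>k`. The sketch below is not part of the repository; it only illustrates that naming pattern, and `HifiganSpec` and `parse_hifigan_name` are hypothetical helper names introduced for this example.

```python
import re
from typing import NamedTuple, Tuple


class HifiganSpec(NamedTuple):
    """Fields recovered from a model name (hypothetical helper type)."""
    datasets: Tuple[str, ...]  # e.g. ("VCTK", "JVS")
    melcep_dim: int            # mel-cepstrum dimensionality, e.g. 40
    iterations: int            # training iterations, e.g. 600000


# Assumed pattern, inferred from the names listed in the table.
_NAME_RE = re.compile(r"^hifigan_(?P<data>[a-z-]+)_(?P<dim>\d+)d_(?P<iters>\d+)k$")


def parse_hifigan_name(name: str) -> HifiganSpec:
    """Split a model name such as 'hifigan_jvs_40d_600k' into its parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    datasets = tuple(d.upper() for d in match.group("data").split("-"))
    return HifiganSpec(
        datasets=datasets,
        melcep_dim=int(match.group("dim")),
        iterations=int(match.group("iters")) * 1000,
    )
```

For example, `parse_hifigan_name("hifigan_jvs_40d_600k")` yields the JVS dataset, a 40-dimensional mel-cepstrum, and 600,000 iterations, which matches the row for the default model.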

SSL pretrained models for speech restoration

Speech restoration models trained on simulated data.

| Name | Dataset | Distortion | Feature | Link |
| --- | --- | --- | --- | --- |
| jsut-bandlimited_melspec.ckpt | JSUT Basic5000 | Bandlimited | MelSpec | Download |
| jsut-bandlimited_vocfeats.ckpt | JSUT Basic5000 | Bandlimited | SourceFilter | Download |
| jsut-clip_melspec.ckpt | JSUT Basic5000 | Clipping | MelSpec | Download |
| jsut-clip_vocfeats.ckpt | JSUT Basic5000 | Clipping | SourceFilter | Download |
| jsut-qr_melspec.ckpt | JSUT Basic5000 | Quantized & Resampled | MelSpec | Download |
| jsut-qr_vocfeats.ckpt | JSUT Basic5000 | Quantized & Resampled | SourceFilter | Download |
| jsut-overdrive_melspec.ckpt | JSUT Basic5000 | Overdrive | MelSpec | Download |
| jsut-overdrive_vocfeats.ckpt | JSUT Basic5000 | Overdrive | SourceFilter | Download |
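
Each checkpoint in this table pairs one distortion type with one feature type. As a rough sketch of how the file names map to the table columns, the helper below builds a checkpoint name from those two values. The function `restoration_checkpoint` and the two lookup dictionaries are hypothetical and not part of the repository; only the resulting file names come from the table.

```python
# Short keys used in the file names, read off the table above.
DISTORTION_KEYS = {
    "Bandlimited": "bandlimited",
    "Clipping": "clip",
    "Quantized & Resampled": "qr",
    "Overdrive": "overdrive",
}
FEATURE_KEYS = {
    "MelSpec": "melspec",
    "SourceFilter": "vocfeats",
}


def restoration_checkpoint(distortion: str, feature: str) -> str:
    """Return the checkpoint file name for a distortion/feature pair."""
    try:
        return f"jsut-{DISTORTION_KEYS[distortion]}_{FEATURE_KEYS[feature]}.ckpt"
    except KeyError as err:
        raise ValueError(f"unknown distortion or feature: {err}") from None
```

For instance, `restoration_checkpoint("Clipping", "SourceFilter")` gives `jsut-clip_vocfeats.ckpt`.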

Supervisedly pretrained models

Models pretrained in a supervised manner, for applying our method in low-resource settings.
There are two types of analysis module: Normal and GST.
Normal extracts restored speech features and channel features simultaneously in the analysis module.
GST extracts channel features using a separate GST encoder.
We use the Normal type in our paper because our preliminary experiments showed that it gives slightly higher quality.

| Name | Analysis module type | Feature | Dataset | Link |
| --- | --- | --- | --- | --- |
| pretrain_melspec_normal.ckpt | Normal | MelSpec | JVS | Download |
| pretrain_melspec_gst.ckpt | GST | MelSpec | JVS | Download |
| pretrain_vocfeats_normal.ckpt | Normal | SourceFilter | JVS | Download |
| pretrain_vocfeats_gst.ckpt | GST | SourceFilter | JVS | Download |

SSL pretrained models for audio effect transfer

The following model was trained on the real data described in the paper and is intended to be used for audio effect transfer.
It makes it possible to apply the effect to arbitrary speech data so that it sounds like an old recording.
Note that the following model uses MelSpec features.

| Name | Distortion | Link |
| --- | --- | --- |
| tono.ckpt | Tono no mukashibanashi | Download |