HiFi-GAN-based synthesis modules that synthesize waveforms from source-filter vocoder features, trained on JVS or VCTK.
Scripts for training are available in another repo.
`hifigan_jvs_40d_600k` is used in the default configuration.
Name | Feature | Dataset | Iteration | Link |
---|---|---|---|---|
hifigan_jvs_40d_600k | 40-D Melcep. + F0 (WORLD) | JVS | 600K | Download |
hifigan_jvs_40d_1000k | 40-D Melcep. + F0 (WORLD) | JVS | 1000K | Download |
hifigan_vctk_40d_600k | 40-D Melcep. + F0 (WORLD) | VCTK | 600K | Download |
hifigan_vctk-jvs_40d_400k | 40-D Melcep. + F0 (WORLD) | JVS+VCTK | 400K | Download |
hifigan_vctk-jvs_60d_400k | 60-D Melcep. + F0 (WORLD) | JVS+VCTK | 400K | Download |
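
The model names in the table above appear to follow the pattern `hifigan_<datasets>_<dim>d_<iterations>k`. As a minimal sketch, assuming that pattern holds for every entry, the helper below recovers the training datasets, mel-cepstrum dimension, and iteration count from a name. `parse_vocoder_name` is a hypothetical helper written for illustration; it is not part of this repository.

```python
import re

# Assumed naming pattern, inferred from the table (not an official spec):
#   hifigan_<datasets>_<dim>d_<iterations>k
_VOCODER_NAME = re.compile(
    r"^hifigan_(?P<datasets>[a-z]+(?:-[a-z]+)*)_(?P<dim>\d+)d_(?P<iters>\d+)k$"
)


def parse_vocoder_name(name):
    """Split a vocoder model name into its datasets, feature dimension, and iterations."""
    match = _VOCODER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized vocoder name: {name!r}")
    return {
        # Multiple datasets are joined with "-" in the name, e.g. "vctk-jvs".
        "datasets": [d.upper() for d in match.group("datasets").split("-")],
        "melcep_dim": int(match.group("dim")),
        # The trailing "k" denotes thousands of training iterations.
        "iterations": int(match.group("iters")) * 1000,
    }


if __name__ == "__main__":
    print(parse_vocoder_name("hifigan_jvs_40d_600k"))
```

A helper like this can be useful for checking that the vocoder's feature dimension matches the rest of a configuration before loading the weights.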
Speech restoration models trained on simulated data.
Name | Dataset | Distortion | Feature | Link |
---|---|---|---|---|
jsut-bandlimited_melspec.ckpt | JSUT Basic5000 | Bandlimited | MelSpec | Download |
jsut-bandlimited_vocfeats.ckpt | JSUT Basic5000 | Bandlimited | SourceFilter | Download |
jsut-clip_melspec.ckpt | JSUT Basic5000 | Clipping | MelSpec | Download |
jsut-clip_vocfeats.ckpt | JSUT Basic5000 | Clipping | SourceFilter | Download |
jsut-qr_melspec.ckpt | JSUT Basic5000 | Quantized & Resampled | MelSpec | Download |
jsut-qr_vocfeats.ckpt | JSUT Basic5000 | Quantized & Resampled | SourceFilter | Download |
jsut-overdrive_melspec.ckpt | JSUT Basic5000 | Overdrive | MelSpec | Download |
jsut-overdrive_vocfeats.ckpt | JSUT Basic5000 | Overdrive | SourceFilter | Download |
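
The checkpoint names in the table above combine a short distortion tag with a feature tag. The sketch below shows one way to look up the file name for a given distortion and feature type. The tag mappings are copied from the table; `restoration_checkpoint_name` is a hypothetical helper for illustration and does not exist in this repository.

```python
# Tags as they appear in the checkpoint file names listed in the table.
DISTORTION_TAGS = {
    "Bandlimited": "bandlimited",
    "Clipping": "clip",
    "Quantized & Resampled": "qr",
    "Overdrive": "overdrive",
}
FEATURE_TAGS = {
    "MelSpec": "melspec",
    "SourceFilter": "vocfeats",
}


def restoration_checkpoint_name(distortion, feature):
    """Return the checkpoint file name for a distortion/feature pair from the table."""
    if distortion not in DISTORTION_TAGS:
        raise KeyError(f"unknown distortion: {distortion!r}")
    if feature not in FEATURE_TAGS:
        raise KeyError(f"unknown feature type: {feature!r}")
    return f"jsut-{DISTORTION_TAGS[distortion]}_{FEATURE_TAGS[feature]}.ckpt"


if __name__ == "__main__":
    print(restoration_checkpoint_name("Clipping", "SourceFilter"))
```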
Models pretrained in a supervised manner, for applying our method in low-resource settings.
There are two types of analysis module: `Normal` and `GST`.
`Normal` extracts restored speech features and channel features simultaneously in the analysis module.
`GST` extracts channel features using a separate GST encoder.
We use the `Normal` method in our paper because our preliminary experiments showed that it yields slightly higher quality.
Name | Analysis module type | Feature | Dataset | Link |
---|---|---|---|---|
pretrain_melspec_normal.ckpt | Normal | MelSpec | JVS | Download |
pretrain_melspec_gst.ckpt | GST | MelSpec | JVS | Download |
pretrain_vocfeats_normal.ckpt | Normal | SourceFilter | JVS | Download |
pretrain_vocfeats_gst.ckpt | GST | SourceFilter | JVS | Download |
The following model was trained on the real data described in the paper and is intended to be used for audio effect transfer.
It lets you apply the effect to arbitrary speech data so that it sounds like an old recording.
Note that the following model uses `MelSpec` features.
Name | Distortion | Link |
---|---|---|
tono.ckpt | Tono no mukashibanashi | Download |