diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..b544ea8
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,8 @@
+.DS_Store
+docs/.DS_Store
+*/*/.DS_Store
+*/*/*/.DS_Store
+*/*/*/*/.DS_Store
+
+
+
diff --git a/docs/.DS_Store b/docs/.DS_Store
new file mode 100644
index 0000000..5008ddf
Binary files /dev/null and b/docs/.DS_Store differ
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 0000000..48435b6
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,160 @@
+
+
+
+
+ Audio samples for "NU-Wave: A Diffusion model for Audio Super Resolution"
+
+
+

Audio samples for "NU-Wave: A Diffusion model for Audio Super Resolution"

+ + +

Paper (to be updated): arXiv:TBD (submitted to INTERSPEECH 2021)

+

Code: mindslab-ai/nuwave @ GitHub + +

+

Authors: Junhyeok Lee, Seungu Han @MINDsLab Inc., SNU

+

Abstract: + In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution, and its design builds on neural vocoders that use diffusion probabilistic models. NU-Wave generates high-quality audio that achieves high performance in terms of signal-to-noise ratio (SNR), log-spectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite its substantially smaller capacity: 3.0M parameters, which is 5.4-21% of the baselines' size. The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon. +
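The abstract reports SNR and LSD without defining them here. As a rough illustration only, the sketch below computes both with NumPy using their common definitions: SNR as the ratio of reference energy to error energy in dB, and LSD as the frame-averaged root-mean-square difference of log power spectra. The function names, the Hann window, `n_fft=2048`, `hop=512`, and the `eps` guard are assumptions made for this example and are not taken from the paper or its code.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


def power_spectrogram(signal, n_fft=2048, hop=512):
    """Power spectrogram from Hann-windowed frames (assumed STFT settings)."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    # Number of full frames that fit; the signal must hold at least n_fft samples.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS over frequency, then mean over frames."""
    ref_power = power_spectrogram(reference, n_fft, hop)
    est_power = power_spectrogram(estimate, n_fft, hop)
    # eps guards against log of zero for silent bins (an assumption of this sketch).
    log_ratio = np.log10((ref_power + eps) / (est_power + eps))
    return float(np.mean(np.sqrt(np.mean(log_ratio ** 2, axis=1))))
```

Higher SNR and lower LSD indicate an output closer to the 48 kHz reference. Both functions expect the two signals to have the same length and sampling rate.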

+

This page contains audio samples in support of the paper; we suggest listening to them while reading the paper.
+ All utterances were unseen during training, and the results are uncurated (NOT cherry-picked) unless specified.

+ + + +

+

Section Ⅰ: Examples for single speaker 24 kHz to 48 kHz upsampling

+ This section contains examples for a single speaker, "p225", from the VCTK dataset. The upsampling rate is 2, from 24 kHz to 48 kHz. +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Original low resolution (24 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz)
+ + +
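The "Linear Interpolation" column is the simplest baseline in the table above. A minimal sketch of upsampling by an integer rate with linear interpolation is shown here, assuming NumPy; the function name `linear_upsample` is hypothetical, and the page does not state how its baseline samples were actually produced.

```python
import numpy as np


def linear_upsample(low, rate):
    """Upsample a 1-D signal by an integer rate using linear interpolation."""
    low = np.asarray(low, dtype=np.float64)
    # Positions of the known low-resolution samples on the high-resolution grid.
    known_positions = np.arange(len(low)) * rate
    # Every position on the high-resolution grid.
    output_positions = np.arange(len(low) * rate)
    # np.interp holds the last known value for positions past the final sample.
    return np.interp(output_positions, known_positions, low)
```

With `rate=2` this maps a 24 kHz signal to 48 kHz, and with `rate=3` it maps 16 kHz to 48 kHz. Linear interpolation adds no new high-frequency content, which is why it serves as a lower bound against the learned models.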

+

Section Ⅱ: Examples for multi speaker 24 kHz to 48 kHz upsampling

+ This section contains examples for unseen speakers. The model is trained on the first 100 speakers of the VCTK dataset, and these samples are from the remaining 8 speakers. The upsampling rate is 2, from 24 kHz to 48 kHz. + +

+ + + + + + + + + + + + + + + + + + + + + + +
Original low resolution (24 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz)
+

+

Section Ⅲ: Examples for single speaker 16 kHz to 48 kHz upsampling

+ This section contains examples for a single speaker, "p225", from the VCTK dataset. The upsampling rate is 3, from 16 kHz to 48 kHz. + +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Original low resolution (16 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz)
+

+ +

Section Ⅳ: Examples for multi speaker 16 kHz to 48 kHz upsampling

+ This section contains examples for unseen speakers. The model is trained on the first 100 speakers of the VCTK dataset, and these samples are from the remaining 8 speakers. The upsampling rate is 3, from 16 kHz to 48 kHz. + +

+ + + + + + + + + + + + + + + + + + + + + + +
Original low resolution (16 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz)
+

+ +