diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..b544ea8
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,8 @@
+.DS_Store
+docs/.DS_Store
+*/*/.DS_Store
+*/*/*/.DS_Store
+*/*/*/*/.DS_Store
+
+
+
diff --git a/docs/.DS_Store b/docs/.DS_Store
new file mode 100644
index 0000000..5008ddf
Binary files /dev/null and b/docs/.DS_Store differ
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 0000000..48435b6
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,160 @@
Paper (to be updated): arXiv:TBD (submitted to INTERSPEECH 2021)
Code: mindslab-ai/nuwave @ GitHub
Authors: Junhyeok Lee, Seungu Han @MINDsLab Inc., SNU
Abstract: In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms at a sampling rate of 48 kHz from coarse 16 kHz or 24 kHz inputs, whereas prior works could generate only up to 16 kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution, and its design is based on neural vocoders that are themselves diffusion probabilistic models. NU-Wave generates high-quality audio and achieves high performance in terms of signal-to-noise ratio (SNR), log-spectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite its substantially smaller capacity: 3.0M parameters, which is 5.4-21% of the baselines' size. The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon.
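For readers unfamiliar with the two objective metrics named in the abstract, the following is a minimal sketch of SNR and LSD in pure Python. It uses common definitions of both metrics; the frame length, hop size, rectangular window, naive DFT, `eps` floor, and the function names `snr_db`, `power_spectrum`, and `lsd` are all assumptions made for illustration, and the exact evaluation settings used for the paper are not stated on this page.

```python
import cmath
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: 10 * log10(||x||^2 / ||x_hat - x||^2)."""
    signal_energy = sum(x * x for x in reference)
    noise_energy = sum((e - x) ** 2 for x, e in zip(reference, estimate))
    if noise_energy == 0:
        return math.inf  # identical signals: no error energy
    return 10.0 * math.log10(signal_energy / noise_energy)


def power_spectrum(frame):
    """Power |X[k]|^2 for bins 0..N/2 of one frame, using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def lsd(reference, estimate, frame_len=64, hop=32, eps=1e-10):
    """Log-spectral distance: mean over frames of the RMS (over frequency
    bins) difference between log10 power spectra.

    frame_len, hop, and eps are illustrative assumptions, and no window
    function is applied (rectangular window).
    """
    per_frame = []
    for start in range(0, len(reference) - frame_len + 1, hop):
        ref_pow = power_spectrum(reference[start:start + frame_len])
        est_pow = power_spectrum(estimate[start:start + frame_len])
        sq_diff = [
            (math.log10(e + eps) - math.log10(r + eps)) ** 2
            for r, e in zip(ref_pow, est_pow)
        ]
        per_frame.append(math.sqrt(sum(sq_diff) / len(sq_diff)))
    return sum(per_frame) / len(per_frame)


if __name__ == "__main__":
    # Toy signals: a sine wave and the same wave attenuated to 90% amplitude.
    reference = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
    estimate = [0.9 * x for x in reference]
    print(snr_db(reference, estimate))  # about 20 dB
    print(lsd(reference, estimate))     # about 0.0915, i.e. |log10(0.81)|
```

Higher SNR and lower LSD both indicate a closer match to the reference. A practical evaluation would use an FFT-based STFT from a numerical library rather than the naive DFT shown here, which is kept only to make the sketch dependency-free.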
This page contains audio samples in support of the paper; we suggest listening to them while reading the paper. All utterances were unseen during training, and the results are uncurated (not cherry-picked) unless otherwise specified.
Original low resolution (24 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz) |
---|---|---|---|---|---|
 | | | | | |
 | | | | | |
Original low resolution (24 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz) |
---|---|---|---|---|---|
 | | | | | |
Original low resolution (16 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz) |
---|---|---|---|---|---|
 | | | | | |
 | | | | | |
Original low resolution (16 kHz) | Original high resolution (48 kHz) | Linear Interpolation (48 kHz) | U-Net (48 kHz) | MU-GAN (48 kHz) | NU-Wave (48 kHz) |
---|---|---|---|---|---|
 | | | | | |
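The "Linear Interpolation" column in the tables above is the simplest of the compared methods: going from 24 kHz or 16 kHz to 48 kHz means inserting new samples at an integer factor of 2 or 3. The sketch below shows one plausible way to do that in pure Python. The function name `upsample_linear` and the choice to hold the last sample are assumptions for illustration, not the implementation used to produce the samples on this page.

```python
def upsample_linear(samples, factor):
    """Upsample by an integer factor, placing linearly interpolated
    values between each pair of neighbouring input samples."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if not samples:
        return []
    out = []
    for left, right in zip(samples, samples[1:]):
        for step in range(factor):
            # step = 0 reproduces the original sample; later steps
            # move proportionally toward the next sample.
            out.append(left + (right - left) * step / factor)
    # The final sample has no right-hand neighbour, so its value is held
    # (an assumption) to keep the output length at len(samples) * factor.
    out.extend([samples[-1]] * factor)
    return out


if __name__ == "__main__":
    # 16 kHz -> 48 kHz corresponds to factor 3; 24 kHz -> 48 kHz to factor 2.
    print(upsample_linear([0.0, 3.0, 6.0], 3))
    # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 6.0]
```

Linear interpolation adds samples but cannot recreate frequency content above the original Nyquist limit (12 kHz for 24 kHz input, 8 kHz for 16 kHz input), which is why it serves as a baseline against the learned models.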