sample update
jun_mac committed Apr 5, 2021
1 parent aaee44c commit 7c73c2f
Showing 33 changed files with 74 additions and 12 deletions.
29 changes: 29 additions & 0 deletions LICENSE
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2021, MINDsLab Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Binary file removed docs/.DS_Store
Binary file not shown.
57 changes: 45 additions & 12 deletions docs/index.html
@@ -1,5 +1,3 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
-
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>Audio samples for "NU-Wave: A Diffusion model for Audio Super Resolution"</title>
@@ -14,17 +12,16 @@ <h2>Audio samples for "NU-Wave: A Diffusion model for Audio Super Resolution"</h
</p></div>
<div><p><b>Authors:</b> Junhyeok Lee, Seungu Han @<a href="https://mindslab.ai">MINDsLab Inc.</a>, SNU</p></div>
<div><p><b>Abstract:</b>
-Abstract
In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution which is engineered based on the neural vocoders based on diffusion probabilistic models. NU-Wave generates high-quality audio that achieves high performance in terms of signal-to-noise ratio (SNR), logspectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite the substantially smaller model capacity than baselines (5.4-21%) as 3.0M parameters. The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon.
</p></div>
-<p>This page contains a set of audio samples in support of the paper: it is suggested that the reader listen to the samples in conjunction with reading the paper. </br>
+<p>This page contains a set of audio samples to support the paper; we suggest that the reader listen to the samples when reading the paper.</br>
<b>All utterances were unseen during training, and the results are uncurated (NOT cherry-picked) unless specified.</b></p>

<body style="font-family: Helvetica">

<br> </br>
-<h3> Section &#8544;: Examples for single speaker 24 kHz to 48 kHz upsampling</h3>
-This section contain examples for single speaker, "p225" from VCTK datast. Upsamplring rate is 2, upsampling from 24kHz to 48kHz
+<h3> Section &#8544;: Examples for SingleSpeaker (seen speaker during training) upsampled from 24kHz to 48kHz.</h3>
+This section contains examples for the speaker "p225" from the VCTK dataset. The upsampling rate is 2 (from 24kHz to 48kHz).
<br> </br>
<table>
<thead>
@@ -61,8 +58,8 @@ <h3> Section &#8544;: Examples for single speaker 24 kHz to 48 kHz upsampling</h


<br> </br>
-<h3> Section &#8545;: Examples for multi speaker 24 kHz to 48 kHz upsampling</h3>
-This section contain examples for unseen speakers. The model is trained with first 100 speakers of VCTK dataset and this samples are from remainder 8 speakers. Upsampling rate is 2, upsampling from 24kHz to 48kHz
+<h3> Section &#8545;: Examples for MultiSpeaker (unseen speaker during training) upsampled from 24kHz to 48kHz.</h3>
+This section contains examples for the unseen speakers. The model is trained on the first 100 speakers of the VCTK dataset. The following samples are generated for the remaining 8 speakers. The upsampling rate is 2 (from 24kHz to 48kHz).

<br> </br>
<table>
@@ -78,6 +75,15 @@ <h3> Section &#8545;: Examples for multi speaker 24 kHz to 48 kHz upsampling</h3
</thead>

<tbody>
+<tr>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_0.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_0.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_0.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_0.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_0.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_0.wav" type="audio/wav"></audio></td>
+</tr>
+
<tr>
<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_1.wav" type="audio/wav"></audio></td>
<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_1.wav" type="audio/wav"></audio></td>
@@ -86,11 +92,38 @@ <h3> Section &#8545;: Examples for multi speaker 24 kHz to 48 kHz upsampling</h3
<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_1.wav" type="audio/wav"></audio></td>
<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_1.wav" type="audio/wav"></audio></td>
</tr>
+
+<tr>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_2.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_2.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_2.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_2.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_2.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_2.wav" type="audio/wav"></audio></td>
+</tr>
+
+<tr>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_3.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_3.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_3.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_3.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_3.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_3.wav" type="audio/wav"></audio></td>
+</tr>
+
+<tr>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_4.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_4.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_4.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_4.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_4.wav" type="audio/wav"></audio></td>
+<td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_4.wav" type="audio/wav"></audio></td>
+</tr>
</tbody>
</table>
<br> </br>
-<h3> Section &#8544;: Examples for single speaker 16 kHz to 48 kHz upsampling</h3>
-This section contain examples for single speaker, "p225" from VCTK datast. Upsamplring rate is 2, upsampling from 24kHz to 48kHz
+<h3> Section &#8546;: Examples for SingleSpeaker (seen speaker during training) upsampled from 16kHz to 48kHz.</h3>
+This section contains examples for the speaker "p225" from the VCTK dataset. The upsampling rate is 3 (from 16kHz to 48kHz).

<br> </br>
<table>
Expand Down Expand Up @@ -128,8 +161,8 @@ <h3> Section &#8544;: Examples for single speaker 16 kHz to 48 kHz upsampling</h
</table>
<br> </br>

-<h3> Section &#8545;: Examples for multi speaker 16 kHz to 48 kHz upsampling</h3>
-This section contain examples for unseen speakers. The model is trained with first 100 speakers of VCTK dataset and this samples are from remainder 8 speakers. Upsampling rate is 3, upsampling from 16kHz to 48kHz
+<h3> Section &#8547;: Examples for multi speaker (unseen during training) upsampled from 16kHz to 48kHz.</h3>
+This section contains examples for the unseen speakers. The model is trained on the first 100 speakers of the VCTK dataset. The following samples are generated for the remaining 8 speakers. The upsampling rate is 3 (from 16kHz to 48kHz).

<br> </br>
<table>
Binary file added docs/samples/multi_x2/linear_48k/sample_0.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/linear_48k/sample_1.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/linear_48k/sample_2.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/linear_48k/sample_3.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/linear_48k/sample_4.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/mugan_48k/sample_0.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/mugan_48k/sample_1.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/mugan_48k/sample_2.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/mugan_48k/sample_3.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/mugan_48k/sample_4.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/nuwave_48k/sample_0.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/nuwave_48k/sample_1.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/nuwave_48k/sample_2.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/nuwave_48k/sample_3.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/nuwave_48k/sample_4.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_24k/sample_0.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_24k/sample_1.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_24k/sample_2.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_24k/sample_3.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_24k/sample_4.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_48k/sample_0.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_48k/sample_1.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_48k/sample_2.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_48k/sample_3.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/original_48k/sample_4.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/unet_48k/sample_0.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/unet_48k/sample_1.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/unet_48k/sample_2.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/unet_48k/sample_3.wav
Binary file not shown.
Binary file added docs/samples/multi_x2/unet_48k/sample_4.wav
Binary file not shown.
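
Note on the table rows added to docs/index.html: the rows for samples/multi_x2 all follow one pattern (six audio cells per row, one per output directory, sample indices 0 to 4), so they could be generated instead of written by hand. The following is a minimal sketch under that assumption, not the method used in this commit. The function names and the `VARIANTS` list are hypothetical; the directory names, cell markup, and column order are copied from the rows in the diff above.

```python
from html import escape

# Column order copied from the rows in this diff (assumed to match the
# table header, which the diff does not show).
VARIANTS = [
    "original_24k",
    "original_48k",
    "linear_48k",
    "unet_48k",
    "mugan_48k",
    "nuwave_48k",
]


def audio_cell(src: str) -> str:
    """Return one <td> holding an audio player, in the markup used by index.html."""
    return (
        '<td><audio controls style="width: 250px; height: 50px">'
        f'<source src="{escape(src, quote=True)}" type="audio/wav"></audio></td>'
    )


def sample_row(group: str, index: int) -> str:
    """Return one <tr> with a cell per variant for the given sample index."""
    cells = [
        audio_cell(f"samples/{group}/{variant}/sample_{index}.wav")
        for variant in VARIANTS
    ]
    return "\n".join(["<tr>", *cells, "</tr>"])


def sample_rows(group: str, count: int) -> str:
    """Return `count` rows separated by blank lines, as in the committed file."""
    return "\n\n".join(sample_row(group, i) for i in range(count))


if __name__ == "__main__":
    # Prints the five rows for the multi-speaker x2 table.
    print(sample_rows("multi_x2", 5))
```

Generating the rows this way keeps the column order in a single list, so adding a baseline or another sample index would not require editing every row.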
