sample update

maum-ai · Apr 5, 2021 · 7c73c2f
1 parent aaee44c
commit 7c73c2f
Showing 33 changed files with 74 additions and 12 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,29 @@
+BSD 3-Clause License
+
+Copyright (c) 2021, MINDsLab Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived from
+   this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/docs/.DS_Store b/docs/.DS_Store
diff --git a/docs/index.html b/docs/index.html
@@ -1,5 +1,3 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
-
 <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
 
   <title>Audio samples for "NU-Wave: A Diffusion model for Audio Super Resolution"</title>
@@ -14,17 +12,16 @@ <h2>Audio samples for "NU-Wave: A Diffusion model for Audio Super Resolution"</h
      </p></div>
     <div><p><b>Authors:</b> Junhyeok Lee, Seungu Han @<a href="https://mindslab.ai">MINDsLab Inc.</a>, SNU</p></div>
     <div><p><b>Abstract:</b>
-      Abstract
       In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution which is engineered based on the neural vocoders based on diffusion probabilistic models. NU-Wave generates high-quality audio that achieves high performance in terms of signal-to-noise ratio (SNR), logspectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite the substantially smaller model capacity than baselines (5.4-21%) as 3.0M parameters. The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon.
     </p></div>
-    <p>This page contains a set of audio samples in support of the paper: it is suggested that the reader listen to the samples in conjunction with reading the paper. </br>
+    <p>This page contains a set of audio samples to support the paper; we suggest that the reader listen to the samples when reading the paper.</br>
     <b>All utterances were unseen during training, and the results are uncurated (NOT cherry-picked) unless specified.</b></p>
 
   <body style="font-family: Helvetica">
 
   	<br> </br>
-    <h3> Section &#8544;: Examples for single speaker 24 kHz to 48 kHz upsampling</h3>
-    This section contain examples for single speaker, "p225" from VCTK datast. Upsamplring rate is 2, upsampling from 24kHz to 48kHz
+    <h3> Section &#8544;: Examples for SingleSpeaker (seen speaker during training) upsampled from 24kHz to 48kHz.</h3>
+    This section contains examples for the speaker “p225" from the VCTK dataset. The upsampling rate is 2 (from 24kHz to 48kHz).
     <br> </br>
     <table>
       <thead>
@@ -61,8 +58,8 @@ <h3> Section &#8544;: Examples for single speaker 24 kHz to 48 kHz upsampling</h
 
 
     <br> </br>
-    <h3> Section &#8545;: Examples for multi speaker 24 kHz to 48 kHz upsampling</h3>
-    This section contain examples for unseen speakers. The model is trained with first 100 speakers of VCTK dataset and this samples are from remainder 8 speakers. Upsampling rate is 2, upsampling from 24kHz to 48kHz
+    <h3> Section &#8545;: Examples for MultiSpeaker (unseen speaker during training) upsampled from 24kHz to 48kHz.</h3>
+        This section contains examples for the unseen speakers. The model is trained on the first 100 speakers of the VCTK dataset. The following samples are generated for the remaining 8 speakers. The upsampling rate is 2 (from 24kHz to 48kHz). 
 
     <br> </br>
     <table>
@@ -78,6 +75,15 @@ <h3> Section &#8545;: Examples for multi speaker 24 kHz to 48 kHz upsampling</h3
         </thead>
 
         <tbody>
+          <tr>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_0.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_0.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_0.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_0.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_0.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_0.wav" type="audio/wav"></audio></td>
+          </tr>
+
           <tr>
             <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_1.wav" type="audio/wav"></audio></td>
             <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_1.wav" type="audio/wav"></audio></td>
@@ -86,11 +92,38 @@ <h3> Section &#8545;: Examples for multi speaker 24 kHz to 48 kHz upsampling</h3
             <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_1.wav" type="audio/wav"></audio></td>
             <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_1.wav" type="audio/wav"></audio></td>
           </tr>
+
+          <tr>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_2.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_2.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_2.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_2.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_2.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_2.wav" type="audio/wav"></audio></td>
+          </tr>
+
+          <tr>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_3.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_3.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_3.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_3.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_3.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_3.wav" type="audio/wav"></audio></td>
+          </tr>
+
+          <tr>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_24k/sample_4.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/original_48k/sample_4.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/linear_48k/sample_4.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/unet_48k/sample_4.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/mugan_48k/sample_4.wav" type="audio/wav"></audio></td>
+            <td><audio controls style="width: 250px; height: 50px"><source src="samples/multi_x2/nuwave_48k/sample_4.wav" type="audio/wav"></audio></td>
+          </tr>
         </tbody>
     </table>
     <br> </br>
-    <h3> Section &#8544;: Examples for single speaker 16 kHz to 48 kHz upsampling</h3>
-    This section contain examples for single speaker, "p225" from VCTK datast. Upsamplring rate is 2, upsampling from 24kHz to 48kHz
+    <h3> Section &#8546;: Examples for SingleSpeaker (seen speaker during training) upsampled from 16kHz to 48kHz.</h3>
+        This section contains examples for the speaker “p225" from the VCTK dataset. The upsampling rate is 3 (from 16kHz to 48kHz).
 
     <br> </br>
     <table>
@@ -128,8 +161,8 @@ <h3> Section &#8544;: Examples for single speaker 16 kHz to 48 kHz upsampling</h
     </table>
     <br> </br>
 
-    <h3> Section &#8545;: Examples for multi speaker 16 kHz to 48 kHz upsampling</h3>
-    This section contain examples for unseen speakers. The model is trained with first 100 speakers of VCTK dataset and this samples are from remainder 8 speakers. Upsampling rate is 3, upsampling from 16kHz to 48kHz
+    <h3> Section &#8547;: Examples for multi speaker (unseen during training) upsampled from 16kHz to 48kHz.</h3>
+        This section contains examples for the unseen speakers. The model is trained on the first 100 speakers of the VCTK dataset. The following samples are generated for the remaining 8 speakers. The upsampling rate is 3 (from 16kHz to 48kHz).
 
     <br> </br>
     <table>

diff --git a/docs/samples/multi_x2/linear_48k/sample_0.wav b/docs/samples/multi_x2/linear_48k/sample_0.wav
diff --git a/docs/samples/multi_x2/linear_48k/sample_1.wav b/docs/samples/multi_x2/linear_48k/sample_1.wav
diff --git a/docs/samples/multi_x2/linear_48k/sample_2.wav b/docs/samples/multi_x2/linear_48k/sample_2.wav
diff --git a/docs/samples/multi_x2/linear_48k/sample_3.wav b/docs/samples/multi_x2/linear_48k/sample_3.wav
diff --git a/docs/samples/multi_x2/linear_48k/sample_4.wav b/docs/samples/multi_x2/linear_48k/sample_4.wav
diff --git a/docs/samples/multi_x2/mugan_48k/sample_0.wav b/docs/samples/multi_x2/mugan_48k/sample_0.wav
diff --git a/docs/samples/multi_x2/mugan_48k/sample_1.wav b/docs/samples/multi_x2/mugan_48k/sample_1.wav
diff --git a/docs/samples/multi_x2/mugan_48k/sample_2.wav b/docs/samples/multi_x2/mugan_48k/sample_2.wav
diff --git a/docs/samples/multi_x2/mugan_48k/sample_3.wav b/docs/samples/multi_x2/mugan_48k/sample_3.wav
diff --git a/docs/samples/multi_x2/mugan_48k/sample_4.wav b/docs/samples/multi_x2/mugan_48k/sample_4.wav
diff --git a/docs/samples/multi_x2/nuwave_48k/sample_0.wav b/docs/samples/multi_x2/nuwave_48k/sample_0.wav
diff --git a/docs/samples/multi_x2/nuwave_48k/sample_1.wav b/docs/samples/multi_x2/nuwave_48k/sample_1.wav
diff --git a/docs/samples/multi_x2/nuwave_48k/sample_2.wav b/docs/samples/multi_x2/nuwave_48k/sample_2.wav
diff --git a/docs/samples/multi_x2/nuwave_48k/sample_3.wav b/docs/samples/multi_x2/nuwave_48k/sample_3.wav
diff --git a/docs/samples/multi_x2/nuwave_48k/sample_4.wav b/docs/samples/multi_x2/nuwave_48k/sample_4.wav
diff --git a/docs/samples/multi_x2/original_24k/sample_0.wav b/docs/samples/multi_x2/original_24k/sample_0.wav
diff --git a/docs/samples/multi_x2/original_24k/sample_1.wav b/docs/samples/multi_x2/original_24k/sample_1.wav
diff --git a/docs/samples/multi_x2/original_24k/sample_2.wav b/docs/samples/multi_x2/original_24k/sample_2.wav
diff --git a/docs/samples/multi_x2/original_24k/sample_3.wav b/docs/samples/multi_x2/original_24k/sample_3.wav
diff --git a/docs/samples/multi_x2/original_24k/sample_4.wav b/docs/samples/multi_x2/original_24k/sample_4.wav
diff --git a/docs/samples/multi_x2/original_48k/sample_0.wav b/docs/samples/multi_x2/original_48k/sample_0.wav
diff --git a/docs/samples/multi_x2/original_48k/sample_1.wav b/docs/samples/multi_x2/original_48k/sample_1.wav
diff --git a/docs/samples/multi_x2/original_48k/sample_2.wav b/docs/samples/multi_x2/original_48k/sample_2.wav
diff --git a/docs/samples/multi_x2/original_48k/sample_3.wav b/docs/samples/multi_x2/original_48k/sample_3.wav
diff --git a/docs/samples/multi_x2/original_48k/sample_4.wav b/docs/samples/multi_x2/original_48k/sample_4.wav
diff --git a/docs/samples/multi_x2/unet_48k/sample_0.wav b/docs/samples/multi_x2/unet_48k/sample_0.wav
diff --git a/docs/samples/multi_x2/unet_48k/sample_1.wav b/docs/samples/multi_x2/unet_48k/sample_1.wav
diff --git a/docs/samples/multi_x2/unet_48k/sample_2.wav b/docs/samples/multi_x2/unet_48k/sample_2.wav
diff --git a/docs/samples/multi_x2/unet_48k/sample_3.wav b/docs/samples/multi_x2/unet_48k/sample_3.wav
diff --git a/docs/samples/multi_x2/unet_48k/sample_4.wav b/docs/samples/multi_x2/unet_48k/sample_4.wav
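
The table rows added to docs/index.html in this commit all follow one pattern: six audio cells per sample, one per directory under samples/multi_x2. As a minimal sketch (not part of the commit), rows like these could be generated instead of written by hand. The helper names `sample_row` and `sample_rows` are hypothetical; the column order, inline style, and paths are copied from the hunks above.

```python
# Column directories, in the order they appear in each added <tr> of docs/index.html.
COLUMNS = [
    "original_24k",
    "original_48k",
    "linear_48k",
    "unet_48k",
    "mugan_48k",
    "nuwave_48k",
]

# Inline style used by every <audio> element in the diff.
AUDIO_STYLE = "width: 250px; height: 50px"


def sample_row(index, group="multi_x2", indent="          "):
    """Return one <tr> block with an <audio> cell per column (hypothetical helper)."""
    cells = [
        f'{indent}  <td><audio controls style="{AUDIO_STYLE}">'
        f'<source src="samples/{group}/{column}/sample_{index}.wav" type="audio/wav">'
        f"</audio></td>"
        for column in COLUMNS
    ]
    return "\n".join([f"{indent}<tr>", *cells, f"{indent}</tr>"])


def sample_rows(count, group="multi_x2"):
    """Return rows for sample_0 .. sample_{count-1}, separated by blank lines."""
    return "\n\n".join(sample_row(i, group) for i in range(count))


if __name__ == "__main__":
    # Five samples, matching sample_0.wav through sample_4.wav in this commit.
    print(sample_rows(5))
```

Calling `sample_rows(5)` references 6 × 5 = 30 files, which matches the 30 .wav paths listed in this commit.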