
Commit

README/docs update
jun_mac committed Jul 27, 2021
1 parent e79cc5e commit 12f855c
Showing 3 changed files with 14 additions and 9 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@ docs/.DS_Store
*/*/*/.DS_Store
*/*/*/*/.DS_Store
__pycache__/*
*.swp



7 changes: 3 additions & 4 deletions README.md
@@ -3,8 +3,7 @@
**NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling**<br>
Junhyeok Lee, Seungu Han @ [MINDsLab Inc.](https://github.com/mindslab-ai), SNU

Paper(arXiv): https://arxiv.org/abs/2104.02321 (Accepted to INTERSPEECH 2021)<br>
Audio Samples: https://mindslab-ai.github.io/nuwave<br>
[![arXiv](https://img.shields.io/badge/arXiv-2104.02321-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2104.02321) [![GitHub Repo stars](https://img.shields.io/github/stars/mindslab-ai/nuwave?color=yellow&label=NU-Wave&logo=github&style=flat-square)](https://github.com/mindslab-ai/nuwave) [![githubio](https://img.shields.io/badge/GitHub.io-audio_samples-blue?logo=Github&style=flat-square)](https://mindslab-ai.github.io/nuwave/)

Official Pytorch+[Lightning](https://github.com/PyTorchLightning/pytorch-lightning) Implementation for NU-Wave.<br>

@@ -22,14 +21,14 @@ Update: torch.log --> torch.log10 on lsd, value and lsd formula in the paper is
Before running our project, you need to download and preprocess dataset to `.pt` files
1. Download [VCTK dataset](https://datashare.ed.ac.uk/handle/10283/3443)
2. Remove speaker `p280` and `p315`
3. Modify path of downloaded dataset `data:dir` in `hparameters.yaml`
3. Modify path of downloaded dataset `data:dir` in `hparameter.yaml`
4. run `utils/wav2pt.py`
```shell script
$ python utils/wav2pt.py
```
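
Step 2 above is a manual operation. It could be scripted roughly as follows; this is a minimal sketch and not part of the repository. The function name `remove_speakers` is hypothetical, and the code assumes each speaker's files live in a directory named after the speaker ID (for example `p280`) somewhere under the dataset root.

```python
import shutil
from pathlib import Path

# Speaker IDs that the preprocessing steps say to remove.
EXCLUDED_SPEAKERS = ("p280", "p315")


def remove_speakers(dataset_root, speakers=EXCLUDED_SPEAKERS):
    """Delete every directory under dataset_root named after an excluded speaker.

    Assumption: speaker data is stored in per-speaker directories; the exact
    layout of the downloaded dataset is not checked here.
    """
    root = Path(dataset_root)
    removed = []
    for speaker in speakers:
        # Collect matches first so nothing is deleted while rglob is walking.
        for path in list(root.rglob(speaker)):
            if path.is_dir():
                shutil.rmtree(path)
                removed.append(path)
    return removed
```

Calling `remove_speakers(...)` with the same directory that is set as `data:dir` returns the list of deleted directories, which is worth inspecting before running `utils/wav2pt.py`.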

## Training
1. Adjust `hparameters.yaml`, especially `train` section.
1. Adjust `hparameter.yaml`, especially `train` section.
```yaml
train:
batch_size: 18 # Dependent on GPU memory size
15 changes: 10 additions & 5 deletions docs/index.html
@@ -1,16 +1,20 @@
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<div class="container" style="max-width:1500px;">

<title>Audio samples for "NU-Wave: A Diffusion model for Neural Audio Upsampling"</title>

</head>
<h2>Audio samples for "NU-Wave: A Diffusion model for Neural Audio Upsampling"</h2>

</article>
<div><p><b>Paper:</b> <a href="https://arxiv.org/abs/2104.02321">arXiv:2104.02321</a> (Accepted to INTERSPEECH 2021)</p></div>
<div><p><b>Code(available soon):</b> <a href="https://github.com/mindslab-ai/nuwave">mindslab-ai/nuwave @ GitHub</a>
<iframe src="https://ghbtns.com/github-btn.html?user=mindslab-ai&repo=nuwave&type=star&count=true" frameborder="0" scrolling="0" width="150" height="20" title="GitHub"></iframe>
</p></div>
<div><p><b>Authors:</b> Junhyeok Lee, Seungu Han @<a href="https://mindslab.ai">MINDsLab Inc.</a>, SNU</p></div>
<p>
<a href="https://arxiv.org/abs/2104.02321" rel="nofollow"><img src="https://img.shields.io/badge/arXiv-2104.02321-brightgreen.svg?style=flat-square" style="max-width:100%;"></a>
<a href="https://github.com/mindslab-ai/nuwave"><img src="https://img.shields.io/github/stars/mindslab-ai/nuwave?color=yellow&amp;label=NU-Wave&amp;logo=github&amp;style=flat-square" style="max-width:100%;"></a>
<a href="https://www.interspeech2021.org/" rel="nofollow"><img src="https://img.shields.io/badge/Accepted-INTERSPEECH%202021-blue?style=flat-square" style="max-width:100%;"></a></p>


<div><p><b>Authors:</b> <a href="mailto:[email protected]">Junhyeok Lee</a>, <a href="mailto:[email protected]">Seungu Han</a> @<a href="https://mindslab.ai">MINDsLab Inc.</a>, SNU</p></div>
<div><p><b>Abstract:</b>
In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution which is engineered based on neural vocoders. NU-Wave generates high-quality audio that achieves high performance in terms of signal-to-noise ratio (SNR), log-spectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite the substantially smaller model capacity (3.0M parameters) than baselines (5.4-21%). The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon.
</p></div>
@@ -287,4 +291,5 @@ <h3> Section &#8547;: Examples for multi speaker (unseen speaker during training
</table>
<br> </br>
</body>
</div>
</html>
