Commit

add genau and update dataset preperation
MoayedHajiAli committed Jun 24, 2024
1 parent 4aeaec9 commit 66c987d
Showing 219 changed files with 448,277 additions and 11 deletions.
6 changes: 6 additions & 0 deletions .gitignore
100644 → 100755
@@ -1,2 +1,8 @@
__pycache__
**/__pycache__
**/data
**/pretrained_models
**/*.ckpt
**/*.pt
.DS_Store
**/.DS_Store
28 changes: 28 additions & 0 deletions GenAU/.gitignore
@@ -0,0 +1,28 @@
__pycache__
taming
log
**/log
logs
**/logs
esc50.zip
ESC-50-master
*.wav
ckpt
lightning_logs
mlx_submit_*
job_queue.sh
*.txt
*.cleaned
audiocaps_train.json
dataset
checkpoints
*.tar
condor*
wandb
audioldm_train/modules/fit
core*
.vscode
compute_clap.py
cal_clap_score.py
run_logs
samples
115 changes: 115 additions & 0 deletions GenAU/README.md
@@ -0,0 +1,115 @@
[![arXiv](ARXIV ICON)](ARXIV LINK)

# GenAU inference, training and evaluation
- [Inference](#inference)
  * [Text-to-audio script](#text-to-audio)
* [Gradio demo](#gradio-demo)
  * [Inference on a list of prompts](#inference-on-a-list-of-prompts)
- [Training](#training)
* [GenAU](#genau)
* [Finetuning GenAU](#finetuning-genau)
* [1D-VAE (optional)](#1d-vae-optional)
- [Evaluation](#evaluation)
- [Cite this work](#cite-this-work)
- [Acknowledgements](#acknowledgements)

# Environment initialization
For initializing your environment, please refer to the [general README](../README.md).

# Inference

## Text to Audio
To quickly generate audio from an input text prompt, run
```shell
python scripts/text_to_audio.py --prompt "Horses growl and clop hooves." --model "genau-full-l"
```
- This automatically downloads and uses the `genau-full-l` model with default settings. You may change these parameters or provide your own model config file and checkpoint path (see the sketch after this list).
- Available models include `genau-full-l` (1.25B parameters) and `genau-full-s` (493M parameters).
- These models are trained to generate ambient sounds and are incapable of generating speech or music.
- Outputs will be saved by default at `samples/model_output`, using the provided prompt as the file name.
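
If `scripts/text_to_audio.py` follows the same flag convention as `scripts/generate_and_eval.py` in the Evaluation section below, a custom run might look like the following sketch; the `-c`/`-ckpt` flags here are an assumption, not confirmed by this README:

```shell
# Hypothetical invocation; assumes text_to_audio.py accepts -c/-ckpt
# like generate_and_eval.py does
python scripts/text_to_audio.py --prompt "Rain falls on a tin roof." \
    -c <path-to-config-file> -ckpt <path-to-checkpoint>
```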

## Gradio Demo
Run a local interactive demo with Gradio:
```shell
python app_text2audio.py
```

## Inference on a list of prompts
Optionally, you may prepare a `.txt` file with your target prompts and run

```shell
python scripts/inference_file.py --list_inference <path-to-prompts-file> --model <model_name>

# Example
python scripts/inference_file.py --list_inference samples/prompts_list.txt --model "genau-full-l"
```
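
The prompts file is plain text; assuming one prompt per line (an illustrative guess, since the bundled `samples/prompts_list.txt` is not shown here), it might look like:

```
A dog barks while birds chirp in the background.
Rain falls steadily on a tin roof.
Waves crash against a rocky shore.
```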


# Training

### Dataset
Please refer to the [dataset preparation README](../dataset_preperation/README.md) for instructions on downloading our dataset or preparing your own.

### GenAU
- Prepare a YAML config file for your experiments. A sample config file is provided at `settings/simple_runs/genau.yaml`.
- Specify your project name and provide your Wandb key in the config file (see the sketch after this list). A Wandb key can be obtained from [https://wandb.ai/authorize](https://wandb.ai/authorize)
- Optionally, provide your S3 bucket and folder to save intermediate checkpoints.
- By default, checkpoints will be saved under `run_logs/genau/train` at the same level as the config file.
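
As a rough sketch, the Wandb- and checkpoint-related entries might look like the following; the key names are illustrative assumptions, so copy the real ones from `settings/simple_runs/genau.yaml` rather than from this snippet:

```yaml
# Illustrative sketch only; key names are assumptions, not the actual schema
project_name: my-genau-experiments   # your Wandb project
wandb_key: <your-wandb-api-key>      # from https://wandb.ai/authorize
s3_bucket: my-bucket                 # optional: intermediate checkpoint uploads
s3_folder: genau/checkpoints
```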

```shell
# Training GenAU from scratch
python train/genau.py -c settings/simple_runs/genau.yaml
```

For multinode training, run
```shell
python -m torch.distributed.run --nproc_per_node=8 train/genau.py -c settings/simple_runs/genau.yaml
```
### Finetuning GenAU

- Prepare your custom dataset and obtain the dataset keys following the [dataset preparation README](../dataset_preperation/README.md).
- Make a copy of the default `genau-full-l` config file, which you can find under `pretrained_models/genau/genau-full-l.yaml`, and adjust it.
- Add ids for your dataset keys under the `dataset2id` attribute in the config file (see the sketch after this list).
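
For example, the `dataset2id` entry might look like the sketch below; match the naming and id scheme already present in the pretrained config:

```yaml
dataset2id:
  # ...existing entries from the pretrained config stay as-is...
  my_custom_dataset: 100   # hypothetical key and id for your new dataset
```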

```shell
# Finetuning GenAU
python train/genau.py --reload_from_ckpt 'genau-full-l' \
--config <path-to-config-file> \
--dataset_keys "<dataset_key_1>" "<dataset_key_2>" ...
```


### 1D VAE (Optional)
By default, we offer a pre-trained 1D-VAE for GenAU training. If you prefer, you can train your own VAE by following the provided instructions.
- Prepare your own dataset following the instructions in the [dataset preparation README](../dataset_preperation/README.md)
- Prepare your YAML config file in a similar way to the GenAU config file
- A sample config file is provided at `settings/simple_runs/1d_vae.yaml`

```shell
python train/1d_vae.py -c settings/simple_runs/1d_vae.yaml
```

# Evaluation
- We follow [audioldm](https://github.com/haoheliu/AudioLDM-training-finetuning) to perform our evaluations.
- By default, the models will be evaluated periodically during training, as specified in the config file. For each evaluation, a folder with the generated audio will be saved under `run_logs/train` at the same level as the specified config file.
- The code identifies the test dataset in an already existing folder according to the number of samples. If you would like to test on a new test dataset, register it in `scripts/generate_and_eval`.

```shell

# Evaluate on an existing generated folder
python scripts/evaluate.py --log_path <path-to-the-experiment-folder>

# Generate test audio from a pre-trained checkpoint and run evaluation
python scripts/generate_and_eval.py -c <path-to-config> -ckpt <path-to-pretrained-ckpt>
```
The evaluation results will be saved in a JSON file at the same level as the generated audio folder.

# Cite this work
If you found this useful, please consider citing our work

```TODO
```

# Acknowledgements
Our audio generation and evaluation codebase relies on [audioldm](https://github.com/haoheliu/AudioLDM-training-finetuning). We sincerely thank the authors for openly sharing their code.

19 changes: 19 additions & 0 deletions GenAU/audioldm_eval/.gitignore
@@ -0,0 +1,19 @@
ckpt/
*.pth
*.wav
*.npy
*.egg-info
__pycache__
vctk_test
.DS_*
script/*
datasets/*
test_fad/*
*.ckpt
*.json
audio
build
dist
*.pkl
pickle_check.py
test.py
20 changes: 20 additions & 0 deletions GenAU/audioldm_eval/LICENSE
@@ -0,0 +1,20 @@
Copyright (c) 2012-2022 Scott Chacon and others

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
104 changes: 104 additions & 0 deletions GenAU/audioldm_eval/README.md
@@ -0,0 +1,104 @@
# Audio Generation Evaluation

This toolbox aims to unify audio generation model evaluation for easier future comparison.

## Quick Start

First, prepare the environment
```shell
pip install git+https://github.com/haoheliu/audioldm_eval
```

Second, generate the test dataset by running
```shell
python3 gen_test_file.py
```

Finally, perform a test run. A result for reference is attached [here](https://github.com/haoheliu/audioldm_eval/blob/main/example/paired_ref.json).
```shell
python3 test.py # Evaluate and save the json file to disk (example/paired.json)
```
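
The saved JSON maps metric names to scalar scores, along the following lines; the keys shown are illustrative, so treat `example/paired_ref.json` as the authoritative format:

```json
{
  "frechet_audio_distance": 2.11,
  "frechet_distance": 18.65,
  "inception_score_mean": 7.32,
  "kl_softmax": 1.42
}
```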

## Evaluation metrics
We have the following metrics in this toolbox:

- Recommended:
- FAD: Frechet audio distance
- ISc: Inception score
- Others, for reference:
- FD: Frechet distance, realized by PANNs, a state-of-the-art audio classification model
- KID: Kernel inception score
- KL: KL divergence (softmax over logits)
- KL_Sigmoid: KL divergence (sigmoid over logits)
  - PSNR: Peak signal-to-noise ratio
- SSIM: Structural similarity index measure
- LSD: Log-spectral distance
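
As a concrete reference for the two KL variants above, here is a minimal PyTorch sketch; it is illustrative and not the toolbox's exact implementation:

```python
import torch
import torch.nn.functional as F

def kl_softmax(pred_logits: torch.Tensor, target_logits: torch.Tensor) -> torch.Tensor:
    # KL between class distributions obtained by softmax over classifier logits.
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(
        F.log_softmax(pred_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )

def kl_sigmoid(pred_logits: torch.Tensor, target_logits: torch.Tensor) -> torch.Tensor:
    # Sum of per-class Bernoulli KL terms (sigmoid over logits), averaged over the batch.
    eps = 1e-8
    p = target_logits.sigmoid().clamp(eps, 1 - eps)
    q = pred_logits.sigmoid().clamp(eps, 1 - eps)
    kl = p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()
    return kl.sum(dim=-1).mean()
```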

The evaluation function will accept the paths of two folders as main parameters.
1. If the two folders contain **the same number of files with matching names**, the evaluation will run in **paired mode**.
2. If the two folders contain **different numbers of files, or files with mismatched names**, the evaluation will run in **unpaired mode**.

**These metrics will only be calculated in paired mode**: KL, KL_Sigmoid, PSNR, SSIM, LSD.
In unpaired mode, these metrics will return -1.
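
A minimal sketch of this pairing rule, as an illustration rather than the toolbox's actual check:

```python
from pathlib import Path

def is_paired(gen_dir: str, ref_dir: str) -> bool:
    # Paired mode iff both folders contain files with matching names
    # (which also implies the same number of files).
    gen_names = {p.name for p in Path(gen_dir).iterdir() if p.is_file()}
    ref_names = {p.name for p in Path(ref_dir).iterdir() if p.is_file()}
    return len(gen_names) > 0 and gen_names == ref_names
```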

## Evaluation on AudioCaps and AudioSet

The AudioCaps test set consists of audio files with multiple text annotations. To evaluate the performance of AudioLDM, we randomly selected one annotation per audio file, which can be found in the [accompanying json file](https://github.com/haoheliu/audioldm_eval/tree/c9e936ea538c4db7e971d9528a2d2eb4edac975d/example/AudioCaps).

Given that the AudioSet evaluation set contains approximately 20,000 audio files, evaluating on the entire set may be impractical for audio generative models. As a result, we randomly selected 2,000 audio files for evaluation, with the corresponding annotations available in a [json file](https://github.com/haoheliu/audioldm_eval/tree/c9e936ea538c4db7e971d9528a2d2eb4edac975d/example/AudioSet).

For more information on our evaluation process, please refer to [our paper](https://arxiv.org/abs/2301.12503).

## Example

```python
import torch
from audioldm_eval import EvaluationHelper

# GPU acceleration is preferred
device = torch.device(f"cuda:{0}")

generation_result_path = "example/paired"
target_audio_path = "example/reference"

# Initialize a helper instance
evaluator = EvaluationHelper(16000, device)

# Perform evaluation; results will be printed and saved as JSON
metrics = evaluator.main(
    generation_result_path,
    target_audio_path,
    limit_num=None,  # to evaluate only X (int) pairs of data, set limit_num=X
)
```

## Note

- Update on 24 June 2023:
  - **Issues on model evaluation:** I found that the PANNs-based Frechet Distance (FD) and KL score are sometimes not as robust as FAD. For example, when the generated audio is all silence, FD and KL still indicate that the model performs very well, while FAD and Inception Score (IS) reflect the model's genuinely poor performance. The resampling method applied to the audio can also significantly affect the FD (±30) and KL (±0.4) scores.
  - To address this issue, in another branch of this repo ([passt_replace_panns](https://github.com/haoheliu/audioldm_eval/tree/passt_replace_panns)), I replaced the PANNs model with PaSST, which I found to be more robust to the resampling method and other trivial mismatches.

  - **Update on code:** The calculation of FAD is slow. Now, after each calculation over a folder, the code saves the FAD features into an .npy file for later reuse.

## TODO

- [ ] Add pretrained AudioLDM model.
- [ ] Add CLAP score

## Cite this repo

If you found this tool useful, please consider citing
```bibtex
@article{liu2023audioldm,
title={AudioLDM: Text-to-Audio Generation with Latent Diffusion Models},
author={Liu, Haohe and Chen, Zehua and Yuan, Yi and Mei, Xinhao and Liu, Xubo and Mandic, Danilo and Wang, Wenwu and Plumbley, Mark D},
journal={arXiv preprint arXiv:2301.12503},
year={2023}
}
```

## Reference

> https://github.com/toshas/torch-fidelity
> https://github.com/v-iashin/SpecVQGAN
7 changes: 7 additions & 0 deletions GenAU/audioldm_eval/audioldm_eval/__init__.py
@@ -0,0 +1,7 @@
from .metrics.fid import calculate_fid
from .metrics.isc import calculate_isc
from .metrics.kid import calculate_kid
from .metrics.kl import calculate_kl
from .eval import EvaluationHelper

print("2023 -06 -22")
Empty file.