```bash
pip install fad_pytorch
```
- runs in parallel on multiple processors and multiple GPUs (via `accelerate`; see the launch sketch after this list)
- supports multiple embedding methods:
  - VGGish and PANN, both mono @ 16kHz
  - OpenL3 and (LAION-)CLAP, stereo @ 48kHz
- uses publicly-available pretrained checkpoints for music (+ other sources) for those models. (If you want speech, submit a PR or an Issue; I don’t do speech.)
- favors ops in PyTorch rather than numpy (or tensorflow)
- `fad_gen` supports local data read or WebDataset (audio data stored in S3 buckets)
- runs on CPU, CUDA, or MPS
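Since the parallelism comes via `accelerate`, multi-GPU runs can presumably be driven through accelerate’s own CLI. A hypothetical sketch only; the exact launch incantation for these scripts is my assumption, not something documented here (`fad_embed` and its arguments are described below):

```bash
accelerate config   # one-time: describe your machine / number of GPUs

# Hypothetical multi-GPU launch of the embedding step; `$(which fad_embed)`
# resolves the installed console script to a path accelerate can run.
accelerate launch $(which fad_embed) [options] real_audio/ fake_audio/
```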
This is designed to be run as 3 command-line scripts in succession. The latter 2 (`fad_embed` and `fad_score`) are probably what most people will want:

1. `fad_gen`: produces directories of real & fake audio (given real data). See the `fad_gen` documentation for the calling sequence.
2. `fad_embed [options] <real_audio_dir> <fake_audio_dir>`: produces directories of embeddings of the real & fake audio.
3. `fad_score [options] <real_emb_dir> <fake_emb_dir>`: reads the embeddings & generates the FAD score, for real (“$r$”) and fake (“$f$”):
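$$ \mathrm{FAD} = \lVert \mu_r - \mu_f \rVert^2 + \mathrm{tr}\left( \Sigma_r + \Sigma_f - 2\sqrt{\Sigma_r \Sigma_f} \right) $$

(i.e., the standard Fréchet distance between Gaussians fit to the two sets of embeddings, where $\mu$ and $\Sigma$ are each set’s mean and covariance)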
See the Documentation Website.
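As a sketch of what an end-to-end run might look like (directory names are placeholders, and the embedding-output paths are my assumption; check each script’s options and the docs for the real calling sequences):

```bash
# 1. Generate (or point at) directories of real & fake audio -- see the fad_gen docs for its arguments
fad_gen ...

# 2. Embed both sets of audio files
fad_embed [options] real_audio/ fake_audio/

# 3. Score the two sets of embeddings (output dir names here are assumed, not confirmed)
fad_score [options] real_emb/ fake_emb/
```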
- “`RuntimeError: CUDA error: invalid device ordinal`”: This happens when you get a “bad node” on an AWS cluster. I haven’t yet figured out what causes it or how to fix it. Workaround: just add the current node to your SLURM `--exclude` list (a sketch follows this list), exit, and retry. Note: it may take as many as 5 to 7 retries before you get a “good node”.
- “FAD scores obtained from different embedding methods are wildly different!” …Yea. It’s not obvious that scores from different embedding methods should be comparable. Rather, compare different groups of audio files using the same embedding method, and/or check that FAD scores go down as similarity improves.
- “FAD score for the same dataset repeated (twice) is not exactly zero!” …Yea. There seems to be an uncertainty of around ±0.008. I’d say, don’t quote any numbers past the first decimal point.
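A minimal sketch of that `--exclude` workaround (`bad-node-01` is a placeholder for whatever node you landed on):

```bash
hostname                                   # note which node you're currently on
srun --exclude=bad-node-01 <your command>  # relaunch, skipping that node
# or, in a batch script, add:
#SBATCH --exclude=bad-node-01
```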
This repo is still fairly “bare bones” and will benefit from more documentation and features as time goes on. Note that it is written using nbdev, so the things to do are (a consolidated shell version follows this list):

- Fork this repo
- Clone your fork to your (local) machine
- Install nbdev: `python3 -m pip install -U nbdev`
- Make changes by editing the notebooks in `nbs/`, not the `.py` files in `fad_pytorch/`
- Run `nbdev_export` to export notebook changes to the `.py` files
- For good measure, run `nbdev_install_hooks` and `nbdev_clean` - especially if you’ve added any notebooks
- Do a `git status` to see all the `.ipynb` and `.py` files that need to be added & committed
- `git add` those files, then `git commit`, and then `git push`
- Take a look in your fork’s GitHub Actions tab, and see if the “test” and “deploy” CI runs finish properly (green light) or fail (red light)
- Once you get green lights, send in a Pull Request!
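For reference, the edit-export-commit loop from the list above as one shell sequence (the commit message is a placeholder):

```bash
python3 -m pip install -U nbdev    # one-time setup
nbdev_install_hooks                # one-time: git hooks that keep notebooks clean

# ...edit the notebooks in nbs/...

nbdev_export                       # sync notebook changes into fad_pytorch/*.py
nbdev_clean                        # scrub notebook metadata
git status                         # see which .ipynb and .py files changed
git add nbs/ fad_pytorch/
git commit -m "describe your change"
git push
```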
Feel free to ask me for tips with nbdev; it has quite a learning curve. You can also ask on the fast.ai forums and/or the fast.ai Discord.
This repo is 2 weeks old. I’m not ready for this to be cited in your papers. I’d hate for there to be some mistake I haven’t found yet. Perhaps a later version will have citation info. For now, instead, there’s:
Disclaimer: Results from this repo are still a work in progress. While every effort has been made to test model outputs, the author takes no responsibility for mistakes. If you want to double-check via another source, see “Related Repos” below.
There are [several] others, but this one is mine. These repos didn’t have all the features I wanted, but I used them for inspiration: