
Project 3 Generative Audio

Parker Addison, [email protected]

Abstract

Which bird songs are we used to hearing? Which may we be hearing less and less of? And which will we never hear again, save in rare recordings? Perhaps we can experience the past, present, and soon-to-be-gone future as a way to reflect on our impact on birds.

The result of this art project is a latent space interpolation between bird songs from species that are Common, Endangered, and now Extinct.

I hand-selected two songs each from common and endangered species, and three songs from extinct species. The chosen birds in each category were as follows:

Common - Mourning Dove, American Robin
Endangered - Snowy Plover, Spectacled Eider
Extinct - Kauai O'o, Huia, ???*

I encoded my training samples as Mel spectrograms, then built and trained a convolutional autoencoder on the spectrograms with an 8x compression factor. After choosing the desired birdsongs, I created three interpolated frames between each pair of encoded bird samples in the latent space, then decoded the interpolated frames and applied a sharpening filter. Finally, I concatenated these frames into a single spectrogram and converted it back to an audio file.
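As a rough sketch of that pipeline (not the exact notebook code, which lives in proj2.ipynb), the example below linearly interpolates between encoded spectrograms, decodes each point, sharpens it with an unsharp mask, and inverts the concatenated log-Mel spectrogram back to audio with librosa's Griffin-Lim-based `mel_to_audio`. The `encoder`/`decoder` models, the sample rate, and the unsharp-mask sharpening are assumptions standing in for the actual implementation.

```python
import numpy as np
import librosa
from scipy.ndimage import gaussian_filter

def interpolate_birdsongs(specs, encoder, decoder, steps=3, sr=22050):
    """Interpolate between chosen birdsong spectrograms in latent space and return audio."""
    # Encode each chosen log-Mel spectrogram (shape: mels x frames) into the latent space.
    codes = [encoder.predict(s[np.newaxis, ..., np.newaxis])[0] for s in specs]

    # Build the latent path: each original code followed by `steps` evenly spaced
    # interpolations toward the next code.
    path = []
    for a, b in zip(codes[:-1], codes[1:]):
        for t in np.linspace(0.0, 1.0, steps + 2)[:-1]:
            path.append((1 - t) * a + t * b)
    path.append(codes[-1])

    # Decode every point on the path and lightly sharpen it (unsharp mask).
    frames = []
    for z in path:
        frame = decoder.predict(z[np.newaxis])[0, ..., 0]
        blurred = gaussian_filter(frame, sigma=1)
        frames.append(frame + 0.5 * (frame - blurred))

    # Concatenate along the time axis and invert the log-Mel spectrogram to audio.
    full_spec = np.concatenate(frames, axis=1)
    return librosa.feature.inverse.mel_to_audio(librosa.db_to_power(full_spec), sr=sr)
```

The returned waveform can then be written to disk, e.g. `soundfile.write("output2.wav", audio, 22050)`.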

Model/Data

  • Training data: the bird songs dataset from Nanni et al. 2016 is not included in this repository. It is a ~10GB collection of 2814 bird song syllables from 46 different species of birds in southern Brazil. Note that it appears all syllables have been looped until the audio is at least 1min 20s long. I converted each clip into 3-second splits, and converted each split into a Mel spectrogram (saved as a numpy array), producing a total of 79,377 spectrogram samples (see the preprocessing sketch after this list).
  • Convolutional autoencoder weights: model*.h5, trained for up to 90 epochs on 5000 random samples drawn from the training set above (an architecture sketch follows this list)
  • Testing/chosen interpolation samples: manually chosen; see References
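Preprocessing sketch referenced in the training-data bullet above. This is a minimal, hedged version of the split-and-spectrogram step; the sample rate, Mel-band count, and dB scaling are assumptions rather than the exact parameters used.

```python
import numpy as np
import librosa

SR = 22050          # assumed sample rate; the actual rate is not stated in this README
CLIP_SECONDS = 3    # 3-second splits, as described above
N_MELS = 128        # assumed number of Mel bands

def clip_to_spectrograms(path):
    """Split one bird song clip into 3-second chunks and return a Mel spectrogram per chunk."""
    y, sr = librosa.load(path, sr=SR)
    samples_per_clip = sr * CLIP_SECONDS
    specs = []
    for start in range(0, len(y) - samples_per_clip + 1, samples_per_clip):
        chunk = y[start:start + samples_per_clip]
        mel = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=N_MELS)
        # Work in log (dB) space so the network sees a perceptually sensible dynamic range.
        specs.append(librosa.power_to_db(mel, ref=np.max))
    return specs

# Example: save each split as its own numpy array.
# for i, spec in enumerate(clip_to_spectrograms("some_bird_song.wav")):
#     np.save(f"spectrograms/some_bird_song_{i}.npy", spec)
```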
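Architecture sketch referenced in the weights bullet above. The real architecture lives in proj2.ipynb and model*.h5; this is just one plausible Keras layout that yields the stated 8x compression, assuming spectrogram patches cropped/padded to 128x128 (a 128x128x1 input encoded to a 16x16x8 code, 16,384 values down to 2,048).

```python
from tensorflow.keras import layers, models

INPUT_SHAPE = (128, 128, 1)   # assumed log-Mel patch: Mel bands x time frames x channel
LATENT_SHAPE = (16, 16, 8)    # 2,048 latent values vs. 16,384 inputs -> 8x compression

def build_models():
    # Encoder: three conv + pool stages halve each spatial axis per stage.
    enc_in = layers.Input(shape=INPUT_SHAPE)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(enc_in)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    enc_out = layers.MaxPooling2D(2)(x)
    encoder = models.Model(enc_in, enc_out, name="encoder")

    # Decoder: mirror the encoder, upsampling back to the input resolution.
    dec_in = layers.Input(shape=LATENT_SHAPE)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(dec_in)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    dec_out = layers.Conv2D(1, 3, activation="linear", padding="same")(x)
    decoder = models.Model(dec_in, dec_out, name="decoder")

    # End-to-end autoencoder trained to reconstruct its input spectrogram.
    autoencoder = models.Model(enc_in, decoder(encoder(enc_in)), name="autoencoder")
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder, decoder

# autoencoder, encoder, decoder = build_models()
# autoencoder.fit(train_specs, train_specs, epochs=90, batch_size=32)
```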

Code

See proj2.ipynb.

Results

See output2.wav, spectro.png.

Both are representations of the final interpolated audio.

Technical Notes

  • Requires ffmpeg and librosa for working with audio and spectrograms.

References

Papers/Articles:

Datasets:

Other ideas/art projects involving ML and birdsongs that I stumbled upon while working:
