Switch from soundfile to torchaudio #360

Closed
6 tasks
hbredin opened this issue Apr 22, 2020 · 4 comments · Fixed by #492

Comments

@hbredin
Member

hbredin commented Apr 22, 2020

  • rename pyannote.audio.features.utils into pyannote.audio.features.io
  • add unit tests for pyannote.audio.features.io on WAV files
  • add speed tests for pyannote.audio.features.io on WAV files
  • switch from soundfile to torchaudio
  • add unit tests for pyannote.audio.features.io on main formats supported by torchaudio
  • add speed tests for pyannote.audio.features.io on main formats supported by torchaudio
@hbredin hbredin changed the title Switch from soundfile to torchaudio Switch from soundfile to torchaudio Apr 22, 2020
@hbredin
Member Author

hbredin commented Apr 22, 2020

Originally posted by @nryant in #347 (comment)

Yes, if there are not currently unit tests, those should definitely be added first. Since this code is relevant for performance, we should probably also add some basic benchmarks so that we can check for regressions. Perhaps using airspeed velocity?
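
For reference, an airspeed velocity (asv) benchmark can be a plain Python class: asv runs `setup` first and then times each method whose name starts with `time_`. Below is a minimal sketch under stated assumptions: `read_chunk` is a hypothetical stand-in loader built on the standard-library `wave` module (it would be swapped for whatever loading function the package ends up exposing), and the input is a synthetic 30-second silent WAV file rather than real data.

```python
import os
import random
import tempfile
import wave


def write_silent_wav(path, seconds, sample_rate=16000):
    """Write a mono 16-bit PCM WAV file filled with zeros."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * int(seconds * sample_rate))


def read_chunk(path, start_frame, num_frames):
    """Hypothetical stand-in loader: replace with the function under test."""
    with wave.open(path, "rb") as f:
        f.setpos(start_frame)
        return f.readframes(num_frames)


class TimeWavIO:
    """asv times every method whose name starts with ``time_``."""

    sample_rate = 16000
    seconds = 30  # arbitrary length chosen for this sketch

    def setup(self):
        # Runs before each benchmark, so file creation is not timed.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, "bench.wav")
        write_silent_wav(self.path, self.seconds, self.sample_rate)
        self.total = self.seconds * self.sample_rate
        self.chunk = self.sample_rate // 2  # 500 ms worth of frames
        self.rng = random.Random(0)  # fixed seed for repeatable offsets

    def teardown(self):
        self.tmpdir.cleanup()

    def time_read_whole_file(self):
        read_chunk(self.path, 0, self.total)

    def time_read_random_500ms(self):
        start = self.rng.randrange(self.total - self.chunk)
        read_chunk(self.path, start, self.chunk)
```

The two `time_` methods mirror the two scenarios that matter for this code path: loading a whole file and reading a short chunk at a random position.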

As for changing the underlying audio I/O to support additional formats, ideally at least the following four audio file formats would be supported:

  • wav
  • flac
  • mp3 (since it shows up often in very large corpora such as Common Voice or the raw chapters from which LibriSpeech was sliced)
  • sphere files NOT compressed using SHORTEN (since they appear so frequently in older LDC and NIST packages)

librosa supports the last two formats, though random reads from MP3 files are SLOOOW. Torchaudio also supports all four formats and is quite a bit faster with MP3, so that might also be an option. Direct comparison of librosa and torchaudio for loading an entire 8-minute, 16 kHz recording into a Tensor (mean times in seconds, as reported by IPython's timeit command):

| format | librosa (s) | torchaudio (s) |
|--------|-------------|----------------|
| wav    | 0.017831    | 0.058931       |
| flac   | 0.115104    | 0.135125       |
| mp3    | 0.580566    | 0.449611       |
| sph    | 0.017851    | 0.058338       |

And the same table for reading a random 500 ms chunk of the same recording:

| format | librosa (s) | torchaudio (s) |
|--------|-------------|----------------|
| wav    | 0.000174    | 0.000125       |
| flac   | 0.000405    | 0.000253       |
| mp3    | 0.256025    | 0.003628       |
| sph    | 0.000165    | 0.000142       |
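
To make the "random chunk" case concrete: for uncompressed PCM, reading a chunk is a matter of converting seconds to frame indices and seeking there. The following is a minimal sketch, assuming 16-bit mono WAV input and using only the standard-library `wave` module; `crop` and `write_ramp` are hypothetical helper names, not part of librosa, torchaudio, or pyannote.audio. Recent torchaudio releases accept comparable `frame_offset` and `num_frames` arguments in `torchaudio.load`, which are deliberately not used here so the snippet stays dependency-free.

```python
import array
import os
import sys
import tempfile
import wave


def write_ramp(path, seconds, sample_rate):
    """Write a 16-bit mono WAV whose n-th sample equals n % 30000."""
    samples = array.array("h", (n % 30000 for n in range(seconds * sample_rate)))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores PCM as little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(samples.tobytes())


def crop(path, start, duration):
    """Return (samples, sample_rate) for [start, start + duration) in seconds."""
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = f.getframerate()
        first = int(round(start * sample_rate))     # seconds -> frame index
        count = int(round(duration * sample_rate))  # seconds -> frame count
        f.setpos(first)  # jump straight to the first requested frame
        raw = f.readframes(count)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples, sample_rate


# Usage: read 500 ms starting one second into a 2-second synthetic file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ramp.wav")
    write_ramp(path, seconds=2, sample_rate=16000)
    chunk, rate = crop(path, start=1.0, duration=0.5)
```

Because only the requested frames are read, the cost scales with the chunk length rather than the file length, which is the behavior the second table measures.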

Do you want to open an issue and/or pull request to organize this work?

@hbredin
Member Author

hbredin commented Apr 22, 2020

Originally posted by @nryant in #347 (comment)

After looking at the librosa and audioread code, there are reasons besides speed not to rely on librosa for MP3. In particular, audioread uses whichever supported backend is available on the system, which may end up being ffmpeg. In that case, the estimated (and potentially inaccurate) MP3 duration is parsed from the output logged by ffmpeg, but only to the nearest 100 ms:

https://github.com/beetbox/audioread/blob/8d02710f5cd9f8db1290e4a3a08115e222b0790a/audioread/ffdec.py#L268
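
To illustrate why 100 ms resolution matters, here is a minimal sketch that parses a duration from an ffmpeg-style log line twice: once keeping a single fractional digit and once keeping all of them. The log line and both regular expressions are illustrative assumptions written for this example; they are not copied from audioread.

```python
import re

# Illustrative line in the style ffmpeg logs; not captured from a real run.
LOG_LINE = "  Duration: 00:08:00.09, start: 0.000000, bitrate: 64 kb/s"

# Hypothetical patterns: the first keeps one fractional digit, the second all.
TENTHS = re.compile(r"Duration: (\d+):(\d+):(\d+)\.(\d)")
FULL = re.compile(r"Duration: (\d+):(\d+):(\d+)\.(\d+)")


def parse_duration(line, pattern):
    """Return the duration in seconds encoded in an ffmpeg-style log line."""
    hours, minutes, seconds, frac = pattern.search(line).groups()
    whole = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return whole + int(frac) / 10 ** len(frac)


coarse = parse_duration(LOG_LINE, TENTHS)  # fractional part cut to tenths
fine = parse_duration(LOG_LINE, FULL)      # fractional part kept in full

sample_rate = 16000
lost_samples = round((fine - coarse) * sample_rate)
print(f"coarse={coarse:.2f}s fine={fine:.2f}s lost={lost_samples} samples")
```

At 16 kHz, an error of just under 100 ms corresponds to well over a thousand samples, which is enough to misplace the end of a file when cropping chunks near it.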

Maybe it would be safest to depend either on torchaudio alone or on a combination of soundfile and torchaudio.
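
One way the combined option could look is a small routing step that picks a backend from the file extension. This is only a sketch: the table below and the `pick_backend` helper are hypothetical, and the particular split (soundfile for wav and flac, torchaudio for mp3 and sph) is an assumption made for illustration, not a decision taken in this thread.

```python
import os

# Hypothetical routing table; the split is an assumption for illustration.
BACKEND_BY_EXTENSION = {
    ".wav": "soundfile",
    ".flac": "soundfile",
    ".mp3": "torchaudio",
    ".sph": "torchaudio",
}


def pick_backend(path):
    """Return the name of the backend that should load `path`."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return BACKEND_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(
            f"unsupported audio format: {extension or '<none>'}"
        ) from None
```

Keeping the mapping in one place would also make it easy to benchmark each format against both backends before settling on a split.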

@hbredin
Member Author

hbredin commented Apr 22, 2020

tests/data contains a bunch of 30-second AMI excerpts that can be used as sample WAV files for unit and speed tests.

If that helps, they even come with a pyannote.database custom protocol that can be used like this.

@hbredin hbredin added this to the Version 2.0 milestone Sep 17, 2020
@mogwai
Contributor

mogwai commented Nov 10, 2020

Closed by #492

@mogwai mogwai linked a pull request Nov 10, 2020 that will close this issue
@mogwai mogwai self-assigned this Nov 11, 2020
@mogwai mogwai closed this as completed Nov 11, 2020