Switch from soundfile to torchaudio #360

hbredin · 2020-04-22T13:17:55Z

rename pyannote.audio.features.utils into pyannote.audio.features.io
add unit tests for pyannote.audio.features.io on WAV files
add speed tests for pyannote.audio.features.io on WAV files
switch from soundfile to torchaudio
add unit tests for pyannote.audio.features.io on main formats supported by torchaudio
add speed tests for pyannote.audio.features.io on main formats supported by torchaudio


hbredin · 2020-04-22T13:22:42Z

Originally posted by @nryant in #347 (comment)

Yes, if there are not currently unit tests, those should definitely be added first. Since this code is relevant for performance, we should probably also add some basic benchmarks so that we can check for regressions. Perhaps using airspeed velocity?

As for changes to the underlying audio I/O in order to support additional formats, ideally the following four audio file formats would be supported at a minimum:

wav
flac
mp3 (since it shows up often in very large corpora such as Common Voice or the raw chapters from which LibriSpeech was sliced)
sphere files NOT compressed using SHORTEN (since they appear so frequently in older LDC and NIST packages)

librosa supports the last two formats, though random reads from MP3 files are SLOOOW. Torchaudio also supports all four formats and is quite a bit faster with MP3, so that might also be an option. Direct comparison of librosa and torchaudio for loading an entire 8-minute-long, 16 kHz recording into a Tensor (mean times in seconds as reported by IPython's timeit command):

format	librosa	torchaudio
wav	0.017831	0.058931
flac	0.115104	0.135125
mp3	0.580566	0.449611
sph	0.017851	0.058338

And the same table for reading a random 500 ms chunk of the same recording:

format	librosa	torchaudio
wav	0.000174	0.000125
flac	0.000405	0.000253
mp3	0.256025	0.003628
sph	0.000165	0.000142
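
The two measurements above (whole-file load versus a random 500 ms chunk) can be reproduced with a small harness. The following is only a sketch of the methodology, not the code that produced the tables: it uses the standard-library `wave` module as a stand-in reader so that it runs without librosa or torchaudio, and `write_silence`, `read_all`, `read_chunk` and `mean_time` are hypothetical helper names. To compare real backends, the two reader functions would be replaced by the corresponding librosa or torchaudio calls.

```python
import random
import tempfile
import timeit
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # 16 kHz, as in the comparison above


def write_silence(path, seconds, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit PCM WAV file of silence (fixture for the sketch)."""
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * int(seconds * sample_rate))


def read_all(path):
    """Stand-in for 'load the entire recording': return all raw frames."""
    with wave.open(str(path), "rb") as f:
        return f.readframes(f.getnframes())


def read_chunk(path, start_seconds, duration_seconds):
    """Stand-in for a random-access read: seek, then read only the chunk."""
    with wave.open(str(path), "rb") as f:
        rate = f.getframerate()
        f.setpos(int(start_seconds * rate))
        return f.readframes(int(duration_seconds * rate))


def mean_time(func, repeat=20):
    """Mean wall-clock time in seconds of one call to func."""
    return timeit.timeit(func, number=repeat) / repeat


def run_benchmark(total_seconds=480.0, chunk_seconds=0.5):
    """Time a full load and a random chunk read on an 8-minute file."""
    path = Path(tempfile.mkdtemp()) / "benchmark.wav"
    write_silence(path, total_seconds)

    def random_chunk():
        start = random.uniform(0.0, total_seconds - chunk_seconds)
        return read_chunk(path, start, chunk_seconds)

    return {
        "full": mean_time(lambda: read_all(path)),
        "chunk": mean_time(random_chunk),
    }


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name}\t{seconds:.6f}")
```

Drawing a new random start for every call keeps the chunk timing from being tied to one particular file position.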

Do you want to open an issue and/or pull request to organize this work?

hbredin · 2020-04-22T13:23:12Z

Originally posted by @nryant in #347 (comment)

After looking at the librosa and audioread code, there are reasons other than speed not to rely on librosa for MP3. In particular, audioread will use one of the supported backends available on the system, which may end up being ffmpeg. In that case, the estimated (and potentially inaccurate) MP3 duration will be parsed from the output logged by ffmpeg to STDOUT, but only to the nearest 100 ms:

https://github.com/beetbox/audioread/blob/8d02710f5cd9f8db1290e4a3a08115e222b0790a/audioread/ffdec.py#L268
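
To give a sense of what a 100 ms resolution means in samples, here is a small illustration. It is not audioread's implementation: `parse_duration_tenths` is a hypothetical parser that keeps only one decimal digit of an ffmpeg-style `Duration: HH:MM:SS.xx` line, and `max_sample_error` is a hypothetical helper doing the arithmetic.

```python
import re


def parse_duration_tenths(log_line):
    """Hypothetical parser: keep only tenths of a second from a
    'Duration: HH:MM:SS.xx' line (illustration, not audioread's code)."""
    match = re.search(r"Duration: (\d+):(\d+):(\d+)\.(\d)", log_line)
    if match is None:
        raise ValueError("no duration found in log line")
    hours, minutes, seconds, tenths = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + tenths / 10


def max_sample_error(sample_rate, resolution_seconds=0.1):
    """Number of samples spanned by one step of the duration resolution."""
    return int(round(sample_rate * resolution_seconds))


if __name__ == "__main__":
    line = "Duration: 00:08:00.37, start: 0.000000"
    print(parse_duration_tenths(line))  # hundredths digit is dropped
    print(max_sample_error(16000))      # samples per 100 ms at 16 kHz
```

At 16 kHz, a duration known only to the nearest 100 ms leaves the file length uncertain by up to 1600 samples, which matters when chunks are drawn close to the end of a file.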

Maybe it would be safest to depend on either torchaudio alone or a combination of soundfile and torchaudio.

hbredin · 2020-04-22T13:26:52Z

tests/data contains a bunch of 30-second AMI excerpts that can be used as sample WAV files for unit and speed tests.

If that helps, they even come with a pyannote.database custom protocol that can be used like this.

mogwai · 2020-11-10T12:49:07Z

Closed by #492

hbredin changed the title ~~Switch from soundfile to torchaudio~~ Switch from soundfile to torchaudio Apr 22, 2020

hbredin mentioned this issue Apr 22, 2020

.mp3 files not supported for SAD #347

Closed

hbredin added this to the Version 2.0 milestone Sep 17, 2020

mogwai linked a pull request Nov 10, 2020 that will close this issue

feat: Using torchaudio for IO #492

Merged

mogwai self-assigned this Nov 11, 2020

mogwai closed this as completed Nov 11, 2020
