.mp3 files not supported for SAD #347

dannima · 2020-04-06T04:28:38Z

Looks like pyannote-audio is using soundfile to load audio files, and soundfile does not support mp3 files. Is it possible to switch to audioread/ffmpeg?

hbredin · 2020-04-06T08:08:07Z

pyannote.audio relies on SoundFile because

it is faster than most alternatives
it supports seeking (which is used heavily during training to extract audio chunks at random positions)

The downside is, indeed, limited support for other audio formats.

For more insights, see this useful benchmark by @faroit: https://github.com/faroit/python_audio_loading_benchmark

That being said, once the model is trained, seeking is no longer a mandatory feature.
I'd gladly merge a PR that adds support for other formats at inference time.

An alternative option is to use your own pyannote.database preprocessor to pre-load the waveform using any library you see fit.

https://github.com/pyannote/pyannote-database#preprocessors

pyannote-audio/pyannote/audio/applications/config.py

Lines 119 to 124 in cd2f2b5

    # preprocessors:
    #    key:
    #       name: package.module.ClassName
    #       params:
    #          param1: value1
    #          param2: value2
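To make the preprocessor route more concrete, here is a minimal sketch of a callable that pre-loads the waveform. It is not code from pyannote: the class name `WaveformPreprocessor`, the `load_fn` parameter, and the helper `apply_preprocessors` are hypothetical, and it assumes the file dictionary stores the path under an `"audio"` key and that the result would be registered under a key such as `"waveform"`. The expected array shape and dtype are not covered here and would need checking against the pyannote.audio version in use.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical type: a loader takes a path and returns (samples, sample_rate).
Loader = Callable[[str], Tuple[Any, int]]


class WaveformPreprocessor:
    """Callable that pre-loads the waveform of a file (illustrative name)."""

    def __init__(self, load_fn: Loader, expected_rate: int = 16000):
        self.load_fn = load_fn
        self.expected_rate = expected_rate

    def __call__(self, current_file: Dict[str, Any]) -> Any:
        # Assumption: the file dictionary stores the audio path under "audio".
        samples, rate = self.load_fn(str(current_file["audio"]))
        if rate != self.expected_rate:
            raise ValueError(
                f"expected {self.expected_rate} Hz, got {rate} Hz "
                f"for {current_file['audio']}"
            )
        return samples


def apply_preprocessors(current_file, preprocessors):
    """Roughly mimic how each preprocessor fills in one key of the file."""
    out = dict(current_file)
    for key, preprocessor in preprocessors.items():
        out[key] = preprocessor(current_file)
    return out
```

With a library that can decode MP3, `load_fn` could be a thin wrapper such as `lambda path: librosa.load(path, sr=16000)`, which returns the samples and the sample rate.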

nryant · 2020-04-16T14:47:40Z

Hi @hbredin, those audio loading benchmarks were originally run with a now two-year-old version of librosa. Since then, librosa has switched to using soundfile by default, falling back to an alternate backend only if soundfile cannot handle the format. When rerun with modern versions of librosa, the performance is MUCH BETTER.

For WAV and FLAC, the pure soundfile solution and librosa are essentially identical. If you'd like, I could make a pull request replacing the soundfile references with wrappers around librosa.

faroit · 2020-04-16T20:12:56Z

@nryant thanks for reminding me to finish this PR ;-)

hbredin · 2020-04-21T12:59:10Z

Thanks for the offer, @nryant.

I'd definitely merge such a PR adding support for more file formats.

However, I also want to make sure this does not slow things down. In particular (but this is true for all methods and functions in pyannote.audio.features.utils), we need to make sure that RawAudio.crop is at least as fast as it currently is for WAV files.

I am also concerned about the lack of unit tests for pyannote.audio.features.utils.
These tests should probably be added first, to avoid any regression when switching to librosa (and also catch any future regression with future releases of librosa).
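As an illustration of the kind of regression test meant here, the sketch below checks that a seek-based crop returns exactly the same samples as slicing the fully loaded waveform. It uses only the standard library `wave` module as a stand-in: `crop` and `write_test_wav` are hypothetical helpers, not functions from pyannote.audio.features.utils, and a real test would call `RawAudio.crop` instead.

```python
import os
import struct
import tempfile
import unittest
import wave


def write_test_wav(path, n_samples=16000, rate=16000):
    """Write a mono 16-bit WAV whose samples follow a known sawtooth."""
    samples = [(i % 2000) - 1000 for i in range(n_samples)]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(struct.pack(f"<{n_samples}h", *samples))
    return samples


def crop(path, start, duration):
    """Stand-in for a crop function: seek first, then read only the chunk."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        first = int(round(start * rate))
        count = int(round(duration * rate))
        f.setpos(first)
        raw = f.readframes(count)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


class CropRegressionTest(unittest.TestCase):
    def setUp(self):
        fd, self.path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        self.samples = write_test_wav(self.path)

    def tearDown(self):
        os.remove(self.path)

    def test_crop_matches_slice_of_full_waveform(self):
        # 0.25 s to 0.75 s at 16 kHz corresponds to samples 4000..12000.
        chunk = crop(self.path, start=0.25, duration=0.5)
        self.assertEqual(chunk, self.samples[4000:12000])
```

The same test body could then be parametrized over each supported file format once the loading backend changes.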

nryant · 2020-04-21T17:24:07Z

Hi @hbredin ,

Yes, if there are not currently unit tests, those should definitely be added first. Since this code is relevant for performance, we should probably also add some basic benchmarks so that we can check for regressions. Perhaps using airspeed velocity?

As for changes to the underlying audio I/O to support additional formats, ideally at least the following four audio file formats would be supported:

wav
flac
mp3 (since it shows up often in very large corpora such as Common Voice or the raw chapters from which LibriSpeech was sliced)
sphere files NOT compressed using SHORTEN (since they appear so frequently in older LDC and NIST packages)

librosa supports the last two formats, though random reads from MP3 files are SLOOOW. Torchaudio also supports all four formats and is quite a bit faster with MP3, so that might also be an option. Here is a direct comparison of librosa and torchaudio for loading an entire 8-minute, 16 kHz recording into a Tensor (mean times in seconds, as reported by IPython's timeit command):

format	librosa (s)	torchaudio (s)
wav	0.017831	0.058931
flac	0.115104	0.135125
mp3	0.580566	0.449611
sph	0.017851	0.058338

And the same table for reading a random 500 ms chunk of the same recording:

format	librosa (s)	torchaudio (s)
wav	0.000174	0.000125
flac	0.000405	0.000253
mp3	0.256025	0.003628
sph	0.000165	0.000142
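For anyone wanting to reproduce this kind of measurement, here is a rough sketch of a benchmark that times a full load against a random 500 ms chunk read. It is not the code used for the tables above: it only exercises the standard library `wave` module on a generated silent WAV, and all function names are made up for illustration. Swapping `load_all` and `load_chunk` for librosa or torchaudio calls would give a comparable setup.

```python
import os
import random
import tempfile
import timeit
import wave

RATE = 16000  # sample rate of the generated test file, in Hz


def make_wav(path, seconds):
    """Write `seconds` of 16-bit mono silence."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(RATE)
        f.writeframes(bytes(2 * RATE * seconds))


def load_all(path):
    """Read the whole file."""
    with wave.open(path, "rb") as f:
        return f.readframes(f.getnframes())


def load_chunk(path, start, duration=0.5):
    """Seek to `start` seconds and read `duration` seconds."""
    with wave.open(path, "rb") as f:
        f.setpos(int(start * RATE))
        return f.readframes(int(duration * RATE))


def mean_time(fn, repeat=20):
    """Mean wall-clock time of one call, in seconds."""
    return timeit.timeit(fn, number=repeat) / repeat


def run_benchmark(seconds=480):
    """Time a full load and a random 500 ms read on a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        make_wav(path, seconds)
        rng = random.Random(0)  # fixed seed so runs are comparable
        full = mean_time(lambda: load_all(path))
        chunk = mean_time(
            lambda: load_chunk(path, rng.uniform(0, seconds - 0.5))
        )
        return {"full": full, "chunk": chunk}
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(run_benchmark())
```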

Do you want to open an issue and/or pull request to organize this work?

nryant · 2020-04-21T22:12:24Z

After looking at the librosa and audioread code, there are reasons beyond speed not to rely on librosa for MP3. In particular, audioread will use whichever supported backend is available on the system, which may end up being ffmpeg. In that case, the estimated (and potentially inaccurate) MP3 duration is parsed from the output that ffmpeg logs to stderr, but only to the nearest 100 ms:

https://github.com/beetbox/audioread/blob/8d02710f5cd9f8db1290e4a3a08115e222b0790a/audioread/ffdec.py#L268
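To put that 100 ms figure in perspective, here is a small worked example. It assumes the reported duration is rounded to the nearest 100 ms, as described above, and both function names are made up for illustration.

```python
def duration_uncertainty_in_samples(sample_rate, resolution_ms=100):
    """Worst-case error, in samples, of a duration rounded to `resolution_ms`."""
    # Rounding to the nearest step can be off by at most half a step.
    return sample_rate * resolution_ms // 2000


def safe_chunk_start_limit(reported_duration, chunk_duration, resolution=0.1):
    """Latest chunk start (in seconds) that stays inside the file
    even if the reported duration overestimates by half a step."""
    return max(0.0, reported_duration - resolution / 2 - chunk_duration)
```

At 16 kHz, `duration_uncertainty_in_samples(16000)` gives 800 samples, so a random chunk drawn right up to the reported end of the file could run past the actual end by that many samples.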

Maybe it would be safest to either depend on torchaudio or a combination of soundfile and torchaudio.
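A combination of two backends could look roughly like the sketch below: try each loader in order and fall back to the next one when decoding fails. This is only an assumption about how such a wrapper might be structured, not how pyannote.audio does it, and `load_with_fallback` and `AllBackendsFailed` are hypothetical names.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical type: a loader takes a path and returns (samples, sample_rate).
Loader = Callable[[str], Tuple[Any, int]]


class AllBackendsFailed(RuntimeError):
    """Raised when no backend could decode the file."""


def load_with_fallback(path: str, backends: Sequence[Tuple[str, Loader]]):
    """Try each (name, loader) pair in order; return (name, samples, rate)."""
    errors = []
    for name, loader in backends:
        try:
            samples, rate = loader(path)
        except Exception as exc:  # backend-specific error types vary by library
            errors.append(f"{name}: {exc}")
            continue
        return name, samples, rate
    raise AllBackendsFailed(f"could not read {path!r}: " + "; ".join(errors))
```

In practice the loaders could be thin wrappers around `soundfile.read` (which returns the data and the sample rate) and `torchaudio.load` (which returns a tensor and the sample rate), with soundfile first so that WAV and FLAC keep their current speed.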

hbredin · 2020-04-22T13:21:35Z

Thanks @nryant for this analysis -- that is very helpful!
torchaudio looks like a very good candidate.

I just created a new issue to continue the discussion and keep track of the switch.

Closing this one (will try to import your analysis in there).


Infinitay · 2023-03-03T16:58:30Z

I was having issues with this lately myself. For mp3s, my issue went away when I manually installed the latest soundfile package I could: v0.12.0, since that's the maximum version the dependency allows, even though 0.12.1 has recently been released.

nryant · 2023-03-03T17:59:22Z

I'm not 100% sure, but I believe torchaudio uses soundfile, which in turn depends on libsndfile for reading formats. As of version 1.1.0, libsndfile does handle MP3:

https://github.com/libsndfile/libsndfile/issues/258

However, which version of libsndfile the soundfile package uses is more complicated. For Linux, soundfile always uses the system version, though they are working toward building and distributing wheels for Linux systems:

https://github.com/bastibe/python-soundfile/issues/353

For OS X/Windows, they provide wheels which contain a pre-compiled shared library, though in my experience on OS X, if you already have libsndfile installed, soundfile often picks up the wrong library.
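One way to check which libsndfile a given environment actually ended up with is to ask soundfile directly. The sketch below is a small diagnostic, not part of pyannote. It relies on `soundfile.__libsndfile_version__` and `soundfile.available_formats()`, and assumes that MP3 support shows up under the key `"MP3"`. `describe_soundfile` is a made-up helper name.

```python
def describe_soundfile():
    """Return libsndfile version and MP3 availability, or None if unusable."""
    try:
        import soundfile
    except (ImportError, OSError):
        # ImportError: package missing; OSError: libsndfile could not be loaded.
        return None
    formats = soundfile.available_formats()
    return {
        "soundfile": soundfile.__version__,
        "libsndfile": soundfile.__libsndfile_version__,
        # Assumption: MP3 support is listed under the "MP3" key.
        "mp3": "MP3" in formats,
        "formats": sorted(formats),
    }


if __name__ == "__main__":
    print(describe_soundfile())
```

If the reported libsndfile version is older than expected, that would be consistent with soundfile picking up a system-wide library instead of the bundled one.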

Infinitay · 2023-03-03T18:34:38Z

Last I checked, torchaudio makes soundfile optional, but I believe it is the backend we primarily use. The more I learn about these dependencies, the harder it gets to tell which one is the main culprit, although I will look more into libsndfile personally. As I said, updating my soundfile package to v0.12.0 allowed me to use mp3s. However, I have a lot more m4as, and unless I convert them into mp3 or another format that soundfile supports, I can't process them directly. ~~I'll see if updating my libsndfile is possible and could help with that.~~

> though in my experience on OS X, if you already have libsndfile installed, soundfile often picks up the wrong library.

I wonder if that's where my problems first started on Windows: it may have chosen the wrong version. ~~I guess I'll find out if I manage to attempt this later.~~

Thanks for the additional information

EDIT: That was short-lived. soundfile v0.12.0 already uses libsndfile v1.2.0, which is currently the latest release. Looking at their issues, it doesn't seem like m4a will be supported yet, due to patents or something? At this rate, spending a few hours converting my files is much simpler.
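For the conversion route, a batch job along these lines is one option. This is only a sketch: it assumes ffmpeg is installed and on the PATH, and the choice of mono 16 kHz WAV output is an assumption for illustration, not a requirement stated in this thread. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_command(src: Path, dst_dir: Path, rate: int = 16000):
    """Build the ffmpeg call converting `src` to a mono WAV in `dst_dir`."""
    dst = dst_dir / (src.stem + ".wav")
    # -ac 1: downmix to mono; -ar: resample to `rate` Hz (both assumptions).
    return ["ffmpeg", "-i", str(src), "-ac", "1", "-ar", str(rate), str(dst)]


def convert_all(src_dir: Path, dst_dir: Path, pattern: str = "*.m4a"):
    """Convert every matching file, skipping outputs that already exist."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob(pattern)):
        cmd = ffmpeg_command(src, dst_dir)
        dst = Path(cmd[-1])
        if dst.exists():
            continue  # already converted on a previous run
        subprocess.run(cmd, check=True)
        converted.append(dst)
    return converted
```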

hbredin closed this as completed Apr 22, 2020

hbredin mentioned this issue Apr 22, 2020: Switch from soundfile to torchaudio #360 (closed, 6 tasks)

hbredin pushed a commit that referenced this issue Nov 9, 2022: setup: add support for soundfile 0.11 (#1140), commit 85bc101. Fixes #1069 #1096. Related to #347
