
.mp3 files not supported for SAD #347

Closed
dannima opened this issue Apr 6, 2020 · 10 comments

Comments

@dannima

dannima commented Apr 6, 2020

Looks like pyannote-audio is using soundfile to load audio files, and soundfile does not support mp3 files. Is it possible to switch to audioread/ffmpeg?

@hbredin
Member

hbredin commented Apr 6, 2020

pyannote.audio relies on SoundFile because

  • it is faster than most alternatives
  • it supports seeking (which is used heavily during training to extract audio chunks at random positions)
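To illustrate why seeking matters for training, here is a minimal sketch that extracts a chunk at a random position by seeking directly to its first frame instead of decoding everything before it. SoundFile exposes the same capability through `soundfile.read(path, start=..., stop=...)`; this sketch uses the standard-library `wave` module (WAV only) so that it runs without extra dependencies. The helper name `random_chunk` is hypothetical.

```python
import random
import wave


def random_chunk(path: str, duration: float, rng: random.Random) -> bytes:
    """Return raw PCM bytes for a chunk of `duration` seconds at a random offset.

    Only the requested frames are read: the reader seeks to the chunk's first
    frame rather than decoding the file from the beginning.
    """
    with wave.open(path, "rb") as f:
        sample_rate = f.getframerate()
        total_frames = f.getnframes()
        chunk_frames = int(round(duration * sample_rate))
        if chunk_frames > total_frames:
            raise ValueError("chunk is longer than the file")
        # Pick a random start frame such that the whole chunk fits in the file.
        start = rng.randint(0, total_frames - chunk_frames)
        f.setpos(start)  # seek instead of reading everything before `start`
        return f.readframes(chunk_frames)
```

With a format that does not support cheap seeking, the same call would have to decode from the start of the file every time, which is what makes random-chunk sampling during training expensive.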

The downside is, indeed, the limited support for other audio formats.

For more insights, see this useful benchmark by @faroit: https://github.com/faroit/python_audio_loading_benchmark

That being said, once the model is trained, seeking is indeed not a mandatory feature.
I’d gladly merge a PR that adds support for other formats at inference time.

An alternative option is to use your own pyannote.database preprocessor to pre-load the waveform using any library you see fit.

https://github.com/pyannote/pyannote-database#preprocessors

# preprocessors:
#   key:
#     name: package.module.ClassName
#     params:
#       param1: value1
#       param2: value2
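A preprocessor is referenced by its dotted class name and instantiated with `params`. The following is a minimal, hypothetical sketch assuming that pyannote.database calls the instance with the current file (a dict-like object holding the path under an `audio` key) and stores the return value under the configured key; which key your pyannote.audio version reads a pre-loaded waveform from is something to check. The class `PreloadWaveform`, its `loader` parameter, and the default stdlib WAV loader are all invented for illustration; the point is that `loader` can name any function from any library you see fit (for MP3, for instance, a wrapper around `torchaudio.load`).

```python
import importlib
import sys
import wave
from array import array
from typing import Callable, Mapping, Optional


def load_wav(path) -> array:
    """Default loader: interleaved 16-bit PCM samples from a WAV file (stdlib only)."""
    with wave.open(str(path), "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    return samples


def _resolve(dotted: str) -> Callable:
    """Import "package.module.function" and return the function object."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


class PreloadWaveform:
    """Hypothetical pyannote.database preprocessor that pre-loads audio.

    `loader` is a dotted path to any function mapping a file path to a waveform,
    so it can be given as a plain string under `params:` in the YAML file.
    """

    def __init__(self, loader: Optional[str] = None):
        self._load: Callable = load_wav if loader is None else _resolve(loader)

    def __call__(self, current_file: Mapping):
        # Assumption: the path of the audio file is stored under the "audio" key.
        return self._load(current_file["audio"])
```

In the YAML file, such a class would be declared with `name: mypackage.preprocessors.PreloadWaveform` and, for example, `loader: mypackage.io.load_mp3` under `params:`, where both module paths are placeholders.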

@nryant

nryant commented Apr 16, 2020

Hi @hbredin, those audio loading benchmarks were originally run with a version of librosa that is now two years old. Since then, librosa has switched to using soundfile by default, falling back to an alternate backend only if soundfile cannot handle the format. When rerun with modern versions of librosa, the performance is MUCH BETTER.

For WAV and FLAC, the pure soundfile solution and librosa are essentially identical. If you'd like, I could make a pull request replacing the soundfile references with wrappers around librosa.
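The "soundfile first, alternate backend as a fallback" strategy described above can be sketched generically as follows. `load_with_fallback` is a hypothetical helper, and the backends are passed in as plain callables, so the sketch makes no claim about librosa's internal API.

```python
from typing import Any, Callable, Sequence, Tuple

# A loader maps a path to (waveform, sample_rate).
Loader = Callable[[str], Tuple[Any, int]]


def load_with_fallback(path: str, loaders: Sequence[Loader]) -> Tuple[Any, int]:
    """Try each loader in order and return the first successful result.

    Mirrors the "fast backend first, generic backend as a fallback" design,
    e.g. soundfile for WAV/FLAC, then a slower decoder for everything else.
    """
    errors = []
    for load in loaders:
        try:
            return load(path)
        except Exception as exc:  # this backend could not handle the file
            errors.append(f"{getattr(load, '__name__', load)}: {exc}")
    raise RuntimeError(f"no backend could load {path!r}: " + "; ".join(errors))
```

Catching every `Exception` keeps the sketch short; a real wrapper would rather catch each backend's specific error types so that genuine bugs are not silently routed to the fallback.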

@faroit

faroit commented Apr 16, 2020

@nryant thanks for reminding me to finish this PR ;-)

@hbredin
Member

hbredin commented Apr 21, 2020

Thanks for the offer, @nryant.

I'd definitely merge such a PR adding support for more file formats.

However, I also want to make sure this does not slow things down. In particular (but this is true for all methods and functions in pyannote.audio.features.utils), we need to make sure that RawAudio.crop is at least as fast as it currently is for WAV files.

I am also concerned about the lack of unit tests for pyannote.audio.features.utils.
These tests should probably be added first, to avoid any regression when switching to librosa (and also catch any future regression with future releases of librosa).
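A regression test of the kind suggested here could check that a seek-based crop returns exactly the samples obtained by slicing the fully loaded signal. In this sketch, `crop` is a hypothetical stand-in for `RawAudio.crop` (whose real signature is not shown in this thread), and WAV I/O uses the standard-library `wave` module.

```python
import os
import struct
import tempfile
import unittest
import wave


def crop(path: str, start: float, end: float) -> bytes:
    """Hypothetical seek-based crop returning raw 16-bit PCM frames."""
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
        first, last = int(round(start * sr)), int(round(end * sr))
        f.setpos(first)
        return f.readframes(last - first)


class CropTest(unittest.TestCase):
    def setUp(self):
        # 1 s of a 16 kHz ramp, so every sample value is distinct and any
        # off-by-one error in the crop boundaries changes the result.
        self.sr = 16000
        self.samples = list(range(self.sr))
        fd, self.path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        with wave.open(self.path, "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(self.sr)
            f.writeframes(struct.pack(f"<{len(self.samples)}h", *self.samples))

    def tearDown(self):
        os.remove(self.path)

    def test_crop_matches_full_read(self):
        chunk = crop(self.path, 0.25, 0.75)
        got = list(struct.unpack(f"<{len(chunk) // 2}h", chunk))
        self.assertEqual(got, self.samples[4000:12000])
```

Saved as a test module, it can be run with `python -m unittest path/to/test_file.py`; the same test would then guard against regressions when the loading backend changes.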

@nryant

nryant commented Apr 21, 2020

Hi @hbredin,

Yes, if there are not currently unit tests, those should definitely be added first. Since this code is relevant for performance, we should probably also add some basic benchmarks so that we can check for regressions. Perhaps using airspeed velocity?
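With airspeed velocity (asv), benchmarks are plain Python files, by default under a `benchmarks/` directory, in which methods whose names start with `time_` are timed and a `setup` method runs before each benchmark. This sketch assumes that layout; the class name, the synthetic 8-minute fixture, and the stdlib `wave`-based reads are hypothetical stand-ins for benchmarking the real `RawAudio.crop`.

```python
import os
import tempfile
import wave


class TimeCrop:
    """asv benchmark suite: asv times every method whose name starts with `time_`."""

    def setup(self):
        # Hypothetical fixture: 8 minutes of 16 kHz mono silence.
        self.sr = 16000
        fd, self.path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        with wave.open(self.path, "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(self.sr)
            f.writeframes(b"\x00\x00" * (8 * 60 * self.sr))

    def teardown(self):
        os.remove(self.path)

    def time_full_read(self):
        with wave.open(self.path, "rb") as f:
            f.readframes(f.getnframes())

    def time_chunk_500ms(self):
        with wave.open(self.path, "rb") as f:
            f.setpos(f.getnframes() // 2)  # fixed position keeps runs comparable
            f.readframes(self.sr // 2)
```

Running `asv run` on successive commits would then flag any slowdown in either the full read or the chunk read.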

As to changes to the underlying audio I/O needed to support additional formats, it seems that, at a minimum, the following four audio file formats should be supported:

  • wav
  • flac
  • mp3 (since it shows up often in very large corpora such as Common Voice or the raw chapters from which LibriSpeech was sliced)
  • sphere files NOT compressed using SHORTEN (since they appear so frequently in older LDC and NIST packages)
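To route each of these four formats to a suitable reader, one option is to sniff the file header rather than trust the extension. This sketch checks well-known magic bytes (`RIFF`/`WAVE`, `fLaC`, an `ID3` tag or an MPEG audio frame sync, `NIST_1A`) and, as a heuristic assumption, flags SPHERE files whose `sample_coding` header field mentions `shorten`. `sniff_format` is a hypothetical helper.

```python
def sniff_format(header: bytes) -> str:
    """Guess the format from the first bytes of a file (read e.g. 1024 bytes).

    Returns one of "wav", "flac", "mp3", "sph", "sph-shorten" or "unknown".
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:7] == b"NIST_1A":
        # SPHERE headers are ASCII key/value lines terminated by "end_head".
        text = header.split(b"end_head", 1)[0].decode("ascii", "replace")
        for line in text.splitlines():
            if line.startswith("sample_coding") and "shorten" in line:
                return "sph-shorten"  # heuristic: Shorten-compressed payload
        return "sph"
    if header[:3] == b"ID3":
        return "mp3"  # ID3v2 tag (assumed to precede MPEG audio)
    # Bare MPEG audio frame: 11 sync bits set, and layer bits not "00"
    # (layer "00" is reserved for MPEG audio but used by AAC ADTS streams).
    if len(header) >= 2 and header[0] == 0xFF and header[1] & 0xE0 == 0xE0:
        if (header[1] >> 1) & 0x03 != 0:
            return "mp3"
    return "unknown"
```

Typical use would be `with open(path, "rb") as f: kind = sniff_format(f.read(1024))`, reading 1024 bytes because that is the usual SPHERE header size.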

librosa supports the last two formats, though random reads from MP3 files are SLOOOW. Torchaudio also supports all four formats and is quite a bit faster with MP3, so that might also be an option. Here is a direct comparison of librosa and torchaudio for loading an entire 8-minute-long, 16 kHz recording into a Tensor (mean times in seconds, as reported by IPython's timeit command):

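| Format | librosa (s) | torchaudio (s) |
|--------|-------------|----------------|
| wav    | 0.017831    | 0.058931       |
| flac   | 0.115104    | 0.135125       |
| mp3    | 0.580566    | 0.449611       |
| sph    | 0.017851    | 0.058338       |

And the same table for reading a random 500 ms chunk of the same recording:

| Format | librosa (s) | torchaudio (s) |
|--------|-------------|----------------|
| wav    | 0.000174    | 0.000125       |
| flac   | 0.000405    | 0.000253       |
| mp3    | 0.256025    | 0.003628       |
| sph    | 0.000165    | 0.000142       |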
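The figures above are means reported by IPython's `%timeit`, which averages the per-loop time over several runs. A rough equivalent for a plain script can use the standard-library `timeit` module, as in this sketch; `mean_seconds` is a hypothetical helper, and the callable you time (a librosa, torchaudio, or soundfile load) is whatever you pass in.

```python
import statistics
import timeit
from typing import Callable


def mean_seconds(fn: Callable[[], object], number: int = 10, repeat: int = 5) -> float:
    """Mean wall-clock time of one call to `fn`, in seconds.

    Each of the `repeat` runs calls `fn` `number` times; the per-call average
    of every run is computed, and the mean over runs is returned.
    """
    runs = timeit.repeat(fn, number=number, repeat=repeat)
    return statistics.mean(t / number for t in runs)
```

For example, `mean_seconds(lambda: librosa.load("recording.mp3", sr=None))` would time a full MP3 load, where the file name is a placeholder.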

Do you want to open an issue and/or pull request to organize this work?

@nryant

nryant commented Apr 21, 2020

After looking at the librosa and audioread code, there are reasons other than speed not to rely on librosa for MP3. In particular, audioread uses one of the supported backends available on the system, which may turn out to be ffmpeg. In that case, the estimated (and potentially inaccurate) MP3 duration is parsed from the output that ffmpeg logs to stderr, and only to the nearest 100 ms:

https://github.com/beetbox/audioread/blob/8d02710f5cd9f8db1290e4a3a08115e222b0790a/audioread/ffdec.py#L268
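To make the precision issue concrete: ffmpeg reports durations as `HH:MM:SS.ff`, with two fractional digits. This sketch contrasts keeping only the first fractional digit, which reproduces the 100 ms truncation described above (it mimics the behavior, it is not audioread's actual code), with keeping both digits. The sample log line and `parse_duration` are made up for illustration.

```python
import re

# Made-up example of an ffmpeg "Duration:" log line.
LOG = "  Duration: 00:08:00.47, start: 0.025056, bitrate: 128 kb/s"


def parse_duration(log: str, fractional_digits: int) -> float:
    """Parse HH:MM:SS.ff, keeping only `fractional_digits` (>= 1) fractional digits."""
    match = re.search(r"Duration: (\d+):(\d+):(\d+)\.(\d+)", log)
    if match is None:
        raise ValueError("no duration found")
    hours, minutes, seconds = (int(g) for g in match.groups()[:3])
    frac = match.group(4)[:fractional_digits]
    return hours * 3600 + minutes * 60 + seconds + int(frac) / 10 ** len(frac)


coarse = parse_duration(LOG, 1)  # 100 ms resolution: the ".07" part is dropped
fine = parse_duration(LOG, 2)    # 10 ms resolution, still only ffmpeg's estimate
```

Even the finer value is only as good as ffmpeg's own estimate, which for MP3 without an accurate header can itself be off.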

Maybe it would be safest to either depend on torchaudio or a combination of soundfile and torchaudio.
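One way to combine the two is to keep soundfile for the formats where it is fast and seekable and hand everything else to torchaudio. The extension-based routing rule and the function names are assumptions of this sketch; `soundfile.read` and `torchaudio.load` are the libraries' loading functions, imported lazily so that only the backend actually used needs to be installed.

```python
from pathlib import Path

# Assumption: route by extension. WAV/FLAC/SPHERE go to soundfile (fast and
# seekable); everything else, e.g. MP3, goes to torchaudio.
SOUNDFILE_EXTENSIONS = {".wav", ".flac", ".sph"}


def choose_backend(path: str) -> str:
    """Name of the backend this sketch would use for `path`."""
    return "soundfile" if Path(path).suffix.lower() in SOUNDFILE_EXTENSIONS else "torchaudio"


def load(path: str):
    """Return (waveform, sample_rate) with waveform shaped (channels, samples)."""
    if choose_backend(path) == "soundfile":
        import soundfile as sf  # imported lazily

        data, sample_rate = sf.read(path, dtype="float32", always_2d=True)
        return data.T, sample_rate  # soundfile returns (samples, channels)
    import torchaudio  # imported lazily

    waveform, sample_rate = torchaudio.load(path)  # (channels, samples) tensor
    return waveform, sample_rate
```

Note that the two branches return different array types (a NumPy array versus a torch Tensor); a real wrapper would convert to a single type.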

@hbredin
Member

hbredin commented Apr 22, 2020

Thanks @nryant for this analysis -- that is very helpful!
torchaudio looks like a very good candidate.

I just created a new issue to continue the discussion and keep track of the switch.

Closing this one (will try to import your analysis in there).

@hbredin hbredin closed this as completed Apr 22, 2020
hbredin pushed a commit that referenced this issue Nov 9, 2022
FrenchKrab added a commit to FrenchKrab/pyannote-audio that referenced this issue Nov 16, 2022
commit d966b51
Author: Yoyoma22 <[email protected]>
Date:   Mon Nov 14 04:43:08 2022 -0500

    fix(setup): use utf-8 encoding

    Fixes pyannote#1142

    Co-authored-by: Andre.Bonin <[email protected]>
    Co-authored-by: Hervé BREDIN <[email protected]>

commit 85bc101
Author: Atai Barkai <[email protected]>
Date:   Wed Nov 9 01:53:56 2022 -0800

    setup: add support for soundfile 0.11 (pyannote#1140)

    Fixes pyannote#1069 pyannote#1096
    Related to pyannote#347

commit 8b5ed07
Author: Rhenan <[email protected]>
Date:   Sat Oct 29 01:54:53 2022 -0400

    doc: add hf.co/pyannote/segmentation to readme (pyannote#1127)

commit 378f7fb
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 22:29:57 2022 +0200

    Created with Colaboratory

commit d992509
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 22:04:37 2022 +0200

    Update README.md

commit c68125e
Merge: 36f1e7b 460f7e7
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 22:01:55 2022 +0200

    Merge tag '2.1.1' into develop

    Version 2.1.1

commit 460f7e7
Merge: 2cf1490 82a07ad
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 22:01:47 2022 +0200

    Merge branch 'release/2.1.1'

commit 82a07ad
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 22:01:38 2022 +0200

    git: update version number

commit 36f1e7b
Merge: f700d6e 2cf1490
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:53:54 2022 +0200

    Merge tag '2.1' into develop

    Version 2.1

commit 2cf1490
Merge: 25462d5 6d9d98c
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:53:46 2022 +0200

    Merge branch 'release/2.1'

commit 6d9d98c
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:53:26 2022 +0200

    setup: fix version number

commit fc55946
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:52:44 2022 +0200

    setup: bump version

commit b7343cc
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:52:32 2022 +0200

    doc: update README

commit cca3316
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:52:19 2022 +0200

    doc: update changelog

commit f700d6e
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 21:21:17 2022 +0200

    feat: add support for private and gated hf.co models (pyannote#1064)

commit a463e5c
Author: Hervé BREDIN <[email protected]>
Date:   Thu Oct 27 09:21:50 2022 +0200

    * fix: prioritize threshold value when looking for best iteration (pyannote#1115)

commit 0dd2842
Author: Hervé BREDIN <[email protected]>
Date:   Tue Oct 25 15:22:53 2022 +0200

    setup: switch to latest huggingface_hub API (pyannote#1114)

    Fixes pyannote#1065

commit ad0df4c
Author: Hervé BREDIN <[email protected]>
Date:   Tue Oct 25 15:15:24 2022 +0200

    feat: add support for {min|max}_clusters to AgglomerativeClustering (pyannote#1113)
@Infinitay

Infinitay commented Mar 3, 2023

I was having issues with this myself lately. The problem went away for MP3s when I explicitly installed the latest available soundfile package, although I had to use v0.12.0, since that is the maximum version the dependency constraint allows, whereas 0.12.1 has recently been released.

@nryant

nryant commented Mar 3, 2023

I'm not 100% sure, but I believe torchaudio uses soundfile, which in turn depends on libsndfile for reading formats. As of version 1.1.0, libsndfile does handle MP3:

https://github.com/libsndfile/libsndfile/issues/258

However, which version of libsndfile the soundfile package uses is more complicated. For Linux, soundfile always uses the system version, though they are working toward building and distributing wheels for Linux systems:

https://github.com/bastibe/python-soundfile/issues/353

For OS X/Windows, they provide wheels which contain a pre-compiled shared library, though in my experience on OS X, if you already have libsndfile installed, soundfile often picks up the wrong library.
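To check which libsndfile your soundfile installation actually picked up, and whether it is recent enough for MP3 (MPEG support arrived in libsndfile 1.1.0), something like this sketch can help. `soundfile.__libsndfile_version__` is the attribute soundfile exposes for the loaded library's version; the parsing helper `supports_mp3` and the `report` function are hypothetical.

```python
import re


def supports_mp3(libsndfile_version: str) -> bool:
    """True if a libsndfile version string is 1.1.0 or newer (MPEG support)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", libsndfile_version)
    if match is None:
        raise ValueError(f"unrecognized version: {libsndfile_version!r}")
    return tuple(int(part) for part in match.groups()) >= (1, 1, 0)


def report() -> None:
    import soundfile as sf  # imported lazily so supports_mp3 works on its own

    version = sf.__libsndfile_version__
    print(f"soundfile {sf.__version__} uses libsndfile {version}")
    print("MP3 supported:", supports_mp3(version))
```

Calling `report()` in the environment that runs pyannote.audio shows the library version soundfile actually loaded, which may differ from the one you expect if a system copy of libsndfile is picked up instead.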

@Infinitay

Infinitay commented Mar 3, 2023

Last I checked, torchaudio makes soundfile optional, but I believe it is the backend we mainly use. The more I learn about these dependencies, the less sure I am about which one is the main culprit. Still, I will look more into libsndfile myself. As I said, updating my soundfile package to v0.12.0 allowed me to use MP3s. However, I have a lot more m4a files, and unless I convert them to MP3 or another format that soundfile supports, I can't process them directly. I'll see whether updating my libsndfile is possible and helps with that.

> though in my experience on OS X, if you already have libsndfile installed, soundfile often picks up the wrong library.

I wonder whether that is where my problems started on Windows, with soundfile having chosen the wrong version. I guess I'll find out if I manage to try this later.

Thanks for the additional information

EDIT: That was short-lived. soundfile v0.12.0 already uses libsndfile v1.2.0, which is currently the latest release. Judging from their issues, it doesn't seem like m4a will be supported any time soon, possibly due to patent concerns. At this point, spending a few hours converting my files is much simpler.
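For the conversion route, a small batch script around the ffmpeg command-line tool is one option. This sketch assumes `ffmpeg` is on the PATH and converts to WAV, which is lossless and readable by any libsndfile version; the mono, 16 kHz output (`-ac 1 -ar 16000`) is an assumption about what your pipeline expects, so adjust or drop it. `ffmpeg_command` and `convert_all` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def ffmpeg_command(src: Path, dst: Path, sample_rate: Optional[int] = 16000) -> List[str]:
    """Build an ffmpeg command that decodes `src` and writes `dst`."""
    cmd = ["ffmpeg", "-y", "-i", str(src), "-ac", "1"]  # -y: overwrite, -ac 1: mono
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]  # resample to `sample_rate` Hz
    return cmd + [str(dst)]


def convert_all(folder: str, pattern: str = "*.m4a") -> None:
    """Convert every file matching `pattern` in `folder` to a WAV file next to it."""
    for src in sorted(Path(folder).glob(pattern)):
        dst = src.with_suffix(".wav")
        subprocess.run(ffmpeg_command(src, dst), check=True)  # raise on ffmpeg failure
```

Converting to WAV (or FLAC) rather than MP3 also keeps the fast, seekable soundfile path available for training.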
