.mp3 files not supported for SAD #347
Comments
pyannote.audio relies on SoundFile because
The counterpart is, indeed, the limited support for other audio formats. For more insights, see this useful benchmark by @faroit: https://github.com/faroit/python_audio_loading_benchmark
That being said, once the model is trained, seeking is indeed not a mandatory feature. An alternative option is to use your own pyannote.database preprocessor to pre-load the waveform using any library you see fit: https://github.com/pyannote/pyannote-database#preprocessors
pyannote-audio/pyannote/audio/applications/config.py Lines 119 to 124 in cd2f2b5
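A minimal sketch of the preprocessor idea, under stated assumptions: a pyannote.database preprocessor is a callable that receives the `current_file` dictionary and returns the value to store under its key. The `"waveform"` key name, the `WaveformPreloader` class, and the `load_wav_stdlib` helper are all hypothetical names chosen for illustration. The stand-in decoder only reads 16-bit PCM WAV with the standard library; for MP3 you would pass in a decoder from whichever library you prefer.

```python
import array
import sys
import wave
from pathlib import Path
from typing import Callable, Dict


def load_wav_stdlib(path: Path) -> Dict:
    """Stand-in decoder (16-bit PCM WAV only, standard library).

    Hypothetical helper: replace it with any decoder that can read
    the formats you need (e.g. MP3).
    """
    with wave.open(str(path), "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("this stand-in only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(f.readframes(f.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return {
            "samples": samples,
            "sample_rate": f.getframerate(),
            "channels": f.getnchannels(),
        }


class WaveformPreloader:
    """Callable preprocessor: maps a file dictionary to decoded audio."""

    def __init__(self, loader: Callable[[Path], Dict]):
        self.loader = loader

    def __call__(self, current_file: Dict) -> Dict:
        # Assumes current_file["audio"] holds the path to the recording.
        return self.loader(Path(current_file["audio"]))


# Assumption: the protocol stores the result under the dictionary key,
# so downstream code would find it in current_file["waveform"].
preprocessors = {"waveform": WaveformPreloader(load_wav_stdlib)}
```

The loader is injected rather than hard-coded so that the decoding library can be swapped without touching the preprocessor itself.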
Hi @hbredin, those audio loading benchmarks were originally run with a now two-year-old version of For WAV and FLAC, the pure
@nryant thanks for reminding me to finish this PR ;-)
Thanks for the offer, @nryant. I'd definitely merge such a PR adding support for more file formats. However, I also want to make sure this does not slow things down. In particular (but this is true for all methods and functions in I am also concerned about the lack of unit tests for
Hi @hbredin, yes, if there are currently no unit tests, those should definitely be added first. Since this code is performance-sensitive, we should probably also add some basic benchmarks so that we can check for regressions. Perhaps using airspeed velocity? As to changes to the underlying audio I/O in order to support additional formats, it seems that ideally the following four audio file formats would minimally be supported:
And the same table for reading a random 500 ms chunk of the same recording:
Do you want to open an issue and/or pull request to organize this work?
After looking at the Maybe it would be safest to either depend on
Thanks @nryant for this analysis -- that is very helpful! I just created a new issue to continue the discussion and keep track of the switch. Closing this one (will try to import your analysis in there).
Referenced in commit 85bc101 (Atai Barkai, Wed Nov 9 2022): setup: add support for soundfile 0.11 (pyannote#1140). Fixes pyannote#1069 pyannote#1096. Related to pyannote#347.
I was having issues with this lately myself. For mp3s, the problem went away once I explicitly installed the latest soundfile release I could. I was limited to v0.12.0, since that is the maximum version the dependency pin allows, even though 0.12.1 was recently released.
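To check what an installed soundfile build can actually read, something like the sketch below may help. It relies on `soundfile.available_formats()`, `soundfile.__version__` and `soundfile.__libsndfile_version__`. The `supports_format` and `report` helpers are hypothetical names, and the assumption that MP3 shows up under the key `"MP3"` should be verified against your own installation.

```python
from typing import Mapping, Optional


def supports_format(formats: Mapping[str, str], name: str) -> bool:
    """Return True if `name` is among the reported major formats."""
    return name.upper() in formats


def report() -> Optional[str]:
    """Summarize the installed soundfile/libsndfile, or None if unavailable."""
    try:
        import soundfile as sf
    except (ImportError, OSError):
        # OSError can be raised when the libsndfile shared library is missing.
        return None
    formats = sf.available_formats()  # e.g. {"WAV": "WAV (Microsoft)", ...}
    return (
        f"soundfile {sf.__version__}, "
        f"libsndfile {sf.__libsndfile_version__}, "
        f"MP3 supported: {supports_format(formats, 'MP3')}"
    )


if __name__ == "__main__":
    print(report() or "soundfile is not installed")
```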
I'm not 100%, but I believe
However, which version of
For OS X/Windows, they provide wheels which contain a pre-compiled shared library, though in my experience on OS X, if you already have
Last I checked torchaudio makes using soundfile optional but I believe we use it primarily. At this rate the more I learn about more dependencies the more lost I get on what is the main culprit. Although, I will look more into libsndfile personally. As I said, I updated my soundfile package to v0.12.0 which allowed me to use mp3s. However, I have a lot more m4as and unless I convert them into mp3s or another format that soundfile supports, I can't directly process them.
Thanks for the additional information. EDIT: That was short-lived. soundfile v0.12.0 updated to libsndfile v1.2.0, which is currently the latest. Looking at their issues, it doesn't seem like m4a will be supported yet, due to patents or something? At this rate, spending a few hours converting my files is much simpler.
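For the batch conversion route, here is a rough sketch that shells out to ffmpeg, assuming it is installed and on `PATH`. The function names, the `*.m4a` pattern, and the choice of WAV as the target are illustrative assumptions. The optional `sample_rate` maps to ffmpeg's `-ar` option, and `-n` tells ffmpeg never to overwrite an existing output file.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ffmpeg_command(src: Path, dst: Path, sample_rate: Optional[int] = None) -> List[str]:
    """Build the ffmpeg argument list for converting one file."""
    cmd = ["ffmpeg", "-n", "-i", str(src)]
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]  # resample only when requested
    cmd.append(str(dst))  # output format is inferred from the extension
    return cmd


def convert_directory(
    src_dir: Path,
    dst_dir: Path,
    pattern: str = "*.m4a",
    sample_rate: Optional[int] = None,
) -> List[Path]:
    """Convert every file matching `pattern` in src_dir to WAV in dst_dir."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob(pattern)):
        dst = dst_dir / (src.stem + ".wav")
        if dst.exists():
            continue  # skip files converted in a previous run
        subprocess.run(ffmpeg_command(src, dst, sample_rate), check=True)
        converted.append(dst)
    return converted
```

Calling `convert_directory(Path("recordings"), Path("recordings_wav"))` would then leave the originals untouched and write the WAV copies into a separate directory.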
Looks like pyannote-audio is using soundfile to load audio files, and soundfile does not support mp3 files. Is it possible to switch to audioread/ffmpeg?