[Feature Request] Support Speaker Diarization #1039

Open
uniqueness-ae opened this issue Oct 12, 2024 · 3 comments

@uniqueness-ae

Implement speaker diarization for the existing mlx whisper support to:

  1. Enhance transcription accuracy in multi-speaker conversations
  2. Distinguish between different speakers in the output
  3. Improve overall usability of the transcription feature

This addition will provide more insightful and structured transcripts, making it easier to analyze and understand complex audio content. Thanks
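For illustration, a speaker-labeled transcript could look something like this (the label and timestamp format below is purely a hypothetical suggestion, not a spec):

```
[SPEAKER_00 00:00.0 → 00:04.2] Thanks everyone for joining today.
[SPEAKER_01 00:04.5 → 00:07.8] Glad to be here, let's get started.
```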

@Hoohm

Hoohm commented Oct 20, 2024

Would love to see this as well.
I can help out with building the feature, but I need some pointers on how it could be done.

@uniqueness-ae
Author

I tried the pyannote.audio model on rented cloud GPUs and had some success. If there were a way to run it on MLX, it would probably be faster, and even better if it were coupled with whisper to simplify the process. There is a repo from m-bain called WhisperX that does this; it could help as a reference.
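A minimal sketch of what that coupling could look like today, assuming the `mlx_whisper` Python package and `pyannote.audio` are installed. The model names, the token placeholder, and the assign-speaker-by-largest-overlap heuristic are my assumptions (roughly following the WhisperX approach), not anything this repo provides:

```python
import mlx_whisper                   # pip install mlx-whisper
from pyannote.audio import Pipeline  # pip install pyannote.audio

AUDIO = "meeting.wav"  # hypothetical input file

# 1) Transcribe with mlx_whisper (runs on Apple silicon via MLX).
result = mlx_whisper.transcribe(
    AUDIO, path_or_hf_repo="mlx-community/whisper-large-v3-mlx"
)

# 2) Diarize with pyannote.audio (PyTorch, not MLX -- this is the part
#    that would need porting for a native-MLX pipeline).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)
turns = [
    (turn.start, turn.end, speaker)
    for turn, _, speaker in pipeline(AUDIO).itertracks(yield_label=True)
]

# 3) Assign each transcript segment the speaker whose diarization
#    turn overlaps it the most.
def best_speaker(start: float, end: float) -> str:
    overlaps = [
        (min(end, t_end) - max(start, t_start), spk)
        for t_start, t_end, spk in turns
    ]
    overlap, spk = max(overlaps, default=(0.0, "UNKNOWN"))
    return spk if overlap > 0 else "UNKNOWN"

for seg in result["segments"]:
    print(f"[{best_speaker(seg['start'], seg['end'])}] {seg['text'].strip()}")
```

Note that WhisperX additionally does forced alignment to get word-level timestamps before assigning speakers; the sketch above skips that and works at segment granularity only.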

@krzysztoflupa

@uniqueness-ae How many speakers have you tried it with? WhisperX didn't work properly at all with more than 20 speakers; I still don't know why...
