diarization multiprocessing on CPUs #1124
MrEdwards007 asked this question in Q&A
I am very interested in speaker diarization, but I do not have a supported GPU, so the pipeline runs on my CPU.
I would like to run diarization on chunks of a single input source in parallel and stitch the results back together.
Can this be done using the methodology below, or is there another approach that would work?
All testing was done on a MacBook running macOS Big Sur with a 2.3 GHz Intel Core i9, 16 cores, and 16 GB of RAM (GPU not supported).
My first target was to learn how long diarization would take under normal circumstances. I ran the process overnight.
My target (testing) file, DonQuixote_OneHour, is 3706.393 seconds long (01:01:46 H:M:S).
I guessed the process would take at most 1-1.5x the length of the audio on a CPU.
The elapsed completion time for DonQuixote_OneHour_audio.rttm was 7:41:12 (H:M:S), i.e. 27,672 seconds, which is more than 7x the length of the audio and closer to 8x (27,672 s / 3,706 s ≈ 7.5x real time).
Once I established how long a typical run takes, I tried to find out whether there is a signature/hash for each identified speaker, as opposed to only local labels like 'speaker 1, 2, 3, etc.'. If there is a signature, the process can be broken into chunks (I have already done this for transcription, including fixing the concatenated timelines) and run in parallel to bring the completion time down. The results of each parallel process could then be stitched back together using the speaker's signature/hash (assuming it would be the same across processes).
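Here is a rough sketch of what I have in mind for the signature idea, assuming pyannote.audio 2.x and its pyannote/embedding model; the cosine-distance threshold, the minimum turn length, and the helper names (speaker_signatures, relabel) are my own placeholders, not anything provided by the library.

```python
# Rough sketch: derive a per-speaker embedding ("signature") from one chunk's
# diarization and match chunk-local labels to global speakers by cosine
# distance.  Assumes pyannote.audio 2.x; threshold and helper names are mine.
import numpy as np
from scipy.spatial.distance import cosine

from pyannote.audio import Inference
from pyannote.core import Segment

# "whole" window -> one embedding vector per cropped excerpt
# (may require use_auth_token=... depending on model access).
inference = Inference("pyannote/embedding", window="whole")


def speaker_signatures(audio_path, diarization, min_duration=1.0):
    """Average one embedding per chunk-local speaker label."""
    per_speaker = {}
    for segment, _, label in diarization.itertracks(yield_label=True):
        if segment.duration < min_duration:   # skip very short turns
            continue
        emb = inference.crop(audio_path, Segment(segment.start, segment.end))
        per_speaker.setdefault(label, []).append(np.asarray(emb))
    return {label: np.mean(embs, axis=0) for label, embs in per_speaker.items()}


def relabel(chunk_signatures, global_signatures, threshold=0.5):
    """Map chunk-local labels onto global speakers; add new speakers as needed."""
    mapping = {}
    for local_label, signature in chunk_signatures.items():
        best, best_dist = None, threshold
        for global_label, global_signature in global_signatures.items():
            dist = cosine(signature, global_signature)
            if dist < best_dist:
                best, best_dist = global_label, dist
        if best is None:   # no close match: treat as a speaker not seen before
            best = f"SPEAKER_{len(global_signatures):02d}"
            global_signatures[best] = signature
        mapping[local_label] = best
    return mapping
```

The open question is whether embeddings computed in separate processes on separate chunks are stable enough for this matching to be reliable.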
Below is the approach I took to accelerate Whisper tasks (specifically transcription); a sketch of the same chunked, multi-process idea applied to diarization follows the timing results.
openai/whisper#432
Using 11 of 16 CPUs with the "tiny.en" model: a transcription speed (over real time) of 32.713x
Using 7 of 16 CPUs with the "base.en" model: a transcription speed (over real time) of 16.416x
Using 9 of 16 CPUs with the "small.en" model: a transcription speed (over real time) of 5.595x
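Applied to diarization, the same chunk-and-process-pool idea might look roughly like the sketch below; the 10-minute chunk length, the worker count, the ffmpeg splitting step, and the file names are assumptions on my part, and each worker loads its own copy of the pretrained pipeline, so memory may limit the usable number of workers well below 16.

```python
# Rough sketch: split the audio with ffmpeg, diarize each chunk in its own
# process, and shift timestamps back onto the original timeline.
# Assumptions: ffmpeg on PATH, pyannote.audio 2.x, illustrative chunk length,
# worker count, and file names.
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

CHUNK_SECONDS = 600   # 10-minute chunks (assumption, not a tuned value)
N_WORKERS = 4         # each worker holds a full pipeline in RAM
AUDIO = "DonQuixote_OneHour_audio.wav"


def split_audio(path, chunk_seconds=CHUNK_SECONDS):
    """Cut the input into fixed-length chunks; return (chunk_path, offset_seconds) pairs."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", path, "-f", "segment",
         "-segment_time", str(chunk_seconds), "-c", "copy", "chunk_%03d.wav"],
        check=True,
    )
    chunks = sorted(glob.glob("chunk_*.wav"))
    return [(chunk, i * chunk_seconds) for i, chunk in enumerate(chunks)]


def diarize_chunk(args):
    """Diarize one chunk and shift its timestamps by the chunk's offset."""
    chunk_path, offset = args
    # Imported and loaded inside the worker; reloading per chunk is wasteful
    # but keeps the sketch simple (may require use_auth_token=...).
    from pyannote.audio import Pipeline
    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
    diarization = pipeline(chunk_path)
    return [
        (segment.start + offset, segment.end + offset, label)
        for segment, _, label in diarization.itertracks(yield_label=True)
    ]


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
        results = list(pool.map(diarize_chunk, split_audio(AUDIO)))
    # Labels are still chunk-local ("SPEAKER_00" in chunk 0 is not necessarily
    # "SPEAKER_00" in chunk 1); they would be reconciled with the embedding
    # matching sketched above before writing one combined RTTM.
    for chunk_turns in results:
        for start, end, label in chunk_turns:
            print(f"{start:8.2f} {end:8.2f} {label}")
```

The printed turns still carry chunk-local labels; reconciling them via the signature/hash (or embeddings) is exactly the piece I am asking about.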
Thank you for your time and assistance.