Some comprehensive papers about speaker diarization (SD).
If you find a paper that is missing from this list, please open an issue or, preferably, a pull request.
- Overview
- Reviews
- EEND (End-to-End Neural Diarization)-based
- Using Target Speaker Embedding
- Clustering-based
- Online
- Self-Supervised
- Multitask
- Multi-Channel
- Measurement
- Multi-Modal
- Challenge
- DIHARD Keynote Session: The yellow brick road of diarization, challenges and other neural paths [Slides] [Video]
- “A review of speaker diarization: Recent advances with deep learning”, in Computer Speech & Language, Volume 72, 2022. (USC) [Paper]
- "An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings", in Computer Speech & Language, 2023. [Paper]
- "Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning," Submitted to IEEE/ACM TASLP, 2024. [Paper]
- BLSTM-EEND: "End-to-End Neural Speaker Diarization with Permutation-Free Objectives", in Proc. Interspeech, 2019. (Hitachi) [Paper]
- SA-EEND (1): “End-to-End Neural Speaker Diarization with Self-attention”, in Proc. ASRU, 2019. (Hitachi) [Paper] [Code] [Pytorch] [Review]
- SA-EEND (2): “End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification”, in arXiv:2003.02966, 2020. (Hitachi) [Paper] [Review]
- SC-EEND: "Neural Speaker Diarization with Speaker-Wise Chain Rule", in arXiv:2006.01796, 2020. (Hitachi) [Paper] [Review]
- EEND-EDA (1): “End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors”, in Proc. Interspeech, 2020. (Hitachi) [Paper] [Review] [Code]
- EEND-EDA (2): “Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization”, in IEEE/ACM TASLP, 2022. (Hitachi) [Paper] [Review] [Code]
- CB-EEND: "End-to-end Neural Diarization: From Transformer to Conformer", in Proc. Interspeech, 2021. (Amazon) [Paper] [Review]
- TDCN-SA: "End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings", in Proc. ICASSP, 2021. (Google) [Paper] [Review]
- "End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection", in Proc. IEEE SLT, 2021. (Hitachi) [Paper]
- EEND-VC (1): "Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds", in Proc. ICASSP, 2021. (NTT) [Paper] [Review] [Code]
- EEND-VC (2): "Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech", in Proc. Interspeech, 2021. (NTT) [Paper] [Review] [Code]
- "Robust End-to-End Speaker Diarization with Conformer and Additive Margin Penalty," in Proc. Interspeech, 2021. (Fano Labs) [Paper]
- EEND-GLA: "Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors", in Proc. ASRU, 2021. (Hitachi) [Paper] [Review]
- "DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding", in Proc. ICASSP, 2022. (Google) [Paper]
- RX-EEND: “Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization”, in Proc. ICASSP, 2022. (GIST) [Paper] [Review]
- "End-to-end speaker diarization with transformer", in arXiv, 2022. [Paper]
- EEND-VC-iGMM: "Tight integration of neural and clustering-based diarization through deep unfolding of infinite Gaussian mixture model", in Proc. ICASSP, 2022. (NTT) [Paper]
- EDA-RC: "Robust End-to-end Speaker Diarization with Generic Neural Clustering", in Proc. Interspeech, 2022. (SJTU) [Paper]
- EEND-NAA: "End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors", in Proc. Interspeech, 2022. (JHU) [Paper]
- Graph-PIT: "Utterance-by-utterance overlap-aware neural diarization with Graph-PIT", in Proc. Interspeech, 2022. (NTT) [Paper] [Code]
- "Efficient Transformers for End-to-End Neural Speaker Diarization", in Proc. IberSPEECH, 2022. [Paper]
- "Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads", in Proc. ICASSP, 2023. (HU) [Paper]
- EEND-NA: “Neural Diarization with Non-Autoregressive Intermediate Attractors”, in Proc. ICASSP, 2023. (LINE) [Paper]
- EEND-EDA-SpkAtt: "Towards End-to-end Speaker Diarization in the Wild", in arXiv:2211.01299v1, 2022. [Paper]
- "TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization", in Proc. ICASSP, 2023. (Alibaba) [Paper] [Code]
- EEND-IAAE: "End-to-end neural speaker diarization with an iterative adaptive attractor estimation," in Neural Networks, Elsevier. [Paper] [Code]
- "Improving End-to-End Neural Diarization Using Conversational Summary Representations", in Proc. Interspeech, 2023. (Fano Labs) [Paper]
- AED-EEND: “Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor”, in Proc. Interspeech, 2023. (SJTU) [Paper] [Review]
- "Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization", in Proc. Interspeech, 2023. (HU) [Paper]
- "Powerset Multi-class Cross Entropy Loss for Neural Speaker Diarization", in Proc. Interspeech, 2023. (Pyannote) [Paper] [Code]
- "End-to-End Neural Speaker Diarization with Absolute Speaker Loss", in Proc. Interspeech, 2023. (Pyannote) [Paper]
- "Blueprint Separable Subsampling and Aggregate Feature Conformer-Based End-to-End Neural Diarization", in Electronics, 2023. [Paper]
- EEND-TA: "Transformer Attractors for Robust and Efficient End-to-End Neural Diarization," in Proc. ASRU, 2023. (Fano Labs) [Paper]
- "Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning," in Proc. ASRU, 2023. (Fano Labs) [Paper]
- "NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization," in Proc. ICASSP, 2024. (NTT) [Paper]
- AED-EEND-EE: "Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer," in IEEE/ACM TASLP, 2024. (SJTU) [Paper] [Review]
- "DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors," in IEEE/ACM TASLP, 2024. (BUT) [Paper] [Code] [Review]
- "EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings," Submitted to IEEE SPL, 2024. (SNU) [Paper] [Review]
- "EEND-M2F: Masked-attention mask transformers for speaker diarization," in Proc. Interspeech, 2024. (Fano Labs) [arXiv] [Pub.] [Review]
- EEND-NAA (2): "End-to-End Neural Speaker Diarization with Non-Autoregressive Attractors", in IEEE/ACM TASLP, 2024. (JHU) [Paper]
- "From Modular to End-to-End Speaker Diarization," Ph.D. thesis, 2024. (BUT) [Paper]
- "Mamba-based Segmentation Model for Speaker Diarization," Submitted to ICASSP, 2025. (NTT) [arXiv] [Code]
- "Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?," in Proc. Odyssey, 2024. [Paper]
- "Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios," in Proc. Odyssey, 2024. [Paper]
- Concat-and-sum: “End-to-end neural speaker diarization with permutation-free objectives”, in Proc. Interspeech, 2019. [Paper]
- “From simulated mixtures to simulated conversations as training data for end-to-end neural diarization”, in Proc. Interspeech, 2022. (BUT) [Paper] [Code] [Review]
- Markov selection: “Improving the naturalness of simulated conversations for end-to-end neural diarization”, in Proc. Odyssey, 2022. (Hitachi) [Paper]
- "Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization", in Proc. ICASSP, 2023. (BUT) [Paper] [Code] [Review]
- EEND-EDA-SpkAtt: "Towards End-to-end Speaker Diarization in the Wild", in arXiv:2211.01299v1, 2022. [Paper]
- "Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation," in CHiME-7 Workshop, 2023. (NVIDIA) [Paper]
- "Enhancing low-latency speaker diarization with spatial dictionary learning," in Proc. ICASSP, 2024. (NTU) [Paper] [Poster]
- "Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling," in Proc. ICASSP, 2024. (OSU) [Paper]
- EENDasP: "End-to-End Speaker Diarization as Post-Processing", in Proc. ICASSP, 2021. (Hitachi) [Paper] [Review] [Code]
- Dover-Lap: "DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs", in Proc. IEEE SLT, 2021. (JHU) [Paper] [Review] [Code]
- "DiaCorrect: Error Correction Back-end For Speaker Diarization," in Proc. ICASSP, 2024. (BUT) [Paper] [Code]
- TS-VAD: "Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario", in Proc. Interspeech, 2020. [Paper] [Code] [PPT]
- “The STC system for the CHiME-6 challenge,” in CHiME Workshop, 2020. [Paper]
- SEND (1): "Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information," in arXiv:2111.13694, 2021. (Alibaba) [Paper]
- SEND (2): "Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios," in arXiv:2203.09767, 2022. (Alibaba) [Paper]
- MTFAD: "Multi-target Filter and Detector for Unknown-number Speaker Diarization", in IEEE SPL, 2022. [Paper]
- SOND: "Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis", in Proc. EMNLP, 2022. (Alibaba) [Paper] [Code]
- EDA-TS-VAD: “Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization”, in Proc. ICASSP, 2023. (Microsoft) [Paper]
- Seq2Seq-TS-VAD: “Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction”, in Proc. ICASSP, 2023. (DKU) [Paper] [Review]
- QM-TS-VAD: "Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization", in Proc. Interspeech, 2023. (USTC) [Paper]
- "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding," in IEEE/ACM TASLP, 2023. (USTC) [Paper] [Code]
- NSD-MS2S: "Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture," in Proc. ICASSP, 2024. (USTC) [Paper] [Code]
- PET-TSVAD: "Profile-Error-Tolerant Target-Speaker Voice Activity Detection," in Proc. ICASSP, 2024. (Microsoft) [Paper]
- PTSD: "Prompt-driven Target Speech Diarization," in Proc. ICASSP, 2024. (NUS) [Paper]
- "Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis," in Proc. SLT, 2021. (JHU) [Paper] [Blog] [Review]
- EEND-SS: "Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers", in Proc. SLT, 2022. (CMU) [Paper] [Review]
- "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings", in IEEE/ACM TASLP, 2024. [Paper]
- "Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings," in arXiv:2401.15993, 2024. (Tencent) [Paper] [Demo]
- "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings," in Proc. Odyssey, 2024. [Paper] [Code]
- MC-EEND: "Multi-channel Conversational Speaker Separation via Neural Diarization," in IEEE/ACM TASLP, 2024. (OSU) [Paper]
- "USED: Universal Speaker Extraction and Diarization," Submitted to IEEE/ACM TASLP, 2024. (CUHK) [Paper] [Demo] [Util] [Review]
- "Neural Blind Source Separation and Diarization for Distant Speech Recognition," in Proc. Interspeech, 2024. (AIST) [Paper]
- "TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024," in Proc. Interspeech, 2024. (Pyannote) [Paper]
- "Multi-Channel End-to-End Neural Diarization with Distributed Microphones", in Proc. ICASSP, 2022. (Hitachi) [Paper]
- "Multi-Channel Speaker Diarization Using Spatial Features for Meetings", in Proc. ICASSP, 2022. (Tencent) [Paper]
- "Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization," in Proc. IEEE SLT, 2023. (Hitachi) [Paper]
- "Semi-supervised multi-channel speaker diarization with cross-channel attention", in Proc. ASRU, 2023. (USTC) [Paper]
- "UniX-Encoder: A Universal X-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing," in arXiv:2310.16367, 2024. (JHU, Tencent) [Paper]
- "Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection," in IEEE/ACM TASLP, 2024. [Paper]
- "A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition," in Proc. ICASSP, 2024. (USTC) [Paper]
- MC-EEND: "Multi-channel Conversational Speaker Separation via Neural Diarization," in IEEE/ACM TASLP, 2024. (OSU) [Paper]
- "ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings," in Proc. Interspeech, 2024. (LIUM) [Paper]
- "Supervised online diarization with sample mean loss for multi-domain data", in Proc. ICASSP, 2020. [Paper] [Code]
- "Online End-to-End Neural Diarization with Speaker-Tracing Buffer", in Proc. IEEE SLT, 2021. (Hitachi) [Paper]
- BW-EDA-EEND: "BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers", in Proc. Interspeech, 2021. (Amazon) [Paper]
- FS-EEND: "Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers", in Proc. Interspeech, 2021. (Hitachi) [Paper] [Review]
- Diart: "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation", in Proc. ASRU, 2021. [Paper] [Code]
- "Low-Latency Online Speaker Diarization with Graph-Based Label Generation", in Proc. Odyssey, 2022. (DKU) [Paper]
- EEND-GLA: "Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors", in IEEE/ACM TASLP, 2022. (Hitachi) [Paper]
- Online TS-VAD: "Online Target Speaker Voice Activity Detection for Speaker Diarization", in Proc. Interspeech, 2022. (DKU) [Paper]
- "Absolute decision corrupts absolutely: conservative online speaker diarisation", in Proc. ICASSP, 2023. (Naver) [Paper]
- "A Reinforcement Learning Framework for Online Speaker Diarization", under review at NeurIPS, 2023. (CU) [Paper]
- OTS-VAD: "End-to-end Online Speaker Diarization with Target Speaker Tracking," Submitted to IEEE/ACM TASLP, 2023. (DKU) [Paper]
- FS-EEND: "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors," in Proc. ICASSP, 2024. (Hangzhou) [Paper] [Code]
- "Online speaker diarization of meetings guided by speech separation," in Proc. ICASSP, 2024. (LTCI) [Paper] [Code]
- "Interrelate Training and Clustering for Online Speaker Diarization," in IEEE/ACM TASLP, 2024. [Paper]
- UIS-RNN: "Fully Supervised Speaker Diarization" (Google) [Paper] [Code]
- DNC: "Discriminative Neural Clustering for Speaker Diarisation", in Proc. IEEE SLT, 2019. [Paper] [Code] [Review]
- Pyannote: "pyannote.audio: neural building blocks for speaker diarization", in Proc. ICASSP, 2020. (CNRS) [Paper] [Code] [Video]
- NME-SC: “Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap”, IEEE SPL, 2019. [Paper] [Code]
- Resegmentation with VB: “Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection”, in Proc. ICASSP, 2020. [Paper]
- Pyannote 2.0: "End-to-end speaker segmentation for overlap-aware resegmentation", in Proc. Interspeech, 2021. (CNRS) [Paper] [Code] [Video]
- UMAP-Leiden: "Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure", in Proc. ICASSP, 2022. (Alibaba) [Paper]
- SCALE: "Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation", in Proc. ICASSP, 2023. (CAM) [Paper]
- SHARC: "Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization", in Proc. ICASSP, 2023. (IISC) [Paper]
- CDGCN: "Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization," in Proc. ICASSP, 2023. (XMU) [Paper]
- "Pyannote.Audio 2.1: Speaker Diarization Pipeline: Principle, Benchmark and Recipe", in Proc. Interspeech, 2023. (CNRS) [Paper]
- GADEC: "Graph attention-based deep embedded clustering for speaker diarization", in Speech Communication, 2023. (NJUPT) [Paper]
- "Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization," Submitted to IEEE/ACM TASLP, 2024. [Paper]
- "Apollo's Unheard Voices: Graph Attention Networks for Speaker Diarization and Clustering for Fearless Steps Apollo Collection," in Proc. ICASSP, 2024. (UTD) [Paper]
- "Multi-View Speaker Embedding Learning for Enhanced Stability and Discriminability," in Proc. ICASSP, 2024. (Tsinghua) [Paper]
- "Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders," in arXiv:2407.01963, 2024. [Paper]
- "Investigating Confidence Estimation Measures for Speaker Diarization," in Proc. Interspeech, 2024. [Paper]
- "Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment," in Proc. Interspeech, 2024. (PU) [Paper]
- "Multi-Scale Speaker Diarization With Neural Affinity Score Fusion", in Proc. ICASSP, 2021. (USC) [Paper]
- AA+DR+NS: "Adapting Speaker Embeddings for Speaker Diarisation", in Proc. Interspeech, 2021. (Naver) [Paper] [Review]
- GAT+AA: "Multi-scale speaker embedding-based graph attention networks for speaker diarisation", in Proc. ICASSP, 2022. (Naver) [Paper]
- MSDD: "Multi-scale Speaker Diarization with Dynamic Scale Weighting", in Proc. Interspeech, 2022. (NVIDIA) [Paper] [Code] [Blog]
- "In Search of Strong Embedding Extractors For Speaker Diarization", in Proc. ICASSP, 2023. (Naver) [Paper] [Review]
- PRISM: "PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification", in Proc. Interspeech, 2022. (Alibaba) [Paper]
- DR-DESA: "Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity", in Proc. ICASSP, 2023. (Naver) [Paper] [Review]
- HEE: "High-resolution embedding extractor for speaker diarisation", in Proc. ICASSP, 2023. (Naver) [Paper] [Review]
- "Frame-wise and overlap-robust speaker embeddings for meeting diarization", in Proc. ICASSP, 2023. (PU) [Paper] [Review]
- "A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures", in Proc. Interspeech, 2023. (PU) [Paper]
- "Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios", in Proc. ICASSP, 2024. (PU) [Paper] [Review]
- "Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization," in Proc. Odyssey, 2024. (IDLab) [Paper]
- "Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification," Submitted to IEEE/ACM TASLP, 2024. [Paper]
- "Xi-Vector Embedding for Speaker Recognition," in IEEE SPL. (A*STAR) [Paper] [Review]
- "Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022," in Proc. Interspeech, 2023. (SJTU) [Paper]
- RecXi: "Disentangling Voice and Content with Self-Supervision for Speaker Recognition," in Proc. NeurIPS, 2023. (A*STAR) [Paper]
- "ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings," in Proc. ASRU, 2023. (IDLab) [Paper] [Model] [Review]
- "Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification," in Proc. ICASSP, 2024. (Naver) [Paper]
- "Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition," in Proc. ICASSP, 2024. (CUHK) [Paper]
- "Disentangled Representation Learning for Environment-agnostic Speaker Recognition," in Proc. Interspeech, 2024. (KAIST) [arXiv] [Pub.] [Code]
- LSTM scoring: "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization", in Proc. Interspeech, 2019. (DKU) [Paper]
- "Self-Attentive Similarity Measurement Strategies in Speaker Diarization", in Proc. Interspeech, 2020. (DKU) [Paper]
- “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization”, IEEE/ACM TASLP, 2023. (DKU) [Paper]
- "Speaker Diarization based on Bayesian HMM with Eigenvoice Priors", in Proc. Odyssey, 2018. (BUT) [Paper]
- "VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation", in Proc. Odyssey, 2018. (Tsinghua) [Paper]
- “Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors”, IEEE/ACM TASLP, 2019. (BUT) [Paper]
- "BUT System Description for DIHARD Speech Diarization Challenge 2019", in arXiv:1910.08847, 2019. (BUT) [Paper]
- "Bayesian HMM Based x-Vector Clustering for Speaker Diarization", in Proc. Interspeech, 2019. (BUT) [Paper]
- "Optimizing Bayesian HMM Based x-Vector Clustering for the Second DIHARD Speech Diarization Challenge", in Proc. ICASSP, 2020. (BUT) [Paper]
- "Analysis of the BUT Diarization System for VoxConverse Challenge", in Proc. ICASSP, 2021. (BUT) [Paper] [Code]
- "Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks", in Computer Speech & Language, 2022. (BUT) [Paper]
- MS-VBx: "Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization", in Proc. Interspeech, 2023. (NTT) [Paper]
- DVBx: "Discriminative Training of VBx Diarization", in Proc. ICASSP, 2024. (BUT) [Paper] [Code]
- "Variational Bayesian methods for audio indexing", in Proc. ICMI-MLMI, 2005. [Paper]
- "Bayesian analysis of speaker diarization with eigenvoice priors", in CRIM, Montreal, Technical Report, 2008. [Paper]
- "Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach", IEEE/ACM TASLP, 2013. [Paper]
- "Diarization resegmentation in the factor analysis subspace", in Proc. ICASSP, 2015. [Paper]
- "Diarization is hard: some experiences and lessons learned for the JHU team in the inaugural DIHARD challenge", in Proc. Interspeech, 2018. [Paper]
- "Analysis of i-vector length normalization in speaker recognition systems", in Proc. Interspeech, 2011. [Paper]
- "The speaker partitioning problem", in Proc. Odyssey, 2010. [Paper]
- "Discriminatively trained probabilistic linear discriminant analysis for speaker verification", in Proc. ICASSP, 2011. [Paper]
- "Speaker diarization with plda i-vector scoring and unsupervised calibration", in Proc. IEEE SLT, 2014. [Paper]
- "Iterative PLDA Adaptation for Speaker Diarization", in Proc. Interspeech, 2016. [Paper]
- "Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering", in Proc. Interspeech, 2017. [Paper]
- "Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge", in Proc. Interspeech, 2018. [Paper]
- DCA-PLDA: "A Speaker Verification Backend with Robust Performance across Conditions", in Computer Speech & Language, 2022. [Paper] [Code]
- "Generalized domain adaptation framework for parametric back-end in speaker recognition", in arXiv:2305.15567, 2023. [Paper]
- "Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR," in Proc. ICASSP, 2022. [Paper]
- "Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription", in Proc. Interspeech, 2022. [Paper]
- "Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator", in Proc. Interspeech, 2023. (CUHK) [Paper]
- "Multi-resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model", in Proc. Interspeech, 2023.
- "Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach", in Proc. Interspeech, 2023. [Paper]
- "Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction", in Proc. Interspeech, 2023. (Amazon) [Paper]
- "Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach", in Proc. ICASSP, 2024. (NVIDIA) [Paper]
- WEEND: "Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network," in arXiv:2309.08489, 2024. (Google) [Paper] [Supplementary]
- "One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition", in Proc. ICASSP, 2024. (CMU) [Paper]
- "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization," in arXiv:2309.16482, 2024. (PU) [Paper]
- "Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing," in Proc. ICASSP, 2024. (DKU) [Paper]
- "Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers," in Proc. ICASSP, 2024. (Idiap) [Paper]
- "A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition," in Proc. ICASSP, 2024. (USTC) [Paper]
- "SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR," in Proc. ASRU, 2023. (Alibaba) [Paper]
- "Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition," in arXiv:2312.10959, 2024. (NICT) [Paper]
- "On Speaker Attribution with SURT," in Proc. Odyssey, 2024. (JHU) [Paper]
- "Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications," in Proc. Odyssey, 2024. (CNRS) [Paper]
- "Target Speaker ASR with Whisper," Submitted to ICASSP, 2025. (BUT) [Paper] [Code(Not yet)]
- "End-to-End Spoken Language Diarization with Wav2vec Embeddings", in Proc. Interspeech, 2023. [Paper] [Code]
- "Multi-resolution Approach to Identification of Spoken Languages and To Improve Overall Language Diarization System Using Whisper Model," in Proc. Interspeech, 2023. [Paper]
- "Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization", in Proc. ACL, 2023. (Alibaba) [Paper]
- MMSCD: "Encoder-decoder multimodal speaker change detection", in Proc. Interspeech, 2023. (Naver) [Paper]
- "Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment", in Proc. ICTAI, 2023. [Paper]
- "DiariST: Streaming Speech Translation with Speaker Diarization," in Proc. ICASSP, 2024. (Microsoft) [Paper] [Code]
- JPCP: "Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation," in arXiv:2309.10456, 2024. (Alibaba) [Paper]
- "DiarizationLM: Speaker Diarization Post-Processing with Large Language Models," Submitted to ICLR, 2024. (Google) [Paper] [Code]
- "LLM-based speaker diarization correction: A generalizable approach," Submitted to IEEE/ACM TASLP, 2024. [Paper]
- "AG-LSEC: Audio Grounded Lexical Speaker Error Correction," in Proc. Interspeech, 2024. (Amazon) [Paper]
- "Who said that?: Audio-visual speaker diarisation of real-world meetings", in Proc. Interspeech, 2019. (Naver) [Paper]
- "Self-supervised learning for audio-visual speaker diarization", in Proc. ICASSP, 2020. (Tencent) [Paper] [Blog]
- AVA-AVD (AVR-Net): "AVA-AVD: Audio-Visual Speaker Diarization in the Wild", in Proc. ACM MM, 2022. [Paper] [Code] [Video]
- "End-to-End Audio-Visual Neural Speaker Diarization", in Proc. Interspeech, 2022. (USTC) [Paper] [Code] [Review]
- DyViSE: "DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization", in Proc. MMSP, 2022. (THU) [Paper] [Code]
- "Audio-Visual Speaker Diarization in the Framework of Multi-User Human-Robot Interaction", in Proc. ICASSP, 2023. [Paper]
- STHG: "Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization," in Proc. CVPR, 2023. (Intel) [Paper]
- "Speaker Diarization of Scripted Audiovisual Content," in arXiv:2308.02160, 2024. (Amazon) [Paper]
- "Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings," in Proc. ACM MM, 2023. [Paper]
- "Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization," in Springer Computer Science proceedings, 2023. [Paper]
- EEND-EDA++: "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization," in arXiv:2211.01299v2, 2023. [Paper]
- "AFL-Net: Integrating Audio, Facial, and Lip Modalities with Cross-Attention for Robust Speaker Diarization in the Wild," in Proc. ICASSP, 2024. (Tencent) [Paper] [Demos]
- "Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation," in Proc. AAAI, 2024. (Tencent) [Paper]
- "Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization," Submitted to IEEE/ACM TASLP. (DKU) [Paper]
- "3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization," in arXiv:2403.19971, 2024. (Alibaba) [Paper] [Code]
- "Target Speech Diarization with Multimodal Prompts," Submitted to IEEE/ACM TASLP, 2024. (NUS) [Paper]
- MFV-KSD: "Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization," Submitted to ACM MM, 2024. [Paper] [Code]
- "Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio," in Proc. Interspeech, 2024. (IITK) [Paper]
- "A Benchmark for Multi-speaker Anonymization," Submitted to IEEE/ACM TASLP, 2024. (SIT) [Paper] [Code]
- "Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework," in Proc. Interspeech, 2024. (LY) [Paper]
- "Speech Emotion Diarization: Which Emotion Appears When?," in Proc. ASRU, 2023. (Zaion) [Paper]
- "EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks," in arXiv:2310.12851, 2023. [Paper]
- "ED-TTS: Multi-scale Emotion Modeling using Cross-domain Emotion Diarization for Emotional Speech Synthesis," in Proc. ICASSP, 2024. [Paper]
- "Personal VAD: Speaker-Conditioned Voice Activity Detection", in Proc. Odyssey, 2020. (Google) [Paper]
- "SVVAD: Personal Voice Activity Detection for Speaker Verification", in Proc. Interspeech, 2023. [Paper]
- "Overlapped Speech Detection in Broadcast Streams Using X-vectors," in Proc. Interspeech, 2022. [Paper]
- "Overlapped speech and gender detection with WavLM pre-trained features," in Proc. Interspeech, 2022. [Paper]
- "Microphone Array Channel Combination Algorithms for Overlapped Speech Detection," in Proc. Interspeech, 2022. [Paper]
- "Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0," in Proc. ICASSP, 2023. [Paper] [Code]
- "Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction," in Proc. Interspeech, 2023. [Paper]
- "Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains," in arXiv:2307.13012, 2023. [Paper]
- "Advancing the study of Large-Scale Learning in Overlapped Speech Detection," in arXiv:2308.05987, 2023. [Paper]
- "USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models," in Proc. ICASSP, 2024. (Google) [Paper]
- "Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection," in IEEE/ACM TASLP, 2024. [Paper]
- Voxconverse: "Spot the conversation: speaker diarisation in the wild", in Proc. Interspeech, 2020. (VGG, Naver) [Paper] [Code] [Dataset]
- MSDWild: "Multi-modal Speaker Diarization Dataset in the Wild", in Proc. Interspeech, 2022. [Paper] [Dataset]
- "LibriMix: An Open-Source Dataset for Generalizable Speech Separation," in arXiv:2005.11262, 2020. [Paper] [Code]
- Ego4D: "Around the World in 3,000 Hours of Egocentric Video," in Proc. CVPR, 2022. (Meta) [Paper] [Code] [Page]
- AliMeeting: "Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge," in Proc. ICASSP, 2022. (Alibaba) [Paper] [Dataset] [Code]
- "VoxBlink: X-Large Speaker Verification Dataset on Camera", in Proc. ICASSP, 2024. [Paper] [Dataset]
- "NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription," in arXiv:2401.08887, 2024. (MS) [Paper]
- "A Comparative Analysis of Speaker Diarization Models: Creating a Dataset for German Dialectal Speech," in Proc. ACL, 2024. [Paper]
- "Conversations in the wild: Data collection, automatic generation and evaluation," in Computer Speech & Language, 2025. [Paper]
- "ALLIES: A Speech Corpus for Segmentation, Speaker Diarization, Speech Recognition and Speaker Change Detection," in Proc. ACL, 2024. (LIUM) [Paper]
- "Gryannote open-source speaker diarization labeling tool," in Proc. Interspeech (Show and Tell), 2024. (IRIT) [Pub.] [Code]
- “Self-supervised Speaker Diarization”, in Proc. Interspeech, 2022. [Paper]
- CSDA: "Continual Self-Supervised Domain Adaptation for End-to-End Speaker Diarization", in Proc. IEEE SLT, 2022. (CNRS) [Paper] [Code]
- DiarZen: "Leveraging Self-Supervised Learning for Speaker Diarization," Submitted to ICASSP, 2025. (BUT) [Paper] [Code]
- "Active Learning Based Constrained Clustering For Speaker Diarization", in IEEE/ACM TASLP, 2017. (UT) [Paper]
- "Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism," in Proc. Interspeech, 2023. [Paper]
- "Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions," in Proc. Interspeech, 2024. (USC) [Paper]
[Workshop]
- 1st: Microsoft [Tech Report] [Video]
- 2nd: BUT [Tech Report] [Video]
- 3rd: DKU [Tech Report]
[Workshop]
- 1st: DKU [Tech Report] [Video]
- 2nd: Bytedance [Tech Report] [Video]
- 3rd: Tencent [Tech Report]
- 1st: DKU [Tech Report] [Slide] [Video]
- 2nd: KristonAI [Tech Report] [Slide] [Video]
- 3rd: GIST [Tech Report] [Slide] [Video] [Review]
- 1st: DKU [Tech Report] [Slide] [Video]
- 2nd: KrispAI [Tech Report] [Slide] [Video]
- 3rd: Pyannote [Tech Report] [Slide] [Video]
- 4th: GIST [Tech Report]
- Wespeaker [Tech Report]
[Introduction Paper] [Summary Paper] [Dataset-AliMeeting] [Code]
[Introduction Paper] [Page] [Baseline Code]
- 1st: USTC [Paper] [Slides] [Video]
- 2nd: Hitachi [Paper] [Slide] [Video]
- 3rd: Naver Clova [Paper] [Slide] [Video]
- 1st: USTC-NELSLIP [Paper] [Slides] [Video]
- 2nd: Hitachi [Paper] [Slide] [Video]
- 3rd: DKU [Paper] [Slide] [Video]
- "End-to-end speaker diarization system for the third DIHARD challenge system description," in DIHARD III Tech. Report, 2021.
- "The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments," in Proc. Interspeech, 2023. [Paper] [Page]
- "The SpeeD--ZevoTech submission at DISPLACE 2023," in Proc. Interspeech, 2023. [Paper]
- "MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization," in Proc. Interspeech, 2023. [Paper] [Page]
- "ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge," 2023. [Paper]
- "The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge," in Technical Report, 2023. [Paper]
- "The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments," in Proc. Interspeech, 2024. [Paper]
- "The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization," 2024. [Paper]