
25-29 October 2020
Shanghai International Convention Center
Asia/Shanghai timezone

Mon-1-7-4 New advances in speaker diarization

26 Oct 2020, 19:15



Gakuto Kurata (IBM Research), Hagai Aronowitz (IBM Research - Haifa), Masayuki Suzuki (IBM Research), Ron Hoory (IBM Haifa Research Lab), Weizhong Zhu (IBM T.J. Watson Research Center)


Recently, speaker diarization based on speaker embeddings has shown excellent results in many works. In this paper we propose several enhancements throughout the diarization pipeline. This work addresses two clustering frameworks: agglomerative hierarchical clustering (AHC) and spectral clustering (SC).
First, we use multiple speaker embeddings. We show that fusion of x-vectors and d-vectors boosts accuracy significantly.
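The abstract does not say how the two embedding types are fused, so the following is only a minimal sketch of one simple possibility: length-normalize each embedding and concatenate them. The function names (`length_normalize`, `fuse_embeddings`, `cosine_similarity`) and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math


def length_normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_embeddings(x_vector, d_vector):
    """Assumed fusion scheme: concatenate the two unit-normalized embeddings."""
    return length_normalize(x_vector) + length_normalize(d_vector)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy segments (hypothetical values): the x-vectors disagree, the d-vectors agree.
seg_a = fuse_embeddings([1.0, 0.0], [0.0, 2.0])
seg_b = fuse_embeddings([0.0, 3.0], [0.0, 5.0])
fused_score = cosine_similarity(seg_a, seg_b)
```

With this particular scheme, the cosine score of the fused vectors equals the mean of the two per-embedding cosine scores, so concatenation after normalization behaves like an equal-weight score average.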
Second, we train neural networks to leverage both acoustic and duration information for scoring the similarity of segments or clusters. Third, we introduce a novel method to guide the AHC clustering mechanism using a neural network. Fourth, we handle short-duration segments in SC by de-emphasizing their effect on setting the number of speakers.
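For orientation, here is a sketch of plain average-linkage AHC over a precomputed segment similarity matrix with a fixed stopping threshold. It is the conventional baseline, not the neural-network-guided variant described above; the function name `ahc_average_linkage`, the threshold, and the toy matrix are assumptions made for illustration. The inner `avg_sim` helper marks the place where a learned similarity scorer could, in principle, be substituted.

```python
def ahc_average_linkage(similarity, threshold):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [[i] for i in range(len(similarity))]

    def avg_sim(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_score, (i, j) = max(
            (avg_sim(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_score < threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy similarity matrix (hypothetical): segments 0-1 and 2-3 form two speakers.
toy_similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
speaker_clusters = ahc_average_linkage(toy_similarity, threshold=0.5)
```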
Finally, we propose a novel method for estimating the number of clusters in the SC framework. The method takes each eigenvalue and analyzes the projections of the SC similarity matrix on the corresponding eigenvector.
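The abstract gives no further detail on the projection-based estimator, so it is not reproduced here. As a point of comparison, the sketch below shows the widely used eigengap heuristic for choosing the number of clusters in spectral clustering: it picks the count at which consecutive eigenvalues of the normalized affinity matrix drop the most. The function name `estimate_num_speakers`, the `max_speakers` cap, and the toy matrix are assumptions for illustration.

```python
import numpy as np


def estimate_num_speakers(similarity, max_speakers=8):
    """Eigengap heuristic (a common baseline, not the method proposed in the paper)."""
    s = np.asarray(similarity, dtype=float)
    s = 0.5 * (s + s.T)  # enforce symmetry
    degree = s.sum(axis=1)  # assumed strictly positive for every segment
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetrically normalized affinity: D^{-1/2} S D^{-1/2}
    normalized = s * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(normalized)[::-1]  # sorted in descending order
    k_max = min(max_speakers, len(eigvals) - 1)
    gaps = eigvals[:k_max] - eigvals[1:k_max + 1]
    return int(np.argmax(gaps)) + 1


# Toy affinity (hypothetical): 3 speakers with 4 segments each,
# within-speaker similarity 0.9 and cross-speaker similarity 0.1.
toy_affinity = 0.8 * np.kron(np.eye(3), np.ones((4, 4))) + 0.1
num_speakers = estimate_num_speakers(toy_affinity)
```

On real recordings the eigenvalue spectrum is far less clean than in this toy case, which is one reason more elaborate estimators are of interest.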
We evaluated our system on the NIST SRE 2000 CALLHOME dataset and, using cross-validation, achieved an error rate of 5.1%, surpassing the state of the art in speaker diarization.
