Evaluating Speaker Diarization in Transcripts: A Text-based Approach with the TDER Metric and the TranscribeView System
Open Access
Gong, Chen (Spring 2023)
Abstract
Speaker Diarization (SD), the task of attributing speaker labels to dialogue segments, has traditionally been performed and evaluated at the audio level. The diarization error rate (DER) metric for SD systems measures errors in time but does not account for the impact of automatic speech recognition (ASR) systems on transcript-based performance. Word error rate (WER), the evaluation metric for ASR, only considers errors in word insertion, deletion, and substitution, disregarding SD quality. To better evaluate SD performance at the text level, this paper proposes Text-based Diarization Error Rate (TDER) and diarization F1-score, which jointly assess SD and ASR performance.
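For reference, the conventional WER computation described above can be sketched as follows. This is a minimal illustration of the standard definition (word-level edit distance divided by the number of reference words), not code from the thesis; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Speaker labels never enter this calculation, so two hypotheses with identical words but different speaker attributions receive the same WER.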
To address inconsistencies in token counts between hypothesis and reference transcripts, we introduce a multiple sequence alignment tool that accurately maps words between reference and hypothesis transcripts. Our alignment method achieves 99% accuracy on a simulated corpus generated based on common SD and ASR errors.
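The table of contents lists the Needleman-Wunsch algorithm as a building block for the alignment tool. The sketch below shows only the standard pairwise form of that algorithm applied to word tokens, to illustrate how gaps absorb differences in token counts. It is not the thesis's multiple sequence alignment method; the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def needleman_wunsch(ref: List[str], hyp: List[str],
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Globally align two token lists; None marks a gap on that side."""
    n, m = len(ref), len(hyp)

    # score[i][j] is the best alignment score of ref[:i] against hyp[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        return match if ref[i - 1] == hyp[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align the two tokens
                score[i - 1][j] + gap,                   # gap in hypothesis
                score[i][j - 1] + gap,                   # gap in reference
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned.append((ref[i - 1], None))
            i -= 1
        else:
            aligned.append((None, hyp[j - 1]))
            j -= 1
    aligned.reverse()
    return aligned


if __name__ == "__main__":
    reference = "i am going to the store".split()
    hypothesis = "i going to store".split()
    for ref_tok, hyp_tok in needleman_wunsch(reference, hypothesis):
        print(f"{ref_tok or '-':>6}  {hyp_tok or '-'}")
```

Each aligned pair keeps a reference token next to its hypothesis counterpart, so per-token attributes such as speaker labels could then be compared position by position.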
Comparisons with DER, WER, and WDER on 10 transcripts from the CallHome dataset demonstrate that TDER and diarization F1-score provide a more reliable evaluation of speaker diarization at the text level. To enable a comprehensive evaluation of transcript quality, we present TranscribeView, a web-based platform for assessing and visualizing errors in speech recognition and speaker diarization. To the best of our knowledge, TranscribeView is the first comprehensive platform that enables researchers to align multi-sequence transcripts and assess and visualize speaker diarization errors, contributing significantly to the advancement of data-driven conversational AI research.
Table of Contents
1 Introduction
1.1 Background and motivation
1.2 Thesis Objectives and Contributions
1.3 Thesis organization
2 Background
2.1 ASR Evaluation Metrics
2.2 Speaker Diarization Evaluation Metrics
2.2.1 Diarization Error Rate (DER)
2.2.2 Word-level Diarization Error Rate (WDER)
2.3 Transcript alignment methods
3 Text-based Diarization Error Rate and F-1 score
3.1 Text-based Diarization Error Rate (TDER)
3.2 Diarization F-1 score
4 Multiple Sequence Alignment for Transcript Mapping
4.1 Limitations for Pair-wise Alignment Algorithms
4.2 Needleman-Wunsch algorithm
4.3 Adaptation to 3-dimension
4.4 Multiple Sequence Alignment
5 Experiments and Results
5.1 Transcribers
5.2 CallHome Corpus
5.3 Evaluation of Multiple Sequence Alignment
5.3.1 Simulated Data
5.3.2 Result
5.4 Evaluation of Proposed Metrics
5.4.1 Data Preparation
5.4.2 Speaker Alignment
5.4.3 Result
6 TranscribeView: A System for Transcript Evaluation and Diarization Error Visualization
6.1 Interface
6.2 Implementation
6.3 Case Study: Comparing Transcribers
7 Conclusion
7.1 Limitations
7.2 Future Work
Bibliography
About this Honors Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Evaluating Speaker Diarization in Transcripts: A Text-based Approach with the TDER Metric and the TranscribeView System | 2023-04-11 13:40:28 -0400 |