Zoom Audio Transcription Accuracy for African American Vernacular English
Chance, Christina (Spring 2022)
Abstract
As telecommunication becomes a growing part of society, there is concern about its reliability and accuracy for all users. African American Vernacular English (AAVE) is a dialect that has been marginalized and overlooked by the speech recognition and natural language processing communities, which makes most speech recognition tools less accurate for Black speakers. This study evaluates Zoom's closed captioning service on both AAVE and Standard American English (SAE). It assesses accuracy across the different regional forms of AAVE and compares overall accuracy between SAE and AAVE. Python's asr-evaluation module was used to compute the edit distance between the reference transcripts and Zoom's captions. About 9 hours of audio were used, drawn from the CORAAL datasets and the Santa Barbara Corpus of Spoken American English; both datasets contain conversational speech with linguistic sounds and stuttering. On the current data, the results suggest that Zoom's closed captioning tool works more effectively for AAVE than for SAE. More formal speech samples were then analyzed as a supplement, to assess the effect of outside confounding factors and to determine whether this outcome generalizes to all closed captioning for video-conferencing tools. The supplementary experiment produced results that contradicted the main study.
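The abstract measures caption accuracy through edit distance. The usual accuracy measure built on word-level edit distance is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the reference transcript into the caption, divided by the number of reference words.

The sketch below computes WER from scratch. It is not the asr-evaluation package's API, and it is not necessarily the thesis's exact procedure. The text normalization rules, the pooled per-group aggregation and the group labels (for example, regional CORAAL subsets) are all illustrative assumptions.

```python
import re
from collections import defaultdict
from collections.abc import Iterable


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split on whitespace.

    Assumption: illustrative normalization, not the thesis's text-processing rules.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return text.split()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    # prev[j] is the distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from caption
                curr[j - 1] + 1,     # insertion: extra word in caption
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of one caption against its reference transcript."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[tuple[str, str, str]]) -> dict[str, float]:
    """Pool errors and reference words per group (hypothetical labels).

    pairs: (group, reference, hypothesis) tuples. Pooling weights longer
    utterances more heavily than averaging per-utterance WER would.
    """
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for group, reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        errors[group] += word_edit_distance(ref, hyp)
        words[group] += len(ref)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}
```

For example, `wer("she be working late", "she working late")` finds one deletion over four reference words and returns 0.25. Pooling per group, rather than averaging per-utterance scores, keeps very short utterances from dominating a comparison such as AAVE versus SAE.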
Table of Contents
1 Introduction
2 Background & Related Work
2.1 Related Works
2.2 Language Definitions
2.2.1 Standard American English
2.2.2 African American Vernacular English
3 Methodology
3.1 Process
3.1.1 Evaluation Metrics
4 Experiment & Results
4.1 Experimental Setup
4.1.1 Audio Data
4.1.2 Text Processing
4.1.3 Video-Conferencing Tool
4.2 Experiment 1
4.2.1 Analysis: Experiment 1
4.3 Experiment 2
4.3.1 Analysis: Experiment 2
4.4 Discussion
5 Conclusion
5.1 Future Works
5.1.1 Continuation of Work
5.1.2 Fairness Space
Appendix A: Full Data Mapping
Appendix B
Appendix C: Full Supplementary Data
Bibliography
About this Honors Thesis
Primary PDF: Zoom Audio Transcription Accuracy for African American Vernacular English (uploaded 2022-04-12 16:16:25 -0400)