Zoom Audio Transcription Accuracy for African American Vernacular English Open Access

Chance, Christina (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/3n2040522?locale=pt-BR%2A

Published

Abstract

As telecommunication becomes a growing part of society, there is concern about its reliability and accuracy for all users. African American Vernacular English (AAVE) is a dialect that has been marginalized and forgotten by the Speech Recognition and Natural Language Processing community, which makes most speech recognition tools less accurate for Black speakers. This study examines Zoom’s closed captioning service for both AAVE and Standard American English (SAE) to assess accuracy across the different regional forms of AAVE and to compare overall accuracy between SAE and AAVE. Python’s ASR Evaluation module was used to compute the edit distance. About 9 hours of audio from both the CORAAL data sets and the Santa Barbara Corpus of Spoken American English were used; both data sets contain conversational speech with linguistic sounds and stuttering. Results suggested that, based on the current data, Zoom’s closed captioning tool works more effectively for AAVE than for SAE. To supplement that data and determine whether the outcome of this work generalizes to closed captioning in all video-conferencing tools, more formal speech samples were analyzed to assess the effect of outside confounding factors. The supplementary experiment showed results that contradict the main study.
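The abstract names edit distance as the accuracy measure. As a minimal sketch of that idea, assuming a word-level Levenshtein distance normalized by reference length (the usual word error rate), the computation could look like the following. This is not the thesis's code and not the ASR Evaluation module's API; the function names and the sample sentences are hypothetical and chosen only for illustration.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists.

    Substitutions, insertions, and deletions each cost 1.
    """
    # previous[j] holds the distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word deleted
                current[j - 1] + 1,      # hypothesis word inserted
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference_text, hypothesis_text):
    """Edit distance divided by the number of reference words."""
    # Assumption: lowercasing and whitespace splitting stand in for
    # whatever text normalization a real evaluation would apply.
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript must contain at least one word")
    return word_edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair, invented for this example.
    human = "she been working there all year"
    captions = "she working there all the year"
    print(f"WER: {word_error_rate(human, captions):.3f}")
```

A lower value means the captions sit closer to the human transcript. Comparing the rate across speaker groups is the kind of comparison the abstract describes between AAVE and SAE recordings.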

Table of Contents

1 Introduction

2 Background & Related Work

2.1 Related Works

2.2 Language Definitions

2.2.1 Standard American English

2.2.2 African American Vernacular English

3 Methodology

3.1 Process

3.1.1 Evaluation Metrics

4 Experiment & Results

4.1 Experimental Setup

4.1.1 Audio Data

4.1.2 Text Processing

4.1.3 Video-Conferencing Tool

4.2 Experiment 1

4.2.1 Analysis: Experiment 1

4.3 Experiment 2

4.3.1 Analysis: Experiment 2

4.4 Discussion

5 Conclusion

5.1 Future Works

5.1.1 Continuation of Work

5.1.2 Fairness Space

Appendix A: Full Data Mapping

Appendix B

Appendix C: Full Supplementary Data

Bibliography

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Mathematics and Computer Science
Degree	B.S.
Submission	Honors Thesis
Language	English
Research Field	Artificial Intelligence; Computer Science
Keyword	AI Fairness Audio Transcription Closed Captioning Speech Recognition
Committee Chair / Thesis Advisor	Arnold, Dorian, Emory University
Committee Members	Hue, Gillian, Emory University; Mayo, Talea, Emory University; Klein, Lauren, Emory University; Wall, Emily, Emory University

Primary PDF

Title	Date Uploaded
Zoom Audio Transcription Accuracy for African American Vernacular English	2022-04-12 16:16:25 -0400
