Zoom Audio Transcription Accuracy for African American Vernacular English

Chance, Christina (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/3n2040522?locale=es
Published

Abstract

As telecommunication becomes a growing part of society, there is concern about reliability and accuracy for all users. African American Vernacular English (AAVE) is a dialect that has been marginalized and forgotten by the Speech Recognition and Natural Language Processing community, making most speech recognition tools less accurate for Black speakers. This study examines Zoom’s closed captioning service for both AAVE and Standard American English (SAE) to assess accuracy across the different regional forms of AAVE and to compare overall accuracy between SAE and AAVE. Python’s ASR Evaluation module was used to compute the edit distance. About 9 hours of audio from both the CORAAL datasets and the Santa Barbara Corpus of Spoken American English were used; both datasets contain conversational speech with linguistic sounds and stuttering. Results suggested that Zoom’s closed captioning tool works more effectively for AAVE than for SAE based on the current data. To supplement that data and determine whether the outcome of this work can be generalized to all closed captioning for video-conferencing tools, more formal speech samples were analyzed to assess the effect of outside confounding factors. The supplementary experiment produced results that contradicted the main study.
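
The edit-distance scoring mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the ASR Evaluation module's code or the author's pipeline; the function names `word_edit_distance` and `word_error_rate`, the lowercase-and-split tokenization, and the example sentences are assumptions made purely for illustration.

```python
def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which is an assumption and not the thesis's text processing.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] holds the distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return word_edit_distance(reference, hypothesis) / ref_len


if __name__ == "__main__":
    # Made-up sentences, not taken from CORAAL or the Santa Barbara Corpus.
    reference = "he be working late"
    caption = "he is working late"
    print(word_edit_distance(reference, caption))
    print(word_error_rate(reference, caption))
```

Dividing the edit distance by the reference length gives a word error rate, which makes transcripts of different lengths comparable.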

Table of Contents

1 Introduction

2 Background & Related Work

2.1 Related Works

2.2 Language Definitions

2.2.1 Standard American English

2.2.2 African American Vernacular English

3 Methodology

3.1 Process

3.1.1 Evaluation Metrics

4 Experiment & Results

4.1 Experimental Setup

4.1.1 Audio Data

4.1.2 Text Processing

4.1.3 Video-Conferencing Tool

4.2 Experiment 1

4.2.1 Analysis: Experiment 1

4.3 Experiment 2

4.3.1 Analysis: Experiment 2

4.4 Discussion

5 Conclusion

5.1 Future Works

5.1.1 Continuation of Work

5.1.2 Fairness Space

Appendix A: Full Data Mapping

Appendix B

Appendix C: Full Supplementary Data

Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English