Xin He (Alyssa) Jiang

  • B.A. (University of Alberta, 2020)

Notice of the Final Oral Examination for the Degree of Master of Arts

Topic

Applying Automatic Speech Recognition to Indigenous Language Documentation: A Case Study with Hul’q’umi’num’

School of Languages, Linguistics and Cultures

Date & location

  • Wednesday, May 13, 2026
  • 10:00 A.M.
  • Clearihue Building, Room B021

Examining Committee

Supervisory Committee

  • Dr. Sonya Bird, School of Languages, Linguistics and Cultures, University of Victoria (Co-Supervisor)
  • Dr. Suzanne Urbanczyk, School of Languages, Linguistics and Cultures, UVic (Co-Supervisor)
  • Dr. Christopher Cox, School of Linguistics and Language Studies, Carleton University (Outside Member)

External Examiner

  • Dr. Chuutsqa Rorick, Department of Indigenous Education, UVic

Chair of Oral Examination

  • Dr. Iain McKechnie, Department of Anthropology, UVic

Abstract

Documenting Indigenous languages can produce a large number of audio recordings that are difficult to convert into written form. Speeding up transcription with automatic speech recognition could help the Hul’q’umi’num’ Language & Culture Society create pedagogical materials and make their recordings more accessible. In this project, I trained a language model known as XLS-R on Hul’q’umi’num’ audio recordings to determine how accurately it can transcribe Hul’q’umi’num’, whether particular linguistic and orthographic features are more difficult for XLS-R to transcribe, and how much time and what computational resources the training requires. The model reached a character error rate (CER) of 11.1% and a word error rate (WER) of 50% using 26 minutes of continuous speech. Most phonemes could be transcribed with high accuracy, but the model had difficulty segmenting words, differentiating glottalized consonants from plain consonants, determining vowel length, and predicting the placement of glottal stops.
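
For readers unfamiliar with the two metrics, the following is a minimal sketch of how CER and WER are conventionally computed: the edit distance between a reference transcript and the model output, divided by the reference length in characters or in words. It is not the evaluation code used in the thesis. The function names and the placeholder strings are assumptions made for illustration, and counting spaces as characters in CER is one common convention among several.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes a non-empty reference, spaces counted."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens; non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Placeholder strings (not Hul'q'umi'num' data): one missing word boundary.
reference = "abc def"
hypothesis = "abcdef"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

In this toy case a single missing space is one character error out of seven, yet it makes both reference words count as wrong. This shows in general terms how a word segmentation error can raise WER much more than CER.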