Video captions

Video content is made accessible to people who are deaf or hard of hearing through three primary methods:

  • Subtitles are a synchronized translation of spoken words into text, sometimes in a different language.
  • Captions are a synchronized translation of all important sounds, possibly including speech, music, and noise. Captions can be part of the video itself (open captions) or stored separately and displayed only when wanted (closed captions).
  • A transcript is a static record of all speech or all audio in the video.

The terms “captions” and “subtitles” are sometimes used interchangeably, but this page uses them as defined above. Captions are usually a strict improvement in accessibility over subtitles since they capture more audio information.

A transcript complements captions, since it can be read at any speed without the video. Ideally both captions and a transcript are available.

Support at the University of Victoria

Teach Anywhere has instructions for adding automatic captions with Echo360, and for adding captions to live video. Instructors can request a meeting with an Accessibility Learning Experience Designer from Learning and Teaching Support and Innovation through UVic’s consultation booking system. Employees can contact their chair or director for information about further support for captioning in their department or unit, such as TA hours.

Students with academic accommodations for transcription or sign language interpretation are supported by CAL’s Interpreting and Transcribing Program.

UVic employees interested in improving their video captions beyond those generated by Echo360, Teams, or Zoom can start by reading this page.


Synchronization

To be useful, captions must be closely synchronized with the audio. Words should appear as they are spoken and disappear at the pace of the speech, while remaining on screen long enough to be read comfortably.
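
The trade-off between pacing and comfortable reading time can be sketched in code. This is a rough illustration, not a standard: the function name, the 160 words-per-minute reading rate, and the 1.5-second floor are all illustrative assumptions.

```python
def min_display_seconds(text, words_per_minute=160, floor=1.5):
    """Rough minimum on-screen time for one caption.

    The 160 wpm reading rate and 1.5 s floor are illustrative
    assumptions, not published standards.
    """
    words = len(text.split())
    return max(floor, words * 60.0 / words_per_minute)
```

Under these assumptions, a seven-word caption would stay on screen for about 2.6 seconds, while a one-word caption would still get the 1.5-second floor.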


Readability

Captions must be readable. This means a large, legible font with sharp contrast against the video.

  • Use sans-serif fonts. Captions might be displayed on small or low-resolution screens.
  • Keep lines short. Captions must be read quickly. 32 characters, including spaces, is a good guideline. 
  • Limit the amount of text on the screen. Two lines of text is a good guideline.
  • Provide contrast with text outlines or a coloured background. 
  • Do not rely only on colour to convey meaning. Not all people see colour equally and not all screens display colour equally. 
  • Break lines at the ends of phrases or clauses. 
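
The line-length and line-count guidelines above are simple enough to check mechanically. A minimal sketch, where the function name and defaults are assumptions mirroring the 32-character and two-line guidelines:

```python
def check_cue(lines, max_chars=32, max_lines=2):
    """Return a list of guideline violations for one caption cue.

    A cue is given as a list of its text lines; the limits default to
    the 32-character and two-line guidelines described above.
    """
    problems = []
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines exceeds the {max_lines}-line guideline")
    for i, line in enumerate(lines, start=1):
        if len(line) > max_chars:
            problems.append(
                f"line {i} has {len(line)} characters, "
                f"over the {max_chars}-character guideline"
            )
    return problems
```

A cue that satisfies both guidelines returns an empty list; each violation adds one entry describing what to fix.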

Captions should not block important visual elements. This sometimes requires the text to be placed in different parts of the video at different times.


Equivalence

As much as possible, captions must convey information equivalent to the audio.

  • Name the speaker when multiple people are speaking.
  • Include filler words like “um” and “er”.
  • Retain slang like “gimme” and “gonna”. 
  • Indicate sound effects differently from speech, such as with parentheses. 
  • Indicate silence when someone appears to be speaking with a caption like “(no speech)”.
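
In the WebVTT caption format, for example, these conventions might look like the cues below. The timestamps, speaker name, and sounds are invented for illustration; parentheses mark non-speech audio as suggested above.

```
WEBVTT

00:00:01.000 --> 00:00:04.000
MARIA: Welcome back, everyone.

00:00:04.500 --> 00:00:06.000
(door slams)

00:00:06.500 --> 00:00:09.000
(no speech)
```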

The minimum accuracy of captions depends on the context of the video. Please consult your standards office for more information. Automatic captions, such as those generated by Zoom, usually achieve 75% accuracy. Professional captioning usually achieves at least 99% accuracy.
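
Accuracy here means, roughly, the fraction of spoken words that appear correctly in the captions. As a sketch only, word-level accuracy against a reference transcript could be estimated as below; the function and its matching strategy are illustrative assumptions, and formal evaluations typically use word error rate instead.

```python
import difflib

def caption_accuracy(reference, captions):
    """Estimate word-level caption accuracy against a reference transcript.

    Uses difflib's matching blocks as a rough stand-in for a proper
    word-error-rate calculation.
    """
    ref = reference.lower().split()
    cap = captions.lower().split()
    if not ref:
        return 1.0
    matcher = difflib.SequenceMatcher(None, ref, cap)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)
```

For instance, a caption track that gets three of four reference words right in order would score 0.75 by this estimate.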

EngageMedia has a much more detailed list of rules and suggestions for effective captions.

Other resources

These links were compiled on May 18, 2021. If they are no longer working or their information is out of date, please contact CAL's coordinator of adaptive technology at 250-472-5483.