WebVTT and Video Subtitles
With the web industry embracing HTML5’s ability to embed video in the browser, we need to take a further look at what the future holds for video and HTML5, particularly in the field of accessibility.
Updated to reflect changes in the WebVTT specification.
It’s all very well to embed audio and video into your website, but how accessible are these? Simply adding a video is fine for those who don’t need any help in viewing it, but for those who might need to read what’s being said or have something read it out to them (to take one particular example) it’s not so useful.
The WHATWG, and in particular Silvia Pfeiffer, have attempted to address this with a new file format called WebVTT: Web Video Text Tracks. This file, used in conjunction with the HTML5 track element, can be used to specify accessible information for a multimedia source:
- subtitles
- descriptions
- captions
- navigation
- chapters
- metadata
That’s quite a lot, and today we’ll simply concentrate on video subtitles, but I aim to cover the others, with examples, in other articles in the near future.
WebVTT file format
A WebVTT file is simply a text file with a .vtt extension that follows a certain format. No surprise there.
A WebVTT file takes the following format:
WEBVTT FILE

[idstring]
[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms [cue settings]
TextLine1
TextLine2
...
As you can see, the file needs to start with a WEBVTT FILE header, followed by details of each cue that’s specified within the video itself. (The idstring can be one or more characters and must not contain --> or a line break: \r, \n or \r\n.)
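To make the timing line a little more concrete, here is a minimal Python sketch of how a script might read one. The names parse_timestamp and parse_timing_line are my own, chosen for illustration, and are not part of any WebVTT library. The sketch also assumes a well-formed timing line rather than following the specification’s full parsing rules.

```python
import re
from typing import Tuple

# Matches "[hh:]mm:ss.ttt": optional hours, then minutes, seconds, milliseconds.
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


def parse_timestamp(text: str) -> float:
    """Convert a timestamp such as '00:00:03.500' into seconds."""
    match = _TIMESTAMP.match(text.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return (
        int(hours or 0) * 3600
        + int(minutes) * 60
        + int(seconds)
        + int(millis) / 1000
    )


def parse_timing_line(line: str) -> Tuple[float, float, str]:
    """Split a timing line into (start seconds, end seconds, raw cue settings)."""
    start_text, arrow, rest = line.partition("-->")
    parts = rest.split(None, 1)
    if not arrow or not parts:
        raise ValueError(f"not a timing line: {line!r}")
    start = parse_timestamp(start_text)
    end = parse_timestamp(parts[0])
    if end <= start:
        raise ValueError("a cue must end after it starts")
    # Anything after the end timestamp is treated as cue settings.
    settings = parts[1].strip() if len(parts) > 1 else ""
    return start, end, settings


print(parse_timing_line("00:00:03.500 --> 00:00:05.000 vertical:lr align:start"))
```

The cue settings are returned as a raw string here, so a caller can decide separately how strictly to interpret them.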
With that format in mind, let’s take a look at a sample WebVTT file:
WEBVTT FILE

1
00:00:03.500 --> 00:00:05.000 vertical:lr align:start
Everyone wants the most from life

2
00:00:06.000 --> 00:00:09.000 align:start
Like internet experiences that are rich <b>and</b> entertaining
This particular example defines two cues, which are, and must be, separated by at least one blank line. The timings are pretty self-explanatory, as are the text lines, but I’ve also thrown in some cue settings, which I’ve not yet spoken about (you’ll notice some HTML in the cue text too; more about that in a bit).
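Since cues are separated by blank lines, splitting a file into its cues is fairly mechanical. The following Python sketch shows one way it might be done. The function name split_cues and the dictionary keys are assumptions of mine, and the sketch simply skips any block that isn’t a cue rather than implementing the full specification.

```python
import re
from typing import Dict, List


def split_cues(vtt_text: str) -> List[Dict[str, str]]:
    """Split WebVTT text into cues, keeping each part as a raw string."""
    # Normalise \r\n and \r to \n, then split on one or more blank lines.
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text.strip()) if b.strip()]
    if not blocks or not blocks[0].startswith("WEBVTT"):
        raise ValueError("file must start with a WEBVTT header")

    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # The identifier line is optional; the timing line contains "-->".
        identifier = ""
        if "-->" not in lines[0]:
            identifier = lines.pop(0)
        if not lines or "-->" not in lines[0]:
            continue  # not a cue block, so this sketch ignores it
        timing = lines.pop(0)
        cues.append({"id": identifier, "timing": timing, "text": "\n".join(lines)})
    return cues


SAMPLE = """WEBVTT FILE

1
00:00:03.500 --> 00:00:05.000 vertical:lr align:start
Everyone wants the most from life

2
00:00:06.000 --> 00:00:09.000 align:start
Like internet experiences that are rich <b>and</b> entertaining
"""

for cue in split_cues(SAMPLE):
    print(cue["id"], "|", cue["timing"], "|", cue["text"])
```

Keeping the timing line and text as raw strings leaves the interpretation of timestamps, settings and markup to later steps.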
There are a number of settings that can be specified for a subtitle cue which affect how it is displayed on the video:
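- vertical: lr | rl – displays the text vertically, with lines growing left to right (lr) or right to left (rl)
- line: XX% – specifies the line position relative to the video frame
- align: start | middle | end – indicates the text alignment (later versions of the specification use center in place of middle)
- position: XX% – specifies the text position
- size: XX% – specifies the size of the cue box as a percentage of the video frame, not the size of the text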
For a more comprehensive view of these, see cue settings.
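In a file, each setting is written as name:value with no space after the colon, and settings are separated by spaces. A small Python sketch of reading them might look like this. The function name parse_cue_settings is hypothetical, and the set of known names is just the five settings listed above, so anything else is quietly ignored.

```python
from typing import Dict

# The five setting names covered in the list above.
KNOWN_SETTINGS = {"vertical", "line", "align", "position", "size"}


def parse_cue_settings(raw: str) -> Dict[str, str]:
    """Turn 'vertical:lr align:start' into {'vertical': 'lr', 'align': 'start'}."""
    settings = {}
    for token in raw.split():
        name, separator, value = token.partition(":")
        # Skip tokens that are not name:value pairs or use an unknown name.
        if not separator or not value or name not in KNOWN_SETTINGS:
            continue
        settings[name] = value
    return settings


print(parse_cue_settings("vertical:lr align:start"))
print(parse_cue_settings("line:90% size:50% position:10%"))
```

Values are left as strings on purpose; whether 90% should become a number depends on what the caller wants to do with it.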
As mentioned above, you can also add styling to the text within the cue itself. The example above uses a simple <b> element but CSS classes can also be added:
3
00:00:11.000 --> 00:00:14.000 align:end
Phone conversations where people truly <c.highlight>connect</c>
If you’d prefer the text to appear step by step (as in karaoke), you can simply add timestamps within the cue text itself:
00:00:11.000 --> 00:00:14.000 align:end
Phone<00:00:11.000> conversations<00:00:12.000> where people<00:00:13.000> truly <c.highlight>connect</c>
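To see how those inline timestamps break the text up, here is a rough Python sketch that splits cue text into timed segments. The function name split_timed_text is my own invention, and the sketch leaves other tags such as <c.highlight> untouched in the text.

```python
import re
from typing import List, Optional, Tuple

# An inline timestamp tag such as <00:00:12.000>; the timestamp is captured.
_INLINE_TIMESTAMP = re.compile(r"<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>")


def split_timed_text(cue_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (timestamp, text) pairs; text before the first tag gets None."""
    # With one capture group, re.split yields: text, stamp, text, stamp, text...
    parts = _INLINE_TIMESTAMP.split(cue_text)
    segments: List[Tuple[Optional[str], str]] = [(None, parts[0])]
    for index in range(1, len(parts), 2):
        segments.append((parts[index], parts[index + 1]))
    return segments


KARAOKE = (
    "Phone<00:00:11.000> conversations<00:00:12.000> where people"
    "<00:00:13.000> truly <c.highlight>connect</c>"
)

for stamp, text in split_timed_text(KARAOKE):
    print(stamp, repr(text))
```

Each segment is the text that becomes visible once its timestamp has been reached, with the first segment shown as soon as the cue starts.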
These are the basics, although there are some other cue text formatting options available.
Note: Anne van Kesteren created an on-the-fly validator for WebVTT in JavaScript which includes a parser. It’s quite useful for helping you create and validate WebVTT files.
The <track> element
So how do we actually tell the browser to use this WebVTT file? Simple: use the <track> element, which you place inside the <video> element after the sources have been specified.
<track label="English subtitles" kind="subtitles" srclang="en" src="upc-video-subtitles-en.vtt">
Since the <track> element has a kind attribute, which is set to subtitles here, you can see that this will also be used to specify captions, navigation and descriptions etc. for the same video source.
(More about the track element).
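If you generate your pages from a script, the same markup can be produced for several tracks at once. This Python sketch is only an illustration. The function name track_tag is hypothetical, the captions file name is made up for the example, and the kind values are the ones I understand the HTML specification to accept.

```python
from html import escape

# Values I understand the HTML specification to accept for the kind attribute.
TRACK_KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}


def track_tag(kind: str, src: str, srclang: str, label: str) -> str:
    """Build a <track> element like the one shown above."""
    if kind not in TRACK_KINDS:
        raise ValueError(f"unknown track kind: {kind!r}")
    # escape() guards against quotes or angle brackets in attribute values.
    return '<track label="{}" kind="{}" srclang="{}" src="{}">'.format(
        escape(label), escape(kind), escape(srclang), escape(src)
    )


tracks = [
    track_tag("subtitles", "upc-video-subtitles-en.vtt", "en", "English subtitles"),
    # Hypothetical captions file, named only for this example.
    track_tag("captions", "upc-video-captions-en.vtt", "en", "English captions"),
]
print("\n".join(tracks))
```

The resulting lines would sit inside the <video> element, after the sources, exactly as the single track does above.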
Browsers
Unfortunately not many of the major browsers currently support WebVTT (Chrome does and Internet Explorer 10 will), but there are a few JavaScript polyfills out there that you can use at the moment to take advantage of its features.
I have used the excellent Playr by Julien Villetorte, although there are others out there such as Captionator by Chris Giffard.
The Example
That’s all well and good, but I bet you actually want to see it working. I’ve put together a WebVTT Example, which uses the Playr polyfill mentioned above, to illustrate how it works.
The full WebVTT file can also be viewed.
Conclusion…for now
This was only a taster of what the WebVTT file format can do for HTML5 video and for web accessibility. There’s more to come!
20 Responses
This looks very promising – I’m yet to delve any deeper than your writeup but would be interested to see if a WebVTT file could be used for Flash captions, subtitles etc. I fear not.
The obvious issue being that creating an HTML5 video already requires a few formats and a Flash fallback, which would need the WebVTT file contents ‘porting’ across to a Flash-friendly version. Could be quite costly timewise.
Hi Mike.
The WebVTT file remains the same regardless of the number of sources that have been specified for the video. The browser will simply pick the source it can play and play it, and then (eventually) use the specified WebVTT file for the use indicated.
Flash will of course ignore it, and you’re right, something would have to be written for Flash to read in the contents of the specified WebVTT file and use it accordingly.
Who knows, perhaps Adobe will do this in the future?
Hi Ian.
I’m trying this out and am having trouble getting an accessible video version working in mobile browsers (such as iPad / iPhone).
What’s your current strategy for ensuring mobile users such as these can still get an “accessible” experience?
Cheers for now, Mike L
Michael, sorry for the delayed reply. As mentioned above, playr currently only supports a limited number of browsers.
However, mediaelement.js apparently supports more, including Internet Explorers 6 – 8. I’m not sure if it extends to iPads/iPhones just yet, but I aim to find out.
I haven’t had a chance to test it myself.
If you do have a go yourself, please feel free to let me know how it goes.
Can captions be placed on multiple lines? How can we add a line feed in caption text?
Hi Russell, they can indeed. If you want a line feed, simply write the captions underneath each other within the same caption definition.
e.g.
1
00:00:03.500 --> 00:00:05.000
Everyone wants
the most from life
hey ian,
great write-up! looking forward to your book! Accessibilities First!
quick question: in your example using css classes, c.highlight, c is the element and highlight is the class, correct? just want to make sure i understand it correctly. also, where would the stylesheet for that be located?
thanks and great work!
cheers,
Albert
Hi Albert,
Thanks for your comment and kind words!
Yes you are correct, c is the element and highlight is the class, but things have moved on from that example and it looks like even more styling will be allowed. The styles in this case are located in the header of the same file, but could just as easily be located in a separate CSS file, just like your normal CSS styling can be.
For using CSS, are there any limitations? Can we use background images or even CSS3 features?