WebVTT and Video Subtitles

With the web industry embracing HTML5’s ability to embed video in the browser, we need to take a further look at what the future holds for video and HTML5, particularly in the field of accessibility.

Updated to reflect changes in the WebVTT specification.

It’s all very well to embed audio and video into your website, but how accessible are these? Simply adding a video is fine for those who don’t need any help in viewing it, but for those who might need to read what’s being said or have something read it out to them (to take one particular example) it’s not so useful.

The WHATWG, and in particular Silvia Pfeiffer, have attempted to address this with a new file format called WebVTT: Web Video Text Tracks. This file, used in conjunction with the HTML5 track element, can be used to specify accessible information for a multimedia source: subtitles, captions, descriptions, chapters and metadata.

That’s quite a lot, and today we’ll simply concentrate on video subtitles, but I aim to cover the others, with examples, in other articles in the near future.

WebVTT file format

A WebVTT file is simply a text file with a .vtt extension that follows a certain format. No surprise there.

A WebVTT file takes the following format:


WEBVTT

[idstring]
[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms [cue settings]
Text string

The file needs to start with a WEBVTT header (the first line is the string WEBVTT, optionally followed by a space and further text such as FILE), followed by the details of each cue that applies to the video. (The optional idstring can be one or more characters and must not contain --> or a line break, that is \r, \n or \r\n.)
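To make the timing line a little more concrete, here is a minimal sketch of how a tool might read it. It is written in Python purely for illustration; the function names parse_timestamp and parse_timing_line are my own, not part of any WebVTT library, and the sketch is more lenient than a real validator would be.

```python
import re

# "[hh:]mm:ss.ttt" where the hours part is optional and ttt is milliseconds.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_timestamp(text: str) -> float:
    """Convert a WebVTT timestamp such as 00:00:03.500 into seconds."""
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)


def parse_timing_line(line: str):
    """Split a timing line into (start seconds, end seconds, raw settings)."""
    start, arrow, rest = line.partition("-->")
    if not arrow:
        raise ValueError("a timing line must contain '-->'")
    parts = rest.split(None, 1)  # end timestamp, then the optional settings
    if not parts:
        raise ValueError("missing end timestamp")
    settings = parts[1].strip() if len(parts) > 1 else ""
    return parse_timestamp(start.strip()), parse_timestamp(parts[0]), settings
```

For example, parse_timing_line("00:00:03.500 --> 00:00:05.000 align:start") gives a start of 3.5 seconds, an end of 5.0 seconds and the settings string "align:start".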

Using this, let’s dive straight into a sample WebVTT file:


WEBVTT

00:00:03.500 --> 00:00:05.000 vertical:lr align:start
Everyone wants the most from life

00:00:06.000 --> 00:00:09.000 align:start
Like internet experiences that are rich <b>and</b> entertaining

This particular example defines two cues, which are, and must be, separated by at least one blank line. The timings are pretty self-explanatory, as are the line texts, but I’ve thrown in some cue settings too, which I’ve not yet spoken about (you’ll also notice some HTML in the line text; more about that in a bit).
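The blank-line rule is what lets a program find the cues in the first place. The following is a rough Python sketch of that idea, not a conforming parser: split_cues is a hypothetical helper of mine, it ignores a leading byte order mark, and it silently skips any block that has no timing line.

```python
def split_cues(vtt_text: str):
    """Return a list of (identifier, timing_line, text_lines) tuples."""
    # Normalise the three permitted line endings (\r\n, \r, \n) to \n.
    lines = vtt_text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        raise ValueError("file must start with a WEBVTT header")

    cues, block = [], []
    for line in lines[1:] + [""]:  # the extra "" flushes the final block
        if line.strip():
            block.append(line)
            continue
        if block:
            if "-->" in block[0]:
                # No identifier: the block starts with the timing line.
                cues.append((None, block[0], block[1:]))
            elif len(block) > 1 and "-->" in block[1]:
                # An identifier line comes before the timing line.
                cues.append((block[0], block[1], block[2:]))
            # Any other block is ignored in this sketch.
            block = []
    return cues
```

Run against a file like the sample above, this returns two tuples, one per cue, each holding the timing line and the lines of cue text that follow it.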

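There are a number of settings that can be specified for a subtitle cue which affect how it is displayed on the video: vertical (the text direction), line (the vertical position of the text), position (the horizontal position of the text), size (the width of the text area) and align (the alignment of the text).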

For a more comprehensive view of these, see cue settings.
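Since each setting is just a name:value pair separated by spaces, pulling them apart takes very little code. Here is a small Python sketch under that assumption; parse_cue_settings is a name I made up, and it does not check that the names or values are ones the specification allows.

```python
def parse_cue_settings(settings: str) -> dict:
    """Turn a string like 'vertical:lr align:start' into a dictionary."""
    result = {}
    for token in settings.split():
        name, colon, value = token.partition(":")
        if not colon or not name or not value:
            continue  # skip anything that is not a name:value pair
        result[name] = value
    return result
```

Given the settings from the first cue in the sample, "vertical:lr align:start", this returns {"vertical": "lr", "align": "start"}.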

As mentioned above, you can also add styling to the text within the cue itself. The example above uses a simple <b> element but CSS classes can also be added:

00:00:11.000 --> 00:00:14.000 align:end
Phone conversations where people truly <c.highlight>connect</c>

If you’d prefer the text to appear step by step (as in karaoke), you can simply add different timings within the cue text itself:

00:00:11.000 --> 00:00:14.000 align:end
Phone<00:00:11.000> conversations<00:00:12.000> where people<00:00:13.000> truly <c.highlight>connect</c>
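To show what those inline timings mean in practice, here is a hedged Python sketch that breaks a cue's text into the pieces that appear at each point in time. The helper timed_segments is hypothetical, and it simply strips tags such as <b> and <c.highlight> rather than interpreting them.

```python
import re

# An inline timestamp such as <00:00:12.000>; the hours part is optional.
INLINE_TIMESTAMP = re.compile(r"<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>")
# Any remaining tag, for example <b>, </b> or <c.highlight>.
TAG = re.compile(r"</?[^>]+>")


def timed_segments(cue_text: str, cue_start: str):
    """Return (timestamp, plain_text) pairs for one cue.

    Text before the first inline timestamp is paired with the cue's
    own start time, passed in as cue_start.
    """
    # With one capture group, re.split yields [text, ts, text, ts, text, ...].
    parts = INLINE_TIMESTAMP.split(cue_text)
    segments = [(cue_start, parts[0])]
    for i in range(1, len(parts), 2):
        segments.append((parts[i], parts[i + 1]))
    return [(ts, TAG.sub("", text)) for ts, text in segments if text]
```

Each pair tells you when a piece of text should become visible, which is all a simple karaoke-style renderer would need.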

These are the basics, although there are some other cue text formatting options available.

Note: Anne van Kesteren created an on-the-fly validator for WebVTT in JavaScript which includes a parser. Quite useful for helping you create and validate WebVTT files.

The <track> element

So how do we actually tell the browser to use this WebVTT file? Simple: use the <track> element, which you place inside the <video> element after the sources have been specified.

<track label="English subtitles" kind="subtitles" srclang="en" src="upc-video-subtitles-en.vtt">

The <track> element has a kind attribute, which is set to subtitles here. As you might guess, the same element is also used to specify captions, chapter navigation, descriptions and so on for the same video source.

(More about the track element).


Unfortunately, not many of the major browsers currently support WebVTT (Chrome does and Internet Explorer 10 will), but there are a few JavaScript polyfills out there that you can use at the moment to take advantage of its features.

I have used the excellent Playr by Julien Villetorte, although there are others out there such as Captionator by Chris Giffard.

The Example

That’s all well and good, but I bet you actually want to see it working. I’ve put together a WebVTT Example, which uses the Playr polyfill mentioned above, to illustrate how it works.

The full WebVTT file can also be viewed.

Conclusion…for now

This was only a taster of what the WebVTT file format can do for HTML5 video and for web accessibility. There’s more to come!