Timed Text Markup Language vs Web Video Text Track

What are Timed Text Markup Language (TTML) and Web Video Text Track (WebVTT)? How do we benefit from these captioning formats? Will these ever merge together, or will they evolve separately over time? Why does it matter which captioning format we use when creating closed captioning for videos?

A variety of captioning file formats exist, all of which differ slightly in syntax. These formats are used for captioning, subtitling, karaoke, etc. in videos for both TV and the web. Of the many formats available, TTML and WebVTT are the two that spark the most debate among developers, and choosing the right one matters for both accessibility and compatibility.

Development History of TTML and WebVTT

TTML is an XML-based format whose development began circa 2002 under the Timed Text Working Group (TTWG) at the World Wide Web Consortium (W3C). TTML was not established as a standard until 2010, in part because many different captioning formats were already in use for different captioning needs. The W3C adopted TTML as a unified format for captioning and other timed-text needs.

On the other hand, WebVTT is in its infancy and growing in popularity. WebVTT was born out of the SubRip subtitle format (SRT) and was originally named WebSRT (Web Subtitle Resource Tracks). It was developed by the Web Hypertext Application Technology Working Group (WHATWG), a community driven largely by browser vendors. Instead of adopting a complex captioning format like TTML, this group created a simple plain-text format based on SRT. The simplicity of WebVTT has driven it to become the de facto captioning format for HTML5 video.

Comparison of Syntax

The examples of TTML and WebVTT seen below are from Microsoft Developer Network. The differences are notable between these syntaxes; TTML is an XML-based format, while WebVTT is simply a plain text format.

This illustrates how TTML syntax is coded in XML:

<?xml version='1.0' encoding='UTF-8'?>
<tt xmlns='http://www.w3.org/ns/ttml' xml:lang='en'>
<body>
<div>
<p begin="00:00:01.878" end="00:00:05.334">
Good day everyone, my name is John Smith</p>
<p begin="00:00:08.608" end="00:00:15.296">
This video teaches you how to build a sand castle on any beach</p>
</div>
</body>
</tt>

WebVTT syntax is derived from the SubRip (SRT) format. It uses no XML at all; a file is plain text that starts with a WEBVTT header line, followed by cues separated by blank lines:

WEBVTT

00:00:01.878 --> 00:00:05.334
Good day everyone, my name is John Smith.

00:00:08.608 --> 00:00:15.296
This video teaches you how to build a sand castle on any beach.

Source: Microsoft Developer Network, accessed July 3, 2014.
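
Because both samples carry the same information (a start time, an end time, and the caption text), moving from one format to the other is mostly a matter of re-serializing. The following is a minimal Python sketch of that idea, not a complete converter. It assumes a well-formed TTML document whose cues are plain-text <p> elements with begin and end attributes already written as hh:mm:ss.mmm clock times; styling, layout, and other TTML time expressions are ignored. The function name ttml_to_webvtt and the SAMPLE_TTML constant are illustrative names, not part of either specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by the TTML sample above, in ElementTree's "{uri}tag" form.
TTML_NS = "{http://www.w3.org/ns/ttml}"


def ttml_to_webvtt(ttml_text: str) -> str:
    """Convert simple TTML cues to WebVTT text.

    Assumption: every cue is a <p> element with begin/end attributes
    in hh:mm:ss.mmm form, which WebVTT accepts unchanged.
    """
    root = ET.fromstring(ttml_text)
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for p in root.iter(TTML_NS + "p"):
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            continue  # skip paragraphs without explicit timing
        # Collapse line breaks and runs of whitespace inside the cue text.
        text = " ".join("".join(p.itertext()).split())
        lines += [f"{begin} --> {end}", text, ""]
    return "\n".join(lines)


# A well-formed version of the TTML sample (XML declaration omitted).
SAMPLE_TTML = """<tt xmlns='http://www.w3.org/ns/ttml' xml:lang='en'>
<body><div>
<p begin="00:00:01.878" end="00:00:05.334">Good day everyone, my name is John Smith</p>
<p begin="00:00:08.608" end="00:00:15.296">This video teaches you how to build a sand castle on any beach</p>
</div></body>
</tt>"""

if __name__ == "__main__":
    print(ttml_to_webvtt(SAMPLE_TTML))
```

Running the script prints a WEBVTT header followed by the two cues. A production converter would also need to handle TTML features that have no direct plain-text equivalent, such as styling and regions.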

Support of Modern Browsers for TTML and WebVTT

Many browsers (e.g., Internet Explorer, Chrome, Safari) recognize and support WebVTT files (.vtt) for HTML5 videos. HTML5 provides elements such as <video> and <track> that allow videos to be embedded without third-party plugins (e.g., RealPlayer or QuickTime). Before HTML5, developers had to rely on such plugins because browsers could not play video natively.
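
To make the <video> and <track> pairing concrete, here is a small Python sketch that renders the markup a page would need to attach a WebVTT file to an HTML5 video. The helper name video_with_captions and the file names castle.mp4 and castle.vtt are hypothetical and chosen only for illustration; the kind, src, srclang, label, and default attributes are standard attributes of the <track> element.

```python
from html import escape


def video_with_captions(video_src: str, vtt_src: str,
                        lang: str = "en", label: str = "English") -> str:
    """Return HTML5 markup for a video with one WebVTT caption track.

    Hypothetical helper for illustration; attribute values are escaped
    so that they are safe to place inside double-quoted attributes.
    """
    return (
        f'<video controls src="{escape(video_src, quote=True)}">\n'
        f'  <track kind="captions" src="{escape(vtt_src, quote=True)}" '
        f'srclang="{escape(lang, quote=True)}" '
        f'label="{escape(label, quote=True)}" default>\n'
        f'</video>'
    )


if __name__ == "__main__":
    # File names are placeholders, not files referenced by this article.
    print(video_with_captions("castle.mp4", "castle.vtt"))
```

The kind="captions" value marks the track as captions rather than translated subtitles, and default asks the browser to enable the track unless the user's preferences say otherwise.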

However, to date Internet Explorer 10 is the only browser that supports both TTML and WebVTT. What does this mean for developers? It means that TTML is not supported natively by all modern browsers, in spite of the W3C’s backing of TTML. In browsers without native support, third-party plugins are the only way to display TTML captions.

Benefits of TTML for Broadcasting Industries

Broadcasters in regions such as the US and Europe depend on TTML for TV-based media because, as an XML format, it fits the automated, XML-based workflows they use to produce subtitles. A paper presented at Balisage: The Markup Conference 2013 states that “while the translation process of spoken text into subtitles still requires a large amount of manual work, the deployment of subtitles in different subtitle formats for linear and non-linear TV is only practically feasible when it is automated.” The web has different needs, which is one reason why the WHATWG created a simpler format to handle captioning and other timed-text needs, such as subtitling, for videos on the web.

What Does the Future Hold?

Only time will tell how the two standards, TTML and WebVTT, will evolve as they become more robust for both television and web platforms. At this point, WebVTT is clearly the frontrunner of the two on the web, because HTML5 is the dominant platform today and is likely to remain so.

The examples of the captioning formats above show the vast differences in their syntax and in how they serve TV and web platforms; therefore, merging both formats into one standard for both platforms appears unlikely. This may change in the future. As the Balisage: The Markup Conference 2013 paper indicates, “the efforts to combine both activities in one W3C working group can be seen as a promising step.”