Media matters

For a very long time, if you wanted to interact with rich media on the web, you needed a plugin (probably Flash). It was a huge hole in the browser, but a difficult one to plug, since media formats are often protected under copyright and only available for an onerous licensing fee. It took years to sort out. We're very lucky to be in a place now where we can include <audio> and <video> in a page and it pretty much just works.

Originally, both media tags were designed to support multiple source files, since one browser might support H.264 encoding and another might only support OGGV (or MP3 vs OGG). You could put two source tags in the parent media tag, one for each format. The downside to this is that you needed to create two (or more) files per media clip, and nobody wanted to do that. There was also the fact that Apple, which controls Safari with little concern for the feelings of developers or other companies, just flat-out refused to support open formats like VP8 on the iPhone. At the end of the day, Firefox caved and supported the patent-encumbered formats through a combination of trickery.

This is arguably bad for software freedom (should you care about such a thing), but it's good for us as developers. Both media tags support a src attribute identical to an img tag, using either H.264 or MP3 encoding for video and audio, respectively. So putting a playable media file on your page is as simple as the following:

<video controls src="movie.mp4"></video>

<audio controls src="sound.mp3"></video>

There are a number of optional attributes you can add to these tags, such as the controls attribute (which provides a track scrubber, play button, and time display). Most media tag attributes are boolean, meaning that they're either there or not, but the value doesn't matter. Other useful options include:

That last option is where things get complicated. Autoplaying video (and, to a lesser extent, audio) is... controversial, to say the least. To put it more bluntly, people hate it with a furious passion. And so, for most of the time that the video tag has existed, playing video on mobile required a "user gesture" of some kind, in a click or touch event listener. Users rejoiced, knowing that their phones would be relatively free of the annoying ads plastered all over our news sites (at least, until advertisers figured out workarounds).

But here's the thing: there are good reasons that you might want autoplaying video, and we can sum it up with the letters G, I, and F. Originally designed for a dial-up bulletin board in the 80's, GIF files are used for animation all over the web. Unfortunately, they're spectacularly bad at it, since they were never really designed to compress full-frame animations well, and because normally they support only 256 colors (which creates the signature "grain" you see in AOL screenshots).

A GIF is a bad video, both in terms of file size and image quality. But they played automatically and the video tag didn't, so they survived and even thrived in both ads and "real" content. Ironically, restrictions put into place to preserve users' bandwidth allowance ended up costing us much more, which is often the story of well-meaning technology standards.

Luckily, confronted with this reality, browser developers have created new guidelines for autoplaying video. There are three things that are key to know:

Let's get responsive

We typically think about responsive elements in terms of size. For example, we might want to rearrange the layout if the physical dimensions of the screen are very small or very large. We can also think of it in terms of resolution, delivering bigger or smaller images depending on whether the screen is high-DPI or not. However, there's a third kind of responsiveness that is not well-represented on the web platform now, but is especially important for time-based media like video and audio, and that's bandwidth.

Your connection to the wider Internet can be classified in several ways. We might use "latency," or the time delay between making a request and getting a response. Or we might discuss "speed" in terms of actual download rate—how many kilobytes per second you can pull down once the connection is made. We generally expect desktop users to have "fast" connections in both senses, while mobile connections have increasingly high download rates but relatively slow latency. But these rules of thumb are not particularly reliable, so we're already on shaky ground.

For me, a more important question in terms of bandwidth is the cost. Budget-sensitive users may be using a plan on desktop and/or mobile that has a download cap, at which point they will either be throttled or will need to pay by the byte for traffic. Forcing these users to download a high definition video file, which will dwarf almost any other asset on your page, is not only rude, it may well cost them money. Don't be the person who runs up the data cost for low-income users just for a visual effect.

Unfortunately, there's not a media query for bandwidth, and even if there were, there's no way to hook it up to a video tag, because they don't support srcset like images do. So our first problem is that if we want responsive video, we need to drop down into JavaScript. Our second problem is that our usual measures of quality, like screen size, are not a great proxy for bandwidth, although they will do in a pinch. There is a network information API in JavaScript, which provides some information, but is not yet well-supported. We may be better off providing a "switch" for users to opt-in to high quality video, similar to the "HD" buttons on YouTube and Vimeo.

We'll sidestep the question of checking the connection for now. Regardless of the method that we use to decide that a user is "high bandwidth," our method is going to be the same. First, we'll create the page with a video that has a poster attribute and the src set to point to a video encoded at a low resolution and high compression. This ensures that low-bandwidth users (who, after all, will also probably have a delay while waiting for JavaScript to download) are sent the smallest possible file. An extra attribute will hold onto the high-quality video URL for us.

<video
  autoplay muted
  poster="splash.jpg"
  src="low-quality.mp4"
  data-src="high-quality.mp4"
  class="responsive-video"></video>

Then, based on our test, we may swap out that video with a sharper version at runtime using JavaScript. For users with a fast connection, this should happen fast enough that they won't notice the change.

// isHighBandwidth() should return true if the user is considered "fast"
// it could be one or more of the above tests
var swap = isHighBandwidth();
if (swap) $(".responsive-video").forEach(function(video) {
  video.src = video.dataset.src;
});

A relatively simple change, but potentially a big difference for our users. And as a bonus, if we're paying by the megabyte for traffic, it will save us money as well. If we want to get really aggressive, and the network connection API is available to tell us when users are on very poor connections, we could even make the default asset an image, and upgrade in multiple stages:

<img
  class="responsive-video"
  src="poster.jpg"
  data-low="low-quality.mp4"
  data-high="high-quality.mp4"
>
var quality = getConnectionQuality();
if (quality != "low") $(".responsive-video").forEach(function(img) {
  // create a replacement video
  var video = document.createElement("video");
  video.className = "responsive-video";
  video.setAttribute("poster", img.src);
  video.setAttribute("autoplay", "");
  video.setAttribute("muted", "");
  video.src = quality == "medium" ? img.dataset.low : img.dataset.high;
  img.parentElement.replaceChild(video, img);
});

Creating a design that can scale up or down to a user's capability adds some complexity to this process, and it does require us to re-encode the video into a couple of different files. But this seems like a small price to pay for accessibility.

Live updates

It's great that the browser's media tags provide controls, and for the most part I strongly recommend using them: they're guaranteed to be accessible and well-tested. Building your own player controls will almost certainly be a more difficult task. That said, we may still want to synchronize other parts of the page with audio playback—say, for transcripts, slides, or special effects. To do so, we need to be able to handle media events.

Let's say that we have an audio tag that we want to make "live":

<audio src="recording.mp3"></audio>

One thing that's important to know from the start is that media events, unlike most UI events, do not bubble up through the DOM, so you can't delegate your event listeners. We have to attach listeners directly to the element itself, such as these listeners on the play and pause events.

var audio = document.querySelector("audio");

var onPause = function(e) {
  // the element will have its "paused" property set accordingly
  if (e.target.paused) {
    console.log("Audio is currently paused");
  } else {
    console.log("Audio playback started");
  }
};

// monitor start/stop
["pause", "play"].forEach(event => audio.addEventListener(event, onPause));

Our next task is to update whenever the audio changes location, either because the user used the scrubber to jump around (the seek event) or because of normal playback (timeupdate). To find the exact location within the file, we can look at the media element's currentTime property, which will contain the file's playback location in seconds:

var onUpdate = function(e) {
  var element = e.target;
  var time = element.currentTime;
  console.log(`Playback currently at ${time} seconds`);
};

["timeupdate", "seek"].forEach(event => audio.addEventListener(event, onUpdate));

One possible use from combining these four events is to create a live visual transcript of audio, with labels identifying speakers. For one project with a recording from court, we used the standard WebVTT caption file (which you can get from many video transcription services) to do just that, and reused it to provide readers with transcripts of 911 audio a few months later (source code here). Since many readers may not have headphones in, or if audio is garbled, this is a great way to present spoken material.