Transcoding files ready for HTTP Live Streaming on Linux

HTTP Live Streaming (HLS) is an IETF draft standard created by Apple Inc. Its use is pretty widespread: although it was primarily designed to allow video to be easily delivered to iOS devices, it works well with a wide range of clients. Later versions of Android support it, as do players like VLC.

Unfortunately, a lot of the tools for creating (and testing) HLS streams are created and released by Apple. Unless you develop for iOS devices, you probably lack a developer login!

HLS itself is actually pretty easy to set up at a basic level, though. In this documentation we'll be looking at what HLS is, and how to prepare video for transmission using HLS.

 

What is HTTP Live Streaming?

It's basically a protocol allowing media to be streamed from any bog-standard HTTP server whilst keeping some of the functionality provided by dedicated media servers. For example, HLS allows you to adjust the video quality in real-time, based on the bandwidth available to the client.

It can be used for 'Live' broadcasts and Video on Demand, and its file-based nature means that it plays really well with Content Delivery Networks. Once prepared for delivery, the content needed to provide a stream is static (though a 'Live' stream, by its very nature, won't be), so the stream can be served from any HTTP(S) server.

Great news for those in the creative industry - HLS also supports (incredibly) basic DRM in the form of AES-128 encryption. We won't be covering that though as it's outside the scope of this piece.

 

What are the preparation steps?

Simply put, there are three stages to the preparation & delivery of HLS content:

  • Encode the video as H.264, with AAC or MP3 audio, within an MPEG-2 Transport Stream (MPEG-TS) container (at a variety of bitrates/resolutions if you want to offer differing qualities)
  • Segment the video into 5-10 second clips and create an M3U8 playlist file
  • Deliver the content

The video client parses the M3U8 and fetches the appropriate quality video. Generally, it'll start at a lower quality and then attempt to ramp up the quality as far as possible, based on the available connection. Timed metadata (such as subtitles) can be delivered embedded within the content, as ID3 tags.
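As a rough illustration of the "start low, then ramp up" behaviour described above, here's a minimal Python sketch. The variant list, the bitrates and the pick_variant name are all made up for the example; real players use rather more elaborate heuristics (buffer levels, smoothing of measurements and so on).

```python
# Hypothetical variants: (bandwidth in bits per second, playlist URI).
VARIANTS = [
    (400_000, "low/index.m3u8"),
    (1_200_000, "mid/index.m3u8"),
    (3_000_000, "high/index.m3u8"),
]


def pick_variant(variants, measured_bps=None):
    """Return the URI of the best variant that fits the measured bandwidth.

    With no measurement yet (i.e. at start-up) the lowest quality is used.
    """
    ordered = sorted(variants)
    lowest_uri = ordered[0][1]
    if measured_bps is None:
        return lowest_uri
    chosen = lowest_uri
    for bandwidth, uri in ordered:
        # Keep stepping up while the variant still fits the connection.
        if bandwidth <= measured_bps:
            chosen = uri
    return chosen


print(pick_variant(VARIANTS))             # nothing measured yet
print(pick_variant(VARIANTS, 1_500_000))  # roughly 1.5 Mbit/s available
```

A client would typically re-run something like this after each segment download, using the throughput it just observed.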

 

What Does a Stream Consist of?

An HLS stream essentially consists of two components:

  • The M3U8 file
  • A sequence of short video files

The M3U8 file is essentially an M3U playlist with extended metadata. The client begins by fetching the playlist and then, having parsed the metadata, will fetch and play each segment in sequence. Through the metadata we can use one playlist to identify several sequences, each at a specific video quality.
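To make the "parse the playlist, then fetch each segment in order" step a bit more concrete, here's a minimal Python sketch of the parsing half (the tags themselves are explained in The M3U8 File section further down). It only understands EXTINF and segment URIs, and the function name is just something picked for the example, so treat it as an illustration rather than a complete parser.

```python
def parse_media_playlist(text):
    """Return a list of (duration_seconds, uri) tuples from a media playlist."""
    segments = []
    pending_duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<title>": keep the duration only.
            value = line[len("#EXTINF:"):]
            pending_duration = float(value.split(",", 1)[0])
        elif not line.startswith("#"):
            # Anything that isn't a tag or comment is a segment URI.
            segments.append((pending_duration, line))
            pending_duration = None
    return segments


EXAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10, no desc
rick_roll.avi_001.ts
#EXTINF:5, no desc
rick_roll.avi_002.ts
#EXT-X-ENDLIST
"""

for duration, uri in parse_media_playlist(EXAMPLE):
    print(duration, uri)
```

A player would then request each URI in turn (resolving relative paths against the playlist's own location) and play the segments back to back.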

 

There must be some server configuration to do?

Assuming the server is simply hosting the files, there should be no changes necessary. The only exception is if the server is not returning an appropriate MIME type for the files:

File type        MIME type
M3U8 file        application/vnd.apple.mpegurl OR application/x-mpegURL
Video segments   video/MP2T

This is quite easily adjusted if needed, though:

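# Apache (the path varies between distributions)
nano /etc/httpd/conf/mime.types
# add
application/vnd.apple.mpegurl    m3u8
video/mp2t                       ts

# Nginx
nano /etc/nginx/mime.types
# add, inside the types { } block (note the trailing semicolons)
application/vnd.apple.mpegurl    m3u8;
video/mp2t                       ts;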

It's that simple! If you're planning on streaming Live content, you might also want to set a really low cache expiry time on your M3U8 file though - otherwise when the client attempts to retrieve an updated copy they'll get the cached one instead.

Nginx users might also like to take note of the rtmp module, as it uses Nginx's asynchronous event model to deliver incredibly high performance when serving HLS content.
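If you just want to test a stream locally before touching Apache or Nginx, the MIME types and the short playlist cache lifetime can be sketched with Python's built-in http.server. This is a minimal sketch for experimenting, not something to put in production; the HLSHandler, cache_control_for and serve names, the port and the one-day max-age for segments are all assumptions made for the example.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# The two MIME types HLS clients expect.
HLS_TYPES = {
    ".m3u8": "application/vnd.apple.mpegurl",
    ".ts": "video/mp2t",
}


def cache_control_for(path):
    """Playlists may change (Live streams); finished segments never do."""
    if path.split("?", 1)[0].endswith(".m3u8"):
        return "no-cache"
    return "max-age=86400"  # arbitrary value chosen for the example


class HLSHandler(SimpleHTTPRequestHandler):
    # Extend the default extension -> MIME type table.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, **HLS_TYPES}

    def end_headers(self):
        # Add the caching policy to every response before the headers are sent.
        self.send_header("Cache-Control", cache_control_for(self.path))
        super().end_headers()


def serve(port=8080):
    """Serve the current directory until interrupted."""
    ThreadingHTTPServer(("", port), HLSHandler).serve_forever()
```

Calling serve() from the directory holding the playlist and segments makes them available on port 8080.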

 

Encoding your Video

This is what you actually came here for: taking an arbitrary video (in our example, rick_roll.avi) and encoding it for delivery using HLS.

Before we begin, there are a few pre-requisites:

  • The LAME MP3 encoder (and its development libraries)
  • The x264 encoder (and its development libraries)
  • ffmpeg, compiled with LAME and libx264 support

Now, we simply need to segment the video and create our M3U8 file. For the sake of convenience, I've created a script to automate the task, so grab HLS Stream Creator from GitHub.

Usage is pretty simple: to split our example rick_roll.avi into 10 second segments, we copy the video into the HLS Stream Creator directory and run

./HLS-Stream-Creator.sh rick_roll.avi 10

Once the transcoding has completed, the directory output will contain one M3U8 file and a number of video files (if rick_roll.avi is 2 minutes long, there'll be 12 video files).

Copy these files onto your web server and you're ready to go. It's a pretty basic example and doesn't include differing qualities (a feature that's coming soon), but a look at the source should give you a better understanding of just how easy it is to deliver via HLS.
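If you'd like a feel for what's involved without reading the script, recent FFmpeg builds include an hls muxer that can encode and segment in one pass. The Python sketch below only builds the command line, it doesn't run it. It isn't necessarily what HLS Stream Creator does internally; the build_hls_command name and the output/stream.m3u8 path are made up for the example, and it uses FFmpeg's AAC encoder rather than LAME.

```python
import shlex


def build_hls_command(source, segment_seconds=10, playlist="output/stream.m3u8"):
    """Build (but don't run) an ffmpeg command for a single-quality HLS stream."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",                  # H.264 video
        "-c:a", "aac",                      # AAC audio
        "-f", "hls",                        # FFmpeg's HLS muxer
        "-hls_time", str(segment_seconds),  # target segment length in seconds
        "-hls_list_size", "0",              # keep every segment in the playlist
        playlist,
    ]


cmd = build_hls_command("rick_roll.avi")
print(shlex.join(cmd))
```

To actually transcode, the list can be handed to subprocess.run(cmd, check=True), assuming ffmpeg is on the PATH and the output directory already exists.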

 

The M3U8 File

The best reference for what the M3U8 file can contain (and can do) is Apple's documentation. However, as a basic guide, the M3U8 file we created will be in the following format:

#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10, no desc
rick_roll.avi_001.ts
#EXTINF:10, no desc
rick_roll.avi_002.ts
#EXTINF:10, no desc
rick_roll.avi_003.ts
#EXTINF:5, no desc
rick_roll.avi_004.ts
#EXT-X-ENDLIST

The first line simply identifies the file type. EXT-X-MEDIA-SEQUENCE allows you to specify the sequence number of the first URL in the playlist, so if (for some reason) you want to start the sequence at 12, that's what you'd set this to.

EXT-X-VERSION identifies which version of the HLS protocol the M3U8 file conforms to. In truth there's nothing in there that would prevent us using version 4, but as we're not using anything version 4 specific, 3 felt like a better fit for the time being.

EXT-X-TARGETDURATION specifies the maximum length (in seconds) of any segment within the playlist.

EXTINF:10, no desc specifies the duration (in this case 10 seconds) of the following segment, as well as a meta-description (in this case, no desc).

rick_roll.avi_001.ts is the filename of the segment. This can (as here) be a relative path - relative to the location of the M3U8 - or can be an absolute URL.
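To tie those tags together, here's a small Python sketch that writes out a playlist in the same format from a list of segment names and durations. The build_media_playlist name is just something picked for the example, and it assumes a finished (Video on Demand) stream, hence the closing EXT-X-ENDLIST.

```python
import math


def build_media_playlist(segments, title="no desc"):
    """segments: list of (uri, duration_seconds) in playback order."""
    # The target duration must be at least as long as the longest segment.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration}, {title}")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


# Three 10 second segments followed by a 5 second one.
segments = [(f"rick_roll.avi_{n:03d}.ts", 10) for n in range(1, 4)]
segments.append(("rick_roll.avi_004.ts", 5))

playlist_text = build_media_playlist(segments)
print(playlist_text, end="")
```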

To provide the same stream in a variety of qualities we use the EXT-X-STREAM-INF tag (see the Apple documentation for more information).
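As a rough idea of what that looks like, the Python sketch below builds a top-level playlist that points at one media playlist per quality. The bitrates, resolutions and paths are hypothetical, and only the BANDWIDTH and RESOLUTION attributes are used; the tag supports others which aren't covered here.

```python
def build_master_playlist(variants):
    """variants: dicts with 'bandwidth' (bits/s), 'uri' and optional 'resolution'."""
    lines = ["#EXTM3U"]
    # List the lowest bitrate first.
    for variant in sorted(variants, key=lambda v: v["bandwidth"]):
        attrs = [f"BANDWIDTH={variant['bandwidth']}"]
        if "resolution" in variant:
            attrs.append(f"RESOLUTION={variant['resolution']}")
        lines.append("#EXT-X-STREAM-INF:" + ",".join(attrs))
        lines.append(variant["uri"])
    return "\n".join(lines) + "\n"


# Hypothetical renditions of the same video.
variants = [
    {"bandwidth": 1_200_000, "resolution": "1280x720", "uri": "mid/index.m3u8"},
    {"bandwidth": 400_000, "resolution": "640x360", "uri": "low/index.m3u8"},
]

master = build_master_playlist(variants)
print(master, end="")
```

Each URI in the result is itself an ordinary M3U8 of the kind shown earlier, one per quality.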

 

Requirements

There are a few basic requirements of the protocol:

  • Each segment must be preceded by an EXTINF tag
  • Whitespace is not permitted within tags, except where the specification explicitly allows it
  • Lines must end with either LF or CRLF
  • EXT-X-VERSION must be specified, otherwise some media players will report an error
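A couple of these are easy to check mechanically before you upload anything. Here's a minimal Python sketch that looks for the #EXTM3U header, an EXT-X-VERSION tag and an EXTINF before every segment. The check_playlist name is made up for the example, and it's nowhere near a full validator.

```python
def check_playlist(text):
    """Return a list of problems found in a media playlist (empty if none)."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        problems.append("first line must be #EXTM3U")
    if not any(line.startswith("#EXT-X-VERSION:") for line in lines):
        problems.append("EXT-X-VERSION is missing")
    have_extinf = False
    for line in lines:
        if line.startswith("#EXTINF:"):
            have_extinf = True
        elif not line.startswith("#"):
            # A segment URI: it needs an EXTINF of its own just before it.
            if not have_extinf:
                problems.append(f"segment {line} has no EXTINF tag")
            have_extinf = False
    return problems


GOOD = "#EXTM3U\n#EXT-X-VERSION:3\n#EXT-X-TARGETDURATION:10\n#EXTINF:10, no desc\na.ts\n#EXT-X-ENDLIST\n"
BAD = "#EXTM3U\n#EXT-X-TARGETDURATION:10\n#EXTINF:10, no desc\na.ts\nb.ts\n"

print(check_playlist(GOOD))
print(check_playlist(BAD))
```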

 

Playing your content

There are quite a few players to choose from. Obviously Apple would probably prefer you buy an i(phone|pad) to watch your content, but there are other options:

  • ffplay (part of FFmpeg)
  • VLC (2.0.2 and greater)
  • QuickTime (obviously)
  • Android (Honeycomb and above)
  • XBMC (v12 and above)

To have the content play in a browser, you're likely to need to encode in a variety of formats - Safari appears to be the only browser which can play directly from the stream, others need something like JW Player in front of them.

 

Additional Tweaks

Reducing the file count

If you're only hosting a few streams, then segmenting media into separate files probably isn't too much of an issue. If, however, you host a lot of streams, or a lot of streams in a variety of qualities, you could quickly find you have thousands of inodes in use.

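An alternative option is to encode the media into a single file and then use byte-range requests instead. Each EXT-X-BYTERANGE tag takes the form length@offset (both in bytes), and every segment entry points at the same media file (called rick_roll.avi.ts here, purely for illustration). The tag was introduced in version 4 of the protocol, so EXT-X-VERSION needs bumping too. So our example above becomes

#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-VERSION:4
#EXT-X-TARGETDURATION:10
#EXTINF:10, no desc
#EXT-X-BYTERANGE:65132@0
rick_roll.avi.ts
#EXTINF:10, no desc
#EXT-X-BYTERANGE:69428@65132
rick_roll.avi.ts
#EXTINF:10, no desc
#EXT-X-BYTERANGE:61125@134560
rick_roll.avi.ts
#EXTINF:5, no desc
#EXT-X-BYTERANGE:38893@195685
rick_roll.avi.ts
#EXT-X-ENDLIST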

We now have just the two files (the media file and the playlist) on our hard-drive.
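The arithmetic is easy to get wrong by hand, as each tag wants the length of the segment followed by the offset at which it starts. A minimal Python sketch, assuming you know the size in bytes of each segment within the single file (the sizes below are made up, as is the byterange_tags name):

```python
def byterange_tags(segment_sizes):
    """Turn per-segment sizes (bytes) into EXT-X-BYTERANGE tags: <length>@<offset>."""
    tags = []
    offset = 0
    for size in segment_sizes:
        tags.append(f"#EXT-X-BYTERANGE:{size}@{offset}")
        # The next segment starts where this one ends.
        offset += size
    return tags


# Hypothetical segment sizes, in the order they appear in the file.
sizes = [65132, 69428, 61125, 38893]

for tag in byterange_tags(sizes):
    print(tag)
```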

 

Dynamic Playlists

Because the protocol is essentially focused around a single text file, you have a reasonable amount of flexibility in what you can do if you're happy to generate that file dynamically. Consider the following example, where we call a (fake) geo-location function and, if the user is outside the UK, insert ads into the stream:

<?php
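// Optional block of ad segments, inserted ahead of the main content.
$adstring = '';
// geolocate_ip() is the (fake) geo-location function, it doesn't really exist.
$user_loc = geolocate_ip($_SERVER['REMOTE_ADDR']);

if ($user_loc != 'UK') {
    // End with a newline: PHP swallows the newline that follows a closing tag.
    $adstring = "#EXTINF:10.0,\nad0.ts\n" .
        "#EXTINF:8.0,\nad1.ts\n" .
        "#EXT-X-DISCONTINUITY\n";
}

// Buffer the playlist so we can send an accurate Content-Length.
ob_start();

?>
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
<?php echo $adstring; ?>
#EXTINF:10, no desc
rick_roll.avi_001.ts
#EXTINF:10, no desc
rick_roll.avi_002.ts
#EXTINF:10, no desc
rick_roll.avi_003.ts
#EXTINF:5, no desc
rick_roll.avi_004.ts
#EXT-X-ENDLIST
<?php
$op = ob_get_clean();
header('Content-Type: application/vnd.apple.mpegurl');
header('Content-Length: ' . strlen($op));
echo $op;
?>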

It's a pretty inflexible example, but hopefully it highlights the changes you can make if you're not firmly set on needing a static file.

 

Conclusion

HLS provides an easy, inexpensive way to stream video, but does have the downside that the protocol draft prescribes H.264. This does mean that a number of the popular browsers may opt never to support it natively, but as iOS, Android and Windows Phone 8 all support HLS, mobile users are at least very well catered for.

When combined with the EXT-X-STREAM-INF tag, it's possible to quickly and easily provide video clients with the information they need to adjust quality, so video can easily be streamed across Wi-Fi, 3G and GPRS connections.

If you're a content distributor/provider, it also allows you to use traditional Content Delivery Networks rather than having to rely on those with multimedia streaming software.