Delivering 8K using AVC/H.264

YouTube launched 8K streaming back in 2015, but the lack of cameras available to content creators meant 8K uploads didn’t start in earnest until late 2016.  That’s around the time when we uploaded our first 8K video to YouTube, and while we ran into some interesting problems getting it up there (which aren’t worth discussing because they’ve all been fixed), overall we're impressed with YouTube’s ability to stream in 8K.

Being naturally curious, I wanted to know more about what they were using for 8K compression, so I downloaded the MP4 version YouTube streams to see which codec it was using.  Let me save you the time of finding it yourself and show you what settings YouTube uses for 8K streaming on the desktop:

MediaInfo of a YouTube video file showing the 8K resolution in the AVC/H.264 codec

Does anything look weird to you? Unless you’re a compressionist, maybe not.

Here’s what’s strange: it lists the codec as AVC, otherwise known as H.264.  The problem with that is that the largest frame size permitted by the H.264 video codec standard is 4,096 x 2,304, and yet somehow this video has a resolution of 7,680 x 4,320.  Which means that either this video or the video standard must be lying.

Well, not exactly.  The frame resolution is Full Ultra High Definition (FUHD - 7,680 x 4,320), and the video codec is H.264 / AVC.  It’s just a non-standard H.264 / AVC.

Being able to make and use your own non-standard H.264 (or any other codec) video files is a really useful trick, and right now it’s an important thing to know for working with 8K video files.  Specifically, it’s important to know the benefits and drawbacks of working outside the standard, and how to make the best use of them.


Background

In 2014, a client asked about 5K, high frame rate footage to use on a demonstration display.  Since we’d been filming all of our videos at 5K resolution, remastering the files at their native camera resolution wasn't an issue and we were happy to work with them.

But as things moved forward with their marketing team we hit a snag.  We had no problem creating and playing 5K files on our systems, but when their team tried to play back the ProRes or DPX master files on their Windows-based computer (which they were required to use for the presentation), they weren’t able to get real-time playback.  Why not?  The ProRes couldn’t be decoded fast enough by the 32-bit QuickTime player on Windows, and the DPX files had too high a data rate to be read from anything but a SAN at the time.

Fortunately, we’d already been experimenting with encoding 5K files in a few different delivery formats: High Efficiency Video Coding (HEVC / H.265), VP8 and VP9, and Advanced Video Coding (AVC / H.264).  HEVC was too computationally complex to be decoded in real time at 5K HFR: there were no hardware decoders that could handle the format (even at 8-bit precision), and FFMPEG still needed optimizations to play back HEVC beyond 1080p60 in real time on almost every system.  VP8 and VP9 scared the client, since they weren’t comfortable working with the Matroska video container (for reasons they never explained; quality-wise, this was the best choice at the time), which left us with H.264.

Which is how we delivered the final files: AVC video with AAC audio in an MP4 container, at a resolution of 5,120 x 2,880, though we ended up dropping the playback frame rate to only 30fps for better detail retention.

Finding a way to encode and to play back these 5K files in H.264 wasn’t easy.  But once we did, we opened up the possibility of delivering files in any resolution to any client, regardless of the quality of their hardware.

So how did we do it?  We cheated the standard.  Just like Google does for 8K streaming on YouTube.  And for delivering VR video out of Google’s Jump VR system.

And since you’re probably now asking: “how do you cheat a standard?”, let’s review exactly what standards are.


Standards

Standards like MPEG-4 Part 10, Advanced Video Coding (AVC) / ITU-T Recommendation H.264 (H.264) exist to allow different hardware and software manufacturers to exchange video files with the guarantee they’ll actually work on someone else’s system.

Because of this, standards have to impose limits on things like frame size, frame rate for a given frame size, and data rate in bits per second.  For AVC/H.264, the different sets of limits are called Levels.  At its highest level, Level 5.2, AVC/H.264 has a maximum frame size of 4,096 x 2,304 pixels @ 56 frames per second, or 4,096 x 2,160 @ 60 frames per second, so that standard H.264 decoders don’t have to accommodate any frame size or frame rate larger than that.
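To put numbers on that limit: H.264 measures frame size in 16 x 16 pixel macroblocks, and Level 5.2 caps a frame at 36,864 of them.

4,096 x 2,304 = 256 x 144 = 36,864 macroblocks (exactly at the Level 5.2 ceiling)
7,680 x 4,320 = 480 x 270 = 129,600 macroblocks (about 3.5 times over it)

So an 8K frame isn’t just slightly out of spec; no defined Level can describe it at all.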

Commercial video encoders, like those paired with the common NLEs Adobe Premiere, AVID Media Composer, and Final Cut Pro X, assume that you’ll want the broadest compatibility for the video file, so the software makes most of the decisions on how to compress it, and strictly adheres to those limits.  Which for H.264 means that you’ll never be able to create an 8K file out of one of these apps.

While standards allow for broad compatibility, sometimes a codec needs to work in a more limited setting.  “Custom video solutions” are built for specific purposes, and may need frame sizes, frame rates, or data rates that aren’t standard.  This is where commercial AVC/H.264 encoding software often won’t work, and you either write a new encoder yourself (time consuming and expensive) or turn to the open source community.

Open source projects for codec encoding and decoding, like the x264 encoder implementation of the H.264 standard, often implement every part of the standard.  x264 even includes encoding features beyond the AVC/H.264 standard, specifically an ‘undefined’ or ‘unrestricted’ profile and level where you can apply H.264 compression to any frame size or frame rate.  The catch is that the result won’t play back with hardware acceleration because it’s out of standard; it’ll need a software package that can decode it.

Spend enough time with codecs and compression and you’ll run across a term: FFMPEG.  FFMPEG is an open source software package that provides a framework for encoding and decoding audio and video.  It’s free, it’s fast, and it’s scriptable (meaning it can be automated by a server), so a lot of companies that don’t write audio-video software themselves simply incorporate FFMPEG and codec libraries like x264 to handle the multimedia aspects of their programs.

Which is exactly what YouTube does.

"Writing application : Lavf56.40.101" indicates the file was written using FFMPEG in this 8K file from YouTube.

That’s right, when you upload a video to YouTube, Google’s servers create encoding scripts for FFMPEG, which are sent off to various physical servers in Google’s data centers to encode all of the different formats that YouTube uses to optimize the video experience for desktops, televisions, phones, and tablets, and for internet connections ranging from dial-up to fiber optic.

And for 8K content streaming on the desktop, that means encoding it in 8K H.264.


Why AVC/H.264 for 8K?

Which, of course, leads us to our last two questions: why H.264 and not something else?  And how can you do it too?

For YouTube, using AVC/H.264 is a matter of convenience.  At the time YouTube launched 8K support (and even today), HEVC/H.265, which officially supports 8K resolutions, is still too new to see broad hardware acceleration support - and even then, few hardware decoders handle it at 8K resolution.  (Side note: as of the last time we tested it [Jan 2017], the open source HEVC/H.265 encoder x265 struggles with 8K resolutions, so there’s that too.)  Google’s own VP9/VP10 codecs still weren’t ready for broad deployment when 8K support was announced, and hardware VP9 support is just starting to appear.

YouTube selecting either HEVC/H.265 or the VP9/VP10 codecs would severely limit where 8K playback would be possible.  And since software decoding of 8K H.264 can work in real time on most computers while H.265 can’t (H.264 is about 5-8 times less processor intensive than H.265), we have YouTube streaming 8K in the AVC/H.264 codec, at least until VP10 or H.265 streaming support is added to the platform.


Encoding 8K Video into H.264

So you want to encode your own 5K or 8K H.264?  It’s easy - just download FFMPEG and run it from the command line.  Just kidding, that’s a horrible experience.  Use FFMPEG, but run it through a frontend instead.

The syntax for running FFMPEG from the command line can get a little complicated.
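If you’re still determined to brave the raw command line, here’s a minimal sketch of an out-of-standard 8K encode (the file names are placeholders of my own; libx264 will warn that the frame size exceeds any defined Level, but encode it anyway):

ffmpeg -i master_8k.mov \
-c:v libx264 -preset slow -b:v 200M \
-pix_fmt yuv420p \
-c:a aac -b:a 128k \
output_8k.mp4

Those are the same basic choices we’ll make in Hybrid below (a 200Mbps average bitrate with AAC audio); the frontend just makes the rest of x264’s options far easier to explore.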

An FFMPEG frontend is a piece of software that gives you a nicer user interface for deciding your settings, then sends your decisions off to FFMPEG and its associated software to do the actual work.  Handbrake is a good example of a user-friendly, cross-platform frontend for simple jobs, but it doesn’t give you access to all the options available.  The best that I’ve found for that is a frontend called Hybrid.

Hybrid is a little more complicated than, say, Adobe Media Encoder, but it gives you access to all of the features available in the x264 library (i.e. all of the AVC/H.264 standard plus out-of-standard encoding) instead of the more limited feature sets other packages give you.  It’s a cross-platform solution that works on Windows and MacOS, it’s updated regularly to add new features and optimizations, and by default it hides some of the complexity if you just want to do a basic encode.


Hybrid

Here are the settings we’d use for a 5K or higher H.264 video:

Main Pane of Hybrid showing where to select the audio and video codecs, and where to set the output file name.

On the first pane of the program, select your input file, generate or select your output file name, and decide on which video codec you want to use (in this example, x264) and whether to include audio or not (set it to custom).

Set the Profile and Level to None/Unrestricted to encode high bitrate 8K video

Now, under the x264 tab, make the following changes: switch the encoding mode to “average bitrate (1-pass)”, and change the Bitrate (kbits/s) value to 200,000.  That’ll set our target bitrate to 200Mbps, which at 8K is roughly the equivalent quality of 50Mbps at 4K, since an 8K frame has four times the pixels.

Then, under the restriction settings, change the “AVC Profile/Level” drop downs to “none” and “unrestricted”.  Leave everything else the same and jump over to the Audio tab at the top.

Add the audio by selecting "Audio Encoding Options" and then clicking the plus to add it to the selected audio options

In the Audio tab, add an audio source if your main source file doesn’t have one, turn on the Audio Encoding Options pane using the check box, choose your audio format and bit rate (in this case I’m using the default AAC at 128 kbps), then click the big plus sign at the top right of the audio queue to add that track of audio to your output file.

What to click to add your job to the queue and get the queue started

That’s it.  You’re done.  Jump back to the Main tab, click the “add to queue” button to add your job to the batch, and either follow the same steps to add another, or click on “start queue” to get things rendering.

When you’re done you’ll find yourself with a perfectly usable 8K file compressed into H.264!
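If you want to confirm what you’ve made, FFMPEG’s companion tool ffprobe will report the codec and frame size (the file name here is a placeholder for your own output):

ffprobe -v error -select_streams v:0 \
-show_entries stream=codec_name,width,height \
-of default=noprint_wrappers=1 output_8k.mp4

On an 8K encode, that should print codec_name=h264, width=7680, and height=4320 - the same out-of-standard combination we spotted in YouTube’s stream.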


Who Cares?

Is this useless knowledge to have?  Not if you regularly create 8K video for YouTube, or if you create VR content using the GoPro Odyssey rig with Google Jump VR.  In both of those cases you’ll need to upload an 8K file.  While the ProRes format works, it’s quite large (data wise) and may be problematic for upload times.  Uploading AVC/H.264 is a better option in some cases, and it can always be used as a delivery file for 8K content when data rates prohibit DPX or an intermediate format.

To play back files created this way, you need a video player that can handle non-standard video with lightweight software decoding, like MPC-HC on Windows or MPV on Windows or MacOS.  Sometimes QuickTime will work, though rarely on Windows because it’s still built on a 32-bit core, and VLC is also a solid option in many cases.  But both of those have more overhead than FFMPEG-core players and can cause jittery playback.
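As a quick example, with MPV installed, software playback is a one-liner (the file name again being a placeholder):

mpv --hwdec=no output_8k.mp4

MPV defaults to software decoding anyway, but spelling it out makes the point: no hardware decoder ever has to understand the out-of-standard stream.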

Spending time learning new programs, especially ones that aren’t user friendly at face value, like Hybrid or FFMPEG, doesn’t seem like it’ll pay off.  But the process of discovery, trial, and error is your friend when you’re trying to stay ahead of the game in video.  Don’t be afraid to test out something new.

It’s how we were able to deliver 5K video content to a client when no one else could, and how we still stay at the forefront of video technologies today.

Written by Samuel Bilodeau, Head of Technology and Post Production

How to Upload HDR Video to YouTube (with a LUT)

Today YouTube announced official HDR streaming support via their blog.  I alluded to the fact that this was coming in my article about grading in HDR, because we've been working with them over the past month to get our latest HDR video onto the platform.  It's officially live now, so we can go into detail.


How to Upload HDR Video to YouTube

Similar to VR support, there's no switch on the platform itself that allows you to manually flag a video as HDR after it's been uploaded, so the uploaded file must include the proper HDR metadata.  But YouTube doesn't support uploading in HEVC, so there are two possible pathways to getting the right metadata into your file: DaVinci Resolve Studio 12.5.2 or higher, or the YouTube HDR Metadata Tool.  Both are generally outlined on the YouTube support page, but not very clearly, so I think more detail is useful.

I did include a lengthy description on how to manage HDR metadata in DaVinci Resolve Studio 12.5.2+, with a lot more detail than they include on their support page, so if you want to use the Resolve method, head over there and check that out.  I've covered it once, so I don't see the need to cover the how-to's again.

I should note that Resolve doesn't include the necessary metadata for full HDR10 compatibility, lacking fields for MaxFALL, MaxCLL, and the Mastering Display values of SMPTE ST.2086.  It does mark the BT.2020 primaries and the transfer characteristics as either ST.2084 (PQ) or ARIB STD-B67 (HLG), which will let YouTube recognize the file as HDR Video.  YouTube will then fill in the missing metadata for you when it prepares the streaming version for HDR televisions, by assuming you're using the Sony BVM-X300.  So this works, and is relatively easy.  BUT, you don't get to include your own SDR cross conversion LUT; for that you'll need to use YouTube's HDR Metadata Tool.

 

***UPDATE: April 20, 2017*** We've discovered in our testing that if you pass uncompressed 24-bit audio in a QuickTime container out of some versions of Adobe Media Encoder / Adobe Premiere into the mkvmerge tool described below, the audio will be distorted.  We recommend using 16-bit uncompressed audio or AAC instead until a solution is found.

 

YouTube's HDR Metadata Tool

Okay, let's talk about option two: YouTube's HDR Metadata Tool.  

Alright, not to criticize or anything here, but while the VR metadata tool comes in a nice GUI, the link to the HDR tool sends you straight to GitHub.  Awesome.  Don't panic: just follow the link, download the whole package, and unzip the file.

So the bad news: whether you're working on Windows or on a Mac, you're going to need to use the command line to run the utility.  Fire up Command Prompt (Windows) or Terminal (MacOS) to get yourself a shell.

So the really bad news: if you're using a Mac, the binary you need to run is actually inside the app package mkvmerge.app.  If you're on Windows, drag the 32- or 64-bit version of mkvmerge.exe into Command Prompt to get things started; if you're on MacOS, right click on mkvmerge.app, select "Show Package Contents", and drag the binary file ./Contents/MacOS/mkvmerge into Terminal to get started:

Right click on mkvmerge.app and select "Show Package Contents"

Drag the mkvmerge binary into Terminal

The README.md file includes some important instructions and the default syntax for running the tool, with the assumption that you're using the Sony BVM-X300 and mastering in SMPTE ST.2084.  I've copied the relevant syntax here (I'm using a Mac).  The file names and LUT path are placeholders to replace with your own, and the two instruction lines in between aren't part of the command:

./hdr_metadata-master/macos/mkvmerge.app/Contents/MacOS/mkvmerge \
-o yourfilename.mkv \
--colour-matrix 0:9 \
--colour-range 0:1 \
--colour-transfer-characteristics 0:16 \
--colour-primaries 0:9 \
--max-content-light 0:1000 \
--max-frame-light 0:300 \
--max-luminance 0:1000 \
--min-luminance 0:0.01 \
--chromaticity-coordinates 0:0.68,0.32,0.265,0.690,0.15,0.06 \
--white-colour-coordinates 0:0.3127,0.3290 \

If using a LUT, add the lines:

--attachment-mime-type application/x-cube \
--attach-file file-path-to-your-cube-LUT \

In all cases, end with the input file:

yourfilename.mov

Beyond the initial call to the binary or executable, the syntax is identical on MacOS and Windows.

The program's full syntax can be found here, but it's a little overwhelming.  If you want to look it up, just focus on section 2.8, which includes the arguments we're using here.  The first four arguments set the color matrix (BT.2020 non-constant), color range (broadcast), transfer function (ST.2084), and color primaries (BT.2020) by referencing specific index values, which you can find on the linked page.  If you want to use HLG instead of PQ, switch the value of --colour-transfer-characteristics to 0:18, which will flag for ARIB STD-B67.

(Note to the less code savvy: the backslashes at the end of each line let you break the syntax across multiple lines in the terminal window; you'll need them at the end of every line you copy and paste in, except the last one.  On Windows, Command Prompt uses ^ instead of \ for line continuation.)

The rest of the list of video properties should be fairly self explanatory, and match the metadata required by HDR10, which I go over in more detail here.

Now, if you want to include your own SDR cross conversion LUT, you'll need to include the arguments --attachment-mime-type application/x-cube, which tells the program you want to attach a file that's not processed (specifically, a cube LUT), and --attach-file filepath, which is the actual file you're attaching.

If you don't attach your own LUT, YouTube will handle the SDR cross conversion with their own internal LUT.  It's not bad, but personally I don't like the hard clipping above 300 nits and the loss of detail in the reds; that's largely a personal preference, though.  See the comparison screenshots below to see how theirs works.

Once you've pasted in all of the arguments and set your input file path, hit enter to let it run and it'll make a new MKV.  It doesn't recompress any video data, just copies it over, so if you gave it ProRes, it'll still be the same ProRes stream but with the included HDR metadata and LUT that YouTube needs to recognize the file.
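If you'd like to double check that the metadata actually made it in, the full MKVToolNix package also ships with mkvinfo, which prints a track's properties, colour elements and attachments included (again, the file name is a placeholder):

mkvinfo yourfilename.mkv

Look through the video track's entries for the colour matrix, transfer characteristics, and luminance values you passed in, plus the attached cube LUT if you included one.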

Overall, it's a pretty fast tool, and extremely useful beyond just YouTube applications.  You can see what it's done in the set of screenshots below.  The first is the source ProRes clip, the second is the same after passing it through mkvmerge to add the metadata only, and the third went through mkvmerge to get the metadata and my own LUT:

ProRes 422 UHD Upload Without Metadata Injection

ProRes 422 UHD Upload in MKV File. Derived from the ProRes File above and passed through the mkvmerge tool to add HDR Metadata, but no LUT.

ProRes 422 UHD Upload in MKV file. Derived from the ProRes file above and passed through the mkvmerge tool to add HDR Metadata and including our SDR cross conversion LUT. Notice the increased detail in the brights of the snake skin, and the regained detail in the red flower.


All of us at Mystery Box are extremely excited to see HDR support finally widely available on YouTube.  We've been working in the medium for over a year, and haven't been able to distribute any of our HDR content in a way that consumers would actually be able to use.  But now, there's a general content distribution platform available with full HDR support, and we're excited to see what all creators can do with these new tools!

Written by Samuel Bilodeau, Head of Technology and Post Production