HDR Video Part 5: Grading, Mastering, and Delivering HDR

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 5: Grading, Mastering, and Delivering HDR.

In our series on HDR so far, we’ve covered the basic question of “What is HDR?”, what hardware you need to see it, the new terms that apply to it, and how to prepare and shoot with HDR in mind.  Arguably, we’ve saved the most complicated subject for last: grading, mastering, and delivering.

First, we’re going to look at setting up an HDR grading project, and the actual mechanics of grading in the two HDR spaces.  Next, we’re going to look at how to prepare cross conversion grades to convert from one HDR space to the other, or from HDR to SDR spaces.  Then, we’re going to look at suitable compression options for master & intermediate files, before discussing how to prepare files suitable for end-user delivery.

Now, if you don’t handle your own coloring and mastering, you may be tempted simply to ignore this part of our series.  I’d recommend you don’t - not just because I’ve taken the time to write it, but because I sincerely believe that if you work at any step along an image pipeline, from acquisition to exhibition, your work will benefit from learning how the image is treated in the other steps along the way.  Just my two cents.

Let’s dive in.

NOTE: Much of this information will be dated, probably within the next six months to a year or so. As programs incorporate more native HDR features, some of the workarounds and manual processes described here will likely be obsolete.


Pick Your Program

Before diving into the nitty gritty of technique, we need to talk applications.  Built-in color grading tools or plugins for Premiere, Avid, or FCP-X are a no-no.  Until all of the major grading applications have full and native HDR support, you’re going to want to pick a program that offers full color flexibility and precision in making adjustments.

I’m going to run you through my workflow using DaVinci Resolve Studio, which I’ve been using to grade in HDR since October 2015, long before Resolve contained any native HDR tools.  My reasoning here is threefold: one, it’s the application I actually use for grading on a regular basis; two, the tricks I developed to grade HDR in DaVinci can be applied to most other color grading applications; and three, it offers some technical benefits that we find important to HDR grading, including:

  • 32 bit internal color processing

  • Node based corrections offering both sequential and parallel operations

  • Color space agnostic processing engine

  • Extensive LUT support, including support for multiple LUTs per shot

  • Ability to quickly apply timeline & group corrections

  • Extensive, easily accessible correction toolset with customizable levels of finesse

  • Timeline editing tools for quick edits or sequence changes

  • Proper metadata inclusion in QuickTime intermediate files

Now, I’m not going to say that DaVinci Resolve is perfect.  I have a laundry list of beefs that range from minor annoyances to major complaints (though the same is basically true of every program I’ve used…), but for HDR grading its benefits outweigh its drawbacks.

My philosophy tends to be that if you can pretty easily make a program you’re familiar with do something, use that program.  So while we’re going to look at how to grade in DaVinci Resolve Studio, you should be able to use any professional color grading application to achieve similar results, by translating the technique of the grade into that application’s feature set.*

If you are using DaVinci Resolve Studio, I recommend upgrading to version 12.5.2 or higher, for reasons I’ll clarify in a bit.

DaVinci Resolve Studio version 12.5.2 has features that make it very useful for HDR grading and encoding.


Grading in HDR

So now that we’re clear on what we need in a color grading program, let’s get to the grading technique itself.  For starters, I’m going to focus on grading with the PQ EOTF rather than HLG, simply because there’s a lot of overlap between the two.  The initial subsections will focus on PQ grading, but I’ll conclude the section with a bit about how to adapt the advice (and your PQ grade!) to grading in HLG.

Set up the Project

I assume, at this point, that you’re familiar with how to import and set up a DaVinci Resolve Studio project for normal grading using your own hardware, adding footage, and importing the timeline with your sequence.  Most of that hasn’t changed, so go ahead and set up the project as usual, and then take a look at the settings that need to be different for HDR.

First, under your Master Project Settings you’re going to want to turn on DaVinci’s integrated color management by changing the Color Science value to “DaVinci YRGB Color Managed”.  Enabling DaVinci’s color management allows you to set the working color space, which as of Resolve Studio 12.5.2 and higher will embed the correct color space, transfer function, and transform matrix metadata in QuickTime files using ProRes, DNxHR, H.264, or Uncompressed codecs.  As more and more applications become aware of how to move between color spaces, especially BT.2020 and the HDR curves, this is invaluable.

Enabling DaVinci YRGB Color Management as a Precursor for HDR Grading

Side note: I’m not actually recommending using DaVinci’s color management for input color space transformations; in fact, for my HDR grades I set the input to “Bypass” and the timeline and output color spaces to the same values, because I don’t like how these transformations affect the way basic grading operations behave.  Color management is, however, a useful starting point for HDR and SDR cross conversions, which we’ll discuss in a bit.

Once color management is turned on, you’ll want to set it up for the HDR grade.  Move to the Color Management pane of the project settings and enable the setting “Use Separate Color Space and Gamma”.  This will give you fine-tunable control over the input, timeline, and output values.  If you want to keep these flat, i.e. prevent any actual color management by DaVinci, set the Input Color Space to “Bypass” and the Timeline and Output Color Space to “Rec.2020” - “ST.2084”.  This will enable the proper metadata in your renders without affecting any grading operations.

For the purposes of what I’m demonstrating here, if you are using DaVinci’s color management for color transformations, use these settings:

  • Input Color Space - <”Bypass,” Camera Color Space or Rec 709> - <”Bypass,” Camera Gamma or Rec 709>

  • Timeline Color Space - “Rec.2020” - “ST.2084”

  • Output Color Space - “Rec.2020” - “ST.2084”

DaVinci Resolve Studio for embedding HDR metadata in master files, without affecting overall color management.

NOTE: At the time of this writing DaVinci's ACES doesn’t support HLG at all, or PQ within the BT.2020 color space; in the future, this may be a better option to use, if you’re comfortable grading in ACES.

After setting your color management settings, you’ll want to enable your HDR scopes by flagging “Enable HDR Scopes for ST.2084” in the Color settings tab of the project settings.  This changes the scale on DaVinci’s integrated scopes from 10 bit digital values to a logarithmic brightness scale showing the output brightness of each pixel in nits.

How to Enable HDR Scopes for ST.2084 in DaVinci Resolve Studio 12.5+

DaVinci Resolve Studio scopes in standard digital scale, and in ST.2084 nits scale.

If you’re connected to your HDMI reference display, under Master Project Settings flag “Enable HDR Metadata over HDMI”, and under Color Management flag “HDR Mastering is for X nits” to trigger the HDR mode on your HDMI display.

How to enable HDR Metadata over HDMI to trigger HDR on consumer displays.

If you’re connected to a reference monitor over SDI, set the display’s color space to BT.2020 and its gamma curve to ST.2084 (and its Transform Matrix to BT.2020 or BT.709, depending on whether you’re using subsampling and what your output matrix is).

Settings for enabling SMPTE ST.2084 HDR on the Sony BVM-X300

That’s it for settings.  It’s really that simple.


Adjusting the Brightness Range

Now that we’ve got the project set up properly, we’re going to add the custom color management compensation that will allow the program’s mathematical engine to process changes in brightness and contrast in a way more conducive to grading in ST.2084.

The divergence of the PQ EOTF from a linear scale is pretty hefty, especially in the high values.  Internally, the mathematical engine operates on the linear digital values, with a slight weighting towards optimization for Gamma 2.4.  What we want to do is make the program respond more uniformly to the brightness levels (output values) of HDR, rather than to the digital values behind them (input values).
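To get a sense of just how skewed that relationship is, here’s a minimal Python sketch of the ST.2084 EOTF, using the constants from the published PQ formula (the sample code values at the end are purely illustrative):

# SMPTE ST.2084 (PQ) EOTF: normalized code value (0.0-1.0) -> nits
m1 = 2610 / 16384          # 0.1593017578125
m2 = 2523 / 4096 * 128     # 78.84375
c1 = 3424 / 4096           # 0.8359375
c2 = 2413 / 4096 * 32      # 18.8515625
c3 = 2392 / 4096 * 32      # 18.6875

def pq_eotf(code):
    p = code ** (1 / m2)
    return 10000.0 * (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1 / m1)

for code in (0.25, 0.5, 0.75, 1.0):
    print(f"code {code:.2f} -> {pq_eotf(code):8.1f} nits")

Half the signal range only gets you to roughly 100 nits, and three quarters of it to roughly 1,000 nits, while the last quarter covers everything up to 10,000 nits - which is exactly why a simple gain adjustment races through the HDR brights.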

We’re going to do this by setting up a bezier curve that compresses the lights and expands the darks:

Bezier curve for expanding the darks and compressing the whites of ST.2084, for grading with natural movement between exposure values in HDR

For best effect, we need to add the curve to a node after the rest of the corrections, either as a serial node after other correctors on individual clips, on the timeline as a whole (timeline corrections are processed in serial, after clip corrections), or exported as a LUT and attached to the overall output.

Where to attach the HDR bezier curve for best HDR grading experience - serial to each clip, or serial to all clips by attaching it to the timeline.

So what effect does this have on alterations?  Look at the side by side effect of the same gain adjustment on the histogram with and without the custom curve in serial:

Animated GIF of brightness adjustments with and without the HDR Bezier Curve

Without the curves, the upper range of brightnesses races through the HDR brights.  This is, as you can imagine, very unnatural and difficult to control.  With the curve in place, the bright ranges move more slowly - still increasing, but at a pace that’s comparable to a linear adjustment of brightness rather than a linear adjustment of digital values: exactly what we want.

NOTE: DaVinci Resolve Studio includes a feature called “HDR Mode”, accessible through the context menu on individual nodes, that in theory is supposed to accomplish something similar.  I’ve found it has really strange effects on Lift - Gamma - Gain that I can’t see helping HDR grading: Gain races faster through the brights, Gamma is inverted and seems to compress the space, and so does Lift, but at different rates.  If you’ve figured out how to make these controls useful, let me know…

If you've figured out how to use HDR Mode in DaVinci Resolve Studio for HDR grading, let me know!

Once that curve’s in place, grading in HDR becomes pretty normal, in some ways even easier than grading for SDR.  But there are a few differences that need to be noted, and a couple more tricks that will get your images looking the best.  And the first one of these we’ll look at is the HDR frenemy, MaxFALL.


Grading with MaxFALL

If you read the last part in this HDR series, about shooting for HDR, you’ll remember that MaxFALL was an important consideration when planning the full scene for HDR.  In color grading you’re likely going to discover why MaxFALL is such an important consideration: it can become a frustrating limit on what you think you want to do.

Just a quick recap: MaxFALL is the maximum frame average light level permitted by the display.  We calculate each frame’s average light level by measuring the light level, in nits, of each pixel and taking the average across that frame; the MaxFALL value is the maximum frame average encoded within an HDR stream, or permitted by a display.  The MaxFALL permitted by your reference or target display is what we really need to think about with respect to color grading.
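If it helps to see the arithmetic, here’s a simple Python sketch of the calculation, treating each pixel as a single luminance value in nits (a simplification of how the official metadata values are measured):

import numpy as np

def frame_average_light_level(frame_nits):
    # Average luminance, in nits, of every pixel in one frame
    return float(np.mean(frame_nits))

def max_fall(frames):
    # MaxFALL: the highest frame-average light level across the program
    return max(frame_average_light_level(f) for f in frames)

def max_cll(frames):
    # MaxCLL: the brightest single pixel anywhere in the program
    return max(float(np.max(f)) for f in frames)

# Toy example: a mostly 100 nit UHD frame with a small 1,000 nit highlight
frame = np.full((2160, 3840), 100.0)
frame[:200, :200] = 1000.0
print(frame_average_light_level(frame))   # ~104 nits - small highlights barely move the average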

Without getting into the technical reasons behind the MaxFALL, you can imagine it as collapsing all of the color and brightness within a frame into a single, full frame, uniform grey screen, and the MaxFALL is how bright that grey (white) screen can be before the display would be damaged.  Every display has a MaxFALL value, and will hard-limit the overall brightness by dimming the overall image when you send it a signal that exceeds the MaxFALL.

Average Pixel Brightness with Accompanying Source Image

On the BVM-X300, you’ll notice the over range indicator turns on when you exceed the MaxFALL, so that when you look at the display, you can see when the display is limiting the brightness.  On consumer television displays, there is no such indicator, so if the dimming happens when you’re looking away from the screen, you’re likely to not notice the decreased range.  Use the professional reference when it’s available!

BVM-X300 Over Range Indicator showing MaxFALL Exceeded

Just as with CRT displays, the MaxFALL tends to be lower on reference displays than on consumer displays with the same peak brightness: the larger size of consumer panels helps dissipate the heat generated by the higher drive current, and the color deviation tolerable in consumer displays lets them trade color fidelity for a higher MaxFALL in a way a reference display can’t.

So what do we do in grading that can be limited by the MaxFALL attribute?  Here are some scenarios that I’ve run into limitations with MaxFALL:

  1. Bright, sunny, outdoors scenes

  2. Large patches of blue skies

  3. Large patches of white clouds

  4. Large patches of clipped whites

  5. Large gradients into the brightest whites

When I first started grading in HDR, running into MaxFALL was infuriating.  I’d be working at a nice clip when suddenly, no matter how much I raised the brightness of the scene, it just never got brighter!  I didn’t understand it initially: I was looking at the scopes and was well below the peak brightness available on my display, yet every time I added gain, the display bumped up, then suddenly dimmed down.

When MaxFALL is exceeded, the Over Range indicator lights up and the display brightness is notched down to prevent damage.

Now that I know what I was fighting against, it’s less infuriating, but still annoying.  In general, I know that I need to keep the majority of the scene around 100-120 nits, and pull only a small amount into the superwhites of HDR.  When my light levels are shifting across frames, as in this grade with the fire breather, I’ll actually allow a few frames to exceed the display’s MaxFALL temporarily, so long as it’s very, very brief, so as not to damage the display when it temporarily averages brighter.

Grading with brief exceeding of target MaxFALL.

When I’m grading content that’s generally bright, with long stretches of even brighter material, such as this outdoor footage from India, it can be a good idea to keyframe an upper-brightness adjustment to drop the MaxFALL, basically dropping the peak white values as the clipped or white patch takes up more and more of the scene.  This can be visible, though, as a greying of the whites, so be careful.

Tilt-up Shot of Taj Mahal where brightness keyframes were required to limit MaxFALL. In an ideal world, no keyframes would have been necessary and the final frame would have been much brighter (as shot) than the first.

In other cases, it may be necessary to drop the overall frame brightness to allow for additional peak brightness in a part of the frame, as happened with this shot of Notre Dame Cathedral, where I dropped the brightness of the sky, tree, and cathedral to less than what I wanted, to allow the clouds to peak higher into the HDR white range.

Average brightness was limited so that more of the cloud details would push higher into the superwhites without exceeding MaxFALL

In some cases, you really have no choice but to darken the entire image and reduce the value of peak white, such as this shot of the backflip in front of the direct sun - the gradient around the sun steps if I pull its center up to the peak white of the sun, while the MaxFALL is exceeded if I pull up the overall brightness of the image.

MaxFALL limited the white point to only 200 nits because of the quantity of the bright portion of the image and the softness of the gradient around the sun.

The last consideration with MaxFALL comes with editing across scenes, and is more important when maintaining consistency across a set of shots that should look like they’re in the same location.  You may have to decrease the peak white within the series of shots so that on no edit does the white suddenly appear grey, or rather, ‘less white’ than the shot before it.

Three shots with their possible peak brightnesses (due to MaxFALL limitations of the BVM-X300) vs the values I graded them at.

What do I mean by ‘less white’?  I mentioned it in Part 4: Shooting for HDR, but to briefly reiterate and reinforce:


In HDR grading, there’s no such thing as absolute white and black.


HDR Whites & Blacks

From a grading paradigm point of view, this may be the biggest technical shift: in HDR, there is no absolute white or absolute black.

Okay, well, that’s not entirely true, since there is a ‘lowest permitted digital code’ which is essentially the blackest value possible, and a ‘highest permitted digital code’ that can be called the peak brightness - essentially the whitest value possible within the system (encoded video + display).  However, in HDR, there is a range of whites available through the brights, and a range of blacks available through the darks.

Black and white have always been a construct in video systems, limited by the darkest and brightest values displays could produce - the hard-coded limits of the digital and voltage values available.  In traditional SDR color grading, crushing to black was simple: push the darks below the lowest legal dark value, and you have black.  Same thing with whites - set the brightness to the highest legal value and that was the white that was available: anything less than that tends to look grey, especially in contrast with ‘true white’ or ‘legal white’.

But in the real world, there is a continuum that exists between blacks and whites.  With the exception of a black hole, there is nothing that is truly ‘black’, and no matter how bright an object is, there’s always something brighter, or whiter than it.

Of course, that’s not how we see the world - we see blacks and whites all around us.  Because of the way that the human visual system works, we perceive as blacks any part of a scene (that is, what is in front of our eyes) that is either very low in relative illumination and reflects all wavelengths of light relatively uniformly, or that is very low in relative illumination such that few of our cones are activated in our eyes and we therefore can’t perceive the ratio of wavelengths reflected with any degree of certainty.  Or, in other words, everything that is dark with little saturation, or so dark that we can’t see the saturation, we perceive as black.

The same thing is true with whites, but in reverse.  Everything illuminated or emitting brightness beyond a specific value, with little wavelength variance (or along the normal distribution of wavelengths) we see as white, or if things are so bright that we can’t differentiate between the colors reflected or emitted, we see it as white.

Why do I bring this up?  Because unlike in SDR video where there is a coded black and coded white, in HDR video, there are ranges of blacks and whites (and colors of blacks and whites), and as a colorist you have the opportunity to decide what level of whiteness and blackness you want to add to the image.

Typically, any area that’s clipped should be pushed as close as possible to the scene-relative white level where the camera clipped.  Or, in other words, as high as possible in a scene with a very large range of exposure values, or significantly lower when the scene was overexposed and therefore clipped at a much lower relative ratio.

Clipping in an image with wide range of values and tones vs clipping in image with limited range of values and tones

Since this is different for every scene and every camera, it’s hard to recommend what that level should be.  I usually aim for the maximum value of the display or the highest level permitted by MaxFALL if my gradient into the white or size of the clipped region won’t permit it to be brighter.

So long as the light level is consistent across edits, the whites will look the same and be seen as white.  If, within a scene, you have to drop the peak brightness level of one shot because of MaxFALL or other considerations, it’s probably going to look best if you drop the brightness level of the whites across every other shot within that same scene.  In DaVinci, you can do this quickly by grouping your shots and applying a limiting corrector (in the Group Post-Clip, to maintain the fidelity of any shot-based individual corrections).

Sometimes you may actually want a greyer white, or a colored white that reads more blue or yellow, depending on the scene.  In fact, when nothing within the image is clipping and you don’t have other MaxFALL considerations, it’s very liberating to decide the absolute white level within an image.  Shots without any ‘white’ elements can still have colored brights at levels well above traditional white, which helps separate the relative levels within a scene in a way that could not be possible with traditional SDR video.

The only catch, and this is a catch, is that when you do an SDR cross conversion, some of that creativity can translate into gross looking off-whites, but if you plan specifically for it in your cross conversion to SDR, you should be able to pull it off in HDR without any issues.

Blacks have a similar tonal range available to them.  You have about 100 levels of black available below traditional SDR’s clipping point, and that in turn creates some fantastic creative opportunities.  Whole scenes can play out with the majority of values below 10 nits.  Some parts of the darks can be so dark that they appear uniform black, until you block out the brighter areas of the screen and suddenly find that you can see even deeper into the blacks.  Noise, especially chromatic noise, disappears more in these deep darks, making the image appear cleaner than it would in SDR.  All of these offer incredible creative opportunities when planning for production, and I discussed them in more detail in Part 4: Shooting for HDR.

So how do you play with these whites and blacks?

The two tools I use on a regular basis to adjust my HDR whites and blacks are the High and Low LOG adjustments within DaVinci.  These tools allow me to apply localized gamma corrections to specific parts of the image, that is, those above a specific value for the highs adjustment, and those below a specific value for the lows adjustment.

DaVinci Resolve Studio's LOG Adjustment Panel

In SDR video, I typically use LOG adjustments on the whites to extend contrast, or to adjust the color of the near-whites.  In HDR, I first adjust the “High Range” value to ‘bite’ the part of the image that I want, and then pull it towards the specific brightness value I’m looking for.  This often (but not always) involves pulling up a specific part of the whites (say, the highlights on the clouds) to a higher brightness value in the HDR range, for a localized contrast enhancement, though I do use it to adjust the peak brightness too.

Effect of LOG Adjustments on an HDR Image with Waveform. Notice the extended details in the clouds.

In SDR video, I’d typically use the low adjustment to pull down my blacks to ‘true black’, or to fix a color shift in the blacks I’d introduced with another correction (or the camera captured). In HDR, I use the same adjustment to bite a portion of the lows and extend them through the range of blacks, increasing the local contrast in the darks to make the details that are already there more visible.

The availability of the LOG toolset is one of the major reasons I have a preference for color grading in DaVinci, and what it lets you do quickly with HDR grading really helps speed up the process.  When it’s not available, its functionality is difficult to emulate with finesse using tools such as curves or lift-gamma-gain; I’ve found it generally requires a secondary corrector limited to a specific color range followed by a gamma adjustment, which is an inelegant workaround, but one that works.


Futureproofing

Once the grade is nearly finalized, there’s a couple of things that you may consider doing to clean up the grade and make it ‘futureproof’, or, to make sure that things you do now don’t come back to haunt the grade later.

If you’ve been grading by eye, any value above the maximum brightness of your reference display will be invisible, clipped at the maximum display value.  If you’re only ever using the footage internally, and on that display only, don’t worry about making it future proof.  If, however, you’re intending on sharing that content with anyone else, or upgrading your display later, you’ll want to actually add the clip to your grade.

The reasoning here I think is pretty easy to see: if you don’t clip your video signal, your master will contain information that you can’t actually see.  In the future, or on a different display with greater latitude, it may be visible.

There are a couple of ways of doing this.

One that’s available in DaVinci is to generate a soft-clip LUT in the Color Management pane of the project settings, setting the top clip value to the 10 bit digital value of your display’s maximum nits value (767, for instance, for a 1000 nit max brightness display using PQ space).  Once you generate the LUT, attach it to the output and you’ve got yourself a fix.

Generating a Soft Clipping LUT for ST.2084 at 1000 nits in DaVinci Resolve
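If you want to double-check where your display’s peak lands on the 10 bit scale, the inverse of the PQ EOTF gives you the code value directly.  A quick Python sketch - assuming full-range 10 bit quantization; you’ll land in the neighborhood of the value above, with the exact number depending on the range convention and rounding you use:

def pq_inverse_eotf(nits):
    # SMPTE ST.2084 inverse EOTF: luminance in nits -> normalized signal (0.0-1.0)
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    y = (nits / 10000.0) ** m1
    return ((c1 + c2 * y) / (1 + c3 * y)) ** m2

def ten_bit_code(nits):
    return round(pq_inverse_eotf(nits) * 1023)

print(ten_bit_code(1000))   # clip point for a 1000 nit display, on the full-range scale
print(ten_bit_code(100))    # around diffuse-white territory in many PQ grades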

Alternatively, you can adjust the roll-off curve we’re using for making uniform brightness adjustments so that it comes as close to limiting the maximum displayable value as you can get, by extending the bezier curve into a near-flat line that lands at your target maximum.

Bezier curve for HDR grading with flatter whites to minimize peak range

But sometimes you may want to leave those values there, so that when the next generation of brighter displays comes around, you may find a little more detail in the lights.  What’s really important here is that you make white white, and not accidentally off-white.

If you’re working with RAW footage that allows you to adjust the white balance later, you may find that where white ‘clipped’ on the sensor isn’t uniform in all three channels.  This can happen too with a grading correction that adjusts the color balance of the whites - you can end up with separate clips in the red, green, and blue channels that may be clipped and invisible on your display, but will show up in the future.

Waveform of clipped whites with separated RGB Channels. This is common with RAW grading with clipped whites at the sensor and the ability to control decoded color temperature.

The simple fix here is to add a serial node adjustment that selects, as a gradient, all values above a specific point, and desaturates the hell out of them.  Be careful to limit your range to low saturation values only (so long as they encompass what you’re trying to hit) so that you don’t accidentally desaturate other, more intentionally colorful parts of the image that just happen to be bright.

How to fix RGB separated clipped whites: add a serial node with a Hue/Saturation/Luminance restriction to just the whites and reduce their saturation to zero.

Working with Hybrid Log Gamma

Up to this point the grading techniques I’ve been discussing have been centered on grading in PQ space.  Grading in Hybrid Log Gamma is slightly different in a couple of important ways.

As a quick refresher, Hybrid Log Gamma is an HDR EOTF that intends to be partially backwards compatible with traditional gamma 2.4 video.  This is a benefit and a drawback when it comes to HDR grading.
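The hybrid nature is easiest to see in the transfer function itself.  Here’s a small Python sketch of the HLG OETF using the ARIB STD-B67 / BT.2100 constants - the bottom half of the signal range behaves like a conventional camera gamma (hence the partial SDR compatibility), while the top half diverges onto a log segment for the highlights:

import math

A = 0.17883277
B = 1 - 4 * A                   # 0.28466892
C = 0.5 - A * math.log(4 * A)   # 0.55991073

def hlg_oetf(e):
    # Normalized scene-linear light (0.0-1.0) -> HLG signal value (0.0-1.0)
    if e <= 1 / 12:
        return math.sqrt(3 * e)             # square-root, gamma-like segment
    return A * math.log(12 * e - B) + C     # logarithmic segment for the highlights

for e in (0.01, 1 / 12, 0.25, 0.5, 1.0):
    print(f"scene {e:.3f} -> signal {hlg_oetf(e):.3f}")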

If you have multiple reference displays available, this is an important time to break them out.  Ideally, one display should be set up in HLG with a system gamma of 1.2 (or whatever your target system gamma is), and the second should be set up in regular gamma 2.4.  That way, whatever grading you do, you can see the effect immediately on both target systems.  Otherwise you’ll need to flip back and forth between HDR and SDR modes on a single reference display in your search for ‘the happy medium’.

Grading HLG with two reference displays - one in HDR, one in SDR, to ensure the best possible contrast in both.

Most of the project and grading setup is identical to grading with the PQ EOTF, with the exception of the bezier curve in serial that adjusts the brightness response.  In HLG we don’t want to expand the darks, since the HLG darks are identical to the gamma 2.4 darks, so we want that part of the curve to be more linear, before easing into our compression of the highs.

Bezier curve for HDR grading in Hybrid Log Gamma. This curve replaces the ST.2084 Bezier curve added earlier.

Once that’s in place, the rest of the grading process is similar to grading in PQ.  In fact, you can replace the ST.2084 bezier curve with this curve and your grade should be nearly ready to go in HLG.  The major exception is that you still need to regularly evaluate how the image looks in SDR, on a shot-by-shot basis.

The biggest complaint I have with grading in HLG is the relative contrast between the HDR and the SDR images.  Because HLG runs up to 5000 nits with its top digital values, if you’re grading to 1000 nits you end up with a white level in the SDR version below the usual peak white.  This often means that the whites in the SDR version look muddied and lower contrast than the same content graded for SDR natively.  This is especially true when the MaxFALL dictates that a darker image and a lower white point are necessary, landing values solidly in the middle ranges of brightness.

Hybrid Log Gamma occasionally has much dimmer and muddied whites, when compared to SDR natively graded footage, due to MaxFALL limitations.

And as if muddied whites weren’t enough, it’s difficult in HLG to find a contrast curve that works for both the HDR and the SDR image: because of how our brains perceive contrast, when the contrast looks right and natural in HDR, it looks flat in SDR because of the more limited dynamic range, while when it looks right in SDR it looks overly contrasty in HDR.

Personally, I find grading in HLG compounds the minor problems of HDR with the problems of SDR, which is extremely irritating.  Rather than being happy with the grade, I’m often left with a sense of “It’s okay, I guess”.

But on the other hand, when it’s done, you won’t necessarily have to regrade for other target gamma systems, which is what you have to do when working in PQ.



Cross Converting HDR to HDR & HDR to SDR

Let’s be honest.  A PQ encoded image displayed in standard gamma 2.4 rendering looks disgusting.  The trouble is, we only really want to do the bulk of the grading once, so how can we cheat and make sure we don’t have to regrade every project two or more times?

LUTs, LUTs, and more LUTs!  Also, Dolby Vision.

Dolby Vision is an optional add-on for DaVinci Resolve Studio (licensed from Dolby) that allows you to encode the metadata for the SDR cross conversion into your output files.  Essentially, the PQ HDR image is transmitted with metadata that describes how to transform the HDR into a properly graded SDR image.  It’s a nifty process that seeks to solve the dilemma of backwards compatibility.

But I’ve never used it, because we’ve had no need and I don’t have a license.  The documentation on how to use it within DaVinci Resolve is extensive, though, and it follows a similar process to a standard SDR cross conversion, so take that as you will.  I’ve also heard rumors that some major industry players are looking for, or looking to create, a royalty-free dynamic metadata alternative that everyone can use as a global standard for transmitting this information - but that’s just a rumor.

For everyone not using Dolby Vision, you’re going to have to render the SDR versions separately from the HDR versions as separate video files.  Here at Mystery Box, we prefer to render the entire HDR sequence as a set of clip-separated 12 bit intermediate files and make the SDR grade from those, rather than adding additional corrector elements to the HDR grade.  This tends to render faster, because you only render from the RAWs once, and make any other post-processing adjustments once instead of on every version.

NOTE: I’m going to cover the reason why later, but it’s important that you use a 12 bit intermediate if you want a 10 bit master, since the cross conversion from PQ to any other gamma system cuts the detail levels preserved by about 2-4 times, or an effective loss of 1-2 bits of information per channel.

When I’m cross converting from PQ in the BT.2020 space to gamma 2.4 in the BT.2020 space, after reimporting and reassembling the HDR sequence (and adding any logos or text as necessary), I’ll duplicate the HDR sequence and add a custom LUT to the timeline.

The fastest way to build this LUT is to use the built-in DaVinci Color Management (set the sequence gamma to ST.2084 and the output gamma to Gamma 2.4) or the HDR 1000 nits to Gamma 2.4 LUT, and then add a gain and gamma adjustment to bring the brightness range and contrast back to where you want them.  It’s a pretty good base to start building your own LUT from, and while these tools weren’t available when I started building my first cross conversion LUT, the process they use is nearly identical to what I did.

Using DaVinci Resolve Studio to handle HDR to SDR cross conversion

Using DaVinci Resolve Studio to handle HDR to SDR cross conversion
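If you’d rather see the guts of what such a LUT is doing, here’s a deliberately naive Python sketch that writes out a .cube file Resolve can load: it decodes ST.2084, scales a chosen HDR reference level down to SDR diffuse white, hard clips, and re-encodes with gamma 2.4.  A real cross conversion LUT adds a highlight roll-off and gamut handling on top of this, and the 203 nit reference level and file name here are illustrative choices, not a standard recipe:

SIZE = 33                 # 33x33x33 is a common 3D LUT resolution
HDR_REF_WHITE = 203.0     # nits treated as SDR diffuse white (an assumption)

def pq_to_nits(v):
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    p = v ** (1 / m2)
    return 10000.0 * (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1 / m1)

def pq_to_sdr(v):
    linear = min(pq_to_nits(v) / HDR_REF_WHITE, 1.0)   # hard clip above reference white
    return linear ** (1 / 2.4)                         # re-encode with gamma 2.4

with open("pq_to_g24_naive.cube", "w") as f:
    f.write(f"LUT_3D_SIZE {SIZE}\n")
    for b in range(SIZE):
        for g in range(SIZE):
            for r in range(SIZE):                      # red index varies fastest in .cube files
                rgb = [pq_to_sdr(i / (SIZE - 1)) for i in (r, g, b)]
                f.write("{:.6f} {:.6f} {:.6f}\n".format(*rgb))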

Once you’ve attached that correction to the timeline, it’s a pretty fast process to run through each shot and simply make minor brightness, contrast, white point, and black point adjustments.  Using DaVinci’s built-in LUT / Color Management I can do a full SDR cross conversion for 5 minutes of footage in less than half an hour with this LUT method; using my own custom LUT, the process can take less than five minutes.

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 01

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 02

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 03

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 03

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 04

Notice the detail loss in the pinks, reds, and oranges because of over-saturation in the simple downconversion process (images 01 and 04), the milkiness and hue shifting in the darks (image 02), and the fluorescence of the pinks and skin tones (image 03) with a straight downconversion.  This happens largely in the BT.2020 to BT.709 color space conversion, when colors land outside of the BT.709 gamut.  Building a custom LUT can be a great solution to retain the detail.
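To make that concrete, here’s a hedged Python sketch of the linear-light BT.2020 to BT.709 conversion.  The matrix values are the commonly derived ones (small rounding differences exist between references), and the test color is just an arbitrary saturated value; anything that comes out with negative or greater-than-one components is outside the BT.709 gamut - the detail you have to bring back in by hand, or with a smarter gamut-mapping LUT, rather than letting it hard clip:

import numpy as np

M_2020_TO_709 = np.array([
    [ 1.6605, -0.5876, -0.0728],
    [-0.1246,  1.1329, -0.0083],
    [-0.0182, -0.1006,  1.1187],
])

saturated_red = np.array([0.8, 0.05, 0.02])             # illustrative linear BT.2020 value
converted = M_2020_TO_709 @ saturated_red
print(converted)                                         # components fall outside 0-1
print(bool(np.any((converted < 0) | (converted > 1))))   # True -> out of BT.709 gamut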

After prepping the BT.2020 version, making a BT.709 version for web or demonstration purposes is incredibly easy.  All that you have to do is duplicate the BT.2020 sequence (this is why I like adding LUTs to timelines, instead of to the output globally) and add an additional LUT to the timeline that does a color space cross conversion from BT.2020 to BT.709.  (Alternatively, change the color management settings.)  Since the BT.2020 and BT.709 contrast is the same, all I need to do then is run through the sequence looking for regions where reds, blues, or greens end up out of gamut, and bring those back in.  That’s usually less than 5 minutes for a 5 minute project.

Stacked LUTs on a Timeline to combine transformations.

Cross converting from HLG to PQ is fairly simple, since PQ encompasses a larger range of brightnesses than the HLG range and it can fairly easily be directly moved over with a simple LUT or color management tool; you may want to adjust your low-end blacks to take advantage of the deeper PQ space, but it’s otherwise straightforward.

Cross-grading from PQ to HLG is a different animal altogether.  It’s still faster to work from the intermediate than the RAWs themselves, but it’s more than just a simple LUT or color management solution.  Because of the special consideration for HLG - that its contrast has to look good in both HLG and gamma 2.4 - you have a lot more work to do finessing the contrast than when you convert ST.2084 into gamma 2.4.  You’ll also run into issues with balancing the MaxFALL in HLG, which in some cases you’ll just have to ignore.

DaVinci’s built-in color management is actually quite a good starting point for cross converting from HLG to PQ or PQ to HLG.  It’s important, though, to be aware of how color management injects metadata into QuickTime files, which I’ll address in a second, so that you don’t accidentally flag the incorrect color space or gamma in your master files.

Using DaVinci Color Management to apply an HLG to ST.2084 cross conversion.

Understanding how LUTs work to handle SDR cross conversions is really important, because until there’s a universal metadata method for including SDR grades with HDR content, which in and of itself would essentially be a version of a shot-by-shot LUT, display manufacturers and content delivery system creators rely on LUTs (or their mathematical equivalent) to convert your HDR content into something that can be shown on SDR displays!


Metadata & DaVinci’s Color Management

If you’re using color management to handle parts of your color space and gamma curve transformations, you’re going to need to adjust the Output Color Space each time you change sequences, to match the targeted space of that timeline (in addition to switching the settings on your reference display).  This is actually the biggest reason I prefer using LUTs over color management - it just becomes a hassle to continually have to reset the color management when I’m grading.

Even if you’re not using the color management to handle color space conversions, you’re going to need to make some changes to the color management settings when rendering out QuickTime masters, so that the correct metadata is included into the master files.

Proper Metadata Inclusion for BT.2020 / ST.2084 QuickTime File, encoded in ProRes 4444 out of DaVinci Resolve Studio.

The settings you use when you go to render will depend on whether you’re using color management for the transformation or not.  If you are using color management for the transform, change just the Output Color Space to match the target color space and gamma of the timeline to be rendered.  If you aren’t using color management to handle the color conversion, switch both the Timeline Color Space and the Output Color Space to match your target color space and gamma immediately before rendering the matching timeline.  Again, and unfortunately, you will need to make this adjustment every time you go to render a new sequence.  Sorry, no batch processing.

DaVinci Resolve Studio Color Management Settings for transforming color and adding metadata, and adding metadata only.

Grading in HDR isn’t as hard as it originally seems, once you figure out the tricks that allow the grading system to respond to your input as you would expect and predict.  And despite how different HDR is from traditional video, SDR and HDR cross conversions aren’t as hard as they seem, especially when you’re using prepared LUTs specifically designed for that process.


Mastering in HDR

When it comes to picking an appropriate master or intermediate codec for HDR video files, the simplest solution would always be to pick an uncompressed format with an appropriate per-channel bit depth.  Other than the massive file size considerations (especially when dealing with 4K+ video), there are a few cautions here.  

First, for most of the codecs available today that use chroma subsampling, the transfer matrix that converts from RGB to YCbCr is the BT.709 transfer matrix, and not the newer BT.2020 transfer matrix, which should be used with the BT.2020 color space.  This isn’t a problem per se, and actually benefits out-of-date decoders that don’t honor the BT.2020 transfer matrix, even with the proper metadata.  It’s also possible to use the BT.2020 transfer matrix and improperly flag the matrix used when working with a transcoding application that requires manual flagging instead of metadata flagging.  At its very worst, it can create a very small amount of color shifting on decode.
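The reason the shift stays small is that the two sets of luma weights are close to each other.  A quick Python illustration (non-constant-luminance coefficients; the Cb/Cr scaling is omitted for brevity, and the test pixel is arbitrary):

KR_709, KB_709 = 0.2126, 0.0722      # BT.709 luma weights
KR_2020, KB_2020 = 0.2627, 0.0593    # BT.2020 luma weights

def luma(r, g, b, kr, kb):
    return kr * r + (1.0 - kr - kb) * g + kb * b

pixel = (0.9, 0.4, 0.1)                     # an arbitrary warm-toned value
print(luma(*pixel, KR_709, KB_709))         # ~0.485
print(luma(*pixel, KR_2020, KB_2020))       # ~0.514 - a modest shift, not a broken image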

A slightly more concerning consideration, however, is the availability of high quality 12+ bit codecs for use in intermediate files.  Obviously any codec using only 8 bits per channel is out of the question for HDR masters or intermediates, since 10 bits are required by all HDR standards.  10 bit encoding is completely fine for mastering, and codecs like ProRes 422, DNxHR HQX/444, 10 bit DPX, or any of the many proprietary ‘uncompressed’ 10 bit formats you’ll find with most NLEs and color correction software should all work effectively.

However, if you’re considering which codecs to use as intermediates for HDR work, especially if you’re planning on an SDR down-grade from these intermediates, 12 bits per channel as a minimum is important.  I don’t want to get sidetracked into the math behind it, but just a straight cross conversion from PQ HDR into SDR loses about ½ bit of precision in data scaling, and another ¼ - ½ bit of precision in redistributing the values to the gamma 2.4 curve, leaving a little more than 1 bit of precision available for readjusting the contrast curve (these are not uniform values).  So, to end up with an error-free 10 bit master (say, for UHD broadcast) you need to encode 12 bits of precision into your HDR intermediate.
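If you want to convince yourself of the bit-depth math without wading through the algebra, a rough Python sketch like this counts how many of the 1,024 possible 10 bit gamma 2.4 output codes are actually reachable from a 10 bit versus a 12 bit PQ source.  It assumes full-range codes and 100 nits as SDR reference white, both simplifications:

def pq_to_nits(v):
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    p = v ** (1 / m2)
    return 10000.0 * (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1 / m1)

def reachable_sdr_codes(source_bits):
    levels = 2 ** source_bits
    codes = set()
    for i in range(levels):
        nits = pq_to_nits(i / (levels - 1))
        if nits > 100.0:
            break                              # only the SDR-visible range matters here
        codes.add(round((nits / 100.0) ** (1 / 2.4) * 1023))
    return len(codes)

print(reachable_sdr_codes(10))   # noticeably fewer than 1,024 -> banding risk
print(reachable_sdr_codes(12))   # nearly the full 10 bit output range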

ProRes 4444 / 4444 (XQ), DNxHR 444, 12 bit DPX, Cineform RGB 12 bit, 16 bit TIFFs, or OpenEXR (Half Precision) are all suitable intermediate codecs,** though it’s important to double check all of your downstream applications to make sure that whichever you pick will work later.  Similarly, any of these codecs should be suitable for mastering, with the possibility of creating a cross converted grade from the master later.

I just want to note before anyone actually asks: intermediate and master files encapsulating HDR video are still reeditable after rendering - they can be assembled, cut, combined, etc just like regular video files.  You don’t need to be using an HDR display to do that either - they just look a little flatter on a regular display (except if you’re using HLG).  So long as you don’t pass them through a process that drops the precision of the encoded video, you should be fine to work with them in other applications as usual, though you may want to return to DaVinci to add the necessary metadata to whatever your final sequence ends up being.


Metadata

After you’ve made the master, it’s easy to assume you’re done.  But HDR specifications call for display-referenced metadata during encoding of the final deliverable stream, so it’s important to record this metadata at the time of creation if you aren’t handling the final encode yourself.  Unfortunately, none of the current video file formats has a standardized place to record this metadata.

Your options are fairly limited; the simplest solution is to include a simple text file with a list of attribute:value pairs.

Text file containing necessary key : value pairs for an HDR master file that doesn't provide embedded metadata.

What metadata should you include?  It’s a good idea to include everything that you’d need to include in the standard VUI for HDR transmission:

  • Color Primaries

  • Transfer Matrix (for chroma subsampled video)

  • Transfer Characteristics

  • MaxCLL

  • MaxFALL

  • Master Display

When you’re creating distribution files, each of these values needs to be properly set to flag a stream as HDR video to the decoding display.  It’s possible to guess many of these (color space, transfer matrix, etc.) if you’ve been provided with a master file without metadata, but it’s much easier to record and provide this metadata at the time of creation so that no matter how far down the line you come back to the master, none of the information is lost.
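Generating the sidecar is trivial to script; this Python sketch just writes the list above as plain key : value pairs.  The values shown are the ones from the x265 example later in this post - substitute your own project’s numbers, and name the file whatever fits your naming scheme:

hdr_metadata = {
    "Color Primaries": "BT.2020",
    "Transfer Characteristics": "SMPTE ST.2084 (PQ)",
    "Transfer Matrix": "BT.2020 non-constant luminance",
    "MaxCLL": "1000 nits",
    "MaxFALL": "180 nits",
    "Master Display": "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)",
}

with open("project_master_hdr_metadata.txt", "w") as f:
    for key, value in hdr_metadata.items():
        f.write(f"{key}: {value}\n")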


Distributing HDR

If you’ve made it this far through the HDR creation process, there should really only be one major question remaining: how do we encode HDR video in a way that consumers can see it?

First, the bad news.  There’s no standardization for HDR in digital cinema yet.  So if your intention is a theatrical HDR delivery, you’ll probably need to work with Dolby.  At the moment, they’re the only ones with the actual installations that can display HDR, and they have specialists who will handle that step for you.  For most people, what we want to know is how to get an HDR capable television to display the video file properly.

This is where things get more tricky.

I don’t say that because it’s a necessarily complicated process, but only because there are no ‘drop-in’ solutions generally available to do it (other than YouTube, very soon).

There are only three codecs that can, by specification, actually be used for distributing HDR video - HEVC, VP9, and AV1 (AV1 is the successor to VP9) - and within these, only specific operational modes support HDR.  Of these three, the only real option at the moment is HEVC, simply because HDR televisions support hardware-based 10 bit HEVC decoding - it’s the same hardware decoder needed for the video stream of UHD broadcasts.

HEVC encoding support is still rather limited, and finding an application with an encoder that supports all of the optional features needed to encode HDR is still difficult.  Adobe Media Encoder, for instance, supports 10 bit HEVC rendering, but doesn’t allow for the embedding of VUI metadata, which means that the file won’t trigger the right mode in the end-viewer’s televisions.

Unfortunately, there’s only one encoder freely available that gives you access to all of the options you need for HDR video encoding: x265 through FFmpeg.

If you’re not comfortable using FFmpeg through a command line, I seriously recommend downloading Hybrid (http://www.selur.de), which is one of the best, if not the best, FFmpeg frontend I’ve found.

Here are the settings that I typically use for encoding HEVC using FFmpeg for a file graded in SMPTE ST.2084 HDR using BT.2020 primaries on our BVM-X300, at a UHD resolution with a frame rate of 59.94fps:

Profile: Main 10
Tier: Main
Bit Depth: 10-bit
Encoding Mode: Average Bitrate (1-Pass)
Target Bitrate: 18,000 - 50,000 kbps
GOP: Closed
Primaries: BT.2020
Matrix: BT.2020nc
Transfer Characteristics: SMPTE ST.2084
MaxCLL: 1000 nits
MaxFALL: 180 nits
Master Display: G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)
Repeat Headers: True
Signaling: HRD, AUD, SEI Info

I’ve only listed the settings that are different from the default x265 settings, so let me run through what they do, and why I use these values.

First, x265 needs to output a 10-bit stream in order to be compliant with UHD broadcast, SMPTE ST.2084, ARIB STD-B67 or HDR10 standards.  To trigger that mode, I set the Profile to Main 10 and the Bit Depth to 10-bit.  Unless you’re setting a really high bit rate, or using 8K video, you shouldn’t need a higher Tier than Main.

Next, I target 18 - 50 mbps as an average bitrate, with a 1 pass encoding scheme.  If you can tolerate a little flexibility in the final bitrate, I prefer using this mode, simply because it balances render time with quality, without padding the final result.  If you need broadcast compliant UHD, you’ll need to drop the target bitrate from 18 to 15 mbps, to leave enough headroom on the 20 mbps available bandwidth for audio programs, closed captions, etc.

x265 Main Compression Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

However, I’ve found that 15mbps does introduce some artifacts, in most cases, when using high frame rates such as 50 or 60p.  18 seems to be about the most that many television decoders can handle seamlessly, though individual manufacturers vary and it does depend significantly on the content you’re transmitting.  Between 30 and 50 mbps you end up with a near-lossless encode, so if you happen to know the final display system can handle it, pushing the bitrate up can give you better results.  Above 50 mbps, there are no perceptual benefits to raising the bitrate.

A closed GOP is useful for random seeks and to minimize the amount of memory used by the decoder.  By default, x265 uses a GOP of at most 250 frames, so reference frames can end up being stored for quite some time when using an open GOP; it’s better just to keep it closed.

x265 Frames Compression Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

Next we add the necessary HDR metadata into the Video Usability Information (VUI).  This is the metadata required by HDR10, and records information about your mastering settings, including color space, which HDR EOTF you’re using, the MaxCLL of the encoded video, the MaxFALL of the encoded video (if you’ve kept your MaxFALL below your display’s peak, you can estimate this value using the display’s MaxFALL), and the SMPTE ST.2086 metadata that records the primaries, white point, and brightness range of the display itself.

x265 Video Usability Information Compression Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

This metadata is embedded into the headers of the video stream itself, so even if you change containers the information will still be there.  To make sure that the metadata is stored at regular intervals, and to enable smoother random access to the video stream, the last step is to turn on the option for repeating the headers and to include HRD, AUD, and SEI Info.

x265 Stream Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

The HEVC stream can be wrapped in either a .mp4 or a .ts container; both are valid MPEG containers and should work properly on HDR televisions.  Be aware that it can take a while to get your settings right on the encode; if you’re using Hybrid you may need to tweak some of the settings to get 10-bit HEVC to encode without crashing (I flag on “Prefer FFmpeg” and “Use gpu for decoding” to get it to run stably) - don’t leave testing to the last minute!
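For those comfortable skipping the GUI, here’s a hedged sketch of how the settings above translate into an FFmpeg/x265 invocation, assembled from Python for readability.  The file names, bitrate, and GOP handling are placeholders, and exact x265 parameter names can shift between x265 and FFmpeg versions, so verify them against your build before relying on this:

import subprocess

x265_params = ":".join([
    "colorprim=bt2020",                       # BT.2020 primaries
    "transfer=smpte2084",                     # PQ / SMPTE ST.2084 EOTF
    "colormatrix=bt2020nc",                   # BT.2020 non-constant luminance matrix
    "max-cll=1000,180",                       # MaxCLL,MaxFALL in nits
    "master-display=G(8500,39850)B(6550,2300)R(35400,14600)"
    "WP(15635,16450)L(10000000,1)",           # SMPTE ST.2086 mastering display metadata
    "open-gop=0",                             # closed GOP for cleaner random access
    "repeat-headers=1",                       # repeat headers so the metadata recurs in-stream
    "hrd=1",
    "aud=1",
])

cmd = [
    "ffmpeg", "-i", "master_hdr.mov",         # placeholder source file
    "-c:v", "libx265",
    "-profile:v", "main10",                   # Main 10 profile
    "-pix_fmt", "yuv420p10le",                # 10 bit 4:2:0
    "-b:v", "18M",                            # 1-pass average bitrate target
    "-x265-params", x265_params,
    "-c:a", "copy",
    "hdr10_delivery.mp4",
]
subprocess.run(cmd, check=True)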


Grading, mastering, and delivering HDR are the last pieces you need to understand to create excellent quality HDR video.  We hope that the information in this guide to HDR video will help you to be confident in working in this new and exciting video format.

HDR video is the future of video.  It’s time to get comfortable with it, because it’s not going anywhere.  The sooner you get on board and start working with the medium, the more prepared you’ll be for the forthcoming time when HDR video becomes the de facto video standard.

Written by Samuel Bilodeau, Head of Technology and Post Production


Endnotes


*The rationale behind the technical requirements will become clear over the course of the article.  I would recommend that you look at the documentation for the application you use to make sure it meets the same minimum technical requirements as DaVinci Resolve when grading in HDR.  Most major color grading programs meet most or all of these technical criteria, and it’s always better to grade in the program you know than in the program you don’t.


However, if you are looking to pick a program right off the bat, I’d recommend DaVinci Resolve Studio, primarily since you can start on the free version of Resolve to learn the application and toolset before ever having to spend a dime.


** You should always test that these codecs actually perform as expected with HDR in your workflow, even if you’ve used them for other applications in the past.  I’ve run into an issue where certain applications decode the codecs in different ways that have little effect in SDR, but create larger shifts and stepping in HDR.

HDR Video Part 3: HDR Video Terms Explained

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 3: HDR Video Terms Explained.

In HDR Video Part 1 we explored what HDR video is, and what makes it different from traditional video.  In Part 2, we looked at the hardware you need to view HDR video in a professional environment.  Since every new technology comes with a new set of vocabulary, here in Part 3, we’re going to look at all of the new terms that you’ll need to know when working with HDR video.  These fall into three main categories: key terms, standards, and metadata.


Key Terms

HDR / HDR Video - High Dynamic Range Video - Any video signal or recording using one of the new transfer functions (PQ or HLG) to capture, transmit, or display a dynamic range greater than the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference.

The term can also be used as a compatibility indicator, to describe any camera capable of capturing and recording a signal this way, or a display that either exhibits the extended dynamic range natively or is capable of automatically detecting an HDR video signal and renormalizing the footage for its more limited or traditional range.


SDR / SDR Video - Standard Dynamic Range Video - Any video signal or recording using the traditional transfer functions to capture, transmit, or display a dynamic range limited to the traditional CRT gamma or BT.2886 Gamma 2.4 transfer functions at 100-120 nits reference. SDR video is fully compatible with all pre-existing video technologies.


nit - A unit of brightness density, or luminance. It’s the colloquial term for the SI units of candelas per square meter (1 nit = 1 cd/m2). It directly converts with the United States customary unit of foot-lamberts (1 fl = 1 cd/foot2), with 1 fl = 3.426 nits = 3.426 cd/m2.

Note that the peak nits / foot-lamberts value of a projector is often lower than that of a display, even in HDR video: because a projected image covers more area and the image is viewed in a darker environment than consumer’s homes, the same psychological and physiological responses exist at lower light levels.

For instance, a typical digital cinema screen will have a maximum brightness of 14fl or 48 cd/m2 vs. the display average of 80-120nits for reference and 300 for LCDs and Plasmas in the home. HDR cinema actual light output ranges in theaters are adjusted accordingly, since 1000 cd/m2 on a theater’s 30 foot screen is perceived to be far brighter than on a 65” flat screen.


EOTF - Electro-Optical Transfer Function - A mathematical equation or set of instructions that translate voltages or digital values into brightness values. It is the opposite of the Optical-Electro Transfer Function, or OETF, that defines how to translate brightness levels into voltages or digital values.

Traditionally, the OETF and EOTF were incidental to the behavior of the cathode ray tube, which could be approximated by a 0-1 exponential curve with a power value (gamma) of 2.4. Now they are defined values like ‘Linear”, “Gamma 2.4” or any of the various LOG formats. OETFs are used at the acquisition end of the video pipeline (by the camera) to convert brightness values into voltages/digital values, and EOTFs are used by displays to translate voltages/digital values into brightness values for each pixel.


PQ - Perceptual Quantization - Name of the EOTF curve developed by Dolby and standardized in SMPTE ST.2084, designed to allocate bits as efficiently as possible with respect to how the human vision perceives changes in light levels.

Perceptual Quantization (PQ) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

Dolby’s tests established the Barten Threshold (also called the Barten Limit or the Barten Ramp), the point at what the difference in light levels between two values does that difference become visible.

PQ is designed that when operating at 12 bits per channel, the stepping between single digital values is always below the Barten threshold, for the whole range from 0.0001 to 10,000 nits, without being so far below that threshold that the resolution between bits is wasted. At 10 bits per channel, the PQ function is just slightly above the Barten threshold, where in some (idealized) circumstances stepping may be visible, but in most cases should be unnoticeable.

Barten Thresholds for 10 bit and 12 bit Rec. 1886 and PQ curves. Source

For comparison, current log formats waste bits on the low end (making them suitable for acquisition to preserve details in the darks, but not transmission and exhibition), while the current standard gamma functions waste bits on the high end, while creating stepping in the darks.

HDR systems using PQ curves are not directly backwards compatible with standard dynamic range video.
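For reference, the ST.2084 PQ EOTF can be expressed in a few lines of Python. This is a sketch of the published formula, mapping a normalized 0-1 code value to absolute luminance in nits (production code would also clamp inputs and handle arrays):

```python
# SMPTE ST.2084 (PQ) EOTF constants
M1 = 2610 / 16384          # ≈ 0.15930
M2 = 2523 / 4096 * 128     # ≈ 78.84375
C1 = 3424 / 4096           # ≈ 0.83594
C2 = 2413 / 4096 * 32      # ≈ 18.85156
C3 = 2392 / 4096 * 32      # ≈ 18.68750

def pq_eotf(signal):
    """Map a normalized PQ code value (0-1) to display luminance in nits (0 to 10,000)."""
    p = signal ** (1 / M2)
    return 10000 * (max(p - C1, 0) / (C2 - C3 * p)) ** (1 / M1)

def pq_inverse_eotf(nits):
    """Map display luminance in nits back to a normalized PQ code value."""
    y = (nits / 10000) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2

print(pq_eotf(1.0))          # 10000.0 nits at full code value
print(pq_inverse_eotf(100))  # ≈ 0.508 — SDR reference white lands near mid-signal
```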


HLG - Hybrid Log Gamma - A competing EOTF curve to PQ / SMPTE ST.2084 designed by the BBC and NHK to preserve a small amount of backwards compatibility.

Hybrid Log Gamma (HLG) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

HLG vs. SDR gamma curve with and without knees. Source

On this curve, the first 50% of the signal range follows the output light levels of standard Gamma 2.4, while the top 50% diverges steeply along a log curve, covering the brightness range from about 100 to 5000 nits. As with PQ, 10 bits per channel is the minimum permitted.

HLG does not expand the range of the darks the way the PQ curve does, and as an unfortunate side effect of the backwards compatibility, coupled with the MaxFALL limits necessitated by current HDR display technology, whites can appear grey when viewed in standard gamma 2.4, especially when compared to footage natively graded in gamma 2.4.
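For comparison, the HLG OETF from ARIB STD-B67 / BT.2100 can also be sketched in Python; the lower half of the signal is a simple square-root (gamma-like) segment, and the upper half is logarithmic:

```python
import math

# HLG OETF constants (ARIB STD-B67 / BT.2100)
A = 0.17883277
B = 1 - 4 * A                  # ≈ 0.28467
C = 0.5 - A * math.log(4 * A)  # ≈ 0.55991

def hlg_oetf(light):
    """Map normalized scene-linear light (0-1) to an HLG signal value (0-1)."""
    if light <= 1 / 12:
        return math.sqrt(3 * light)              # gamma-like, backwards-compatible segment
    return A * math.log(12 * light - B) + C      # logarithmic segment for the highlights

print(hlg_oetf(1 / 12))  # 0.5 — the two segments meet at half signal
print(hlg_oetf(1.0))     # ≈ 1.0 — full scene light maps to full signal
```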


Standards

SMPTE ST.2084 - The first official standardization of an HDR video transfer function by a standards body, and, at the moment (October 2016), the most widely implemented. SMPTE ST.2084 officially defines the PQ EOTF curve for translating 10 bit or 12 bit per channel digital values into a brightness range of 0.0001 to 10,000 nits. SMPTE ST.2084 provides the basis for the HDR 10 Media Profile and Dolby Vision implementation standards.

This is the transfer function to select in HEVC encoding to signal a PQ HDR curve.


ARIB STD-B67 - Standardized implementation of Hybrid Log Gamma by the Association of Radio Industries and Businesses. Defines the use of the HLG curve, with 10 or 12 bits per channel color and the same color primaries as BT.2020 color space.

This is the transfer function to select in HEVC encoding to signal an HLG HDR curve.


ITU-R BT.2100 - ITU-R Recommendation BT.2100 - The ITU’s standardization of HDR for television broadcast. Ratified in 2016, this document is the HDR equivalent of ITU-R Recommendation BT.2020 (Rec.2020 / BT.2020). Compared with BT.2020, BT.2100 includes the FHD (1920x1080) frame size in addition to UHD and FUHD, and defines two acceptable transfer functions (PQ and HLG) for HDR broadcast instead of the single transfer function (BT.1886 equivalent) found in BT.2020.

BT.2100 uses the same color primaries and the same RGB to YCbCr signal format transform as BT.2020, and includes the same 10 or 12 bit per channel options as BT.2020, although BT.2100 also permits full range code values in 10 or 12 bits where BT.2020 is limited to the traditional legal range.

BT.2100 also includes considerations for a chroma subsampling methodology based on the LMS color space (human visual system tristimulus values), called ICTCP, and a transform for ‘gamma weighting’ (in the sense of the PQ and HLG equivalent of gamma weighting) the LMS response as L’M’S’.


HDR 10 Media Profile - The Consumer Technology Association (CTA)’s official HDR video standard for use in HDR televisions. HDR 10 requires the use of the SMPTE ST.2084 EOTF, BT.2020 color space, 10 bits per channel, 4:2:0 chroma subsampling, and the inclusion of SMPTE ST.2086 metadata and the associated MaxCLL and MaxFALL metadata values.

HDR 10 Media Profile defines the signal a television must be able to decode in order to use the term “HDR compatibility” in its marketing.

Note that “HDR compatibility” does not necessarily mean the display can reproduce the higher dynamic range; it only means the television can decode footage in the HDR 10 specification and renormalize it for whatever the dynamic range and color space of the display happen to be.


Dolby Vision - Dolby’s proprietary implementation of the PQ curve, for theatrical setups and home devices. Dolby Vision supports both the BT.2020 and the DCI-P3 color space, at 10 and 12 bits per channel, for home and theater, respectively.

The distinguishing feature of Dolby Vision is the inclusion of shot-by-shot transform metadata that adapts the PQ graded footage into a limited range gamma 2.4 or gamma 2.6 output for SDR displays and projectors. The colorist grades the film in the target HDR space, and then runs a second adaptation pass to adapt the HDR grade into SDR, and the transform is saved into the rendered HDR output files as metadata. This allows for a level of backwards compatibility with HDR transmitted footage, while still being able to make the most of the SDR and the HDR ranges.

Because Dolby Vision is a proprietary format, it requires a license issued by Dolby and the use of qualified hardware, which at the moment (October 2016) means the Dolby PRM-4220, the Sony BVM-X300, or the Canon DP-V2420 displays.


Metadata

MaxCLL Metadata - Maximum Content Light Level - An integer metadata value defining the maximum light level, in nits, of any single pixel within an encoded HDR video stream or file. MaxCLL should be measured during or after mastering. However, if you keep your color grade within your display’s HDR range and add a hard clip for light levels beyond the display’s maximum value, you can use the display’s peak brightness as your MaxCLL metadata value.


MaxFALL Metadata - Maximum Frame Average Light Level - An integer metadata value defining the maximum average light level, in nits, of any single frame within an encoded HDR video stream or file. Each frame’s average is calculated by converting every pixel’s digital value into its corresponding nit value and averaging those values across the frame; MaxFALL is the highest frame average in the content.

MaxFALL is an important value to consider in mastering and color grading, and is usually lower than the MaxCLL value. The two values combined define how bright any individual pixel within a frame can be, and how bright the frame as a whole can be.

Displays are limited differently on both of those values, though typically only the peak (single pixel) brightness of a display is reported. As pixels get brighter and approach their peak output, they draw more power and heat up. With current technology levels, no display can push all of its pixels into the maximum HDR brightness level at the same time - the power draw would be extremely high, and the heat generated would severely damage the display.

As a result, displays will abruptly notch down the overall image brightness when the frame average brightness exceeds the rated MaxFALL, to keep the image under the safe average brightness level, regardless of what the peak brightness of the display or encoded image stream may be.

For example, while the BVM-X300 has a peak value of 1000 nits for any given pixel (MaxCLL = 1000), on average, the frame brightness cannot exceed about 180 nits (MaxFALL = 180). The MaxCLL and MaxFALL metadata included in the HDR 10 media profile allows consumer displays to adjust the entire stream’s brightness to match their own display limits.
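To make the measurement concrete, here’s a minimal Python sketch that walks frames already decoded into nits (for example, via a PQ EOTF) and tracks the brightest single pixel and the brightest frame average; the function and argument names are hypothetical, for illustration only:

```python
import numpy as np

def content_light_levels(frames_nits):
    """Return (MaxCLL, MaxFALL) from an iterable of frames.

    Each frame is an array of per-pixel luminance values already decoded into nits.
    MaxCLL tracks the brightest single pixel anywhere in the content; MaxFALL tracks
    the brightest per-frame average.
    """
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames_nits:
        frame = np.asarray(frame, dtype=np.float64)
        max_cll = max(max_cll, float(frame.max()))     # brightest single pixel so far
        max_fall = max(max_fall, float(frame.mean()))  # brightest frame average so far
    return round(max_cll), round(max_fall)
```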


SMPTE ST.2086 Metadata - Metadata describing the display used to grade the HDR content. SMPTE ST.2086 includes information on six values: the three RGB primaries used, the white point used, and the display’s maximum and minimum light levels.

The RGB primaries and the white point values are recorded as ½ of their (X,Y) values from the CIE XYZ 1931 chromaticity standard, expressed as the integer portion of the first five significant digits, without a decimal place. Or, in other words:

f(XPrimary) = 100,000 × XPrimary ÷ 2

f(YPrimary) = 100,000 × YPrimary ÷ 2.

For example, the (X,Y) value of DCI-P3’s ‘red’ primary is (0.68, 0.32) in CIE XYZ; in SMPTE ST.2086 terms it’s recorded as

R(34000,16000)

because

for R(0.68,0.32):

f(XR) = 100,000 × 0.68 ÷ 2 = 34,000

f(YR) = 100,000 × 0.32 ÷ 2 = 16,000

Maximum and minimum luminance values are recorded as nits × 10,000, so that they too end up as positive integers. For instance, a display like the Sony BVM-X300 with a range from 0.0001 to 1000 nits would record its luminance as

L(10000000,1)

The full ST.2086 Metadata is ordered Green, Blue, Red, White Point, Luminance with the values as

G(XG,YG)B(XB,YB)R(XR,YR)WP(XWP,YWP)L(max,min)

all strung together, and without spaces. For instance, the ST.2086 for a DCI-P3 display with a maximum luminance of 1000 nits, a minimum of 0.0001 nits, and a D65 white point would be:

G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)

while a display like the Sony BVM-X300, using BT.2020 primaries, with a white point of D65 and the same max and min brightness would be:

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)

In an ideal situation, it would be best to use a colorimeter and measure the display’s native R-G-B and white point values; in practice, however, the RGB primaries and white point of the standard the mastering display conforms to are sufficient to communicate the mastering information to the end display.
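Putting the scaling rules above together, a small Python sketch can build the ST.2086 string from CIE (x, y) chromaticities and nit values; `st2086_string` is a hypothetical helper for illustration, not part of any standard API:

```python
def st2086_string(primaries, white_point, max_nits, min_nits):
    """Build an ST.2086 mastering-display string from CIE (x, y) chromaticities and nits.

    `primaries` is a dict with 'R', 'G', 'B' keys mapping to (x, y) tuples.
    """
    def chroma(xy):
        # chromaticity coordinates are stored as coordinate × 50,000 (i.e. 100,000 × xy ÷ 2)
        return tuple(round(v * 50000) for v in xy)

    g, b, r = chroma(primaries['G']), chroma(primaries['B']), chroma(primaries['R'])
    wp = chroma(white_point)
    # luminance is stored as nits × 10,000
    lmax, lmin = round(max_nits * 10000), round(min_nits * 10000)
    return (f"G({g[0]},{g[1]})B({b[0]},{b[1]})R({r[0]},{r[1]})"
            f"WP({wp[0]},{wp[1]})L({lmax},{lmin})")

# DCI-P3 primaries, D65 white point, 1000 to 0.0001 nit mastering range:
print(st2086_string({'R': (0.680, 0.320), 'G': (0.265, 0.690), 'B': (0.150, 0.060)},
                    (0.3127, 0.3290), 1000, 0.0001))
# -> G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)
```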


That should be a good overview of the new terms that HDR video has (so far) introduced into the video technology vocabulary, and a good starting point for diving deeper into learning about and using HDR video on your own at a professional level.

In Part 4 of our series we’re going to take the theory of HDR video and start talking about the practice, looking specifically at how to shoot with HDR in mind.

Written by Samuel Bilodeau, Head of Technology and Post Production


HDR Video Part 1: What is HDR Video?

It’s October 2016, and here at Mystery Box we’ve been working in HDR video for a little over a year.

While it’s easier today to find out information about the new standard than it was when I first started reading the research last year, it’s still not always clear what it is and how it works.  So, to kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 1: What is HDR Video?

HDR video is as much of a revolution and leap forward as the jump from analog standard definition to digital 4K.

Or, to put it far less clinically, it’s mind-blowingly, awesomesauce, revolutionarily, incredible!  If it doesn’t get you excited, I’m not sure why you’re reading this…

So what is it about HDR video that makes it so special, so much better than what we’ve been doing?  That’s what we’re going to dive into here.


HDR Video vs. HDR Photography

If you’re a camera guy or even an image guy, you’re probably familiar with HDR photography.  And if you’re thinking “okay, what’s the big deal, we’ve had HDR for years”, think again.  HDR video is completely unrelated to HDR photography, except for the ‘higher dynamic range’ part.

In general, any high dynamic range technique seeks to capture or display more levels of brightness within a scene, that is, increase the overall dynamic range.  It’s kind of a ‘duh’ statement, but let’s go with it.

In photography, this usually means using multiple exposures at different exposure values (EVs), and blending the results into a single final image.  The catch, of course, has always been that regardless of how many stops of light you capture with your camera or HDR technique, you’re still limited by the same 256 levels of brightness offered by 8 bit JPEG compression and computer/television displays, or the slightly bigger, but still limited set of tonality offered by inks for print.

So, most HDR photography relies on creating regions of local contrast throughout the image, blending in the different exposure levels to preserve the details in the darks and the lights:

Photograph with standard contrast vs. the same with local contrast

While the results are often beautiful, they are, at their core, unnatural or surreal.

HDR Video is Completely Different

Instead of trying to compress the natural dynamic range of a scene into a very limited dynamic range for display, HDR video expands the dynamic range of the display itself by increasing the average and peak display brightnesses (measured in nits), and by increasing the overall image bit depth from 8 bits to at least 10 bits per channel, or from 256 brightness levels & 16.7 million colors to at least 1024 brightness levels & 1.07 billion colors.

Standard Video / Photography Range vs. HDR Photography vs. HDR Video Ranges

The change of the display light level allows for extended ranges of tonalities through the darks and the lights, so that the final displayed image itself is a more natural rendering of a scene, one that’s able to match the overall dynamic range of today’s digital cinema and film-sourced cameras. And perhaps more importantly, when fully implemented, HDR video will almost completely match the dynamic range of the human eye itself.

How big of a deal is it?  I can’t describe it better than my younger brother did the first time I showed him HDR video:

 

“I want to say that it’s like you’re looking through a window into another world, except that when you look through a window, it’s not as crisp, or as clean, or as clear as this”.

 

First Impressions to HDR Video


How did we get here?

So if HDR video is so much better than what we’ve been using so far, why haven’t we been using it all along?

And now, for a history lesson (it’s interesting; but it’s not essential to know, so skip down if you don’t care).

Cathode Ray Tubes as scientific apparatus and ‘display’ devices have been around in some form or another since the late 1880s, but the first CRT camera wasn’t invented until the late 1920s.  Early cameras were big, with low resolutions; televisions were grainy, noisy, and low fidelity.

Things changed quickly in the early years of television. As more companies jumped on board the CRT television bandwagon, each created slightly different, and incompatible, television systems in an effort to avoid patent rights infringement.  These different systems, with different signal types, meant that home television sets had to match the cameras used by the broadcaster, i.e., they had to be made by the same company.  As a result, the first broadcaster in an area created a local monopoly for the equipment manufacturer they sourced their first cameras from, and consumers had no choice.

Foreseeing a large problem when more people started buying televisions sets, and more broadcasters wanted to enter an area, the United States government stepped in and said that the diversity of systems wouldn’t fly - all television broadcasts and television sets had to be compatible.  To that end they created a new governing body, the National Television System Committee, or NTSC, which went on to define the first national television standard in 1941.

We’ve had to deal with the outcomes of standardization, good and bad, ever since.

The good, obviously, has been that we don’t have to buy a different television for every channel we want to watch, or every part of the country we want to live in (though transnationals are often still out of luck).  The bad is that every evolution of the standard since 1941 has required backwards compatibility: today’s digital broadcast standards, and computer display standards too, are still limited in part by what CRTs could do in the 1940s and 50s.

Don’t believe me?  Even ignoring the NTSC 1/1.001 frame rate modifier, there’s still a heavy influence: let’s look at the list:

  1. Color Space: The YIQ color space for NTSC and the YUV color space used in both PAL and SECAM are both based on the colors that can be produced by the short glow phosphors, which coat the inside of CRT screens and form the light and color producing element of the CRT.  In the transition to digital, YIQ and YUV formed the basis for Rec. 601 color space (SD Digital), which in turn is the basis for Rec. 709 (HD Digital) color space (Rec. 709 uses almost the same primaries as Rec. 601).

    And just in case your computer feels left out, the same color primaries are used in the sRGB display standard too, because all of these color spaces were display referenced, and they were all built on the same CRT technology.  Because up until the early 2000s, CRTs were THE way of displaying images electronically - LCDs were low contrast, plasma displays were expensive, and neither LEDs nor DLPs had come into their own.
     

  2. Transfer Function: The transfer function (also called the gamma curve) used in SD and HD is also based on the CRT’s natural light-to-electrical and electrical-to-light response.  The CRT camera captured images with a light-to-voltage response curve of approximately gamma 1/2.2, while the CRT display recreated images with a voltage-to-light response curve of approximately gamma 2.4.  Together, these values formed the standard approximate system gamma of 1.2, and form the basis for the current reference display gamma standard of 2.4, found in ITU-R Recommendation BT.1886.
     

  3. Brightness Limits: Lastly, and probably most frustratingly, color accurate CRT displays require limited brightness to maintain their color accuracy. Depending on the actual phosphors used for primaries, that max-brightness value typically lands in the 80-120 nit range.  And consumer CRT displays, while bigger, brighter, and less color accurate, still only reach peak brightness levels around 200 nits.  For comparison, the brightness levels found on different outdoor surfaces during a sunny day land in the 5,000-14,000 nit range (or more!).

    This large brightness disparity between reference and consumer display levels has been accentuated in recent years with the replacement of CRTs with LCD, Plasma and OLED displays, which can easily push 300-500 nits peak brightness.  Those brightness levels skew the overall look of images graded at reference, while being very intolerant of changes in ambient light conditions.  In short this means that with the current standards, consumers rarely have the opportunity to see content in their homes as filmmakers intended.

So, because of the legacy cathode ray tube (a dead technology), we’re stuck with a set of legacy standards that limit how we can deliver images to consumers.  But because CRTs are a dead technology, we now have an opportunity where we can choose to either be shackled by the 1950s for the rest of time, or to say “enough is enough,” and use something better.  Something forward thinking.  Something our current technology can’t even match 100% yet.  Something like, HDR video.


The HDR Way

At the moment, there are two different categories and multiple standards covering HDR video, including CTA’s HDR 10 Media Profile, Dolby’s Dolby Vision, and the BBC’s Hybrid Log Gamma.  And naturally, they all do things just a little differently.  I’ll cover their differences in depth in Part 3: HDR Video Terms Explained, but for now I’m going to lump them all together and just focus on the common aspects of all HDR video, and what makes it different than video from the past.

There are four main things required to call something HDR video: ITU-R Recommendation BT.2020 or DCI-P3 color space, a high dynamic range transfer function, 10 bits per channel transmission and display values, and transmitted metadata.

BT.709, DCI-P3, and BT.2020 on CIE XYZ 1931

1. Color Space: For the most part, HDR video is seen by many as an extension of the existing BT.2020 UHD/FUHD and DCI specifications, and as such uses either the wider BT.2020 color gamut (BT.2020 is the 4K/8K replacement for BT.709/Rec.709 HD broadcast standards), or the more limited, but still wide, DCI-P3 gamut.

BT.2020 uses pure wavelength primaries, instead of primary values based on the light emissions of CRT phosphors or any other material.  The catch, of course, is that we can’t fully show these on a desktop display (yet), and only the most recent laser projectors can cover the whole color range. But ultimately, the breadth of the color space covers as many of the visible colors as is possible with three real primaries*, and includes all color values already available in Rec.709/sRGB and DCI-P3, as well as 100% of Adobe RGB and most printer spaces available with today’s pigments and dyes.

2. Transfer Function: Where HDR video diverges from standard BT.2020 and DCI specs is in its light-level-to-digital-value and digital-value-to-light-level relationships, called the OETF and EOTF respectively.  I’m going to go into more depth on OETFs and EOTFs at another time, but for now what we need to know is that the current relationship between light levels and digital values is a legacy of the cathode ray tube days, and approximates gamma 2.4.  Under this system, a full-white digital value of 235 translates to a light output of between 80 and 120 nits.

Extending this same curve into a higher dynamic range output proves problematic because of the non-linear response of the human eye: it would either cause severe stepping in the darks and lights, or it would require 14-16 bits per channel while wasting digital values in increments that can’t actually be seen.  And it still wouldn’t be backwards compatible, in which case, what’s the point?

So instead, HDR video uses one of two new transfer curves: the BBC’s Hybrid Log Gamma (HLG), standardized in ARIB STD-B67, which allows for output brightness levels from 0.01 nit up to around 5000 nits, and Dolby’s Perceptual Quantization (PQ) curve, standardized in SMPTE ST.2084, which allows for output brightness levels from 0.0001 nit up to 10,000 nits.

PQ is the result of direct research done by Dolby to measure the response of the human eye, and to create a curve where no value is wasted and there is no visible stepping between values.  The advantage of PQ is pretty clear, in terms of maximizing future output brightness (the best experimental single displays currently max out at 4000 nits; Dolby’s test apparatus ranged from 0.004 to 20,000 nits) and increasing the amount of detail captured in the darks.

HLG, on the other hand, provides a degree of backwards compatibility, matching the output levels of gamma 2.4 for the first 50% of the curve, and reserving the top 50% of the values for the higher light level output.  Generally, HLG content with a system gamma of 1.2 looks pretty close to standard dynamic range content, though its whites sometimes end up compressed and greyer than content mastered in SDR to begin with.

Footage graded in Rec. 709 and the same graded in HLG.

(Side note: I prefer grading in SMPTE ST.2084 because of the extended dynamic range through the blacks, and the smoother roll-off into the whites.)
 

3. Bit Depth: The new transfer curves accentuate a problem that’s been with video since the switch from analog to digital values: stepping.  As displays have gotten brighter, the difference between two code values (say, digital value of 25 and 26) is sometimes enough that we can see a clear distinguishing line between the two greys.  This is especially true when using a display whose maximum brightness is greater than reference standard, and is more common in the blacks than in the whites.

Both the BT.2020 and DCI standards already have requirements to decrease stepping by switching signal encoding and transmission from 8 bits per channel to 10 bits minimum (12 bits for DCI), allowing for at least a 4 times smoother gradient.  However, BT.2020 still permits 8 bit rendering at the display, which is what you’ll find on the vast majority of televisions and reference displays on the market today.

On the other hand, HDR video goes one step further and requires 10 bit rendering at the display panel itself; that is, each color sub pixel must be capable of between 876 and 1024 distinguishable light levels, in all operational brightness and contrast modes.

The reason that HDR requires a 10 bit panel while BT.2020 doesn’t is that our eyes are more susceptible to stepping in the value of a color or gradient than to stepping in its hue or saturation: the eye can easily make up for lower color fidelity (8 bits per channel in BT.2020 space) by filling in the gaps, but with an HDR curve the jump in light levels between two adjacent codes at 8 bits per channel is big enough that it’s clearly noticeable.
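To get a feel for the size of those jumps, here’s a rough, self-contained Python sketch that uses the SMPTE ST.2084 PQ EOTF to compare the luminance step between adjacent mid-range code values at different bit depths (exact numbers depend on where in the curve you look):

```python
def pq_eotf(signal):
    """SMPTE ST.2084 PQ EOTF: normalized code value (0-1) to luminance in nits."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    p = signal ** (1 / m2)
    return 10000 * (max(p - c1, 0) / (c2 - c3 * p)) ** (1 / m1)

for bits in (8, 10, 12):
    codes = 2 ** bits
    mid = codes // 2
    # luminance difference between two adjacent code values near the middle of the range
    step = pq_eotf((mid + 1) / (codes - 1)) - pq_eotf(mid / (codes - 1))
    print(f"{bits}-bit: ~{step:.2f} nit jump between adjacent mid-range code values")
```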

Comparison between gradients step sizes at 8 bit, 10 bit, and 12 bit precisions (contrast emphasized)

4. Metadata: The last thing that HDR video requires that standard BT.2020 doesn’t is metadata.  All forms of HDR video should include information about both the content and the mastering environment.  This includes which EOTF was used in the grade, the maximum and frame-average brightnesses of the content and display, and which RGB primaries were used.  Dolby Vision even includes metadata to define, shot by shot, how to translate the HDR values into the SDR range!

Consumer display manufacturers use this information to adapt content for their screens in real time, knowing when to clip or compress the highlights and darks (based on the capability of the screen it’s being shown on), and for the automatic selection of operational mode (switching from Rec. 709 to BT.2020, and in and out of HDR mode, without the end user ever having to change a setting).

 

So, in summary, what does HDR video do differently?  Wider color gamuts, new transfer function curves to allow for a much larger range of brightnesses, 10 bits per channel minimum requirement at the display to minimize stepping, and the transmission of metadata to communicate information about the content and its mastering environment to the end user.

All of which are essential, none of which are completely backwards compatible.


Yes, but what does it look like?

Unfortunately, the only way to really show you what HDR looks like is to tell you to go to a trade show or post house with footage to show, or buy a TV with HDR capabilities and stream some actual HDR content.  Because when you show HDR content on a normal display, it does not look right:

Images in SMPTE ST.2084 HDR Video formats do not appear normal when directly brought into Rec. 709 or sRGB Gamma 2.4 systems

You can get a little bit of a feel for it if I cut the brightness levels of a standard dynamic range image by half, and put it side-by-side with one that more closely follows the HDR range of brightnesses:

Normalized & Scaled SMPTE ST.2084 HDR Video vs Rec. 709 with Brightness Scaled

But that doesn’t capture what HDR video actually does.  I don’t quite know how to describe it - it’s powerful, beautiful, clear, real, present and multidimensional.  There’s an actual physiological and psychological response to the image that you don’t get with standard dynamic range footage - not simply an emotional response to the quality of the image, but the higher brightness levels actually trigger things in your eyes and brain that let you literally see it differently than anything you’ve seen before.

And once you start using it on a regular basis, nothing else seems quite as satisfactory, no other image quite as beautiful.  You end up with a feeling that everything else is just a little bit inadequate.  That’s why HDR will very rapidly become the new normal of future video.


So that's it for Part 1: What is HDR Video?  In Part 2 of our series on HDR video, we’re going to cover what you need to grade in HDR, and how can you cheat a bit to get a feel for the format by emulating its response curve on your existing reference hardware.

Written by Samuel Bilodeau, Head of Technology and Post Production


Endnotes:

* While ACES does cover the entire visible color spectrum, its primary RGB values are imaginary, which means that while it can code for all possible colors, there’s no way of building a piece of technology that actually uses the ACES RGB values as its primary display colors.  Or in other words, if you were to try to display ACES full-value RED, you couldn’t, because that color doesn’t exist.