Multi-Space Color Correction: The New Paradigm

Our last technical blog post talked about color management, including considerations for maintaining color accuracy through the post process by keeping displays & projectors calibrated and understanding how each application manages the colors you see.  We alluded to the fact that at face value it can seem pretty complicated to manage color between all of the different kinds of camera technologies and technical standards, especially when dealing with multiple delivery specifications.

With this post we want to go into more detail about the issue of color grading for multiple standards, and how the new paradigm for color correction simplifies the process and keeps your grades more futureproof.  We’re going to examine a few things today.  First, how do computers process colors; second, where the problems we’re trying to fix come from; third, what the solutions are; and lastly, how to implement the solutions and why.


Color Engines

In previous posts (here, here, and here) we’ve mentioned the importance of using a color space agnostic color engine when doing specific color correction tasks - something like DaVinci Resolve Studio.  As a quick review, a color space agnostic engine is like a glorified R-G-B or R-G-B-Y calculator: It takes a specific set of decoded R-G-B or R-G-B-Y data, applies a specific set of mathematical operations based on the user input, and outputs the new R-G-B or R-G-B-Y data.

Agnostic color engines don’t care about what the data is supposed to mean, instead simply producing the results of an operation.  It’s up to the user to know whether the results are right, or whether the operation has pushed values out of spec or created unwanted distortions.  This is a double-edged sword, since it places far more importance on user understanding to get things right, while being powerful enough to apply its corrections to any combination of custom situations.

As an example of how an agnostic engine works, let’s look at three of the simplest color correction operations: lift, gamma, and gain, operating strictly on the brightness (Y) component of the image.

Lift operates essentially as a global addition value: add or subtract a specific amount to each pixel’s value.  Because of the way that traditional EOTFs work and the way we perceive changes in brightness, lift tends to have the greatest effect on the darks: quickly raising or lowering the blacks while having a much smaller effect on the mids and lights.

Gain operates essentially as a global multiplication value: multiply or divide the value of each pixel by a specific amount.  Since the operation scales all tones within the image proportionally, every part of the image sees a similar relative increase or decrease in brightness, though once again, because of the EOTF considerations, it has the greatest visible effect in the brights.

Gamma operates as an exponential value adjustment, affecting the linearity of values between the brights and the darks.  Lowering the gamma value has the effect of pulling more of the middle values towards the darks, while raising the gamma value has the effect of pushing the middle values brighter.  Once again, it still affects the brights and the darks, but at a much lower rate.
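To make this concrete, here’s a minimal Python sketch of the three operations acting on normalized luminance values between 0.0 and 1.0.  Every engine implements the exact math a little differently, so treat this as an illustration of the idea rather than any particular application’s formulas:

import numpy as np

def lift_gamma_gain(y, lift=0.0, gamma=1.0, gain=1.0):
    # Operates on normalized luminance (Y) values in the 0.0-1.0 range.
    y = np.asarray(y, dtype=np.float64)
    y = y + lift                # lift: global addition, most visible in the darks
    y = y * gain                # gain: global multiplication, most visible in the brights
    y = np.clip(y, 0.0, 1.0)    # the engine doesn't care if values go out of range; clip for display
    return y ** (1.0 / gamma)   # gamma: exponent; values above 1.0 brighten the mids, below 1.0 darken them

# Lift the shadows slightly and brighten the mids a touch
print(lift_gamma_gain([0.0, 0.18, 0.5, 1.0], lift=0.02, gamma=1.1, gain=1.0))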

Notice that these operations don’t take into account what the data is supposed to mean.  And with the new HDR EOTFs, especially the Perceptual Quantization EOTF, you may find that very small adjustment values create extreme changes across the image, which is why I recommend adding a roll-off curve as the last adjustment in your HDR grading workflow.

The combination of lift, gamma, and gain allows the colorist to adjust the overall brightness and contrast of the image with fairly fine granularity and control.

Compare these functions of an agnostic engine to their equivalents in a color space dependent engine.  In a color space dependent engine you’re more likely to find only two adjustments for controlling brightness and contrast: brightness and contrast.

The same color transformation has different effects in different color spaces: here, the same hue adjustment curve applied to eight different color spaces.

The brightness and contrast controls tend to be far more color space dependent, since they’re designed to affect the image in a way that more evenly adjusts the brightness or contrast along the expected EOTFs.  For the end user, this offers a far simpler and often faster approach for minor corrections, at the expense of power, precision, and adaptability.  That hasn’t been too bad of a trade-off, so long as all digital video data operated in the same color space.

But adding support for new color spaces and EOTFs to a brightness and contrast operation requires rewriting the rules for how each of the new color spaces behaves as digital RGB values.  That takes time to get right, and is oftentimes not done at all.  As a result, color space dependent engines tend to adapt more slowly to emerging standards, and there’s no clear path for how to implement the upgrades.

Every color engine, whether we’re talking about a computer application or a chip found in a camera or display, makes assumptions about how to interpret the operations it's instructed to do.  Where the engines lie on the scale from fully color managed to completely color agnostic defines how the operations work, and what effect the ‘same’ color transformation has on the image.

The overall point here is that the same color transformations applied to different color spaces have different effects on the end image.  A hue rotation will accomplish something completely different in Rec. 709 than it will in Rec. 2020; standard gain affects HDR curves in ways that are somewhat unpredictable when compared with SDR curves.  Color engines can either try to compensate for this, or simply assume the user knows what he or she is doing.  And the more assumptions any single operation within an engine makes about the data, the more pronounced the differences if it’s applied to another color space.  These seemingly small differences can create massive problems in today’s color correction and color management workflows.


Understanding the Problem

With that background in mind, let’s explore where these problems come from.

Here’s something that may come as a shock if you haven’t dived into color management before: every camera ‘sees’ the world differently.  We’re not just talking about the effects of color temperature of light or the effects of the lenses (though those are important to keep in mind), but we’re talking about the camera sensors themselves.  All things being equal, different makes and models of camera will ‘see’ the same scene with different RGB values at the sensor.  In inexpensive cameras you may even see variation between individual cameras of the same make and model.

This isn’t a problem, it’s just how cameras work.  Variations in manufacturing process, decisions about which microfilters, microlenses, and OLPF to use, and the design of the sensor circuitry all play a part in changing the raw values the sensor sees.  But to keep things consistent, these unique camera RGB color values are almost always conformed to an output color standard using the camera’s image processor (or by the computer’s RAW interpreter) before you see them.

In the past, all video cameras conformed to analog video color spaces: NTSC/SMPTE-C, PAL, etc., and their early digital successors conformed to the digital equivalent standards: first Rec. 601 and then Rec. 709.

When it comes to conforming camera primaries to standard primaries, manufacturers had two choices: apply the look effects before or after the conforming step.  If you apply color transformations before the conforming step, you often have more information available for the change.  But by conforming first to the common color space, color correction operations would behave the same way between different camera makes and models.  

Most camera manufacturers took a hybrid approach, applying some transformations like gain and white balance before the conforming step, and then applying look effects after the conforming step.  And everything was golden, until the advent of digital cinema cameras.

Digital cinema class cameras started edging out film as the medium of choice for high quality television and feature film production a decade ago, and digitally shot productions now vastly outnumber film-first productions.  And here’s where we run into trouble, because digital cinema uses a different color space than digital video: DCI-P3.  Oh, and recently the video broadcast standards shifted to a much wider color space, Rec. 2020, to shake off the limiting shackles of the cathode ray tube.

Color space selection suddenly became an important part of the camera selection and workflow process, one that few people talked about.  Right from the get go the highest end cameras started offering multiple spaces that you could shoot and conform to, one of which was usually camera RGB.  But changing the conforming space means that any corrections or effects added to the image after conforming behaved differently than they did before, and many user generated looks would be color space dependent.

To fix this, many (but not all) digital cinema camera manufacturers moved the ‘look’ elements of their color processing to before the conforming step.  This way, regardless of which color space you, as the operator, choose, any looks you apply will have the same effect on the final image.

Which is fine in camera, and fine through post production.  Unless your color correction platform doesn’t understand what the primary color values mean, or can’t directly transform the values into your working space.  Then you need to create and add additional conforming elements as correction layers, which can increase the computational complexity and reduce the overall image quality.

Oh, and if you start working with multiple cameras with different look settings available, you can get into trouble almost instantly, since there isn’t usually a simple way of conforming all of them to your working space if it’s not Rec. 709.

Oh, and you may have to deliver to all of the different color spaces: Rec. 2020 for 4K television broadcast, DCI-P3 for your digital cinema delivery, Rec. 709 for HD Blu-ray and traditional broadcast, and Rec. 601 for DVDs.  And for sanity’s sake, let’s add HDR.

And don’t break the bank.


The Solution

What if there was a way to make sure that a) all of your looks would move simply between cameras, regardless of make and manufacturer, and b) you could color grade once and deliver in all formats simply, without needing to manage multiple grades?

There is a way to do it: create a new RGB color space that encompasses all possible color values, and do all of your color corrections there.

Here’s the block diagram:

Camera RGB -> Very Wide Gamut Working Space (Log or Linear) -> Color Correction / Looks -> Tone Map to Standard Space

By mapping all of the camera sensor values into the same log or linear space, with very wide RGB color primaries1, you can make sure that you have access to all of the image data captured by every camera and that all operations will have the same effect on all images.
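As a rough sketch of what that conforming step involves, here’s what it might look like in Python with numpy.  The log decode and the 3x3 matrix below are made-up placeholders, not any real camera’s published values; in practice you’d use the manufacturer’s documented transfer function and matrix, or let your color management system supply them:

import numpy as np

# Placeholder values only: a real camera-to-working-space matrix and log decode
# come from the manufacturer's documentation or your color management system.
CAMERA_TO_WORKING = np.array([
    [ 1.10, -0.05, -0.05],
    [-0.03,  1.08, -0.05],
    [-0.02, -0.06,  1.08],
])

def camera_log_to_linear(code_values):
    # Hypothetical log decode; substitute the camera's published transfer function.
    return 0.18 * np.power(10.0, (np.asarray(code_values, dtype=np.float64) - 0.5) / 0.25)

def conform_to_working_space(log_rgb):
    linear_rgb = camera_log_to_linear(log_rgb)   # 1. undo the camera's log encoding
    return linear_rgb @ CAMERA_TO_WORKING.T      # 2. remap the primaries into the working gamut

# A single middle-grey-ish pixel, conformed into the working space
print(conform_to_working_space([[0.5, 0.5, 0.5]]))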

But what do I mean by a “very wide gamut RGB space”?  There are two types of gamuts I’m talking about here, both of which have advantages and disadvantages.

The first kind is a color space with virtual RGB primaries: the RGB color primaries land outside of the CIE 1931 chromaticity diagram.  Remember that CIE 1931 maps the combinations of various wavelengths of light and the perceivable colors they produce onto an X-Y coordinate plane, and that any color space requires at least three primary color vertices on this chart.  But since the chart’s coordinate plane extends beyond the set of real colors, you can put these primaries outside of the actual set of real colors.

By putting the values outside of the visible color range you’re defining ‘red’, ‘green’, and ‘blue’ values that simply don’t and can’t exist.  But they end up being quite useful, because when you map your primaries this way you can define an RGB color space that includes up to all possible color values.  Yes, you could simply use CIE XYZ values to map all colors, but all of the math needed for color manipulations would need to be redefined and rebuilt from the ground up (and it always requires at least 16 bits of precision).  An RGB space with virtual primaries, on the other hand, allows you to use standard RGB math while retaining as many colors as possible.
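If you’re curious what that standard RGB math looks like, here’s a short numpy sketch that builds the RGB-to-XYZ matrix for any set of primaries, real or virtual, from their chromaticity coordinates, with the ACES AP0 values as the example:

import numpy as np

def rgb_to_xyz_matrix(primaries, white):
    # primaries: [(xR, yR), (xG, yG), (xB, yB)]; white: (xW, yW).
    # The same construction works whether the primaries are real or virtual.
    def to_xyz(x, y):
        return np.array([x / y, 1.0, (1.0 - x - y) / y])

    cols = np.column_stack([to_xyz(x, y) for x, y in primaries])
    scale = np.linalg.solve(cols, to_xyz(*white))   # how much of each primary is in the white point
    return cols * scale

# ACES AP0: green sits at y = 1.0 and blue has a negative y -- virtual primaries
AP0_PRIMARIES = [(0.7347, 0.2653), (0.0000, 1.0000), (0.0001, -0.0770)]
ACES_WHITE = (0.32168, 0.33767)
print(rgb_to_xyz_matrix(AP0_PRIMARIES, ACES_WHITE))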

Comparison of eight very wide gamut color spaces, five with virtual primaries, three with real (or mostly real) primaries.

Examples of color spaces using virtual RGB primaries include ACES AP0, defined by the Academy of Motion Picture Arts & Sciences, and many manufacturer specific spaces like Sony S-Gamut3 / S-Gamut3.cine, ARRI Wide Gamut (used with Log C), Canon Cinema Gamut, and the new RedWideGamutRGB found in RED’s IPP2.

The catch with these virtual primaries is that many operations you as a colorist may be accustomed to won’t behave exactly the same way.  The reason is that the RGB balance, as it relates to hues and saturations, doesn’t quite apply the same way.  Without getting mired in the details, the effects of these operations are related to the relative shape of the triangle produced by the RGB color primaries, and the triangles of color spaces using all virtual primaries tend to be shaped less like the traditional RGB color spaces than those traditional spaces are like each other.

So instead, some wide gamut formats use all real, or mostly real, primaries to roughly match the shape (i.e. the color correction ‘feel’) of the smaller color gamuts.  A couple of examples here are Rec. 2020 (called Wide Gamut on 4K televisions), Adobe Wide Gamut, and ACES AP1.  While not covering all possible color values, these spaces cover very large portions of the visible color gamut, making them very useful as color correction working spaces.

Which very wide color space you choose to work in is up to you and your needs.  If your company or workflow requires ACES, use ACES.  If you’re only using one type of camera, such as a RED Weapon or an ARRI Alexa, you may find it beneficial to work in that manufacturer’s RGB space.

For most of the work we do here at Mystery Box that’s destined for anything other than web, I typically conform everything to Rec. 2020 and do my coloring and mastering in that space.  There are a couple of reasons for this:

  1. As a defined color space it uses real, pure-wavelength primaries, meaning that so long as only three color primaries are used for image reproduction, it’s about as wide as we’ll ever go.

  2. It encompasses 100% of Rec. 709 / sRGB and 99.98% of DCI-P3 (only losing a tiny amount of the reds).

  3. It encompasses 99.9% of Pointer’s gamut, a gamut that maps all real-world reflectable colors (not perceivable colors, just those found in the real world) onto the CIE XYZ gamut - essentially every color producible through the subtractive primaries.

  4. While it behaves differently than DCI-P3 and Rec. 709, they all behave fairly similarly so the learning curve is low.

  5. It requires fewer tone mapping corrections for the final output.

Whether these reasons are convincing enough for you is up to you.  Personally, I don’t find the 0.02% of DCI-P3 it doesn’t cover to actually matter, nor the set of greens and blue-greens it can’t store (which no three-color system can reproduce anyway).  These differences are so small that only in the absolute best side-by-sides in a lab could you hope to see a difference.

Whatever you do choose to use as a working space, it’s worth investing the time to pick one and stick with it.  Since the grading transformations do behave differently in the different color spaces, it’s easiest to pick one and refine your technique there to get the best possible results.


Conversions Implementation

Looking at the generalized workflow block diagram you’ll want to consider how to implement the different conversion steps for your own productions, in order to maintain the highest quality image pipeline with the lowest time and resource costs.  So let’s go into the two main places that you need to make new choices in the pipeline, and how to plan for them.

Conforming Camera RGB to Very Wide Gamut Working Space

Moving from Camera RGB to a very wide gamut space is a slightly different process for each camera system, and can depend on whether you’re capturing RAW data or a compressed video image.

When you’re using RAW formats, you’ll manage this step in the color correction or DIT software, which is the preferred workflow when image quality is paramount.  If you’re recording to a non-RAW intermediate format, like ProRes, DNxHR, H.264, or any other flavor of MPEG video, you’ll need to select camera settings that best match your target wide gamut space.

Most RAW formats ignore camera looks applied by the operator and store the color decisions as metadata, but most video formats don’t.  Once again, camera settings vary, so it’s important to look at your specific system and run tests to find out where looks are applied in your camera’s image pipeline, and whether you can add a separate look on the video outputs for on-set monitoring while capturing a flattened LOG or linear image.

If your camera can’t separate the looks applied to the video files and the video output, and you want to capture a flat image but need to see it normalized on set, loading LUTs into on-set monitors is the ideal choice for image monitoring.  The process of creating and applying monitoring LUTs varies with your workflow, but we often find ourselves using a two-step process that uses Lattice to generate color space conversion LUTs, which we bring into DaVinci Resolve Studio to add creative looks and generate the final monitoring LUT.

Some cameras or DIT applications export their look settings as CDLs, LUTs, or other metadata for you to use later in the grading process, which you can then apply in post as the starting point for the grading process.  Again, workflows vary.

Generally you’ll want to move directly from camera RGB into the working space to preserve as much sensor information as possible.  That implies that you need to decide what your working space will be before capture (ACES AP0 or Rec. 2020 are recommended for the broadest future compatibility), though that’s sometimes not an option.  While RAW files maintain camera primaries and allow you to jump directly into a wide working space later, if you’re forced to conform to standardized RGB for video intermediates you’ll need to make that decision as early as possible.  In that case, record in the widest color space the camera offers, whether that’s Rec. 2020, DCI-P3, or the manufacturer’s proprietary wide gamut space.

If RAW isn’t an option, a 12 bit log format video is your next best choice.  10 bit is fine too, but you won’t get as clean of corrections later, and may see some banding in fine gradients.  Anything less than 10 bits per channel creates severe problems when color grading and really should only be used as a last resort.  When recording to an 8 bit format, you should only use a standard SDR EOTF (never LOG) - LOG with only 8 bits of precision can create MASSIVE amounts of banding.

To summarize: To maintain the highest image quality with the smallest resource pain, use RAW formats when possible, convert to the working or widest color space if you have to record as video files, and use LUTs on display outputs to avoid baking camera looks into the video data.

Tone Mapping the Working Space to Output

Moving from a wide working space to a final deliverable space is generally a relatively simple process: simply convert each color value from the working space to its equivalent in the target space, and discard any data that lands outside of the target range.  In most Rec. 2020 -> DCI-P3 or Rec. 2020 -> Rec. 709 conversions, this is completely fine.  You may find minor clipping in a few of the most saturated colors, but overall you shouldn’t see too many places where the color is so bad you can’t live with it.
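As a sketch of that straight conversion, here’s a linear-light Rec. 2020 to Rec. 709 conversion with hard clipping in Python, using the commonly published primary conversion matrix (values rounded to four decimals):

import numpy as np

# Approximate linear-light Rec. 2020 -> Rec. 709 primary conversion matrix (rounded).
BT2020_TO_BT709 = np.array([
    [ 1.6605, -0.5876, -0.0728],
    [-0.1246,  1.1329, -0.0083],
    [-0.0182, -0.1006,  1.1187],
])

def rec2020_to_rec709(linear_rgb):
    # Convert the primaries, then hard clip whatever lands outside the smaller gamut.
    converted = np.asarray(linear_rgb, dtype=np.float64) @ BT2020_TO_BT709.T
    return np.clip(converted, 0.0, 1.0)

# A heavily saturated Rec. 2020 green ends up clipped in Rec. 709
print(rec2020_to_rec709([[0.1, 0.9, 0.1]]))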

Where you do run into problems is when you’ve graded using an HDR transfer function and are moving into SDR.  A straight translation here results in very, very large amounts of clipping.  I haven’t mentioned EOTFs much yet, simply because most color engines where you’ll be doing wide gamut work use linear internals, since that tends to offer the most dynamic range and manipulation potential.

However, displays rarely offer a linear EOTF and so you’ll have to be monitoring in some transfer function or another.  Display monitoring is another reason I typically grade in BT.2020 (and usually in HDR), since displays need to be set to a specific color space and EOTF.  Which means that if you’re using a very wide working space, you must apply a tone map to your monitoring output, regardless of whether you’re grading in HDR or SDR (especially when you’re working in linear light).

The first series we published here on the blog about HDR video included a section on “Grading, Mastering, and Delivering HDR”, where we presented a few Bezier curves you can apply as the last element in your node structure for HDR grading in PQ or HLG.  These Bezier curves are essentially luminance tone maps, converting the linear light values into the specific range of digital values you use for HDR.
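As a stand-in for those curves, here’s a simple highlight roll-off sketched in Python.  It isn’t the exact Bezier math from that post, just an illustration of the idea: pass the mids through untouched and compress the top end toward a chosen peak:

import numpy as np

def highlight_rolloff(y, shoulder=0.75, peak=1.0):
    # Values below `shoulder` pass through untouched; values above it are
    # asymptotically compressed so they approach (but never exceed) `peak`.
    y = np.asarray(y, dtype=np.float64)
    over = np.maximum(y - shoulder, 0.0)
    headroom = peak - shoulder
    rolled = shoulder + headroom * over / (over + headroom)
    return np.where(y <= shoulder, y, rolled)

# The mids are untouched; only the top end gets pulled back under the peak
print(highlight_rolloff([0.2, 0.5, 0.8, 1.5, 4.0]))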

A full tone map typically includes considerations for converting color information as well.  Just like the Bezier curves control the roll-off of the highlights into your target range, tone maps roll off color values between color spaces to minimize the amount of hard clipping.  Here it’s important to exercise caution and experiment with your specific needs before selecting a tone map, since this step can create hue or saturation shifts you don’t expect.

Tone mapping is the golden goose of simplifying multi-space color corrections.  It’s what brings everything together by making it possible to very, very quickly move from your working space into your delivery space.

If you grade in ACES AP0 or AP1, the tone maps are already prepared for your conversions.  Simply apply the tone map for the target system and voila, the conversion’s ready for rendering, preserving all (or rather most - they aren’t quite perfect) of the authorial intent of the grade.  We did this on our Yellowstone video to generate the HDR master.2

Grading in other wide color spaces often requires custom tone maps, or on-the-fly maps generated by a program such as DaVinci Resolve Studio.  RED Digital Cinema, for instance, has produced LUT based tone maps for converting their RedWideGamutRGB Log3G10 footage into various HDR and SDR color spaces.  The entire Dolby Vision format is essentially a shot by shot set of tone maps for various screen brightnesses.

Or, you may find yourself doing what we’ve done - spending the time to create your own tone-mapped LUTs for converting between HDR and SDR in various formats, and refining these maps for each individual piece of HDR content so that you end up with the optimal SDR look for that work.


Why Bother?

Wide color space corrections and tone mapping for various output systems are the way that color correction will be handled in the future.  With the arrival of BT.2020 and the HDR transfer functions, the number of delivery color encoding formats has at least tripled in just the last few years.  The only way to ensure your content will remain compatible in the future is to adopt the new paradigm and a multi-space coloring workflow.

DaVinci Resolve Studio’s latest update (release of version 14) saw a significant overhaul of the color management engine in the last few beta versions to optimize the core functionality for this kind of color management workflow.  If you’re using DaVinci Color Management or ACES color management in the latest version, DaVinci will automatically select the optimal RAW interpretation of your footage and conform it to your working space, removing the ambiguity of how to interpret your footage and maintaining the maximum image quality.

Another manufacturer that has natively implemented a similar color pipeline is RED, with their new IPP2 color workflow.  They’ve moved all of their in-camera looks so they’re applied to the image data after the sensor RGB is converted into RedWideGamutRGB, with all outputs tone mapped to your monitoring space.  They also now allow you to select whether color adjustments made in camera are burned into the ProRes files, or simply attached as a LUT or CDL.  This way, regardless of what your monitoring or eventual mastering space is, the color changes you make in camera will have the exact same effect across the board.

This is the workflow of the future.  Like with HDR, which we can assume will be the EOTF of the future, the efficiencies and simplicities of this particular workflow are so great that the sooner you get on board with it, the better your position will be in a few years’ time.  Grading in ACES AP0 offers a level of future proofing that not even BT.2020 provides.  While BT.2020 still exceeds what current technologies can really do, ACES AP0 ensures that regardless of where color science heads in the future (4+ color primaries?), your footage will already be in a common format that’s simple to convert to the new standard, preserving all color data.

While there is a learning curve to this workflow, at a technical level it’s simpler to learn and apply than even understanding how HDR video works.  Yes, it takes some getting used to, but it’s worth learning.  Because in the end, you’ll find better quality than you can otherwise hope for.

Written by Samuel Bilodeau, Head of Technology and Post Production
 

Footnotes:

1 Yes, I’m making up the term “Very Wide Gamut” or “Very Wide Gamut RGB” simply because “Wide Gamut” and “Wide Gamut RGB” can refer to many different specific spaces, depending on the circumstances. Here I’m referring to any of these typical wide gamut spaces, or any space that covers a very large portion of the perceivable gamut.

2 A caveat about ACES tone mapping: We used ACES AP0 with an ACEScc EOTF for our Yellowstone video. The tone mapping into HDR was fantastic and allowed me to skip my own range limiting map, and the ability to select different input transforms for each shot was very welcome. However, ACES failed when trying to generate an SDR version of the film: instead of tone mapping the higher dynamic range into the smaller SDR range, it clipped at the limits of SDR. This limitation makes me hesitant to recommend ACES for mixed-dynamic-range work. It works wonderfully for one or the other, but don't expect it to tone map directly between the two.

HDR Video Part 3: HDR Video Terms Explained

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 3: HDR Video Terms Explained.

In HDR Video Part 1 we explored what HDR video is, and what makes it different from traditional video.  In Part 2, we looked at the hardware you need to view HDR video in a professional environment.  Since every new technology comes with a new set of vocabulary, here in Part 3, we’re going to look at all of the new terms that you’ll need to know when working with HDR video.  These fall into three main categories: key terms, standards, and metadata.


Key Terms

HDR / HDR Video - High Dynamic Range Video - Any video signal or recording using one of the new transfer functions (PQ or HLG) to capture, transmit, or display a dynamic range greater than the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference.

The term can also be used as a compatibility indicator, to describe any camera capable of capturing and recording a signal this way, or a display that either exhibits the extended dynamic range natively or is capable of automatically detecting an HDR video signal and renormalizing the footage for its more limited or traditional range.


SDR / SDR Video - Standard Dynamic Range Video - Any video signal or recording using the traditional transfer functions to capture, transmit, or display a dynamic range limited to the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference. SDR video is fully compatible with all pre-existing video technologies.


nit - A unit of luminance, or brightness density. It’s the colloquial term for the SI unit of candelas per square meter (1 nit = 1 cd/m2). It converts directly to and from the United States customary unit of foot-lamberts (1 fl = 1/π cd/foot2), with 1 fl = 3.426 nits = 3.426 cd/m2.

Note that the peak nits / foot-lamberts value of a projector is often lower than that of a display, even in HDR video: because a projected image covers more area and the image is viewed in a darker environment than consumer’s homes, the same psychological and physiological responses exist at lower light levels.

For instance, a typical digital cinema screen will have a maximum brightness of 14fl or 48 cd/m2 vs. the display average of 80-120nits for reference and 300 for LCDs and Plasmas in the home. HDR cinema actual light output ranges in theaters are adjusted accordingly, since 1000 cd/m2 on a theater’s 30 foot screen is perceived to be far brighter than on a 65” flat screen.


EOTF - Electro-Optical Transfer Function - A mathematical equation or set of instructions that translate voltages or digital values into brightness values. It is the opposite of the Optical-Electro Transfer Function, or OETF, that defines how to translate brightness levels into voltages or digital values.

Traditionally, the OETF and EOTF were incidental to the behavior of the cathode ray tube, which could be approximated by a 0-1 exponential curve with a power value (gamma) of 2.4. Now they are defined values like “Linear”, “Gamma 2.4”, or any of the various LOG formats. OETFs are used at the acquisition end of the video pipeline (by the camera) to convert brightness values into voltages/digital values, and EOTFs are used by displays to translate voltages/digital values into brightness values for each pixel.
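In code, a pure power-function OETF/EOTF pair looks like the sketch below. This ignores the black-level terms that BT.1886 adds, so treat it as an illustration of the concept only:

def oetf_gamma(light, gamma=2.4):
    # Camera side: scene-linear light (0.0-1.0) -> signal value (0.0-1.0)
    return light ** (1.0 / gamma)

def eotf_gamma(signal, gamma=2.4):
    # Display side: signal value (0.0-1.0) -> linear light (0.0-1.0)
    return signal ** gamma

# Encoding and then decoding returns the original light level
assert abs(eotf_gamma(oetf_gamma(0.18)) - 0.18) < 1e-12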


PQ - Perceptual Quantization - Name of the EOTF curve developed by Dolby and standardized in SMPTE ST.2084, designed to allocate bits as efficiently as possible with respect to how human vision perceives changes in light levels.

Perceptual Quantization (PQ) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

Dolby’s tests were built around the Barten Threshold (also called the Barten Limit or the Barten Ramp): the point at which the difference in light levels between two values becomes visible.

PQ is designed so that, when operating at 12 bits per channel, the stepping between adjacent digital values is always below the Barten threshold for the whole range from 0.0001 to 10,000 nits, without being so far below that threshold that the resolution between bits is wasted. At 10 bits per channel, the PQ function sits just slightly above the Barten threshold, where in some (idealized) circumstances stepping may be visible, but in most cases should be unnoticeable.
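For reference, here’s the PQ EOTF written out in Python using the constants published in ST.2084; it maps a normalized code value to an absolute luminance in nits:

# Constants published in SMPTE ST.2084
M1 = 2610 / 16384           # 0.1593017578125
M2 = 2523 / 4096 * 128      # 78.84375
C1 = 3424 / 4096            # 0.8359375
C2 = 2413 / 4096 * 32       # 18.8515625
C3 = 2392 / 4096 * 32       # 18.6875

def pq_eotf(signal):
    # Normalized PQ code value (0.0-1.0) -> absolute luminance in nits
    e = signal ** (1.0 / M2)
    return 10000.0 * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)

print(pq_eotf(0.0), pq_eotf(0.5), pq_eotf(1.0))   # 0 nits, ~92 nits, 10,000 nits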

Barten Thresholds for 10 bit and 12 bit BT.1886 and PQ curves. Source

For comparison, current log formats waste bits on the low end (making them suitable for acquisition to preserve details in the darks, but not transmission and exhibition), while the current standard gamma functions waste bits on the high end, while creating stepping in the darks.

HDR systems using PQ curves are not directly backwards compatible with standard dynamic range video.


HLG - Hybrid Log Gamma - A competing EOTF curve to PQ / SMPTE ST.2084 designed by the BBC and NHK to preserve a small amount of backwards compatibility.

Hybrid Log Gamma (HLG) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

HLG vs. SDR gamma curve with and without knees. Source

The first 50% of the curve follows the output light levels of standard Gamma 2.4, while the top 50% diverges steeply along a log curve, covering the brightness range from about 100 to 5,000 nits. As with PQ, 10 bits per channel is the minimum permitted.
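Here’s the OETF half of that system sketched in Python, using the constants from ARIB STD-B67 / BT.2100. Note how an input of 1/12 lands exactly at a signal value of 0.5, the split point described above:

import math

# Hybrid Log Gamma OETF constants from ARIB STD-B67 / BT.2100
A = 0.17883277
B = 1.0 - 4.0 * A                 # 0.28466892
C = 0.5 - A * math.log(4.0 * A)   # 0.55991073...

def hlg_oetf(light):
    # Normalized scene-linear light (0.0-1.0) -> HLG signal value (0.0-1.0)
    if light <= 1.0 / 12.0:
        return math.sqrt(3.0 * light)            # lower half: square-root segment
    return A * math.log(12.0 * light - B) + C    # upper half: logarithmic segment

print(hlg_oetf(1.0 / 12.0), hlg_oetf(1.0))   # 0.5 at the split point, ~1.0 at peak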

HLG does not expand the range of the darks the way the PQ curve does, and as an unfortunate side effect of the backwards compatibility, coupled with the MaxFALL limits necessitated by current HDR display technology, whites can appear grey when viewed in standard gamma 2.4, especially when compared to footage natively graded in gamma 2.4.


Standards

SMPTE ST.2084 - The first official standardization of an HDR video transfer function by a standards body, and at the moment (October 2016) the most widely implemented. SMPTE ST.2084 officially defines the PQ EOTF curve for translating a set of 10 bit or 12 bit per channel digital values into a brightness range of 0.0001 to 10,000 nits. SMPTE ST.2084 provides the basis for the HDR 10 Media Profile and Dolby Vision implementation standards.

This is the transfer function to select in HEVC encoding to signal a PQ HDR curve.


ARIB STD-B67 - Standardized implementation of Hybrid Log Gamma by the Association of Radio Industries and Businesses. Defines the use of the HLG curve, with 10 or 12 bits per channel color and the same color primaries as BT.2020 color space.

This is the transfer function to select in HEVC encoding to signal an HLG HDR curve.


ITU-R BT.2100 - ITU-R Recommendation BT.2100 - the ITU’s standardization of HDR for television broadcast. Ratified in 2016, this document is the HDR equivalent of ITU-R Recommendation BT.2020 (Rec. 2020 / BT.2020). When compared with BT.2020, BT.2100 includes the FHD (1920x1080) frame size in addition to UHD and FUHD, and defines two acceptable transfer functions (PQ and HLG) for HDR broadcast, instead of the single transfer function (BT.1886 equivalent) found in BT.2020.

BT.2100 uses the same color primaries and the same RGB to YCbCr signal format transform as BT.2020, and includes the same 10 or 12 bit per channel options as BT.2020, although BT.2100 also permits full range code values in 10 or 12 bits where BT.2020 is limited to the traditional legal (narrow) range.

BT.2100 also includes considerations for a chroma subsampling methodology based on the LMS color space (human visual system tristimulus values), called ICTCP, and a transform for ‘gamma weighting’ (in the sense of the PQ and HLG equivalent of gamma weighting) the LMS response as L’M’S’.


HDR 10 Media Profile - The Consumer Technology Association (CTA)’s official HDR video standard for use in HDR televisions. HDR 10 requires the use of the SMPTE ST.2084 EOTF, the BT.2020 color space, 10 bits per channel, 4:2:0 chroma subsampling, and the inclusion of SMPTE ST.2086 and associated MaxCLL and MaxFALL metadata values.

HDR 10 Media Profile defines the signal a television must be able to decode in order to use the “HDR compatible” term in its marketing.

Note that “HDR compatibility” does not necessarily mean the ability to display the higher dynamic range, simply the ability to decode and renormalize footage in the HDR 10 specification for whatever the dynamic range and color space of the display happen to be.


Dolby Vision - Dolby’s proprietary implementation of the PQ curve, for theatrical setups and home devices. Dolby Vision supports both the BT.2020 and the DCI-P3 color space, at 10 and 12 bits per channel, for home and theater, respectively.

The distinguishing feature of Dolby Vision is the inclusion of shot-by-shot transform metadata that adapts the PQ graded footage into a limited range gamma 2.4 or gamma 2.6 output for SDR displays and projectors. The colorist grades the film in the target HDR space, and then runs a second adaptation pass to adapt the HDR grade into SDR, and the transform is saved into the rendered HDR output files as metadata. This allows for a level of backwards compatibility with HDR transmitted footage, while still being able to make the most of the SDR and the HDR ranges.

Because Dolby Vision is a proprietary format, it requires a license issued by Dolby and the use of qualified hardware, which at the moment (October 2016) means only the Dolby PRM-4220, the Sony BVM-X300, or the Canon DP-V2420 displays.


Metadata

MaxCLL Metadata - Maximum Content Light Level - An integer metadata value defining the maximum light level, in nits, of any single pixel within an encoded HDR video stream or file. MaxCLL should be measured during or after mastering. However if you keep your color grade within the MaxCLL of your display’s HDR range, and add a hard clip for the light levels beyond your display’s maximum value, you can use your display’s maximum CLL as your metadata MaxCLL value.


MaxFALL Metadata - Maximum Frame Average Light Level - An integer metadata value defining the maximum average light level, in nits, for any single frame within an encoded HDR video stream or file. MaxFALL is calculated by averaging the decoded brightness values of all pixels within each frame (that is, converting the digital value of each pixel into its corresponding nits value, and averaging all of the nits values within the frame).

MaxFALL is an important value to consider in mastering and color grading, and is usually lower than the MaxCLL value. The two values combined define how bright any individual pixel within a frame can be, and how bright the frame as a whole can be.
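As an illustration, here’s one way to compute both values in Python from frames that have already been decoded to nits. This follows the common convention of taking each pixel’s light level as the maximum of its R, G, and B components; check the relevant CTA documentation for the exact measurement procedure your deliverable requires:

import numpy as np

def measure_maxcll_maxfall(frames_in_nits):
    # frames_in_nits: iterable of (height, width, 3) arrays already decoded
    # from code values to nits via the stream's EOTF.
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames_in_nits:
        pixel_levels = frame.max(axis=-1)             # per-pixel light level: max of R, G, B
        max_cll = max(max_cll, pixel_levels.max())    # brightest single pixel anywhere
        max_fall = max(max_fall, pixel_levels.mean()) # brightest frame average anywhere
    return round(max_cll), round(max_fall)

# Two tiny dummy "frames": a dim frame and one with a single 1000 nit highlight
dim = np.full((2, 2, 3), 50.0)
spike = dim.copy()
spike[0, 0] = [1000.0, 800.0, 700.0]
print(measure_maxcll_maxfall([dim, spike]))   # -> (1000, 288)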

Displays are limited differently on both of those values, though typically only the peak (single pixel) brightness of a display is reported. As pixels get brighter and approach their peak output, they draw more power and heat up. With current technology levels, no display can push all of its pixels into the maximum HDR brightness level at the same time - the power draw would be extremely high, and the heat generated would severely damage the display.

As a result, displays will abruptly notch down the overall image brightness when the frame average brightness exceeds the rated MaxFALL, to keep the image under the safe average brightness level, regardless of what the peak brightness of the display or encoded image stream may be.

For example, while the BVM-X300 has a peak value of 1000 nits for any given pixel (MaxCLL = 1000), on average, the frame brightness cannot exceed about 180 nits (MaxFALL = 180). The MaxCLL and MaxFALL metadata included in the HDR 10 media profile allows consumer displays to adjust the entire stream’s brightness to match their own display limits.


SMPTE ST.2086 Metadata - Metadata Information about the display used to grade the HDR content. SMPTE ST.2086 includes information on six values: the three RGB primaries used, the white point used, and the display maximum and minimum light levels.

The RGB primaries and the white point values are recorded as ½ of their (X,Y) values from the CIE XYZ 1931 chromaticity standard, and expressed as the integer portion of the first five significant digits, without a decimal place. Or, in other words:

f(XPrimary) = 100,000 × XPrimary ÷ 2

f(YPrimary) = 100,000 × YPrimary ÷ 2.

For example, the (X,Y) value of DCI-P3’s ‘red’ primary is (0.68, 0.32) in CIE XYZ; in SMPTE ST.2086 terms it’s recorded as

R(34000,16000)

because

for R(0.68,0.32):

f(XR) = 100,000 × 0.68 ÷ 2 = 34,000

f(YR) = 100,000 × 0.32 ÷ 2 = 16,000

Maximum and minimum luminance values are recorded as nits × 10,000, so that they too end up as positive integers. For instance, a display like the Sony BVM-X300 with a range from 0.0001 to 1000 nits would record its luminance as

L(10000000,1)

The full ST.2086 Metadata is ordered Green, Blue, Red, White Point, Luminance with the values as

G(XG,YG)B(XB,YB)R(XR,YR)WP(XWP,YWP)L(max,min)

all strung together, and without spaces. For instance, the ST.2086 for a DCI-P3 display with a maximum luminance of 1000 nits, a minimum of 0.0001 nits, and a white point of D65 would be:

G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)

while a display like the Sony BVM-X300, using BT.2020 primaries, with a white point of D65 and the same max and min brightness would be:

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)

In an ideal situation, it would be best to use a colorimeter and measure the display’s native R-G-B and white point values; in practice, however, the primaries and white point of the color space the mastering display conforms to are sufficient to communicate the mastering information to the end display.
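To tie the scaling rules together, here’s a small Python helper (a hypothetical convenience function, not part of any standard tool) that formats an ST.2086 string from (X,Y) chromaticities and nits values; it reproduces the DCI-P3 / D65 example above:

def st2086_string(green, blue, red, white, max_nits, min_nits):
    # Chromaticities are (X, Y) pairs; scaling follows the rules above:
    # primaries and white point at 100,000 x value / 2, luminance at nits x 10,000.
    def xy(pair):
        x, y = pair
        return f"({round(x * 50000)},{round(y * 50000)})"

    return (f"G{xy(green)}B{xy(blue)}R{xy(red)}WP{xy(white)}"
            f"L({round(max_nits * 10000)},{round(min_nits * 10000)})")

# The DCI-P3 / D65 / 1000 nit example from the text
print(st2086_string((0.265, 0.690), (0.150, 0.060), (0.680, 0.320),
                    (0.3127, 0.3290), 1000, 0.0001))
# G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)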


That should be a good overview of the new terms that HDR video has (so far) introduced into the video technology vocabulary, and a good starting point for diving deeper into learning about and using HDR video on your own at the professional level.

In Part 4 of our series we’re going to take the theory of HDR video and start talking about the practice, looking specifically at how to shoot with HDR in mind.

Written by Samuel Bilodeau, Head of Technology and Post Production