HDR Video Part 3: HDR Video Terms Explained

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 3: HDR Video Terms Explained.

In HDR Video Part 1 we explored what HDR video is, and what makes it different from traditional video.  In Part 2, we looked at the hardware you need to view HDR video in a professional environment.  Since every new technology comes with a new set of vocabulary, here in Part 3, we’re going to look at all of the new terms that you’ll need to know when working with HDR video.  These fall into three main categories: key terms, standards, and metadata.


Key Terms

HDR / HDR Video - High Dynamic Range Video - Any video signal or recording using one of the new transfer functions (PQ or HLG) to capture, transmit, or display a dynamic range greater than the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference.

The term can also be used as a compatibility indicator, to describe any camera capable of capturing and recording a signal this way, or a display that either exhibits the extended dynamic range natively or is capable of automatically detecting an HDR video signal and renormalizing the footage for its more limited or traditional range.


SDR / SDR Video - Standard Dynamic Range Video - Any video signal or recording using the traditional transfer functions to capture, transmit, or display a dynamic range limited to the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference. SDR video is fully compatible with all pre-existing video technologies.


nit - A unit of brightness density, or luminance. It’s the colloquial term for the SI unit of candelas per square meter (1 nit = 1 cd/m²). It converts directly to the United States customary unit of foot-lamberts (1 fl = 1/π cd/ft²), with 1 fl = 3.426 nits = 3.426 cd/m².

Note that the peak nits / foot-lamberts value of a projector is often lower than that of a display, even in HDR video: because a projected image covers more area and is viewed in a darker environment than consumers’ homes, the same psychological and physiological responses exist at lower light levels.

For instance, a typical digital cinema screen will have a maximum brightness of 14 fl or 48 cd/m², versus 80-120 nits for reference displays and around 300 nits for LCDs and plasmas in the home. Actual light output ranges for HDR cinema in theaters are adjusted accordingly, since 1000 cd/m² on a theater’s 30 foot screen is perceived to be far brighter than on a 65” flat screen.
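The foot-lambert to nit conversion above is a single multiplication. A minimal sketch, using the 3.426 factor quoted above (the function names are ours, purely for illustration):

```python
# Conversion factor quoted in the text: 1 fl = 3.426 cd/m2 (nits).
NITS_PER_FOOT_LAMBERT = 3.426


def foot_lamberts_to_nits(foot_lamberts):
    """Convert a luminance in foot-lamberts to nits (cd/m2)."""
    return foot_lamberts * NITS_PER_FOOT_LAMBERT


def nits_to_foot_lamberts(nits):
    """Convert a luminance in nits (cd/m2) to foot-lamberts."""
    return nits / NITS_PER_FOOT_LAMBERT


# The 14 fl digital cinema screen from the example above.
print(round(foot_lamberts_to_nits(14)))  # prints 48
```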


EOTF - Electro-Optical Transfer Function - A mathematical equation or set of instructions that translates voltages or digital values into brightness values. It is the counterpart of the Opto-Electronic Transfer Function, or OETF, which defines how to translate brightness levels into voltages or digital values.

Traditionally, the OETF and EOTF were incidental to the behavior of the cathode ray tube, which could be approximated by a 0-1 exponential curve with a power value (gamma) of 2.4. Now they are defined functions like “Linear”, “Gamma 2.4” or any of the various LOG formats. OETFs are used at the acquisition end of the video pipeline (by the camera) to convert brightness values into voltages/digital values, and EOTFs are used by displays to translate voltages/digital values into brightness values for each pixel.
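To make the idea of an EOTF concrete, here is a minimal sketch of a pure power-law “Gamma 2.4” curve. It assumes a normalized 0-1 signal, a perfectly black minimum, and a 100 nit reference peak; the full BT.1886 formula also accounts for the display’s black level, which is left out here. The function names are ours.

```python
def gamma_eotf(signal, gamma=2.4, peak_nits=100.0):
    """Map a normalized signal value (0.0-1.0) to display luminance in nits.

    Simplified power law: assumes zero black level (an assumption of this
    sketch, not part of BT.1886 itself).
    """
    if not 0.0 <= signal <= 1.0:
        raise ValueError("signal must be between 0.0 and 1.0")
    return peak_nits * signal ** gamma


def gamma_inverse_eotf(nits, gamma=2.4, peak_nits=100.0):
    """Map a luminance in nits back to a normalized signal value."""
    return (nits / peak_nits) ** (1.0 / gamma)


# A mid-scale signal value sits well below half of the peak brightness.
for value in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(value, round(gamma_eotf(value), 2))
```

Running the loop shows how much of the signal range a gamma curve spends on the darker part of the image, which is the behavior the newer HDR curves are compared against.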


PQ - Perceptual Quantization - Name of the EOTF curve developed by Dolby and standardized in SMPTE ST.2084, designed to allocate bits as efficiently as possible with respect to how human vision perceives changes in light levels.

Perceptual Quantization (PQ) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

Dolby’s tests established the Barten Threshold (also called the Barten Limit or the Barten Ramp): the point at which the difference in light levels between two values becomes visible.

PQ is designed so that, when operating at 12 bits per channel, the step between adjacent digital values is always below the Barten threshold across the whole range from 0.0001 to 10,000 nits, without being so far below that threshold that bit resolution is wasted. At 10 bits per channel, the PQ function sits just slightly above the Barten threshold, so stepping may be visible in some (idealized) circumstances, but in most cases it should be unnoticeable.

Barten Thresholds for 10 bit and 12 bit Rec. 1886 and PQ curves.  Source

For comparison, current log formats waste bits on the low end (making them suitable for acquisition, where they preserve detail in the darks, but not for transmission and exhibition), while the current standard gamma functions waste bits on the high end and create stepping in the darks.

HDR systems using PQ curves are not directly backwards compatible with standard dynamic range video.
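For readers who want to experiment with the PQ curve, the sketch below implements the EOTF and its inverse using the constants published in SMPTE ST 2084. The `step_percent` helper is our own illustration of the bit-depth discussion above: it assumes full-range code values normalized by the maximum code, and simply reports the relative brightness jump between two adjacent codes.

```python
# PQ constants as published in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10_000.0


def pq_eotf(signal):
    """Normalized PQ signal (0.0-1.0) -> luminance in nits."""
    p = signal ** (1 / M2)
    return PEAK_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def pq_inverse_eotf(nits):
    """Luminance in nits (0-10,000) -> normalized PQ signal (0.0-1.0)."""
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def step_percent(nits, bits):
    """Relative luminance step (%) between adjacent codes near `nits`.

    Hypothetical helper: assumes full-range code values, which is a
    simplification made for this sketch.
    """
    max_code = 2 ** bits - 1
    code = min(round(pq_inverse_eotf(nits) * max_code), max_code - 1)
    low = pq_eotf(code / max_code)
    high = pq_eotf((code + 1) / max_code)
    return 100 * (high - low) / low


# Compare the size of one code step around 100 nits at each bit depth.
for bits in (10, 12):
    print(bits, "bit:", round(step_percent(100, bits), 3), "% per step")
```

The 12 bit step comes out at roughly a quarter of the 10 bit step, which is why the 12 bit curve can stay under the visibility threshold where the 10 bit curve only comes close.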


HLG - Hybrid Log Gamma - A competing EOTF curve to PQ / SMPTE ST.2084 designed by the BBC and NHK to preserve a small amount of backwards compatibility.

Hybrid Log Gamma (HLG) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

HLG vs. SDR gamma curve with and without knees.  Source


On this curve, the first 50% of the curve follows the output light levels of standard Gamma 2.4, while the top 50% steeply diverges along a log curve, covering the brightness range from about 100 to 5000 nits. As with PQ, 10 bits per channel is the minimum permitted.

HLG does not expand the range of the darks like the PQ curve does, and as an unfortunate side effect of the backwards compatibility, coupled with the MaxFALL necessitated by the technology of HDR displays, whites can appear grey when viewed in standard gamma 2.4, especially when compared to footage natively graded in gamma 2.4.
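The two-part shape of the curve is easiest to see in the HLG signal encoding itself. The sketch below uses the normalized formulation published in BT.2100 (scene light and signal both scaled 0-1), in which the lower half of the signal range is a square-root, gamma-like segment and the upper half is logarithmic. The constants come from that document rather than from this post, and the function name is ours.

```python
import math

# HLG constants from the normalized BT.2100 formulation.
A = 0.17883277
B = 1 - 4 * A                  # approximately 0.28466892
C = 0.5 - A * math.log(4 * A)  # approximately 0.55991073


def hlg_oetf(scene_light):
    """Normalized scene light (0.0-1.0) -> normalized HLG signal (0.0-1.0)."""
    if not 0.0 <= scene_light <= 1.0:
        raise ValueError("scene_light must be between 0.0 and 1.0")
    if scene_light <= 1 / 12:
        # Lower half of the signal range: square-root (gamma-like) segment.
        return math.sqrt(3 * scene_light)
    # Upper half of the signal range: logarithmic segment.
    return A * math.log(12 * scene_light - B) + C


# The two segments meet at a signal value of 0.5.
for light in (0.0, 1 / 12, 0.25, 0.5, 1.0):
    print(round(light, 4), round(hlg_oetf(light), 4))
```

Only the first twelfth of the scene light range is spent on the bottom half of the signal; everything brighter is compressed into the log segment.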


Standards

SMPTE ST.2084 - The first official standardization of an HDR video transfer function by a standards body, and at the moment (October 2016) the most widely implemented. SMPTE ST.2084 officially defines the PQ EOTF curve for translating a set of 10 bit or 12 bit per channel digital values into a brightness range of 0.0001 to 10,000 nits. SMPTE ST.2084 provides the basis for the HDR 10 Media Profile and Dolby Vision implementation standards.

This is the transfer function to select in HEVC encoding to signal a PQ HDR curve.


ARIB STD-B67 - Standardized implementation of Hybrid Log Gamma by the Association of Radio Industries and Businesses. Defines the use of the HLG curve, with 10 or 12 bits per channel color and the same color primaries as BT.2020 color space.

This is the transfer function to select in HEVC encoding to signal an HLG HDR curve.


ITU-R BT.2100 - ITU-R Recommendation BT.2100 - ITU-R’s standardization of HDR for television broadcast. Ratified in 2016, this document is the HDR equivalent of ITU-R Recommendation BT.2020 (Rec.2020 / BT.2020). When compared with BT.2020, BT.2100 includes the FHD (1920x1080) frame size in addition to UHD and FUHD, and defines two acceptable transfer functions (PQ and HLG) for HDR broadcast, instead of the single transfer function (BT.1886 equivalent) found in BT.2020.

BT.2100 uses the same color primaries and the same RGB to YCbCr signal format transform as BT.2020, and similarly permits 10 or 12 bits per channel, although BT.2100 also permits full range code values in 10 or 12 bits where BT.2020 is limited to the traditional legal range.

BT.2100 also includes considerations for a chroma subsampling methodology based on the LMS color space (human visual system tristimulus values), called ICtCp, and a transform for ‘gamma weighting’ the LMS response as L’M’S’ (here meaning the PQ and HLG equivalent of gamma weighting).


HDR 10 Media Profile - The Consumer Technology Association (CTA)’s official HDR video standard for use in HDR televisions. HDR 10 requires the use of the SMPTE ST.2084 EOTF, BT.2020 color space, 10 bits per channel, 4:2:0 chroma subsampling, and the inclusion of SMPTE ST.2086 and associated MaxCLL and MaxFALL metadata values.

HDR 10 Media Profile defines the signal a television must be able to decode for the term “HDR compatibility” to be used in its marketing.

Note that “HDR compatibility” does not necessarily mean the ability to display the higher dynamic range, simply the ability to decode footage in the HDR 10 specification and renormalize it for whatever the dynamic range and color space of the display happen to be.


Dolby Vision - Dolby’s proprietary implementation of the PQ curve, for theatrical setups and home devices. Dolby Vision supports both the BT.2020 and the DCI-P3 color space, at 10 and 12 bits per channel, for home and theater, respectively.

The distinguishing feature of Dolby Vision is the inclusion of shot-by-shot transform metadata that adapts the PQ graded footage into a limited range gamma 2.4 or gamma 2.6 output for SDR displays and projectors. The colorist grades the film in the target HDR space, and then runs a second adaptation pass to adapt the HDR grade into SDR, and the transform is saved into the rendered HDR output files as metadata. This allows for a level of backwards compatibility with HDR transmitted footage, while still being able to make the most of the SDR and the HDR ranges.

Because Dolby Vision is a proprietary format, it requires a license issued by Dolby and the use of qualified hardware, which at the moment (October 2016) means only the Dolby PRM-4220, the Sony BVM-X300, or the Canon DP-V2420 displays.


Metadata

MaxCLL Metadata - Maximum Content Light Level - An integer metadata value defining the maximum light level, in nits, of any single pixel within an encoded HDR video stream or file. MaxCLL should be measured during or after mastering. However, if you keep your color grade within your display’s HDR range, and add a hard clip for light levels beyond your display’s maximum value, you can use your display’s peak light level as your metadata MaxCLL value.


MaxFALL Metadata - Maximum Frame Average Light Level - An integer metadata value defining the maximum average light level, in nits, for any single frame within an encoded HDR video stream or file. MaxFALL is calculated by averaging the decoded brightness values of all pixels within each frame (that is, converting the digital value of each pixel into its corresponding nits value, and averaging all of the nits values within each frame), then taking the highest of those frame averages.

MaxFALL is an important value to consider in mastering and color grading, and is usually lower than the MaxCLL value. The two values combined define how bright any individual pixel within a frame can be, and how bright the frame as a whole can be.

Displays are limited differently on both of those values, though typically only the peak (single pixel) brightness of a display is reported. As pixels get brighter and approach their peak output, they draw more power and heat up. With current technology levels, no display can push all of its pixels into the maximum HDR brightness level at the same time - the power draw would be extremely high, and the heat generated would severely damage the display.

As a result, displays will abruptly notch down the overall image brightness when the frame average brightness exceeds the rated MaxFALL, to keep the image under the safe average brightness level, regardless of what the peak brightness of the display or encoded image stream may be.

For example, while the BVM-X300 has a peak value of 1000 nits for any given pixel (MaxCLL = 1000), on average, the frame brightness cannot exceed about 180 nits (MaxFALL = 180). The MaxCLL and MaxFALL metadata included in the HDR 10 media profile allows consumer displays to adjust the entire stream’s brightness to match their own display limits.
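As a rough illustration of how the two values relate, the sketch below computes both from a list of frames. It assumes each frame has already been decoded into a grid of per-pixel light levels in nits, and it rounds to the nearest integer, which is our choice for the example; real mastering tools may derive the per-pixel value and the rounding differently. The function names and sample frames are hypothetical.

```python
def frame_average_light_level(frame):
    """Mean of all pixel light levels (in nits) within one frame."""
    pixels = [nits for row in frame for nits in row]
    return sum(pixels) / len(pixels)


def max_cll_and_max_fall(frames):
    """Return (MaxCLL, MaxFALL) as integers for a sequence of frames."""
    # MaxCLL: the brightest single pixel anywhere in the stream.
    max_cll = max(nits for frame in frames for row in frame for nits in row)
    # MaxFALL: the highest per-frame average anywhere in the stream.
    max_fall = max(frame_average_light_level(frame) for frame in frames)
    return round(max_cll), round(max_fall)


# Two tiny made-up 2x2 "frames": one with a single bright highlight,
# one that is evenly bright throughout.
highlight_frame = [[1000, 50], [50, 100]]   # average: 300 nits
bright_frame = [[400, 400], [400, 400]]     # average: 400 nits

print(max_cll_and_max_fall([highlight_frame, bright_frame]))  # prints (1000, 400)
```

Note that the brightest pixel and the brightest frame come from different frames here, which is why the two values have to be tracked separately.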


SMPTE ST.2086 Metadata - Metadata information about the display used to grade the HDR content. SMPTE ST.2086 includes information on six values: the three RGB primaries used, the white point used, and the display maximum and minimum light levels.

The RGB primaries and the white point values are recorded as ½ of their (X,Y) values from the CIE XYZ 1931 chromaticity standard, expressed as the integer portion of the first five significant digits, without a decimal place. Or, in other words:

f(XPrimary) = 100,000 × XPrimary ÷ 2

f(YPrimary) = 100,000 × YPrimary ÷ 2.

For example, the (X,Y) value of DCI-P3’s ‘red’ primary is (0.68, 0.32) in CIE XYZ; in SMPTE ST.2086 terms it’s recorded as

R(34000,16000)

because

for R(0.68,0.32):

f(XR) = 100,000 × 0.68 ÷ 2 = 34,000

f(YR) = 100,000 × 0.32 ÷ 2 = 16,000

Maximum and minimum luminance values are recorded as nits × 10,000, so that they too end up as positive integers. For instance, a display like the Sony BVM-X300 with a range from 0.0001 to 1000 nits would record its luminance as

L(10000000,1)

The full ST.2086 Metadata is ordered Green, Blue, Red, White Point, Luminance with the values as

G(XG,YG)B(XB,YB)R(XR,YR)WP(XWP,YWP)L(max,min)

all strung together, without spaces. For instance, the ST.2086 metadata for a DCI-P3 display with a maximum luminance of 1000 nits, a minimum of 0.0001 nits, and a D65 white point would be:

G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)

while a display like the Sony BVM-X300, using BT.2020 primaries, with a white point of D65 and the same max and min brightness would be:

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)
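The formatting rules above are mechanical enough to script. The following is a minimal sketch that builds the string from (X,Y) chromaticity pairs and luminance values in nits; the function names are ours, and it uses rounding rather than truncation so that floating point error cannot knock a value down by one.

```python
def encode_chromaticity(value):
    """Apply f(v) = 100,000 x v / 2 and return an integer."""
    return round(100_000 * value / 2)


def encode_luminance(nits):
    """Apply nits x 10,000 and return an integer."""
    return round(nits * 10_000)


def st2086_string(green, blue, red, white_point, max_nits, min_nits):
    """Build the G(...)B(...)R(...)WP(...)L(...) string described above."""
    parts = []
    for label, (x, y) in (("G", green), ("B", blue), ("R", red), ("WP", white_point)):
        parts.append(f"{label}({encode_chromaticity(x)},{encode_chromaticity(y)})")
    parts.append(f"L({encode_luminance(max_nits)},{encode_luminance(min_nits)})")
    return "".join(parts)


D65 = (0.3127, 0.3290)

# DCI-P3 primaries, 1000 nit peak, 0.0001 nit minimum.
print(st2086_string((0.265, 0.690), (0.150, 0.060), (0.680, 0.320), D65, 1000, 0.0001))

# BT.2020 primaries with the same white point and luminance range.
print(st2086_string((0.170, 0.797), (0.131, 0.046), (0.708, 0.292), D65, 1000, 0.0001))
```

The two printed lines match the DCI-P3 and BT.2020 strings given above.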

In an ideal situation, it would be best to use a colorimeter and measure the display’s native RGB and white point values; in practice, however, the RGB and white point values of the standard that the mastering display conforms to are sufficient for communicating information about the master to the end-user display.


That should be a good overview of the new terms that HDR video has (so far) introduced into the extended video technologies vocabulary, and a good starting point for diving deeper into learning about and using HDR video on your own, at the professional level.

In Part 4 of our series we’re going to take the theory of HDR video and start talking about the practice, and look specifically at how to shoot with HDR in mind.