Multi-Space Color Correction: The New Paradigm

Our last technical blog post talked about color management, including considerations for maintaining color accuracy through the post-production process by keeping displays & projectors calibrated and understanding how each application manages the colors you see.  We alluded to the fact that, at face value, it can seem pretty complicated to manage color between all of the different kinds of camera technologies and technical standards, especially when dealing with multiple delivery specifications.

With this post we want to go into more detail about the issue of color grading for multiple standards, and how the new paradigm for color correction simplifies the process and keeps your grades more futureproof.  We’re going to examine a few things today: first, how computers process color; second, where the problems we’re trying to fix come from; third, what the solutions are; and lastly, how to implement those solutions and why.


Color Engines

In previous posts (here, here, and here) we’ve mentioned the importance of using a color space agnostic color engine when doing specific color correction tasks - something like DaVinci Resolve Studio.  As a quick review, a color space agnostic engine is like a glorified R-G-B or R-G-B-Y calculator: It takes a specific set of decoded R-G-B or R-G-B-Y data, applies a specific set of mathematical operations based on the user input, and outputs the new R-G-B or R-G-B-Y data.

Agnostic color engines don’t care about what the data is supposed to mean, instead simply producing the results of an operation.  It’s up to the user to know whether the results are right, or whether the operation has pushed values out of spec or created unwanted distortions.  This is a double-edged tool, since it places far more importance on user understanding to get things right, while being powerful enough to apply its corrections to any combination of custom situations.

As an example of how an agnostic engine works, let’s look at three of the simplest color correction operations: lift, gamma, and gain, operating strictly on the brightness (Y) component of the image.

Lift operates essentially as a global addition value: add or subtract a specific amount to each pixel’s value.  Because of the way that traditional EOTFs work and the way we perceive changes in brightness, lift tends to have the greatest effect on the darks: quickly raising or lowering the blacks while having a much smaller effect on the mids and lights.

Gain operates essentially as a global multiplication value: multiply or divide the value of each pixel by a specific amount.  Since the operation essentially affects all tones within the image evenly, all parts of the image see a similar increase or decrease in brightness, though once again because of the EOTF considerations it has the greatest effect in the brights.

Gamma operates as a power-function adjustment, affecting the linearity of values between the brights and the darks.  Lowering the gamma value has the effect of pulling more of the middle values towards the darks, while raising the gamma value has the effect of pushing the middle values brighter.  Once again, it still affects the brights and the darks, but at a much lower rate.

Notice that these operations don’t take into account what the data is supposed to mean.  And with new HDR EOTFs, especially with the Perceptual Quantization EOTF, you may find extreme changes across the image with very small values, which is why I recommend adding a roll-off curve as the last adjustment to your HDR grading workflow.

The combination of lift, gamma, and gain allows the colorist to adjust the overall brightness and contrast of the image, with fairly fine granularity and image control.
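To make that concrete, here’s a minimal sketch of how an agnostic engine might apply these three controls to normalized luma values.  The function name, parameter ranges, and ordering are illustrative only - real engines differ in their exact formulas:

```python
import numpy as np

def lift_gamma_gain(y, lift=0.0, gamma=1.0, gain=1.0):
    """Illustrative lift/gamma/gain on normalized values (0.0-1.0).
    Real grading engines differ in the exact math and order of operations."""
    y = np.clip(np.asarray(y, dtype=float) + lift, 0.0, None)  # lift: global offset, dominates the darks
    y = y ** (1.0 / gamma)                                      # gamma: power function, dominates the mids
    return np.clip(y * gain, 0.0, 1.0)                          # gain: global multiply, dominates the brights

# A mid-grey value moves far more with a gamma change than a near-black or near-white one.
print(lift_gamma_gain([0.05, 0.50, 0.95], gamma=1.2))
```

Notice the function happily returns numbers whatever the input represents; nothing here knows or cares which color space or EOTF the values came from.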

Compare these functions of an agnostic engine to their equivalents in a color space dependent engine.  In a color space dependent engine you’re more likely to find only two adjustments for controlling brightness and contrast: brightness and contrast.

The same color transformation operation has different effects on the image.  Here, the same hue adjustment curve is applied in eight different color spaces; note the differing effects on the vectorscope.

The brightness and contrast controls tend to be far more color space dependent, since they’re designed to affect the image in a way that more evenly affects the brightness or contrast along the expected EOTFs.  For the end user, this makes for a far simpler and often faster approach for minor corrections, at the expense of power, precision, and adaptability.  Which hasn’t been too bad of a trade off, so long as all digital video data operated in the same color space.

But adding support for new color spaces and EOTFs to a brightness and contrast operation requires rewriting the rules for how each of the new color spaces behaves as digital RGB values.  That takes time to get right, and is often not done at all.  Meaning that color space dependent engines tend to adapt more slowly to the emerging standards, and there’s no clear path for how to implement the upgrades.

Every color engine, whether we’re talking about a computer application or a chip found in a camera or display, makes assumptions about how to interpret the operations it's instructed to do.  Where the engines lie on the scale from fully color managed to completely color agnostic defines how the operations work, and what effect the ‘same’ color transformation has on the image.

The overall point here is that the same color transformations applied to different color spaces have different effects on the end image.  A hue rotation will accomplish something completely different in Rec. 709 than it will in Rec. 2020; standard gain affects HDR curves in ways that are somewhat unpredictable when compared with SDR curves.  Color engines can either try to compensate for this, or simply assume the user knows what he or she is doing.  And the more assumptions any single operation within an engine makes about the data, the more pronounced the differences if it’s applied to another color space.  These seemingly small differences can create massive problems in today’s color correction and color management workflows.


Understanding the Problem

With that background in mind, let’s explore where these problems come from.

Here’s something that may come as a shock if you haven’t dived into color management before: every camera ‘sees’ the world differently.  We’re not just talking about the effects of color temperature of light or the effects of the lenses (though those are important to keep in mind), but we’re talking about the camera sensors themselves.  All things being equal, different makes and models of camera will ‘see’ the same scene with different RGB values at the sensor.  In inexpensive cameras you may even see variation between individual cameras of the same make and model.

This isn’t a problem, it’s just how cameras work.  Variations in manufacturing process, decisions about which microfilters, microlenses, and OLPF to use, and the design of the sensor circuitry all play a part in changing the raw values the sensor sees.  But to keep things consistent, these unique camera RGB color values are almost always conformed to an output color standard using the camera’s image processor (or by the computer’s RAW interpreter) before you see them.

In the past, all video cameras conformed to analog video color spaces: NTSC/SMPTE-C, PAL, etc., and their early digital successors conformed to the digital equivalent standards: first Rec. 601 and then Rec. 709.

When it comes to conforming camera primaries to standard primaries, manufacturers had two choices: apply the look effects before or after the conforming step.  Applying color transformations before the conforming step often leaves more information available for the change.  But by conforming first to the common color space, color correction operations behave the same way between different camera makes and models.

Most camera manufacturers took a hybrid approach, applying some transformations like gain and white balance before the conforming step, and then applying look effects after the conforming step.  And everything was golden, until the advent of digital cinema cameras.

Digital cinema class cameras started edging out film as the medium of choice for high quality television and feature film production a decade ago, and now vastly outnumber film-first productions.  And here’s where we run into trouble.  Because digital cinema uses a different color space than digital video: DCI-P3.  Oh, and recently the video broadcast standards shifted to a much wider color space, Rec. 2020, to shake off the limiting shackles of the cathode ray tube.

Color space selection suddenly became an important part of the camera selection and workflow process, one that few people talked about.  Right from the get-go the highest end cameras started offering multiple spaces that you could shoot and conform to, one of which was usually camera RGB.  But changing the conforming space means that any corrections or effects added to the image after conforming behave differently than they did before, and many user-generated looks end up being color space dependent.

To fix this, many (but not all) digital cinema camera manufacturers moved the ‘look’ elements of their color processing to before the conforming step.  This way, regardless of which color space you, as the operator, choose, any looks you apply will have the same effect on the final image.

Which is fine in camera, and fine through post production.  Unless your color correcting platform doesn’t understand what the primary color values mean, or if it can’t directly transform the values into your working space.  Then you need to create and add additional conforming elements as correction layers, which can increase the computational complexity and reduce the overall image quality.

Oh, and if you start working with multiple cameras with different look settings available, you can get into trouble almost instantly, since there isn’t usually a simple way of conforming all of them to your working space if it’s not Rec. 709.

Oh, and you may have to deliver to all of the different color spaces: Rec. 2020 for 4K television broadcast, DCI-P3 for your digital cinema delivery, Rec. 709 for HD Blu-ray and traditional broadcast, and Rec. 601 for DVDs.  And for sanity’s sake, let’s add HDR.

And don’t break the bank.


The Solution

What if there was a way to make sure that a) all of your looks would move simply between cameras, regardless of make and manufacturer, and b) you could color grade once and deliver in all formats simply, without needing to manage multiple grades?

There is a way to do it: create a new RGB color space that encompasses all possible color values, and do all of your color corrections there.

Here’s the block diagram:

Camera RGB -> Very Wide Gamut Working Space (Log or Linear) -> Color Correction / Looks -> Tone Map to Standard Space

By mapping all of the camera sensor values into the same log or linear space, with very wide RGB color primaries1, you can make sure that you have access to all of the image data captured by every camera and that all operations will have the same effect on all images.
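Here’s a minimal sketch of that pipeline in code.  The Rec. 2020 matrix below is derived from the published BT.2020 primaries; the camera matrix is a made-up placeholder standing in for whatever your camera manufacturer, RAW SDK, or grading application actually supplies:

```python
import numpy as np

# Linear Rec. 2020 <-> CIE XYZ, from the published BT.2020 primaries (D65 white).
REC2020_TO_XYZ = np.array([[0.6370, 0.1446, 0.1689],
                           [0.2627, 0.6780, 0.0593],
                           [0.0000, 0.0281, 1.0610]])
XYZ_TO_REC2020 = np.linalg.inv(REC2020_TO_XYZ)

# Hypothetical camera-RGB-to-XYZ matrix -- in practice this comes from the
# manufacturer or the RAW decoder, not from this sketch.
CAMERA_TO_XYZ = np.array([[0.52, 0.29, 0.14],
                          [0.24, 0.72, 0.04],
                          [0.00, 0.06, 1.02]])

def conform(camera_rgb):
    """Camera RGB -> CIE XYZ -> linear Rec. 2020 working space."""
    return XYZ_TO_REC2020 @ (CAMERA_TO_XYZ @ camera_rgb)

def grade(rgb, gain=1.1):
    """Stand-in for the colorist's corrections, applied once, in the working space."""
    return rgb * gain

def tone_map(rgb):
    """Stand-in for the final transform into a standard output space."""
    return np.clip(rgb, 0.0, 1.0)

pixel = np.array([0.18, 0.18, 0.18])          # a linear camera RGB value
print(tone_map(grade(conform(pixel))))
```

Swap the camera matrix and the looks stay identical; swap the tone map and the same grade renders out to a different deliverable.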

But what do I mean by a “very wide gamut RGB space”?  There are two types of gamuts I’m talking about here, both of which have advantages and disadvantages.

The first kind is a color space with virtual RGB primaries: the RGB color primaries land outside of the CIE 1931 chromaticity diagram.  Remember that CIE 1931 maps the combinations of various wavelengths of light and the perceivable colors they produce onto an X-Y coordinate plane, and that any color space requires at least three primary color vertices on this chart.  But since the coordinate plane extends beyond the set of real colors, you can put these primaries outside of the actual set of real colors.

By putting the values outside of the visible color ranges you’re defining ‘red’, ‘green’, and ‘blue’ values that simply don’t and can’t exist.  But they end up being quite useful, because when you map your primaries this way you can define an RGB color space that includes all possible color values.  Yes, you could simply use CIE XYZ values to map all colors, but all of the math needed for color manipulations would need to be redefined and rebuilt from the ground up (and it always requires at least 16 bits of precision).  An RGB space with virtual primaries, on the other hand, allows you to use standard RGB math while maintaining as many colors as possible.
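For example, the standard recipe for building an RGB-to-XYZ matrix from primary chromaticities and a white point works unchanged when the primaries are virtual.  Here it is fed the ACES AP0 primaries, whose blue primary sits at a negative y coordinate, outside the CIE diagram (the printed matrix should match the AP0-to-XYZ matrix in the ACES documentation, up to rounding):

```python
import numpy as np

def rgb_to_xyz_matrix(primaries, white):
    """Derive an RGB -> CIE XYZ matrix from the xy chromaticities of the three
    primaries and the white point; the math doesn't care whether the primaries
    are real or virtual."""
    def xyz(x, y):
        return np.array([x / y, 1.0, (1.0 - x - y) / y])
    P = np.column_stack([xyz(*p) for p in primaries])  # unscaled primary columns
    S = np.linalg.solve(P, xyz(*white))                # scale factors that hit the white point
    return P * S                                       # scale each column

AP0_PRIMARIES = [(0.7347, 0.2653), (0.0000, 1.0000), (0.0001, -0.0770)]
ACES_WHITE = (0.32168, 0.33767)
print(rgb_to_xyz_matrix(AP0_PRIMARIES, ACES_WHITE))
```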

Comparison of eight very wide gamut color spaces, five with virtual primaries, three with real (or mostly real) primaries.

Examples of color spaces using virtual RGB primaries include ACES AP0, the color space defined by the Academy of Motion Picture Arts & Sciences, and many manufacturer specific spaces like Sony S-Gamut3 / S-Gamut3.Cine, ARRI Wide Gamut, Canon Cinema Gamut, and the new RedWideGamutRGB found in RED’s IPP2.

The catch with these virtual primaries is that many operations that you as a colorist may be accustomed to doing won’t behave exactly the same way.  The reason is that the RGB balance as it relates to hues and saturations doesn’t quite apply the same way.  Without getting mired in the details, the effects of these operations are related to the relative shape of the triangle produced by the RGB color primaries, and the triangles of color spaces using all virtual primaries tend to differ more from the traditional RGB color spaces than those spaces differ from each other.

So instead, some wide gamut formats use all real, or mostly real, primaries to somewhat match the shape (i.e. the color correction feel) of working with the smaller color gamuts.  A few examples here are Rec. 2020 (called Wide Gamut on 4K televisions), Adobe Wide Gamut, and ACES AP1.  While not covering all possible color values, these spaces cover very large portions of the visible color gamut, making them very useful as color correction working spaces.

Which very wide color space you choose to work in is up to you and your needs.  If your company or workflow requires ACES, use ACES.  If you’re only using one type of camera, such as a RED Weapon or an ARRI Alexa, you may find it beneficial to work in that manufacturer’s RGB space.

For most of the work we do here at Mystery Box that’s destined for anything other than web, I typically conform everything to Rec. 2020 and do my coloring and mastering in that space.  There are a couple of reasons for this:

  1. As a defined color space it uses real, pure-wavelength primaries, meaning that so long as only three color primaries are used for image reproduction, it’s about as wide as we’ll ever go.
  2. It encompasses 100% of Rec. 709 / sRGB and 99.98% of DCI-P3 (only losing a tiny amount of the reds).
  3. It encompasses 99.9% of Pointer’s gamut, a gamut that maps all real-world reflectable colors (not all perceivable colors, just those found in the real world) onto the CIE XYZ space - essentially every color producible through the subtractive primaries.
  4. While it behaves differently than DCI-P3 and Rec. 709, all three behave similarly enough that the learning curve is low.
  5. It requires fewer tone mapping corrections for the final output.

Whether these reasons are convincing enough for you or not is up to you.  Personally, I don’t find the 0.02% of DCI-P3 it doesn’t cover to actually matter, nor does the set of greens and blue-greens it doesn’t store (and that no three color system can produce).  These differences are so small that only in the absolute best side-by-sides in a lab could you hope to see a difference.

Whatever you do choose to use as a working space, it’s worth investing the time to pick one and stick with it.  Since the grading transformations do behave differently in the different color spaces, it’s easiest to pick one and refine your technique there to get the best possible results.


Conversions Implementation

Looking at the generalized workflow block diagram, you’ll want to consider how to implement the different conversion steps for your own productions in order to maintain the highest quality image pipeline with the lowest time and resource costs.  So let’s go into the two main places that you need to make new choices in the pipeline, and how to plan for them.

Conforming Camera RGB to Very Wide Gamut Working Space

Moving from Camera RGB to a very wide gamut space is a slightly different process for each camera system, and can depend on whether you’re capturing RAW data or a compressed video image.

When you’re using RAW formats, you’ll manage this step in the color correction or DIT software, which is the preferred workflow when image quality is paramount.  If you’re recording to a non-RAW intermediate format, like ProRes, DNxHR, or H.264 or any other flavor of MPEG video, you’ll need to select camera settings that best match your target wide gamut space.

Most RAW formats ignore camera looks applied by the operator and store the color decisions as metadata, but most video formats don’t.  Once again, camera settings vary, so it’s important to look at your specific system and run tests to find out where looks are applied in your camera’s image pipeline, and whether you can add a separate look on the video outputs for on-set monitoring while capturing a flattened LOG or linear image.

If your camera can’t separate the looks applied to the video files and the video output, and you want to capture a flat image but need to see it normalized on set, loading LUTs into on-set monitors is the ideal choice for image monitoring.  The process of creating and applying monitoring LUTs varies with your workflow, but we often find ourselves using a two-step process that uses Lattice to generate color space conversion LUTs, which we bring into DaVinci Resolve Studio to add creative looks and generate the final monitoring LUT.
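If you ever need to build such a LUT outside of those tools, the .cube format they exchange is plain text and simple to write.  The transform below is only a placeholder for illustration - it is not a real camera log conversion, which your camera vendor or grading software would supply:

```python
import numpy as np

def write_cube(path, transform, size=33):
    """Write a 3D LUT in the .cube text format (red index varies fastest)."""
    grid = np.linspace(0.0, 1.0, size)
    with open(path, "w") as f:
        f.write('TITLE "monitor preview"\n')
        f.write(f"LUT_3D_SIZE {size}\n")
        for b in grid:
            for g in grid:
                for r in grid:
                    out = np.clip(transform(np.array([r, g, b])), 0.0, 1.0)
                    f.write("{:.6f} {:.6f} {:.6f}\n".format(*out))

def placeholder_transform(rgb, gamma=2.4):
    """Stand-in for a real log-to-display conversion."""
    return rgb ** (1.0 / gamma)

write_cube("monitor_preview.cube", placeholder_transform)
```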

Some cameras or DIT applications export their look settings as CDLs, LUTs, or other metadata for you to use later in the grading process, which you can then apply in post as the starting point for the grading process.  Again, workflows vary.

Generally you’ll want to move directly from camera RGB into the working space to preserve as much sensor information as possible.  That implies that you need to decide what your working space will be before capture (ACES AP0 or Rec. 2020 are recommended for the broadest future compatibility), though that’s sometimes not an option.  While RAWs maintain camera primaries and allow you to jump directly into a wide working space later, if you’re forced to conform to standardized RGB for video intermediates you’ll need to make that decision as early as possible.  In that case, put your footage into the widest color space available on the camera, whether that’s Rec. 2020, DCI-P3, or the manufacturer’s proprietary wide gamut space.

If RAWs aren’t an option, using a 12 bit log format video is your next best choice.  10 bit is fine too, but you won’t get as clean of corrections later, and may see some banding in fine gradients.  Anything less than 10 bits per channel creates severe problems when color grading and really should only be used as a last resort.  When recording to an 8 bit format, you should only use a standard SDR EOTF (never LOG) - LOG with only 8 bits of precision can create MASSIVE amounts of banding.
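A quick way to see why the bit depth matters: count how many distinct code values a subtle gradient can use at each depth.  This is a rough illustration; log encoding compounds the problem by spending fewer codes per stop in exactly the ranges you grade hardest:

```python
def codes_in_gradient(bit_depth, lo=0.40, hi=0.50):
    """Distinct code values available across a subtle gradient spanning
    10% of the signal range."""
    levels = 2 ** bit_depth - 1
    return int(round(hi * levels)) - int(round(lo * levels)) + 1

for bits in (8, 10, 12):
    print(f"{bits} bit: {codes_in_gradient(bits)} codes across a 10% gradient")
# 8 bit has roughly a quarter of the codes of 10 bit, and a sixteenth of 12 bit,
# across the same gradient -- which is where visible banding comes from.
```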

To summarize: To maintain the highest image quality with the smallest resource pain, use RAW formats when possible, convert to the working or widest color space if you have to record as video files, and use LUTs on display outputs to avoid baking camera looks into the video data.

Tone Mapping the Working Space to Output

Moving from a wide working space to a final deliverable space is generally a relatively simple process: simply convert each color value from the working space to the equivalent color value in the target space, and discard any data that lands outside of the target range.  In most Rec. 2020 -> DCI-P3, or Rec. 2020 -> Rec. 709 conversions, this is completely fine.  You may find minor clipping in a few of the most saturated colors, but overall you shouldn’t see too many places where the color is so bad you can’t live with it.
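In code, that straight conversion is just a matrix and a clip.  The matrices below are derived from the published Rec. 709 and Rec. 2020 primaries, working in linear light; anything the smaller gamut can’t represent shows up as a negative (or greater-than-one) component and gets discarded:

```python
import numpy as np

REC709_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                          [0.2126, 0.7152, 0.0722],
                          [0.0193, 0.1192, 0.9505]])
REC2020_TO_XYZ = np.array([[0.6370, 0.1446, 0.1689],
                           [0.2627, 0.6780, 0.0593],
                           [0.0000, 0.0281, 1.0610]])
REC2020_TO_REC709 = np.linalg.inv(REC709_TO_XYZ) @ REC2020_TO_XYZ

def convert_and_clip(rgb_2020):
    """Straight colorimetric conversion, then discard what Rec. 709 can't hold."""
    return np.clip(REC2020_TO_REC709 @ rgb_2020, 0.0, 1.0)

green_2020 = np.array([0.0, 1.0, 0.0])     # a pure Rec. 2020 green, linear light
print(REC2020_TO_REC709 @ green_2020)      # negative red and blue: out of gamut
print(convert_and_clip(green_2020))        # clipped to the Rec. 709 boundary
```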

Where you do run into problems is when you’ve graded using an HDR transfer function and are moving into SDR.  A straight translation here results in very, very large amounts of clipping.  I haven’t mentioned EOTFs much yet, simply because most color engines where you’ll be doing wide gamut work use linear internals, since that tends to offer the most dynamic range and manipulation potential.

However, displays rarely offer a linear EOTF and so you’ll have to be monitoring in some transfer function or another.  Display monitoring is another reason I typically grade in BT.2020 (and usually in HDR), since displays need to be set to a specific color space and EOTF.  Which means that if you’re using a very wide working space, you must apply a tone map to your monitoring output, regardless of whether you’re grading in HDR or SDR (especially when you’re working in linear light).

The first series we published here on the blog about HDR video included a section on “Grading, Mastering, and Delivering HDR”, where we presented a few bezier curves you can apply as the last element in your node structure for HDR grading in PQ or HLG.  These bezier curves are essentially luminance tone maps, converting the linear light values into the specific range of digital values you use for HDR.

A full tone map typically includes considerations for converting color information as well.  Just like the bezier curves control the roll off of the lights into your target range, tone maps roll off color values between color spaces, to minimize the amount of hard clipping.  Here it’s important to exercise caution and experiment with your specific needs before selecting a tone map, since this step can create hue or saturation shifts you don’t expect.
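To illustrate just the luminance half of that idea, here’s a generic Reinhard-style shoulder - not the bezier curves from our earlier post, and with no handling of the color-volume side of a full tone map:

```python
import numpy as np

def rolloff(x, knee=0.75):
    """Identity below the knee; above it, values compress smoothly toward 1.0
    instead of hard clipping."""
    x = np.asarray(x, dtype=float)
    t = np.maximum(x - knee, 0.0) / (1.0 - knee)
    shoulder = knee + (1.0 - knee) * t / (1.0 + t)
    return np.where(x <= knee, x, shoulder)

print(rolloff([0.5, 0.9, 1.5, 3.0]))   # over-range values roll off rather than clip
```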

Tone mapping is the golden goose of simplifying multi-space color corrections.  It’s what brings everything together by making it possible to very, very quickly move from your working space into your delivery space.

If you grade in ACES AP0 or AP1, the tone maps are already prepared for your conversions.  Simply apply the tone map for the target system and voila, the conversion’s ready for rendering, preserving all (or rather most - they aren’t quite perfect) of the authorial intent of the grade.  We did this on our Yellowstone video to generate the HDR master.2

Grading in other wide color spaces often requires custom tone maps, or on-the-fly maps generated by a program such as DaVinci Resolve Studio.  RED Digital Cinema, for instance, has produced LUT based tone maps for converting their RedWideGamutRGB Log3G10 footage into various HDR and SDR color spaces.  The entire Dolby Vision format is essentially a shot by shot set of tone maps for various screen brightnesses.

Or, you may find yourself doing what we’ve done - spending the time to create your own tone mapped LUTs for converting between various HDR and SDR formats, and refining these maps for each individual piece of HDR content so that you end up with the optimal SDR look for that work.


Why Bother?

Wide color space corrections and tone mapping for various output systems is the way that color correction will be treated and handled in the future.  With the arrival of BT. 2020 and HDR transforms, in just the last few years the number of delivery color encoding formats has increased at least threefold.  The only way to ensure your content will be compatible in the future is to adopt the new paradigm and multi-space coloring workflow.

DaVinci Resolve Studio’s latest update (release of version 14) saw a significant overhaul of the color management engine in the last few beta versions to optimize the core functionality for this kind of color management workflow.  If you’re using DaVinci Color Management or ACES color management in the latest version, DaVinci will automatically select the optimal RAW interpretation of your footage and conform it to your working space, removing the ambiguity of how to interpret your footage and maintaining the maximum image quality.

Another manufacturer who’s natively implemented a similar color pipeline is RED, with their new IPP2 color workflow.  They’ve moved all of their in-camera looks so that they’re applied to the image data after the sensor RGB is converted into RedWideGamutRGB, tone mapping all outputs to your monitoring space.  With that they now allow you to select whether color adjustments in camera are burned into the ProRes files, or simply attached as a LUT or CDL.  This way, regardless of what your monitoring or eventual mastering space is, the color changes you make in camera will have the exact same effect across the board.

This is the workflow of the future.  Like with HDR, which we can assume will be the EOTF of the future, the efficiencies and simplicities of this particular workflow are so great that the sooner you get on board with it the better your position will be in a few years’ time.  Grading in ACES AP0 offers a level of future proofing that not even BT.2020 provides.  While BT.2020 still exceeds what current technologies can really do, ACES AP0 ensures that regardless of where color science heads in the future (4+ color primaries?), your footage will already be in a common format that’s simple to convert to the new standard, preserving all color data.

While there is a learning curve to this workflow, at a technical level it’s simpler to learn and apply than even understanding how HDR video works.  Yes, it takes some getting used to, but it’s worth learning.  Because in the end, you’ll find better quality than you can otherwise hope for.
 

Footnotes:

1 Yes, I’m making up the term “Very Wide Gamut” or “Very Wide Gamut RGB” simply because “Wide Gamut” and “Wide Gamut RGB” can refer to many different specific spaces, depending on the circumstances. Here I’m referring to any of these typical wide gamut spaces, or any space that covers a very large portion of the perceivable gamut.

2 A cautionary note about ACES tone mapping: We used ACES AP0 with an ACEScc EOTF for our Yellowstone video. The tone mapping into HDR was fantastic and allowed me to skip my own range limiting map, and the ability to select different input transforms for each shot was fantastic. However, ACES failed when trying to generate an SDR version of the film: instead of tone mapping the higher dynamic range into the smaller SDR range, it clipped at the limits of SDR. This limitation makes me hesitant about recommending ACES for mixed-dynamic-range work. It works wonderfully for one or the other, but don't expect it to tone map directly between the two.

RED MONSTRO VV SENSOR Thoughts

First off, huge thanks to everyone at Red for getting me this camera so quickly. Like many of you, we’ve been holding out and waiting for VistaVision for a long time, and now the wait seems to be coming to an end. 

I’ve only had the sensor for less than a day but I’m EXTREMELY IMPRESSED! I haven’t done side by sides with the Helium but I would say the sensor is just as clean, and like many others who have been shooting VistaVision, it’s pretty addicting. 

The following are my non-techie, non-polished, thoughts on this sensor and what it means to the industry. 

  1. 5K S35. This might be Red’s first true Arri competition. Finally, a super clean sensor with amazing highlight roll off that shoots at 5K! I love the resolution and the flexibility that it gives me, but clients/agencies/post houses aren’t always the biggest fans. No matter how many conversations I have about the benefits of R3D compression, its data rates and so on, we eventually have to bend over and give them what they want — and they love 4K ProRes 4444. Now, thanks to processor and GPU advancements, the most common video editing stations can handle 5K R3D just as well, if not better than 4K ProRes 4444 when you throw it in a timeline - but you get the flexibility of RAW. While I loved the Dragon sensor and got to know it really well, it had its limitations, which the Helium, and now Monstro sensor, have addressed. Now that you can get S35 5K from Monstro I can’t see a reason to shoot ProRes 4444 anymore. (That is still for the client to decide, and I would still love it if Red allowed for 4K ProRes 4444 only recording in-camera for those “back-up and walk away” clients.) Besides just being 1K above a 4K deliverable, which is ideal for fast turn, non-future-proof productions, 5K also offers better rolling shutter performance compared to 8K or even 7K with the Helium, making it great for car work. 
  2. High-Speed. 2K looks amazing on this sensor!!!! It’s super, super, super clean and usable even for a 2K finish. I have to do more compression tests, but just looking through the monitor at 2K 300fps looks amazing. Which means for commercial work with 1080P finishes you should be pretty safe to shoot! Bear in mind that I have been using Zeiss Otus lenses, and your lenses are going to play a big role, but I am really excited about this! And 4K 120P looks amazing as well! When I have more time next week I’ll do some true compression tests and post some R3D files, but yeah, this sensor’s low noise floor opens up a lot of possibilities. 
  3. 8K VV vs. 8K S35. For the first three hours after turning the Monstro on I was convinced that I would upgrade all of my cameras to Monstro! Just because it offers so much more flexibility and speed at lower resolutions, which is especially useful for commercial workflows, and the VistaVision field of view is just so, so addicting. However, having both on hand is going to be a must for me. While I shoot a lot of commercials that only have a 13 week life span and don’t really need future proofing, the majority of my work does! 8K S35 allows me to capture 8K/7K with a wide variety of vintage and new lenses and gives me the needed crop factor for shooting wildlife on long lenses. 8K VV gives me 8K with a field of view that is just breathtaking! While people will always argue just use a wider lens with S35, I’ll say there is nothing like shooting VV! 

Anyways, I’ll post more information and include more techie stuff (nothing compared to Phil’s knowledge) in this thread later as I have more time to test. But honestly, at the end of the day, I am a shooter, so I’ll be shooting with this camera in the field a lot more than shooting charts and doing side by side comparison tests. 

- Jacob Schwarz

Display Calibration & Color Management

There are many different ways for consumers to experience your content today - so many that it’s often difficult to predict exactly where and how it’ll be seen.  Is it going to a theater?  Will it be watched on a television?  Which of the many handheld devices or personal computers will an end consumer use to view and listen to your work?  And how is that device or environment going to be rendering the colors?

Color management is an important consideration for every modern digital content production company to keep in the forefront of their minds.  In larger post production environments, there will often be a dedicated team that manages the preservation of color accuracy across the many screens and displays found throughout the facility.  But for small companies and independent producers, the burden of color management often falls on an individual with multiple roles, and it’s easier to ignore it and hope for the best than to spend the time and money to make sure it’s done right.

Before going any further, it’s important to define what we’re talking about when we say ‘color management.’  Color management is different than color correction or color grading, which is the process of normalizing colors and contrasts, maintaining color consistency across edits, and applying creative looks to your footage.  Instead, color management is about making sure the colors you see on your screens match, as closely as possible, what the digital values stored in your video files are actually describing, within the color space you’re using.

In practice this means making sure that your displays, televisions, projectors, or other screens, as well as your lighting environment, are all calibrated so that their RGB balance, brightness, and contrast all match as close to the target standard as you can get them.  This makes sure that you don’t accidentally add corrections to your digital data when you’re trying to ‘fix’ something you see that’s only there because of your displays or environment.  “Burning in” these kinds of shifts adversely affects the quality of your content by creating perceptual color shifts for your clients and consumers.

While calibration is essential, color management also involves ensuring the preservation of color from camera to end user display, keeping the color consistent between programs and ensuring your final deliverables contain the appropriate metadata.  Both parts of color management are essential, so we’re going to talk about both.  We’ll focus more on the calibration aspect of color management since that’s essential to get right, before briefly addressing color management in applications without getting mired too deep in advanced technical talk.


The problem

How do I know that my red is the same as your red?

This is one of the fundamental philosophical questions of color perception.  How do I know that the way that I perceive red is the same as the way that you perceive red, and not how you perceive blue or green?  There’s actually no way to measure or determine for certain that the perceived shades are identical in the minds of any two individuals, since color perception happens as the brain interprets the stimulus it receives from the eyes.

While it’s a fun (or maddening) thought-provoking question, color sameness is actually a really important baseline to establish in science and imaging.  In this case we’re not asking about the perception of color, but whether the actual shade of color produced or recorded by two devices is the same.  Today we’re only going to focus on colors being produced, and not recorded - we’ll cover capturing colors accurately in our next post.

There are a LOT of different kinds of displays in the world - from the ones we find on our mobile devices, to computer displays, televisions, and consumer or professional projectors.  The core technologies used to create or display images, such as plasma, LCD, OLEDs, etc., all render shades of color in slightly different ways, leading to differences in how colors within images look between displays.

But it’s not just the core technology used that affects the color rendition. Other factors like the age of the display, the specific implementation of the core technology (like edge-lit or backlit LCDs), the manufacturing tolerances for the specific class of display, the viewing angle, and the ambient environment all affect the colors produced or the colors perceived.  Which makes it almost impossible to predict the accuracy of color perception and rendering for one viewer, let alone the thousands or millions who are going to see your work.

But rather than throw up your hands in despair at the impossibility of the task, shift your focus to what you, as the content creator, can do: if you can be reasonably sure that what you see in your facility is as close to what’s actually being encoded as possible, you can be confident that your end viewers will not be seeing something horrifying.  While every end viewer’s experience will be different, at the very least your content will be consistent for them - it will shift in exactly the same way as everyone else’s content, a shift they’re already used to, even if they don’t know it.

For that reason it’s important that when you master your work you’re viewing it in an environment and with a display that’s as close to perfectly accurate as possible.  But unfortunately, color calibration isn’t something you can simply ‘set and forget’: it needs to be done on a recurring schedule, especially with inexpensive displays.


What is Color Calibration?

How do we make sure color looks or is measured the same everywhere?

This question was first ‘answered’ in 1931 with the creation of the CIE XYZ color space.  Based on the results of a series of tests that measured the sensitivity of human vision to various colors, the CIE created a reference chart that mapped how the brain perceives combinations of visible wavelengths as colors onto a Cartesian plane (X-Y graph).  This is called the CIE 1931 Chromaticity Diagram.

Three different color spaces referenced on the CIE 1931 Chromaticity diagram.  The colors within each triangle represent the colors that can be produced by those three color primaries.  All three share the same white point (D65).

This chart allows color scientists to assign a number value to all perceivable colors, both those that exist as a pure wavelength of light, and those that exist as a combination of wavelengths.  Every color you can see has a set of CIE 1931 coordinates to define its chromaticity (combined hue & saturation, ignoring brightness), which means that while we may not have an answer to a philosophical question of individual color experience, we do have a way of scientifically determining that my red is the same as your red.

This standard reference for colors is a powerful tool, and we can use it to define color spaces.  A color space is the formal name for all of the colors a device can capture or produce using a limited set of primary colors.  Map the primary colors onto the chromaticity diagram, join them as a geometric shape, and your device can create or capture any color within the enclosed shape.  With an accompanying white point, you have the fundamental ingredients for a defined color space, like Rec. 709, sRGB, AdobeRGB, etc.
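That “enclosed shape” idea is easy to state mathematically: ignoring brightness, a chromaticity is reproducible by a three-primary device exactly when it falls inside the triangle its primaries form on the xy plane.  A small sketch, using the Rec. 709 primaries:

```python
def inside_gamut(xy, primaries):
    """Barycentric point-in-triangle test on the CIE 1931 xy plane."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = xy, *primaries
    d = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / d
    b = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / d
    return min(a, b, 1.0 - a - b) >= 0.0

REC709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
print(inside_gamut((0.3127, 0.3290), REC709))   # D65 white point: inside
print(inside_gamut((0.170, 0.797), REC709))     # the BT.2020 green primary: outside
```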

Defining and adhering to color spaces is actually quite important to managing and matching end to end color.  Digital RGB values have no meaning without knowing which of the many possible shades of red, green, or blue color primaries are actually being used.  Interpreting digital data using different RGB primaries than the original creator used almost always results in nonlinear hue shifts throughout the image.

This is where color calibration comes in.  Color calibration is the process whereby a technician reads the actual color values produced by a display, then either adjusts the display’s settings to conform more closely to the target color space, or adjusts the signal going to the display to better match the targeted output values (or both).

To do this, you need access to four things:

  1. A signal generator to send the display specific digital values
  2. A colorimeter to measure the actual colors produced
  3. Control of the display’s input signal or color balance settings to adjust the output
  4. Software to manage the whole process and correlate the signal to measurement

If you want to make sure you’re doing it right, though, an in-depth understanding of how color and every image generation technology works helps a lot too.

Some consumer, most prosumer, and almost all professional displays leave the factory calibrated, though consumer and commercial televisions and almost all projectors must be calibrated after installation, for reasons we’ll talk about later.  Unfortunately, displays lose their calibration with time, and each kind and quality of display will start showing more or less variance as they age.  Which means that in circumstances where calibration is important, such as in professional video applications, displays require regular recalibration.

For desktop displays, this usually involves creating or updating the ICC color profile, while for reference displays this typically involves adjusting the color balance controls so that the display itself better matches the target color space.

The difference in calibration technique comes from the workflow paradigm.  For desktop displays it’s assumed that the host computer will be directly attached to any number of different kinds of displays, each with their own color characteristics, at any given time - but always directly attached.  So, to simplify the end user experience, the operating system handles color management of attached displays through ICC profiles.

ICC profiles are data files that define how a display produces colors.  Each profile records the CIE XYZ values of the display’s RGB color primaries, white point, and black point, and its RGB tone curves, among some other metadata.

Using this information, the operating system “shapes” the digital signal sent to the display, converting on the fly the RGB values from the color space embedded in an image or video file into the display’s RGB space.  It does this for all applications, and essentially under all circumstances.  Some professional programs do bypass the internal color management, sort of, by assigning all images they decode or create to use the generic RGB profile (i.e. an undefined RGB color space). But it’s usually best to assume that for all displays directly attached to the computer, the operating system is applying some form of color management to what you’re seeing1.

Calibrating direct attached displays is relatively quick and easy.  The signal generator bypasses the operating system’s internal color management and produces a sequence of colored patches, which the colorimeter reads to map the display’s color output.  The software then generates an ICC color profile for that specific display, which compensates for color shifting from wear and tear, or the individual factory variances the display has.

Once calibrated, you can be reasonably confident that when viewing content, you’ll be seeing the content as close to intended as that particular display allows.

Reference displays, projectors, and televisions have a slightly different paradigm for calibration.  For calibrating computer displays, you can shape the signal to match the display characteristics.  But because of the assumption that a single video signal will (or at the very least can) go to multiple displays or signal analysis hardware at the same time, and because the signal generator is likely to have no information about the attached devices, it’s simply not practical to adjust the output signal.  Rather, professional output hardware always transmits its signals as pure RGB or YCbCr values, without worrying about the details of color space or managing color at all.

So instead of calibrating the signal, calibration of reference displays, projectors, or any kind of television usually requires adjusting the device itself.2

Once again, a signal generator creates specific color patches the colorimeter reads to see exactly what values the display creates.  Software then calculates the color’s offset as a Delta E value (how far away is the color produced from where it’s supposed to be according to the selected standard) and reports to the operator how far away from calibration it is.

The operator then goes through a set of trial and error adjustments to the image to lower the Delta E values of all the colors to get the best image possible.  Tweak the ‘red’ gain and see how that affects the colors produced.  Touch the contrast and see its effect on the overall image gamma - and on all the other colors.  Measure, tweak, measure, tweak, measure, tweak… and repeat, until the hardware is as close to the target color space as possible.
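The number the software reports is easy to compute once the colorimeter’s readings are converted to CIELAB.  The simplest flavor, CIE76, is just the straight-line distance between the measured and target colors; calibration packages usually report CIE94 or CIEDE2000, which weight the terms more perceptually, but the idea is the same:

```python
import math

def delta_e_76(lab_measured, lab_target):
    """CIE76 color difference: Euclidean distance in CIELAB (L*, a*, b*)."""
    return math.dist(lab_measured, lab_target)

# A measured grey patch vs. its target (illustrative numbers):
print(delta_e_76((53.2, 1.8, -2.4), (53.4, 0.0, 0.0)))   # ~3.0: visible, but close
```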

Calibration results showing DeltaE values for greyscale and color points

Generally, Delta E values less than 5 are good, less than 3 are almost imperceptible, and under 2 is considered accurate.  Once the calibration is complete, you can be reasonably sure that what you’re seeing on a reference display, projector, or television is as close to the target color space as possible.  But does that even matter?


Regular Calibration

Medium priced computer displays and professional reference displays usually leave the factory with a default calibration that puts them as close to standard reference as the technology allows.  The same is not true of most televisions and projectors - they leave the factory uncalibrated, or are in an uncalibrated mode by default for a couple of reasons which we’re not going to get into.

But even with this initial factory calibration for the displays that have it, the longer a display’s been used the more likely it is to experience color shifts.  How quickly it loses calibration depends on the technology in use: some technologies can lose their calibration in as little as a month with daily use.

The reasons behind this shift over time can be lumped together as “wear and tear”.  The exact reasons for each different display technology losing its calibration are a little off topic, so I’m going to spare you the gory details of the exact mechanisms that cause the degradations.  However, the important things to know are:

  1. The backlight of LCDs and the main bulb in digital projectors change color over time.  This is a major problem with the xenon arc lamps found in projectors, and is a bigger problem for CCFL LCDs than for LED lit (white or RGB) LCDs, but even LED spectra shift with use.
  2. The phosphors inside of CRTs and plasma displays degrade with time and their colors change, as do the primary color filters on LCD displays, though at a slower pace.
  3. Anything using liquid crystals (LCD displays and LC or LCoS projectors) can suffer from degradation of the liquid crystal, which affects color and brightness contrasts.
  4. The spectrum of light emitted by plasma cells changes with age, so they don’t stay balanced for the same output levels.

Or in other words, all displays change colors over time.  Setting up a regular calibration schedule for every display that you look at your content on is an important part of color management.  You don’t want to move a project from your reference display to your desktop to find that suddenly the entire video appears to be pulling magenta, or invite a client to review your work in your conference room to find the picture washed out or color shifted.


Environment and Color Management

Up until now we’ve been talking about the color characteristics of your displays and projectors.  But just as important as your display calibration are the characteristics of your environment in general.  The brightness level and color of lights in the room affect perceptions of contrast and the colors within the image.

This is really easy to get wrong.  Because not only does the display need to be calibrated for the target color space, it should be calibrated within the target environment.  The technician handling the calibration will usually make a judgement call for changing display values like display brightness, gamma curve, or white point based on these environmental choices.  But they may also make other recommendations about the environment to improve the perception of color on the screen - what to do to other displays, lighting, windows etc., so that your perception of color will better match industry standards.

Generally speaking, reference environments should be kept dim (not pitch black), using tungsten balanced lighting that’s as close to full spectrum as possible.  Avoid daylight balanced bulbs, and install blackout curtains on any windows.  Where possible, keep lighting above and pointed away from the workstation screens - reflected light is better than direct lighting, since it reduces glare and is better for color perception.

The easiest way to get proper lighting is to set up track lighting with dimmable bulbs (LED or tungsten based, colored between 2800K & 3200K), and point the pots slightly away from the workstation.  The dimmer ensures that you can bring the environment into specification for grading, but can then bring the lighting back up to normal ambient conditions for general work or for installing hardware etc.  If changing the overhead lighting isn’t an option, good alternatives are stick lights on the opposite side of the room, positioned at standing height.

Keep your reference display or projector as the brightest screen in the environment.  If you don’t, your brights will look washed out and gray since they’re dimmer than other light sources.  It will also affect your overall perception of contrast: you’ll perceive the image as darker and having more contrast than expected, and are therefore more likely to push up the mids and darks and wash out the image as a whole.  Dimming the brightness of interface displays, scopes, phones or tablets, and any other screen within the room will make sure that you’re perceiving the image on your reference hardware as accurately as possible.

Depending on the number of interface displays and other light sources in the room, you may need to further lower ambient lighting to keep contrast perception as accurate as possible.  In rare cases, such as in small rooms, this may include turning the lights off completely, since the interface displays provide sufficient ambient lighting for the environment.

Calibrating your displays is essential; calibrating the environment is important.  Usually it’s pretty easy to tweak environmental calibration for better color perception, so long as you’re starting from a dark or otherwise light controlled environment.  And unlike display calibration it’s something you can do once and not need to tweak for years.


Application Color Management

Once you’ve calibrated all of your hardware and your environment, it’s easy to assume that your job is done, and you don’t have to worry about color management until the next time you book a calibration session.  Oh how I wish that were the case.

Different applications manage color in different ways, which means you may still see differences between applications with the same footage.  Sometimes applications get in fights with the operating system over who’s managing color and both end up applying transformations you’re not aware of.

Which means it’s important to understand exactly how each application touches color.  To do that, let’s briefly look at how four common applications manage color: Adobe Premiere, Final Cut Pro X, Adobe After Effects, and DaVinci Resolve.

Both Adobe Premiere and Final Cut Pro X actively manage the colors within the project.  Adobe Premiere gives you exactly no ways of changing the color interpretation of the input files, beyond the embedded metadata in HEVC and a few other formats (NOT Apple ProRes).  It conforms everything to Rec. 709 in your viewers and signal outputs, and there’s no way to override this.  The operating system then uses the display’s ICC profile to conform the output so that you can see it as close to Rec. 709 as possible.  Which is good, because it means that when you output the video file, what you see is what you get.

Adobe Premiere’s color engine processes colors in 8 bit.  You can turn on 16 bit color processing in the output or in the sequence settings by flagging on “Maximum Bit Depth” and “Maximum Render Quality.”  This is really important when using high bit depth formats like Apple ProRes, which stores 10 or 12 bit image data, assuming you want to maintain high color fidelity with your output files.  If you’re outputting to 8 bit formats for delivery you may still benefit from keeping these flags on, however, depending on how in depth your color corrections and gradients are.

Basically, Adobe Premiere assumes you know nothing about color management, and that it should handle everything for you.  Not a terrible assumption, just something to be aware of when you start thinking about managing color yourself.

Like Adobe Premiere, Final Cut Pro X also handles all of the color management, but offers at least a small amount of control over input and output settings.  By default, it processes colors at a higher internal bit depth than Premiere, and in linear color which offers smoother gradients and generally gives better results.  You also get to assign a working color space to your library and your project (sequence), though your only options are Rec. 709 and Wide Color Gamut (Rec. 2020).

Each clip by default is interpreted as belonging to the color space identified in its metadata, and conformed to the output color space selected by the project (sequence).  If necessary, you can override the color space interpretation of each video clip by assigning it to either Rec. 601 (NTSC or PAL), Rec. 709, or Rec. 2020 (notably missing are DCI-P3 and the HDR curves).  When using professional video outs, the signal’s data levels are managed by the selection of Rec. 709 or Rec. 2020, and FCP-X handles everything else.  Like Adobe Premiere, it works with the operating system to conform the video displayed in the interface to the attached monitor’s ICC profile.

Both Adobe Premiere and FCP-X work on a “what you see is what you get” philosophy.  If your interface display is calibrated and using the proper ICC profile, you shouldn’t have to touch anything, ever.  It just works.  But gods Adobe and Apple forbid you try to make it do something else.

On the other hand, Adobe After Effects and DaVinci Resolve have highly flexible, color space agnostic color engines that allow you to nearly completely ignore all color management.  They’re quite content to simply apply the transformations you’ve requested to the digital data read in, and to not care about what color space or contrast curve the digital data is in.  And when you output, they simply write the RGB data back to a file and you’re good to go.

Of course, that’s the theory.  After Effects makes a few color assumptions under the hood about intent, including ignoring the display ICC profile on output, since it has no idea what color space you’re working in anyway.  That sounds innocuous, but it’s a problem if you’re using a display with properties that are mismatched to the color profile of the footage you’re using3.  Suddenly your output, with an embedded color profile and playing back in a color managed application, may look significantly different than it did in After Effects.

Turning on After Effects’ color management by assigning a project working space allows for a more accurate viewing of the final output.  You can then flag on the view option to “Use Display Color Management” (on by default), and adjust the input space of any RGB footage.  But you can still get into trouble: any chroma subsampled footage, like ProRes 422 or H.264, is only permitted to use the embedded color profile.  Also, Adobe ignores ProRes metadata for Rec. 2020 and HDR, which will negatively affect the output when using color management.  It also exhibits strange behavior when using HDR gamma curves and in some other working spaces.

DaVinci Resolve has some of the best functionality for color management.  Its agnostic color engine renders color transformations in 32 bit float precision, and outputs raw RGB data to your video out.  It assumes you know what color space you’re using, so it’s happy to ignore everything else.  By default, on a Mac it applies the monitor ICC profile to the interface viewers, with the assumption that your input footage is Rec. 7094.

Fortunately, changing the working space is incredibly easy, even without color management turned on - simply set the color primaries and EOTF in the Color Management tab of the project settings.  With color management off, this will only affect the interface display viewers, and then only if the flag “Use Mac Display Color Profile for Viewers” is set (on by default, MacOS only).  Unfortunately it does not as of yet apply ICC profiles to the viewers under Windows (see footnote 4).

When you turn DaVinci Resolve’s color management on, you have extremely fine grained control over color space - being able to set the input, working, and output color spaces and gammas separately (with Resolve managing the transformations on the fly), and then being able to bypass or override the input color space and gamma on a clip by clip basis in the color correction workspace.  And because of its 32 bit floating point internals, its conversions work really well, preserving “out of range” data between nodes and between steps in the color management process, allowing the operator to rein it in and make adjustments to the image at later steps - an advantage of active color management over LUTs in a few cases.
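A toy illustration of why the floating point internals matter: a value pushed out of range by one node survives to the next in float, while an integer pipeline destroys it the moment it clips.  The numbers are contrived, the principle isn’t:

```python
# Float pipeline: gain up past 1.0, then gain back down -- nothing is lost.
value = 0.9
print(value * 2.0 * 0.5)               # 0.9

# 8 bit integer pipeline: the same two nodes, but the first one clips.
code = 230                             # roughly 0.9 as an 8 bit code value
code = min(round(code * 2.0), 255)     # clips to 255 here
code = round(code * 0.5)               # 128, not 230
print(code / 255)                      # ~0.5 -- the highlight detail is gone
```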

How each application handles input interpretation, internal processing, output, and display:

Adobe Premiere
  Input: Assumes embedded metadata or Rec. 709; cannot be changed
  Processing: 8 bit, Rec. 709 with gamma 2.4 assumed; 16 bit and linear color processing possible
  Output: Rec. 709 on all outputs
  Display: Output conformed to the display using its ICC profile

Final Cut Pro X
  Input: Assumes embedded metadata or Rec. 709; overridable to Rec. 2020
  Processing: 10-12 bit, Rec. 709 or Rec. 2020 (configured by library) with gamma 2.4
  Output: Rec. 709 or Rec. 2020 on all outputs (configured by project)
  Display: Output conformed to the display using its ICC profile

Adobe After Effects
  Input: Assumes embedded metadata or Rec. 709; ignored by default; reassignable for RGB formats but fixed interpretation of YCbCr
  Processing: 8 or 16 bit integer or 32 bit float agnostic color engine; working space assignable on a project basis, with many fixed working spaces available
  Output: RGB output in the working space or generic RGB
  Display: Color space and calibration defined by the display (pro out); output conformed to directly attached displays using their ICC profiles when a working space is assigned

DaVinci Resolve Studio
  Input: Ignored by default; globally assignable, with per-clip overrides, to nearly any color space
  Processing: 32 bit floating point agnostic color engine; working space assignable on a project basis, with many combinations of independently assignable color primaries and EOTFs
  Output: RGB output in the working space, an assignable output space, or generic RGB
  Display: Color space and calibration defined by the display (pro out); output conformed to directly attached displays using their ICC profiles when a working space is assigned; LUTs available for pro output calibration

These four programs form a useful scale for understanding application color management.  Generally speaking, the easier an application is to set up and use, the more hands-off its management is likely to be, giving you anywhere from no control to very limited control over color management.  More advanced programs usually offer more in depth color management features, or the ability to bypass color management completely, so that you have the finesse you need.  They also tend to preserve RGB data internally (and output that RGB data through professional video output cards), but they require more knowledge of color spaces and the use of calibrated devices.

Calibrating your displays is a significant portion of the color management battle, though it’s also necessary to understand exactly what the applications are doing to the color if you want to be able to trust that what you’re seeing on the screen is reasonably close to what will be delivered to a client or to the end user.


What A Fine Mess We’re In

Keeping displays and projectors calibrated and trusting their accuracy has always been a concern, but it’s become a major issue as the falling cost of video technology has made the equipment more accessible, and as both the video and film production industries have shifted to fully digital production.

“Back in the day”, analog video displays relied on color emissive phosphors for their primary colors.  The ‘color primaries’ of NTSC and PAL (and SECAM) weren’t based on the X-Y coordinates on the CIE XYZ 1931 diagram, but on the specific phosphors used in the CRT displays that emitted red, green, and blue light.  They weren’t officially defined with respect to the CIE 1931 standards until Recommendation BT.709 for High Definition Television Systems (Rec. 709) in 1990.

Around that time, with the introduction of liquid crystal displays, computer displays also had to start defining colors more accurately.  They adopted the sRGB color space in the mid to late nineties, using the same primaries as Rec. 709 but with a different data range and more flexible gamma control.  Naturally, both of these standards based their color primaries on… the CRT phosphors used in NTSC and PAL television systems.  And while phosphors degrade and shift over time, they don’t shift anywhere near as much as the backlights of an LCD.  That means that prior to the early 2000s, when LCDs really took off, calibration was far less of an issue.

Now we have to worry not only about the condition of the display and its shifting calibration, but which of the multiple color spaces and new EOTFs (gamma curves) the display or application works with, what client deliverables need to be, and which parts of the process may or may not be fully color managed with our target spaces supported.

And then we have film.  Right up until the advent of end to end digital production, film had the massive benefit of “what you see is what you get” - your color space was the color space of the film stock you were using for your source, intermediates, and masters.  Now with the DCI standard of using gamma corrected CIE X’Y’Z’ values in digital cinema masters, you have to be far more cautious of projector calibration: it’s not possible to convert from CIE X’Y’Z’ into proper color output without regularly measuring the projector’s actual output values.  And we’re not going to talk about the nightmare of DCI white points and desktop displays that use the DCI-P3 color space.

Oh, and by the way, every camera sees color differently from the actual color space you’re trying to shoot in, and may or may not be conforming the camera’s color primaries to Rec. 709, DCI-P3, or something else.  Because this needed to be more complicated.

Fortunately, with a basic understanding of color management and color calibration, navigating modern color problems is actually much more manageable than it appears at face value.  In our next post we’re going to be discussing RED Digital Cinema’s Image Processing Pipeline 2 (IPP2), and why it’s the perfect paradigm for solving the modern color management problem.


But in the meantime, if you’re working in the Utah area and want to figure out the best way of calibrating your workspace or home office, give us a call.  We’ve got the right equipment and know how to make sure that when you look at your display or projector, you’re seeing as close to the standards as possible.

Color and deliver with confidence: make sure it’s calibrated.
 


ADDENDUM:

Color management and calibration are trickier than I’ve made them sound.  I’ve simplified a few things and tried to be as clear as possible, but there are many, many gotchas in the process of preserving color that can make it maddening.  And this is one area where a small amount of knowledge and trying to do things yourself can get you into huge amounts of trouble really quickly.

Trial and error is important to learning, and often it’s still the only way to feel out exactly what an application is doing to your files.  But be smart: calibrate your displays and let the programs manage things for you, unless you’re intending on experimenting and know the risks associated with it.

 

Footnotes:

1 Note, this is not a bad thing.  In most cases it’s a good thing.  It’s just something to be aware of and to understand how it works.

2 It’s also possible to use lookup tables to shape the signal for viewing on a reference display.  Here, the software will measure the actual values produced by the display, and calculate the offsets as values to put in a 3D LUT.  When feeding multiple displays from the same professional signal, LUTs should be applied using external hardware; when attached to one display only, it’s acceptable to apply the LUT in the software application generating the output signal or in a hardware converter.  Ensure that the LUT is not applied anywhere on the signal path upstream of the final output recording.

3 This is a big problem with the iMac, or any other Wide Gamut / DCI-P3 display.  Colors will look different than expected without enabling color management within After Effects.

4 At least it did, until DaVinci Resolve Beta 14b8, 14b9, and 14.0 release - the option to flag on and off color management for the display disappeared with this update and I haven’t had time to test whether it’s on by default, works under Windows, or whether they’ve gone a different way with their color management.

Resolving Post Production Bottlenecks

Every system has one or more bottlenecks - the factors that limit all other operations or functions and control the maximum speed at which things can happen.  This is true in every aspect of life, whether we’re talking chemistry, physics, biology, human resources, a film set, or editing and grading footage in post-production.

We’re not going to get into the bottlenecks in film production here since they tend to have a variety of causes and are often unique to the type of production you’re working on or the companies or individuals involved.

Instead we want to look at finding bottlenecks in post-production, understanding how each one can limit the speed at which you can work, and identifying when simple or inexpensive fixes can increase your level of productivity.

Broadly speaking, all bottlenecks in post fall into the following categories: storage device speeds, storage transfer speeds, peripheral transfer speeds, processing power (CPU and GPU), software architecture, and workflow.

Read More

When Should You Buy a REDROCKET-X?

It’s no secret among those we work with that we love RED.  And yet, with all of our camera purchases here at Mystery Box, we’ve never bought our own REDROCKET or REDROCKET-X.  On occasion we’ve borrowed a REDROCKET for projects here or there and we regularly discuss whether we should get one or not.  But we haven’t.  Even after the upgraded REDROCKET-X was released in 2013, we were still on the fence as to whether it would actually accelerate our workflows.

But instead of arguing about what-ifs and maybes, we decided to use a couple of days near the end of last year to really put it to the test.  We borrowed a friend’s REDROCKET-X and two full days of testing later, we had our results.

The TL;DR version of our results is that the value of a REDROCKET-X depends significantly on your workflow.  For some it’s definitely worth it, while for others (including us) it’s far less so.

Specifically, you should consider a REDROCKET-X when (1) your workflow demands real-time or faster R3D decoding, and (2) the bottleneck / choking point is the actual decoding process, and not another point in the workflow.

Read More

Delivering 8K using AVC/H.264

YouTube launched 8K streaming back in 2015, but the lack of cameras available to content creators meant 8K uploads didn’t start in earnest until late 2016.  That’s around the time when we uploaded our first 8K video to YouTube, and while we ran into some interesting problems getting it up there (which aren’t worth discussing because they’ve all been fixed), overall we're impressed with YouTube’s ability to stream in 8K.

Being naturally curious, I wanted to know more about what they were using for 8K compression, so I downloaded the mp4 version YouTube streams to see which codec it was using.  Let me save you some time finding it yourself and show you what settings YouTube uses for 8K streaming on the desktop:

MediaInfo of a YouTube video file showing the 8K resolution in the AVC/H.264 codec

Does anything look weird to you? Unless you’re a compressionist, maybe not.

Here’s what’s strange: it lists the codec as AVC, otherwise known as H.264.  The problem with that is the largest frame size permitted by the H.264 video codec standard is 4,096 x 2,304, and yet somehow this video has a resolution of 7,680 x 4,320.  Which means that either this video or the video standard must be lying.

Well, not exactly.  The frame resolution is Full Ultra High Definition (FUHD - 7,680 x 4,320), and the video codec is H.264 / AVC.  It’s just a non-standard H.264 / AVC.

Being able to make and use your own non-standard H.264 (or any other codec) video files is a really useful trick, and right now it’s an important thing to know for working with 8K video files.  Specifically, it’s important to know what benefits and drawbacks working outside the standard format offers and how to make the best use out of them.


Background

In 2014, a client asked about 5K, high frame rate footage to use on a demonstration display.  Since we’d been filming all of our videos at 5K resolution, remastering the files at their native camera resolution wasn't an issue and we were happy to work with them.

But as things moved forward with their marketing team we ran into a little problem.  We had no problem creating and playing 5K files on our systems, but when their team tried to play back the ProRes or DPX master files on their Windows based computer (which they were required to use for the presentation), they weren’t able to get real-time playback.  Why not? The ProRes couldn’t be decoded fast enough by the 32 bit QuickTime player on Windows, and the DPX files had too high a data rate to be read from anything but a SAN at the time.

Fortunately, we’d already been experimenting with encoding 5K files in a few different delivery formats: High Efficiency Video Coding (HEVC / H.265), VP8 and VP9, and Advanced Video Coding (AVC / H.264).  HEVC was too computationally complex to be decoded in real time for 5K HFR, since there were no hardware decoders that could handle the format (even in 8 bit precision) and FFMPEG still needed optimizations to play back HEVC beyond 1080p60 in real time on almost every system.  VP8 and VP9 scared the client, since they weren’t comfortable working with the Matroska video container (for reasons they never explained - quality wise, this was the best choice at the time), which left us with H.264.

Which is how we delivered the final files: AVC video with AAC audio in an MP4 container, at a resolution of 5,120 x 2,880, though we ended up dropping the playback frame rate to 30fps for better detail retention.

Finding a way to encode and to play back these 5K files in H.264 wasn’t easy.  But once we did, we opened up the possibility of delivering files in any resolution to any client, regardless of the quality of their hardware.

So how did we do it?  We cheated the standard.  Just like Google does for 8K streaming on YouTube.  And for delivering VR video out of Google’s Jump VR system.

And since you’re probably now asking: “how do you cheat a standard?”, let’s review exactly what standards are.


Standards

Standards like MPEG-4 Part 10, Advanced Video Coding (AVC) / ITU-T Recommendation H.264 (H.264) exist to allow different hardware and software manufacturers to exchange video files with the guarantee they’ll actually work on someone else’s system.

Because of this, standards have to impose limits on things like frame size, frame rate for a given frame size, and data rate in bits per second.  For AVC/H.264, the different sets of limits are called Levels.  At its highest level, Level 5.2, AVC/H.264 has a maximum frame size of 4,096 x 2,304 pixels @ 56 frames per second, or 4,096 x 2,160 @ 60 frames per second, so that standard H.264 decoders don’t have to accommodate any frame size or frame rate larger than that.
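To put rough numbers on that, H.264 measures frame size in 16 x 16 pixel macroblocks, and Level 5.2 caps a frame at 36,864 of them:

4,096 x 2,304 / (16 x 16) = 36,864 macroblocks - right at the Level 5.2 ceiling
7,680 x 4,320 / (16 x 16) = 129,600 macroblocks - roughly 3.5 times over the limit

So an 8K frame isn’t just slightly out of spec; no conforming decoder is required to handle it at all.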

Commercial video encoders like those paired with the common NLEs Adobe Premiere, AVID Media Composer, and Final Cut Pro X, assume that you’ll want the broadest compatibility with the video file, so the software makes most of the decisions on how to compress the file, and strictly adheres to the available limits.  Which for H.264 means that you’ll never be able to create an 8K file out of one of these apps.

While standards allow for broad compatibility, sometimes codecs need to work in a more limited setting.  “Custom video solutions” are built for specific purposes, and may need frame sizes, frame rates, or data rates that aren’t standard.  This is where standard commercial AVC/H.264 encoding software often won’t work, and you either write a new encoder yourself (time consuming and expensive) or turn to the open source community.

Open source projects for codec encoding and decoding, like the x264 encoder implementation of the H.264 standard, often write code for all parts of the standard.  x264 even allows encoding beyond the AVC/H.264 standard, specifically with an ‘undefined’ or ‘unlimited’ profile or level where you can apply H.264 compression to any frame size or frame rate.  The catch is that the result won’t play back with hardware acceleration because it’s out of standard; it’ll need a software package that can decode it.

Spend enough time with codecs and compression and you’ll run across a term: FFMPEG.  FFMPEG is an open source software package that provides a framework for encoding or decoding audio and video. It’s free, it’s fast, and it’s scriptable (meaning it can be automated by a server) so a lot of companies who don’t write audio-video software themselves can simply incorporate FFMPEG and codec libraries like x264 for handling the multimedia aspect of their programs.

Which is exactly what YouTube does.

"Writing application : Lavf56.40.101" indicates the file was written using FFMPEG in this 8K file from YouTube.

That’s right, when you upload a video to YouTube, Google’s servers create encoding scripts for FFMPEG, which are sent off to various physical servers in Google’s data centers to encode all of the different formats that YouTube uses to optimize the video experience for desktops, televisions, phones, and tablets, and for internet connections ranging from dial-up to fiber optic.

And for 8K content streaming on the desktop, that means encoding it in 8K H.264.


Why AVC/H.264 for 8K?

Which, of course, leads us to our last two questions: Why H.264 and not something else? And How can you do it too?

For YouTube, using AVC/H.264 is a matter of convenience.  At the time that YouTube launched 8K support (and even today) HEVC/H.265, which officially supports 8K resolutions, is still too new to see broad hardware acceleration support - and even then, few hardware solutions support it at 8K resolution.  (Side note - as of the last time we tested it [Jan 2017] the open source HEVC/H.265 encoder x265 struggles with 8K resolutions, so there’s that too).  Google’s own VP9/VP10 codecs still weren't ready for broad deployment when 8K support was announced, and hardware VP9 support is just starting to appear.

YouTube selecting either HEVC/H.265 or the VP9/VP10 codecs would severely limit where 8K playback would be allowed.  And since software decoding of 8K H.264 can work in real time on most computers while H.265 can’t (H.264 is about 5 - 8 times less processor intensive than H.265), we have YouTube streaming 8K in the AVC/H.264 codec, at least until VP10 or H.265 streaming support is added to the platform.


Encoding 8K Video into H.264

So you want to encode your own 5K or 8K H.264?  It’s easy - just download FFMPEG and run it from the command line.  Just kidding, that’s a horrible experience.  Use FFMPEG, but run it through a frontend instead.

The syntax for running FFMPEG from the command line can get a little complicated.
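For the curious, here’s a minimal sketch of what an out-of-spec 8K encode looks like when you run FFMPEG with the x264 library directly (the file names and the 200 Mbps target are placeholders, not a recommendation):

ffmpeg -i input_8k.mov \
-c:v libx264 -preset slow -b:v 200M \
-c:a aac -b:a 128k \
output_8k.mp4

Here -c:v libx264 selects the x264 encoder, -b:v 200M targets roughly 200 Mbps, and the audio is re-encoded to AAC at 128 kbps.  Unlike the commercial encoders mentioned above, x264 won’t refuse the 7,680 x 4,320 frame size; FFMPEG may grumble about the level, but it will still write the file.  A frontend just builds this sort of command for you.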

An FFMPEG frontend is a piece of software that gives you a nicer user interface to decide your settings, then sends off your decisions to FFMPEG and its associated software to do the actual work.  Handbrake is a good example of a user-friendly cross platform front end for simple jobs, but it doesn’t give you access to all the options available.  The best that I’ve found for that is a frontend called Hybrid.

Hybrid is a little more complicated than, say, Adobe Media Encoder, but it gives you access to all of the features that are available in the x264 library (i.e. all of the AVC/H.264 standard + out of standard encoding) instead of the more limited features that other packages give you.  It’s a cross-platform solution that works on Windows and MacOS, it’s updated regularly to add new features and optimizations, and it by default hides some of the complexity if you just want to do a basic encode with it.


Hybrid

Here are the settings we’d use for a 5K or higher H.264 video:

Main Pane of Hybrid showing where to select the audio and video codecs, and where to set the output file name.

On the first pane of the program, select your input file, generate or select your output file name, and decide on which video codec you want to use (in this example, x264) and whether to include audio or not (set it to custom).

Set the Profile and Level to None/Unrestricted to encode high bitrate 8K video

Now, under the x264 tab, make the following changes: switch the encoding mode to “average bitrate (1-pass)”, and change the Bitrate (kbits/s) value to 200,000.  That sets our target bitrate to 200Mbps, which for 8K is roughly the equivalent quality of 50Mbps at 4K (four times the pixels, four times the bits).

Then, under the restriction settings, change the “AVC Profile/Level” drop downs to “none” and “unrestricted”.  Leave everything else the same and jump over to the Audio tab at the top.

Add the audio by selecting "Audio Encoding Options" and then clicking the plus to add it to the selected audio options

In the audio tab, add an audio source if your main source file doesn’t have one, turn on the Audio Encoding Options pane by using the check box, choose your audio format and bit rate (in this case I’m using the default AAC with 128 kbps audio), then click the big plus sign at the top right of the audio queue to add that track of audio to your output file.

What to click to add your job to the queue and get the queue started

That’s it.  You’re done.  Jump back to the Main tab, click the “add to queue” button to add your job to the batch, and either follow the same steps to add another, or click on “start queue” to get things rendering.

When you’re done you’ll find yourself with a perfectly useable 8K file compressed into H.264!


Who Cares?

Is this useless knowledge to have?  Not if you regularly create 8K video for YouTube, or if you create VR content using the GoPro Odyssey rig with Google Jump VR.  In both of those cases you’ll need to upload an 8K file.  While the ProRes format works, it’s quite large (data wise) and may be problematic for upload times.  Uploading AVC/H.264 is a better option in some cases, and it can always be used as a delivery file for 8K content when data rates prohibit DPX or an intermediate format.

To play back files created this way, you need a video player that can handle lightweight playback of non-standard video, like MPC-HC on Windows or MPV on Windows or MacOS.  Sometimes QuickTime will work, though it rarely does on Windows because it’s still a 32 bit core, and VLC is also a solid option in many cases.  But both of those have more overhead than FFMPEG-core players and can cause jittery playback.
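As a point of reference, if you’re using MPV the main thing you may need to do is force software decoding, since hardware decoders will generally reject the out-of-spec stream (the file name here is a placeholder):

mpv --hwdec=no output_8k.mp4

MPC-HC has an equivalent option to disable hardware decoding in its internal filter settings.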

Spending time learning new programs, especially ones that aren’t at face value user friendly like Hybrid or FFMPEG, doesn't seem like it’ll pay off.  But the process of discovery, trial, and error is your friend when you’re trying to stay ahead of the game in video.  Don’t be afraid to test out something new.

It’s how we were able to deliver 5K video content to a client when no one else could, and how we still stay at the forefront of video technologies today.

Shooting 360 VR with GoPro Odyssey and Google Jump VR

This past fall Mystery Box produced and finished a 360 VR short action film in collaboration with our good friends over at Team Supertramp, and we ended up filming with the GoPro Odyssey using Google’s Jump VR system.  Having filmed with several other 360 video rigs in the past (Freedom 360, Nokia Osmo, Fly 360, among others), we were excited to see what this new setup could and couldn’t do, and how it would affect the story we were looking to tell.

When the opportunity to film with the GoPro Odyssey and to use the Google Jump VR system for processing came up, we jumped on it.  We really went into it blind: we wanted to spend a day doing demo test shoots, but scheduling wouldn't allow time for prep.  So we dove right in and problem solved as we went.

Unlike every other 360 camera we’d tested, the GoPro Odyssey and Google Jump VR got us inspired by the medium.  It gave us far better results than the other systems we’d used.  The workflow had its inconvenient moments and the system has its limitations, but we finished the project wanting to do more with the rig, rather than the usual sentiment of “never again”. 

Which may make you wonder: how is it different, and why does it matter?



The Rig

The GoPro Odyssey Rig

The GoPro Odyssey is essentially a ring of 16 GoPro Hero 4 Blacks, all tied together with a unified power source and custom firmware that links their controls.  The cameras are aligned vertically, and shoot in 2.7K 4:3 aspect ratio at 30fps (29.97fps).  Each camera has an effective coverage of about 45 degrees horizontally (around the center axis of the rig), and 120 degrees vertically.

If you look at those numbers you’ll notice a couple of “wait, what?” questions.

First, yes, it doesn’t do a full 360 x 180 sphere: there are cones of occlusion at the top and bottom of the rig.  There’s a screw mount in the center of the rig that you can add another GoPro to if you want to fill in the missing gap on the top, but if you’re in the normal planar orientation it’s actually pretty rare for a viewer to look up when watching the video.  If you’re hanging the rig upside down and flying it, you’ll probably want to throw one there for the downward looking orientation.  Otherwise, the screw mount on the top is perfect for mounting a spatial microphone like the Zoom H2N and getting ambisonic audio as well.  But I’ll get to that in a minute.

But what about the horizontal angle of view - 45 degrees per camera?  360 degrees / 16 cameras is 22.5 degrees per camera.

True.  Unless you’re shooting in 3D.

That’s probably the biggest draw of the Odyssey rig, one that makes the VR experience that much better than other rigs we’ve used.  Yes, you’re limited to a toroidal field of view, but when viewed on a Google Cardboard, Oculus, or other head mounted VR display, the 3D immersion really sells the final image.

How does it capture 3D?  Parallax, of course.  That’s right: the same concept that creates problems and stitching issues on most rigs allows for 3D on the GoPro Odyssey.

In case this is the first time you’re encountering parallax, briefly, parallax is the concept that viewing the same scene from different positions renders slight differences in the relative positions of objects.  For human vision, the interocular distance (distance between the eyes) creates a point of view difference in the image coming from each eye that our brain uses to provide the bulk of our depth perception.  There are other cues for perceiving depth, but none is as important as the parallax information.

GoPro Odyssey Single Camera Field of View.  Notice the amount of overlap between adjacent cameras

The part of the Camera's Field of View actually used in the final image.  The outer horizontal edges of each frame are discarded

Each camera within the Odyssey rig sees a 90 degree arc on the horizontal axis.  Usually, the outer 22.5 degrees (ish - these angles are approximate because of how the stitching process works) are discarded because this is where the greatest parallax and distortion occurs.  The center 45 degrees is then cut into 22.5 degree segments, with each adjacent camera around the rig forming a left-eye / right-eye pair.

How the center 45° of each GoPro Camera is split into a Left and Right eye, to form stereo pairs between adjacent cameras.

The 16 “left eye” arcs of 22.5 degrees are then blended into one equirectangular image and the 16 “right eye” arcs are blended into a second.  The two equirectangular images are combined into a single over-under 3D video file which you can edit and work with.

Left and Right eye image slices from all 16 GoPros in the Odyssey form the slices that are combined into the left and right eye stitched equirectangular images.

Google’s Jump VR system handles the stitching, which makes our control-freak workflow specialist a little nervous.  You upload the footage from all 16 of the cameras, Google’s servers stitch it and you get it back a couple of days later.

How can you tell if everything’s worked right?  You can’t.


Weakpoints

The biggest weak point of the Odyssey rig is that there is no way to do playback, which is a rarity in digital filmmaking.  Using it reminds us of when we used to shoot film, where our ability to replay previous clips was limited, but in this case it’s worse.  With a 360 degree field of view, there’s nowhere for the director to hide and see how the take went.

In other words, there’s no way to see what the camera’s seeing, review what’s been shot, or stand to the side and watch the performances from there.  So how are we supposed to know if we got the shot?

During a tutorial from Google, we were shown images of directors hiding underneath the tripod to watch the action.  Unfortunately, this wouldn’t work for us since we had the camera mounted on a cart that blocked the view of anyone hiding underneath, and sitting statically under the tripod would have made it really tough to turn around and follow the action through the whole shot.

Nowhere to hide in the cart!

Our solution?  We simply added more GoPros!  We dialed in the settings of two GoPro Hero4 Blacks to match the ISO and frame rate of the Odyssey, and live streamed/captured the action from two mounting points underneath the Odyssey’s chassis.  This gave us almost the full 360 degrees of view as footage we were able to play back, show actors/clients, and review several times to give notes and make adjustments.

Bottom Mounted GoPros for live monitoring, capture and playback.

Image from the Reference GoPro side by side with the image from the Odyssey (Uncolored Stitch).  The reference GoPro was used to stream live to a tablet and give us playback.

Odyssey's battery hidden below the rig - notice the size!

Another fairly major limitation of the Odyssey rig is that in order to power 16 GoPros you need a hefty quad v-mount battery bank.  Coming from the Fly360 (which is about the size of a baseball) or the Freedom 360 (which is about the size of a volleyball), this was a huge jump.  The Odyssey rig itself is large and cumbersome, but the battery was even heavier.  And because the battery is separate from the rig, it’s challenging to keep it hidden whenever you want to move the camera or attach it to anything.  Once again, most directors place it under the tripod.

On the plus side though, the large battery enabled us to film all day, and since we were pushing the rig around on a cart the whole shoot, the size and weight didn’t pose a huge issue to us, this time anyway. Maybe we should always have a cart? (Problem solved!)


Maybe it's Us?

Because the rig is still new, there were some challenges we ran into while using it that should go away with time.  The first was how careful you have to be when using the camera.  The cameras are controlled by a daisy chained set of network cables that run from camera 1 to camera 16, and everything is controlled by GoPro #1.  This created a pretty substantial delay every time we powered on, rolled the camera, cut the camera, or made adjustments to the settings; we had to take things slowly and wait 2-3 seconds between actions as camera 1 synced with the other 15 cameras.  Rolling on the rig, for instance, involves pushing record on GoPro #1, which beeps to confirm, followed by a slight delay and then a chorus of beeps from all of the GoPros which signals the actual start of recording.  Once we got used to the delayed pace it became second nature.

A second issue we ran into was that we had to power down the whole system (including the battery) after each take.  This took some getting used to, as after each “CUT!” it would take 3 delayed button pushes to power down the system (stop recording, power off Odyssey, power off battery).  This might have been because each of our takes was several minutes in length; we don’t know what would have happened if we were using shorter takes.  But what we do know is that the one time we did try to squeeze two takes into one power cycle, things went a little haywire.  The rig locked up and we had to do a hard reboot in order to resync the cameras; several of the cameras didn’t finalize their takes and had to rebuild the footage when the rig was repowered before we could start again.

Those hiccups aside, the end product was absolutely outstanding.  The GoPro Odyssey delivers an 8K x 8K image, which enables an amazingly immersive and detailed 3D experience.

4K by 4K Output from GoPro Odyssey and Google Jump VR

One unmarketed advantage of having a 16-lensed 360 camera is that should any of the cameras fail, you are still more than covered for the full 360 field of view.  Because we were working with an untested, brand new camera (as in, it showed up from the manufacturer a day or two before we were to use it), we didn’t know until after Google got back to us with our footage that 3 of the 16 cameras were dead on arrival and their footage was fully corrupted.  This was when the extra field of view overlap came in to save the day, as Google Jump VR was able to stitch the remaining camera angles together for a seamless image (with only one VFX shot to fix a glitch).  We lost out on the 8K resolution and had to master at 4K 3D, but being able to recover from a 3 camera failure was amazing.

Another advantage to having 16 cameras is that the auto exposure settings are better tuned for the 360 experience. Several times we had the Bear push the cart from outdoors to indoors.  The front facing cameras would adjust to the darker space, while the rear cameras would remain at their outdoor exposure until they were mostly inside the room. In many ways, this makes its autoexposure superior to that of single lens 360 cameras, since they will only be able to have one exposure setting no matter where the viewer is looking: with the GoPro Odyssey, the exposure is tuned for wherever you are looking.  

Autoexposure differences between the cameras - the ones facing outdoors have autoexposed darker than the ones facing the interior.


Audio

Just a quick note about audio before we wrap this up.  The Zoom H2n is a fantastic recorder for capturing ambisonic / spatial audio, and setting it up was a breeze.  Any sound within about 20 feet of the rig was captured with surprising clarity.  We mounted it on the top of the rig so that the audio center and alignment matched the Odyssey's alignment:

But we didn't end up using it.

Okay, that's a little lie.  We did use the audio it captured, but not as spatial audio.  We ended up having to convert all of the ambisonic audio into stereo because of the amount of sound design we added to the piece.  Budgetary and time constraints meant that we had to do that part of the project quickly and with our internal team instead of outsourcing.  In order to make the sound design ambisonic we would have needed an expensive audio plugin for Pro Tools (which we don't normally use internally) and double or triple the amount of time.  That, and there's still no good way of putting music in the space (other than dead center of the space, which effectively makes it monophonic).  Next time we'll leave larger considerations for sound design in the ambisonic space, to better accompany the quality of the VR video experience.


Last Thoughts

Looking back on this project, we were pretty ambitious in pushing the limits of the 360 experience.  We wanted to move the camera (the handbook said not to), go from dark to light spaces several times (also discouraged), and get super close to the camera (BIG NO NO - you’re supposed to stay about 1m away).  In all of these scenarios, the GoPro Odyssey and Google Jump post processing held up incredibly well, and inspired us to do future action packed 360 shoots.

Our tiny cast and crew.

We see the GoPro Odyssey 360 as the best VR camera that’s commercially available, for most applications.  It’s not perfect - far from it - but despite its flaws, it’s still a fantastic system and one that we’d recommend using for any project looking to shoot in VR today.

This is how the GoPro Odyssey makes us feel.

Adobe Premiere CC 2017 - Real World Feature Review

About two weeks ago Adobe released their 2017 update to Creative Cloud, and because of a couple of projects that I happened to be working on at the time, I figured I’d download it immediately to see if I could take advantage of some of the new features.

If you want the TL;DR review, the short version is this: most of the features offer genuine improvements, but range in usefulness from incredibly useful to just minor time savers; a few, though, are utter crap.

Side note: I considered talking about the new features found in Adobe After Effects, but really, there’s not much to say other than: they work? Largely they’re just performance increases accomplished by moving things to the GPU, broader native format support, time shortening templating, and better integration with a few other Adobe CC products.  If you look at their new features page, you should be able to pretty quickly figure out which ones could be important to you, and there’s not much else to say about them other than “they work”.

Premiere is a different animal though, and I can’t say that all of the new features work properly.  But let’s start with the positives, of which there are many.

First and foremost, 8K native R3D imports.

This was expected, and necessary.  And while not ‘featured’ as part of their summaries, it is there and it works.  That’s a boon to all of us shooting on Helium sensors, and to our clients.  So far we’ve been running 8K ProRes or 2K proxies for our clients so they could edit with our footage; now they can take care of mastering with the 8K themselves (if they want).  So definitely a plus.

Second, the new native compression engine supporting DNxHD and DNxHR.

To me, this is a big plus.  I keep looking for a solid alternative to ProRes for my workflows, and while they don’t yet support the DNxHR 444, they do solidly support DNxHR HQX.  Since a significant portion of my usual workflows are built on 12 bits per channel and roundtripping between Adobe and DaVinci, having a solid 12 bit 422 cross-platform alternative to ProRes may finally let me get rid of DPX.

Third, the new audio tools.  Oh, thank god, the new audio tools.

I happen to be working this week on a short project doing sound design and light mixing (I’ll link to it when it’s up) and the new audio tools in Premiere have been a massive time saver.  If you’ve ever tried to do audio work directly in Premiere before, you’ll know how maddening it’s been dealing with their unresponsive skeuomorphic effect control knobs.  Even doing basic EQ meant flagging values on and off and struggling to get things as precise as you wanted.

Adobe Premiere CC 2015.3 EQ

Adobe Premiere CC 2015.3 Pitch Shifter

But the new audio UX is… well, fantastic.  I really can’t praise it enough.  The effect controls are still skeuomorphic (which I actually think is important in this case) but look classier, and more importantly they actually respond really quickly to the changes you want to make.  They’ve expanded the tool set and the effects run more quickly.  I couldn’t be happier - this alone saved me hours of frustration and headaches this week.

Adobe Premiere CC 2017 EQ

Adobe Premiere CC 2017 Pitch Shifter

Fourth, the new VR tools.

So the same project I was doing sound design on happens to be a stereoscopic VR project.  So immediately, the promise of new VR tools was exciting - what more would they let me do, I wondered?

Install, fire it up, and… not much, actually.

Here’s basically all of the new VR tools I could find:

  • Automatically detect the VR properties of imported footage, but only if they were properly flagged with metadata (marginally useful, not really useful)
  • Automatically assign VR properties to sequences if you create a new sequence based on properly flagged VR footage.
  • Manually assign VR properties to sequences, allowing you to flag stereoscopic (and the type of 3D used, if any).  The sequence flagging allows Premiere to automatically flag for VR on export, when supported.
  • Embed VR metadata into mp4 files in the H264 encoder module, instead of just QuickTime.
  • Connect seamlessly to an Oculus or other VR headset with full 360 / 3D output.

Five comparison screenshots, each captioned: “Is this 2015.3 or 2017?”

And that’s… it.  Really?  I mean, there is actually no difference between the viewers in 2015.3 and 2017 - both handle stereoscopic properly - and assigning the VR flags to sequences and then embedding the necessary metadata on export is VERY useful.  But I would really LOVE to see an editor trying to edit with a VR headset.  Or color correct, for that matter.  Reviewing what you’ve got, sure, but not for the bulk of what you’re doing.

I should note that Premiere chokes on stereoscopic VR files at resolutions greater than 3K by 3K, which makes mastering footage from the GoPro Odyssey interesting, since it comes back from the Google Jump VR system as 8K by 8K mp4s.  Even converting to a full ProRes 422 intermediate at 4K by 4K proved too data heavy for Premiere to keep up with on an 8 Core MacPro.

But it’s not only VR performance that’s an issue: it’s still missing a whole bunch of features that would really make it a useful VR tool.  Where are my VR aware transitions?  What about VR specific effects, like simple reframing?  Where is my VR support in After Effects?  Why can’t I manually flag footage as VR if it didn’t have the embedded metadata?  What about recognizing projections other than equirectangular?  They have a drop down for changing projection type on a timeline, but equirectangular is the only option.  What about native ambisonic audio support? Or even flagging for ambisonic audio on export?

Don’t get me wrong, what they’ve done isn’t bad; it does work, and it is an improvement.  It’s just that the tools they added were very tiny improvements on what was already there.  And I know about (and use) plugins that give Premiere and After Effects many of the VR features that I need to actually work in VR.  But it's really difficult, almost impossible, to get by without the 3rd party plugins.

Maybe I’m just jaded and judgmental, in part because of my reaction to the HDR 'tools' they announced, but when you advertise “New VR Support” as the second item on the new features list, it had better be good support.  Like, you know, actually work as well in VR as you can in standard 2D video.  If I, as a professional, require third party plugins to your program to make it work at the most basic level, it’s not the turnkey solution you advertise.  I’m sure that more tools are in the works, but for now, it feels lackluster and an engineering afterthought rather than an intelligent feature designed for professionals.


But don’t worry, that’s not their most useless feature change.  Let’s talk about their new HDR tools.

What. The. Hell.

This is how using the new HDR 'tools' in Premiere 2017 feels.

I mean that.  With all of my heart.

I might be a little biased on the subject, but honestly I question who in their right mind decided that what they included was actually something useful.

It’s not.

It’s utter shit.

But worse than that, as-is it’s more likely to hurt the broader adoption of HDR than to help it.

And no, I’m not exaggerating.


On paper, the new HDR tools sound amazing.  HDR metadata on export!  HDR grading tools!  HDR Scopes!  Full recognition of HDR files!  Yay!

In practice, all of these are useless.

Let me give you a rundown of what the new HDR tools actually do.

Premiere now recognizes SMPTE ST.2084 HDR files, which is awesome.  But only if the proper metadata is already embedded in the video stream, and then only if it’s an HEVC deliverable file.  Not a ProRes, DPX, or other intermediate file; only HEVC.  And like VR support above, there’s no way to flag footage as already being in HDR or using BT.2020 color primaries.  Which ends up being a massive problem, which I’ll get to in a minute.

When you insert properly flagged HDR footage into a sequence, you get a pleasant surprise: hard clipping at 120 nits on your viewer or connected external display.  It’s honestly the worst clipping I’ve seen.  And there’s no way to turn it off.  If you go to export the clip into any format without the HDR metadata flag enabled on export, you get the same hard clipping.  And since you can only flag for HDR if you’re exporting to HEVC, you can’t export HDR graded or processed through premiere in DPX, ProRes, TIFFs, OpenEXR or any other intermediate format.

This is why in my article on Grading and Mastering HDR I mention that it’s really important to be using a color space agnostic color grading system.  When the application includes color management that can’t be disabled, your options become very limited.

Also, side note, their HEVC encoder needs work - it’s very slow at the 10 bits you need for HDR export.  I expect it’s better on the Intel Kaby Lake chips that include hardware 10 bit HEVC encoder support that, oh wait, don’t exist for professionals yet (2017 5K iMac maybe?)

But at least with the metadata flagging you can bypass the FFMPEG / x265 encoder that you’ll have needed up to this point to properly encode HDR for delivery, right?

Why would you think that?  Of course you can’t.

Because if you bring in a ProRes, DPX, or other intermediate file into Premiere, there’s no way to flag it as HDR and it doesn’t recognize embedded metadata saying it’s HDR like DaVinci and YouTube do.  What happens is that if you use these intermediates as a source (individually or assembled in a sequence) and you flag for HDR on export, Premiere runs a transform on the footage that scales it into the HDR range as if it’s SDR footage.

12 Bit ProRes 4444 HDR Intermediate in Timeline with 8 Bit Scope showing proper range of values

12 Bit ProRes 4444 HDR Intermediate in Timeline with HDR Scope showing how Premiere CC 2017 interprets the intermediate if you flag for HDR on export

When is that useful? If I have a graded SDR sequence that I want to encode into the PQ HDR space, while keeping 100% of the limits of an SDR image.  Because why the hell not.

But never fear!  Premiere has included new color grading tools for HDR!

Well, they aren’t horrible, which I suppose is a compliment?

How to enable HDR Grading in Premiere 2017

To enable HDR Grading you need to change three different settings.  From the Lumetri context menu in your Lumetri Panel, you need to select “High Dynamic Range” to enable the HDR features; on the scopes you’ll need to switch the scale from “8 Bit” to “HDR” (and BT.2020 from the scope settings); and if you actually want to see those HDR values on the scope, you’ll need to enable the flag “Maximum Bit Depth” in your Sequence Settings.  I’m sure there’s a fantastic engineering explanation for that last one, but it’s not exactly intuitive or obvious, and took me a bit of hunting to figure it out.

Maximum Bit Depth needs to be turned on in Sequence Settings to enable proper HDR Scopes

HDR Scopes WITHOUT Maximum Bit Depth Flag

HDR Scopes WITH Maximum Bit Depth Flag

Once you’ve enabled HDR grading from the Lumetri drop down menu, you’ll get a few new options in your grading panels.  “HDR White” and “HDR Specular” come available under the Basic Correction panel, “HDR Range” comes available under the Curves panel, and “HDR Specular” comes available under the Color Wheels panel.

The HDR White setting seems to control how the other sliders of the Basic Correction panel behave, almost like changing the scale.  The higher the HDR White value, the less of an effect exposure adjustments have and the greater the effect of contrast adjustments.  The HDR Specular slider controls just the brightest whites, almost like the LOG adjustment I use in DaVinci Resolve Studio.  This applies to both the slider under Basic Correction and the wheel under the Color Wheels panel.  HDR Range seems to change the scale of the curves, similar to what the HDR White slider does for the basic corrections.

All of this, by the way, I figured from watching the scopes, and not the output image.  I’ve tried hooking up a second display to the computer and hooking up our BVM-X300 through our Ultrastudio 4K to Premiere, but to no avail - the output image is always clipped to standard video levels and output in gamma 2.4.

Which, if you ask me, severely defeats the purpose of having HDR grading tools to begin with. Here’s a great idea: let’s allow people to grade HDR, but not see what they’re grading.  Which is like trying to use a table saw blindfolded.  Because that’s a thing people do, right?  Which brings me back to my original premise: What. The. Hell.

When you couple that little gem with the hard clip scaling, you realize that the only reason the color grading features are in this particular version is to make the process of cross grading from SMPTE ST.2084 into SDR easier, and nothing else.

No fields for adding HDR10 Compliant Metadata on Export.  That's okay, you shouldn't use their exporter anyway (at least not this version)

Oh, one last thing of course: here’s the real kicker: you can’t even export HDR10 compliant files.  Yes, I know I said that in the HEVC encoder you can flag for ST.2084, but you can’t add any MaxFALL, MaxCLL, or Master Display metadata.  And yes, I double checked that Premiere didn’t casually put those into the file without telling you (it doesn’t).

And it has zero support for Hybrid Log Gamma.  Way to pick a side, Adobe.


So passions aside, let’s run down the list again of new HDR tools and what they do:

  1. Recognize SMPTE ST.2084 files, but only when already properly flagged in HEVC streams and no other codec or format.
  2. Export minimal SMPTE ST.2084 metadata to flag for HDR, but only works if your source files are already in the HEVC format and already properly HDR flagged (see #1), or if they’re graded in HDR in the timeline, which you can’t see.  Which renders their encoder effectively useless.
  3. Enable HDR grading through a convoluted process, with a minimal but useful set of tools.  But you can’t see what you’re doing, so I'm not sure why they're there.
  4. There is no bullet point 4.  That’s literally all it does.

The question that I have that I keep coming back to is “who do they think is going to use these tools?”  It feels like the entire feature set was a “well, we need to include HDR, so get it in there”.  But unlike the VR tools that you can kind-of build into, these HDR “tools” (I use the word loosely) are really problematic, not just because the toolset is incomplete but because the way that the current tools are implemented is actually harmful to a professional workflow.

Call it simple feature bandwagoning, or engineers that didn’t consult real creative professionals, or blame it on whatever reason you will.  But the fact is, this ‘feature’ is utter shit, which to me sours the whole release, just a little.

My biggest concern here is that while someone like me, who's been working with HDR for a while now, can tell that these will hurt my workflow, Premiere is an accessible editing platform for everyone from amateurs to professionals.  And anyone looking to get into HDR video may try to use these tools as their way in, and their results are going to be terrible.  God awful.  And that hurts everyone - why would we want to adopt HDR when 'most of what people can do' (meaning the amateurs and prosumers who don't know any better) looks bad?

So basically, if Premiere is part of your HDR workflow, don't even think about using their new 'tools'.

HDR Rant over, let’s bring this back to the positive.


Just to reiterate, the new audio tools in Premiere CC 2017 are fantastic.  I can't emphasize that enough.  Most of the rest of the features added are pretty good.  The new team projects collaboration tools, though I haven’t had a chance to use them, appear to work well (though they’re still in beta).  The new captions are useful, the new visual keyboard layout manager is fantastic (though WAAAY long overdue!), and the other under-the-hood adjustments have improved performance.

Should you upgrade?  Yes!  It’s a great upgrade!  Despite my gripes I’m overall happy with what they did!

Just don’t try to use it for HDR yet, and be aware that the new VR tools aren’t really that exciting.

How to Upload HDR Video to YouTube (with a LUT)

Today YouTube announced via their blog official HDR streaming support.  I alluded to the fact that this was coming in my article about grading in HDR because we've been working with them the past month to get our latest HDR video onto the platform. It's officially live now, so we can go into detail.


How to Upload HDR Video to YouTube

Similar to VR support, there are no flags on the platform itself that will allow the user to manually flag the video as HDR after it's been uploaded, so the uploaded file must include the proper HDR metadata.  But YouTube doesn't support uploading in HEVC, so there are two possible pathways to getting the right metadata into your file: DaVinci Resolve Studio 12.5.2 or higher, or the YouTube HDR Metadata Tool.  They are generally outlined in the YouTube support page, but not very clearly, so I think more detail is useful.

I did include a lengthy description on how to manage HDR metadata in DaVinci Resolve Studio 12.5.2+, with a lot more detail than they include on their support page, so if you want to use the Resolve method, head over there and check that out.  I've covered it once, so I don't see the need to cover the how-to's again.

I should note that Resolve doesn't include the necessary metadata for full HDR10 compatibility, lacking fields for MaxFALL, MaxCLL, and the Mastering Display values of SMPTE ST.2086.  It does mark the BT.2020 primaries and the transfer characteristics as either ST.2084 (PQ) or ARIB STD-B67 (HLG), which will let YouTube recognize the file as HDR Video.  YouTube will then fill in the missing metadata for you when it prepares the streaming version for HDR televisions, by assuming you're using the Sony BVM-X300.  So this works, and is relatively easy.  BUT, you don't get to include your own SDR cross conversion LUT; for that you'll need to use YouTube's HDR Metadata Tool.

 

***UPDATE: April 20, 2017*** We've discovered in our testing that if you pass uncompressed 24 bit audio into your QuickTime container out of some versions of Adobe Media Encoder / Adobe Premiere into the mkvmerge tool described below the audio will be distorted.  We recommend using 16 bit uncompressed audio or AAC instead until the solution is found.

 

YouTube's HDR Metadata Tool

Okay, let's talk about option two: YouTube's HDR Metadata Tool.  

Alright, not to criticize or anything here, but while the VR metadata tool comes in a nice GUI, the link to the HDR tool sends you straight to GitHub.  Awesome.  Don't panic, just follow the link, download the whole package, and un-zip the file.

So the bad news: whether you're working on Windows or on a Mac, you're going to need to use the command line to run the utility.  Fire up Command Prompt (Windows) or Terminal (MacOS) to get yourself a shell.

So the really bad news: if you're using a Mac, the binary you need to run is actually inside the app package mkvmerge.app.  If you're on Windows, drag the 32 or 64 bit version of mkvmerge.exe into Command Prompt to get things started; if you're on MacOS, right click on mkvmerge.app, select "Show Package Contents", and drag the binary file ./Contents/MacOS/mkvmerge into Terminal to get started:

Right click on mkvmerge.app and select "Show Package Contents"

Drag the mkvmerge binary into Terminal

The README.md file includes some important instructions and the default syntax to run the tool, with the assumption that you're using the Sony BVM-X300 and mastering in SMPTE ST.2084.  I've copied the relevant syntax here (I'm using a Mac; the plain-English instruction lines below aren't part of the command, and you should replace the file paths marked with asterisks with your own content):

./hdr_metadata-master/macos/mkvmerge.app/Contents/MacOS/mkvmerge \
-o *yourfilename.mkv* \
--colour-matrix 0:9 \
--colour-range 0:1 \
--colour-transfer-characteristics 0:16 \
--colour-primaries 0:9 \
--max-content-light 0:1000 \
--max-frame-light 0:300 \
--max-luminance 0:1000 \
--min-luminance 0:0.01 \
--chromaticity-coordinates 0:0.68,0.32,0.265,0.690,0.15,0.06 \
--white-colour-coordinates 0:0.3127,0.3290 \

If using a LUT, add the lines
--attachment-mime-type application/x-cube \
--attach-file *file-path-to-your-cube-LUT* \

In all cases end with
*yourfilename.mov*

Beyond the initial call to the binary or executable, the syntax is identical on MacOS and Windows.

The program's full syntax can be found here, but it's a little overwhelming.  If you want to look it up, just focus on section 2.8, which includes the arguments we're using here.  The first four arguments set the color matrix (BT.2020 non-constant), color range (Broadcast), transfer function (ST.2084), and color space (BT.2020) by referencing specific index values, which you can find on the linked page.  If you want to use HLG instead of PQ, switch the value of --colour-transfer-characteristics to 0:18, which will flag for ARIB STD-B67.

(Note to the less code savvy: the backslashes at the end of each line allow you to break the syntax across multiple lines in the command prompt or terminal window.  You'll need them at the end of every line you copy and paste in, except for the last one)

The rest of the list of video properties should be fairly self explanatory, and match the metadata required by HDR10, which I go over in more detail here.

Now, if you want to include your own SDR cross conversion LUT, you'll need to include the arguments --attachment-mime-type application/x-cube, which tells the program you want to attach a file that's not processed (specifically, a cube LUT), and --attach-file filepath, which specifies the file you're attaching.

If you don't attach your own LUT, YouTube will handle the SDR cross conversion with their own internal LUT.  It's not bad, but I don't like the hard clipping above 300 nits or the loss of detail in the reds - though that's largely a personal preference.  See the comparison screenshots below to see how theirs works.

Once you've pasted in all of the arguments and set your input file path, hit enter to let it run and it'll make a new MKV.  It doesn't recompress any video data, just copies it over, so if you gave it ProRes, it'll still be the same ProRes stream but with the included HDR metadata and LUT that YouTube needs to recognize the file.

Overall, it's a pretty fast tool, and extremely useful beyond just YouTube applications.  You can see what it's done in this set of screenshots below.  The first is the source ProRes clip, the second is the same after passing it through mkvmerge to add the metadata only, and the third went through mkvmerge to get the metadata and my own LUT:

ProRes 422 UHD Upload Without Metadata Injection

ProRes 422 UHD Upload in MKV File.  Derived from the ProRes File above and passed through the mkvmerge tool to add HDR Metadata, but no LUT.

ProRes 422 UHD Upload in MKV file.  Derived from the ProRes file above and passed through the mkvmerge tool to add HDR Metadata and including our SDR cross conversion LUT.  Notice the increased detail in the brights of the snake skin, and the regained detail in the red flower.


All of us at Mystery Box are extremely excited to see HDR support finally widely available on YouTube.  We've been working in the medium for over a year, and haven't been able to distribute any of our HDR content in a way that consumers would actually be able to use.  But now, there's a general content distribution platform available with full HDR support, and we're excited to see what all creators can do with these new tools!

HDR Video Part 5: Grading, Mastering, and Delivering HDR

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 5: Grading, Mastering, and Delivering HDR.

In our series on HDR so far, we’ve covered the basic question of “What is HDR?”, what hardware you need to see it, the new terms that apply to it, and how to prepare and shoot with HDR in mind.  Arguably, we’ve saved the most complicated subject for last: grading, mastering, and delivering.

First, we’re going to look at setting up an HDR grading project, and the actual mechanics of grading in the two HDR spaces.  Next, we’re going to look at how to prepare cross conversion grades to convert from one HDR space to the other, or from HDR to SDR spaces.  Then, we’re going to look at suitable compression options for master & intermediate files, before discussing how to prepare files suitable for end-user delivery.

Now, if you don’t handle your own coloring and mastering, you may be tempted simply to ignore this part of our series.  I’d recommend you don’t - not just because I’ve taken the time to write it, but I sincerely believe that if you work at any step along an image pipeline, from acquisition to exhibition, your work will benefit from learning how the image is treated in other steps along the way.  Just my two cents.

Let’s dive in.

NOTE: Much of this information will be dated, probably within the next six months to a year or so. As programs incorporate more native HDR features, some of the workarounds and manual processes described here will likely be obsolete.


Pick Your Program

Before diving into the nitty gritty of technique, we need to talk applications.  Built-in color grading tools or plugins for Premiere, Avid, or FCP-X are a no-no.  Until all of the major grading applications have full and native HDR support, you’re going to want to pick a program that offers full color flexibility and precision in making adjustments.

I’m going to run you through my workflow using DaVinci Resolve Studio, which I’ve been using to grade in HDR since October 2015, long before Resolve contained any native HDR tools.  My reasoning here is threefold: one, it’s the application I actually use for grading on a regular basis; two, the tricks I developed to grade HDR in DaVinci can be applied to most other color grading applications; and three, it offers some technical benefits that we find important to HDR grading, including:

  • 32 bit internal color processing
  • Node based corrections offering both sequential and parallel operations
  • Color space agnostic processing engine
  • Extensive LUT support, including support for multiple LUTs per shot
  • Ability to quickly apply timeline & group corrections
  • Extensive, easily accessible correction toolset with customizable levels of finesse
  • Timeline editing tools for quick edits or sequence changes
  • Proper metadata inclusion in QuickTime intermediate files

Now, I’m not going to say that DaVinci Resolve is perfect.  I have a laundry list of beefs that range from minor annoyances to major complaints (but the same is basically true for every program that I’ve used…), but for HDR grading its benefits outweigh its drawbacks.

My philosophy tends to be that if you can pretty easily make a program you’re familiar with do something, use that program.  So while we’re going to look at how to grade in DaVinci Resolve Studio, you should be able to use any professional color grading application to achieve similar results, by translating the technique of the grade into that application’s feature set.*

If you are using DaVinci Resolve Studio, I recommend upgrading to version 12.5.2 or higher, for reasons I’ll clarify in a bit.

DaVinci Resolve Studio  version 12.5.2 has features that make it very useful for HDR grading and encoding.


Grading in HDR

So now that we’re clear on what we need in a color grading program, let’s get to the grading technique itself.  For starters, I’m going to focus on grading with the PQ EOTF rather than HLG, simply because there’s a lot of overlap between the two.  The initial subsections will focus on PQ grading, but I’ll conclude the section with a bit about how to adapt the advice (and your PQ grade!) to grading in HLG.

Set up the Project

I assume, at this point, that you’re familiar with how to import and set up a DaVinci Resolve Studio project for normal grading using your own hardware, adding footage, and importing the timeline with your sequence.  Most of that hasn’t changed, so go ahead and set up the project as usual, and then take a look at the settings that need to be different for HDR.

First, under your Master Project Settings you’re going to want to turn on DaVinci’s integrated color management by changing the Color Science value to “Davinci YRGB Color Managed”.  Enabling DaVinci’s color management allows you to set the working color space, which as of Resolve Studio 12.5.2 and higher will embed the correct color space, transfer function, and transform matrix metadata to QuickTime files using ProRes, DNxHR, H.264, or Uncompressed codecs.  As more and more applications become aware of how to move between color spaces, especially BT.2020 and the HDR curves, this is invaluable.

Enabling DaVinci YRGB Color Management as a Precursor for HDR Grading

Side note: I'm not recommending using their color management for input color space transformations; in fact, for my HDR grades, I set the input to "Bypass" and the timeline and output color spaces to the same values, because I don't like how these transformations affect how basic grading operations act.  Color management is however a useful starting point for HDR and SDR cross conversions, which we'll discuss in a bit.

Once color management is turned on, you'll want to set it up for the HDR grade.  Move to the Color Management pane of the project settings and enable the setting "Use Separate Color Space and Gamma".  This will give you fine-tuneable controls over the input, timeline, and output values.  If you want to keep these flat, i.e. preventing any actual color management by DaVinci, set the Input Color Space to "Bypass" and the Timeline and Output Color Space to "Rec.2020" - "ST.2084".  This will enable the proper metadata in your renders without affecting any grading operations.

For the purposes of what I’m demonstrating here, if you are using DaVinci’s color management for color transformations, use these settings:

  • Input Color Space - <”Bypass,” Camera Color Space or Rec 709> - <”Bypass,” Camera Gamma or Rec 709>
  • Timeline Color Space - “Rec.2020” - “ST.2084”
  • Output Color Space - “Rec.2020” - “ST.2084”

DaVinci Resolve Studio for embedding HDR metadata in master files, without affecting overall color management.

NOTE: At the time of this writing DaVinci's ACES doesn’t support HLG at all, or PQ within the BT.2020 color space; in the future, this may be a better option to use, if you’re comfortable grading in ACES.

After setting your color management settings, you’ll want to enable your HDR scopes by flagging “Enable HDR Scopes for ST.2084” in the Color settings tab of the project settings.  This changes the scale on DaVinci’s integrated scopes from 10 bit digital values to a logarithmic brightness scale showing the output brightness of each pixel in nits.
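If you're curious what that scale conversion actually is, it's just the SMPTE ST.2084 EOTF applied to the code values.  Here's a minimal Python sketch of the mapping, using the constants from the published ST.2084 definition; the full-range 10-bit normalization is my own assumption, so adjust it if your pipeline works in legal/video range:

M1 = 2610 / 16384          # SMPTE ST.2084 (PQ) constants
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_code_to_nits(code, bit_depth=10):
    """Full-range PQ code value -> display luminance in nits."""
    e = code / (2 ** bit_depth - 1)                 # normalize to 0..1
    p = e ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

print(pq_code_to_nits(1023))    # 10000.0 - the very top of the PQ range
print(pq_code_to_nits(767))     # ~980 - the full-range code usually treated as the 1000 nit point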

How to Enable HDR Scopes for ST.2084 in DaVinci Resolve Studio 12.5+

DaVinci Resolve Studio scopes in standard digital scale, and in ST.2084 nits scale.

If you’re connected to your HDMI reference display, under Master Project Settings flag “Enable HDR Metadata over HDMI”, and under Color Management flag “HDR Mastering is for X nits” to trigger the HDR mode on your HDMI display.

How to enable HDR Metadata over HDMI to trigger HDR on consumer displays.

If you’re connected to a reference monitor over SDI, set the display’s color space to BT.2020 and its gamma curve to ST.2084 (and its Transform Matrix to BT.2020 or BT.709, depending on whether you’re using subsampling and what your output matrix is).

Settings for enabling SMPTE ST.2084 HDR on the Sony BVM-X300

That’s it for settings.  It’s really that simple.


Adjusting the Brightness Range

Now that we’ve got the project set up properly, we’re going to add the custom color management compensation that will allow the program’s mathematical engine to process changes in brightness and contrast in a way more conducive to grading in ST.2084.

The divergence of the PQ EOTF from a linear scale is pretty hefty, especially in the high values.  Internally, the mathematical engine operates on the linear digital values, with a slight weighting towards optimization for Gamma 2.4.  What we want to do is make the program respond more uniformly to the brightness levels (output values) of HDR, rather than to the digital values behind them (input values).
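To put a number on that divergence, here's a quick sketch (reusing the published ST.2084 constants, and again assuming full-range 10-bit code values) of what the same +50 code-value nudge does to actual display brightness at different points along the curve:

M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_code_to_nits(code):
    p = (code / 1023) ** (1 / M2)                   # full-range 10-bit assumed
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

for code in (200, 500, 800):
    print(code, round(pq_code_to_nits(code + 50) - pq_code_to_nits(code), 1))
# Roughly +2.5 nits around code 200, +50 nits around code 500, and +750 nits around code 800:
# the same "linear" nudge is a tiny change in the darks and an enormous one in the brights.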

We’re going to do this by setting up a bezier curve that compresses the lights and expands the darks:

Bezier curve for expanding the darks and compressing the whites of ST.2084, for grading with natural movement between exposure values in HDR

For best effect, we need to add the curve to a node after the rest of the corrections, either as a serial node after other correctors on individual clips, on the timeline as a whole (timeline corrections are processed in serial, after clip corrections), or exported as a LUT and attached to the overall output.

Where to attach the HDR bezier curve for best HDR grading experience - serial to each clip, or serial to all clips by attaching it to the timeline.

So what effect does this have on alterations?  Look at the side by side effect of the same gain adjustment on the histogram with and without the custom curve in serial:

Animated GIF of brightness adjustments with and without the HDR Bezier Curve

Without the curve, the upper brightness values race through the HDR brights.  This is, as you can imagine, very unnatural and difficult to control.  On the other hand, the curve forces the bright ranges to move more slowly, still increasing, but at a pace that's more comparable to a linear adjustment of brightness, rather than a linear adjustment of digital values: exactly what we want.

NOTE: DaVinci Resolve Studio includes a feature called "HDR Mode", accessible through the context menu on individual nodes, that in theory is supposed to accomplish a similar thing.  I've found it has really strange effects on Lift - Gamma - Gain that I can't see helping HDR grading: Gain races faster through the brights, Gamma is inverted and seems to compress the space, and so does Lift, but at different rates.  If you've figured out how to make these controls useful, let me know…

If you've figured out how to use HDR Mode in DaVinci Resolve Studio for HDR grading, let me know!

Once that curve’s in place, grading in HDR becomes pretty normal, in some ways even easier than grading for SDR.  But there are a few differences that need to be noted, and a couple more tricks that will get your images looking the best.  And the first one of these we’ll look at is the HDR frenemy, MaxFALL.


Grading with MaxFALL

If you read the last part in this HDR series about shooting for HDR, you’ll remember that MaxFALL was an important consideration when planning the full scene for HDR.  In color grading you’re likely going to discover why MaxFALL is such an important consideration: it can become frustratingly limiting to what you think you want to do.

Just a quick recap: MaxFALL is the maximum frame average light level permitted by the display.  We calculate each frame average light level by measuring the light level, in nits, of each pixel and taking the average across each individual frame.  The MaxFALL value is the maximum encoded within an HDR stream, or permitted by a display.  The MaxFALL permitted by your reference or target display is what we really need to think about with respect to color grading.

Without getting into the technical reasons behind the MaxFALL, you can imagine it as collapsing all of the color and brightness within a frame into a single, full frame, uniform grey screen, and the MaxFALL is how bright that grey (white) screen can be before the display would be damaged.  Every display has a MaxFALL value, and will hard-limit the overall brightness by dimming the overall image when you send it a signal that exceeds the MaxFALL.
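To make the definition concrete, here's a minimal sketch of how frame average light level (and MaxCLL, while we're at it) could be computed once you have each frame's per-pixel brightness in nits.  It's an illustration of the concept, not a reference implementation of any standard's measurement procedure, and the frame used in the example is hypothetical:

import numpy as np

def max_fall_and_cll(frames_nits):
    """frames_nits: iterable of 2D arrays, one per frame, pixel luminance in nits."""
    max_fall = 0.0                                   # brightest frame *average*
    max_cll = 0.0                                    # brightest single pixel anywhere
    for frame in frames_nits:
        max_fall = max(max_fall, float(frame.mean()))
        max_cll = max(max_cll, float(frame.max()))
    return max_fall, max_cll

# Hypothetical example: a 100 nit UHD frame with a small 1000 nit highlight
frame = np.full((2160, 3840), 100.0)
frame[:200, :200] = 1000.0
print(max_fall_and_cll([frame]))                     # FALL ≈ 104 nits, CLL = 1000 nits

Notice how little the small highlight moves the frame average - it's large areas of brightness, not specular peaks, that eat into MaxFALL.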

Average Pixel Brightness with Accompanying Source Image

On the BVM-X300, you’ll notice the over range indicator turns on when you exceed the MaxFALL, so that when you look at the display, you can see when the display is limiting the brightness.  On consumer television displays, there is no such indicator, so if the dimming happens when you’re looking away from the screen, you’re likely to not notice the decreased range.  Use the professional reference when it’s available!

BVM-X300 Over Range Indicator showing MaxFALL Exceeded

Just like with CRT displays, the MaxFALL tends to be lower on reference displays than on consumer displays with the same peak brightness: the larger size of consumer displays helps dissipate the heat generated by the higher current, and the color deviation tolerated in consumer displays allows them to run higher MaxFALLs at lower color fidelity than a reference display permits.

So what do we do in grading that can be limited by the MaxFALL attribute?  Here are some scenarios that I’ve run into limitations with MaxFALL:

  1. Bright, sunny, outdoors scenes
  2. Large patches of blue skies
  3. Large patches of white clouds
  4. Large patches of clipped whites
  5. Large gradients into the brightest whites

When I first started grading in HDR, running into MaxFALL was infuriating.  I'd be working at a nice clip when suddenly, no matter how much I raised the brightness of the scene, it just never got brighter!  I didn't understand initially, since I was looking at the scopes and I was well below the peak brightness available on my display, yet every time I added gain, the display bumped up, then suddenly dimmed down.

When MaxFALL is exceeded, the Over Range indicator lights up and the display brightness is notched down to prevent damage.

Now that I know what I was fighting against, it's less infuriating, but still annoying.  In general, I know that I need to keep the majority of the scene around 100-120 nits, and pull only a small amount into the superwhites of HDR.  When my light levels are shifting across frames, as in this grade with the fire breather, I'll actually allow a few frames to exceed the display's MaxFALL temporarily, so long as it's very, very brief, so as not to damage the display when it temporarily averages brighter.

Grading with brief exceeding of target MaxFALL.

When I'm grading content that's generally bright, with long stretches that are even brighter, such as this outdoor footage from India, it can be a good idea to keyframe an upper-brightness adjustment to drop the MaxFALL, basically dropping the peak white values as the clipped or white patch takes up more and more of the scene.  This can be visible, though, as a greying of the whites, so be careful.

Tilt-up Shot of Taj Mahal where brightness keyframes were required to limit MaxFALL.  In an ideal world, no keyframes would have been necessary and the final frame would have been much brighter (as shot) than the first.

In other cases, it may be necessary to drop the overall frame brightness, to allow for additional peak brightness in a part of the frame, such as what happened with this shot of Notre Dame Cathedral, where I dropped the brightness of the sky, tree, and cathedral to less than what I wanted to allow the clouds to peak higher into the HDR white range.

Average brightness was limited so that more of the cloud details would push higher into the superwhites without exceeding MaxFALL

In some cases, you really have no choice but to darken the entire image and reduce the value of peak white, such as this shot of the backflip in front of the direct sun - the gradient around the sun steps (bands) if I pull its center up to the peak white of the sun, while the MaxFALL is exceeded if I pull up the overall brightness of the image.

MaxFALL limited the white point to only 200 nits because of the quantity of the bright portion of the image and the softness of the gradient around the sun.

The last consideration with MaxFALL comes with editing across scenes, and is more important when maintaining consistency across a set of shots that should look like they’re in the same location.  You may have to decrease the peak white within the series of shots so that on no edit does the white suddenly appear grey, or rather, ‘less white’ than the shot before it.

Three shots with their possible peak brightnesses (due to MaxFALL limitations of the BVM-X300) vs the values I graded them at.

What do I mean by ‘less white’?  I mentioned it in Part 4: Shooting for HDR, but to briefly reiterate and reinforce:


In HDR grading, there’s no such thing as absolute white and black.


HDR Whites & Blacks

From a grading paradigm point of view, this may be the biggest technical shift: in HDR, there is no absolute white or absolute black.

Okay, well, that's not entirely true, since there is a 'lowest permitted digital code' which is essentially the blackest value possible, and a 'highest permitted digital code' that can be called the peak brightness - essentially the whitest value possible within the system (encoded video + display).  However, in HDR, there is a range of whites available through the brights, and a range of blacks available through the darks.

Black and white have always been a construct in video systems, limited by the darkest and brightest values displays could produce.  There were the hard-coded limits of the digital and voltage values available.  In traditional SDR color grading, crushing to black was simple: push the darks below the lowest legal dark value, and you have black.  Same thing with whites - set the brightness to the highest legal value and that was the white that was available: anything less than that tends to look grey, especially in contrast with 'true white' or 'legal white'.

But in the real world, there is a continuum that exists between blacks and whites.  With the exception of a black hole, there is nothing that is truly ‘black’, and no matter how bright an object is, there’s always something brighter, or whiter than it.

Of course, that’s not how we see the world - we see blacks and whites all around us.  Because of the way that the human visual system works, we perceive as blacks any part of a scene (that is, what is in front of our eyes) that is either very low in relative illumination and reflects all wavelengths of light relatively uniformly, or that is very low in relative illumination such that few of our cones are activated in our eyes and we therefore can’t perceive the ratio of wavelengths reflected with any degree of certainty.  Or, in other words, everything that is dark with little saturation, or so dark that we can’t see the saturation, we perceive as black.

The same thing is true with whites, but in reverse.  Everything illuminated or emitting brightness beyond a specific value, with little wavelength variance (or along the normal distribution of wavelengths) we see as white, or if things are so bright that we can’t differentiate between the colors reflected or emitted, we see it as white.

Why do I bring this up?  Because unlike in SDR video where there is a coded black and coded white, in HDR video, there are ranges of blacks and whites (and colors of blacks and whites), and as a colorist you have the opportunity to decide what level of whiteness and blackness you want to add to the image.

Typically, any area that's clipped should be pushed as close as possible to the scene-relative white level where the camera clipped.  Or, in other words, as high as possible in a scene with a very large range of exposure values, or significantly lower when the scene was overexposed and therefore clipped at a much lower relative ratio.

Clipping in an image with wide range of values and tones vs clipping in image with limited range of values and tones

Since this is different for every scene and every camera, it’s hard to recommend what that level should be.  I usually aim for the maximum value of the display or the highest level permitted by MaxFALL if my gradient into the white or size of the clipped region won’t permit it to be brighter.

So long as the light level is consistent across edits, the whites will look the same and be seen as white.  If, within a scene, you have to drop the peak brightness level of one shot because of MaxFALL or other considerations, it’s probably going to look best if you drop the brightness level of the whites across every other shot within that same scene.  In DaVinci, you can do this quickly by grouping your shots and applying a limiting corrector (in the Group Post-Clip, to maintain the fidelity of any shot-based individual corrections).

Sometimes you may actually want a greyer white, or a colored white that reads more blue or yellow, depending on the scene.  In fact, when nothing within the image is clipping and you don’t have other MaxFALL considerations, it’s very liberating to decide the absolute white level within an image.  Shots without any ‘white’ elements can still have colored brights at levels well above traditional white, which helps separate the relative levels within a scene in a way that could not be possible with traditional SDR video.

The only catch, and this is a catch, is that when you do an SDR cross conversion, some of that creativity can translate into gross looking off-whites, but if you plan specifically for it in your cross conversion to SDR, you should be able to pull it off in HDR without any issues.

Blacks have a similar tonal range available to them.  You have about 100 levels of black available below traditional SDR’s clipping point, and that in turn creates some fantastic creative opportunities.  Whole scenes can play out with the majority of values below 10 nits.  Some parts of the darks can be so dark that they appear uniform black, until you block out the brighter areas of the screen and suddenly find that you can see even deeper into the blacks.  Noise, especially chromatic noise, disappears more in these deep darks, making the image appear cleaner than it would in SDR.  All of these offer incredible creative opportunities when planning for production, and I discussed them in more detail in Part 4: Shooting for HDR.

So how do you play with these whites and blacks?

The two tools I use on a regular basis to adjust my HDR whites and blacks are the High and Low LOG adjustments within DaVinci.  These tools allow me to apply localized gamma corrections to specific parts of the image, that is, those above a specific value for the highs adjustment, and those below a specific value for the lows adjustment.

DaVinci Resolve Studio's LOG Adjustment Panel

In SDR video, I typically use LOG adjustments on the whites to extend contrast, or to adjust the color of the near-whites.  In HDR, I first adjust the “High Range” value to ‘bite’ the part of the image that I want, and then pull it towards the specific brightness value I’m looking for.  This often (but not always) involves pulling up a specific part of the whites (say, the highlights on the clouds) to a higher brightness value in the HDR range, for a localized contrast enhancement, though I do use it to adjust the peak brightness too.

Effect of LOG Adjustments on an HDR Image with Waveform.  Notice the extended details in the clouds.

In SDR video, I’d typically use the low adjustment to pull down my blacks to ‘true black’, or to fix a color shift in the blacks I’d introduced with another correction (or the camera captured). In HDR, I use the same adjustment to bite a portion of the lows and extend them through the range of blacks, increasing the local contrast in the darks to make the details that are already there more visible.

The availability of the LOG toolset is one of the major reasons I have a preference for color grading in DaVinci, and what it lets you do quickly with HDR grading really helps speed up the process.  When it's not available, its functionality is difficult to emulate with finesse using tools such as curves or lift-gamma-gain.  I've found it generally requires a secondary corrector limited to a specific color range followed by a gamma adjustment, which is a very inelegant workaround, but one that works.


Futureproofing

Once the grade is nearly finalized, there are a couple of things you may consider doing to clean up the grade and make it 'futureproof' - that is, to make sure that things you do now don't come back to haunt the grade later.

If you’ve been grading by eye, any value above the maximum brightness of your reference display will be invisible, clipped at the maximum display value.  If you’re only ever using the footage internally, and on that display only, don’t worry about making it future proof.  If, however, you’re intending on sharing that content with anyone else, or upgrading your display later, you’ll want to actually add the clip to your grade.

The reasoning here I think is pretty easy to see: if you don't clip your video signal, your master will contain information that you can't actually see.  In the future, or on a different display with greater latitude, it may be visible.

There are a couple of ways of doing this.

One that's available in DaVinci is to generate a soft-clip LUT in the Color Management pane of the project settings, setting the top clip value to the 10-bit digital value of your display's maximum nits value (767, for instance, for a 1000 nit max brightness display using PQ space).  Once you generate the LUT, attach it to the output and you've got yourself a fix.
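If you want to sanity check that clip value for a different display, the inverse of the PQ curve gives you the 10-bit code for any target brightness.  A quick sketch, again with the published ST.2084 constants and assuming full-range 10-bit values; depending on rounding and signal range you'll land within a code or two of the 767 quoted above:

M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def nits_to_pq_code(nits, bit_depth=10):
    y = nits / 10000.0                               # normalize to PQ's 10,000 nit ceiling
    ym = y ** M1
    e = ((C1 + C2 * ym) / (1 + C3 * ym)) ** M2       # normalized PQ signal, 0..1
    return round(e * (2 ** bit_depth - 1))

print(nits_to_pq_code(1000))    # ~769 - the top clip for a 1000 nit display
print(nits_to_pq_code(100))     # ~520 - roughly where SDR reference white sits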

Generating a Soft Clipping LUT for ST.2084 at 1000 nits in DaVinci Resolve

Alternatively, you can adjust the roll-off curve we're using for making uniform brightness adjustments so that it comes as close as possible to limiting the maximum displayable value, by extending the bezier curve into a near-flat line that lands at your target maximum.

Bezier curve for HDR grading with flatter whites to minimize peak range

But sometimes you may want to leave those values there, so that when the next generation of brighter displays comes around, you may find a little more detail in the lights.  What’s really important here is that you make white white, and not accidentally off-white.

If you're working with RAW footage that allows you to adjust the white balance later, you may find that where white 'clipped' on the sensor isn't uniform in all three channels.  This can happen too with a grading correction that adjusts the color balance of the whites - you can end up with separate clips in the red, green, and blue channels that may be clipped and invisible on your display, but will show up in the future.

Waveform of clipped whites with separated RGB Channels.  This is common with RAW grading with clipped whites at the sensor and the ability to control decoded color temperature.

The simple fix here is to add a serial node adjustment that selects, as a gradient, all values above a specific point, and desaturate the hell out of it.  Be careful to limit your range to low saturation values only (so long as they encompass what you're trying to hit) so that you don't accidentally desaturate other, more intentionally colorful parts of the image that just happen to be bright.

How to fix RGB separated clipped whites: add a serial node with a Hue/Saturation/Luminance restriction to just the whites and reduce their saturation to zero.

Working with Hybrid Log Gamma

Up to this point the grading techniques I’ve been discussing have been centered on grading in PQ space.  Grading in Hybrid Log Gamma is slightly different in a couple of important ways.

As a quick refresher, Hybrid Log Gamma is an HDR EOTF that intends to be partially backwards compatible with traditional gamma 2.4 video.  This is a benefit and a drawback when it comes to HDR grading.

If you have multiple reference displays available, this is an important time to break them out.  Ideally, one display should be set up in HLG with a system gamma of 1.2 (or whatever your target system gamma is), and the second should be set up in regular gamma 2.4.  That way, whatever grading you do, you can see the effect immediately on both target systems.  Otherwise you'll need to flip back and forth between HDR and SDR modes on a single reference display in your search for 'the happy medium'.

Grading HLG with two reference displays - one in HDR, one in SDR, to ensure the best possible contrast in both.

Most of the project and grading setup is identical to grading with the PQ EOTF, with the exception of the bezier curve in serial that adjusts the brightness response.  In HLG we don’t want to expand the darks, since the HLG darks are identical to the gamma 2.4 darks, so we want that part of the curve to be more linear, before easing into our compression of the highs.

Bezier curve for HDR grading in Hybrid Log Gamma.  This curve replaces the ST.2084 Bezier curve added earlier.

Once that’s in place, the rest of the grading process is similar to grading in PQ.  In fact, you can replace the ST.2084 bezier curve with this curve and your grade should be nearly ready to go in HLG.  The major exception to this is that you still need to regularly be evaluating how the image looks in SDR, on a shot by shot basis.

The biggest complaint I have with grading in HLG is the relative contrast between the HDR and the SDR images.  Because HLG runs up to 5000 nits with its top digital values, if you're grading in 1000 nits you end up with a white level in the SDR version below the usual peak white.  This often means that the whites in the SDR version look muddied and lower contrast than the same content graded for SDR natively.  This is especially true when MaxFALL dictates that a darker image and a lower white point are necessary, landing values solidly in the middle ranges of brightness.

Hybrid Log Gamma occasionally has much dimmer and muddied whites, when compared to SDR natively graded footage, due to MaxFALL limitations.

And as if muddied whites weren’t enough, it’s difficult in HLG to find a contrast curve that works for both the HDR and the SDR image: because of how our brains perceive contrast, when the contrast looks right and natural in HDR, it looks flat in SDR because of the more limited dynamic range, while when it looks right in SDR it looks overly contrasty in HDR.

Personally, I find grading in HLG compounds the minor problems of HDR with the problems of SDR, which I find extremely irritating.  Rather than being happy with the grade, I'm often left with a sense of "It's okay, I guess".

But on the other hand, when it’s done, you won’t necessarily have to regrade for other target gamma systems, which is what you have to do when working in PQ.



Cross Converting HDR to HDR & HDR to SDR

Let’s be honest.  A PQ encoded image displayed in standard gamma 2.4 rendering looks disgusting.  The trouble is, we only really want to do the bulk of the grading once, so how can we cheat and make sure we don’t have to regrade every project two or more times?

LUTs, LUTs, and more LUTs!  Also, Dolby Vision.

Dolby Vision is an optional (paid to Dolby) add-in for DaVinci Resolve Studio that allows you to encode the metadata for the SDR cross conversion into your output files.  Essentially, the PQ HDR image is transmitted with metadata that describes how to transform the HDR into a properly graded SDR image.  It’s a nifty process that seeks to solve the dilemma of backwards compatibility.

But I've never used it, because we've had no need and I don't have a license.  The documentation on how to use it with DaVinci is extensive though, and it requires a similar process to doing a standard SDR cross conversion, so take that as you will.  I've also heard rumors that some major industry players are looking for / looking to create a royalty-free dynamic metadata alternative that everyone can use as a global standard for transmitting this information - but that's just a rumor.

For everyone not using Dolby Vision, you're going to have to render the SDR versions separately from the HDR versions as separate video files.  Here at Mystery Box, we prefer to render the entire HDR sequence as a set of clip-separated 12-bit intermediate files and make the SDR grade from them, versus adding additional corrector elements to the HDR grade.  This tends to render faster, because you only render from the RAWs once, and make any other post-processing adjustments once instead of on every version.

NOTE: I’m going to cover the reason why later, but it’s important that you use a 12 bit intermediate if you want a 10 bit master, since the cross conversion from PQ to any other gamma system cuts the detail levels preserved by about 2-4 times, or an effective loss of 1-2 bits of information per channel.

When I’m cross converting from PQ in the BT.2020 space to gamma 2.4 in the BT.2020 space, after reimporting and reassembling the HDR sequence (and adding any logos or text as necessary), I’ll duplicate the HDR sequence and add a custom LUT to the timeline.

The fastest way to build this LUT is to use the built-in DaVinci Color Management (set the sequence gamma to ST.2084 and the output gamma to Gamma 2.4) or the HDR 1000 nits to Gamma 2.4 LUT, and then add a gain and gamma adjustment to bring the brightness range and contrast back to where you want it to be.  It’s a pretty good place to start building your own LUT on, and while these tools weren’t available when I started building my first cross conversion LUT, the process they use is nearly identical to what I did.
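For illustration, here's a bare-bones sketch of what the transfer-function part of such a LUT looks like when written out as a .cube file: it decodes PQ to nits, hard-clips at 100 nits, and re-encodes to gamma 2.4.  A LUT you'd actually grade through would roll off the highlights and handle gamut mapping as well, and whether your application prefers a 1D or 3D .cube is worth checking, so treat this as a starting point rather than a drop-in file (the output file name is just a placeholder):

M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_to_nits(e):
    p = e ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

SIZE = 1024
with open("pq1000_to_gamma24.cube", "w") as f:           # hypothetical file name
    f.write("LUT_1D_SIZE %d\n" % SIZE)
    for i in range(SIZE):
        nits = min(pq_to_nits(i / (SIZE - 1)), 100.0)    # hard clip at SDR reference white
        v = (nits / 100.0) ** (1 / 2.4)                  # re-encode as display gamma 2.4
        f.write("%.6f %.6f %.6f\n" % (v, v, v))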

Using DaVinci Resolve Studio to handle HDR to SDR cross conversion

Using DaVinci Resolve Studio to handle HDR to SDR cross conversion

Once you've attached that correction to the timeline, it's a pretty fast process to run through each shot and simply do minor brightness, contrast, white point, and black point adjustments.  Using DaVinci's built-in LUT / Color Management I can do a full SDR cross conversion for 5 minutes of footage in less than half an hour with this method; using my own custom LUT, the process can take less than five minutes.

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 01

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 02

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 03

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 03

HDR to SDR Cross Conversions using a Custom LUT vs. using DaVinci Resolve Studio's integrated conversion + brightness adjustment, Image 04

Notice the detail loss in the pinks, reds, and oranges because of over saturation in the simple downconversion process (images 01 and 04), the milkiness and hue shifting in the darks (image 02), and the fluorescence of the pinks and skin tones (image 03) with a straight downconversion.  This happens largely in the BT.2020 to BT.709 color space conversion, when colors land outside of the BT.709 gamut.  Building a custom LUT can be a great solution to retain the detail.

After prepping the BT.2020 version, making a BT.709 version for web or demonstration purposes is incredibly easy.  All that you have to do is duplicate the BT.2020 sequence (this is why I like adding LUTs to timelines, instead of to the output globally) and add an additional LUT to the timeline that does a color space cross conversion from BT.2020 to BT.709.  (Alternatively, change the color management settings.)  Since the BT.2020 and BT.709 contrast is the same, all I need to do then is run through the sequence looking for regions where reds, blues, or greens end up out of gamut, and bring those back in.  That's usually less than 5 minutes for a 5 minute project.
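If you'd rather flag out-of-gamut regions programmatically than hunt for them entirely by eye, the check is just a 3x3 matrix away.  Here's a sketch that derives the BT.2020-to-BT.709 matrix from the published chromaticities and flags pixels that land outside BT.709.  It operates on linear-light RGB, so you'd decode your transfer function first, and it's meant as a diagnostic aid rather than part of the grade:

import numpy as np

def rgb_to_xyz(prims, white):
    """Derive an RGB->XYZ matrix from xy chromaticities (prims: [(x, y)] for R, G, B)."""
    xyz = lambda x, y: np.array([x / y, 1.0, (1 - x - y) / y])
    P = np.stack([xyz(x, y) for x, y in prims], axis=1)   # columns are the primaries
    S = np.linalg.solve(P, xyz(*white))                   # scale so RGB = 1,1,1 lands on white
    return P * S

BT2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]
BT709  = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
D65    = (0.3127, 0.3290)

M_2020_TO_709 = np.linalg.inv(rgb_to_xyz(BT709, D65)) @ rgb_to_xyz(BT2020, D65)

def outside_709(rgb2020_linear):
    """True where a linear-light BT.2020 pixel falls outside the BT.709 gamut."""
    rgb709 = np.asarray(rgb2020_linear) @ M_2020_TO_709.T
    return np.any((rgb709 < 0) | (rgb709 > 1), axis=-1)

print(outside_709([0.8, 0.05, 0.05]))   # True - a saturated BT.2020 red has no BT.709 equivalent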

Stacked LUTs on a Timeline to combine transformations.

Cross converting from HLG to PQ is fairly simple, since PQ encompasses a larger range of brightnesses than the HLG range and it can fairly easily be directly moved over with a simple LUT or color management tool; you may want to adjust your low-end blacks to take advantage of the deeper PQ space, but it’s otherwise straightforward.

Cross-grading from PQ to HLG is a different animal altogether.  It's still faster to work from the intermediate than the RAWs themselves, but it's more than just a simple LUT or color management solution.  Because of the special considerations for HLG - that its contrast has to look good in both HLG and gamma 2.4 - you have a lot more work to do finessing the contrast than when you convert ST.2084 into gamma 2.4.  You'll also run into issues with balancing the MaxFALL in HLG, which in some cases you'll just have to ignore.

DaVinci's built-in color management is actually quite a good starting point for cross converting from HLG to PQ or PQ to HLG.  It's important, though, to be aware of how color management injects metadata into QuickTime files, which I'll address in a second, so that you don't accidentally flag the incorrect color space or gamma in your master files.

Using DaVinci Color Management to apply an HLG to ST.2084 cross conversion.

Understanding how LUTs work to handle SDR cross conversions is really important, because until there’s a universal metadata method for including SDR grades with HDR content, which in and of itself would essentially be a version of a shot-by-shot LUT, display manufacturers and content delivery system creators rely on LUTs (or their mathematical equivalent) to convert your HDR content into something that can be shown on SDR displays!


Metadata & DaVinci’s Color Management

If you’re using color management to handle parts of your color space and gamma curve transformations, you’re going to need to adjust the Output Color Space each time you change sequences, to match the targeted space of that timeline (in addition to switching the settings on your reference display).  This is actually the biggest reason I prefer using LUTs over color management - it just becomes a hassle to continually have to reset the color management when I’m grading.

Even if you’re not using the color management to handle color space conversions, you’re going to need to make some changes to the color management settings when rendering out QuickTime masters, so that the correct metadata is included into the master files.

Proper Metadata Inclusion for BT.2020 / ST.2084 QuickTime File, encoded in ProRes 4444 out of DaVinci Resolve Studio.

The settings you use when you go to render will depend on whether you're using color management for the transformation or not.  If you are using color management for the transform, change just the Output Color Space to match the target color space and gamma of the timeline to be rendered.  If you aren't using color management to handle the color conversion, switch both the Timeline Color Space and the Output Color Space to match your target color space and gamma immediately before rendering the matching timeline.  Again, and unfortunately, you will need to make this adjustment every time you go to render a new sequence.  Sorry, no batch processing.

DaVinci Resolve Studio Color Management Settings for transforming color and adding metadata, and adding metadata only.

Grading in HDR isn't as hard as it originally seems, once you figure out the tricks that allow the grading system to respond to your input as you would expect and predict.  And despite how different HDR is from traditional video, SDR and HDR cross conversions aren't as hard as they seem, especially when you're using prepared LUTs specifically designed for that process.


Mastering in HDR

When it comes to picking an appropriate master or intermediate codec for HDR video files, the simplest solution would always be to pick an uncompressed format with an appropriate per-channel bit depth.  Other than the massive file size considerations (especially when dealing with 4K+ video), there are a few cautions here.  

First, for most of the codecs available today that use chroma subsampling, the transfer matrix that converts from RGB to YCbCr is the BT.709 transfer matrix, and not the newer BT.2020 transfer matrix, which should be used with the BT.2020 color space.  This isn't a problem per se, and actually benefits out of date decoders that don't honor the BT.2020 transfer matrix, even with the proper metadata.  It's also possible to use the BT.2020 transfer matrix and improperly flag the matrix used when working with a transcoding application that requires manual flagging instead of metadata flagging.  At its very worst, it can create a very small amount of color shifting on decode.

A slightly more concerning consideration, however, is the availability of high quality 12+ bit codecs for use in intermediate files.  Obviously any codec using only 8 bits / channel is out of the question for HDR masters or intermediates, since 10 bits are required by all HDR standards.  10-bit encoding is completely fine for a mastering space, and codecs like ProRes 422, DNxHR HQX/444, 10 bit DPX, or any of the many proprietary 'uncompressed' 10 bit formats you'll find with most NLEs and color correction software should all work effectively.

However, if you're considering which codecs to use as intermediates for HDR work, especially if you're planning on an SDR down-grade from these intermediates, 12 bits per channel as a minimum is important.  I don't want to get sidetracked into the math behind it, but just a straight cross conversion from PQ HDR into SDR loses about ½ bit of precision in data scaling, and another ¼ - ½ bit of precision in redistributing the values to the gamma 2.4 curve, leaving a little more than 1 bit of precision available for readjusting the contrast curve (these are not uniform values).  So, to end up with an error-free 10 bit master (say, for UHD broadcast) you need to encode 12 bits of precision into your HDR intermediate.
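Here's a rough way to see that shortfall in code - an illustration of the idea, not the exact math behind the figures above.  It counts how many PQ code values land below 100 nits, the range an SDR gamma 2.4 master has to describe with essentially its whole 10-bit scale:

M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def nits_to_pq(nits):
    ym = (nits / 10000.0) ** M1
    return ((C1 + C2 * ym) / (1 + C3 * ym)) ** M2    # normalized PQ signal, 0..1

for bits in (10, 12):
    codes_under_100_nits = int(nits_to_pq(100) * (2 ** bits - 1)) + 1
    print(bits, codes_under_100_nits)
# A 10-bit PQ intermediate has only ~520 codes below 100 nits - about a bit short of the
# 1024 steps a 10-bit gamma 2.4 master uses for that same range, before any contrast
# reshaping.  A 12-bit PQ intermediate has ~2080, which covers the target comfortably.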

ProRes 4444 / 4444 (XQ), DNxHR 444, 12 bit DPX, Cineform RGB 12 bit, 16 bit TIFFs, or OpenEXR (Half Precision) are all suitable intermediate codecs,** though it’s important to double check all of your downstream applications to make sure that whichever you pick will work later.  Similarly, any of these codecs should be suitable for mastering, with the possibility of creating a cross converted grade from the master later.

I just want to note before anyone actually asks: intermediate and master files encapsulating HDR video are still reeditable after rendering - they can be assembled, cut, combined, etc just like regular video files.  You don’t need to be using an HDR display to do that either - they just look a little flatter on a regular display (except if you’re using HLG).  So long as you don’t pass them through a process that drops the precision of the encoded video, you should be fine to work with them in other applications as usual, though you may want to return to DaVinci to add the necessary metadata to whatever your final sequence ends up being.


Metadata

After you’ve made the master, it’s easy to assume you’re done.  But HDR specifications call for display referenced metadata during encoding of the final deliverable stream, so it’s actually important to record this metadata at the time of creation, if you aren’t handling the final encode yourself.  Unfortunately, currently none of the video file formats have a standardized place to record this metadata.

Your options are fairly limited; the simplest solution is to include a simple text file with a list of attribute:value pairs.

Text file containing necessary key : value pairs for an HDR master file that doesn't provide embedded metadata.

What metadata should you include?  It’s a good idea to include everything that you’d need to include in the standard VUI for HDR transmission:

  • Color Primaries
  • Transfer Matrix (for chroma subsampled video)
  • Transfer Characteristics
  • MaxCLL
  • MaxFALL
  • Master Display
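
As a purely hypothetical example - the values here simply mirror the x265 settings listed in the distribution section below; yours will come from your own mastering display and content analysis - such a sidecar file might read:

Color Primaries: BT.2020
Transfer Matrix: BT.2020 non-constant
Transfer Characteristics: SMPTE ST.2084 (PQ)
MaxCLL: 1000 nits
MaxFALL: 180 nits
Master Display: G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)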

When you're creating distribution files, each of these values needs to be properly set to flag a stream as HDR Video to the decoding display.  It's possible to guess many of these (color space, transfer matrix, etc.) if you've been provided with a master file without metadata, but it's much easier to record and provide this metadata at the time of creation so that no matter how far down the line you come back to the master, none of the information is lost.


Distributing HDR

If you’ve made it this far through the HDR creation process, there should really only be one major question remaining: how do we encode HDR video in a way that consumers can see it?

First, the bad news.  There's no standardization for HDR in digital cinema yet.  So if your intention is a theatrical HDR delivery, you're probably going to need to work with Dolby.  At the moment, they're the only ones with the actual installations that can display HDR, and they have specialists who will handle that step for you.  For most people, what we want to know is how to get an HDR capable television to display the video file properly.

This is where things get more tricky.

I don't say that because it's a necessarily complicated process, but only because there are no 'drop in' solutions generally available to do it (other than YouTube, very soon).

There are only three codecs that can, by specification, actually be used for distributing HDR video - HEVC, VP9, and AV1 (AV1 is the successor to VP9) - and within these only specific operational modes support HDR.  And of these three, the only real option at the moment is HEVC, simply because HDR televisions support hardware based 10 bit HEVC decoding - it's the same hardware decoder needed for the video stream of UHD broadcasts.

HEVC encoding support is still rather limited, and finding an application with an encoder that supports all of the optional features needed to encode HDR is still difficult.  Adobe Media Encoder, for instance, supports 10 bit HEVC rendering, but doesn’t allow for the embedding of VUI metadata, which means that the file won’t trigger the right mode in the end-viewer’s televisions.

Unfortunately, there's only one encoder freely available that gives you access to all of the options you need for HDR video encoding: x265 through FFmpeg.

If you’re not comfortable using FFmpeg through a command line, I seriously recommend downloading Hybrid (http://www.selur.de), which is one of the best, if not the best, FFmpeg frontend I’ve found.

Here are the settings that I typically use for encoding HEVC using FFmpeg for a file graded in SMPTE ST.2084 HDR using BT.2020 primaries on our BVM-X300, at a UHD resolution with a frame rate of 59.94fps:

Profile: Main 10
Tier: Main
Bit Depth: 10-bit
Encoding Mode: Average Bitrate (1-Pass)
Target Bitrate: 18,000 - 50,000 kbps
GOP: Closed
Primaries: BT.2020
Matrix: BT.2020nc
Transfer Characteristics: SMPTE ST.2084
MaxCLL: 1000 nits
MaxFALL: 180 nits
Master Display: G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)
Repeat Headers: True
Signaling: HRD, AUD, SEI Info
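
If you're running FFmpeg directly rather than through Hybrid, here's a hypothetical sketch of how those settings could map onto an x265 encode, built from Python for readability.  The flag names follow x265's documented options as I understand them, and the file names, bitrate, and mastering display string are placeholders taken from the list above - verify everything against your own build before relying on it:

import subprocess

x265_params = ":".join([
    "colorprim=bt2020",              # Primaries: BT.2020
    "transfer=smpte2084",            # Transfer characteristics: SMPTE ST.2084 (PQ)
    "colormatrix=bt2020nc",          # Matrix: BT.2020 non-constant
    "master-display=G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)",
    "max-cll=1000,180",              # MaxCLL, MaxFALL
    "no-open-gop=1",                 # closed GOP
    "repeat-headers=1",              # repeat the VUI/SEI headers throughout the stream
    "hrd=1", "aud=1", "info=1",      # HRD, AUD, and SEI info signaling
])

subprocess.run([
    "ffmpeg", "-i", "hdr_master.mov",    # placeholder input file
    "-c:v", "libx265",
    "-profile:v", "main10",              # Profile: Main 10
    "-pix_fmt", "yuv420p10le",           # 10-bit encode
    "-b:v", "18M",                       # average bitrate, 1-pass
    "-x265-params", x265_params,
    "-c:a", "copy",
    "hdr10_delivery.mp4",                # placeholder output file
], check=True)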

I've only listed the settings that are different from the default x265 settings, so let me run through what they do, and why I use these values.

First, x265 needs to output a 10-bit stream in order to be compliant with UHD broadcast, SMPTE ST.2084, ARIB STD-B67 or HDR10 standards.  To trigger that mode, I set the Profile to Main 10 and the Bit Depth to 10-bit.  Unless you're setting a really high bit rate, or using 8K video, you shouldn't need a higher Tier than Main.

Next, I target 18 - 50 mbps as an average bitrate, with a 1 pass encoding scheme.  If you can tolerate a little flexibility in the final bitrate, I prefer using this mode, simply because it balances render time with quality, without padding the final result.  If you need broadcast compliant UHD, you’ll need to drop the target bitrate from 18 to 15 mbps, to leave enough headroom on the 20 mbps available bandwidth for audio programs, closed captions, etc.

x265 Main Compression Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

However, I’ve found that 15mbps does introduce some artifacts, in most cases, when using high frame rates such as 50 or 60p.  18 seems to be about the most that many television decoders can handle seamlessly, though individual manufacturers vary and it does depend significantly on the content you’re transmitting.  Between 30 and 50 mbps you end up with a near-lossless encode, so if you happen to know the final display system can handle it, pushing the bitrate up can give you better results.  Above 50 mbps, there are no perceptual benefits to raising the bitrate.

A closed GOP is useful for random seeks and to minimize the amount of memory used by the decoder.  By default, x265 uses a GOP of at most 250 frames, so reference frames can end up being stored for quite some time when using an open GOP; it’s better just to keep it closed.

x265 Frames Compression Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

Next we add the necessary HDR metadata into the Video Usability Information (VUI).  This is the metadata required by HDR10, and records information about your mastering settings, including color space, which HDR EOTF you’re using, the MaxCLL of the encoded video, the MaxFALL of the encoded video (if you’ve kept your MaxFALL below your display’s peak, you can estimate this value using the display’s MaxFALL), and the SMPTE ST.2086 metadata that records the primaries, white point, and brightness range of the display itself.

x265 Video Usability Information Compression Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

This metadata is embedded into the headers of the video stream itself, so even if you change containers the information will still be there.  To make sure that the metadata is stored at regular intervals, and to enable smoother random access to the video stream, the last step is to turn on the option for repeating the headers and to include HRD, AUD, and SEI Info.

x265 Stream Settings for HDR Delivery (Using Hybrid FFMPEG front-end)

The HEVC stream can be wrapped in either a .mp4 or a .ts container; both are valid MPEG containers and should work properly on HDR televisions.  Be aware that it can take a while to get your settings right on the encode; if you’re using Hybrid you may need to tweak some of the settings to get 10-bit HEVC to encode without crashing (I flag on “Prefer FFmpeg” and “Use gpu for decoding” to get it to run stable) - don’t leave testing to the last minute!


Grading, mastering, and delivering HDR are the last pieces you need to understand to create excellent quality HDR video.  We hope that the information in this guide to HDR video will help you to be confident in working in this new and exciting video format.

HDR Video is the future of video.  It's time to get comfortable with it, because it's not going anywhere.  The sooner you get on board with it and start working with the medium, the more prepared you'll be for the forthcoming time when HDR video becomes the de facto video standard.


Endnotes


*The rationale behind the technical requirements will become clear over the course of the article.  I would recommend that you look at the documentation for the application you use to make sure it meets the same minimum technical requirements as DaVinci Resolve when grading in HDR.  Most major color grading programs meet most or all of these technical criteria, and it’s always better to grade in the program you know than in the program you don’t.


However, if you are looking to pick a program right off the bat, I'd recommend DaVinci Resolve Studio, primarily because you can learn the application and toolset on the regular version of Resolve before even having to spend a dime.


** You should always test that these codecs actually perform as expected with HDR in your workflow, even if you’ve used them for other applications in the past.  I’ve run into an issue where certain applications decode the codecs in different ways that have little effect in SDR, but create larger shifts and stepping in HDR.

HDR Video Part 4: Shooting for HDR

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 4: Shooting for HDR.

The first HDR project I graded was a set of space shuttle launch shots, filmed on the RED ONE camera by NASA.  The footage wasn't filmed with HDR in mind.  In fact, HDR wasn't anything close to 'a thing': the shuttle last flew in 2011, and Dolby didn't present their proposition of "Perceptual Signal Coding for More Efficient Usage of Bit Codes" (what we now call PQ) until 2012.  And yet despite the age of the footage and the lack of consideration for HDR when it was filmed, I still had no problem grading the footage into HDR space, and getting a pretty awesome image out of it.

HDR Grade from NASA Archive Footage, shot on Red One, circa 2011

I bring this up simply as a point of perspective; while I’m going to offer some suggestions here to help make your footage better in HDR, it’s important to realize that, in general, all footage is better in HDR, regardless of its age or how it was filmed.  That being said, there are things you can do while filming to best prepare for an HDR finish, which is what we’re going to discuss here.


The Kit

Choosing a digital camera today is like choosing a film stock used to be - each one responds differently from the others, and can create slightly different looks.  This doesn't change when you're shooting HDR.  Beyond a specific point, your camera choice doesn't matter; camera choice is a creative (or budgetary) decision.

But there are a few features that are important, if not essential, when planning an HDR shoot.  Think of these as the minimum level of kit needed.  I’m going to outline those first.  Then, there are niceties you can add on that will make your life a little easier, and I’ll outline those next.

First and foremost for HDR recording: never, ever, EVER shoot with a standard Rec 709 / BT.1886 / Gamma 2.4 contrast curve.

It’s possible to grade footage that uses one, but the results are pretty poor.  There’s too much clipping of the darks and whites, and the loss of detail kills you.  Linear is okay if it’s a high enough bit depth; a LOG format is better, but native RAW is really the best.  LOG and RAW will preserve more of the detail through the darks while retaining better roll-off into the whites, which makes HDR grading easier (or possible at all).

When you’re shooting with HDR mastery in mind, use the highest bit depth (and bit rate) available.  If you’re using a camera that stores its footage in a compressed 8 bit format, you’re going to do yourself a world of disservice when it comes to grading in HDR.  The same reason that all HDR formats require 10 bits minimum applies to the camera - 8 bits causes stepping.  If you’ve shot in a LOG format, it’s possible to get away with using 8 bit sources, but you can’t push the footage as far as it’s able to go, and are very likely to see stepping in the whites.

If an 8 bit camera is your only option, and you still want / need HDR, consider using an external ProRes, DNxHR, or other high bitrate, intraframe, 10b+ per channel recorder.  You’ll save yourself a world of hurt in post.

Of course, the ideal format to shoot in is a camera RAW format, like Cinema DNG, RED RAW, ARRI RAW, Sony RAW, Phantom RAW, etc.  You’ll love yourself in HDR post for using RAW, even if you typically prefer the turnaround speed offered by a ProRes or DNxHR workflow.  Here’s why:

  1. Most ProRes and DNxHR workflows normalize the RAW footage into a LOG format, which is fine, but they collapse the bit depth range used to 10 bits.  With RAWs you typically have access to the full 12, 14 or 16 bits offered by the sensor!  Your grading application typically uses even higher bit depth internals for color processing, so even if you’re only grading in 10 bit at the display, having the extra 2, 4, or 6 bits per channel, per pixel, is a major advantage in grading latitude, something I can’t emphasize enough as being important to HDR.
     
  2. ProRes and DNxHR workflows typically normalize the camera primaries into Rec. 709 color space.  Most professional cameras use primaries that are wider than Rec. 709, but the video signals they output are typically conformed to Rec. 709 for ‘no brains’ compatibility, that is, it’ll just work.  While this isn’t a problem per se for HDR, it does restrict the volume of colors available for grading, and will require a LUT or manual shifting of the primaries in grading to match HDR’s BT.2020 or DCI-P3 space.

    Typically, RAW formats record their data using the camera native RGB values, and your RAW interpreter in your color grading program can then renormalize them to whatever target space you’re looking for.  Since BT.2020 is the widest of all display spaces, you’ll be able to better reproduce what your camera is already capturing.
     
  3. RAW formats often provide highlight and lowlight recovery not available in fixed video formats, even when using LOG or linear recording.  The RAW formats here give you access to as much information as the sensor actually recorded, which is invaluable in post.  Because of the extended dynamic range of the HDR environment, you’ll want as much of the highlights as you possibly can get, and may even at times push further into the noise floor, because the noise is less perceptible in the deep HDR darks.

If you’re shooting at the professional level, and using professional cinematography equipment, what you already have is probably okay for shooting in HDR.  Cameras like the RED Epic or Weapon (or pretty well all of RED’s cameras because of their RAW format), Sony’s F55 or F65, Arri’s Alexa, or any camera in the same class are perfect.  As are most film sources (S35mm or greater, for resolution), when captured with a high bit depth scanner.  Using any of these, you’ll be well suited for HDR, assuming you follow the format’s best shooting practices, which I’ll discuss in the technique section below.

If you’re using a prosumer or entry level professional camera, taking a few preparatory steps to set up how you’ll actually be capturing the image can mean the difference between getting footage that can be used in HDR mastering, vs. footage that can’t.

So in summary, when choosing your kit for HDR, consider:

  1. 16 bpc > 14 bpc > 12 bpc > 10 bpc > 8 bpc: 10 bpc should be your minimum spec.
  2. RAW > LOG > Linear > Gamma 2.4: avoid baking in your gamma at all costs!
  3. Camera Native / BT.2020 > DCI-P3 > Rec. 709 color primaries
  4. RAW > Compressed RAW > Intraframe compression (ProRes, DNxHR, AVC/HEVC Intra) > Interframe compressed (AVC/H.264, HEVC).

As a quick side note, some cameras offer a SMPTE ST.2084 signal or other HDR signal out of the camera for use with an external recorder.  These are a useful replacement for recording externally in either a LOG or gamma format - they can lead to faster turnaround times, but the footage will require an HDR grade (or a dedicated step out of HDR), rather than being ready to grade in HDR with the option of grading normally.


The Technique

First things first, some general, good advice: take some time learning how your camera and lenses respond to various lighting situations.  How is its roll-off into the highs?  What’s its noise level in the darks?  How does the color response change in different exposure levels?  While this is generally good practice, it’s the kind of forethought you really need in planning an HDR shoot.

When shooting for HDR mastery, you may find that you’ll need to modify your typical shooting technique.  There are three things that are important above all others: protecting your highlights, protecting your darks, and planning the expansion of the dynamic range of your scene.


Protecting Your Highlights

Most of us are rightfully excited about the creative possibilities that come with increased brightnesses at the display, and the expanded range of highlight detail that comes with it.  The catch is that there are some things that used to work well that, frankly, now look like shit.

Clipping.  May you rot in hell.

The large area to the right of the sun has no detail retention in the RAW.

All sensors clip at some level of exposure.  Film does too.  It’s unavoidable.  The goal for HDR shooting is to expose your whites to eliminate clipping in the RAW data when possible, and to minimize it when it’s not.

Unlike traditional mastering workflows, where images clipped to white are simple to correct (set the clipped area to true white), clipping in HDR becomes problematic very quickly.  In HDR there is no longer such a thing as “true white”.  Instead, in the grading process (which we’ll discuss in Part 5), we make a creative decision about how bright white should be, and how to roll into it.  That roll into whatever white you pick is essential to tricking the eye to believe whatever you’ve picked to be white is, in fact, white.

The same shot can be graded with different white points in HDR, depending on the goals of the cinematographer & colorist.  Both of these grades work with the snow reading as white; the lighter image feels brighter, while the darker image feels more oppressive and foreboding

The human visual system perceives an object or region of a scene as a shade of white (that is, not as a shade of grey, but as a varying intensity of white) so long as three conditions are met:

  1. The brightness level is above a specific threshold relative to the rest of the scene, which is usually around 100 nits
  2. The chromatic characteristics are relatively balanced (that is, low saturation)
  3. The area is not a completely uniform brightness level juxtaposed with a scene (or part of the same frame) that has a brighter or more natural roll into the whites

When talking about clipping, it’s that third condition that ends up being a problem.  Clipped footage typically has large swaths of ‘white’ with an abrupt transition into the patch (once the rest of the footage is graded to a normalized brightness level).

Gentle rolls into clipped white areas appear more natural than abrupt transitions

Deciding what brightness level to place this ‘white’ at becomes problematic for a couple of reasons.

First, you have to limit the brightness of the white patch with respect to the rest of the scene - if it’s too much brighter than everything else (say, everything is under 100 nits and you put the patch at 1000 nits), without roll into the whites you have an obnoxiously bright patch that dominates and overwhelms the rest of the scene.

Second, because these patches typically have a large area, that is, make up a significant portion of the pixels used on the screen, they end up skewing the distribution of brightnesses when calculating the MaxFALL, meaning that everything else in the scene has to be significantly darker than you might like, or you have to bring down the brightness of the white to bring up the brightness of everything else.

The overall brightness around the sun limits the overall peak image brightness due to MaxFALL.  For contrast, I've included both the direct SDR down grade (roll into white between 200 and 500 nits), and the same with the white point restored to full

Third, with the first two effects limiting the overall brightness of the uniform patch, it’s likely to appear grey when cut together with footage that has proper roll into the whites, since that footage is likely to have parts that are much brighter than whatever white you’re able to use for this clipped value.  The overall effect: grossness that pulls you out of the ‘magic’ that HDR creates.

In this sequence, the peak available white point of the middle shot is lower than the two shots that surround it, due to MaxFALL.  In the final grade, the first and third shots were graded with lower peak whites to match
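
To put rough numbers on the frame average problem, here’s a quick back-of-the-envelope sketch.  The values are hypothetical, and the ~180 nit frame average limit is the figure quoted for the Sony BVM-X300 elsewhere in this series.

    # Hypothetical frame: a clipped, 1000-nit patch covering 20% of the pixels,
    # with the rest of the scene sitting at 60 nits.
    patch_fraction, patch_nits = 0.20, 1000
    rest_nits = 60

    frame_average = patch_fraction * patch_nits + (1 - patch_fraction) * rest_nits
    print(frame_average)  # 248.0 nits - already well past a ~180 nit MaxFALL limit,
                          # so either the patch or the rest of the scene has to come down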

Stop down or use ND: protect your highlights and avoid clipping like the plague.

Some parts of your image, like the sun or bright lights for instance, may clip and that’s okay, so long as they don’t dominate your scene.  You can typically roll into these whites much more subtly than larger clipped areas.

Not all clipping is unnatural, even in HDR

White, puffy clouds also tend to clip on most cameras, but don’t let them if possible.  Because people see clouds, and the details in them, so frequently, you need to preserve as much of that detail as you can, or risk your viewers looking into them and being jarred by the bright uniform shapes that come with clipping rather than the gentle gradients of their rounded textures.

In HDR the contrast in clouds is much more significant than in SDR, and the clipping in the clouds hurts the realism of the scene

Coupled with this is the idea that you can’t assume that you’ll get to hide things on the other side of bright windows.  If your camera retains any detail through a window or doorway, it’ll probably be visible.  If you’re hiding crew or equipment through a blown out window, you’ll need to be doubly sure that the window will, in fact, be blown out in HDR. (The same, by the way, is true for your darks - don’t assume they’ll be crushed out.  More on that in a second).

If your monitor out from your camera allows for different colorimetry than your recorded image signal, you may want to switch it to a LOG curve out so that you can see on your field display or eyepiece where the scene is clipping, if at all, and what details are visible in the brights.


Protecting Your Darks

While the brights tend to get the love when talking about HDR, personally, I love what HDR does for the darks.

Just like with the whites in the image, we have to get rid of the concept of ‘true black’ when discussing HDR.  Instead, we have a range of blacks, just like we have a range of whites.  Two of the three conditions we discussed above describing how the brain perceives whites are true for blacks as well: they need to be below a certain value threshold, and they can’t be large uniform areas juxtaposed with darker regions.  The only difference is that below the brightness threshold the brain typically stops perceiving chromatic value anyway: saturation doesn’t matter (unless you’re trying to supersaturate your darks?)

Just like with our whites, eventually sensors clip to black.  In most video signals, this will be a hard clip, but in many RAW formats (especially those that offer ISO adjustments in post production), the blacks are typically recoverable into the noise floor of the camera.

If you’re planning on using a PQ HDR mastering workflow, you’ll need to assume that most of these darks are in fact visible, beyond what you’d normally consider available.  That means you may need to be concerned about the overall exposure level for the detail in the darks - you can’t necessarily hide equipment there, and you need to make sure your production design extends deep into the visible darks.

Details often lost in darks in SDR are often visible in HDR

Even worse - or better, depending on whether you’ve planned for it or not - even after the image is properly graded, areas that appear black in the full HDR grade can ‘open up’ to the eye just by blocking the brightest regions of the image with your hand, just like how blocking the light from a spotlight pointed at you allows you to see behind the light source.

Simulated images showing details visible in the darks when you block the lights in HDR.  This does not happen in SDR.

The good news is that noise is far less perceivable in the darkest depths of HDR than when the footage is normalized into SDR, largely because of the lack of saturation and our vision’s greater degree of tolerance to luminance noise than to chromatic noise.  So while it’s important to keep important details above the noise floor, it’s not as essential as protecting against clipping.

To the darks the same rules apply: open up or increase your ISO to avoid clipping in the darks.


Planning the Scene in HDR

You may have noticed the two pieces of contradictory advice from the last two considerations: stop down to protect your highlights, and open up to protect your darks - a paradox.  Something has to give: how do you plan for that?

Don’t worry: planning your scene for HDR is actually even more complicated than that.

When you shoot for HDR, you can’t assume that every consumer display will be HDR.  So you need to consider how the darks and lights will play in both HDR and SDR.  With the whites, it’s relatively simple to adjust your roll into white or your clip so that it plays well in SDR, but crushing the blacks isn’t always the best option.  Creatively, you may want to highlight action or detail in the darks in a way that will be lost with a simple crush.

Crushing the darks in SDR maintains the mood of the HDR image, but at the expense of detail retention

A solution, of course, would be to bring up part of the darks during post, which increases the visible noise in SDR and may require a clipping or flattening of the whites to maintain the contrast and detail across the scene.

Noise is more perceptible in SDR darks than in HDR darks

Alternatively, you can adjust your lighting to bring up the darks and compress the range, then re-expand the range while color grading in HDR space.  So long as you’re shooting in a RAW format and capturing 12+ bits / channel, you won’t see stepping with this technique, since your mid gradients on a log curve are allocated sufficient bits that expansion is possible.

Another thing to consider when planning the scene is the MaxFALL limitation of HDR mastering.  The overall dynamic range of the scene needs to be planned in a way that the super bright / HDR elements are restricted to a small portion of the overall frame, so as to not push up the frame average light level.  Shooting interiors with a few bright windows or patches of direct sunlight tends to be fine; larger bay windows with cloudy or limited outdoor light also work, so long as the external ambient isn’t too high (dusk / dawn, not noonday sun).

Both of these shots were done in the same space, about a year apart.  The time of day plays an important role to how much the windows affect the MaxFALL of the scene, with the blown out windows limiting overall brightness.

Particularly problematic are blue skies.  Why?  Because blue skies often take up a much bigger part of a frame than you expect, and contribute more to the MaxFALL since our eyes perceive blue values of similar absolute brightness as darker than those of other colors.  What we see as mid range blues can suddenly push up MaxFALL and limit your overall scene brightness while still looking ‘normal’ or ‘average’ to the eye.  Exposing for blue skies often means keeping the blueness in the traditional light level range, which can leave the rest of your brights muddied (especially when shooting into the sun).

The amount of the image taken up by the blue sky limited the overall MaxFALL of this image.  The result: in HDR, the sky never felt 'bright' like the trees or the tower.

Essentially, when designing your scene for HDR you need to plan the bulk of each frame to land below the traditional film standard light levels, so as to not push up your MaxFALL / average light level.  Of particular concern here is planning your edits for HDR - small patches of direct sun in a darker scene are fine, until you move in for the close up and that small patch behind the actor’s face dominates part of the scene.

In this wide and close pair, the wide shot is only limited by the available peak brightness of the display, while the close up is limited by the MaxFALL

While as an individual shot it’s fine to limit the MaxCLL / peak light level of a close up’s bright patch, when you’re cutting between two shots you’ll need to adjust the wider shot’s MaxCLL to match the MaxCLL permitted by the close up’s MaxFALL.

Or, in plain English, you’ll be limited on the maximum brightness in the wide because the maximum brightness of the close-up will be more limited if the brighter areas take up more of its frame.  If you’re looking to push the 1000 nit limit of current HDR displays for creative reasons, your scene blocking needs to take into account the average brightness of the close up: plan on minimizing bright areas around the talent or inserts to keep the bright patches bright across a sequence.

Otherwise the shifting brightness levels can be much more visible and leave a ‘greying’ feeling in the more restricted close ups (which the eye would normally perceive as white, except in contrast with something whiter).

Because of the expanded darkness range of HDR, you can design much more ‘dark and moody’ lighting setups than you normally would for standard film or video exhibition.  Whole detail-filled scenes can play out at levels under 30 nits!  However, be aware that this is a bad idea if you’re intending your work to end up on consumer displays.  In a darkened reference environment, our eyes will adjust to the lower light levels and we’ll see deeper into the darks.  But in a consumer’s home, where the ambient light level around the display may be higher than in cinema or grading spaces, the viewer’s adjustment levels may be limited.

You can, however, still allow scenes to play much darker than is typically available in television exhibition, keeping your maximum brightness below 80 nits.  This can be used to great effect when cutting between darker average and lighter average scenes: it’s in this contrast that HDR really pops.

In HDR darker footage can be cut in with brighter footage without the details in the blacks feeling milky, or the drop in brightness being jarring.


RED HDRx

If you don’t shoot using the RED ecosystem, ignore this section (seriously). But if you do, all this talk of shooting in HDR may make you tempted to shoot using RED’s HDRx.  I’m not saying this is a bad idea, but I am saying this is difficult to execute.

The real problem here is getting the HDR grade right using the HDRx footage.  We shot with it once, and our takeaway from that has been: only shoot it if you absolutely, certainly, without a doubt, need it.  Which, in this case, means an increase in the recorded dynamic range of the scene (or scene elements).

RED HDRx Blend in HDR and SDR

The reason not to use it comes down to grading.  Blending the two separately exposed elements is fine in REDCINE, but you’re going to run into difficulties with HDR Grading in REDCINE, simply because of the limited grading toolset.  When you grade in DaVinci, you run into severe performance issues using the API blending tool in the RAW decoding.  DaVinci’s split input tool is better, but you still run into problems compressing the larger dynamic range and maintaining the overall look of HDR video.

In the end, the most efficient (inefficient) workflow was actually to grade the shot twice in HDR - once with the standard exposure to grade the darks, and a second time with the blended exposure to grade the lights.  Then, both grades need to be passed through a compositing program like After Effects to selectively decide which set of contrast you want for which parts of the image - far more like traditional HDR photography than HDR video.

Dark and Light Plates in HDR and SDR with Final Image Blend

You can get great results this way, but it’s way, way more involved.

 

A Grain of Salt

Cameras, settings, best practices, planning.  Here’s the caveat: take all this advice with a grain of salt, not as a set of hard and fast rules.

Going back to the story that I opened with, even footage never planned to be shown in HDR can give excellent results.  Comparing the SDR version of the shuttle launch footage to the HDR grade, the HDR looks better.  The darks are darker while preserving all the details, and the range is higher.  This is, of course, an ideal case since high quality RAWs were available; the same is true for film sources when negatives are available.

We’ve done a lot of HDR regrading of our back catalog of footage, and I haven’t found a single shot that looks worse in HDR than SDR (even when ignoring the benefits of BT.2020 and 10 bit displays).

 

But even when you’re limited to just 8 bit log or standard gamma footage, you can often find more detail within a scene when grading in HDR than is perceptible in SDR.  You’ll want to be far more cautious with how far you push the footage, but you’ll still be able to get good results.

Detail recovery is often possible when grading from sufficiently high quality SDR graded sources



Generally, if you are already following best practices for digital cinematography, and if you spend a little bit of time reviewing HDR grades of your existing footage with a colorist, you’ll quickly get a feel for how the HDR space works and what you can do with it, and that’s when you can unleash your own creative potential.


But once it’s shot and edited, what happens next?  Grading, mastering, and delivering in HDR is our next topic, so stay tuned for Part 5.

HDR Video Part 3: HDR Video Terms Explained

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 3: HDR Video Terms Explained.

In HDR Video Part 1 we explored what HDR video is, and what makes it different from traditional video.  In Part 2, we looked at the hardware you need to view HDR video in a professional environment.  Since every new technology comes with a new set of vocabulary, here in Part 3, we’re going to look at all of the new terms that you’ll need to know when working with HDR video.  These fall into three main categories: key terms, standards, and metadata.


Key Terms

HDR / HDR Video - High Dynamic Range Video - Any video signal or recording using one of the new transfer functions (PQ or HLG) to capture, transmit, or display a dynamic range greater than the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference.

The term can also be used as a compatibility indicator, to describe any camera capable of capturing and recording a signal this way, or a display that either exhibits the extended dynamic range natively or is capable of automatically detecting an HDR video signal and renormalizing the footage for its more limited or traditional range.


SDR / SDR Video - Standard Dynamic Range Video - Any video signal or recording using the traditional transfer functions to capture, transmit, or display a dynamic range limited to the traditional CRT gamma or BT.1886 Gamma 2.4 transfer functions at 100-120 nits reference. SDR video is fully compatible with all pre-existing video technologies.


nit - A unit of brightness density, or luminance. It’s the colloquial term for the SI unit of candelas per square meter (1 nit = 1 cd/m2). It converts directly with the United States customary unit of foot-lamberts (1 fl = 1/π cd/foot2), with 1 fl = 3.426 nits = 3.426 cd/m2.

Note that the peak nits / foot-lamberts value of a projector is often lower than that of a display, even in HDR video: because a projected image covers more area and is viewed in a darker environment than consumers’ homes, the same psychological and physiological responses exist at lower light levels.

For instance, a typical digital cinema screen will have a maximum brightness of 14 fl or 48 cd/m2, vs. the display average of 80-120 nits for reference displays and around 300 nits for LCDs and plasmas in the home. Actual HDR light output ranges in theaters are adjusted accordingly, since 1000 cd/m2 on a theater’s 30 foot screen is perceived to be far brighter than on a 65” flat screen.


EOTF - Electro-Optical Transfer Function - A mathematical equation or set of instructions that translates voltages or digital values into brightness values. It is the inverse of the Optical-Electro Transfer Function, or OETF, which defines how to translate brightness levels into voltages or digital values.

Traditionally, the OETF and EOTF were incidental to the behavior of the cathode ray tube, which could be approximated by a 0-1 exponential curve with a power value (gamma) of 2.4. Now they are defined values like “Linear”, “Gamma 2.4”, or any of the various LOG formats. OETFs are used at the acquisition end of the video pipeline (by the camera) to convert brightness values into voltages/digital values, and EOTFs are used by displays to translate voltages/digital values into brightness values for each pixel.


PQ - Perceptual Quantization - Name of the EOTF curve developed by Dolby and standardized in SMPTE ST.2084, designed to allocate bits as efficiently as possible with respect to how human vision perceives changes in light levels.

Perceptual Quantization (PQ) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

Dolby’s tests were built around the Barten Threshold (also called the Barten Limit or the Barten Ramp): the point at which the difference in light levels between two values becomes visible.

PQ is designed so that, when operating at 12 bits per channel, the stepping between single digital values is always below the Barten threshold for the whole range from 0.0001 to 10,000 nits, without being so far below that threshold that the resolution between bits is wasted. At 10 bits per channel, the PQ function sits just slightly above the Barten threshold, where in some (idealized) circumstances stepping may be visible, but in most cases it should be unnoticeable.

Barten Thresholds for 10 bit and 12 bit BT.1886 and PQ curves.  Source

For comparison, current log formats waste bits on the low end (making them suitable for acquisition to preserve details in the darks, but not transmission and exhibition), while the current standard gamma functions waste bits on the high end, while creating stepping in the darks.

HDR systems using PQ curves are not directly backwards compatible with standard dynamic range video.
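
For a bit of intuition about how the curve allocates its values, here’s a small Python sketch of the ST.2084 EOTF.  The constants are the ones published in the standard, but the code itself is only an illustration, not production color science.

    # SMPTE ST.2084 (PQ) EOTF: normalized signal value (0.0-1.0) -> absolute luminance in nits.
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

    def pq_to_nits(signal: float) -> float:
        """Decode a normalized PQ signal value into luminance (cd/m2)."""
        p = signal ** (1 / m2)
        return 10000 * (max(p - c1, 0) / (c2 - c3 * p)) ** (1 / m1)

    # The top of a 10-bit full-range signal decodes to 10,000 nits, while a value
    # near the middle of the code range sits around 100 nits - most of the codes
    # describe the darks and mids, where our vision is most sensitive.
    print(round(pq_to_nits(1023 / 1023)))  # 10000
    print(round(pq_to_nits(520 / 1023)))   # ~100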


HLG - Hybrid Log Gamma - A competing EOTF curve to PQ / SMPTE ST.2084 designed by the BBC and NHK to preserve a small amount of backwards compatibility.

Hybrid Log Gamma (HLG) Electro-Optical Transfer Function (EOTF) with Gamma 2.4 Reference

HLG vs. SDR gamma curve with and without knees.  Source

On this curve, the first 50% follows the output light levels of standard Gamma 2.4, while the top 50% diverges steeply along a log curve, covering the brightness range from about 100 to 5,000 nits. As with PQ, 10 bits per channel is the minimum permitted.

HLG does not expand the range of the darks the way the PQ curve does, and as an unfortunate side effect of the backwards compatibility, coupled with the MaxFALL limits necessitated by current HDR display technology, whites can appear grey when viewed in standard gamma 2.4, especially when compared with footage natively graded in gamma 2.4.
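
For comparison with the PQ sketch above, here’s the HLG OETF (scene linear light to signal value) using the constants published by the BBC and NHK - again, only a sketch for intuition.

    # Hybrid Log Gamma OETF: normalized scene linear light (0.0-1.0) -> signal value (0.0-1.0).
    # The bottom half of the signal range uses a square-root (gamma-like) segment;
    # the top half switches to the log segment that carries the extended highlights.
    import math

    A = 0.17883277
    B = 1 - 4 * A                  # 0.28466892
    C = 0.5 - A * math.log(4 * A)  # 0.55991073

    def hlg_oetf(light: float) -> float:
        """Encode normalized scene linear light into an HLG signal value."""
        if light <= 1 / 12:
            return math.sqrt(3 * light)
        return A * math.log(12 * light - B) + C

    print(round(hlg_oetf(1 / 12), 3))  # 0.5 - the crossover between the two segments
    print(round(hlg_oetf(1.0), 3))     # 1.0 - peak scene light maps to peak signal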


Standards

SMPTE ST.2084 - The first official standardization of an HDR video transfer function by a standards body, and at the moment (October 2016) the most widely implemented. SMPTE ST.2084 officially defines the PQ EOTF curve for translating a set of 10 or 12 bit per channel digital values into a brightness range of 0.0001 to 10,000 nits. SMPTE ST.2084 provides the basis for the HDR 10 Media Profile and Dolby Vision implementation standards.

This is the transfer function to select in HEVC encoding to signal a PQ HDR curve.


ARIB STD-B67 - Standardized implementation of Hybrid Log Gamma by the Association of Radio Industries and Businesses. Defines the use of the HLG curve, with 10 or 12 bits per channel color and the same color primaries as BT.2020 color space.

This is the transfer function to select in HEVC encoding to signal an HLG HDR curve.


ITU-R BT.2100 - ITU-R Recommendation BT.2100 - the ITU-R’s standardization of HDR for television broadcast. Ratified in 2016, this document is the HDR equivalent of ITU-R Recommendation BT.2020 (Rec. 2020 / BT.2020). When compared with BT.2020, BT.2100 includes the FHD (1920x1080) frame size in addition to UHD and FUHD, and defines two acceptable transfer functions (PQ and HLG) for HDR broadcast, instead of the single transfer function (BT.1886 equivalent) found in BT.2020.

BT.2100 uses the same color primaries and the same RGB to YCbCr signal format transform as BT.2020, and includes the same permission of 10 or 12 bits per channel as BT.2020, although BT.2100 also permits full range code values in 10 or 12 bits where BT.2020 is limited to the traditional legal range.

BT.2100 also includes considerations for a chroma subsampling methodology based on the LMS color space (human visual system tristimulus values), called ICTCP, and a transform for ‘gamma weighting’ (in the sense of the PQ and HLG equivalent of gamma weighting) the LMS response as L’M’S’.


HDR 10 Media Profile - The Consumer Technology Association (CTA)’s official HDR video standard for use in HDR televisions. HDR 10 requires the use of the SMPTE ST.2084 EOTF, BT.2020 color space, 10 bits per channel, 4:2:0 chroma subsampling, and the inclusion of SMPTE ST.2086 and associated MaxCLL and MaxFALL metadata values.

HDR 10 Media Profile defines the signal a television must be able to decode for the “HDR compatible” term to be included in the marketing of that television.

Note that “HDR compatibility” does not necessarily mean the ability to display the higher dynamic range, simply the ability to decode footage in the HDR 10 specification and renormalize it to whatever the dynamic range and color space of the display happen to be.


Dolby Vision - Dolby’s proprietary implementation of the PQ curve, for theatrical setups and home devices. Dolby Vision supports both the BT.2020 and the DCI-P3 color space, at 10 and 12 bits per channel, for home and theater, respectively.

The distinguishing feature of Dolby Vision is the inclusion of shot-by-shot transform metadata that adapts the PQ graded footage into a limited range gamma 2.4 or gamma 2.6 output for SDR displays and projectors. The colorist grades the film in the target HDR space, and then runs a second adaptation pass to adapt the HDR grade into SDR, and the transform is saved into the rendered HDR output files as metadata. This allows for a level of backwards compatibility with HDR transmitted footage, while still being able to make the most of the SDR and the HDR ranges.

Because Dolby Vision is a proprietary format, it requires a license issued by Dolby and the use of qualified hardware, which at the moment (October 2016) means only the Dolby PRM-4220, the Sony BVM-X300, or the Canon DP-V2420 displays.


Metadata

MaxCLL Metadata - Maximum Content Light Level - An integer metadata value defining the maximum light level, in nits, of any single pixel within an encoded HDR video stream or file. MaxCLL should be measured during or after mastering. However, if you keep your color grade within your display’s HDR range, and add a hard clip for the light levels beyond your display’s maximum value, you can use your display’s maximum CLL as your metadata MaxCLL value.


MaxFALL Metadata - Maximum Frame Average Light Level - An integer metadata value defining the maximum average light level, in nits, for any single frame within an encoded HDR video stream or file. MaxFALL is calculated by averaging the decoded brightness values of all pixels within each frame (that is, converting the digital value of each pixel into its corresponding nits value, and averaging all of the nits values within each frame).

MaxFALL is an important value to consider in mastering and color grading, and is usually lower than the MaxCLL value. The two values combined define how bright any individual pixel within a frame can be, and how bright the frame as a whole can be.
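
The calculation itself is simple; the sketch below assumes you already have each frame decoded to per-pixel luminance in nits (i.e., values run through the PQ EOTF), which is the hard part in practice.

    # Sketch: MaxCLL and MaxFALL from a sequence of frames of per-pixel luminance (in nits).
    import numpy as np

    def max_cll_and_fall(frames):
        max_cll = 0.0   # brightest single pixel anywhere in the stream
        max_fall = 0.0  # highest frame average anywhere in the stream
        for frame in frames:
            max_cll = max(max_cll, float(frame.max()))
            max_fall = max(max_fall, float(frame.mean()))
        return max_cll, max_fall

    # Three flat 40-nit frames, with one 1000-nit pixel added to the middle frame:
    # a single hot pixel drives MaxCLL but barely moves the frame average.
    frames = [np.full((1080, 1920), 40.0) for _ in range(3)]
    frames[1][0, 0] = 1000.0
    print(max_cll_and_fall(frames))  # (1000.0, ~40.0)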

Displays are limited differently on both of those values, though typically only the peak (single pixel) brightness of a display is reported. As pixels get brighter and approach their peak output, they draw more power and heat up. With current technology levels, no display can push all of its pixels into the maximum HDR brightness level at the same time - the power draw would be extremely high, and the heat generated would severely damage the display.

As a result, displays will abruptly notch down the overall image brightness when the frame average brightness exceeds the rated MaxFALL, to keep the image under the safe average brightness level, regardless of what the peak brightness of the display or encoded image stream may be.

For example, while the BVM-X300 has a peak value of 1000 nits for any given pixel (MaxCLL = 1000), on average, the frame brightness cannot exceed about 180 nits (MaxFALL = 180). The MaxCLL and MaxFALL metadata included in the HDR 10 media profile allows consumer displays to adjust the entire stream’s brightness to match their own display limits.


SMPTE ST.2086 Metadata - Metadata describing the display used to grade the HDR content. SMPTE ST.2086 includes information on six values: the three RGB primaries used, the white point used, and the display’s maximum and minimum light levels.

The RGB primaries and the white point values are recorded as ½ of their (X,Y) values from the CIE XYZ 1931 chromaticity standard, expressed as integers (the first five significant digits, without a decimal place). Or, in other words:

f(XPrimary) = 100,000 × XPrimary ÷ 2

f(YPrimary) = 100,000 × YPrimary ÷ 2.

For example, the (X,Y) value of DCI-P3’s ‘red’ primary is (0.68, 0.32) in CIE XYZ; in SMPTE ST.2086 terms it’s recorded as

R(34000,16000)

because

for R(0.68,0.32):

f(XR) = 100,000 × 0.68 ÷ 2 = 34,000

f(YR) = 100,000 × 0.32 ÷ 2 = 16,000

Maximum and minimum luminance values are recorded as nits × 10,000, so that they too end up as positive integers. For instance, a display like the Sony BVM-X300 with a range from 0.0001 to 1000 nits would record its luminance as

L(10000000,1)

The full ST.2086 Metadata is ordered Green, Blue, Red, White Point, Luminance with the values as

G(XG,YG)B(XB,YB)R(XR,YR)WP(XWP,YWP)L(max,min)

all strung together, and without spaces. For instance, the ST.2086 for a DCI-P3 display with a maximum luminance of 1000 nits, a minimum of 0.0001 nits, and a white point of D65 would be:

G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)

while a display like the Sony BVM-X300, using BT.2020 primaries, with a white point of D65 and the same max and min brightness would be:

G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)

In an ideal situation, it would be best to use a colorimeter and measure the mastering display’s native R-G-B and white point values; in practice, however, the RGB and white point values of the standard the mastering display conforms to are sufficient to communicate information about the master to the end unit display.
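
Putting the pieces together, here’s a small sketch that assembles the ST.2086 string from chromaticity coordinates and a luminance range.  The function name and structure are my own, but the example values reproduce the BT.2020 / D65 / 0.0001-1000 nit string shown above.

    # Sketch: build an ST.2086 mastering display string from (x, y) chromaticities and nits.
    def st2086(green, blue, red, white, max_nits, min_nits):
        def chroma(value):     # chromaticity coordinate -> integer (x 100,000, halved)
            return round(value * 100000 / 2)
        def lum(nits):         # luminance -> integer (x 10,000)
            return round(nits * 10000)
        return (f"G({chroma(green[0])},{chroma(green[1])})"
                f"B({chroma(blue[0])},{chroma(blue[1])})"
                f"R({chroma(red[0])},{chroma(red[1])})"
                f"WP({chroma(white[0])},{chroma(white[1])})"
                f"L({lum(max_nits)},{lum(min_nits)})")

    print(st2086(green=(0.170, 0.797), blue=(0.131, 0.046), red=(0.708, 0.292),
                 white=(0.3127, 0.3290), max_nits=1000, min_nits=0.0001))
    # G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)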


That should be a good overview of the new terms that HDR video has (so far) introduced into the extended video technology vocabulary, and a good starting point for diving deeper into learning about and using HDR video on your own, at the professional level.

In Part 4 of our series we’re going to take the theory of HDR video and start talking about the practice, and look specifically at how to shoot with HDR in mind.


HDR Video Part 2: HDR Video Reference Hardware

UPDATE 18 December 2017: We've posted a new blog about using Production HDR monitors for grading in HDR.  This puts HDR grading displays in the sub $4,000 USD range.  Read our post about how to do that and what you'll need here.

To kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 2: HDR Video Reference Hardware.

In HDR Video Part 1 we explored what HDR video is, and what makes it different from traditional video.  Here in Part 2, we’re going to look at what hardware is needed for proper HDR grading (as of October 2016), and how to partially emulate HDR to get a feel for grading in the standard before investing in a full HDR setup.


New Standard, New Hardware

Alright, first, the bad news.  Professional grade reference displays for HDR are expensive.  And there are only two that are commercially available for purchase*: The Sony BVM-X300 and the Dolby PRM-4220.  Both cover 100% of DCI-P3 space, but the BVM-X300 operates in and covers most of BT.2020, has a 4K resolution, a peak brightness of 1000 nits, and uses OLED panels for more detail through the darks.  The PRM-4220 is an excellent display, but is only 2K in resolution and 600 nits max, though it operates with a 12 bit panel for better DCI reference.

At the time of this writing, these are the only two commercially available HDR reference monitors.


At this time, I can’t find any DCI projectors advertising HDR capabilities, though I think that a bright enough laser projector with the right LUT could emulate one in a small environment - essentially using the LUT trick I’m going to describe below while using a projector that’s 10x brighter than it should be for the reference environment.  That doesn’t mean they don’t exist, it just means you’ll need to talk to the manufacturer directly.  I haven’t tested this, though, so don’t quote me on it.

There's at least one reference display that claims to be HDR compatible, but really isn’t - the Canon DP-V2410.  Frankly, the display is actually gorgeous and comparable to the Sony for color rendering and detail level, but its max brightness is only 300 nits and its HDR mode downscales the SMPTE ST.2084 0.0001 - 1000 nit range into the 0.01 - 300 nit range.  This leaves the overall image rather lackluster and less impactful, though you could use it to grade in a pinch, since its response curve is right.  But I wouldn’t, primarily because of MaxFALL, which I’ll cover extensively in Parts 4 and 5.

At Mystery Box we decided to go with the Sony BVM-X300 for our HDR grading.  I can’t praise the look of this display enough, though I do have my gripes (I mean, who doesn’t?), but I’ll save that review for another time.

Sony BVM-X300 (Right) in Mystery Box's grading environment (lights on, for detail clarity)


HDR Video on Consumer Displays

DaVinci Resolve Studio 12.5+ Settings for enabling HDR metadata over HDMI

The most affordable option for grading in HDR is to use an HDR television.  The Vizio Reference series have nice color with a 300 nit peak (in HDR mode), while the LG 2016 OLED HDR series displays have phenomenal color, with max brightness levels approaching 1000 nits.

The catch is, of course, that there is still more variation in the color of the display than in a reference display, so unless you know for certain that you’re going to be exhibiting on that specific display, be cautious when using them to grade.  They also lack SDI inputs, but that’s solvable.

DaVinci Resolve Studio version 12.5+ has an option to inject flags for HDR and BT.2020 into the HDMI output of your DeckLink or UltraStudio hardware.  To grade in HDR using a consumer HDR television with HDMI input, simply hook up the display over HDMI and toggle the option in your DaVinci settings; the display will automatically switch into HDR mode.

If you’re not using DaVinci Resolve Studio 12.5+, or if for whatever reason you have to route SDI out, you can inject the right metadata into the HDMI stream once you’ve converted from SDI to HDMI.  What you’ll need is an Integral by HD Fury.  This box, which is annoyingly only configurable under Windows, will add the right metadata into the HDMI connection between host and device, allowing you to flag any HDMI stream as BT.2020 and HDR.

Marketing shot of Integral by HD Fury, a box that will allow you to manually alter HDMI metadata

BE CAREFUL if you’re using the Integral though.  It can be tempting to use the HDMI output of your computer to just patch your desktop into HDR.  This is a bad idea.  Any interface lines will also be translated into HDR, which will limit the display’s overall brightness (because you can’t switch your desktop into HDR mode), and any static elements risk burn-in.  Most HDR displays use OLED panels, and OLEDs are susceptible to burn-in!

If you are already using SDI outputs for your I/O, and want to switch to the BVM-X300 or the PRM-4220, you shouldn’t NEED to upgrade your I/O card or box to drive HDR - 10b 4:2:2 works for grading HDR.  You might want to upgrade though if you want higher output resolutions (4K vs 2K/1080), higher frame rates at the higher resolutions (50/60p) or RGB/4:4:4 instead of 4:2:2 Chroma Subsampling.

Everything else should work with your existing color correction hardware.


Emulating HDR Video

Okay, so if you’re not ready to spring for the new reference hardware, but want to emulate HDR just to get a feel for how it works, here’s a trick you can do using a standard broadcast display, or a display like the HP Dreamcolor Z27x (which I used when doing my first tests) to partially emulate HDR.

Use a reference display with native BT.2020 color support, if you can.  If you’re using Rec 709, but still want to get a feel for grading in BT.2020, there’s a fix for that using LUTs, but it’s not elegant.  You can get a feel for the HDR curve in a Rec 709 color space, but you won’t get a feel for how the primaries behave slightly differently, or how saturation works in BT.2020.**

In addition, if possible, try to use a reference display with a 10 bit panel.  There’s no cheat for this one: you either have it or you don’t.  8 bits will give you an idea of what you’re looking at, but it won’t be as clear as it could be.  In many cases it won’t make a difference; you’ll just lose the ability to see specific fine details.

Now, calibrate the display and your environment, to emulate HDR.  Turn your maximum brightness to full (on the Dreamcolor Z27x, it peaks at 250 nits; your display may be different).  Turn off all ambient lighting (as pitch black as possible).  Then, turn down the brightness of the host interface display to the lowest setting that it’s still useable.  Do the same for any video scopes or other external monitoring hardware that may also be hooked up to the output signal.

HP Dreamcolor Z27x HDR Approximation Settings

This should make your reference display the brightest thing in the room, by a factor of 2 to 4x.  This is important.  While the display will still lack ‘umph’, at the very least it’ll dominate your perception of brightness.  That’s key to creating the illusion of the HDR effect in this case; without it your screen will just look muted and dull.

HDR Approximation Environment Calibration: Lights off, scopes dimmed, interface display as low as possible while retaining visibility (6%, in this case)

At this point, what we’ve done by adjusting the ambient and display brightness is emulate the greater brightness range of HDR without using a display that pushes into the HDR range.  Next, we need to adjust the display’s response so that it matches the HDR curve we want to emulate.  Essentially, we need to swap out the display’s native gamma curve for either the PQ or HLG curve.

DaVinci Resolve Studio's LUTs for scaling HDR into Gamma 2.4 / Gamma 2.6

This is actually pretty easy to do in DaVinci Resolve Studio - DaVinci has a set of 3D LUTs you can attach to your output that will automatically do it for you.  You’ll find them written as “HDR <value in nits> nits to Gamma <target display gamma>” (ex. HDR 1000 nits to Gamma 2.4) for the SMPTE 2084 / PQ curve, and “HLG to Gamma <target display gamma>” (ex. HLG to Gamma 2.2) for the Hybrid Log Gamma curve.

What these LUTs do, essentially, is add a 1/gamma (e.g. 1/2.4) contrast curve to the output signal, combined with the selected contrast curve, i.e., the one you want to see.  The gamma reciprocal adjustment combines with the display’s native or selected gamma to linearize the overall signal, as the two curves cancel each other out.  The only contrast curve you’re left with, then, is the HDR contrast curve you’ve added to the signal, translated into your display’s native or adapted luminance range.**
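
Conceptually, the combined transform looks something like the sketch below: decode the PQ signal to nits, fit the range you’re emulating onto the panel, then apply the reciprocal gamma.  The real Resolve LUTs may make different scaling, clipping, and tone-mapping choices, so treat this purely as an illustration of the idea.

    # Sketch of what an "HDR 1000 nits to Gamma 2.4" style LUT is doing per code value.
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

    def pq_signal_to_gamma24(signal: float, emulated_peak_nits: float = 1000.0) -> float:
        p = signal ** (1 / m2)
        nits = 10000 * (max(p - c1, 0) / (c2 - c3 * p)) ** (1 / m1)  # PQ EOTF (ST.2084)
        linear = min(nits / emulated_peak_nits, 1.0)                 # fit 0-1000 nits onto the panel
        return linear ** (1 / 2.4)                                   # cancel the display's 2.4 gamma

    print(round(pq_signal_to_gamma24(1.0), 2))     # 1.0  - 1000+ nits pegs the emulated display
    print(round(pq_signal_to_gamma24(0.5085), 2))  # 0.38 - the ~100 nit level sits well down the range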

Using one of these LUTs on your monitor or output will allow your display to respond as if it were operating natively with the HDR curve, though you'll notice that your display is only showing the first 100 nits or so of the HDR curve.  We'll fix that next.

The final step is to calibrate your display’s brightness and contrast.  I add a timeline node and scale the gain and gamma adjustments to bring the full HDR range back into the display's native signal range.  As for adjusting the contrast, though, there’s not much I can say about how to do that, other than to use a reference image or two graded in the target space to calibrate the display until it ‘looks right’.  Here are a couple that I graded in SMPTE 2084 that you can use for calibration:

Mystery Box ST.2084 Calibration Images, normalized for Rec.709.  Follow this link to download the DPX and individual reference PNGs.

All of this LUT business and brightness scaling, by the way, is exactly what the Canon DP-V2410 does - it just does it internally with a mode switch instead of needing manual setup.  Don’t get me wrong - in every other respect, the DP-V2410 is an amazing display, but in HDR mode it’s equivalent to this setup for HDR emulation, rather than true HDR performance.*


Emulated HDR vs. True HDR

So how does an emulated HDR display compare to a true HDR reference display?  Frankly, not all that well.  It's not terrible, but emulated HDR lacks the power of true HDR - the ability to grade with the lights on and see how your footage holds up through the large punch of the whites.  With an 8 bit panel you’re also going to see stepping while grading in an emulated HDR mode, because most of the region you’ll be working in ends up compressed to 50 or so code values.

This compression in the darks means you won’t get a feel for just how deep SMPTE 2084 can go while still preserving details - you can grade whole shots, with full detail in the darks and a few hundred levels of contrast, that land between codes 4 and 14 (full range) on a standard 8 bit display (especially an LED or CFL backlit LCD).

You’ll also be tempted in this mode to grade ‘hot’, that is, push more into the brights of the image, since you don’t have any actual limits for frame average light levels, like all true HDR displays do.  That’s not necessarily a problem, but you’ll run into trouble if you try to use the footage elsewhere.  You also miss the great psychological response the actual dark and light levels of a true HDR range give you.

So why emulate then?  Well, right now, HDR reference hardware is expensive.  And if you want to practice grading and mastering in HDR without having to invest in the hardware, emulation is a fantastic place to start.  You’ll get to see how the mids and highs roll into the whites in SMPTE 2084, and develop tricks to make your grading more natural when you make the switch to a proper HDR display.  You may even be able to grade using emulated HDR so long as you have a proper HDR television to internally QC on before sending out to a client - assuming your mastering of the HDR file is right, you can check it on a television and make sure it at the very least looks good there, contrast and curve wise, before sending it out to a client.

Of course, mastering HDR video is a problem in and of itself, but I’m saving it for last, in Part 5 of our series.  First, though, we’re going to look at the new terminology introduced with HDR video, because even if you’ve been working with video for decades, most of this is likely to be new.


Endnotes

* The day I went to post this I found Canon had updated their website to include the Canon DP-V2420, which they claim supports full HDR in both the ST.2084 and HLG specifications, and is Dolby Vision qualified; I haven't had time to look into these claims.

The Dolby PRM-4220 requires a workaround to get it to operate in an HDR mode.  It can be loaded with a custom gamma curve that can match the HDR EOTF, or you can add a custom LUT that scales the 0.01 - 600 nits of SMPTE ST2084 into gamma 2.4 while operating the display in 600 nits mode.

The Dolby Pulsar and the Dolby PRM-32FHD are both HDR capable displays, operating at 4000 and 2000 nits respectively, but I elected not to mention them because they are not, to the best of my knowledge, generally available for purchase.

** If you’re using the LUT on your output to emulate the HDR curve, but only have a Rec. 709 display and want to get a feel for BT.2020, you may consider using a BT.2020 to Rec. 709 LUT and stacking it with the gamma compensating LUT.  In DaVinci you can do this by adding one LUT to the output, and a second LUT for the monitor, or by attaching one of the LUTs to a global node for a timeline.  As a last resort, you can attach as many LUTs as you want to individual grades.  You should be able to do something similar in pretty much any other color grading or mastering software, like Scratch or Nuke.

HDR Video Part 1: What is HDR Video?

It’s October 2016, and here at Mystery Box we’ve been working in HDR video for a little over a year.

While it’s easier today to find out information about the new standard than it was when I first started reading the research last year, it’s still not always clear what it is and how it works.  So, to kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video.  This is Part 1: What is HDR Video?

HDR video is as much of a revolution and leap forward as the jump from analog standard definition to digital 4K.

Or, to put it far less clinically, it’s mind-blowingly, awesomesauce, revolutionarily, incredible!  If it doesn’t get you excited, I’m not sure why you’re reading this…

So what is it about HDR video that makes it so special, so much better than what we’ve been doing?  That’s what we’re going to dive into here.


HDR Video vs. HDR Photography

If you’re a camera guy or even an image guy, you’re probably familiar with HDR photography.  And if you’re thinking “okay, what’s the big deal, we’ve had HDR for years”, think again.  HDR video is completely unrelated to HDR photography, except for the ‘higher dynamic range’ part.

In general, any high dynamic range technique seeks to capture or display more levels of brightness within a scene, that is, increase the overall dynamic range.  It’s kind of a ‘duh’ statement, but let’s go with it.

In photography, this usually means using multiple exposures at different exposure values (EVs), and blending the results into a single final image.  The catch, of course, has always been that regardless of how many stops of light you capture with your camera or HDR technique, you’re still limited by the same 256 levels of brightness offered by 8 bit JPEG compression and computer/television displays, or the slightly bigger, but still limited set of tonality offered by inks for print.

So, most HDR photography relies on creating regions of local contrast throughout the image, blending in the different exposure levels to preserve the details in the darks and the lights:

Photograph with standard contrast vs. the same with local contrast

While the results are often beautiful, they are, at their core, unnatural or surreal.

HDR Video is Completely Different

Instead of trying to compress the natural dynamic range of a scene into a very limited dynamic range for display, HDR video expands the dynamic range of the display itself by increasing the average and peak display brightnesses (measured in nits), and by increasing the overall image bit depth from 8 bits to at least 10 bits / channel - or from 256 brightness levels & 16.7 million colors, to at least 1024 brightness levels & 1.07 billion colors.
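
Where those numbers come from is just the per-channel arithmetic, sketched below.

    # Levels per channel and total colors at 8 vs. 10 bits per channel.
    for bits in (8, 10):
        levels = 2 ** bits
        print(bits, levels, levels ** 3)
    # 8  -> 256 levels,  16,777,216 colors (~16.7 million)
    # 10 -> 1024 levels, 1,073,741,824 colors (~1.07 billion)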

Standard Video / Photography Range vs. HDR Photography vs. HDR Video Ranges

The change of the display light level allows for extended ranges of tonalities through the darks and the lights, so that the final displayed image itself is a more natural rendering of a scene, one that’s able to match the overall dynamic range of today’s digital cinema and film-sourced cameras. And perhaps more importantly, when fully implemented, HDR video will almost completely match the dynamic range of the human eye itself.

How big of a deal is it?  I can’t describe it better than my younger brother did the first time I showed him HDR video:

 

“I want to say that it’s like you’re looking through a window into another world, except that when you look through a window, it’s not as crisp, or as clean, or as clear as this”.

 

First Impressions to HDR Video



How did we get here?

So if HDR video is so much better than what we’ve been using so far, why haven’t we been using it all along?

And now, for a history lesson (it’s interesting; but it’s not essential to know, so skip down if you don’t care).

Cathode Ray Tubes as scientific apparatus and ‘display’ devices have been around in some form or another since the late 1880s, but the first CRT camera wasn’t invented until the late  1920s.  Early cameras were big with low resolutions; televisions were grainy, noisy, and low fidelity.

Things changed quickly in the early years of television. As more companies jumped on board the CRT television bandwagon, each created slightly different, and incompatible, television systems in an effort to avoid patent rights infringement.  These different systems, with different signal types, meant that home television sets had to match the cameras used by the broadcaster, i.e., they had to be made by the same company.  As a result, the first broadcaster in an area created a local monopoly for the equipment manufacturer they sourced their first cameras from, and consumers had no choice.

Foreseeing a large problem when more people started buying televisions sets, and more broadcasters wanted to enter an area, the United States government stepped in and said that the diversity of systems wouldn’t fly - all television broadcasts and television sets had to be compatible.  To that end they created a new governing body, the National Television System Committee, or NTSC, which went on to define the first national television standard in 1941.

We’ve had to deal with the outcomes of standardization, good and bad, ever since.

The good, obviously, has been that we don’t have to buy a different television for every channel we want to watch, or every part of the country we want to live in (though transnationals are often still out of luck).  The bad is that every evolution of the standard since 1941 has required backwards compatibility: today’s digital broadcast standards, and computer display standards too, are still limited in part by what CRTs could do in the 1940s and 50s.

Don’t believe me?  Even ignoring the NTSC 1/1.001 frame rate modifier, the CRT’s influence is still everywhere.  Let’s look at the list:

  1. Color Space: The YIQ color space for NTSC and the YUV color space used in PAL and SECAM are both based on the colors that can be produced by the short-glow phosphors that coat the inside of CRT screens and form the light- and color-producing element of the CRT.  In the transition to digital, YIQ and YUV formed the basis for the Rec. 601 color space (SD digital), which in turn is the basis for the Rec. 709 (HD digital) color space (Rec. 709 uses almost the same primaries as Rec. 601).

    And just in case your computer feels left out, the same color primaries are used in the sRGB display standard too, because all of these color spaces were display referenced, and they were all built on the same CRT technology.  That’s because up until the early 2000s, CRTs were THE way of displaying images electronically - LCDs were low contrast, plasma displays were expensive, and neither LEDs nor DLPs had come into their own.
     

  2. Transfer Function: The transfer function (also called the gamma curve) used in SD and HD is also based on the CRT’s natural light-to-electrical and electrical-to-light response.  The CRT camera captured images with a light-to-voltage response curve of approximately gamma 1/2.2, while the CRT display recreated images with a voltage-to-light response curve of approximately gamma 2.4.  Together, these values produce an end-to-end system gamma a little above 1.0 (commonly quoted as roughly 1.2), and they form the basis for the current reference display gamma standard of 2.4, found in ITU-R Recommendation BT.1886 (there’s a quick sketch of this chain just after the list).
     

  3. Brightness Limits: Lastly, and probably most frustratingly, color accurate CRT displays require limited brightness to maintain their color accuracy.  Depending on the actual phosphors used for the primaries, that maximum brightness typically lands in the 80-120 nit range.  And consumer CRT displays, while bigger, brighter, and less color accurate, still only reach around 200 nits peak brightness.  For comparison, the brightness levels found on different outdoor surfaces during a sunny day land in the 5,000-14,000 nit range (or more!).

    This large brightness disparity between reference and consumer display levels has been accentuated in recent years with the replacement of CRTs with LCD, Plasma and OLED displays, which can easily push 300-500 nits peak brightness.  Those brightness levels skew the overall look of images graded at reference, while being very intolerant of changes in ambient light conditions.  In short this means that with the current standards, consumers rarely have the opportunity to see content in their homes as filmmakers intended.
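
Here’s the quick sketch promised in item 2: a minimal model of that camera-to-display chain, treating both ends as pure power functions with the approximate exponents above (the real BT.709 OETF and BT.1886 EOTF add linear segments and black-level terms that this ignores):

    # Simplified CRT-era gamma chain, with light values normalized to 0-1
    CAMERA_GAMMA = 1 / 2.2    # approximate light-to-signal encoding exponent
    DISPLAY_GAMMA = 2.4       # approximate signal-to-light decoding exponent

    def encode(scene_light):
        return scene_light ** CAMERA_GAMMA

    def decode(signal):
        return signal ** DISPLAY_GAMMA

    # Because the end-to-end exponent (2.4 / 2.2, about 1.09 with these
    # simplified numbers) sits above 1.0, a mid-grey scene value comes out
    # slightly darker than it went in - the intentional "system gamma" that
    # compensates for dim viewing environments.
    scene = 0.18
    print(encode(scene))           # the encoded signal value
    print(decode(encode(scene)))   # the displayed light level, a touch below 0.18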

So, because of the legacy cathode ray tube (a dead technology), we’re stuck with a set of legacy standards that limit how we can deliver images to consumers.  But because CRTs are a dead technology, we now have an opportunity: we can choose to either be shackled by the 1950s for the rest of time, or to say “enough is enough” and use something better.  Something forward thinking.  Something our current technology can’t even match 100% yet.  Something like HDR video.


The HDR Way

At the moment, there are two different categories and multiple standards covering HDR video, including CTA’s HDR10 Media Profile, Dolby’s Dolby Vision, and the BBC’s Hybrid Log Gamma.  And naturally, they all do things just a little differently.  I’ll cover their differences in depth in Part 3: HDR Video Terms Explained, but for now I’m going to lump them all together and focus on the common aspects of all HDR video, and what makes it different from the video of the past.

There are four main things required to call something HDR video: ITU-R Recommendation BT.2020 or DCI-P3 color space, a high dynamic range transfer function, 10 bits per channel for transmission and display, and transmitted metadata.

BT.709, DCI-P3, and BT.2020 on the CIE 1931 xy chromaticity diagram

1. Color Space: HDR video is largely seen as an extension of the existing BT.2020 UHD/FUHD and DCI specifications, and as such uses either the wider BT.2020 color gamut (BT.2020 is the 4K/8K replacement for the BT.709/Rec. 709 HD broadcast standards), or the more limited, but still wide, DCI-P3 gamut.

BT.2020 uses pure wavelength primaries, instead of primary values based on the light emissions of CRT phosphors or any other material.  The catch, of course, is that we can’t fully show these on a desktop display (yet), and only the most recent laser projectors can cover the whole color range.  But ultimately, the breadth of the color space covers as many of the visible colors as is possible with three real primaries*, and includes all color values already available in Rec. 709/sRGB and DCI-P3, as well as 100% of Adobe RGB and most printer spaces achievable with today’s pigments and dyes.
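
To give a rough sense of how much wider these gamuts are, here’s a small sketch comparing the areas of the gamut triangles on the CIE 1931 xy chromaticity diagram, using the published primary coordinates (triangle area in xy is only a crude proxy for perceived gamut size, but it illustrates the point):

    # Published xy chromaticity coordinates of each gamut's R, G, B primaries
    GAMUTS = {
        "BT.709 / sRGB": [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)],
        "DCI-P3":        [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)],
        "BT.2020":       [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)],
    }

    def triangle_area(points):
        (x1, y1), (x2, y2), (x3, y3) = points
        return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

    reference = triangle_area(GAMUTS["BT.709 / sRGB"])
    for name, primaries in GAMUTS.items():
        ratio = triangle_area(primaries) / reference
        print(f"{name}: {ratio:.2f}x the BT.709 triangle area")

    # Prints roughly 1.00x, 1.36x, and 1.89x respectively.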

2. Transfer Function: Where HDR video diverges from the standard BT.2020 and DCI specs is in its light-level-to-digital-value and digital-value-to-light-level relationships, called the OETF and EOTF respectively.  I’m going to go into more depth on OETFs and EOTFs another time, but for now what we need to know is that the current relationship between light levels and digital values is a legacy of the cathode ray tube days, and approximates gamma 2.4.  Under this system, the full-white digital value of 235 translates to a light output of between 80 and 120 nits.

Extending this same curve into a higher dynamic range output proves problematic because of the non-linear response of the human eye: it would either cause severe stepping in the darks and lights, or it would require 14-16 bits per channel while wasting digital values in increments that can’t actually be seen.  And it still wouldn’t be backwards compatible, in which case, what’s the point?

So instead, HDR video uses one of two new transfer curves: the BBC’s Hybrid Log Gamma (HLG), standardized in ARIB STD-B67, which allows for output brightness levels from 0.01 nit up to around 5000 nits, and Dolby’s Perceptual Quantization (PQ) curve, standardized in SMPTE ST.2084, which allows for output brightness levels from 0.0001 nit up to 10,000 nits.

PQ is the result of direct research by Dolby to measure the response of the human eye and to create a curve where no code value is wasted and no stepping between values is visible.  The advantage of PQ is pretty clear in terms of maximizing future output brightness (the best experimental single displays currently max out at 4,000 nits; Dolby’s test apparatus ranged from 0.004 to 20,000 nits) and increasing the amount of detail captured in the darks.
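
For the curious, the PQ decode step is compact enough to write down.  Here’s a minimal sketch of the SMPTE ST.2084 EOTF, mapping a normalized code value (0-1) to an absolute display luminance in nits:

    # SMPTE ST.2084 (PQ) EOTF constants
    M1 = 2610 / 16384         # 0.1593017578125
    M2 = 2523 / 4096 * 128    # 78.84375
    C1 = 3424 / 4096          # 0.8359375
    C2 = 2413 / 4096 * 32     # 18.8515625
    C3 = 2392 / 4096 * 32     # 18.6875

    def pq_eotf(code):
        """Map a normalized PQ code value (0-1) to display luminance in nits."""
        e = code ** (1 / M2)
        y = max(e - C1, 0.0) / (C2 - C3 * e)
        return 10000.0 * y ** (1 / M1)

    # About half of the code range lands below roughly 100 nits, where the eye
    # is most sensitive - that's the "no wasted values" idea in practice.
    for code in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"code {code:.2f} -> {pq_eotf(code):8.2f} nits")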

HLG, on the other hand, provides a degree of backwards compatibility, matching the output levels of gamma 2.4 for the first 50% of the curve and reserving the top 50% of the values for the higher light level output.  Generally, HLG content with a system gamma of 1.2 looks pretty close to standard dynamic range content, though its whites sometimes end up compressed and greyer than content mastered in SDR to begin with.

Footage graded in Rec. 709 and the same graded in HLG.
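
And for comparison, here’s a sketch of the HLG OETF from ARIB STD-B67 (also carried into ITU-R BT.2100), which maps normalized scene light (0-1) to a signal value: the bottom of the range uses a conventional square-root (gamma-like) segment, and the top switches to a logarithmic segment for the highlights:

    import math

    # HLG OETF constants (ARIB STD-B67 / ITU-R BT.2100)
    A = 0.17883277
    B = 1 - 4 * A                    # 0.28466892
    C = 0.5 - A * math.log(4 * A)    # ~0.55991073

    def hlg_oetf(scene_linear):
        """Map normalized scene light (0-1) to an HLG signal value (0-1)."""
        if scene_linear <= 1 / 12:
            return math.sqrt(3 * scene_linear)
        return A * math.log(12 * scene_linear - B) + C

    # Scene values up to 1/12 of peak fill the bottom half of the signal range,
    # which is where HLG's partial SDR backwards compatibility comes from.
    for e in (0.0, 1 / 12, 0.25, 0.5, 1.0):
        print(f"scene {e:.3f} -> signal {hlg_oetf(e):.3f}")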

(Side note: I prefer grading in SMPTE ST.2084 because of the extended dynamic range through the blacks, and the smoother roll-off into the whites.)
 

3. Bit Depth: The new transfer curves accentuate a problem that’s been with video since the switch from analog to digital values: stepping.  As displays have gotten brighter, the difference between two code values (say, digital values 25 and 26) is sometimes enough that we can see a clear distinguishing line between the two greys.  This is especially true when using a display whose maximum brightness is greater than the reference standard, and it’s more common in the blacks than in the whites.

Both the BT.2020 and DCI standards already have requirements to decrease stepping by switching signal encoding and transmission from 8 bits per channel to 10 bits minimum (12 bits for DCI), allowing for at least a 4 times smoother gradient.  However, BT.2020 still permits 8 bit rendering at the display, which is what you’ll find on the vast majority of televisions and reference displays on the market today.

On the other hand, HDR video goes one step further and requires 10 bit rendering at the display panel itself; that is, each color sub-pixel must be capable of displaying between 876 and 1,024 distinguishable light levels, in all operational brightness and contrast modes.

The reason HDR requires a 10 bit panel while BT.2020 doesn’t is that our eyes are more sensitive to stepping in the value (brightness) of a color or gradient than to stepping in its hue or saturation: the eye can easily make up for lower color fidelity (8 bits per channel in BT.2020 space) by filling in the gaps, but with an HDR curve the jump in light levels between two adjacent 8 bit codes is big enough to be clearly noticeable (there’s a rough numeric illustration just after the comparison image below).

Comparison of gradient step sizes at 8 bit, 10 bit, and 12 bit precision (contrast emphasized)
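
As a rough illustration of that jump, here’s a sketch that compares the relative luminance step between two adjacent PQ code values at 8 bit versus 10 bit precision (it repeats the ST.2084 decode from the earlier sketch so it runs on its own; the exact numbers depend on where in the curve you look):

    # Condensed repeat of the ST.2084 (PQ) decode from the earlier sketch
    M1, M2 = 2610 / 16384, 2523 / 4096 * 128
    C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

    def pq_eotf(code):
        e = code ** (1 / M2)
        return 10000.0 * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1 / M1)

    def relative_step(code_index, bits):
        """Relative luminance jump between adjacent codes at a given bit depth."""
        levels = 2 ** bits - 1
        low = pq_eotf(code_index / levels)
        high = pq_eotf((code_index + 1) / levels)
        return (high - low) / low

    # The same point on the curve (about 10% signal) at two bit depths:
    print(relative_step(26, 8))     # the 8 bit step is roughly four times...
    print(relative_step(102, 10))   # ...the relative jump of the 10 bit step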

4. Metadata: The last thing that HDR video requires, and standard BT.2020 doesn’t, is metadata.  All forms of HDR video should include information about both the content and the mastering environment.  This includes which EOTF was used in the grade, the maximum and frame-average brightnesses of the content and display, and which RGB primaries were used.  Dolby Vision even includes metadata to define, shot by shot, how to translate the HDR values into the SDR range!

Consumer display manufacturers use this information to adapt content for their screens in real time: knowing when to clip or compress the highlights and darks (based on the capability of the screen the content is being shown on), and automatically selecting the operational mode (switching from Rec. 709 to BT.2020, and in and out of HDR mode, without the end user ever having to change a setting).
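
To make that less abstract, here’s a hypothetical sketch of the kind of static metadata an HDR10-style master carries alongside the picture.  The field names below are illustrative stand-ins rather than the exact identifiers defined in SMPTE ST.2086 or CTA-861.3:

    from dataclasses import dataclass

    @dataclass
    class HDRStaticMetadata:
        """Illustrative static metadata travelling with an HDR10-style master."""
        eotf: str                    # transfer function used in the grade, e.g. "SMPTE ST.2084"
        primaries: str               # container primaries, e.g. "BT.2020"
        mastering_peak_nits: float   # peak luminance of the mastering display
        mastering_black_nits: float  # minimum luminance of the mastering display
        max_cll_nits: float          # brightest single pixel in the content (MaxCLL)
        max_fall_nits: float         # brightest frame-average light level (MaxFALL)

    # A plausible set of values for a grade done on a 1,000 nit reference monitor:
    clip = HDRStaticMetadata(
        eotf="SMPTE ST.2084",
        primaries="BT.2020",
        mastering_peak_nits=1000.0,
        mastering_black_nits=0.005,
        max_cll_nits=920.0,
        max_fall_nits=180.0,
    )
    print(clip)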

 

So, in summary, what does HDR video do differently?  Wider color gamuts, new transfer function curves to allow for a much larger range of brightnesses, 10 bits per channel minimum requirement at the display to minimize stepping, and the transmission of metadata to communicate information about the content and its mastering environment to the end user.

All of which are essential, none of which are completely backwards compatible.


Yes, but what does it look like?

Unfortunately, the only way to really show you what HDR looks like is to tell you to go to a trade show or a post house with footage to show, or to buy a TV with HDR capabilities and stream some actual HDR content, because when you show HDR content on a normal display, it simply does not look right:

Images in SMPTE ST.2084 HDR Video formats do not appear normal when directly brought into Rec. 709 or sRGB Gamma 2.4 systems

You can get a little bit of a feel for it if I cut the brightness levels of a standard dynamic range image by half, and put it side-by-side with one that more closely follows the HDR range of brightnesses:

Normalized & Scaled SMPTE ST.2084 HDR Video vs Rec. 709 with Brightness Scaled

But that doesn’t capture what HDR video actually does.  I don’t quite know how to describe it - it’s powerful, beautiful, clear, real, present and multidimensional.  There’s an actual physiological and psychological response to the image that you don’t get with standard dynamic range footage: not simply an emotional response to the quality of the image, but the higher brightness levels actually trigger things in your eyes and brain that let you literally see it differently from anything you’ve seen before.

And once you start using it on a regular basis, nothing else seems quite as satisfactory, no other image quite as beautiful.  You end up with a feeling that everything else is just a little bit inadequate.  That’s why HDR will very rapidly become the new normal of future video.


So that's it for Part 1: What is HDR Video?  In Part 2 of our series on HDR video, we’re going to cover what you need to grade in HDR, and how you can cheat a bit to get a feel for the format by emulating its response curve on your existing reference hardware.


Endnotes:

* While ACES does cover the entire visible color spectrum, its primary RGB values are imaginary, which means that while it can code for all possible colors, there’s no way of building a piece of technology that actually uses the ACES RGB values as its primary display colors.  Or in other words, if you were to try to display ACES full-value red, you couldn’t, because that color doesn’t exist.