HDR Research

Overview

We are looking at what it would take to encode HDR-10 and Dolby Vision videos using an AVC/HEVC encoder.
HDR-10: 10-bit HDR, BT.2020, PQ transfer function, static metadata
Dolby Vision: 12-bit HDR (10-bit YUV plus deltas defined in metadata, giving 12 bits), BT.2020, PQ transfer function, Dolby Vision dynamic metadata

What is HDR-10?

From this paper, we have gleaned the following:
Essentially, HDR-10 is defined as the combination of the following container and coding characteristics:
    • Color container/primaries: BT.2020
    • PQ Transfer function (OETF/EOTF): SMPTE ST 2084
    • Representation: Non-Constant Luminance (NCL) YCbCr
    • Sampling: 4:2:0
    • Bit Depth: 10 bits
    • Metadata: SMPTE ST 2086, MaxFALL, and MaxCLL, carried in AVC/HEVC Supplemental Enhancement Information (SEI) messages.
    • Encoding using: HEVC Main 10 profile or AVC High 10.
The AVC/HEVC specifications support all of these features, as well as metadata (SEI) that describes the mastering display and the brightness limits of the content.
The encoder side includes:
    • The SMPTE ST 2084:2014 inverse Electro-Optical Transfer Function (inverse EOTF, often loosely called the OETF or simply the Transfer Function, TF), which is applied to linear light samples in the RGB BT.2020 domain.
    • Color conversion to the Non-Constant Luminance (NCL) YCbCr BT.2020 format.
    • Chroma down-sampling to 4:2:0.
    • Quantization to a 10-bit integer representation.
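
The encoder-side steps above can be sketched for a single pixel as follows. This is a minimal illustration, not taken from the paper: the function names are my own, the input is assumed to be linear BT.2020 RGB in cd/m², and the 10-bit quantization assumes narrow (video) range, matching video_full_range_flag = 0. Chroma down-sampling to 4:2:0 is left out because it needs a neighbourhood of pixels rather than one.

```python
# PQ (SMPTE ST 2084) constants, written as the rational numbers from the standard.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

# BT.2020 non-constant-luminance luma coefficients.
KR, KB = 0.2627, 0.0593
KG = 1.0 - KR - KB


def pq_inverse_eotf(nits):
    """Map absolute linear light (cd/m^2, clipped to 0..10000) to a PQ signal in 0..1."""
    y = max(0.0, min(nits / 10000.0, 1.0))
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def rgb_to_ycbcr_ncl(r, g, b):
    """Convert non-linear R'G'B' (0..1) to Y'CbCr; Cb and Cr span -0.5..0.5."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


def quantize_10bit(y, cb, cr):
    """Quantize to 10-bit narrow range: luma 64..940, chroma 64..960."""
    return round(876 * y + 64), round(896 * cb + 512), round(896 * cr + 512)


def encode_pixel(r_nits, g_nits, b_nits):
    """Apply the transfer function, color conversion, and quantization in order."""
    rgb = [pq_inverse_eotf(v) for v in (r_nits, g_nits, b_nits)]
    return quantize_10bit(*rgb_to_ycbcr_ncl(*rgb))
```

Calling encode_pixel(10000, 10000, 10000) gives peak white in narrow range, and encode_pixel(0, 0, 0) gives black with neutral chroma.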

How do we implement Static HDR-10 Metadata?

Please see the Beamr document titled "Beamr 5, HEVC SDK: v4.0, High Dynamic Range (HDR) Configuration". In that document they specify the following:
  • VUI section of the SPS
    vui.enable = 1
    vui.video_full_range_flag = 0
    vui.transfer_characteristics = 16 // SMPTE ST 2084
    vui.colour_primaries = 9 // ITU-R BT.2020
    vui.matrix_coeffs = 9 // ITU-R BT.2020
  • Mastering Display Color Volume SEI
    sei.mastering_display_colour_volume_flag = 1
    sei.mastering_display_colour_volume.display_primaries_x[0] = 13250 // Green_X
    sei.mastering_display_colour_volume.display_primaries_x[1] = 7500 // Blue_X
    sei.mastering_display_colour_volume.display_primaries_x[2] = 34000 // Red_X
    sei.mastering_display_colour_volume.display_primaries_y[0] = 34500 // Green_Y
    sei.mastering_display_colour_volume.display_primaries_y[1] = 3000 // Blue_Y
    sei.mastering_display_colour_volume.display_primaries_y[2] = 16000 // Red_Y
    sei.mastering_display_colour_volume.white_point_x = 15635
    sei.mastering_display_colour_volume.white_point_y = 16450
    sei.mastering_display_colour_volume.max_display_mastering_luminance = 10000000
    sei.mastering_display_colour_volume.min_display_mastering_luminance = 50

  • Content light level information SEI
    sei.content_light_level_info_flag
    sei.content_light_level_info.max_content_light_level
    sei.content_light_level_info.max_pic_average_light_level
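
As a sanity check on the VUI part of this configuration, the expected code points can be compared against whatever an encoder is actually given. The sketch below is only an illustration: the dictionary layout and the check_hdr10_vui helper are hypothetical and are not part of the Beamr SDK. The key names simply mirror the field names listed above.

```python
# A few code points from the VUI tables in the AVC/HEVC specifications.
NAMES = {
    "colour_primaries": {1: "BT.709", 9: "BT.2020"},
    "transfer_characteristics": {1: "BT.709", 16: "SMPTE ST 2084 (PQ)"},
    "matrix_coeffs": {1: "BT.709", 9: "BT.2020 non-constant luminance"},
}

# Values the HDR-10 configuration above expects.
HDR10_VUI = {
    "video_full_range_flag": 0,
    "colour_primaries": 9,
    "transfer_characteristics": 16,
    "matrix_coeffs": 9,
}


def describe(field, value):
    """Return 'value (name)' when the code point is in the small table above."""
    name = NAMES.get(field, {}).get(value)
    return f"{value} ({name})" if name else str(value)


def check_hdr10_vui(vui):
    """Return a list of mismatches; an empty list means the VUI signals HDR-10."""
    problems = []
    for field, expected in HDR10_VUI.items():
        actual = vui.get(field)
        if actual != expected:
            problems.append(
                f"{field}: expected {describe(field, expected)}, "
                f"got {describe(field, actual)}"
            )
    return problems
```

A stream signalled as BT.709 SDR would produce three mismatches (primaries, transfer characteristics, and matrix coefficients), while the configuration above produces none.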

How to derive Static HDR-10 Metadata values for the SEI message

At the end of the page there is a link to Apple Example HDR Metadata. The most important part is the following, which defines the input values for static HDR-10 metadata:
<attribute name="hdr.format">HDR10</attribute>
<attribute name="hdr.red.chroma.x">0.6800</attribute>
<attribute name="hdr.red.chroma.y">0.3200</attribute>
<attribute name="hdr.green.chroma.x">0.2650</attribute>
<attribute name="hdr.green.chroma.y">0.6900</attribute>
<attribute name="hdr.blue.chroma.x">0.1500</attribute>
<attribute name="hdr.blue.chroma.y">0.0600</attribute>

<attribute name="hdr.whitepoint.chroma.x">0.3127</attribute>
<attribute name="hdr.whitepoint.chroma.y">0.3290</attribute>

<attribute name="hdr.max.display.luminance">1000</attribute>
<attribute name="hdr.min.display.luminance">0.005</attribute>

<attribute name="hdr.max.content.lightlevel">1000</attribute>
<attribute name="hdr.max.frame.avg.lightlevel">343</attribute>

These values must be converted to integers before they go into the SEI message. Based on SMPTE ST 2086, the red, green, blue, and white point chromaticity values are in units of 0.00002, and the display luminance values are in units of 0.0001 cd/m². The light level values are already whole numbers of cd/m² and can be used as is. So, to generate the HDR-10 values for the SEI message, divide the chromaticity values by 0.00002 (that is, multiply by 50000) and the display luminance values by 0.0001 (multiply by 10000). For example, a red x of 0.6800 becomes 34000, and a maximum display luminance of 1000 becomes 10000000. Given the example input values above, the SEI message should look like the following:
sei.mastering_display_colour_volume_flag = 1
sei.mastering_display_colour_volume.display_primaries_x[2] = 34000 // Red_X
sei.mastering_display_colour_volume.display_primaries_y[2] = 16000 // Red_Y
sei.mastering_display_colour_volume.display_primaries_x[0] = 13250 // Green_X
sei.mastering_display_colour_volume.display_primaries_y[0] = 34500 // Green_Y
sei.mastering_display_colour_volume.display_primaries_x[1] = 7500 // Blue_X
sei.mastering_display_colour_volume.display_primaries_y[1] = 3000 // Blue_Y
sei.mastering_display_colour_volume.white_point_x = 15635
sei.mastering_display_colour_volume.white_point_y = 16450
sei.mastering_display_colour_volume.max_display_mastering_luminance = 10000000
sei.mastering_display_colour_volume.min_display_mastering_luminance = 50

sei.content_light_level_info_flag = 1
sei.content_light_level_info.max_content_light_level = 1000
sei.content_light_level_info.max_pic_average_light_level = 343
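
The conversion from the Apple-style input values to the SEI integers can be sketched in a few lines. The to_sei_fields helper and the shape of its output dictionary are my own assumptions for illustration. The input keys are the attribute names from the example above, and the primaries are written in the green, blue, red index order used in the listing.

```python
CHROMA_SCALE = 50000      # chromaticity is stored in increments of 0.00002
LUMINANCE_SCALE = 10000   # display luminance is stored in units of 0.0001 cd/m^2

EXAMPLE = {
    "hdr.red.chroma.x": 0.6800, "hdr.red.chroma.y": 0.3200,
    "hdr.green.chroma.x": 0.2650, "hdr.green.chroma.y": 0.6900,
    "hdr.blue.chroma.x": 0.1500, "hdr.blue.chroma.y": 0.0600,
    "hdr.whitepoint.chroma.x": 0.3127, "hdr.whitepoint.chroma.y": 0.3290,
    "hdr.max.display.luminance": 1000, "hdr.min.display.luminance": 0.005,
    "hdr.max.content.lightlevel": 1000, "hdr.max.frame.avg.lightlevel": 343,
}


def to_sei_fields(meta):
    """Scale the floating-point inputs to the integer units used in the SEI."""
    order = ("green", "blue", "red")  # index 0, 1, 2 in the listing above

    def chroma(key):
        # round() absorbs floating-point noise such as 34499.99999999999
        return round(meta[key] * CHROMA_SCALE)

    def luminance(key):
        return round(meta[key] * LUMINANCE_SCALE)

    return {
        "display_primaries_x": [chroma(f"hdr.{c}.chroma.x") for c in order],
        "display_primaries_y": [chroma(f"hdr.{c}.chroma.y") for c in order],
        "white_point_x": chroma("hdr.whitepoint.chroma.x"),
        "white_point_y": chroma("hdr.whitepoint.chroma.y"),
        "max_display_mastering_luminance": luminance("hdr.max.display.luminance"),
        "min_display_mastering_luminance": luminance("hdr.min.display.luminance"),
        # Light levels are already whole numbers of cd/m^2.
        "max_content_light_level": int(meta["hdr.max.content.lightlevel"]),
        "max_pic_average_light_level": int(meta["hdr.max.frame.avg.lightlevel"]),
    }
```

Running to_sei_fields(EXAMPLE) reproduces the integers in the SEI listing above. Rounding rather than truncating matters here, because a product like 0.69 * 50000 may not be exactly representable in floating point.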

What the SMPTE ST 2086 spec says:

display_primaries_x [ c ] and display_primaries_y [ c ] specify the normalized x and y chromaticity coordinates, respectively, of the colour primary component c of the mastering display in increments of 0.00002
white_point_x and white_point_y specify the normalized x and y chromaticity coordinates, respectively, of the white point of the mastering display in normalized increments of 0.00002
max_display_mastering_luminance and min_display_mastering_luminance specify the nominal maximum and minimum display luminance, respectively, of the mastering display in units of 0.0001

What is Dolby Vision?

Dolby Vision is a step up in quality from HDR-10. Instead of HDR-10's static metadata, Dolby Vision carries a stream of time-based, dynamic metadata alongside the video stream. That metadata defines a set of HDR parameters per scene rather than once for the entire video, which allows for a better HDR experience. Dolby Vision even allows sub-rectangles to be defined within a scene (think picture-in-picture) so that even more fine tuning can be done. The Dolby Vision metadata complements 10-bit YUV 4:2:0 data. Both are fed into two proprietary pieces of hardware in the display device, the UHD Dolby Vision Composer and the Dolby Vision Display Manager, whose output is 12-bit YUV 4:2:0 data.
Specs and more can be found here.
Dolby Vision FAQ.
Dolby Vision and HEVC.
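
To make the static versus dynamic distinction concrete, here is a toy sketch of per-scene metadata lookup. It is emphatically not the Dolby Vision metadata format, which is proprietary. The SceneMetadata class, its single max_luminance_nits field, and the scene values are all invented for illustration. The only point is that a dynamic scheme selects parameters by timestamp, while a static scheme has one set for the whole video.

```python
import bisect
from dataclasses import dataclass


@dataclass
class SceneMetadata:
    start_seconds: float        # where the scene begins on the timeline
    max_luminance_nits: float   # hypothetical per-scene parameter


# Static metadata: one set of values for the entire video.
STATIC = SceneMetadata(0.0, 1000.0)

# Dynamic metadata: one entry per scene, sorted by start time (made-up values).
SCENES = [
    SceneMetadata(0.0, 400.0),
    SceneMetadata(12.5, 1000.0),
    SceneMetadata(47.0, 150.0),
]


def metadata_at(scenes, t):
    """Return the metadata of the scene that contains timestamp t (seconds)."""
    starts = [s.start_seconds for s in scenes]
    index = bisect.bisect_right(starts, t) - 1
    return scenes[max(index, 0)]
```

With these values, a frame at 50 seconds would be tone-mapped against a 150-nit scene peak rather than the 1000-nit peak that static metadata would report for the whole title.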

How Do We Implement Dolby Vision?

  • Color container/primaries: BT.2020
  • PQ Transfer function (OETF/EOTF): SMPTE ST 2084
  • Representation: Non-Constant Luminance (NCL) YCbCr
  • Sampling: 4:2:0
  • Bit Depth: 10 bits
  • Encoding using: HEVC Main 10 profile or AVC High 10.
  • Dolby Vision metadata can be carried in an ISO Base Media File, an HTTP Live Stream, an MPEG-2 Transport Stream, and MPEG-DASH.

Tools

hevcesbrowser: This tool parses a raw HEVC elementary stream and shows what all the NALUs are. It even shows the HDR settings used in the HEVC elementary stream.
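
For a rough idea of what such a tool does at the lowest level, the sketch below walks an Annex B byte stream, finds the start codes, and reads the NAL unit type from the first header byte. It is not how hevcesbrowser is implemented, and the list_nal_units name is my own. It only lists the units and does not parse the SEI payloads themselves.

```python
# A few HEVC nal_unit_type values of interest when checking HDR signalling.
NAL_NAMES = {
    32: "VPS",
    33: "SPS (carries the VUI)",
    34: "PPS",
    39: "Prefix SEI (mastering display and content light level messages)",
    40: "Suffix SEI",
}

START_CODE = b"\x00\x00\x01"


def list_nal_units(data):
    """Return (header_offset, nal_unit_type) for each NAL unit in an Annex B stream."""
    units = []
    pos = data.find(START_CODE)
    while pos != -1 and pos + 3 < len(data):
        header = data[pos + 3]
        # HEVC header byte: 1 forbidden bit, 6 bits of type, 1 bit of layer id.
        units.append((pos + 3, (header >> 1) & 0x3F))
        pos = data.find(START_CODE, pos + 3)
    return units


def describe_stream(data):
    """Return one readable line per NAL unit."""
    return [
        f"offset {offset}: type {kind} {NAL_NAMES.get(kind, '')}".rstrip()
        for offset, kind in list_nal_units(data)
    ]
```

Four-byte start codes (00 00 00 01) are handled as well, since they end in the three-byte pattern. In a stream with HDR-10 metadata, the Prefix SEI units (type 39) are where the mastering display colour volume and content light level messages would be found.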

Delivery Targets

Apple

Source formats will be IMF and TIFF files.
