For your eyes only: enhancing Netflix video quality with neural networks | by Netflix Technology Blog | Nov, 2022

by Christos G. Bampis, Li-Heng Chen and Zhi Li

When you are binge-watching the latest season of Stranger Things or Ozark, we strive to deliver the best video quality to your eyes. To do so, we continuously push the boundaries of streaming video quality and leverage the best video technologies. For example, we invest in next-generation, royalty-free codecs and sophisticated video encoding optimizations. Recently, we added another powerful tool to our arsenal: neural networks for video downscaling. In this tech blog, we describe how we improved Netflix video quality with neural networks, the challenges we faced and what lies ahead.

There are, roughly speaking, two steps to encode a video in our pipeline:

  1. Video preprocessing, which encompasses any transformation applied to the high-quality source video prior to encoding. Video downscaling is the most pertinent example here: it tailors our encoding to the screen resolutions of different devices and optimizes picture quality under varying network conditions. With video downscaling, multiple resolutions of a source video are produced. For example, a 4K source video will be downscaled to 1080p, 720p, 540p and so on. This is typically done by a conventional resampling filter, like Lanczos.
  2. Video encoding using a conventional video codec, like AV1. Encoding drastically reduces the amount of video data that needs to be streamed to your device, by leveraging the spatial and temporal redundancies that exist in a video.
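
To make the conventional downscaling step concrete, here is a minimal NumPy sketch of separable Lanczos resampling. The function names are ours, and the kernel support `a=3` is simply the common default; a production pipeline would use a heavily optimized implementation.

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos windowed sinc: sinc(x) * sinc(x/a) for |x| < a, else 0."""
    x = np.asarray(x, dtype=np.float64)
    out = np.sinc(x) * np.sinc(x / a)
    out[np.abs(x) >= a] = 0.0
    return out

def lanczos_downscale_1d(signal, out_len, a=3):
    """Resample one axis of a signal to out_len samples."""
    in_len = len(signal)
    scale = in_len / out_len  # > 1 when downscaling
    result = np.empty(out_len)
    for i in range(out_len):
        # Center of output sample i, expressed in input coordinates.
        center = (i + 0.5) * scale - 0.5
        lo = int(np.floor(center - a * scale))
        hi = int(np.ceil(center + a * scale))
        idx = np.clip(np.arange(lo, hi + 1), 0, in_len - 1)
        # Stretch the kernel by the scale factor so it also low-pass filters.
        w = lanczos_kernel((np.arange(lo, hi + 1) - center) / scale, a)
        result[i] = np.dot(w, signal[idx]) / w.sum()
    return result

def lanczos_downscale(frame, out_h, out_w):
    """Separable 2D downscale: resample rows first, then columns."""
    tmp = np.stack([lanczos_downscale_1d(row, out_w) for row in frame])
    return np.stack([lanczos_downscale_1d(col, out_h) for col in tmp.T]).T

frame = np.random.rand(8, 16)           # stand-in for a full-res luma plane
small = lanczos_downscale(frame, 4, 8)  # halve each dimension
print(small.shape)                      # (4, 8)
```

The per-sample weight normalization keeps flat regions flat, which is why a constant input survives downscaling unchanged.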

We identified that we can leverage neural networks (NN) to improve Netflix video quality, by replacing conventional video downscaling with a neural network-based one. This approach, which we dub “deep downscaler,” has a few key advantages:

  • A learned approach to downscaling can improve video quality and be tailored to Netflix content.
  • It can be integrated as a drop-in solution, i.e., we do not need any other changes on the Netflix encoding side or the client device side. Millions of devices that support Netflix streaming automatically benefit from this solution.
  • A distinct, NN-based, video processing block can evolve independently, be used beyond video downscaling and be combined with different codecs.

Of course, we believe in the transformative potential of NN throughout video applications, beyond video downscaling. While conventional video codecs remain prevalent, NN-based video encoding tools are flourishing and closing the performance gap in terms of compression efficiency. The deep downscaler is our pragmatic approach to improving video quality with neural networks.

The deep downscaler is a neural network architecture designed to improve the end-to-end video quality by learning a higher-quality video downscaler. It consists of two building blocks, a preprocessing block and a resizing block. The preprocessing block aims to prefilter the video signal prior to the subsequent resizing operation. The resizing block yields the lower-resolution video signal that serves as input to an encoder. We employed an adaptive network design that is applicable to the wide variety of resolutions we use for encoding.
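
As a rough illustration of this two-block structure, a toy single-channel version might look like the sketch below. The layer count, 3×3 filters and pooling-style resizer are our own illustrative choices, not the actual Netflix architecture.

```python
import numpy as np

def conv2d(x, k):
    """Single-channel 'same' convolution with edge padding (a toy conv layer)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2,) * 2, (kw // 2,) * 2), mode="edge")
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def preprocessing_block(luma, kernels):
    """A few stacked conv+ReLU layers that prefilter the signal before resizing."""
    y = luma
    for k in kernels:
        y = np.maximum(conv2d(y, k), 0.0)
    return y

def resizing_block(y, factor):
    """Stand-in for the learned resizer: average each factor x factor patch
    into one low-resolution pixel."""
    h, w = y.shape[0] // factor, y.shape[1] // factor
    return y[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

def deep_downscaler(luma, kernels, factor=2):
    return resizing_block(preprocessing_block(luma, kernels), factor)

rng = np.random.default_rng(0)
kernels = [rng.normal(0, 0.1, (3, 3)) + np.eye(3) / 3 for _ in range(2)]
low = deep_downscaler(rng.random((16, 16)), kernels, factor=2)
print(low.shape)  # (8, 8)
```

In the real model the weights are learned end-to-end, and the design adapts to arbitrary scaling factors rather than a fixed integer stride.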

Architecture of the deep downscaler model, consisting of a preprocessing block followed by a resizing block.

During training, our goal is to generate the best downsampled representation such that, after upscaling, the mean squared error is minimized. Since we cannot directly optimize for a conventional video codec, which is non-differentiable, we exclude the effect of lossy compression in the loop. We focus on a robust downscaler that is trained given a conventional upscaler, like bicubic. Our training approach is intuitive and results in a downscaler that is not tied to a specific encoder or encoding implementation. Nevertheless, it requires a thorough evaluation to demonstrate its potential for broad use in Netflix encoding.
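
This objective can be sketched in a few lines. For self-containment we substitute a nearest-neighbor upscaler for the fixed bicubic one and plain average pooling for the trained network; only the shape of the loss matters here.

```python
import numpy as np

def nearest_upscale(y, factor):
    """Stand-in for the fixed conventional upscaler (bicubic in the post)."""
    return np.repeat(np.repeat(y, factor, axis=0), factor, axis=1)

def training_loss(downscaler, x, factor=2):
    """End-to-end objective: downscale, upscale with the *fixed* conventional
    upscaler, then compare against the full-resolution source. Lossy
    compression is deliberately left out of the loop (non-differentiable)."""
    y = downscaler(x, factor)
    x_hat = nearest_upscale(y, factor)
    return np.mean((x_hat - x) ** 2)  # MSE in the pixel domain

# Toy downscaler: average pooling (the real one is the trained NN).
def avg_pool(x, factor):
    h, w = x.shape[0] // factor, x.shape[1] // factor
    return x.reshape(h, factor, w, factor).mean(axis=(1, 3))

x = np.random.rand(8, 8)
print(training_loss(avg_pool, x))  # a non-negative scalar
```

Because only the downscaler is optimized, the result is agnostic to the encoder that later sits between the two halves of this loop.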

The goal of the deep downscaler is to improve the end-to-end video quality for the Netflix member. Through our experimentation, involving objective measurements and subjective visual tests, we found that the deep downscaler improves quality across various conventional video codecs and encoding configurations.

For example, for VP9 encoding and assuming a bicubic upscaler, we measured an average VMAF Bjøntegaard-Delta (BD) rate gain of ~5.4% over conventional Lanczos downscaling. We have also measured a ~4.4% BD rate gain for VMAF-NEG. We showcase an example result from one of our Netflix titles below. The deep downscaler (red points) delivered higher VMAF at similar bitrates, or similar VMAF scores at a lower bitrate.
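
For readers unfamiliar with BD-rate, the metric can be sketched as follows. This is the standard Bjøntegaard calculation, exercised with made-up rate-distortion points rather than our measured data.

```python
import numpy as np

def bd_rate(bitrates_ref, scores_ref, bitrates_test, scores_test):
    """Bjontegaard-Delta rate: fit log10(bitrate) as a cubic in quality,
    integrate both fits over the overlapping quality range, and report the
    average bitrate change in percent (negative means bitrate savings)."""
    lr_ref, lr_test = np.log10(bitrates_ref), np.log10(bitrates_test)
    p_ref = np.polyfit(scores_ref, lr_ref, 3)
    p_test = np.polyfit(scores_test, lr_test, 3)
    lo = max(min(scores_ref), min(scores_test))
    hi = min(max(scores_ref), max(scores_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100

# Illustrative (kbps, VMAF) points only: same quality at 10% less bitrate.
ref = ([1000, 2000, 4000, 8000], [70, 80, 88, 94])
test = ([900, 1800, 3600, 7200], [70, 80, 88, 94])
print(round(bd_rate(*ref, *test), 1))  # -10.0
```

A negative BD-rate means the test curve reaches the same quality at a lower bitrate, which is the sense in which we report a ~5.4% gain.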

Besides objective measurements, we also conducted human subject studies to validate the visual improvements of the deep downscaler. In our preference-based visual tests, we found that the deep downscaler was preferred by ~77% of test subjects, across a wide range of encoding recipes and upscaling algorithms. Subjects reported better detail preservation and a sharper visual look. A visual example is shown below.

Left: Lanczos downscaling; right: deep downscaler. Both videos are encoded with VP9 at the same bitrate and have been upscaled to FHD resolution (1920×1080). You may need to zoom in to see the visual difference.

We also carried out A/B testing to understand the overall streaming impact of the deep downscaler, and to detect any device playback issues. Our A/B tests showed QoE improvements without any adverse streaming impact. This demonstrates the benefit of deploying the deep downscaler for all devices streaming Netflix, without playback risks or quality degradation for our members.

Given our scale, applying neural networks can lead to a significant increase in encoding costs. In order to have a viable solution, we took several steps to improve efficiency.

  • The neural network architecture was designed to be computationally efficient while avoiding any negative visual quality impact. For example, we found that just a few neural network layers were sufficient for our needs. To reduce the input channels even further, we only apply NN-based scaling on luma and scale chroma with a standard Lanczos filter.
  • We implemented the deep downscaler as an FFmpeg-based filter that runs together with other video transformations, like pixel format conversions. Our filter can run on both CPU and GPU. On a CPU, we leveraged oneDNN to further reduce latency.
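
The luma-only design can be sketched as a simple dispatch over YUV planes. Both downscalers below are mean-pooling placeholders (the real pipeline uses the trained network for luma and a windowed-sinc Lanczos filter for chroma); the point is the routing, which keeps the network single-channel and cheap.

```python
import numpy as np

def _pool(plane, factor):
    """Mean pooling used here as a placeholder downscaler."""
    h, w = plane.shape[0] // factor, plane.shape[1] // factor
    return plane.reshape(h, factor, w, factor).mean(axis=(1, 3))

def nn_downscale(y, factor):
    """Placeholder for the learned luma downscaler."""
    return _pool(y, factor)

def lanczos_downscale(c, factor):
    """Placeholder for the conventional Lanczos chroma path."""
    return _pool(c, factor)

def downscale_yuv420(y, u, v, factor=2):
    """Route only the luma plane through the NN; chroma stays conventional."""
    return (nn_downscale(y, factor),
            lanczos_downscale(u, factor),
            lanczos_downscale(v, factor))

y = np.random.rand(16, 16)
u = v = np.random.rand(8, 8)  # 4:2:0 chroma planes are quarter-size
y2, u2, v2 = downscale_yuv420(y, u, v)
print(y2.shape, u2.shape)  # (8, 8) (4, 4)
```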

The Encoding Technologies and Media Cloud Engineering teams at Netflix have jointly innovated to bring Cosmos, our next-generation encoding platform, to life. Our deep downscaler effort was an excellent opportunity to showcase how Cosmos can drive future media innovation at Netflix. The following diagram shows a top-down view of how the deep downscaler was integrated within a Cosmos encoding microservice.

A top-down view of the deep downscaler's integration into Cosmos.

A Cosmos encoding microservice can serve multiple encoding workflows. For example, a service can be called to perform complexity analysis for a high-quality input video, or to generate encodes meant for actual Netflix streaming. Within a service, a Stratum function is a serverless layer dedicated to running stateless and computationally intensive functions. Within a Stratum function invocation, our deep downscaler is applied prior to encoding. Fueled by Cosmos, we can leverage the underlying Titus infrastructure and run the deep downscaler on all our multi-CPU/GPU environments at scale.

The deep downscaler paves the path for more NN applications for video encoding at Netflix. But our journey is not finished yet and we strive to improve and innovate. For example, we are studying a few other use cases, such as video denoising. We are also exploring more efficient solutions for applying neural networks at scale. We are interested in how NN-based tools can shine as part of next-generation codecs. At the end of the day, we are passionate about using new technologies to improve Netflix video quality. For your eyes only!

We would like to acknowledge the following individuals for their help with the deep downscaler project:

Lishan Zhu, Liwei Guo, Aditya Mavlankar, Kyle Swanson and Anush Moorthy (Video Image and Encoding team), Mariana Afonso and Lukas Krasula (Video Codecs and Quality team), Ameya Vasani (Media Cloud Engineering team), Prudhvi Kumar Chaganti (Streaming Encoding Pipeline team), Chris Pham and Andy Rhines (Data Science and Engineering team), Amer Ather (Netflix performance team), the Netflix Metaflow team and Prof. Alan Bovik (University of Texas at Austin).
