Rebuilding Netflix Video Processing Pipeline with Microservices | by Netflix Technology Blog | Jan, 2024



Netflix Technology Blog

Netflix TechBlog

Liwei Guo, Anush Moorthy, Li-Heng Chen, Vinicius Carvalho, Aditya Mavlankar, Agata Opalach, Adithya Prakash, Kyle Swanson, Jessica Tweneboah, Subbu Venkatrav, Lishan Zhu

This is the first blog in a multi-part series on how Netflix rebuilt its video processing pipeline with microservices, so we can maintain our rapid pace of innovation and continuously improve the system for member streaming and studio operations. This introductory blog focuses on an overview of our journey. Future blogs will provide deeper dives into each service, sharing insights and lessons learned from this process.

The Netflix video processing pipeline went live with the launch of our streaming service in 2007. Since then, the video pipeline has undergone substantial improvements and broad expansions:

  • Starting with Standard Dynamic Range (SDR) at standard definition, we expanded the encoding pipeline to 4K and High Dynamic Range (HDR), which enabled support for our premium offering.
  • We moved from centralized linear encoding to distributed chunk-based encoding. This architecture shift greatly reduced the processing latency and increased system resiliency.
  • Moving away from the use of dedicated instances that were constrained in quantity, we tapped into Netflix’s internal trough, the spare capacity created by autoscaling microservices, leading to significant improvements in computation elasticity as well as resource utilization efficiency.
  • We rolled out encoding innovations such as per-title and per-shot optimizations, which provided significant quality-of-experience (QoE) improvement to Netflix members.
  • By integrating with studio content systems, we enabled the pipeline to leverage rich metadata from the creative side and create more engaging member experiences like interactive storytelling.
  • We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case.

Our experience of the last decade-and-a-half has reinforced our conviction that an efficient, flexible video processing pipeline that allows us to innovate and support our streaming service, as well as our studio partners, is critical to the continued success of Netflix. To that end, the Video and Image Encoding team in Encoding Technologies (ET) has spent the last few years rebuilding the video processing pipeline on our next-generation microservice-based computing platform Cosmos.

Reloaded

Starting in 2014, we developed and operated the video processing pipeline on our third-generation platform Reloaded. Reloaded was well-architected, providing good stability, scalability, and a reasonable level of flexibility. It served as the foundation for numerous encoding innovations developed by our team.

When Reloaded was designed, we focused on a single use case: converting high-quality media files (also known as mezzanines) received from studios into compressed assets for Netflix streaming. Reloaded was created as a single monolithic system, where developers from various media teams in ET and our platform partner team Content Infrastructure and Solutions (CIS)¹ worked on the same codebase, building a single system that handled all media assets. Over the years, the system expanded to support various new use cases. This led to a significant increase in system complexity, and the limitations of Reloaded began to show:

  • Coupled functionality: Reloaded was composed of a number of worker modules and an orchestration module. Setting up a new Reloaded module and integrating it with the orchestration required a non-trivial amount of effort, which led to a bias towards augmentation rather than creation when developing new functionalities. For example, in Reloaded the video quality calculation was implemented inside the video encoder module. With this implementation, it was extremely difficult to recalculate video quality without re-encoding.
  • Monolithic structure: Since Reloaded modules were often co-located in the same repository, it was easy to overlook code-isolation rules, and there was quite a bit of unintended reuse of code across what should have been strong boundaries. Such reuse created tight coupling and reduced development velocity. The tight coupling among modules further forced us to deploy all modules together.
  • Long release cycles: The joint deployment meant that there was increased fear of unintended production outages, as debugging and rollback can be difficult for a deployment of this size. This drove the approach of the “release train”. Every two weeks, a “snapshot” of all modules was taken and promoted to be a “release candidate”. This release candidate then went through exhaustive testing which attempted to cover as large a surface area as possible. This testing stage took about two weeks. Thus, depending on when the code change was merged, it could take anywhere between two and four weeks to reach production.
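
The two-to-four-week range in the last bullet follows from simple arithmetic. The sketch below is a simplified model, assuming a snapshot exactly every 14 days and a testing stage of exactly 14 days (the text says "about two weeks"). The function name and the day-offset convention are illustrative and not part of Reloaded.

```python
SNAPSHOT_INTERVAL_DAYS = 14  # assumption: a snapshot is taken exactly every two weeks
TESTING_DAYS = 14            # assumption: release-candidate testing takes exactly two weeks


def days_to_production(days_since_snapshot: int) -> int:
    """Days from merge to production under the simplified release-train model.

    `days_since_snapshot` is how long after the most recent snapshot the change
    was merged; 0 means it was merged just in time for that snapshot.
    """
    offset = days_since_snapshot % SNAPSHOT_INTERVAL_DAYS
    # A change merged just after a snapshot waits almost a full interval for the next one.
    wait_for_snapshot = (SNAPSHOT_INTERVAL_DAYS - offset) % SNAPSHOT_INTERVAL_DAYS
    return wait_for_snapshot + TESTING_DAYS


if __name__ == "__main__":
    for day in (0, 1, 7, 13):
        print(day, days_to_production(day))
```

Under these assumptions, a change that just makes a snapshot ships after the two-week testing stage alone, while one that just misses it waits nearly a full extra interval first.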

As time progressed and functionalities grew, the rate of new feature contributions in Reloaded dropped. Several promising ideas were abandoned owing to the outsized work needed to overcome architectural limitations. The platform that had once served us well was now becoming a drag on development.

Cosmos

As a response, in 2018 the CIS and ET teams started developing the next-generation platform, Cosmos. In addition to the scalability and the stability that the developers already enjoyed in Reloaded, Cosmos aimed to significantly increase system flexibility and feature development velocity. To achieve this, Cosmos was developed as a computing platform for workflow-driven, media-centric microservices.

The microservice architecture provides strong decoupling between services. Per-microservice workflow support eases the burden of implementing complex media workflow logic. Finally, relevant abstractions allow media algorithm developers to focus on the manipulation of video and audio signals rather than on infrastructural concerns. A comprehensive list of benefits offered by Cosmos can be found in the linked blog.

Service Boundaries

In the microservice architecture, a system is composed of a number of fine-grained services, with each service focusing on a single functionality. So the first (and arguably the most important) step is to identify boundaries and define services.

In our pipeline, as media assets travel through creation to ingest to delivery, they go through a number of processing steps such as analyses and transformations. We analyzed these processing steps to identify “boundaries” and grouped them into different domains, which in turn became the building blocks of the microservices we engineered.

As an example, in Reloaded, the video encoding module bundles 5 steps:

1. divide the input video into small chunks

2. encode each chunk independently

3. calculate the quality score (VMAF) of each chunk

4. assemble all the encoded chunks into a single encoded video

5. aggregate quality scores from all chunks

From a system perspective, the assembled encoded video is of primary concern, while the internal chunking and separate chunk encodings exist in order to fulfill certain latency and resiliency requirements. Further, as alluded to above, the video quality calculation provides a totally separate functionality as compared to the encoding service.

Thus, in Cosmos, we created two independent microservices: Video Encoding Service (VES) and Video Quality Service (VQS), each of which serves a clear, decoupled function. As implementation details, the chunked encoding and the assembling were abstracted away into the VES.
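
To make the split concrete, here is a minimal toy sketch of the idea. It is not Netflix's implementation. Frames are plain integers, the "encoder" is a placeholder quantizer, and the quality metric is a stand-in rather than VMAF. All class and function names are hypothetical. The encoding class hides chunking and assembly (steps 1, 2, and 4), while the quality class works only on a source and a finished encode, so quality can be recomputed without re-encoding.

```python
from typing import Callable, List, Sequence

Frame = int  # toy stand-in for a video frame


def split_into_chunks(frames: Sequence[Frame], chunk_size: int) -> List[List[Frame]]:
    """Divide the input into fixed-size chunks (the last one may be shorter)."""
    return [list(frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)]


class VideoEncodingService:
    """Toy VES: chunking and assembly are internal implementation details."""

    def __init__(self, encode_chunk: Callable[[List[Frame]], List[Frame]], chunk_size: int = 4):
        self._encode_chunk = encode_chunk
        self._chunk_size = chunk_size
        self.chunks_encoded = 0  # counter kept only to illustrate the decoupling

    def encode(self, mezzanine: Sequence[Frame]) -> List[Frame]:
        encoded_chunks = []
        for chunk in split_into_chunks(mezzanine, self._chunk_size):
            encoded_chunks.append(self._encode_chunk(chunk))  # each chunk is independent
            self.chunks_encoded += 1
        # Assemble the encoded chunks into a single encoded video.
        return [frame for chunk in encoded_chunks for frame in chunk]


class VideoQualityService:
    """Toy VQS: scores an encode against its source and knows nothing about chunks."""

    def score(self, mezzanine: Sequence[Frame], encoded: Sequence[Frame]) -> float:
        if not mezzanine or len(mezzanine) != len(encoded):
            raise ValueError("source and encode must be non-empty and of equal length")
        # Placeholder metric (NOT VMAF): 100 minus the mean absolute error.
        error = sum(abs(a - b) for a, b in zip(mezzanine, encoded)) / len(mezzanine)
        return 100.0 - error


def quantize(chunk: List[Frame]) -> List[Frame]:
    """Placeholder 'encoder': rounds each frame value down to a multiple of 4."""
    return [frame - frame % 4 for frame in chunk]


if __name__ == "__main__":
    source = list(range(10))
    ves = VideoEncodingService(quantize, chunk_size=4)
    encoded = ves.encode(source)
    vqs = VideoQualityService()
    print(vqs.score(source, encoded))
    print(vqs.score(source, encoded))  # recomputed; no additional chunks were encoded
    print(ves.chunks_encoded)
```

The caller of the toy VES never sees chunks, which mirrors the point that the assembled video is the primary concern and chunking is an internal means to meet latency and resiliency needs.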

Video Services

The approach outlined above was applied to the rest of the video processing pipeline to identify functionalities and hence service boundaries, leading to the creation of the following video services².

  1. Video Inspection Service (VIS): This service takes a mezzanine as the input and performs various inspections. It extracts metadata from different layers of the mezzanine for downstream services. In addition, the inspection service flags issues if invalid or unexpected metadata is observed and provides actionable feedback to the upstream team.
  2. Complexity Analysis Service (CAS): The optimal encoding recipe is highly content-dependent. This service takes a mezzanine as the input and performs analysis to understand the content complexity. It calls Video Encoding Service for pre-encoding and Video Quality Service for quality evaluation. The results are saved to a database so they can be reused.
  3. Ladder Generation Service (LGS): This service creates an entire bitrate ladder for a given encoding family (H.264, AV1, etc.). It fetches the complexity data from CAS and runs the optimization algorithm to create encoding recipes. The CAS and LGS cover much of the innovations that we have previously presented in our tech blogs (per-title, mobile encodes, per-shot, optimized 4K encoding, etc.). By wrapping ladder generation into a separate microservice (LGS), we decouple the ladder optimization algorithms from the creation and management of complexity analysis data (which resides in CAS). We expect this to give us greater freedom for experimentation and a faster rate of innovation.
  4. Video Encoding Service (VES): This service takes a mezzanine and an encoding recipe and creates an encoded video. The recipe includes the desired encoding format and properties of the output, such as resolution, bitrate, etc. The service also provides options that allow fine-tuning latency, throughput, etc., depending on the use case.
  5. Video Validation Service (VVS): This service takes an encoded video and a list of expectations about the encode. These expectations include attributes specified in the encoding recipe as well as conformance requirements from the codec specification. VVS analyzes the encoded video and compares the results against the indicated expectations. Any discrepancy is flagged in the response to alert the caller.
  6. Video Quality Service (VQS): This service takes the mezzanine and the encoded video as input, and calculates the quality score (VMAF) of the encoded video.
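
The compare-and-flag behavior described for VVS can be illustrated with a short sketch. This is only an assumption about the general shape of such a check, not the service's actual API. The function name, the dictionary-based inputs, and the attribute names (`codec`, `height`, and so on) are all hypothetical.

```python
from typing import Any, Dict, List


def find_discrepancies(measured: Dict[str, Any], expectations: Dict[str, Any]) -> List[str]:
    """Compare attributes measured from an encode against the caller's expectations.

    Returns one human-readable message per mismatch; an empty list means the
    encode met every stated expectation.
    """
    discrepancies = []
    for attribute, expected in expectations.items():
        if attribute not in measured:
            discrepancies.append(f"{attribute}: expected {expected!r}, but not reported")
        elif measured[attribute] != expected:
            discrepancies.append(
                f"{attribute}: expected {expected!r}, got {measured[attribute]!r}"
            )
    return discrepancies


if __name__ == "__main__":
    measured = {"codec": "h264", "width": 1920, "height": 1080}
    expectations = {"codec": "h264", "width": 1920, "height": 800, "frame_rate": 24}
    for message in find_discrepancies(measured, expectations):
        print(message)
```

Returning every discrepancy, rather than stopping at the first, matches the description that any discrepancy is flagged in the response to alert the caller.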

Service Orchestration

Each video service provides a dedicated functionality, and they work together to generate the needed video assets. Currently, the two main use cases of the Netflix video pipeline are producing assets for member streaming and for studio operations. For each use case, we created a dedicated workflow orchestrator so the service orchestration can be customized to best meet the corresponding business needs.

For the streaming use case, the generated videos are deployed to our content delivery network (CDN) for Netflix members to consume. These videos can easily be watched millions of times. The Streaming Workflow Orchestrator utilizes almost all video services to create streams for an impeccable member experience. It leverages VIS to detect and reject non-conformant or low-quality mezzanines, invokes LGS for encoding recipe optimization, encodes video using VES, and calls VQS for quality measurement, where the quality data is further fed to Netflix’s data pipeline for analytics and monitoring purposes. In addition to video services, the Streaming Workflow Orchestrator uses audio and timed text services to generate audio and text assets, and packaging services to “containerize” assets for streaming.
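
The call sequence in the streaming paragraph can be sketched as a small function. This is a hedged illustration only. The services are passed in as plain callables standing in for VIS, LGS, VES, and VQS, and the function name, the recipe shape, and the result type are all hypothetical. Audio, timed text, packaging, and the analytics feed are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Recipe = Dict[str, int]  # hypothetical recipe shape, e.g. {"height": 1080, "kbps": 5000}


@dataclass
class StreamingOutcome:
    accepted: bool
    issues: List[str] = field(default_factory=list)
    # One (recipe, encoded video, quality score) tuple per rung of the ladder.
    streams: List[Tuple[Recipe, str, float]] = field(default_factory=list)


def run_streaming_workflow(
    mezzanine: str,
    inspect: Callable[[str], List[str]],             # stands in for VIS
    generate_ladder: Callable[[str], List[Recipe]],  # stands in for LGS
    encode: Callable[[str, Recipe], str],            # stands in for VES
    measure_quality: Callable[[str, str], float],    # stands in for VQS
) -> StreamingOutcome:
    # Reject non-conformant or low-quality sources before spending any compute.
    issues = inspect(mezzanine)
    if issues:
        return StreamingOutcome(accepted=False, issues=issues)

    outcome = StreamingOutcome(accepted=True)
    for recipe in generate_ladder(mezzanine):
        encoded = encode(mezzanine, recipe)
        quality = measure_quality(mezzanine, encoded)
        outcome.streams.append((recipe, encoded, quality))
    return outcome
```

Passing the services in as arguments keeps the orchestration logic separate from the services themselves, which is one way to reflect the decoupling the article describes.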

For the studio use case, some example video assets are marketing clips and daily production editorial proxies. The requests from the studio side are generally latency-sensitive. For example, someone from the production team may be waiting for the video to review so they can decide the shooting plan for the next day. Because of this, the Studio Workflow Orchestrator optimizes for fast turnaround and focuses on core media processing services. At this time, the Studio Workflow Orchestrator calls VIS to extract metadata of the ingested assets and calls VES with predefined recipes. Compared to member streaming, studio operations have different and unique requirements for video processing. Therefore, the Studio Workflow Orchestrator is the exclusive user of some encoding features like forensic watermarking and timecode/text burn-in.
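
The studio path is shorter, and a sketch makes the contrast visible: metadata extraction, then an encode with a recipe looked up rather than optimized. Everything named here is an assumption for illustration. The recipe table, its keys and values, the asset-type labels, and the function signature are hypothetical, and the callables merely stand in for VIS and VES.

```python
from typing import Any, Callable, Dict

# Hypothetical predefined recipes keyed by asset type; the values are illustrative only.
PREDEFINED_RECIPES: Dict[str, Dict[str, Any]] = {
    "editorial_proxy": {"height": 1080, "burn_in_timecode": True, "forensic_watermark": True},
    "marketing_clip": {"height": 1080, "burn_in_timecode": False, "forensic_watermark": False},
}


def run_studio_workflow(
    mezzanine: str,
    asset_type: str,
    extract_metadata: Callable[[str], Dict[str, Any]],  # stands in for VIS
    encode: Callable[[str, Dict[str, Any]], str],       # stands in for VES
) -> Dict[str, Any]:
    if asset_type not in PREDEFINED_RECIPES:
        raise ValueError(f"no predefined recipe for asset type {asset_type!r}")
    metadata = extract_metadata(mezzanine)
    recipe = PREDEFINED_RECIPES[asset_type]
    # No ladder optimization or quality measurement here: the goal is fast turnaround.
    return {"metadata": metadata, "recipe": recipe, "encoded": encode(mezzanine, recipe)}
```

Looking up a fixed recipe avoids the analysis and optimization steps of the streaming path, which is consistent with the latency-sensitive nature of studio requests.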
