The Making of VES: the Cosmos Microservice for Netflix Video Encoding | by Netflix Technology Blog | Apr, 2024



Liwei Guo, Vinicius Carvalho, Anush Moorthy, Aditya Mavlankar, Lishan Zhu

This is the second post in a multi-part series from Netflix. See here for Part 1, which provides an overview of our efforts in rebuilding the Netflix video processing pipeline with microservices. This blog dives into the details of building our Video Encoding Service (VES), and shares our learnings.

Cosmos is the next-generation media computing platform at Netflix. Combining microservice architecture with asynchronous workflows and serverless functions, Cosmos aims to modernize Netflix's media processing pipelines with improved flexibility, efficiency, and developer productivity. Over the past few years, the video team within Encoding Technologies (ET) has been working on rebuilding the entire video pipeline on Cosmos.

This new pipeline is composed of a number of microservices, each dedicated to a single functionality. One such microservice is Video Encoding Service (VES). Encoding is an essential component of the video pipeline. At a high level, it takes an ingested mezzanine and encodes it into a video stream that is suitable for Netflix streaming or serves some studio/production use case. In the case of Netflix, there are a number of requirements for this service:

  • Given the wide range of devices from mobile phones to browsers to Smart TVs, multiple codec formats, resolutions, and quality levels need to be supported.
  • Chunked encoding is a must to meet the latency requirements of our business needs, and use cases with different levels of latency sensitivity need to be accommodated.
  • The capability of continuous release is crucial for enabling fast product innovation in both the streaming and studio spaces.
  • There is a huge volume of encoding jobs every day. The service needs to be cost-efficient and make the most use of available resources.

In this tech blog, we will walk through how we built VES to achieve the above goals and will share a number of lessons we learned from building microservices. Please note that for simplicity, we have chosen to omit certain Netflix-specific details that are not integral to the primary message of this blog post.

A Cosmos microservice consists of three layers: an API layer (Optimus) that takes in requests, a workflow layer (Plato) that orchestrates the media processing flows, and a serverless computing layer (Stratum) that processes the media. These three layers communicate asynchronously through a home-grown, priority-based messaging system called Timestone. We chose Protobuf as the payload format for its high efficiency and mature cross-platform support.

To help service developers get a head start, the Cosmos platform provides a powerful service generator. This generator features an intuitive UI. With a few clicks, it creates a basic yet complete Cosmos service: code repositories for all three layers are created; all platform capabilities, including discovery, logging, tracing, etc., are enabled; release pipelines are set up and dashboards are readily accessible. We can immediately start adding video encoding logic and deploy the service to the cloud for experimentation.

Optimus

As the API layer, Optimus serves as the gateway into VES, meaning service users can only interact with VES through Optimus. The defined API interface is a strong contract between VES and the external world. As long as the API is stable, users are shielded from internal changes in VES. This decoupling is instrumental in enabling faster iterations of VES internals.

As a single-purpose service, the API of VES is quite clean. We defined an endpoint encodeVideo that takes an EncodeRequest and returns an EncodeResponse (in an async way through Timestone messages). The EncodeRequest object contains information about the source video as well as the encoding recipe. All the requirements of the encoded video (codec, resolution, etc.) as well as the controls for latency (chunking directives) are exposed through the data model of the encoding recipe.

//protobuf definition

message EncodeRequest {
  VideoSource video_source = 1; //source to be encoded
  Recipe recipe = 2;            //including encoding format, resolution, etc.
}

message EncodeResponse {
  OutputVideo output_video = 1; //encoded video
  Error error = 2;              //error message (optional)
}

message Recipe {
  Codec codec = 1;              //including codec format, profile, level, etc.
  Resolution resolution = 2;
  ChunkingDirectives chunking_directives = 3;
  ...
}

Like any other Cosmos service, the platform automatically generates an RPC client based on the VES API data model, which users can use to build the request and invoke VES. Once an incoming request is received, Optimus performs validations, and (when applicable) converts the incoming data into an internal data model before passing it to the next layer, Plato.
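As a minimal sketch of this validate-and-convert step, the following Python snippet mimics what the API layer does; the class and field names here are hypothetical stand-ins for the actual Protobuf-generated models, not the real VES code.

```python
from dataclasses import dataclass

# Hypothetical external API model (mirroring the protobuf EncodeRequest).
@dataclass
class EncodeRequest:
    video_source: str
    codec: str
    width: int
    height: int

# Hypothetical internal model handed to the workflow layer.
@dataclass
class InternalEncodeJob:
    source_uri: str
    codec: str
    resolution: tuple

SUPPORTED_CODECS = {"AVC", "AV1", "VP9"}

def validate_and_convert(req: EncodeRequest) -> InternalEncodeJob:
    """Validate an incoming request and map it to the internal data model."""
    if req.codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {req.codec}")
    if req.width <= 0 or req.height <= 0:
        raise ValueError("resolution must be positive")
    return InternalEncodeJob(
        source_uri=req.video_source,
        codec=req.codec,
        resolution=(req.width, req.height),
    )
```

Rejecting bad input at the gateway keeps the internal layers free of defensive checks, which is one benefit of the strong API contract described above.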


Plato

The workflow layer, Plato, governs the media processing steps. The Cosmos platform supports two programming paradigms for Plato: forward chaining rule engine and Directed Acyclic Graph (DAG). VES has a linear workflow, so we chose DAG for its simplicity.

In a DAG, the workflow is represented by nodes and edges. Nodes represent stages in the workflow, while edges signify dependencies: a stage is only ready to execute when all its dependencies have been completed. VES requires parallel encoding of video chunks to meet its latency and resilience goals. This workflow-level parallelism is facilitated by the DAG through a MapReduce mode. Nodes can be annotated to indicate this relationship, and a Reduce node will only be triggered when all its associated Map nodes are ready.

For the VES workflow, we defined five nodes and their associated edges, which are visualized in the following graph:

  • Splitter Node: This node divides the video into chunks based on the chunking directives in the recipe.
  • Encoder Node: This node encodes a video chunk. It is a Map node.
  • Assembler Node: This node stitches the encoded chunks together. It is a Reduce node.
  • Validator Node: This node performs the validation of the encoded video.
  • Notifier Node: This node notifies the API layer once the entire workflow is completed.
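The five nodes above form a simple chain with one Map/Reduce pair in the middle. Purely as an illustration (this is not the Plato API), the dependency structure and its execution order can be sketched like this:

```python
# A toy representation of the five VES nodes and their edges.
# The Encoder is the Map node (one instance per chunk); the Assembler
# is the Reduce node that waits for all Encoder instances.
EDGES = {
    "Splitter":  ["Encoder"],
    "Encoder":   ["Assembler"],
    "Assembler": ["Validator"],
    "Validator": ["Notifier"],
    "Notifier":  [],
}

def topo_order(edges):
    """Return nodes in dependency order: a stage runs only after its parents."""
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for t in edges[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return order
```

Kahn's algorithm above is exactly the "a stage is only ready when all its dependencies have completed" rule: a node enters the ready set only when its in-degree drops to zero.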

In this workflow, nodes such as the Notifier perform very lightweight operations and can be directly executed in the Plato runtime. However, resource-intensive operations need to be delegated to the computing layer (Stratum), or another service. Plato invokes Stratum Functions for tasks such as encoding and assembling, where the nodes (Encoder and Assembler) post messages to the corresponding message queues. The Validator node calls another Cosmos service, the Video Validation Service, to validate the assembled encoded video.

Stratum

The computing layer, Stratum, is where media samples can be accessed. Developers of Cosmos services create Stratum Functions to process the media. They can bring their own media processing tools, which are packaged into the Docker images of the Functions. These Docker images are then published to our internal Docker registry, part of Titus. In production, Titus automatically scales instances based on the depths of job queues.

VES needs to support encoding source videos into a variety of codec formats, including AVC, AV1, and VP9, to name a few. We use different encoder binaries (referred to simply as "encoders") for different codec formats. For AVC, a format that is now 20 years old, the encoder is quite stable. On the other hand, the newest addition to Netflix streaming, AV1, is continuously going through active improvements and experimentations, necessitating more frequent encoder upgrades. To effectively manage this variability, we decided to create multiple Stratum Functions, each dedicated to a single codec format and able to be released independently. This approach ensures that upgrading one encoder will not impact the VES service for other codec formats, maintaining stability and performance across the board.
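One way to picture the per-codec separation is a dispatch table from codec format to a dedicated encoder invocation. The binary names and flags below are illustrative placeholders, not the actual Netflix tooling; the point is only that each codec's command line can evolve independently.

```python
# Illustrative mapping from codec format to an encoder command template.
ENCODER_COMMANDS = {
    "AVC": ["ffmpeg", "-i", "{src}", "-c:v", "libx264", "{dst}"],
    "AV1": ["ffmpeg", "-i", "{src}", "-c:v", "libaom-av1", "{dst}"],
    "VP9": ["ffmpeg", "-i", "{src}", "-c:v", "libvpx-vp9", "{dst}"],
}

def build_command(codec: str, src: str, dst: str) -> list:
    """Build the encoder command for one chunk of the given codec format."""
    if codec not in ENCODER_COMMANDS:
        raise ValueError(f"no Stratum Function registered for {codec}")
    return [arg.format(src=src, dst=dst) for arg in ENCODER_COMMANDS[codec]]
```

Upgrading, say, the AV1 entry touches only one row of the table, mirroring how a single Stratum Function can be released without affecting the others.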

Within the Stratum Function, the Cosmos platform provides abstractions for common media access patterns. Regardless of file formats, sources are uniformly presented as locally mounted frames. Similarly, for output that needs to be persisted in the cloud, the platform presents the process as writing to a local file. All details, such as streaming of bytes and retrying on errors, are abstracted away. With the platform taking care of the complexity of the infrastructure, the essential code for video encoding in the Stratum Function could be as simple as follows.

ffmpeg -i input/source%08d.j2k -vf ... -c:v libx264 ... output/encoding.264

Encoding is a resource-intensive process, and the resources required are closely related to the codec format and the encoding recipe. We conducted benchmarking to understand the resource usage pattern, particularly CPU and RAM, for different encoding recipes. Based on the results, we leveraged the "container shaping" feature from the Cosmos platform.

We defined a number of different "container shapes", specifying the allocations of resources like CPU and RAM.

# an example definition of container shape
group: containerShapeExample1
resources:
  numCpus: 2
  memoryInMB: 4000
  networkInMbps: 750
  diskSizeInMB: 12000

Routing rules are created to assign encoding jobs to different shapes based on the combination of codec format and encoding resolution. This helps the platform perform "bin packing", thereby maximizing resource utilization.
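A routing rule of this kind is essentially a lookup from (codec format, resolution tier) to a container shape. The shape names and tier thresholds below are hypothetical, chosen only to show the mechanism:

```python
# Hypothetical routing table: (codec, resolution tier) -> container shape.
ROUTING_RULES = {
    ("AVC", "SD"):  "shape-2cpu-4gb",
    ("AVC", "HD"):  "shape-4cpu-8gb",
    ("AV1", "HD"):  "shape-8cpu-16gb",
    ("AV1", "UHD"): "shape-16cpu-32gb",
}

def pick_shape(codec: str, height: int) -> str:
    """Route an encoding job to a container shape by codec and frame height."""
    tier = "SD" if height < 720 else "HD" if height < 2160 else "UHD"
    return ROUTING_RULES.get((codec, tier), "shape-default")
```

Because each job declares a known shape up front, the scheduler can pack containers of a few fixed sizes onto an instance instead of reasoning about arbitrary resource requests, which is what makes the bin packing effective.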

An example of "bin packing". The circles represent CPU cores and the area represents the RAM. This 16-core EC2 instance is packed with 5 encoding containers (rectangles) of 3 different shapes (indicated by different colors).

Continuous Release

After we completed the development and testing of all three layers, VES was launched in production. However, this did not mark the end of our work. Quite the opposite, we believed and still do that a significant part of a service's value is realized through iterations: supporting new business needs, enhancing performance, and improving resilience. An important piece of our vision was for Cosmos services to have the ability to continuously release code changes to production in a safe manner.

Focusing on a single functionality, code changes pertaining to a single feature addition in VES are generally small and cohesive, making them easy to review. Since callers can only interact with VES through its API, internal code is truly "implementation details" that are safe to change. The explicit API contract limits the test surface of VES. Additionally, the Cosmos platform provides a pyramid-based testing framework to guide developers in creating tests at different levels.

After testing and code review, changes are merged and are ready for release. The release pipeline is fully automated: after the merge, the pipeline checks out code, compiles, builds, runs unit/integration/end-to-end tests as prescribed, and proceeds to full deployment if no issues are encountered. Typically, it takes around 30 minutes from code merge to feature landing (a process that took 2–4 weeks in our previous generation platform!). The short release cycle provides faster feedback to developers and helps them make necessary updates while the context is still fresh.

Screenshot of a release pipeline run in our production environment

When running in production, the service constantly emits metrics and logs. They are collected by the platform to visualize dashboards and to drive monitoring/alerting systems. Metrics deviating too much from the baseline will trigger alerts and can lead to automatic service rollback (when the "canary" feature is enabled).
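The core of such an alerting rule is a deviation check against a baseline. The tolerance value here is an arbitrary assumption for illustration; real canary analysis compares many metrics statistically rather than with a single threshold.

```python
def deviates(metric: float, baseline: float, tolerance: float = 0.2) -> bool:
    """Flag a metric that drifts more than `tolerance` (as a fraction)
    from its baseline; such a deviation would trigger an alert."""
    if baseline == 0:
        return metric != 0
    return abs(metric - baseline) / baseline > tolerance
```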

Lessons Learned

VES was the very first microservice that our team built. We started with basic knowledge of microservices and learned a multitude of lessons along the way. These learnings deepened our understanding of microservices and have helped us improve our design choices and decisions.

Define a Proper Service Scope

A principle of microservice architecture is that a service should be built for a single functionality. This sounds simple, but what exactly qualifies as a "single functionality"? "Encoding video" sounds good, but wouldn't "encode video into the AVC format" be an even more specific single functionality?

When we started building the VES, we took the approach of creating a separate encoding service for each codec format. While this has advantages such as decoupled workflows, we were quickly overwhelmed by the development overhead. Imagine that a user requested us to add the watermarking capability to the encoding. We needed to make changes to multiple microservices. What is worse, changes in all these services were very similar and essentially we were adding the same code (and tests) again and again. Such repetitive work can easily wear out developers.

The service presented in this blog is our second iteration of VES (yes, we already went through one iteration). In this version, we consolidated encodings for different codec formats into a single service. They share the same API and workflow, while each codec format has its own Stratum Functions. So far this seems to strike a good balance: the common API and workflow reduces code repetition, while separate Stratum Functions guarantee independent evolution of each codec format.

The changes we made are not irreversible. If someday in the future, the encoding of one particular codec format evolves into a totally different workflow, we have the option to spin it off into its own microservice.

Be Pragmatic about Data Modeling

In the beginning, we were very strict about data model separation: we had a strong belief that sharing equates to coupling, and coupling could lead to potential disasters in the future. To avoid this, for each service as well as the three layers within a service, we defined its own data model and built converters to translate between the different data models.

We ended up creating multiple data models for aspects such as bit-depth and resolution across our system. To be fair, this does have some merits. For example, our encoding pipeline supports different bit-depths for AVC encoding (8-bit) and AV1 encoding (10-bit). By defining both AVC.BitDepth and AV1.BitDepth, constraints on the bit-depth can be built into the data models. However, it is debatable whether the benefits of this differentiation power outweigh the downsides, namely multiple data model translations.

Eventually, we created a library to host data models for common concepts in the video domain. Examples of such concepts include frame rate, scan type, color space, etc. As you can see, they are extremely common and stable. This "common" data model library is shared across all services owned by the video team, avoiding unnecessary duplications and data conversions. Within each service, additional data models are defined for service-specific objects.
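A shared library of this kind is mostly small value types. As a sketch (the names are illustrative, not the actual library), two of the common concepts mentioned above might look like:

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative entries in a shared "common video concepts" library.
class ScanType(Enum):
    PROGRESSIVE = "progressive"
    INTERLACED = "interlaced"

@dataclass(frozen=True)
class FrameRate:
    """Frame rate as an exact rational, e.g. 24000/1001 for 23.976 fps."""
    numerator: int
    denominator: int

    def fps(self) -> float:
        return self.numerator / self.denominator
```

Keeping such types immutable and free of service-specific fields is what makes them safe to share: they encode stable domain facts, not any one service's workflow state.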

Embrace Service API Changes

This may sound contradictory. We have been saying that an API is a strong contract between the service and its users, and keeping an API stable shields users from internal changes. This is absolutely true. However, none of us had a crystal ball when we were designing the very first version of the service API. It is inevitable that at a certain point, this API becomes inadequate. If we hold the belief that "the API cannot change" too dearly, developers would be forced to find workarounds, which are almost certainly sub-optimal.

There are many great tech articles about gracefully evolving APIs. We believe we also have a unique advantage: VES is a service internal to Netflix Encoding Technologies (ET). Our two users, the Streaming Workflow Orchestrator and the Studio Workflow Orchestrator, are owned by the workflow team within ET. Our teams share the same contexts and work towards common goals. If we believe updating the API is in the best interest of Netflix, we meet with them to seek alignment. Once a consensus to update the API is reached, teams collaborate to ensure a smooth transition.

This is the second part of our tech blog series Rebuilding Netflix Video Pipeline with Microservices. In this post, we described the building process of the Video Encoding Service (VES) in detail as well as our learnings. Our pipeline includes a few other services that we plan to share about as well. Stay tuned for our future blogs on this topic of microservices!
