The high-level diagram above focuses on storage & distribution, illustrating how we leveraged Kafka to separate the write and read databases. The write database would store internal page content and metadata from our CMS. The read database would store read-optimized page content, for example: CDN image URLs rather than internal asset IDs, and movie titles, synopses, and actor names instead of placeholders. This content ingestion pipeline allowed us to regenerate all consumer-facing content on demand, applying new structure and data, such as global navigation or branding changes. The Tudum Ingestion Service converted internal CMS data into a read-optimized format by applying page templates, running validations, performing data transformations, and producing the individual content elements to a Kafka topic. The Data Service Consumer received the content elements from Kafka, stored them in a high-availability database (Cassandra), and acted as an API layer for the Page Construction service and other internal Tudum services to retrieve content.
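To make the consumer side of this pipeline concrete, here is a minimal sketch in Java using the standard Kafka client and the DataStax Cassandra driver: poll content elements from a topic and upsert them into the read store. The topic name, keyspace, and payload shape are illustrative assumptions, not Tudum's actual schema.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class DataServiceConsumerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "tudum-data-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             CqlSession cassandra = CqlSession.builder().build()) {

            // Hypothetical topic name; the real topic and element schema are internal to Tudum.
            consumer.subscribe(List.of("tudum-content-elements"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Upsert the read-optimized content element, keyed by element id.
                    cassandra.execute(
                        "INSERT INTO tudum.content_elements (element_id, payload) VALUES (?, ?)",
                        record.key(), record.value());
                }
                consumer.commitSync();
            }
        }
    }
}
```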
A key advantage of decoupling read and write paths is the ability to scale them independently. Connecting the write and read databases with an event-driven architecture is a well-known approach. As a result, content edits would eventually appear on tudum.com.
Did you notice the emphasis on "eventually"? A major downside of this architecture was the delay between making an edit and seeing that edit reflected on the website. For instance, when the team publishes an update, the following steps must happen (a simplified sketch follows the list):
- Call the REST endpoint on the third-party CMS to save the data.
- Wait for the CMS to notify the Tudum Ingestion layer via a webhook.
- Wait for the Tudum Ingestion layer to query all necessary sections via API, validate data and assets, process the page, and produce the modified content to Kafka.
- Wait for the Data Service Consumer to consume this message from Kafka and store it in the database.
- Finally, after some cache refresh delay, this data would eventually become available to the Page Construction service. Great!
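A rough sketch of steps two and three, assuming a hypothetical /cms-webhook endpoint, query parameter, and topic name; the real ingestion layer's validations, asset checks, and page templates are reduced to a placeholder method here.

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.net.InetSocketAddress;
import java.util.Properties;

public class TudumIngestionWebhookSketch {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // Step 2: the CMS calls this webhook after an editor saves a change.
        server.createContext("/cms-webhook", exchange -> {
            String pageId = exchange.getRequestURI().getQuery(); // e.g. "pageId=123" (illustrative)

            // Step 3: re-query the page sections from the CMS API, validate, and transform
            // into the read-optimized shape (placeholder, not the real pipeline).
            String readOptimizedPage = buildReadOptimizedPage(pageId);

            // Produce the modified content element for the Data Service Consumer.
            producer.send(new ProducerRecord<>("tudum-content-elements", pageId, readOptimizedPage));

            exchange.sendResponseHeaders(204, -1);
            exchange.close();
        });
        server.start();
    }

    private static String buildReadOptimizedPage(String pageId) {
        // Placeholder for querying sections, validating assets, and applying page templates.
        return "{\"pageId\":\"" + pageId + "\"}";
    }
}
```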
By introducing a highly scalable, eventually consistent architecture, we gave up the ability to quickly render changes right after writing them, an important capability for internal previews.
In our performance profiling, we found the source of delay was our Page Data Service, which acted as a facade for an underlying Key-Value Data Abstraction database. Page Data Service used a near cache to accelerate page building and reduce read latencies from the database.
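A stripped-down illustration of that facade pattern, with hypothetical names rather than the actual Page Data Service code: reads for a page's many element keys are answered from an in-process map instead of one remote call per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative facade over a key-value store: page construction needs many keys
 * per page (an N+1 lookup pattern), so reads are served from an in-process
 * near cache rather than a remote call per key.
 */
public class PageDataServiceSketch {

    /** Stand-in for the Key-Value Data Abstraction layer (a remote, durable store). */
    public interface KeyValueStore {
        String get(String key);
    }

    private final KeyValueStore backingStore;
    private final Map<String, String> nearCache = new ConcurrentHashMap<>();

    public PageDataServiceSketch(KeyValueStore backingStore) {
        this.backingStore = backingStore;
    }

    /** One logical page fetch resolves every element key from local memory when possible. */
    public List<String> getPageElements(List<String> elementKeys) {
        List<String> elements = new ArrayList<>();
        for (String key : elementKeys) {
            elements.add(nearCache.computeIfAbsent(key, backingStore::get));
        }
        return elements;
    }
}
```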
This cache was implemented to optimize the N+1 key lookups required for page construction by keeping the full data set in memory. When engineers hear "slow reads," the quick answer is often "cache," which is exactly the route our team took. The KVDAL near cache refreshes in the background on each app node. Regardless of which system modifies the data, the cache is updated with every refresh cycle. If you have 60 keys and a refresh interval of 60 seconds, the near cache updates one key per second. This was problematic for previewing recent changes, as those changes were only reflected after a cache refresh. As Tudum's content grew, cache refresh times increased, further extending the delay.
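The staleness problem follows directly from that refresh arithmetic. Below is a self-contained sketch of such a background refresh loop using the 60-key, 60-second numbers from above: each key is re-read roughly once per cycle, so an edit stays invisible until its key's turn comes around.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/**
 * Illustrative background refresh loop for a near cache.
 * With 60 keys and a 60-second refresh interval, one key is re-read per second,
 * so a write may sit unseen for up to a full cycle before its key is refreshed.
 */
public class NearCacheRefreshLoop {

    private final Map<String, String> nearCache = new ConcurrentHashMap<>();
    private final List<String> keys;                 // the full key set held in memory
    private final Function<String, String> loader;   // remote read from the backing store
    private int nextKey = 0;

    public NearCacheRefreshLoop(List<String> keys, Function<String, String> loader) {
        this.keys = keys;
        this.loader = loader;
    }

    public void start(long refreshIntervalSeconds) {
        // 60 keys over a 60-second cycle => 1000 ms between individual key refreshes.
        long perKeyDelayMillis = (refreshIntervalSeconds * 1000) / keys.size();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            String key = keys.get(nextKey);
            nearCache.put(key, loader.apply(key));   // stale until this key's turn comes around
            nextKey = (nextKey + 1) % keys.size();
        }, 0, perKeyDelayMillis, TimeUnit.MILLISECONDS);
    }

    public String get(String key) {
        return nearCache.get(key);                   // reads never block on the backing store
    }
}
```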
As this pain point grew, a new technology was being developed internally that would act as our silver bullet. RAW Hollow is an innovative in-memory, co-located, compressed object database developed by Netflix, designed to handle small to medium datasets with support for strong read-after-write consistency. It addresses the challenges of achieving consistent performance with low latency and high availability in applications that deal with less frequently changing datasets. Unlike traditional SQL databases or fully in-memory solutions, RAW Hollow offers a novel approach in which the entire dataset is distributed across the application cluster and resides in the memory of each application process.
This design leverages compression techniques to scale datasets up to 100 million records per entity, ensuring extremely low latencies and high availability. RAW Hollow provides eventual consistency by default, with the option of strong consistency at the individual request level, allowing users to balance high availability against data consistency. It simplifies the development of highly available and scalable stateful applications by eliminating the complexities of cache synchronization and external dependencies. This makes RAW Hollow a powerful solution for efficiently managing datasets in environments like Netflix's streaming services, where high performance and reliability are paramount.
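The per-request consistency choice is the property that matters for previews. The interface below is purely a conceptual illustration of that trade-off, not RAW Hollow's actual API:

```java
/**
 * Conceptual sketch only (not the RAW Hollow API): reads default to the local,
 * eventually consistent in-memory copy, while a caller can opt into
 * read-after-write consistency for an individual request when it matters,
 * for example an editor previewing an edit they just saved.
 */
public interface CoLocatedDatasetSketch<K, V> {

    /** Default: answer from the local in-memory replica (eventually consistent, O(1)). */
    V get(K key);

    /** Opt-in per request: wait until the local replica reflects the caller's latest write. */
    V getStronglyConsistent(K key);
}
```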
Tudum was a perfect fit to battle-test RAW Hollow while it was pre-GA internally. Hollow's high-density near cache significantly reduces I/O. Having our primary dataset in memory allows Tudum's various microservices (page construction, search, personalization) to access data synchronously in O(1) time, simplifying architecture, reducing code complexity, and increasing fault tolerance.
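Contrast this with the earlier near-cache facade: with the whole dataset co-located in every process, page construction becomes plain in-memory lookups with no remote read path at all, as in this hypothetical sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Illustrative page construction over a co-located, in-memory dataset:
 * every element lookup is a local map read, so assembling a page is a
 * synchronous O(1)-per-key operation with no remote round trips.
 */
public class PageConstructionSketch {

    private final Map<String, String> inMemoryDataset; // full dataset resident in this process

    public PageConstructionSketch(Map<String, String> inMemoryDataset) {
        this.inMemoryDataset = inMemoryDataset;
    }

    public List<String> buildPage(List<String> elementKeys) {
        return elementKeys.stream()
                .map(inMemoryDataset::get)   // local memory access, no I/O on the read path
                .collect(Collectors.toList());
    }
}
```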