Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix | by Netflix Technology Blog | Jun, 2025



By Alex Hutter, Alexandre Bertails, Claire Wang, Haoyuan He, Kishore Banala, Peter Royal, Shervin Afshar

As Netflix’s offerings grow across films, series, games, live events, and ads, so does the complexity of the systems that support them. Core business concepts like ‘actor’ or ‘movie’ are modeled in many places: in our Enterprise GraphQL Gateway powering internal apps, in our asset management platform storing media assets, and in our media computing platform that powers encoding pipelines, to name a few. Each system models these concepts differently and in isolation, with little coordination or shared understanding. While they often operate on the same concepts, these systems remain largely unaware of that fact, and of each other.

Spider-Man Pointing meme with each Spider-Man labelled as: “it’s a movie”, “it’s a tv show”, “it’s a game”.

As a result, several challenges emerge:

  • Duplicated and Inconsistent Models: Teams re-model the same business entities in different systems, leading to conflicting definitions that are hard to reconcile.
  • Inconsistent Terminology: Even within a single system, teams may use different terms for the same concept, or the same term for different concepts, making collaboration harder.
  • Data Quality Issues: Discrepancies and broken references are hard to detect across our many microservices. While identifiers and foreign keys exist, they are inconsistently modeled and poorly documented, requiring manual work from domain experts to find and fix any data issues.
  • Limited Connectivity: Within systems, relationships between data are constrained by what each system supports. Across systems, they are effectively non-existent.

To address these challenges, we need new foundations that allow us to define a model once, at the conceptual level, and reuse those definitions everywhere. But it isn’t enough to just document concepts; we need to connect them to real systems and data. And more than just connect, we have to project those definitions outward, generating schemas and enforcing consistency across systems. The conceptual model must become part of the control plane.

These were the core ideas that led us to build UDA.

UDA (Unified Data Architecture) is the foundation for connected data in Content Engineering. It enables teams to model domains once and represent them consistently across systems, powering automation, discoverability, and semantic interoperability.

Using UDA, users and systems can:

Register and connect domain models: formal conceptualizations of federated business domains expressed as data.

  • Why? So everyone uses the same official definitions for business concepts, which avoids confusion and stops different teams from rebuilding similar models in conflicting ways.

Catalog and map domain models to data containers, such as GraphQL type resolvers served by a Domain Graph Service, Data Mesh sources, or Iceberg tables, through their representation as a graph.

  • Why? To make it easy to find where the actual data for these business concepts lives (e.g., in which specific database, table, or service) and understand how it’s structured there.

Transpile domain models into schema definition languages like GraphQL, Avro, SQL, RDF, and Java, while preserving semantics.

  • Why? To automatically create consistent technical data structures (schemas) for various systems directly from the domain models, saving developers manual effort and reducing errors caused by out-of-sync definitions.

Move data faithfully between data containers, such as from federated GraphQL entities to Data Mesh (a general-purpose data movement and processing platform for moving data between Netflix systems at scale), or from Change Data Capture (CDC) sources to joinable Iceberg Data Products.

  • Why? To save developer time by automatically handling how data is moved and correctly transformed between different systems. This means less manual work to configure data movement, ensuring data shows up consistently and accurately wherever it’s needed.

Discover and explore domain concepts via search and graph traversal.

  • Why? So anyone can more easily find the specific business information they’re looking for, understand how different concepts and data are related, and be confident they are accessing the correct information.

Programmatically introspect the knowledge graph using Java, GraphQL, or SPARQL.

  • Why? So developers can build smarter applications that leverage this connected business information, automate more complex data-dependent workflows, and help uncover new insights from the relationships in the data.
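
To make the "transpile" capability above more concrete, here is a minimal sketch of projecting one conceptual model into two schema languages. It is not UDA's transpiler: the `Entity` and `Attribute` classes, the datatype tables, and the `Character` fields are all assumptions invented for illustration.

```python
import json
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for a domain model (not Upper's structure).
@dataclass
class Attribute:
    name: str
    datatype: str          # conceptual datatype: "string" or "integer"
    required: bool = True

@dataclass
class Entity:
    name: str
    key: str               # name of the attribute that identifies the entity
    attributes: list = field(default_factory=list)

# One lookup table per target language keeps the type mapping in a single place.
GRAPHQL_TYPES = {"string": "String", "integer": "Int"}
AVRO_TYPES = {"string": "string", "integer": "long"}

def to_graphql(entity: Entity) -> str:
    """Render the entity as a GraphQL SDL type; the key attribute becomes ID."""
    lines = [f"type {entity.name} {{"]
    for attr in entity.attributes:
        gql = "ID" if attr.name == entity.key else GRAPHQL_TYPES[attr.datatype]
        suffix = "!" if attr.required else ""
        lines.append(f"  {attr.name}: {gql}{suffix}")
    lines.append("}")
    return "\n".join(lines)

def to_avro(entity: Entity) -> dict:
    """Render the entity as an Avro record; optional attributes become unions with null."""
    fields = []
    for attr in entity.attributes:
        avro = AVRO_TYPES[attr.datatype]
        fields.append({"name": attr.name,
                       "type": avro if attr.required else ["null", avro]})
    return {"type": "record", "name": entity.name, "fields": fields}

# Illustrative entity; the attribute names are made up for this sketch.
character = Entity("Character", key="id", attributes=[
    Attribute("id", "string"),
    Attribute("name", "string"),
    Attribute("bounty", "integer", required=False),
])

print(to_graphql(character))
print(json.dumps(to_avro(character), indent=2))
```

Because both outputs are derived from the same `Entity`, changing an attribute in one place changes every generated schema, which is the property the "Why?" above is after.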

This post introduces the foundations of UDA as a knowledge graph, connecting domain models to data containers through mappings, and grounded in an in-house metamodel, or model of models, called Upper. Upper defines the language for domain modeling in UDA and enables projections that automatically generate schemas and pipelines across systems.

Image of the UDA knowledge graph. A central node representing a domain model is connected to other nodes representing Data Mesh, GraphQL, and Iceberg data containers.
The same domain model can be connected to semantically equivalent data containers in the UDA knowledge graph.

This post also highlights two systems that leverage UDA in production:

Primary Data Management (PDM) is our platform for managing authoritative reference data and taxonomies. PDM turns domain models into flat or hierarchical taxonomies that drive a generated UI for business users. These taxonomy models are projected into Avro and GraphQL schemas, automatically provisioning data products in the Warehouse and GraphQL APIs in the Enterprise Gateway.

Sphere is our self-service operational reporting tool for business users. Sphere uses UDA to catalog and relate business concepts across systems, enabling discovery through familiar terms like ‘actor’ or ‘movie.’ Once concepts are selected, Sphere walks the knowledge graph and generates SQL queries to retrieve data from the warehouse, with no manual joins or technical mediation required.
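
The walk-then-generate flow described for Sphere can be illustrated with a small sketch. This is not Sphere's implementation: the table names, join columns, and the breadth-first search are assumptions chosen only to show how a path between two selected concepts could be turned into SQL joins.

```python
from collections import deque

# Hypothetical mappings from business concepts to warehouse tables.
TABLES = {"Actor": "dim_actor", "Role": "fact_role", "Movie": "dim_movie"}

# Hypothetical relationships: (concept_a, concept_b) -> (column_a, column_b).
RELATIONS = {
    ("Actor", "Role"): ("actor_id", "actor_id"),
    ("Role", "Movie"): ("movie_id", "movie_id"),
}

def neighbors(concept):
    """Yield (other concept, column on this side, column on the other side)."""
    for (a, b), (col_a, col_b) in RELATIONS.items():
        if a == concept:
            yield b, col_a, col_b
        elif b == concept:
            yield a, col_b, col_a

def find_path(start, goal):
    """Breadth-first search returning a list of (src, dst, src_col, dst_col) hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        concept, path = queue.popleft()
        if concept == goal:
            return path
        for nxt, col_here, col_next in neighbors(concept):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(concept, nxt, col_here, col_next)]))
    raise ValueError(f"no path from {start} to {goal}")

def generate_sql(start, goal):
    """Turn the concept path into a chain of JOIN clauses."""
    lines = ["SELECT *", f"FROM {TABLES[start]}"]
    for src, dst, src_col, dst_col in find_path(start, goal):
        lines.append(
            f"JOIN {TABLES[dst]} ON {TABLES[src]}.{src_col} = {TABLES[dst]}.{dst_col}"
        )
    return "\n".join(lines)

print(generate_sql("Actor", "Movie"))
```

The user only names the two concepts; the intermediate `Role` join is discovered from the graph rather than written by hand.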

UDA is a Knowledge Graph

UDA needs to solve the data integration problem. We needed a data catalog unified with a schema registry, but with a hard requirement for semantic integration. Connecting business concepts to schemas and data containers in a graph-like structure, grounded in strong semantic foundations, naturally led us to consider a knowledge graph approach.

We chose RDF and SHACL as the foundation for UDA’s knowledge graph. But operationalizing them at enterprise scale surfaced several challenges:

  • RDF lacked a usable information model. While RDF offers a flexible graph structure, it provides little guidance on how to organize data into named graphs, manage ontology ownership, or define governance boundaries. Standard follow-your-nose mechanisms like owl:imports apply only to ontologies and don’t extend to named graphs; we needed a generalized mechanism to express and resolve dependencies between them.
  • SHACL is not a modeling language for enterprise data. Designed to validate native RDF, SHACL assumes globally unique URIs and a single data graph. But enterprise data is structured around local schemas and typed keys, as in GraphQL, Avro, or SQL. SHACL could not express these patterns, making it difficult to model and validate real-world data across heterogeneous systems.
  • Teams lacked shared authoring practices. Without strong guidelines, teams modeled their ontologies inconsistently, breaking semantic interoperability. Even subtle differences in style, structure, or naming led to divergent interpretations and made transpilation harder to define consistently across schemas.
  • Ontology tooling lacked support for collaborative modeling. Unlike GraphQL Federation, ontology frameworks had no built-in support for modular contributions, team ownership, or safe federation. Most engineers found the tools and concepts unfamiliar, and available authoring environments lacked the structure needed for coordinated contributions.

To address these challenges, UDA adopts a named-graph-first information model. Each named graph conforms to a governing model, itself a named graph in the knowledge graph. This systematic approach ensures resolution and modularity, and enables governance across the entire graph. While a full description of UDA’s information infrastructure is beyond the scope of this post, the next sections explain how UDA bootstraps the knowledge graph with its metamodel and uses it to model data container representations and mappings.
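
Since the post does not detail the information model, the following is only a rough sketch of the idea that every named graph points at a governing model that is itself a named graph, and that dependencies between graphs can be resolved mechanically. The registry layout, the `conforms_to` and `imports` keys, and the `urn:example:` identifiers are all hypothetical.

```python
# Hypothetical registry: named graph IRI -> its governing model and its imports.
REGISTRY = {
    # The metamodel governs itself, which is what ends the resolution chain.
    "urn:example:upper": {"conforms_to": "urn:example:upper", "imports": []},
    "urn:example:mappings": {"conforms_to": "urn:example:upper", "imports": []},
    "urn:example:avro": {"conforms_to": "urn:example:upper", "imports": []},
    "urn:example:onepiece": {"conforms_to": "urn:example:upper", "imports": []},
    "urn:example:onepiece-mappings": {
        "conforms_to": "urn:example:mappings",
        "imports": ["urn:example:onepiece", "urn:example:avro"],
    },
}

def resolve(iri, registry):
    """Return every named graph needed to interpret `iri`, in discovery order.

    Follows both the imports and the governing model of each graph, and
    raises if any reference cannot be resolved.
    """
    resolved, stack = [], [iri]
    while stack:
        current = stack.pop()
        if current in resolved:
            continue  # already visited; this also handles self-governing graphs
        if current not in registry:
            raise KeyError(f"unresolved named graph: {current}")
        resolved.append(current)
        entry = registry[current]
        stack.extend(entry["imports"])
        stack.append(entry["conforms_to"])
    return resolved

for graph in resolve("urn:example:onepiece-mappings", REGISTRY):
    print(graph)
```

In this sketch a dangling reference fails loudly instead of being silently ignored, which is one way to read "ensures resolution" above.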

Upper is Domain Modeling

Upper is a language for formally describing domains (business or system) and their concepts. These concepts are organized into domain models: controlled vocabularies that define classes of keyed entities, their attributes, and their relationships to other entities, which may be keyed or nested, within the same domain or across domains. Keyed concepts within a domain model can be organized in taxonomies of types, which can be as complex as the business or the data system needs them to be. Keyed concepts can also be extended from other domain models; that is, new attributes and relationships can be contributed monotonically. Finally, Upper ships with a rich set of datatypes for attribute values, which can also be customized per domain.

Visualization of the UDA graph representation of a One Piece character. The Character node in the graph is connected to a Devil Fruit node. The Devil Fruit node is connected to a Devil Fruit Type node.
The graph representation of the onepiece: domain model from our UI. Depicted here you can see how Characters are related to Devil Fruit, and that each Devil Fruit has a type.

Upper domain models are data. They are expressed as conceptual RDF and organized into named graphs, making them introspectable, queryable, and versionable within the UDA knowledge graph. This graph unifies not just the domain models themselves, but also the schemas they transpile to (GraphQL, Avro, Iceberg, Java) and the mappings that connect domain concepts to concrete data containers, such as GraphQL type resolvers served by a Domain Graph Service, Data Mesh sources, or Iceberg tables, through their representations. Upper raises the level of abstraction above traditional ontology languages: it defines a strict subset of semantic technologies from the W3C, tailored and generalized for domain modeling. It builds on ontology frameworks like RDFS, OWL, and SHACL so domain authors can model effectively without even needing to learn what an ontology is.
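
As a rough illustration of what "domain models are data" means in practice, the sketch below stores a tiny model as subject-predicate-object triples and queries it like any other data. The `ex:` predicates and the `onepiece:devilFruit` and `onepiece:type` relationship names are placeholders invented here; they are not Upper's vocabulary or the published onepiece: model.

```python
# Each triple is (subject, predicate, object). Only the three concept names
# (Character, Devil Fruit, Devil Fruit Type) come from the figure above;
# everything else is a hypothetical placeholder.
TRIPLES = [
    ("onepiece:Character", "rdf:type", "ex:KeyedConcept"),
    ("onepiece:DevilFruit", "rdf:type", "ex:KeyedConcept"),
    ("onepiece:DevilFruitType", "rdf:type", "ex:KeyedConcept"),
    ("onepiece:devilFruit", "ex:domain", "onepiece:Character"),
    ("onepiece:devilFruit", "ex:range", "onepiece:DevilFruit"),
    ("onepiece:type", "ex:domain", "onepiece:DevilFruit"),
    ("onepiece:type", "ex:range", "onepiece:DevilFruitType"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def relationships_from(concept):
    """Return (relationship, target concept) pairs whose domain is `concept`."""
    result = []
    for rel, _, _ in match(TRIPLES, p="ex:domain", o=concept):
        for _, _, target in match(TRIPLES, s=rel, p="ex:range"):
            result.append((rel, target))
    return result

print(relationships_from("onepiece:Character"))
print(relationships_from("onepiece:DevilFruit"))
```

The same pattern-matching idea is what a SPARQL query expresses declaratively; the point of the sketch is only that the model itself can be introspected with ordinary queries.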

Screenshot of UDA UI showing domain model for One Piece serialized as Turtle.
UDA domain model for One Piece. Link to full definition.

Upper is the metamodel for Connected Data in UDA: the model for all models. It is designed as a bootstrapping upper ontology, which means that Upper is self-referencing, because it models itself as a domain model; self-describing, because it defines the very concept of a domain model; and self-validating, because it conforms to its own model. This approach enables UDA to bootstrap its own infrastructure: Upper itself is projected into a generated Jena-based Java API and a GraphQL schema used in a GraphQL service federated into Netflix’s Enterprise GraphQL gateway. These same generated APIs are then used by the projections and the UI. Because all domain models are conservative extensions of Upper, other system domain models (including those for GraphQL, Avro, Data Mesh, and Mappings) integrate seamlessly into the same runtime, enabling consistent data semantics and interoperability across schemas.

Screenshot of an IDE. It shows Java code using the generated API from the Upper metamodel to traverse and print terms from a domain model in the top while the bottom contains the output of an execution.
Traversing a domain model programmatically using the Java API generated from the Upper metamodel.

Data Container Representations

Data containers are repositories of information. They contain instance data that conform to their own schema languages or type systems: federated entities from GraphQL services, Avro records from Data Mesh sources, rows from Iceberg tables, or objects from Java APIs. Each container operates within the context of a system that imposes its own structural and operational constraints.

Screenshot of a UI showing details for a Data Mesh Source containing One Piece Characters.
A Data Mesh source is a data container.

Data container representations are data. They are faithful interpretations of the members of data systems as graph data. UDA captures the definition of these systems as their own domain models, the system domains. These models encode both the information architecture of the systems and the schemas of the data containers within. They provide a blueprint for translating the systems into graph representations.
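
One way to picture a container representation is to take a container's native schema and restate it as graph data. The sketch below does this for a small Avro record schema; the `ex:` predicates, the container identifier, and the field names are assumptions for illustration and do not reflect UDA's actual system domain for Avro.

```python
import json

# Illustrative Avro schema for a hypothetical Data Mesh source.
AVRO_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Character",
  "namespace": "onepiece",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "bounty", "type": ["null", "long"]}
  ]
}
""")

def represent(schema, container_iri):
    """Restate an Avro record schema as (subject, predicate, object) triples."""
    record = f"{container_iri}/{schema['name']}"
    triples = [
        (container_iri, "ex:hasSchema", record),
        (record, "rdf:type", "ex:AvroRecord"),
    ]
    for avro_field in schema["fields"]:
        field_iri = f"{record}#{avro_field['name']}"
        field_type = avro_field["type"]
        # In Avro, an optional field is a union that includes "null".
        nullable = isinstance(field_type, list) and "null" in field_type
        if isinstance(field_type, list):
            field_type = next(t for t in field_type if t != "null")
        triples.append((record, "ex:hasField", field_iri))
        triples.append((field_iri, "ex:avroType", field_type))
        triples.append((field_iri, "ex:nullable", nullable))
    return triples

for triple in represent(AVRO_SCHEMA, "urn:example:datamesh:characters"):
    print(triple)
```

Once the schema is expressed as triples, it can sit in the same graph as the domain model, which is what makes a mapping between a domain concept and a concrete field expressible as just another edge.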
