How Netflix Accurately Attributes eBPF Flow Logs | by Netflix Technology Blog | Apr, 2025


By Cheng Xie, Bryan Shultz, and Christine Xu

In a previous blog post, we described how Netflix uses eBPF to capture TCP flow logs at scale for enhanced network insights. In this post, we delve deeper into how Netflix solved a core problem: accurately attributing flow IP addresses to workload identities.

FlowExporter is a sidecar that runs alongside all Netflix workloads. It uses eBPF and TCP tracepoints to monitor TCP socket state changes. When a TCP socket closes, FlowExporter generates a flow log record that includes the IP addresses, ports, timestamps, and additional socket statistics. On average, 5 million records are produced per second.

In cloud environments, IP addresses are reassigned to different workloads as workload instances are created and terminated, so IP addresses alone cannot provide insights into which workloads are communicating. To make the flow logs useful, each IP address must be attributed to its corresponding workload identity. FlowCollector, a backend service, collects flow logs from FlowExporter instances across the fleet, attributes the IP addresses, and sends these attributed flows to Netflix’s Data Mesh for subsequent stream and batch processing.

The eBPF flow logs provide a comprehensive view of service topology and network health across Netflix’s extensive microservices fleet, regardless of the programming language, RPC mechanism, or application-layer protocol used by individual workloads.

Accurately attributing flow IP addresses to workload identities has been a significant challenge since our eBPF flow logs were introduced.

As noted in our previous blog post, our initial attribution approach relied on Sonar, an internal IP address tracking service that emits an event whenever an IP address in Netflix’s AWS VPCs is assigned or unassigned to a workload. FlowCollector consumes a stream of IP address change events from Sonar and uses this information to attribute flow IP addresses in real time.

The fundamental drawback of this method is that it can lead to misattribution. Delays and failures are inevitable in distributed systems, which may delay IP address change events from reaching FlowCollector. For instance, an IP address may initially be assigned to workload X but later reassigned to workload Y. However, if the change event for this reassignment is delayed, FlowCollector will continue to assume that the IP address belongs to workload X, resulting in misattributed flows. Additionally, event timestamps may be inaccurate depending on how they are captured.

Misattribution rendered the flow data unreliable for decision-making. Users often depend on flow logs to validate workload dependencies, but misattribution creates confusion. Without expert knowledge of expected dependencies, users would struggle to identify or confirm misattribution. Moreover, misattribution occurred frequently for critical services with a large footprint due to frequent IP address changes. Overall, misattribution makes fleet-wide dependency analysis impractical.

As a workaround, we made FlowCollector hold received flows for 15 minutes before attribution, allowing time for delayed IP address change events. While this approach reduced misattribution, it did not eliminate it. Moreover, the waiting period made the data less fresh, reducing its utility for real-time analysis.

Fully eliminating misattribution is crucial because it only takes a single misattributed flow to produce an incorrect workload dependency. Solving this problem required a complete rethinking of our approach. Over the past year, Netflix developed a new attribution method that has finally eliminated misattribution, as detailed in the rest of this post.

Each socket has two IP addresses: a local IP address and a remote IP address. Previously, we used the same method to attribute both. However, attributing the local IP address should be a simpler task since the local IP address belongs to the instance where FlowExporter captures the socket. Therefore, FlowExporter should determine the local workload identity from its environment and attribute the local IP address before sending the flow to FlowCollector.

This is straightforward for workloads running directly on EC2 instances, as Netflix’s Metatron provisions workload identity certificates to each EC2 instance at boot time. FlowExporter can simply read these certificates from the local disk to determine the local workload identity.

Attributing local IP addresses for container workloads running on Netflix’s container platform, Titus, is more challenging. FlowExporter runs at the container host level, where each host manages multiple container workloads with different identities. When FlowExporter’s eBPF programs receive a socket event from TCP tracepoints in the kernel, the socket may have been created by one of the container workloads or by the host itself. Therefore, FlowExporter must determine which workload to attribute the socket’s local IP address to. To solve this problem, we leveraged IPMan, Netflix’s container IP address assignment service. IPManAgent, a daemon running on every container host, is responsible for assigning and unassigning IP addresses. As container workloads are launched, IPManAgent writes an IP-address-to-workload-ID mapping to an eBPF map, which FlowExporter’s eBPF programs can then use to look up the workload ID associated with a socket’s local IP address.

Another challenge was to accommodate Netflix’s IPv6 to IPv4 translation mechanism on Titus. To facilitate IPv6 migration, Netflix developed a mechanism that enables IPv6-only containers to communicate with IPv4 destinations without incurring NAT64 overhead. This mechanism intercepts connect syscalls and replaces the underlying socket with one that uses a shared IPv4 address assigned to the container host. This confuses FlowExporter because the kernel reports the same local IPv4 address for sockets created by different container workloads. To disambiguate, local port information is additionally required. We modified Titus to write a mapping of (local IPv4 address, local port) to the workload ID into an eBPF map whenever a connect syscall is intercepted. FlowExporter’s eBPF programs then use this map to correctly attribute sockets created by the translation mechanism.

With these problems solved, we can now accurately attribute the local IP address of every flow.

Once the local IP address attribution problem is solved, accurately attributing remote IP addresses becomes feasible. Now, each flow reported by FlowExporter includes the local IP address, the local workload identity, and connection start/end timestamps. As FlowCollector receives these flows, it can learn the time ranges during which each workload owns a given IP address. For instance, if FlowCollector sees a flow with local IP address 10.0.0.1 associated with workload X that starts at t1 and ends at t2, it can deduce that 10.0.0.1 belonged to workload X from t1 to t2. Since Netflix uses Amazon Time Sync across its fleet, the timestamps (captured by FlowExporter) are reliable.

The FlowCollector service cluster consists of many nodes. Every node must be capable of attributing arbitrary remote IP addresses and, therefore, requires knowledge of all workload IP addresses and their recent ownership records. To represent this knowledge, each node maintains an in-memory hashmap that maps an IP address to a list of time ranges, as illustrated by the following Go structs:

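    import (
        "net/netip"
        "time"
    )

    // IPAddressTracker maps each IP address to its recent ownership records.
    type IPAddressTracker struct {
        ipToTimeRanges map[netip.Addr]timeRanges
    }

    // timeRanges holds the ownership records for a single IP address.
    type timeRanges []timeRange

    // timeRange records that workloadID owned the address from start to end.
    type timeRange struct {
        workloadID string
        start      time.Time
        end        time.Time
    }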

To populate the hashmap, FlowCollector extracts the local IP address, local workload identity, start time, and end time from each received flow and creates/extends the corresponding time ranges in the map. The time ranges for each IP address are sorted in ascending order, and they are non-overlapping since an IP address cannot belong to two different workloads simultaneously.

Since each flow is only sent to one FlowCollector node, each node must share the time ranges it learned from received flows with other nodes. We implemented a broadcasting mechanism using Kafka, where each node publishes learned time ranges to all other nodes. Although more efficient broadcasting implementations exist, the Kafka-based approach is simple and has worked well for us.

Now, FlowCollector can attribute remote IP addresses by looking them up in the populated map, which returns a list of time ranges. It then uses the flow’s start timestamp to determine the corresponding time range and associated workload identity. If the start time does not fall within any time range, FlowCollector will retry after a delay, eventually giving up if the retry fails. Such failures may occur when flows are lost or broadcast messages are delayed. For our use cases, it is acceptable to leave a small percentage of flows unattributed, but any misattribution is unacceptable.

This new method achieves accurate attribution thanks to the continuous heartbeats, each associated with a reliable time range of IP address ownership. It handles transient issues gracefully: a few delayed or lost heartbeats do not lead to misattribution. In contrast, the previous method relied solely on discrete IP address assignment and unassignment events. Lacking heartbeats, it had to presume an IP address remained assigned until notified otherwise (which can be hours or days later), making it vulnerable to misattribution when the notifications were delayed.

One detail is that when FlowCollector receives a flow, it cannot attribute its remote IP address right away because it requires the latest observed time ranges for the remote IP address. Since FlowExporter reports flows in batches every minute, FlowCollector must wait until it receives the flow batch from the remote workload’s FlowExporter for the last minute, which may not have arrived yet. To address this, FlowCollector temporarily stores received flows on disk for one minute before attributing their remote IP addresses. This introduces a 1-minute delay, but it is much shorter than the 15-minute delay with the previous approach.

In addition to producing accurate attribution, the new method is also cost-effective thanks to its simplicity and in-memory lookups. Because the in-memory state can be quickly rebuilt when a FlowCollector node starts up, no persistent storage is required. With 30 c7i.2xlarge instances, we can process 5 million flows per second for the entire Netflix fleet.

For simplicity, we have so far glossed over one topic: regionalization. Netflix’s cloud microservices operate across multiple AWS regions. To optimize flow reporting and minimize cross-regional traffic, a FlowCollector cluster runs in each major region, and FlowExporter agents send flows to their corresponding regional FlowCollector. When FlowCollector receives a flow, its local IP address is guaranteed to be within the region.

To minimize cross-region traffic, the broadcasting mechanism is limited to FlowCollector nodes within the same region. Consequently, the IP address time ranges map contains only IP addresses from that region. However, cross-regional flows have a remote IP address in a different region. To attribute these flows, the receiving FlowCollector node forwards them to nodes in the corresponding region. FlowCollector determines the region for a remote IP address by looking up a trie built from all Netflix VPC CIDRs. This approach is more efficient than broadcasting IP address time range updates across all regions, as only 1% of Netflix flows are cross-regional.

So far, FlowCollector can accurately attribute IP addresses belonging to Netflix’s cloud workloads. However, not all flow IP addresses fall into this category. For instance, a significant portion of flows goes through AWS ELBs. For these flows, their remote IP addresses are associated with the ELBs, where we cannot run FlowExporter. Consequently, FlowCollector cannot determine their identities by simply observing the received flows. To attribute these remote IP addresses, we continue to use IP address change events from Sonar, which crawls AWS resources to detect changes in IP address assignments. Although this data stream may contain inaccurate timestamps and be delayed, misattribution is not a main concern since ELB IP address reassignment occurs very infrequently.

Verifying that the new method has eliminated misattribution is challenging due to the lack of a definitive source of truth for workload dependencies to validate flow logs against; the flow logs themselves are intended to serve as this source of truth, after all. To build confidence, we analyzed the flow logs of a large service with well-understood dependencies. A large footprint is necessary, as misattribution is more prevalent in services with numerous instances, and there must be a reliable method to determine the dependencies for this service without relying on flow logs.

Netflix’s cloud gateway, Zuul, served this purpose perfectly due to its extensive footprint (handling all cloud ingress traffic), its large number of downstream dependencies, and our ability to derive its dependencies from its routing configurations as the source of truth for comparison with flow logs. We found no misattribution for flows through Zuul over a two-week window. This provided strong confidence that the new attribution method has eliminated misattribution. In the previous approach, approximately 40% of Zuul’s dependencies reported by the flow logs were misattributed.

With misattribution solved, eBPF flow logs now deliver dependable, fleet-wide insights into Netflix’s service topology and network health. This advancement unlocks numerous exciting opportunities in areas such as service dependency auditing, security analysis, and incident triage, while helping Netflix engineers develop a better understanding of our ever-evolving distributed systems.

We would like to thank Martin Dubcovsky, Joanne Koong, Taras Roshko, Nabil Schear, Jacob Meyers, Parsha Pourkhomami, Hechao Li, Donavan Fritz, Rob Gulewich, Amanda Li, John Salem, Hariharan Ananthakrishnan, Keerti Lakshminarayan, and other stunning colleagues for their feedback, inspiration, and contributions to the success of this effort.
