Analyzing Volatile Memory on a Google Kubernetes Engine Node




June 22, 2023

Published by Marcus Hallberg, Security Engineer

TL;DR At Spotify, we run containerized workloads in production throughout our entire organization in five regions, with our main production workloads in Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). If we detect suspicious behavior in our workloads, we need to be able to quickly analyze it and determine whether something malicious has happened. Today we leverage commercial solutions to monitor them, but we also do our own research to discover options and alternative methods.
One such research project led to the discovery of a new method for conducting memory analysis on GKE by combining three open source tools, AVML, dwarf2json, and Volatility 3, with the result being a snapshot of all the processes and memory activity on a GKE node.

This new method empowers us and other organizations to use an open source alternative if we do not have a commercial solution in place, or if we want to compare our current monitoring against the open source one.

In this blog post, I'll explain in detail how memory analysis works and how this new method can be used on any GKE node in production today.

Spotify is a heavy user of GKE on GCP, and we run most of our production workloads in GKE today. We're present in five GCP regions and run a few hundred thousand pods in production at the same time, across more than 3,000 GKE namespaces.

In short, it's safe to say that we're a big user of GKE and need both to scale our production workloads and to monitor what is happening in production.

Although Google has its own way of implementing Kubernetes in its cloud environment, namely GKE, there are a few standard terms to keep in mind:

  • Control Plane: The container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers.
  • Cluster: A set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.
  • Node: A node is a worker machine in Kubernetes.
  • Namespace: An abstraction used by Kubernetes to support isolation of groups of resources within a single cluster.
  • Pod: The smallest and simplest Kubernetes object. A Pod represents a set of running containers on your cluster.
  • Container: A lightweight and portable executable image that contains software and all of its dependencies.

Below, you can see a high-level architecture of a GKE cluster on GCP (Source: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture).

Example of a GKE-managed Cluster
Figure 1: GKE-managed cluster overview.

The kernel is the main layer between the operating system (OS) of the GKE node and the underlying server resources. It handles critical tasks like process and memory management, file systems, device control, and networking. Below is an overview of the kernel layout:

Application overview featuring Application, Kernel, and CPU, Memory, and Devices
Figure 2: Application overview.

If we want to understand what is happening on a GKE node and what processes are running on it in memory, the kernel is the optimal place to look. Many commercial solutions today leverage the extended Berkeley Packet Filter (eBPF) and its sandboxed approach to access the kernel. This, however, requires that you either buy a commercial solution that uses eBPF or build your own solution on top of it. As my research showed, there is another approach we can take.

So how can we access the kernel on a GKE node and analyze the memory? My research boiled it down to the following three steps:

  • Step 1: Create a kernel memory dump
  • Step 2: Build a symbol file of the kernel
  • Step 3: Analyze the kernel memory dump

To demonstrate the following steps, I created the architecture below using Terraform and a Python script that integrated with the GCP API.

A view of GCP Architecture
Figure 3: GCP architecture for GKE research.

Step 1: Create a kernel memory dump

By taking a kernel memory dump, we can get a “snapshot” of all kernel activity at a specific time that we can then analyze.

Since GKE nodes run the hardened operating system Container-Optimized OS (COS), we can’t use a kernel module or similar solution. However, by temporarily adding a privileged container to the GKE node, we can access the kernel space via the file path /proc/kcore.

Once we have access and can read from this file path, we can use the open source tool AVML to take a kernel memory dump. The code below shows a Terraform example of a privileged container in GKE.

Terraform example displayed in code.
Figure 4: Terraform config of a GKE container.
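A minimal sketch of such a privileged container, written for Terraform's Kubernetes provider, might look like the following. This is not our exact configuration: the resource name, namespace, and image are placeholders, and the AVML binary is assumed to have been copied into the image.

```hcl
# Sketch only: a pod whose container is privileged so it can read the
# host kernel's /proc/kcore, allowing AVML to run inside it.
resource "kubernetes_pod" "memory_collector" {
  metadata {
    name      = "avml-collector" # placeholder name
    namespace = "default"
  }

  spec {
    host_pid = true # share the node's PID namespace

    container {
      name    = "avml"
      image   = "ubuntu:22.04" # placeholder image with avml copied in
      command = ["sleep", "infinity"]

      security_context {
        privileged = true # required to read /proc/kcore
      }
    }
  }
}
```

Inside this container, a dump can then be taken with something like `avml /tmp/kcore.mem` (the output path is illustrative; AVML writes a LiME-format image by default).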

Step 2: Build a symbol file of the kernel

To interpret the kernel memory dump, we need to build an Intermediate Symbol File (ISF) for the specific kernel version of the GKE node. This can be done by accessing the vmlinux file, which is the uncompressed version of the kernel image, and then using an open source tool called dwarf2json to build the symbol file. With the symbol file, we can then translate the kernel memory dump into the running software and processes.

In our case, the problem was finding where Google Cloud hosts the vmlinux file for the COS version of a GKE node. After much research and interaction with some of the Google engineers who build GKE and COS, we discovered an undocumented API that lets you access the vmlinux file if you know the build_id of the COS version running on your GKE node.

As the build_id is present in the GKE image name, we can find it and use it to access the API via the following link: https://storage.googleapis.com/cos-tools/$build_id/vmlinux.
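A small Python sketch of this derivation is shown below. It assumes the build_id appears at the end of the image name as three dash-separated numbers; the example image name is illustrative.

```python
import re

COS_TOOLS = "https://storage.googleapis.com/cos-tools"

def vmlinux_url(image_name: str) -> str:
    """Derive the vmlinux download URL from a COS/GKE image name.

    Assumes the build_id appears at the end of the image name as three
    dash-separated numbers, e.g. "cos-97-16919-235-1" -> "16919.235.1".
    """
    match = re.search(r"(\d+)-(\d+)-(\d+)$", image_name)
    if not match:
        raise ValueError(f"no build_id found in {image_name!r}")
    build_id = ".".join(match.groups())
    return f"{COS_TOOLS}/{build_id}/vmlinux"

print(vmlinux_url("cos-97-16919-235-1"))
# -> https://storage.googleapis.com/cos-tools/16919.235.1/vmlinux
```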

In the example below, you’ll see that the GKE image has the build_id = 16919.235.1.

GKE image details featuring build ID, architecture, location, labels, creation type, and encryption type
Figure 5: GKE image configuration, including the build_id.

With this information, we can access the vmlinux file via:

https://storage.googleapis.com/cos-tools/16919.235.1/vmlinux

and build the symbol file using dwarf2json.
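As a sketch, the download and symbol-file build can be scripted along these lines. File names are placeholders, the `dwarf2json linux --elf` invocation follows the tool's documented usage, and both steps are skipped if dwarf2json is not installed.

```python
import shutil
import subprocess
import urllib.request

BUILD_ID = "16919.235.1"  # taken from the GKE image name, as in Figure 5
VMLINUX_URL = f"https://storage.googleapis.com/cos-tools/{BUILD_ID}/vmlinux"

# Fetch the uncompressed kernel image, then feed its DWARF debug info to
# dwarf2json to produce a Volatility 3 Intermediate Symbol File (ISF).
if shutil.which("dwarf2json"):
    urllib.request.urlretrieve(VMLINUX_URL, "vmlinux")
    with open(f"cos-{BUILD_ID}.json", "wb") as isf:
        subprocess.run(
            ["dwarf2json", "linux", "--elf", "vmlinux"],
            stdout=isf,
            check=True,
        )
```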

Step 3: Analyze the kernel memory dump

Now that we finally have both the kernel memory dump and the symbol file to interpret that kernel version, we can analyze it with Volatility 3. Using Volatility 3 allows us to see all running processes on both the privileged pod and another test pod on the same GKE node. This “attacker” pod is running a series of test processes to create some examples for us to analyze (for example, a Netcat listener, a watch command that queries the local IP, and finally a Python script). Below, you can see the complete process output from the kernel memory dump analysis.

Example of process output from a kernel memory dump analysis
Figure 6: Process output from Volatility 3.
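The analysis step itself boils down to a single Volatility 3 invocation; the sketch below wraps it in Python. The file names are placeholders, and the ISF built in step 2 is assumed to sit in a directory passed to Volatility via `-s`.

```python
import shutil
import subprocess

# Sketch: list all processes found in the kernel memory dump using
# Volatility 3's Linux pslist plugin.
cmd = [
    "vol",
    "-f", "kcore.mem",       # the AVML dump from step 1 (placeholder name)
    "-s", "symbols/",        # directory holding the dwarf2json ISF
    "linux.pslist.PsList",
]

# Only invoke the tool if Volatility 3's CLI is actually installed.
if shutil.which("vol"):
    subprocess.run(cmd, check=True)
else:
    print("would run:", " ".join(cmd))
```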

In summary, we can now see all the processes on the entire GKE node for all running pods.

Conclusion 

Using the three tools mentioned above has provided us with free and open source alternatives to preexisting commercial solutions for monitoring containerized workloads. Although this approach provides only a snapshot of the process activity, it can be used either as a starting point for memory analysis in GKE or as a complement to existing commercial solutions.

All the code used in this research project is available here on GitHub and was also presented at BSidesNYC 2023.

Kubernetes is a registered trademark of the Linux Foundation in the United States and other countries.

Tags: backend


