{"id":108260,"date":"2023-06-26T11:17:40","date_gmt":"2023-06-26T11:17:40","guid":{"rendered":"https:\/\/showbizztoday.com\/index.php\/2023\/06\/26\/analyzing-volatile-memory-on-a-google-kubernetes-engine-node\/"},"modified":"2023-06-26T11:17:40","modified_gmt":"2023-06-26T11:17:40","slug":"analyzing-volatile-memory-on-a-google-kubernetes-engine-node","status":"publish","type":"post","link":"https:\/\/showbizztoday.com\/index.php\/2023\/06\/26\/analyzing-volatile-memory-on-a-google-kubernetes-engine-node\/","title":{"rendered":"Analyzing Volatile Memory on a Google Kubernetes Engine Node"},"content":{"rendered":"<div>\n        <!-- post title --><\/p>\n<div class=\"posted-by\">\n            <img decoding=\"async\" src=\"https:\/\/engineering.atspotify.com\/wp-content\/themes\/theme-spotify\/images\/icon.png\" alt=\"\"\/><\/p>\n<p><br \/>\n                <span class=\"date\">June 22, 2023<\/span><br \/>\n                <span class=\"author\"><br \/>\n                    Published by Marcus Hallberg, Security Engineer                <\/span>\n            <\/p>\n<\/p><\/div>\n<p>        <!-- post details --><\/p>\n<div class=\"img-holder\">\n            <!-- post thumbnail --><\/p>\n<p>                                                <a href=\"https:\/\/engineering.atspotify.com\/2023\/06\/analyzing-volatile-memory-on-a-google-kubernetes-engine-node\/\" title=\"Analyzing Volatile Memory on a Google Kubernetes Engine Node\" target=\"_blank\" rel=\"noopener\"><br \/>\n                        <img src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/EN196_Analyzing-Volatile-Memory-final-590.png\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Analyzing Volatile Memory header image with Kubernetes shipping container and storage boxes inside.\" decoding=\"async\" 
srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/EN196_Analyzing-Volatile-Memory-final-590.png 1200w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/EN196_Analyzing-Volatile-Memory-final-590-250x123.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/EN196_Analyzing-Volatile-Memory-final-590-700x344.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/EN196_Analyzing-Volatile-Memory-final-590-768x378.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/EN196_Analyzing-Volatile-Memory-final-590-120x59.png 120w\" sizes=\"(max-width: 1200px) 100vw, 1200px\"\/>                    <\/a><br \/>\n                        <!-- \/post thumbnail -->\n        <\/div>\n<p>        <!-- \/post title --><\/p>\n<p><strong>TL;DR<\/strong> At Spotify, we run containerized workloads in production across our entire organization in five regions, with our main production workloads in Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). If we detect suspicious behavior in our workloads, we need to be able to quickly analyze it and determine whether something malicious has happened. 
Today we use commercial solutions to monitor them, but we also do our own research to discover options and alternative methods.<br \/>One such research project led to the discovery of a new method for conducting memory analysis on GKE by combining three open source tools, <a href=\"https:\/\/github.com\/microsoft\/avml\" target=\"_blank\" rel=\"noreferrer noopener\">AVML<\/a>, <a href=\"https:\/\/github.com\/volatilityfoundation\/dwarf2json\" target=\"_blank\" rel=\"noreferrer noopener\">dwarf2json<\/a>, and <a href=\"https:\/\/github.com\/volatilityfoundation\/volatility3\" target=\"_blank\" rel=\"noreferrer noopener\">Volatility 3<\/a>, the result being a snapshot of all the processes and memory activity on a GKE node.<\/p>\n<p>This new method gives us and other organizations an open source alternative if we do not have a commercial solution in place, or if we want to compare our current monitoring to the open source one.<\/p>\n<p>In this blog post, I\u2019ll explain in detail how memory analysis works and how this new method can be used on any GKE node in production today.<\/p>\n<p>Spotify is a heavy user of GKE on GCP, and we run most of our production workloads in GKE today. 
We\u2019re present in five GCP regions and run a few hundred thousand pods in production at the same time across more than 3,000 GKE namespaces.<\/p>\n<p>In summary, it\u2019s safe to say that we\u2019re a big user of GKE and need both to scale our production workloads and to monitor what is happening in production.<\/p>\n<p>Although Google has its own implementation of Kubernetes in its cloud environment, namely GKE, there are a few <a href=\"https:\/\/kubernetes.io\/docs\/reference\/glossary\/?fundamental=true\" target=\"_blank\" rel=\"noreferrer noopener\">standard terms<\/a> to keep in mind:<\/p>\n<ul>\n<li><strong>Control Plane:<\/strong> The container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers.<\/li>\n<li><strong>Cluster:<\/strong> A set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.<\/li>\n<li><strong>Node:<\/strong> A node is a worker machine in Kubernetes.<\/li>\n<li><strong>Namespace:<\/strong> An abstraction used by Kubernetes to support isolation of groups of resources within a single cluster.<\/li>\n<li><strong>Pod:<\/strong> The smallest and simplest Kubernetes object. 
A Pod represents a set of running containers on your cluster.<\/li>\n<li><strong>Container:<\/strong> A lightweight and portable executable image that contains software and all of its dependencies.<\/li>\n<\/ul>\n<p>Below, you can see a high-level architecture of a GKE cluster on GCP (Source: <a href=\"https:\/\/cloud.google.com\/kubernetes-engine\/docs\/concepts\/cluster-architecture\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/cloud.google.com\/kubernetes-engine\/docs\/concepts\/cluster-architecture<\/a>).<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster.png\" alt=\"Example of a GKE-managed Cluster\" class=\"wp-image-6247\" width=\"840\" height=\"516\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster.png 1999w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster-250x154.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster-700x430.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster-768x472.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster-1536x944.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Cluster-120x74.png 120w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\"\/><figcaption class=\"wp-element-caption\">Figure 1: GKE-managed cluster overview.<\/figcaption><\/figure>\n<\/div>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Kernel_(operating_system)\" target=\"_blank\" rel=\"noreferrer noopener\">kernel<\/a> is the main layer between the operating system (OS) of the GKE node and the underlying server resources. 
It helps with important tasks like process and <a href=\"https:\/\/www.techtarget.com\/whatis\/definition\/memory-management\" target=\"_blank\" rel=\"noreferrer noopener\">memory management<\/a>, file systems, device control, and networking. Below is an overview of the kernel layout:<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"1128\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel.png\" alt=\"Application overview featuring Application, Kernel, and CPU, Memory, and Devices\" class=\"wp-image-6248\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel.png 1999w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel-250x141.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel-700x395.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel-768x433.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel-1536x867.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Kernel-120x68.png 120w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\"\/><figcaption class=\"wp-element-caption\"><em>Figure 2: Application overview.<\/em><\/figcaption><\/figure>\n<\/div>\n<p>If we want to understand what is happening on a GKE node and which processes are running in its memory, the kernel is the best place to look. Many commercial solutions today use the <a href=\"https:\/\/en.wikipedia.org\/wiki\/EBPF\" target=\"_blank\" rel=\"noreferrer noopener\">extended Berkeley Packet Filter (eBPF)<\/a> and its sandbox approach to access the kernel. This, however, requires that you either buy a commercial solution that uses eBPF or build your own solution on top of it. 
As my research showed, there is another approach we can take.<\/p>\n<p>So how can we access the kernel on a GKE node and analyze the memory? My research boiled it down to the following three steps:<\/p>\n<ul>\n<li>Step 1: Create a kernel memory dump<\/li>\n<li>Step 2: Build a symbol file of the kernel<\/li>\n<li>Step 3: Analyze the kernel memory dump<\/li>\n<\/ul>\n<p>To demonstrate these steps, I created the architecture below using Terraform and a Python script that integrates with the GCP API.<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture.png\" alt=\"A view of GCP Architecture\" class=\"wp-image-6249\" width=\"700\" height=\"665\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture.png 1932w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture-250x238.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture-700x665.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture-768x730.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture-1536x1460.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Architecture-120x114.png 120w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><figcaption class=\"wp-element-caption\"><em>Figure 3: GCP architecture for GKE research.<\/em><\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\">Step 1: Create a kernel memory dump<\/h2>\n<p>By taking a kernel memory dump, we get a \u201csnapshot\u201d of all the kernel activity at a specific time, which we can then analyze.<\/p>\n<p>Since GKE nodes are running the hardened operating system <a 
href=\"https:\/\/cloud.google.com\/container-optimized-os\/docs\/concepts\/features-and-benefits\" target=\"_blank\" rel=\"noopener\">COS<\/a>, we can\u2019t use a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Loadable_kernel_module\" target=\"_blank\" rel=\"noopener\">kernel module<\/a> or similar solution. However, by temporarily adding a privileged container to the GKE node, we can access the kernel space through the file path <a href=\"https:\/\/access.redhat.com\/documentation\/en-us\/red_hat_enterprise_linux\/4\/html\/reference_guide\/s2-proc-kcore#:~:text=This%20file%20represents%20the%20physical,RAM)%20used%20plus%204%20KB.\" target=\"_blank\" rel=\"noopener\"><strong><em>\/proc\/kcore<\/em><\/strong><\/a><strong><em>.<\/em><\/strong><\/p>\n<p>Once we have access and can read from this file path, we can use the open source tool AVML to take a kernel memory dump. The code below shows a Terraform example of a privileged container in GKE.<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"700\" height=\"308\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Terraform-700x308.png\" alt=\"Terraform example displayed in code.\" class=\"wp-image-6250\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Terraform-700x308.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Terraform-250x110.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Terraform-768x338.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Terraform-120x53.png 120w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Terraform.png 886w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><figcaption class=\"wp-element-caption\"><em>Figure 4: Terraform config of GKE container.<\/em><\/figcaption><\/figure>\n<\/div>\n<h2 
class=\"wp-block-heading\">Step 2: Build a symbol file of the kernel<\/h2>\n<p>To interpret the kernel memory dump, we need to build an Intermediate Symbol File (ISF) for the specific kernel version of the GKE node. This can be done by accessing the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Vmlinux\" target=\"_blank\" rel=\"noopener\"><em>vmlinux<\/em><\/a> file, which is the uncompressed version of the kernel image, and then using an open source tool called <em>dwarf2json<\/em> to build the symbol file. With the symbol file, we can map the raw contents of the kernel memory dump to the running software and processes.<\/p>\n<p>In our case, the problem was finding where Google Cloud hosts the <em>vmlinux<\/em> file for the COS version of a GKE node. After much research and interaction with some of Google\u2019s engineers who build GKE and COS, we discovered an undocumented API that lets you access the <em>vmlinux<\/em> file if you know the <a href=\"https:\/\/cloud.google.com\/container-optimized-os\/docs\/concepts\/versioning#milestones_and_build_numbers\" target=\"_blank\" rel=\"noopener\"><em>build_id<\/em><\/a> of the COS version running on your GKE node.<\/p>\n<p>As the <em>build_id<\/em> is present in the GKE image name, we can find it and use it to access the API via the following link: <strong><em>https:\/\/storage.googleapis.com\/cos-tools\/$build_id\/vmlinux<\/em><\/strong>.<\/p>\n<p>In the example below, you\u2019ll see that the GKE image has the <em>build_id = 16919.235.1.<\/em><\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1506\" height=\"560\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Image.png\" alt=\"GKE image details featuring build ID, architecture, location, labels, creation type, and encryption type\" 
class=\"wp-image-6251\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Image.png 1506w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Image-250x93.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Image-700x260.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Image-768x286.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/GKE-Image-120x45.png 120w\" sizes=\"auto, (max-width: 1506px) 100vw, 1506px\"\/><figcaption class=\"wp-element-caption\">Figure 5: GKE image configuration, including build_id.<\/figcaption><\/figure>\n<\/div>\n<p>With this information, we can access the <em>vmlinux<\/em> file via:<\/p>\n<p><a href=\"https:\/\/storage.googleapis.com\/cos-tools\/16919.235.1\/vmlinux\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/storage.googleapis.com\/cos-tools\/16919.235.1\/vmlinux<\/a><\/p>\n<p>and build the symbol file using <em>dwarf2json<\/em>.<\/p>\n<h2 class=\"wp-block-heading\">Step 3: Analyze the kernel memory dump<\/h2>\n<p>Now that we have both the kernel memory dump and the symbol file to interpret that kernel version, we can analyze the dump with <em>Volatility 3<\/em>. Using Volatility 3 allows us to see all running processes on both the privileged pod and another test pod on the same GKE node. This \u201cattacker\u201d pod is running a series of test processes to create some examples for us to analyze (for example, a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Netcat\" target=\"_blank\" rel=\"noreferrer noopener\">Netcat<\/a> listener, a watch command that queries the local IP, and finally a Python script). 
Below, you can see the full process output from the kernel memory dump analysis.<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"607\" height=\"188\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Process-Output.jpg\" alt=\"Example of process output from a kernel memory dump analysis\" class=\"wp-image-6254\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Process-Output.jpg 607w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Process-Output-250x77.jpg 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2023\/06\/Process-Output-120x37.jpg 120w\" sizes=\"auto, (max-width: 607px) 100vw, 607px\"\/><figcaption class=\"wp-element-caption\">Figure 6: Process output from Volatility 3.<\/figcaption><\/figure>\n<\/div>\n<p>In summary, we can now see all the processes on the entire GKE node for all running pods.<\/p>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p>Using the three tools mentioned above has provided us with a free and open source alternative to existing commercial solutions for monitoring containerized workloads. 
Although this approach only provides a snapshot of the process activity, it can be used either as a starting point for memory analysis in GKE or as a complement to existing commercial solutions.<\/p>\n<p><em>All the code used in this research project is available on <\/em><a href=\"https:\/\/github.com\/Monrava\/bsidesnyc2023\" target=\"_blank\" rel=\"noreferrer noopener\"><em>GitHub<\/em><\/a><em> and was also presented at <\/em><a href=\"https:\/\/livestream.com\/internetsociety\/bsidesnyc2023\/videos\/236151647\" target=\"_blank\" rel=\"noreferrer noopener\"><em>BSidesNYC 2023<\/em><\/a><em>.<\/em><\/p>\n<p><em>Kubernetes is a registered trademark of the Linux Foundation in the United States and other countries.<\/em><\/p>\n<p><\/p>\n<p>        Tags: <a href=\"https:\/\/engineering.atspotify.com\/tag\/backend\/\" rel=\"tag noopener\" target=\"_blank\">backend<\/a><br \/> \n            <\/div>\n","protected":false},"excerpt":{"rendered":"<p>June 22, 2023 Published by Marcus Hallberg, Security Engineer TL;DR At Spotify, we run containerized workloads in production across our entire organization in five regions, with our main production workloads in Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). 
If we detect suspicious behavior in our workloads, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":108262,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"class_list":{"0":"post-108260","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-spotify"},"_links":{"self":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/108260","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/comments?post=108260"}],"version-history":[{"count":0,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/108260\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media\/108262"}],"wp:attachment":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media?parent=108260"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/categories?post=108260"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/tags?post=108260"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}