{"id":131846,"date":"2024-06-01T15:10:49","date_gmt":"2024-06-01T15:10:49","guid":{"rendered":"https:\/\/showbizztoday.com\/index.php\/2024\/06\/01\/data-platform-explained-part-ii\/"},"modified":"2024-06-01T15:10:49","modified_gmt":"2024-06-01T15:10:49","slug":"data-platform-explained-part-ii","status":"publish","type":"post","link":"https:\/\/showbizztoday.com\/index.php\/2024\/06\/01\/data-platform-explained-part-ii\/","title":{"rendered":"Data Platform Explained Part II\u00a0"},"content":{"rendered":"<div>\n        <!-- post title --><\/p>\n<div class=\"posted-by\">\n            <img decoding=\"async\" src=\"https:\/\/engineering.atspotify.com\/wp-content\/themes\/theme-spotify\/images\/icon.png\" alt=\"\"\/><\/p>\n<p><br \/>\n                <span class=\"date\">May 28, 2024<\/span><br \/>\n                <span class=\"author\"><br \/>\n                    Published by Anastasia Khlebnikova (Senior Engineer) and Carol Cunha (Product Manager)                <\/span>\n            <\/p>\n<\/p><\/div>\n<p>        <!-- post details --><\/p>\n<div class=\"img-holder\">\n            <!-- post thumbnail --><\/p>\n<p>                                                <a href=\"https:\/\/engineering.atspotify.com\/2024\/05\/data-platform-explained-part-ii\/\" title=\"Data Platform Explained Part II\u00a0\" target=\"_blank\" rel=\"noopener\"><br \/>\n                        <img src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590.png\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" fetchpriority=\"high\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590.png 2501w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590-250x123.png 250w, 
https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590-700x344.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590-768x378.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590-1536x755.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590-2048x1007.png 2048w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/EN219-DataPlatform_Part2_BlogPost-1200-x-590-120x59.png 120w\" sizes=\"(max-width: 2501px) 100vw, 2501px\"\/>                    <\/a><br \/>\n                        <!-- \/post thumbnail -->\n        <\/div>\n<p>        <!-- \/post title --><\/p>\n<p>Check out <a href=\"https:\/\/engineering.atspotify.com\/2024\/04\/data-platform-explained\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Platform Explained Part I<\/a>, where we started sharing the journey of building a data platform, its building blocks, and the motivation for investing in a platformized solution at Spotify.<\/p>\n<p>In <a href=\"https:\/\/engineering.atspotify.com\/2024\/04\/data-platform-explained\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Platform Explained Part I<\/a>, we shared the first steps in the journey to build a data platform, the insights that indicate it\u2019s time to start building one, and how we&#8217;re organized to succeed at it. 
In this article, we&#8217;ll take one step further into the why, what, and how of our data platform. We&#8217;ll introduce the domains under it that are responsible for the platform\u2019s building blocks, covering scalability, the tooling we use and provide, and the value each building block brings to a data platform. Finally, we&#8217;ll share our strategy for navigating the complexity of a data ecosystem by building a strong community around it.<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1590\" height=\"593\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3.png\" alt=\"\" class=\"wp-image-7124\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3.png 1590w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3-250x93.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3-700x261.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3-768x286.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3-1536x573.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image3-120x45.png 120w\" sizes=\"auto, (max-width: 1590px) 100vw, 1590px\"\/><\/figure>\n<\/div>\n<p>When it comes to scalability, Spotify\u2019s Data Collection platform collects more than 1 trillion events per day. Its event delivery architecture is constantly evolving through numerous iterations. 
To learn more about the event delivery evolution, its <a href=\"https:\/\/engineering.atspotify.com\/2016\/02\/spotifys-event-delivery-the-road-to-the-cloud-part-i\/\" target=\"_blank\" rel=\"noopener\">inception<\/a>, and subsequent improvements, check out <a href=\"https:\/\/engineering.atspotify.com\/2021\/10\/changing-the-wheels-on-a-moving-bus-spotify-event-delivery-migration\/\" target=\"_blank\" rel=\"noopener\">this<\/a> blog post.<\/p>\n<p>Data Collection is needed so that we can:<\/p>\n<ul>\n<li>Understand what content is relevant to Spotify users<\/li>\n<li>Respond directly to user feedback<\/li>\n<li>Gain a deeper understanding of user interactions to enhance their experience<\/li>\n<\/ul>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"856\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2.png\" alt=\"\" class=\"wp-image-7125\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2.png 1999w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2-250x107.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2-700x300.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2-768x329.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2-1536x658.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image2-120x51.png 120w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\"\/><figcaption class=\"wp-element-caption\">Figure 1: The event delivery infrastructure is a significant topic that deserves its own dedicated article (coming soon). 
Nevertheless, here\u2019s an overview of the main components handled by our event delivery infrastructure.<\/figcaption><\/figure>\n<\/div>\n<p>When a team at Spotify decides to instrument their functionality with event delivery, besides writing code using our SDKs, they only need to define the event schemas. The infrastructure then automatically deploys a new set of event-specific components (such as PubSub queues, anonymization pipelines, and streaming jobs) using Kubernetes operators. Any change to the event schema triggers the deployment of the corresponding resources. Anonymization solutions, including internal key-handling systems, are covered in detail in <a href=\"https:\/\/engineering.atspotify.com\/2018\/09\/scalable-user-privacy\/\" target=\"_blank\" rel=\"noopener\">this article<\/a>.<\/p>\n<p>The balance between centralized and distributed ownership allows most updates to be managed by consumers of the consumption dataset, without requiring intervention from the infrastructure team.<\/p>\n<p>Today, over 1800 different event types (signals representing interactions from Spotify users) are being published. In terms of team structure, the data collection area is organized to focus on the event delivery infrastructure, on supporting and enhancing client SDKs for event transmission, and on building the high-quality datasets that represent the user journey experience, as well as the infrastructure needed behind them.<\/p>\n<p>Our Data Processing efforts focus on empowering Spotify to utilize data effectively, while Data Management is dedicated to ensuring data integrity through tool creation and collaborative efforts. With more than 38,000 actively scheduled pipelines handling both hourly and daily tasks, scalability is a key consideration. 
Data Management and Data Processing are essential for Spotify to effectively manage its extensive data and pipelines. It\u2019s crucial to maintain data traceability (lineage), searchability (metadata), and accessibility, while implementing access controls and retention policies to manage storage costs and comply with regulations. These functions enable Spotify to extract maximum value from its data assets while upholding operational efficiency and regulatory standards.<\/p>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"1069\" src=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1.png\" alt=\"\" class=\"wp-image-7126\" srcset=\"https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1.png 1999w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1-250x134.png 250w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1-700x374.png 700w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1-768x411.png 768w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1-1536x821.png 1536w, https:\/\/storage.googleapis.com\/production-eng\/1\/2024\/05\/image1-1-120x64.png 120w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\"\/><figcaption class=\"wp-element-caption\">Figure 2: These domains, like Event Delivery, warrant their own comprehensive blog posts. This article provides a closer look at the tools we use and our organizational structure.<\/figcaption><\/figure>\n<\/div>\n<p>The scheduling and orchestration of workflows are essential components of Data Processing. Once a workflow is picked up by the scheduler, it\u2019s executed on BigQuery, or on either Flink or Dataflow clusters. 
Most pipelines use <a href=\"https:\/\/spotify.github.io\/scio\/\" target=\"_blank\" rel=\"noreferrer noopener\">Scio<\/a>, a Scala API for Apache Beam.<\/p>\n<p>Data pipelines generate data endpoints, each adhering to a specific schema and possibly containing multiple partitions. These endpoints are equipped with retention policies, access controls, lineage tracking, and quality checks.<\/p>\n<p>Defining a workflow or endpoint involves custom <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/api-extension\/custom-resources\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kubernetes operators<\/a>, which help us easily deploy and maintain complex structures. That way, the resource definition lives in the same repo as the pipeline code and gets deployed and maintained by the code owners.<\/p>\n<p>Monitoring options include alerts for data lateness, long-running or failing workflows, and endpoints. <a href=\"https:\/\/engineering.atspotify.com\/2020\/03\/what-the-heck-is-backstage-anyway\/\" target=\"_blank\" rel=\"noreferrer noopener\">Backstage<\/a> integration facilitates easy resource management, monitoring, cost analysis, and quality assurance.<\/p>\n<p>Building a data platform is non-trivial: it needs to be flexible enough to satisfy a variety of different use cases, align with cost-effectiveness and return-on-investment goals, and at the same time keep the developer experience lean. The data platform needs to be easy to onboard to and have seamless upgrade paths (nobody likes to be disrupted by platform upgrades and breaking changes). 
And the platform needs to be reliable: if teams are expected to build business-critical logic on top of your platform, the platform should be treated as a critical use case as well.<\/p>\n<p>There are multiple ways to elevate engagement with your product:<\/p>\n<ul>\n<li><a href=\"https:\/\/backstage.io\/docs\/features\/techdocs\/\" target=\"_blank\" rel=\"noopener\"><strong>Documentation<\/strong><\/a><strong> (which is easy to find).<\/strong> We have all been in the situation: \u201cI remember reading about it, but I don\u2019t remember where.\u201d It should be easier to find documentation than to ask a question (considering the waiting time).<\/li>\n<li><strong>Onboard teams.<\/strong> There is no better way to learn about your product than to start using it yourself. Go to users and embed there. Learn about different use cases, make sure that your product is easy to use in all possible environments, and bring the learnings back to the platform.<\/li>\n<li><a href=\"https:\/\/engineering.atspotify.com\/2023\/05\/fleet-management-at-spotify-part-3-fleet-wide-refactoring\/\" target=\"_blank\" rel=\"noopener\"><strong>Fleetshift<\/strong><\/a><strong> the changes.<\/strong> People love evolving and making changes to their infrastructure and having their code flagged as deprecated, right? Not really. That is why we should automate all possible toil and migrations. Plan to deal with risks. Make time to support your customers.<\/li>\n<li><strong>Build a community<\/strong> where people are free to ask questions and where there are dedicated goalies to answer those questions. Answering community questions should not be left to free will, but should instead be encouraged and taken seriously. 
At Spotify we have a Slack channel, #data-support, where all data questions are addressed.<\/li>\n<\/ul>\n<p>Our Data Platform has come a long way, and continues to evolve. At the very beginning, we were a few people, part of one team. We ran the pipelines on-premises, operating the <a href=\"https:\/\/engineering.atspotify.com\/2016\/02\/spotifys-event-delivery-the-road-to-the-cloud-part-i\/\" target=\"_blank\" rel=\"noopener\">largest Hadoop cluster in Europe<\/a>. We are now 100+ engineers working on building the Spotify data platform on GCP, with data collection, management, and processing capabilities.<\/p>\n<p>There is no formula or script for setting up a data platform. A good way to start is by aligning your organizational needs with your investments. These needs become the drivers for your platform\u2019s building blocks, and may change over time. Make sure the challenges are clear: define clear goals and set clear expectations. Doing so will help you get the right support from your organization and stay on the path to success.<\/p>\n<p>Get closer to your users, and have a clear way for customers and stakeholders to reach out and give you direct feedback. This will set the stage for creating a community around your platform. 
Finally, you don\u2019t have to start big: just start somewhere, then evolve, iterate, and learn.<\/p>\n<p>        Tags: <a href=\"https:\/\/engineering.atspotify.com\/tag\/data\/\" rel=\"tag noopener\" target=\"_blank\">Data<\/a><br \/> \n            <\/div>\n","protected":false},"excerpt":{"rendered":"<p>May 28, 2024 Published by Anastasia Khlebnikova (Senior Engineer) and Carol Cunha (Product Manager) Check out Data Platform Explained Part I, where we started sharing the journey of building a data platform, its building blocks, and the motivation for investing in a platformized solution at Spotify. In Data [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":131848,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[5086,1569,1248,5087],"class_list":{"0":"post-131846","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-spotify","8":"tag-data","9":"tag-explained","10":"tag-part","11":"tag-platform"},"_links":{"self":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/131846","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/comments?post=131846"}],"version-history":[{"count":0,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/131846\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media\/131848"}],"wp:attachment":[{"href":"https:\
/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media?parent=131846"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/categories?post=131846"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/tags?post=131846"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}