{"id":123068,"date":"2024-03-04T23:54:59","date_gmt":"2024-03-04T23:54:59","guid":{"rendered":"https:\/\/showbizztoday.com\/index.php\/2024\/03\/04\/evolving-from-rule-based-classifier-machine-learning-powered-auto-remediation-in-netflix-data-platform-by-netflix-technology-blog-mar-2024\/"},"modified":"2024-03-04T23:55:00","modified_gmt":"2024-03-04T23:55:00","slug":"evolving-from-rule-based-classifier-machine-learning-powered-auto-remediation-in-netflix-data-platform-by-netflix-technology-blog-mar-2024","status":"publish","type":"post","link":"https:\/\/showbizztoday.com\/index.php\/2024\/03\/04\/evolving-from-rule-based-classifier-machine-learning-powered-auto-remediation-in-netflix-data-platform-by-netflix-technology-blog-mar-2024\/","title":{"rendered":"Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data Platform | by Netflix Technology Blog | Mar, 2024"},"content":{"rendered":"<p> [ad_1]<br \/>\n<\/p>\n<div>\n<div>\n<div class=\"hu hv hw hx hy\">\n<div class=\"speechify-ignore ab co\">\n<div class=\"speechify-ignore bg l\">\n<div class=\"hz ia ib ic id ab\">\n<div>\n<div class=\"ab ie\"><a href=\"https:\/\/netflixtechblog.medium.com\/?source=post_page-----039d5efd115b--------------------------------\" rel=\"noopener follow\" target=\"_blank\"><\/p>\n<div>\n<div class=\"bl\" aria-hidden=\"false\">\n<div class=\"l if ig bx ih ii\">\n<div class=\"l fi\"><img decoding=\"async\" alt=\"Netflix Technology Blog\" class=\"l fc bx dc dd cw\" src=\"https:\/\/miro.medium.com\/v2\/resize:fill:88:88\/1*BJWRqfSMf9Da9vsXG9EBRQ.jpeg\" width=\"44\" height=\"44\" loading=\"lazy\" data-testid=\"authorPhoto\"\/><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><\/a><a href=\"https:\/\/netflixtechblog.com\/?source=post_page-----039d5efd115b--------------------------------\" rel=\"noopener  ugc nofollow\" target=\"_blank\"><\/p>\n<div class=\"il ab fi\">\n<div>\n<div class=\"bl\" aria-hidden=\"false\">\n<div class=\"l im in bx ih io\">\n<div class=\"l 
fi\"><img decoding=\"async\" alt=\"Netflix TechBlog\" class=\"l fc bx bq ip cw\" src=\"https:\/\/miro.medium.com\/v2\/resize:fill:48:48\/1*ty4NvNrGg4ReETxqU2N3Og.png\" width=\"24\" height=\"24\" loading=\"lazy\" data-testid=\"publicationPhoto\"\/><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"6783\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx gm bj\">by <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/binbing-hou\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Binbing Hou<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/stephanievezich\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Stephanie Vezich Tamayo<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/chenxiao000\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Xiao Chen<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/liangtian\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Liang Tian<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/troy-ristow-4899b49\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Troy Ristow<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/haoyuanwang\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Haoyuan Wang<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/snehalchennuru\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Snehal Chennuru<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/pawan-dixit-b4307b2\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Pawan Dixit<\/a><\/p>\n<p id=\"a95f\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt 
*This is the first in a series of posts on our work at Netflix to leverage data insights and Machine Learning (ML) to improve the operational automation around the performance and cost efficiency of big data jobs. Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is critical to the success of modern data platforms. In this blog post, we present our project on Auto Remediation, which integrates the currently used rule-based classifier with an ML service and aims to automatically remediate failed jobs without human intervention. We have deployed Auto Remediation in production to handle memory configuration errors and unclassified errors of Spark jobs, and have observed its efficiency and effectiveness (e.g., automatically remediating 56% of memory configuration errors and saving 50% of the monetary costs caused by all errors) as well as great potential for further improvements.*

At Netflix, hundreds of thousands of workflows and millions of jobs are running every day across multiple layers of the big data platform. Given the extensive scope and intricate complexity inherent to such a distributed, large-scale system, even if the failed jobs account for a tiny portion of the total workload, diagnosing and remediating job failures can cause considerable operational burdens.

For efficient error handling, Netflix developed an error classification service, called Pensive, which leverages a rule-based classifier for error classification. The rule-based classifier classifies job errors based on a set of predefined rules and provides insights for schedulers to decide whether to retry the job and for engineers to diagnose and remediate the job failure.

However, as the system has increased in scale and complexity, the rule-based classifier has been facing challenges due to its limited support for operational automation, especially for handling memory configuration errors and unclassified errors. As a result, the operational cost increases linearly with the number of failed jobs. In some cases, for example when diagnosing and remediating job failures caused by Out-Of-Memory (OOM) errors, joint effort across teams is required, involving not only the users themselves, but also the support engineers and domain experts.

To address these challenges, we have developed a new feature, called *Auto Remediation*, which integrates the rule-based classifier with an ML service.
Based on the classification from the rule-based classifier, it uses an ML service to predict retry success probability and retry cost and to select the best candidate configuration as a recommendation, and it uses a configuration service to automatically apply the recommendation. Its major advantages are as follows:

- **Integrated intelligence.** Instead of completely deprecating the existing rule-based classifier, Auto Remediation integrates the classifier with an ML service so that it can leverage the merits of both: the rule-based classifier provides static, deterministic classification results per error class, based on the context of domain experts; the ML service provides performance- and cost-aware recommendations per job, leveraging the power of ML. With the integrated intelligence, we can properly meet the requirements of remediating different errors.
- **Fully automated.** The pipeline of classifying errors, getting recommendations, and applying recommendations is fully automated. It provides the recommendations together with the retry decision to the scheduler, and notably uses an online configuration service to store and apply the recommended configurations. In this way, no human intervention is required in the remediation process.
- **Multi-objective optimizations.** Auto Remediation generates recommendations by considering both performance (i.e., the retry success probability) and compute cost efficiency (i.e., the monetary cost of running the job) to avoid blindly recommending configurations with excessive resource consumption. For example, for memory configuration errors, it searches multiple parameters related to the memory usage of job execution and recommends the combination that minimizes a linear combination of failure probability and compute cost.

These advantages have been verified by the production deployment for remediating Spark jobs' failures. Our observations indicate that Auto Remediation can successfully remediate about 56% of all memory configuration errors by applying the recommended memory configurations online without human intervention, and meanwhile reduce costs by about 50%, owing to its ability to recommend new configurations that make memory configurations successful and to disable unnecessary retries for unclassified errors.
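The multi-objective optimization described above can be made concrete with a minimal sketch of the scalarized objective (the weight `lam` is hypothetical; the post does not state the actual weighting used in production):

```python
def retry_objective(p_fail: float, cost_usd: float, lam: float = 1.0) -> float:
    """Linear combination of predicted retry failure probability and
    predicted retry cost in dollars; lower is better."""
    return p_fail + lam * cost_usd

# A cheaper configuration with a slightly higher failure probability can
# still win under this objective: 0.15 + 0.20 = 0.35 beats 0.10 + 0.50 = 0.60.
print(retry_objective(0.10, 0.50) > retry_objective(0.15, 0.20))
```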
We have also noted great potential for further improvement via model tuning (see the Rollout in Production section).

## Basics

![Figure 1](https://miro.medium.com/v2/resize:fit:1400/1*pnViNRB4q-LX7rcdn6MgHA.png)

Figure 1 illustrates the error classification service, i.e., Pensive, in the data platform. It leverages the rule-based classifier and consists of three components:

- **Log Collector** is responsible for pulling logs from different platform layers for error classification (e.g., the scheduler, job orchestrator, and compute clusters).
- **Rule Execution Engine** is responsible for matching the collected logs against a set of predefined rules. A rule consists of (1) the name, source, log, and summary of the error, and whether the error is restartable; and (2) the regex to identify the error from the log. For example, the rule named SparkDriverOOM includes the information indicating that if the stdout log of a Spark job matches the regex *SparkOutOfMemoryError:*, then this error is classified as a user error and is not restartable.
- **Result Finalizer** is responsible for finalizing the error classification result based on the matched rules. If one or multiple rules are matched, the classification of the first matched rule determines the final classification result (rule priority is determined by rule ordering, and the first rule has the highest priority). On the other hand, if no rules are matched, the error will be considered unclassified.

## Challenges

While the rule-based classifier is simple and has been effective, it is facing challenges due to its limited ability to handle errors caused by misconfigurations and to classify new errors:

- **Memory configuration errors.** The rule-based classifier provides error classification results indicating whether to restart the job; however, for non-transient errors, it still relies on engineers to manually remediate the job. The most notable example is memory configuration errors. Such errors are generally caused by the misconfiguration of job memory.
Setting an excessively small memory size can lead to Out-Of-Memory (OOM) errors, while setting an excessively large memory size can waste cluster memory resources. What is more challenging is that some memory configuration errors require changing the configurations of multiple parameters. Thus, setting a proper memory configuration requires not only manual operation but also expertise in Spark job execution. In addition, even if a job's memory configuration is initially well tuned, changes such as data size and job definition can cause performance to degrade. Given that about 600 memory configuration errors per month are observed in the data platform, timely remediation of memory configuration errors alone requires non-trivial engineering efforts.
- **Unclassified errors.** The rule-based classifier relies on data platform engineers to manually add rules for recognizing errors based on the known context; otherwise, the errors will be unclassified. Due to the migrations of different layers of the data platform and the diversity of applications, existing rules can become invalid, and adding new rules requires engineering efforts and also depends on the deployment cycle. More than 300 rules have been added to the classifier, yet about 50% of all failures remain unclassified. For unclassified errors, the job may be retried multiple times with the default retry policy. If the error is non-transient, these failed retries incur unnecessary job running costs.

## Methodology

To address the above-mentioned challenges, our basic methodology is to integrate the rule-based classifier with an ML service to generate recommendations, and to use a configuration service to apply the recommendations automatically:

- **Generating recommendations.** We use the rule-based classifier as the first pass to classify all errors based on predefined rules, and the ML service as the second pass to provide recommendations for memory configuration errors and unclassified errors.
- **Applying recommendations.** We use an online configuration service to store and apply the recommended configurations.
The pipeline is fully automated, and the services used to generate and apply the recommendations are decoupled.

## Service Integrations

![Figure 2](https://miro.medium.com/v2/resize:fit:1400/1*2eENd1mhwyGpMWNccEwqlQ.png)

Figure 2 illustrates the integration of the services generating and applying the recommendations in the data platform. The major services are as follows:

- **Nightingale** is a service running the ML model trained using [Metaflow](https://metaflow.org/), and is responsible for generating a retry recommendation. The recommendation includes (1) whether the error is restartable; and (2) if so, the recommended configurations to restart the job.
- **ConfigService** is an online configuration service.
The recommended configurations are saved in **ConfigService** as a JSON patch, with a scope defined to specify the jobs that can use the recommended configurations. When **Scheduler** calls **ConfigService** to get the recommended configurations, it passes the original configurations to **ConfigService**, and **ConfigService** returns the mutated configurations by applying the JSON patch to the original configurations. **Scheduler** can then restart the job with the mutated configurations (including the recommended configurations).
- **Pensive** is an error classification service that leverages the rule-based classifier. It calls **Nightingale** to get recommendations and stores the recommendations in **ConfigService** so that they can be picked up by **Scheduler** to restart the job.
- **Scheduler** is the service scheduling jobs (our current implementation is with [Netflix Maestro](https://netflixtechblog.com/orchestrating-data-ml-workflows-at-scale-with-netflix-maestro-aaa2b41b800c)). Whenever a job fails, it calls **Pensive** to get the error classification to decide whether to restart the job, and calls **ConfigService** to get the recommended configurations for restarting the job.
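The configuration mutation step can be sketched as follows. This is a minimal illustration under assumed Spark parameter names and a simplified RFC 6902-style patch format; the post does not specify ConfigService's actual API or patch dialect.

```python
def apply_json_patch(original: dict, patch: list) -> dict:
    """Apply a simplified RFC 6902-style JSON patch (add/replace/remove on
    top-level keys only) and return the mutated configurations."""
    mutated = dict(original)  # leave the original configurations untouched
    for op in patch:
        key = op["path"].lstrip("/")
        if op["op"] in ("add", "replace"):
            mutated[key] = op["value"]
        elif op["op"] == "remove":
            mutated.pop(key, None)
    return mutated

# Hypothetical configurations of a job that failed with an OOM error.
original = {"spark.executor.memory": "4g", "spark.executor.cores": "4"}
# Hypothetical recommendation stored after Pensive calls Nightingale.
patch = [
    {"op": "replace", "path": "/spark.executor.memory", "value": "7g"},
    {"op": "add", "path": "/spark.memory.fraction", "value": "0.8"},
]
mutated = apply_json_patch(original, patch)
print(mutated["spark.executor.memory"])  # -> 7g
```

The original configurations are passed in at request time, so the stored patch stays small and the recommendation can be reused across retries without copying the full job configuration.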
![Figure 3](https://miro.medium.com/v2/resize:fit:1400/1*gyXv3JyvhUODQWecQqy1zg.png)

Figure 3 illustrates the sequence of service calls with Auto Remediation:

1. Upon a job failure, **Scheduler** calls **Pensive** to get the error classification.
2. **Pensive** classifies the error based on the rule-based classifier. If the error is identified to be a memory configuration error or an unclassified error, it calls **Nightingale** to get recommendations.
3. With the obtained recommendations, **Pensive** updates the error classification result and saves the recommended configurations to **ConfigService**, and then returns the error classification result to **Scheduler**.
4. Based on the error classification result obtained from **Pensive**, **Scheduler** determines whether to restart the job.
5. Before restarting the job, **Scheduler** calls **ConfigService** to get the recommended configurations and retries the job with the new configurations.

## Overview

The ML service, i.e., Nightingale, aims to generate a retry policy for a failed job that trades off between retry success probability and job running costs.
It consists of two major components:

- **A prediction model** that jointly estimates a) the probability of retry success, and b) the retry cost in dollars, conditional on the properties of the retry.
- **An optimizer**, which explores the Spark configuration parameter space to recommend a configuration that minimizes a linear combination of retry failure probability and cost.

The prediction model is retrained offline daily, and is called by the optimizer to evaluate each candidate set of configuration parameter values. The optimizer runs in a RESTful service which is called upon job failure. If there is a feasible configuration solution from the optimization, the response includes this recommendation, which ConfigService uses to mutate the configuration for the retry. If there is no feasible solution, in other words, if it is unlikely that the retry will succeed by changing Spark configuration parameters alone, the response includes a flag to disable retries and thus eliminate wasted compute cost.

## Prediction Model

Given that we want to explore how retry success and retry cost might change under different configuration scenarios, we need some way to predict these two values using the information we have about the job. The Data Platform logs both the retry success outcome and the execution cost, giving us reliable labels to work with. Since we use a shared feature set to predict both targets, have good labels, and need to run inference quickly online to meet SLOs, we decided to formulate the problem as a multi-output supervised learning task. In particular, we use a simple feedforward Multilayer Perceptron (MLP) with two heads, one to predict each outcome.

**Training:** Each record in the training set represents a potential retry which previously failed due to memory configuration errors or unclassified errors. The labels are: a) did the retry fail, and b) the retry cost. The raw feature inputs are largely unstructured metadata about the job, such as the Spark execution plan, the user who ran it, the Spark configuration parameters, and other job properties. We split these features into those that can be parsed into numeric values (e.g., the Spark executor memory parameter) and those that cannot (e.g., user name).
We used feature hashing to process the non-numeric values because they come from a high-cardinality and dynamic set of values. We then create a lower-dimensionality embedding which is concatenated with the normalized numeric values and passed through several more layers.<\/p>\n<p id=\"84e8\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx gm bj\"><strong class=\"nc gu\">Inference: <\/strong>Upon passing validation audits, each new model version is stored in <a class=\"af ny\" href=\"https:\/\/metaflow.org\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Metaflow<\/a> Hosting, a service provided by our internal ML Platform. The optimizer makes several calls to the model prediction function for each incoming configuration recommendation request, described in more detail below.<\/p>\n<h2 id=\"8895\" class=\"pl ob gt be oc pm pn dx og po pp dz ok nl pq pr ps np pt pu pv nt pw px py pz bj\">Optimizer<\/h2>\n<p id=\"3c41\" class=\"pw-post-body-paragraph na nb gt nc b nd oy nf ng nh oz nj nk nl pa nn no np pb nr ns nt pc nv nw nx gm bj\">When a job attempt fails, it sends a request to Nightingale with a job identifier. From this identifier, the service constructs the feature vector to be used in inference calls. As described previously, some of these features are Spark configuration parameters which are candidates to be mutated (e.g., spark.executor.memory, spark.executor.cores). The set of Spark configuration parameters was based on the distilled knowledge of domain experts who work extensively on Spark performance tuning. We use Bayesian Optimization (implemented via Meta\u2019s <a class=\"af ny\" href=\"https:\/\/ax.dev\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Ax library<\/a>) to explore the configuration space and generate a recommendation. 
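Conceptually, the evaluate-and-recommend loop looks like the sketch below. Note the hedges: random search stands in for Ax's Bayesian Optimization, `predict` is a made-up stub for the two-headed model, and all thresholds and weights are invented for illustration.

```python
import random

def predict(config: dict) -> tuple[float, float]:
    """Stub for the prediction model: returns (retry failure
    probability, retry cost) for a candidate configuration."""
    mem_gb = config["spark.executor.memory"]
    return max(0.05, 1.0 - mem_gb / 32), mem_gb * 0.10

def recommend(iterations: int = 20,
              feasibility_threshold: float = 0.5,
              cost_weight: float = 0.1):
    """Search the configuration space; return the best feasible
    candidate, or None (meaning: disable the retry)."""
    random.seed(0)  # deterministic for the example
    best, best_score = None, float("inf")
    for _ in range(iterations):
        # Ax would propose this candidate via Bayesian Optimization;
        # random choice over a small grid is a stand-in here.
        candidate = {"spark.executor.memory": random.choice([4, 8, 16, 24]),
                     "spark.executor.cores": random.choice([2, 4, 8])}
        p_fail, cost = predict(candidate)
        score = p_fail + cost_weight * cost
        if p_fail < feasibility_threshold and score < best_score:
            best, best_score = candidate, score
    return best
```

A `None` result corresponds to the "no feasible solution" branch described in the text: the service would then set the flag to disable retries.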
At each iteration, the optimizer generates a candidate parameter value combination (e.g., spark.executor.memory=7192 mb, spark.executor.cores=8), then evaluates that candidate by calling the prediction model to estimate retry failure probability and cost under the candidate configuration (i.e., mutating those values in the feature vector). After a fixed number of iterations is exhausted, the optimizer returns the \u201cbest\u201d configuration solution (i.e., the one which minimized the combined retry failure and cost objective) for ConfigService to use if it is feasible. If no feasible solution is found, we disable retries.<\/p>\n<p id=\"a043\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx gm bj\">One downside of the iterative design of the optimizer is that any bottleneck can block completion and cause a timeout, which we initially observed in a non-trivial number of cases. Upon further profiling, we found that most of the latency came from the candidate generation step (i.e., figuring out which directions to step in the configuration space after the previous iteration\u2019s evaluation results). We found that this issue had been raised to the Ax library owners, who <a class=\"af ny\" href=\"https:\/\/github.com\/facebook\/Ax\/issues\/810\" rel=\"noopener ugc nofollow\" target=\"_blank\">added GPU acceleration options in their API<\/a>. Leveraging this option decreased our timeout rate significantly.<\/p>\n<p id=\"fa6a\" class=\"pw-post-body-paragraph na nb gt nc b nd oy nf ng nh oz nj nk nl pa nn no np pb nr ns nt pc nv nw nx gm bj\">We have deployed Auto Remediation in production to handle memory configuration errors and unclassified errors for Spark jobs. 
Besides the retry success probability and cost efficiency, the impact on user experience is a major concern:<\/p>\n<ul class=\"\">\n<li id=\"bee8\" class=\"na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx pd pe pf bj\"><strong class=\"nc gu\">For memory configuration errors: <\/strong>Auto Remediation improves user experience because a job retry is never successful without a new configuration for memory configuration errors. This means that a successful retry with the recommended configurations can reduce the operational load and save job running costs, while a failed retry does not make the user experience worse.<\/li>\n<li id=\"0267\" class=\"na nb gt nc b nd pg nf ng nh ph nj nk nl pi nn no np pj nr ns nt pk nv nw nx pd pe pf bj\"><strong class=\"nc gu\">For unclassified errors: <\/strong>Auto Remediation recommends whether to restart the job if the error cannot be classified by the existing rules in the rule-based classifier. In particular, if the ML model predicts that the retry is very likely to fail, it will recommend disabling the retry, which saves the cost of running unnecessary retries. For cases in which the job is business-critical and the user prefers to always retry the job even if the retry success probability is low, we can add a new rule to the rule-based classifier so that the same error will be classified by the rule-based classifier next time, skipping the recommendations of the ML service. 
This presents the advantages of the combined intelligence of the rule-based classifier and the ML service.<\/li>\n<\/ul>\n<p id=\"6868\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx gm bj\">The deployment in production has demonstrated that Auto Remediation<em class=\"nz\"> <\/em>can provide effective configurations for memory configuration errors, successfully remediating about 56% of all memory configuration errors without human intervention. It also decreases the compute cost of these jobs by about 50%, because it can either recommend new configurations to make the retry successful or disable unnecessary retries. As the tradeoff between performance and cost efficiency is tunable, we can decide to achieve a higher success rate or more cost savings by tuning the ML service.<\/p>\n<p id=\"5a75\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx gm bj\">It is worth noting that the ML service currently adopts a conservative policy for disabling retries. As discussed above, this is to avoid impacting the cases in which users prefer to always retry the job upon failure. Although these cases are expected and can be addressed by adding new rules to the rule-based classifier, we believe that incrementally tuning the objective function to gradually disable more retries is beneficial for providing a desirable user experience. 
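One way to picture such a conservative policy is as a high confidence bar for disabling a retry (the threshold value here is invented for illustration, not the service's actual setting):

```python
def should_disable_retry(p_retry_failure: float,
                         disable_threshold: float = 0.95) -> bool:
    """Only disable the retry when the predicted failure probability is
    very high. Incrementally lowering `disable_threshold` would disable
    more retries and save more compute, at some risk to user experience.
    """
    return p_retry_failure >= disable_threshold
```

Lowering the bar gradually, while watching for user-facing regressions, is the incremental tuning the paragraph describes.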
Given that the current policy for disabling retries is conservative, Auto Remediation has great potential to eventually bring much more cost savings without affecting the user experience.<\/p>\n<p id=\"7e4c\" class=\"pw-post-body-paragraph na nb gt nc b nd oy nf ng nh oz nj nk nl pa nn no np pb nr ns nt pc nv nw nx gm bj\">Auto Remediation is our first step in leveraging data insights and Machine Learning (ML) to improve user experience, reduce the operational burden, and improve the cost efficiency of the data platform. It focuses on automating the remediation of failed jobs, but also paves the path to automating operations other than error handling.<\/p>\n<p id=\"b619\" class=\"pw-post-body-paragraph na nb gt nc b nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx gm bj\">One of the initiatives we are taking, called <em class=\"nz\">Right Sizing<\/em>, is to reconfigure scheduled big data jobs to request the proper resources for job execution. For example, we have noted that the average requested executor memory of Spark jobs is about four times their max used memory, indicating a significant overprovision. In addition to the configurations of the job itself, the resource overprovision of the container that is requested to execute the job can also be reduced for cost savings. With heuristic- and ML-based methods, we can infer the proper configurations for job execution to minimize resource overprovisioning and save millions of dollars per year without affecting performance. Similar to Auto Remediation, these configurations can be automatically applied via ConfigService without human intervention. Right Sizing is in progress and will be covered in more detail in a dedicated technical blog post later. 
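As a back-of-the-envelope illustration of the overprovision signal behind Right Sizing (the numbers are invented; only the roughly 4x average comes from the text above):

```python
def overprovision_ratio(requested_mb: float, max_used_mb: float) -> float:
    """Ratio of requested to peak-used executor memory; values well
    above 1.0 indicate memory that was paid for but never needed."""
    return requested_mb / max_used_mb

# A job that requests 16 GB but peaks at 4 GB is 4x overprovisioned,
# matching the fleet-wide average noted in the paragraph above.
ratio = overprovision_ratio(requested_mb=16384, max_used_mb=4096)
```

A right-sizing pass could flag jobs whose ratio stays high across many runs as candidates for a smaller memory request.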
Stay tuned.<\/p>\n<p id=\"a363\" class=\"pw-post-body-paragraph na nb gt nc b nd oy nf ng nh oz nj nk nl pa nn no np pb nr ns nt pc nv nw nx gm bj\">Auto Remediation is joint work by engineers from different teams and organizations. This work would not have been possible without their solid, in-depth collaboration. We would like to thank everybody, including Spark experts, data scientists, ML engineers, the scheduler and job orchestrator engineers, data engineers, and support engineers, for sharing context and providing constructive suggestions and valuable feedback (e.g., <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/jzhuge\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">John Zhuge<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/jheua\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Jun He<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/holdenkarau\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Holden Karau<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/samarthjain11\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Samarth Jain<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/julianjaffe\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Julian Jaffe<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/batul-shajapurwala-3274b863\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Batul Shajapurwala<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/michael-sachs-b2453b\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Michael Sachs<\/a>, <a class=\"af ny\" href=\"https:\/\/www.linkedin.com\/in\/fzsiddiqi\/overlay\/about-this-profile\/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Faisal Siddiqi<\/a>).<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>by Binbing Hou, Stephanie Vezich Tamayo, Xiao Chen, Liang Tian, Troy Ristow, Haoyuan Wang, 
Snehal Chennuru, Pawan Dixit. This is the first of a series of posts on our work at Netflix leveraging data insights and Machine Learning (ML) to improve the operational automation around the performance and cost efficiency of big data jobs. Operational [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":123070,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":{"0":"post-123068","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-netflix"},"_links":{"self":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/123068","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/comments?post=123068"}],"version-history":[{"count":0,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/123068\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media\/123070"}],"wp:attachment":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media?parent=123068"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/categories?post=123068"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/tags?post=123068"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}