{"id":23256,"date":"2022-11-24T16:45:17","date_gmt":"2022-11-24T16:45:17","guid":{"rendered":"https:\/\/showbizztoday.com\/index.php\/2022\/11\/24\/survival-analysis-meets-reinforcement-learning\/"},"modified":"2022-11-24T16:45:17","modified_gmt":"2022-11-24T16:45:17","slug":"survival-analysis-meets-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/showbizztoday.com\/index.php\/2022\/11\/24\/survival-analysis-meets-reinforcement-learning\/","title":{"rendered":"Survival Analysis Meets Reinforcement Learning"},"content":{"rendered":"<div>\n<div class=\"published-date\">\n<div class=\"icon-holder\">\n                                                <img decoding=\"async\" src=\"https:\/\/research.atspotify.com\/wp-content\/themes\/spotify\/images\/icon.png\" alt=\"\"\/>\n                                            <\/div>\n<p><span class=\"date\">November 24, 2022<\/span> Published by Lucas Maystre<\/p>\n<\/div>\n<div class=\"img-holder\">\n                                            <img decoding=\"async\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/RS043-Survival-Analysis-Meets-RL_FINAL_NO_LOGO.png\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Survival Analysis Meets Reinforcement Learning\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/RS043-Survival-Analysis-Meets-RL_FINAL_NO_LOGO.png 1201w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/RS043-Survival-Analysis-Meets-RL_FINAL_NO_LOGO-250x131.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/RS043-Survival-Analysis-Meets-RL_FINAL_NO_LOGO-700x368.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/RS043-Survival-Analysis-Meets-RL_FINAL_NO_LOGO-768x404.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/RS043-Survival-Analysis-Meets-RL_FINAL_NO_LOGO-120x63.png 120w\" 
sizes=\"(max-width: 1201px) 100vw, 1201px\"\/>\n                                        <\/div>\n<p><strong>TL;DR<\/strong>: Survival analysis provides a framework to reason about time-to-event data; at Spotify, for example, we use it to understand and predict how users might engage with Spotify in the future. In this work, we bring temporal-difference learning, a central idea in reinforcement learning, to survival analysis. We develop a new algorithm that trains a survival model from sequential data by leveraging a temporal-consistency condition, and show that it outperforms direct regression on observed outcomes.<\/p>\n<h2>Survival analysis<\/h2>\n<p>Survival analysis is the branch of statistics that deals with time-to-event data, with applications across a wide range of domains. Survival models are used by physicians to understand patients\u2019 health outcomes. Such models are also used by engineers to study the reliability of devices ranging from hard drives to vacuum cleaners. At Spotify, we use survival models to understand how users will engage with Spotify at a later date. Such models are crucial to ensure that Spotify makes decisions that are aligned with our users\u2019 long-term satisfaction\u2014from small decisions such as algorithmic recommendations all the way to large changes in the user interface.<\/p>\n<p>Here is a typical scenario for survival analysis. Suppose that we&#8217;re interested in predicting, for every Spotify user with a free account, the time until they convert to a Premium subscription. We call this conversion the \u201cevent\u201d. 
To learn a model that predicts the time-to-event, we start by collecting a dataset of historical observations.<\/p>\n<ol>\n<li>We select a sample of users that were active a few months ago, and collect a feature vector that describes how they were using Spotify back then.<\/li>\n<li>We then fast-forward to the present and check whether they&#8217;ve converted in the meantime. For those users who converted, we record the time at which it happened. Note that many users in the sample will not yet have converted; the technical term for these observations is \u201cright-censored\u201d (the time-to-event is above a given value, but we do not know by how much). These observations still carry useful signals about the time-to-event.<\/li>\n<\/ol>\n<p>With this we have built a dataset of triplets (<em>x<\/em><sub>0<\/sub>, <em>t<\/em>, <em>c<\/em>), one for each user in the sample. We call <em>x<\/em><sub>0<\/sub> the initial state; it describes the user at the beginning of the observation window (i.e., a few months ago). The second quantity, <em>t<\/em>, denotes the time-to-event (if the user has converted since the beginning of the window) or the time until the end of the observation window. Finally, <em>c<\/em> is a binary indicator variable that simply denotes whether the user converted during the observation window (<em>c<\/em> = 0) or not (<em>c<\/em> = 1).<\/p>\n<p>The next step is to posit a model for the data. One model that is very simple and popular is the Cox proportional-hazards model. At Spotify, we have also had good results with Beta survival models. Given a dataset of observations, we can train a model by maximizing its likelihood under the data\u2014a common approach in statistics and machine learning. 
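As a concrete illustration of maximum-likelihood training with right-censored observations, here is a minimal sketch using a toy discrete-time model with a constant per-step hazard. The dataset and the model are made up for illustration; the models mentioned above (Cox, Beta survival) are richer than this:

```python
import math

# Toy dataset of (t, c) pairs: t is the observed time (in months),
# c = 0 if the event (conversion) was observed at time t,
# c = 1 if the observation is right-censored at time t.
data = [(3, 0), (5, 1), (1, 0), (7, 1), (2, 0), (6, 1)]

# Assume a constant per-step conversion probability p, so that
# P(T = t) = (1 - p)**(t - 1) * p and P(T > t) = (1 - p)**t.
def log_likelihood(p, data):
    ll = 0.0
    for t, c in data:
        if c == 0:  # event observed at time t: contributes P(T = t)
            ll += (t - 1) * math.log(1 - p) + math.log(p)
        else:       # right-censored: we only know T > t, contributes P(T > t)
            ll += t * math.log(1 - p)
    return ll

# For this simple model the maximizer has a closed form: the number of
# observed events divided by the total number of user-steps "at risk".
events = sum(1 for _, c in data if c == 0)
steps_survived = sum(t - 1 if c == 0 else t for t, c in data)
p_hat = events / (events + steps_survived)  # 3 / 24 = 0.125
```

Note how the censored observations still contribute to the likelihood through the P(T &gt; t) term, which is exactly the useful signal mentioned above.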
Once we have trained such a model, we can use it to make predictions about users outside of the training dataset. For example, a quantity of interest is<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image1-1.png\" alt=\"\" class=\"wp-image-5160\" width=\"134\" height=\"26\"\/><\/figure>\n<\/div>\n<p>the probability that a user\u2019s time-to-event <em>T<\/em> is greater than <em>k<\/em>, given the user\u2019s initial state <em>x<\/em><sub>0<\/sub>.<\/p>\n<h2>The dynamic setting<\/h2>\n<p>Increasingly, it&#8217;s becoming common to collect multiple measurements over time. That is, instead of only having access to some initial state <em>x<\/em><sub>0<\/sub>, we can also obtain additional measurements <em>x<\/em><sub>1<\/sub>, <em>x<\/em><sub>2<\/sub>, \u2026 collected at regular intervals in time (say, every month). To continue with our example, we observe not just how long it takes until a free user converts but also how their usage evolves over time. 
In medical applications, from an initial state indicating, for instance, features of a patient and a choice of medical treatment, we might observe not just the survival time but rich information on the evolution of their health.\u00a0<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8-700x597.png\" alt=\"\" class=\"wp-image-5161\" width=\"385\" height=\"329\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8-700x597.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8-250x213.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8-768x655.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8-1536x1309.png 1536w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8-120x102.png 120w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image8.png 1999w\" sizes=\"auto, (max-width: 385px) 100vw, 385px\"\/><\/figure>\n<\/div>\n<p>In this dynamic setting, the data consist of sequences of states instead of a single, static vector of covariates. This naturally raises the question: Can we take advantage of sequential data to improve survival predictions?<\/p>\n<p>One approach to doing so is called landmarking. The idea is that we can decompose sequences into multiple simpler observations. 
For example, a sequence that goes through states <em>x<\/em><sub>0<\/sub> and <em>x<\/em><sub>1<\/sub> and then reaches the event can be converted into two observations: one with initial state <em>x<\/em><sub>0<\/sub> and time-to-event <em>t<\/em> = 2, and another one with initial state <em>x<\/em><sub>1<\/sub> and time-to-event <em>t<\/em> = 1.<\/p>\n<p>This is neat, but we argue that we can do even better: we can take advantage of predictable dynamics in the sequences of states. For example, if we know very well what the time-to-event from <em>x<\/em><sub>1<\/sub> is like, we might gain a lot by considering how likely it is to transition from <em>x<\/em><sub>0<\/sub> to <em>x<\/em><sub>1<\/sub>, instead of trying to learn about the time-to-event from <em>x<\/em><sub>0<\/sub> directly.<\/p>\n<h2>A detour: temporal-difference learning<\/h2>\n<p>In our journey to formalizing this idea, we take a little detour through reinforcement learning (RL). We consider the Markov reward process, a formalism frequently used in the RL literature. 
For our purposes, we can think of this process as producing sequences of states and rewards (real numbers): <em>x<\/em><sub>0<\/sub>, <em>r<\/em><sub>1<\/sub>, <em>x<\/em><sub>1<\/sub>, <em>r<\/em><sub>2<\/sub>, <em>x<\/em><sub>2<\/sub>, \u2026 A key quantity of interest is the so-called value function, which represents the expected discounted sum of future rewards from a given state:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"177\" height=\"45\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image2-2.png\" alt=\"\" class=\"wp-image-5162\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image2-2.png 177w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image2-2-120x31.png 120w\" sizes=\"auto, (max-width: 177px) 100vw, 177px\"\/><\/figure>\n<\/div>\n<p>where <em>\u03b3<\/em> is a discount factor. Given sequences of states and rewards, how do we estimate the value function? A natural approach is to use supervised learning to train a model on a dataset of empirical observations mapping a state <em>x<\/em><sub>0<\/sub> to the discounted return <\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"83\" height=\"42\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image3-1.png\" alt=\"\" class=\"wp-image-5163\"\/><\/figure>\n<\/div>\n<p>In the RL literature, this is known as the <em>Monte Carlo method<\/em>.<\/p>\n<p>There is another approach to learning the value function. 
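To make the Monte Carlo method concrete, here is a minimal tabular sketch; the trajectories, state names, and discount factor are toy values of our own, not data from the paper:

```python
from collections import defaultdict

# Each trajectory is a (states, rewards) pair: states x0, x1, ...
# and the rewards r1, r2, ... observed along the way (toy data).
gamma = 0.9
trajectories = [
    (["a", "b", "end"], [1.0, 0.0]),
    (["a", "c", "end"], [0.0, 2.0]),
    (["b", "end"], [0.0]),
]

# Collect the empirical discounted return from every visited state.
returns = defaultdict(list)
for states, rewards in trajectories:
    for i in range(len(rewards)):
        g = sum(gamma**j * rewards[i + j] for j in range(len(rewards) - i))
        returns[states[i]].append(g)

# The Monte Carlo estimate of the value function is the mean return.
V = {s: sum(gs) / len(gs) for s, gs in returns.items()}
# V["a"] == (1.0 + 0.9 * 2.0) / 2 == 1.4
```

In a real application one would fit a parametric model (e.g., a neural network) to these empirical-return targets rather than tabulate them per state.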
We start by taking advantage of the Markov property and rewriting the value function as<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"174\" height=\"22\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image4-2.png\" alt=\"\" class=\"wp-image-5164\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image4-2.png 174w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image4-2-120x15.png 120w\" sizes=\"auto, (max-width: 174px) 100vw, 174px\"\/><\/figure>\n<\/div>\n<p>This is also known as the Bellman equation. It suggests a different way to use supervised learning to learn a value function: instead of defining the regression target as the actual, observed discounted return, define it as the observed immediate reward <em>r<\/em><sub>1<\/sub>, plus a prediction at the next state, <em>\u03b3V<\/em>(<em>x<\/em><sub>1<\/sub>), where the value at <em>x<\/em><sub>1<\/sub> is given by a model. This might seem like circular reasoning (using a model to learn a model!), but in fact this idea is central in reinforcement learning. It is known under the name of temporal-difference learning, and has been a key ingredient in the success of RL applications over the past 30 years.<\/p>\n<h2>Our proposal: temporally-consistent survival regression<\/h2>\n<p>We now return to our dynamic survival analysis setting, and to the problem of predicting the time-to-event. Is there something we can learn from temporal-difference learning in Markov reward processes? On the one hand, there is no notion of reward, discount factor or value function in survival analysis, so at first sight it might seem like we&#8217;re dealing with something very different. 
On the other hand, we&#8217;re also dealing with sequences of states, so perhaps there are some similarities after all.<\/p>\n<p>A crucial insight is the following. If we assume that the sequence of states <em>x<\/em><sub>0<\/sub>, <em>x<\/em><sub>1<\/sub>, \u2026 forms a Markov chain, then we can rewrite the survival probability as<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"313\" height=\"26\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image5-2.png\" alt=\"\" class=\"wp-image-5165\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image5-2.png 313w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image5-2-250x21.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image5-2-120x10.png 120w\" sizes=\"auto, (max-width: 313px) 100vw, 313px\"\/><\/figure>\n<p>for any <em>k<\/em> \u2265 1. Intuitively, this identity states that the survival probability at a given state should be related (on average) to the survival probability at the next state, accounting for the delay. This looks very similar to the Bellman equation above. Indeed, in both cases, we take advantage of a notion of temporal consistency to write a quantity of interest (the value function or the survival probability) recursively, in terms of an immediate observation and a prediction at the next state.<\/p>\n<p>Building on this insight, we develop algorithms that mirror temporal-difference learning, but in the context of estimating a survival model. Instead of using the observed time-to-event (or time to censoring) as the target, we construct a \u201cpseudo-target\u201d that combines the one-hop outcome (whether the event happens at the next step or not) and a prediction about survival at the next state. 
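Here is a minimal tabular sketch of this pseudo-target idea, our own illustration rather than the exact algorithm from the paper: S[x][k] approximates the survival probability P(T &gt; k | x), and each observed transition nudges it towards the pseudo-target.

```python
# S[x][k] approximates P(T > k | x).  For an observed one-step
# transition from state x: if the event fires at the next step, then
# T <= k for every k >= 1, so the pseudo-target is 0; otherwise the
# Markov identity P(T > k | x) = E[ P(T > k - 1 | x') ] over
# event-free transitions suggests the pseudo-target S[x_next][k - 1].
K = 3        # prediction horizon
ALPHA = 0.5  # learning rate (toy value)

def td_survival_update(S, x, event, x_next):
    for k in range(1, K + 1):
        target = 0.0 if event else S[x_next][k - 1]
        S[x][k] += ALPHA * (target - S[x][k])

# Toy run with two made-up states "u" and "v" and no observed events:
# all survival estimates should drift towards 1.  By convention
# P(T > 0 | x) = 1, which anchors the recursion.
S = {x: {k: 1.0 if k == 0 else 0.5 for k in range(K + 1)} for x in ("u", "v")}
for _ in range(50):
    td_survival_update(S, "u", event=False, x_next="v")
    td_survival_update(S, "v", event=False, x_next="v")
```

Compared with regressing on the full observed time-to-event, the target now depends only on the one-hop outcome and on the model&#8217;s own prediction at the next state, mirroring temporal-difference learning.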
This difference is illustrated in the figure below.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"700\" height=\"314\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9-700x314.png\" alt=\"\" class=\"wp-image-5166\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9-700x314.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9-250x112.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9-768x344.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9-1536x689.png 1536w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9-120x54.png 120w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image9.png 1771w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><\/figure>\n<\/div>\n<h2>Benefits of our algorithm<\/h2>\n<p>Our approach can be significantly more data-efficient than maximum-likelihood-style direct regression. That is, our algorithm is able to pick up subtle signals that are predictive of survival even when the size of the dataset is limited. This leads to predictive models that are more accurate, as measured by several performance metrics. We demonstrate these benefits in two ways.<\/p>\n<p>First, we handcraft a task that highlights a setting where our algorithm yields enormous gains. In short, we design a problem where it is much easier to predict survival from an initial state by taking advantage of predictions at intermediate states, as these intermediate states are shared across many sequences (and thus survival from these intermediate states is much easier to learn accurately). We call this the data-pooling benefit, and our approach successfully takes advantage of it. 
The take-away is that enforcing temporal consistency reduces the effect of the noise contained in the observed time-to-event outcomes.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"700\" height=\"263\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1-700x263.png\" alt=\"\" class=\"wp-image-5169\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1-700x263.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1-250x94.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1-768x289.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1-1536x578.png 1536w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1-120x45.png 120w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image7-1.png 1999w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><\/figure>\n<\/div>\n<p>Second, we empirically evaluate models learned using our algorithm on real-world datasets. To facilitate reproducibility, we focus on publicly available medical datasets recording survival outcomes of patients diagnosed with an illness. For each patient, biomarkers are recorded at study entry and at regular follow-up visits. In addition, we also consider a synthetic dataset. In each case, we measure a model\u2019s predictive performance as a function of the number of training samples. Models trained using our approach systematically result in better predictions, and the difference is particularly strong when the number of samples is low. 
In the figure below, we report the concordance index, a popular metric to evaluate survival predictions (higher is better).<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"700\" height=\"177\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2-700x177.png\" alt=\"\" class=\"wp-image-5168\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2-700x177.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2-250x63.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2-768x194.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2-1536x388.png 1536w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2-120x30.png 120w, https:\/\/storage.googleapis.com\/research-production\/1\/2022\/11\/image6-2.png 1999w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><\/figure>\n<\/div>\n<h2>A bridge between RL and survival analysis<\/h2>\n<p>Our paper focuses primarily on using ideas from temporal-difference learning in RL to improve the estimation of survival models. Beyond this, we also hope to build a bridge between the RL and survival analysis communities. To the survival analysis community, we bring temporal-difference learning, a central idea in RL. Conversely, to the RL community, we bring decades of modeling insights from survival analysis. We believe that some RL problems can be naturally expressed in terms of time-to-event (for example, maximizing the length of a session in a recommender system), and we hope that this bridge will be useful. 
In the paper, we briefly sketch how our approach could be extended to problems with actions, paving the way for RL algorithms tailored to survival settings.<\/p>\n<p>If you are interested in getting hands-on with this, we encourage you to check out our <a href=\"https:\/\/github.com\/spotify-research\/tdsurv\" target=\"_blank\" rel=\"noopener\">companion repository<\/a>, which contains a reference Python implementation of the algorithms we describe in the paper.\u00a0For more information, please refer to our paper:<\/p>\n<p><a href=\"https:\/\/research.atspotify.com\/publications\/temporally-consistent-survival-analysis\/\" target=\"_blank\" rel=\"noopener\">Temporally-Consistent Survival Analysis<\/a><br \/>Lucas Maystre and Daniel Russo<br \/>NeurIPS 2022<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>November 24, 2022 Published by Lucas Maystre TL;DR: Survival analysis provides a framework to reason about time-to-event data; at Spotify, for example, we use it to understand and predict how users might engage with Spotify in the future. 
In this work, we bring temporal-difference learning, a central idea in reinforcement learning, to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":23258,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"class_list":{"0":"post-23256","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-spotify"},"_links":{"self":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/23256","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/comments?post=23256"}],"version-history":[{"count":0,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/23256\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media\/23258"}],"wp:attachment":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media?parent=23256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/categories?post=23256"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/tags?post=23256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}