Netflix was thrilled to be the premier sponsor for the 2nd year in a row at the 2023 Conference on Digital Experimentation (CODE@MIT) in Cambridge, MA. The conference features a balanced blend of academic and industry research from some wicked smart folks, and we're proud to have contributed a number of talks and posters along with a plenary session.
Our contributions kicked off with an idea that's core to our understanding of A/B tests: surrogates!
Our first talk was given by Aurelien Bibaut (with co-authors Nathan Kallus, Simon Ejdemyr and Michael Zhao), in which we discussed how to confidently measure long-term outcomes using short-term surrogates in the presence of bias. For example, how can we estimate the effects of innovations on retention a year later without running all our experiments for a year? We proposed an estimation method that uses cross-fold procedures and constructs valid confidence intervals for long-term effects before the effect is fully observed.
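To make the general idea concrete (this is a generic cross-fitting sketch on simulated data, not the estimator from the talk): regress the long-term outcome on the short-term surrogate using out-of-fold data, predict each unit's long-term outcome with the held-out fit, and form a normal-approximation confidence interval for the treatment effect on those predictions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated illustration: treatment flag t, short-term surrogate s,
# and the (eventually observed) long-term outcome y.
n = 2000
t = rng.integers(0, 2, n)
s = 0.5 * t + rng.normal(0, 1, n)
y = 2.0 * s + rng.normal(0, 1, n)

# Cross-fitting: fit the surrogate -> outcome regression on out-of-fold
# data, then predict each unit's long-term outcome with the held-out fit.
k = 5
folds = np.array_split(rng.permutation(n), k)
y_hat = np.empty(n)
for fold in folds:
    train = np.setdiff1d(np.arange(n), fold)
    slope, intercept = np.polyfit(s[train], y[train], 1)
    y_hat[fold] = slope * s[fold] + intercept

# Treatment-effect estimate on the predicted outcomes, with a 95% CI.
diff = y_hat[t == 1].mean() - y_hat[t == 0].mean()
se = np.sqrt(y_hat[t == 1].var(ddof=1) / (t == 1).sum()
             + y_hat[t == 0].var(ddof=1) / (t == 0).sum())
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(f"effect estimate {diff:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

Cross-fitting keeps the unit being predicted out of the data used to fit its prediction, which is what allows valid inference on the downstream effect estimate.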
Later on, Michael Zhao (with Vickie Zhang, Anh Le and Nathan Kallus) spoke about the evaluation of surrogate index models for product decision making. Using 200 real A/B tests run at Netflix, we showed that surrogate-index models built on only 2 weeks of data lead to the same product ship decisions ~95% of the time as a call based on 2 months of data. This means we can reliably run shorter tests with confidence, without needing to wait months for results!
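The evaluation itself boils down to a simple agreement rate: for each test, compare the ship/no-ship call implied by the surrogate-index estimate against the call implied by the long-term metric. A minimal sketch on simulated effect estimates (the effect sizes, noise level, and threshold here are illustrative assumptions, not Netflix's numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-test treatment-effect estimates: one from a surrogate
# index built on 2 weeks of data, one from the full 2-month metric.
n_tests = 200
long_term_effect = rng.normal(0.0, 1.0, n_tests)
surrogate_effect = long_term_effect + rng.normal(0.0, 0.3, n_tests)

def ship_decision(effect, threshold=0.0):
    """Ship when the estimated effect clears the threshold."""
    return effect > threshold

# Fraction of tests where both estimates imply the same ship decision.
agreement = np.mean(
    ship_decision(surrogate_effect) == ship_decision(long_term_effect)
)
print(f"decision agreement: {agreement:.0%}")
```

Disagreements concentrate in tests whose true effect sits near the decision threshold, which is exactly where a longer test adds the most value.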
Our next topic focused on how to understand and balance competing engagement metrics; for example, should 1 hour of gaming equal 1 hour of streaming? Michael Zhao and Jordan Schafer shared a poster on how they built an Overall Evaluation Criterion (OEC) metric that provides a holistic evaluation for A/B tests, appropriately weighting different engagement metrics to serve a single overall objective. This new framework has enabled fast and confident decision making in tests, and is being actively adapted as our business continues to expand into new areas.
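At its simplest, an OEC of this kind is a weighted combination of per-metric treatment effects. A minimal sketch, where the metric names and weights are purely illustrative assumptions, not the actual weighting from the poster:

```python
# Illustrative weights encoding how much each engagement metric counts
# toward the single overall objective.
weights = {"streaming_hours": 1.0, "gaming_hours": 0.8}

def oec(effects: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-metric treatment effects into one OEC score."""
    return sum(weights[name] * effects.get(name, 0.0) for name in weights)

# Hypothetical treatment effects from one A/B test.
treatment_effects = {"streaming_hours": 0.05, "gaming_hours": -0.02}
score = oec(treatment_effects, weights)
print(f"OEC score: {score:+.3f}")  # prints "OEC score: +0.034"
```

The hard part, of course, is not the arithmetic but choosing weights that genuinely reflect long-term business value as the product expands into new areas.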
In the second plenary session of the day, Martin Tingley took us on a compelling and fun journey through complexity, exploring key challenges in digital experimentation and how they differ from the challenges faced by agricultural researchers a century ago. He highlighted different areas of complexity and offered perspectives on how to tackle the right challenges based on business objectives.
Our final talk was given by Apoorva Lal (with co-authors Samir Khan and Johan Ugander), in which we showed how partial identification of the dose-response function (DRF) under non-parametric assumptions can be used to provide more insightful analyses of experimental data than the standard ATE analysis does. We revisited a study that algorithmically reduced like-minded content, and showed how we could extend the binary ATE finding to answer how the amount of like-minded content a user sees affects their political attitudes.
We had a blast connecting with the CODE@MIT community and bonding over our shared enthusiasm for not only rigorous measurement in experimentation, but also stats-themed stickers and swag!
We look forward to next year's iteration of the conference and hope to see you there!
Psst! We're hiring Data Scientists across a variety of domains at Netflix; check out our open roles.