Recent Developments

bferguson

Presentation Transcript


  1. Recent Developments (15 min)

  2. Previous Topics
     https://exp-platform.com/2018StrataABTutorial/
     • Machine learning in Causal Inference
     • Variance Reduction and Regression Adjustment
     • Beyond Average Treatment Effect, a.k.a. Effect Heterogeneity

  3. Topics
     • Bayes Factor Bounds
     • Treatment Effect Estimation and Empirical Bayes: learn prior from historical data
     If time permits:
     • Continuous Monitoring and Adaptive Experimentation

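  4. Bound Bayes Factor
     • Z-score (or t-stat when sample size is large): $Z \sim N(\mu, 1)$, with $H_0: \mu = 0$
     • We don't know $H_1$, but need $P(\text{Data} \mid H_1)$ to compute the Bayes factor $P(\text{Data} \mid H_1) / P(\text{Data} \mid H_0)$
     • Simple bounds can be derived under different assumptions regarding $H_1$, the distribution of the treatment effect under the alternative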

  5. Opportunistic $H_1$: Likelihood ratio bound
     • The $H_1$ that favors $H_1$ most is to let $\mu = Z$
     • Z has the largest likelihood when $\mu = Z$ (MLE)
     • If we want $H_1$ to be symmetric around 0, e.g. for a two-sided test, the likelihood ratio bound puts $\mu$ at $Z$ with 50% probability and at $-Z$ with 50% probability
     • The likelihood ratio bound is the simplest, and the most aggressive in favor of $H_1$
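
The likelihood ratio bound above can be sketched in a few lines. This is a minimal illustration, assuming the Z-score is modeled as N(mu, 1) with mu = 0 under the null; the function names are hypothetical and not from the talk.

```python
import math


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def lr_bound_one_point(z: float) -> float:
    """Bayes factor when the alternative puts all its mass at mu = z (the MLE)."""
    return normal_pdf(z, mean=z) / normal_pdf(z, mean=0.0)


def lr_bound_symmetric(z: float) -> float:
    """Bayes factor when the alternative puts 50% mass at +z and 50% at -z."""
    marginal_h1 = 0.5 * normal_pdf(z, mean=z) + 0.5 * normal_pdf(z, mean=-z)
    return marginal_h1 / normal_pdf(z, mean=0.0)


if __name__ == "__main__":
    for z in (1.64, 1.96, 2.58):
        print(z, lr_bound_one_point(z), lr_bound_symmetric(z))
```

The one-point version simplifies to exp(z^2 / 2). The symmetric version is always smaller, because half of the alternative's mass sits at -z, far from the observed value.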

  6. Power bound and UMPBT
     • Power bound: assume a given power (or whatever number) when using a given significance level (or another number)
     • UMPBT (Uniformly Most Powerful Bayesian Test): put $\mu$ at $\pm 1.96$ (for $\alpha = 0.05$) with equal probability
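
A two-point alternative of the kind described for the UMPBT can be evaluated directly. The sketch below assumes the Z-score is N(mu, 1) and that the alternative places mu at +1.96 and -1.96 with probability 0.5 each; it only evaluates that alternative and does not derive the UMPBT itself. The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def two_point_bayes_factor(z: float, mu1: float = 1.96) -> float:
    """Bayes factor of H1 (mu = +mu1 or -mu1, each with probability 0.5)
    against H0 (mu = 0), given an observed Z-score z."""
    marginal_h1 = 0.5 * normal_pdf(z, mean=mu1) + 0.5 * normal_pdf(z, mean=-mu1)
    return marginal_h1 / normal_pdf(z, mean=0.0)


if __name__ == "__main__":
    for z in (0.0, 1.0, 1.96, 3.0):
        print(z, two_point_bayes_factor(z))
```

Unlike the likelihood ratio bound, the alternative here is fixed before seeing the data, so a Z-score near 0 yields a Bayes factor below 1, that is, evidence for the null.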

  7. Local bound
     • More realistically, if $H_0$ sets $\mu$ to be 0, then under $H_1$ $\mu$ should be more likely to be close to 0 than far away from 0
     • Local $H_1$: $H_1$ is such that the distribution of $-\log(\text{p-value})$ under $H_1$ has a decreasing hazard rate
     • For a two-sided test, think of it as a unimodal symmetric distribution with mode at 0 whose density decays in both tails under a certain condition
     • Originally derived assuming the p-value under $H_1$ follows Beta($\xi$, 1)
     • Bayes Factor bound = $1 / (-e \, p \log p)$, for $p < 1/e$
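
As a worked illustration of the local bound, the sketch below assumes it takes the form 1 / (-e p log p) for p < 1/e, as in the Sellke et al. calibration listed in the references. Returning 1 for larger p-values is a convention chosen here, and the function name is hypothetical.

```python
import math


def local_bayes_factor_bound(p_value: float) -> float:
    """Upper bound on the Bayes factor in favor of H1 for a given p-value.

    Uses 1 / (-e * p * log(p)), which applies for p < 1/e. For larger
    p-values this sketch returns 1.0 (no evidence either way), which is
    an assumption of this example.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must be strictly between 0 and 1")
    if p_value >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p_value * math.log(p_value))


if __name__ == "__main__":
    for p in (0.1, 0.05, 0.01, 0.005, 0.001):
        print(p, round(local_bayes_factor_bound(p), 2))
```

Under this formula, a p-value of 0.05 corresponds to a Bayes factor of at most about 2.5 in favor of the alternative, far weaker evidence than the likelihood ratio bound suggests for the same data.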

  8. Bayes Factor Bounds

  9. Lack of knowledge about the genuine prior should not prevent you from using Bayes factors and hypothesis classification
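
To make the link between a Bayes factor (or a bound on it) and hypothesis classification concrete, the sketch below turns a Bayes factor into a posterior probability of the alternative. The prior probability of the alternative is an assumption the user must supply, for example from historical experiments; the function name is hypothetical.

```python
def posterior_prob_h1(bayes_factor: float, prior_prob_h1: float) -> float:
    """Posterior probability of H1 given a Bayes factor (H1 vs. H0)
    and an assumed prior probability that H1 is true."""
    if not 0.0 < prior_prob_h1 < 1.0:
        raise ValueError("prior_prob_h1 must be strictly between 0 and 1")
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # With an upper bound on the Bayes factor, the result is an upper
    # bound on the posterior probability of H1.
    for bf in (1.0, 2.5, 14.0):
        print(bf, posterior_prob_h1(bf, prior_prob_h1=0.25))
```

When the Bayes factor is only known up to an upper bound, the output is an upper bound on the posterior probability of the alternative, which is still enough to rule out weak results.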

  10. Treatment Effect Estimation
      Regression/Posterior Mean: $E[\mu \mid \text{Data}]$
      Benefits:
      • Robust against multiple testing and post-inference selection/winner's curse (Why?)
      • Data need to include observations for other metrics, and other replications of the same experiment
      • For any given metric X, Data only need to include all metrics with correlated effects
      • Usually only a small number of metrics have highly correlated effects, so conditioning on each metric alone is good enough
      • Reduced variance: the posterior variance is usually smaller than the sampling variance of the MLE
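
The posterior mean and its reduced variance are easiest to see in the conjugate normal case. This is a minimal sketch, assuming a normal prior with known variance on the true effect and a normally distributed observed delta; the talk also considers Laplace and t priors, which have no such simple closed form. All names are hypothetical.

```python
from typing import Tuple


def normal_posterior(
    observed_delta: float,
    sampling_var: float,
    prior_var: float,
    prior_mean: float = 0.0,
) -> Tuple[float, float]:
    """Posterior mean and variance of the true effect under a normal prior.

    observed_delta ~ N(true_effect, sampling_var)
    true_effect    ~ N(prior_mean, prior_var)
    """
    # Fraction of the observed delta that is kept after shrinkage.
    shrinkage = prior_var / (prior_var + sampling_var)
    post_mean = prior_mean + shrinkage * (observed_delta - prior_mean)
    # Always smaller than sampling_var, since shrinkage < 1.
    post_var = shrinkage * sampling_var
    return post_mean, post_var


if __name__ == "__main__":
    print(normal_posterior(observed_delta=2.0, sampling_var=1.0, prior_var=1.0))
```

With equal prior and sampling variances, the observed delta is shrunk halfway toward the prior mean and the posterior variance is half the sampling variance.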

  11. Simulation study: Prior ~ t(df = 3), trained with Laplace prior
      [Figure: posterior mean vs. MLE. Note the decreasing coefficient.]
      [Figure: coverage of the 95% "confidence interval". Left: posterior mean and variance. Right: MLE and its sampling variance. The posterior shows reduced variance.]

  12. Simulation Study: Estimation curve and true effects
      [Figure: estimation curves for different traffic sizes; axes: True Effect and Observed Delta. Fatter-than-normal tails lead to less shrinkage at the tails.]

  13. Prior Learning
      • When you have a set of historical data
      • Parametric prior model
        • Normal, Laplace, t, etc.
        • Two-group model: the effect is 0 with probability p, and drawn from another distribution with probability 1-p
      • Maximize marginal likelihood: classic empirical Bayes, EM algorithm
        • Fit the unknown parameters first, then apply them to the posterior mean and posterior variance
        • Some parameters, such as the degrees of freedom of t, are very hard to fit
      • Regression based: treat the treatment effect estimation as the goal, and directly fit the parameters to maximize prediction accuracy
        • Issue: unknown ground truth of the effect
        • Solution 1: Experiment splitting (a paper at this conference)
        • Solution 2: Estimate the testing error using SURE (Stein's unbiased risk estimator)
        • Both approaches do not even require a parametric model for the prior; you only need to model the posterior mean directly
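
The SURE idea can be illustrated with the simplest possible model of the posterior mean: a single linear shrinkage factor c applied to every observed delta. This is a sketch under stated assumptions (normally distributed deltas with known sampling variances, estimator c * delta); the linear family and the function names are choices made for this example, not the method used in the talk.

```python
from typing import Sequence


def sure_linear_shrinkage(
    deltas: Sequence[float], sampling_vars: Sequence[float], c: float
) -> float:
    """Stein's unbiased estimate of the total squared error of the
    estimator c * delta, computed without knowing the true effects."""
    return sum(
        (c - 1.0) ** 2 * x * x + (2.0 * c - 1.0) * v
        for x, v in zip(deltas, sampling_vars)
    )


def best_linear_shrinkage(
    deltas: Sequence[float], sampling_vars: Sequence[float]
) -> float:
    """Shrinkage factor c that minimizes SURE, clipped to [0, 1]."""
    total_sq = sum(x * x for x in deltas)
    if total_sq == 0.0:
        return 0.0
    c = 1.0 - sum(sampling_vars) / total_sq
    return min(1.0, max(0.0, c))


if __name__ == "__main__":
    deltas = [0.8, -1.5, 2.1, 0.2, -0.4, 3.0]
    sampling_vars = [1.0] * len(deltas)
    c = best_linear_shrinkage(deltas, sampling_vars)
    print(c, sure_linear_shrinkage(deltas, sampling_vars, c))
```

The minimizer has a closed form because SURE is quadratic in c. Richer models of the posterior mean, such as regressions on sample size or on other metrics, can be fit by minimizing the same kind of risk estimate numerically.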

  14. Treatment Effect Prediction Error for pval<10%
      Simulation, t with df=3; MSE for experiments selected with pval<0.1 (about 19%)
      [Figure: MSE by estimator. Labels in the figure:]
      • MLE
      • Boosting tree: predict the effect using observed delta and sample size. Expect sample size to have a nonlinear effect here.
      • Weighted by sample size
      • Weighted linear regression using observed delta alone
      • Assume Normal prior with mean 0
      • Laplace (double exponential) prior
      • Weighted linear regression but with one more sensitive metric

  15. Continuous Monitoring and Adaptive Experimentation
      • Continuous monitoring and early stopping
      • Interim traffic reallocation: multi-armed bandit
      • Contextual multi-armed bandit and continuous exploration and exploitation in a high-dimensional configuration space
      • Bayesian global optimization in a high-dimensional configuration space

  16. Should you consider these?
      • Do you have a clear and simple goal and constraints?
      • Do you want to watch out for unexpected movements/results?
      • Is the signal/reward instantly observable or relatively short term? Do you randomize by page-view or want to track users over a time period? Do you worry about switching users' experience?
      • Do you expect the treatment effect to be relatively stable over time? Do you want to learn the intraweek pattern of the effect?
      • Is your goal purely exploration, aimed at identifying the winning variant/arm? Do you care about the opportunity cost during the experiment period, e.g. regret?
      • Do you have a large number of variants or a high-dimensional configuration space to optimize for?

  17. Continuous Monitoring
      • Bayes Factor and posterior-based methods support continuous monitoring and early stopping
      • The interpretation of the Bayes Factor and posterior probability holds and requires no adjustment! (Genuine or empirical Bayes prior, or Bayes Factor bound)
      • mSPRT (mixture sequential probability ratio test), used by Optimizely:
        • Look at the Bayes Factor assuming the effect has a normal prior. The unknown prior variance needs to be specified manually or fit from historical data (based on Gold/Silver/Bronze tiers of customers)
        • The test procedure has a one-to-one mapping to posterior-based methods. The KDD 2017 version stops when BF > $1/\alpha$
        • The procedure has a frequentist flavor and does not try to control FDR. Instead it tries to control Type-I error, and needs extra steps for FDR control
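
The mSPRT statistic with a normal mixing prior can be sketched as a ratio of two normal marginal densities. This example assumes a simplified one-sample setting: i.i.d. observations with known variance sigma2, a null effect of 0, a N(0, tau2) prior on the effect, and a stopping threshold of 1/alpha. The names and the one-sample simplification are assumptions of this sketch, not Optimizely's implementation.

```python
import math


def log_mixture_likelihood_ratio(
    sample_mean: float, n: int, sigma2: float, tau2: float
) -> float:
    """Log Bayes factor of H1 (effect ~ N(0, tau2)) against H0 (effect = 0)
    after n observations with known variance sigma2."""
    null_var = sigma2 / n        # variance of the sample mean under H0
    alt_var = null_var + tau2    # marginal variance of the sample mean under H1
    return 0.5 * math.log(null_var / alt_var) + 0.5 * sample_mean ** 2 * (
        1.0 / null_var - 1.0 / alt_var
    )


def should_stop(
    sample_mean: float, n: int, sigma2: float, tau2: float, alpha: float = 0.05
) -> bool:
    """Stop and reject H0 once the Bayes factor reaches 1 / alpha."""
    log_bf = log_mixture_likelihood_ratio(sample_mean, n, sigma2, tau2)
    return log_bf >= math.log(1.0 / alpha)


if __name__ == "__main__":
    print(should_stop(sample_mean=0.05, n=100, sigma2=1.0, tau2=0.01))
    print(should_stop(sample_mean=0.50, n=100, sigma2=1.0, tau2=0.01))
```

Working on the log scale avoids overflow when the evidence is very strong. In an A/B setting the sample mean would be replaced by the difference in means, and sigma2 / n by the variance of that difference.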

  18. Global Optimization/Best Arm Identification
      • Opportunity cost is not part of the concern
      • Pure-exploration multi-armed bandit: no topology or relation between the arms; assumes the rewards are independent of each other
      • Typically arms are configured in a continuous/ordered space, the reward surface tends to be smooth, and rewards at two configurations are more correlated if they are closer
      • Bayesian global optimization = multi-armed bandit + Gaussian Process
        • Closed-form formula
        • Traffic reallocation based on heuristics such as knowledge gradient or expected improvement
        • Some of the structure in the GP can be shared across tasks, using offline data
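
The expected improvement heuristic has a closed form once a Gaussian posterior for each configuration is available. The sketch below assumes the reward is to be maximized and that a Gaussian Process (not implemented here) has already produced a posterior mean and standard deviation per candidate; the function name and the example numbers are hypothetical.

```python
import math


def expected_improvement(
    post_mean: float, post_std: float, best_so_far: float
) -> float:
    """Expected improvement over best_so_far for a candidate whose reward
    has a Gaussian posterior N(post_mean, post_std ** 2)."""
    if post_std <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(0.0, post_mean - best_so_far)
    z = (post_mean - best_so_far) / post_std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (post_mean - best_so_far) * cdf + post_std * pdf


if __name__ == "__main__":
    # Hypothetical posterior summaries (mean, std) for three configurations.
    candidates = {"a": (1.0, 0.1), "b": (0.9, 0.5), "c": (0.5, 1.0)}
    best = 1.0
    scores = {k: expected_improvement(m, s, best) for k, (m, s) in candidates.items()}
    print(max(scores, key=scores.get), scores)
```

Expected improvement rewards both a high posterior mean and high uncertainty, so a configuration with a lower mean but a wide posterior can still be the next one to receive traffic.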

  19. References: Bayes Factor and Estimation
      • "Redefine statistical significance", Benjamin et al., 2017, Nature Human Behaviour
      • "Calibration of p Values for Testing Precise Null Hypotheses", Sellke et al., 2001, The American Statistician
      • "Objective Bayesian two sample hypothesis testing for online controlled experiments", Deng, 2015, WWW
      • "Three Recommendations for Improving the Use of p-Values", Benjamin and Berger, 2019, The American Statistician
      • "Uniformly most powerful Bayesian tests", Johnson, 2013, Annals of Statistics
      • "On p-Values and Bayes Factors", Held and Ott, 2018, Annual Review of Statistics and Its Application
      • "Improving Treatment Effect Estimators Through Experiment Splitting", Coey and Cunningham, 2019, WebConf

  20. References: Continuous Monitoring
      • "Continuous monitoring of A/B tests without pain: Optional stopping in Bayesian testing", Deng et al., 2016, IEEE DSAA
      • "A Sequential Test for Selecting the Better Variant", Ju et al., 2019, WSDM
      • "Peeking at A/B Tests: Why It Matters, and What to Do About It", Johari et al., 2017, KDD
      • "Optional Stopping with Bayes Factors: a categorization and extension of folklore results, with an application to invariant situations", Hendriksen et al., 2018, preprint

  21. References: Adaptive Experimentation
      • "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems", Bubeck and Cesa-Bianchi, 2012, Now Publishers
      • "Constrained Bayesian Optimization with Noisy Experiments", Letham et al., 2018, Bayesian Analysis
      • "Bayesian optimization for policy search via mixed online-offline experimentation", Letham and Bakshy, JMLR

  22. Questions? http://exp-platform.com

  23. Appendix
