
Selecting Observations against Adversarial Objectives


Presentation Transcript


1. Selecting Observations against Adversarial Objectives
Andreas Krause, Brendan McMahan, Carlos Guestrin, Anupam Gupta

2. Observation selection problems
• Set V of possible observations (sensor locations, …)
• Want to pick a subset A* ⊆ V maximizing a utility: A* = argmax_{|A| ≤ k} F(A)
• For most interesting utilities F, this is NP-hard!
Applications: detect contaminations in water networks; place sensors for building automation; monitor rivers and lakes using robots.

3. Key observation: Diminishing returns
Placement A = {S1, S2} vs. placement B = {S1, …, S5}: adding a new sensor S' to the small placement A helps a lot; adding S' to the large placement B doesn't help much.
Formalization: Submodularity. For all A ⊆ B:
F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B)
(see the coverage-function sketch below)
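A quick way to see diminishing returns is with a coverage function, which is submodular. A minimal Python sketch; the sensor regions are made-up toy data:

```python
# Coverage function F(A) = size of the union of regions covered by sensors in A.
# Submodular: the marginal gain of a new sensor shrinks as the placement grows.
regions = {
    "S1": {1, 2, 3},
    "S2": {3, 4},
    "S3": {4, 5, 6},
    "S_new": {2, 3, 4},   # hypothetical new sensor S'
}

def coverage(A):
    """F(A) = |union of the regions covered by the sensors in A|."""
    return len(set().union(*[regions[s] for s in A]))

A = {"S1"}               # small placement
B = {"S1", "S2", "S3"}   # superset of A
gain_A = coverage(A | {"S_new"}) - coverage(A)  # marginal gain at A
gain_B = coverage(B | {"S_new"}) - coverage(B)  # marginal gain at B
assert gain_A >= gain_B   # diminishing returns
print(gain_A, gain_B)     # prints: 1 0
```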

4. Submodularity [with Guestrin, Singh, Leskovec, VanBriesen, Faloutsos, Glance]
We prove submodularity for
• Mutual information F(A) = H(unobs) − H(unobs | A) — UAI '05, JMLR '07 (spatial prediction)
• Outbreak detection F(A) = impact reduction when sensing at A — KDD '07 (water monitoring, …)
Also submodular:
• Geometric coverage F(A) = area covered
• Variance reduction F(A) = Var(Y) − Var(Y | A)
• …

5. Why is submodularity useful?
Greedy algorithm (forward selection): repeatedly add the observation with the largest marginal gain.
Theorem [Nemhauser et al. '78]: The greedy algorithm gives a constant-factor approximation, F(A_greedy) ≥ (1 − 1/e) F(A_opt), i.e. about 63% of optimal.
• Can get online (data-dependent) bounds for any algorithm
• Can significantly speed up the greedy algorithm
• Can use MIP / branch & bound for the optimal solution
A sketch of the greedy algorithm follows this list.
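A minimal Python sketch of greedy forward selection for max_{|A| ≤ k} F(A); `F` is any monotone submodular set function you supply (e.g. `coverage` from above):

```python
def greedy(V, F, k):
    """Greedy forward selection: achieves F(A) >= (1 - 1/e) * F(A_opt)
    for monotone submodular F (Nemhauser et al. '78)."""
    A = set()
    for _ in range(min(k, len(V))):
        # Add the element with the largest marginal gain F(A + {s}) - F(A).
        best = max((s for s in V if s not in A),
                   key=lambda s: F(A | {s}) - F(A))
        A.add(best)
    return A

# e.g. greedy({"S1", "S2", "S3"}, coverage, 2) -> {"S1", "S3"}
```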

6. Robust observation selection
What if …
• … the parameters θ of the model P(X_V | θ) are unknown or change?
• … sensors fail?
• … an adversary selects the outbreak scenario?
[Figure: the best placement for parameters θ_old can do poorly when there is more variability under the new parameters θ_new, or when an adversary attacks an uncovered spot.]

7. Robust prediction
[Figure: pH value vs. horizontal position in V, with confidence bands. The typical objective, minimizing average variance (MSE), can give low average variance but a high maximum, in the most interesting part!]
• Instead: minimize the "width" of the confidence bands
• For every location s ∈ V, define F_s(A) = Var(s) − Var(s | A)
• Minimizing the width simultaneously maximizes all F_s(A)
• Each F_s(A) is (often) submodular! [Das & Kempe '07]
(a variance-reduction sketch follows)
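A minimal numpy sketch of the variance-reduction objective for a Gaussian process; the RBF kernel, length scale, and locations are illustrative assumptions, not from the talk:

```python
import numpy as np

def rbf(x, y, ell=0.3):
    """Toy RBF kernel; the talk does not specify a kernel."""
    return np.exp(-((x - y) ** 2) / (2 * ell ** 2))

X = np.linspace(0, 1, 20)        # candidate locations V (horizontal positions)
K = rbf(X[:, None], X[None, :])  # prior covariance over all locations

def var_reduction(s, A):
    """F_s(A) = Var(s) - Var(s | A) = k_sA K_AA^{-1} k_As under the GP prior."""
    if not A:
        return 0.0
    idx = sorted(A)
    K_AA = K[np.ix_(idx, idx)] + 1e-6 * np.eye(len(idx))  # jitter for stability
    k_sA = K[s, idx]
    return float(k_sA @ np.linalg.solve(K_AA, k_sA))

print(var_reduction(10, {8, 12}))  # variance at s=10 explained by sensors at 8, 12
```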

8. Adversarial observation selection
• Given: possible observations V, and submodular functions F_1, …, F_m (e.g., one F_i for each location i)
• Want to solve: max_{|A| ≤ k} min_i F_i(A)
• Can model many problems this way:
• Width of confidence bands: F_i is the variance at location i
• Unknown parameters: F_i is the information gain under parameters θ_i
• Adversarial outbreak scenarios: F_i is the utility for scenario i
• …
• Unfortunately, min_i F_i(A) is not submodular!

9. How does greedy do?
[Example with k = 2: greedy picks z first, and can then choose only x or y, while the optimal solution is {x, y}.] Greedy does arbitrarily badly. Is there something better?
Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.

10. Alternative formulation
• If somebody told us the optimal value c, could we recover the optimal solution A*?
• Need to solve the dual problem: min |A| such that F_i(A) ≥ c for all i
• Is this any easier? Yes, if we relax the constraint |A| ≤ k.

11. Solving the alternative problem
• Trick: for each F_i and c, define the truncation F'_{i,c}(A) = min{F_i(A), c}, and let F'_{avg,c}(A) = (1/m) Σ_i F'_{i,c}(A)
Lemma: min_i F_i(A) ≥ c  ⇔  F'_{avg,c}(A) = c, and F'_{avg,c}(A) is submodular!

12. Why is this useful?
• Can use the greedy algorithm to find an (approximate) solution! (sketch below)
Proposition: The greedy algorithm finds A_G with |A_G| ≤ α k and F'_{avg,c}(A_G) = c, where α = 1 + log max_s Σ_i F_i({s}).
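A minimal sketch of the truncation trick plus greedy covering; `Fs` is a list of submodular set functions, and the small tolerance `eps` is an implementation detail, not from the talk:

```python
def truncated_avg(A, Fs, c):
    """F'_{avg,c}(A) = (1/m) * sum_i min(F_i(A), c); submodular if each F_i is."""
    return sum(min(F(A), c) for F in Fs) / len(Fs)

def greedy_cover(V, Fs, c, eps=1e-9):
    """Greedily add elements until F'_{avg,c}(A) = c, i.e. min_i F_i(A) >= c."""
    A = set()
    while truncated_avg(A, Fs, c) < c - eps:
        candidates = [s for s in V if s not in A]
        if not candidates:        # c is infeasible: even A = V falls short
            break
        A.add(max(candidates,
                  key=lambda s: truncated_avg(A | {s}, Fs, c)))
    return A
```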

13. Back to our example
• Guess c = 1
• First pick x, then pick y → the optimal solution!
• But how do we find c?

14. Submodular Saturation Algorithm ("Saturate")
• Given a set V, an integer k, and functions F_1, …, F_m
• Initialize c_min = 0, c_max = min_i F_i(V)
• Do binary search: c = (c_min + c_max)/2
• Use the greedy algorithm to find A_G such that F'_{avg,c}(A_G) = c
• If |A_G| > α k: c is too high, decrease c_max
• If |A_G| ≤ α k: c is too low, increase c_min
• … until convergence. (A sketch follows.)
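A minimal sketch of Saturate, reusing `truncated_avg` and `greedy_cover` from above; the fixed iteration count stands in for a convergence test:

```python
import math

def saturate(V, Fs, k, iters=30, eps=1e-9):
    """Binary search over c; returns A with min_i F_i(A) >= OPT_k, |A| <= alpha*k."""
    alpha = 1 + math.log(max(sum(F({s}) for F in Fs) for s in V))
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    A_best = set()
    for _ in range(iters):
        c = (c_min + c_max) / 2
        A = greedy_cover(V, Fs, c)
        if truncated_avg(A, Fs, c) < c - eps or len(A) > alpha * k:
            c_max = c                 # c too high
        else:
            c_min, A_best = c, A      # c achievable within alpha*k elements
    return A_best
```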

15. Theoretical guarantees
Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Theorem: Saturate finds a solution A_S such that min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ α k, where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}).
Theorem: If there were a polytime algorithm with a better constant β < α, then NP ⊆ DTIME(n^{log log n}).

16. Experiments
• Minimizing maximum variance in GP regression
• Robust biological experimental design
• Outbreak detection against adversarial contaminations
Goals:
• Compare against the state of the art
• Analyze the appropriateness of the "worst-case" assumption

17. Spatial prediction
[Figure: maximum marginal variance vs. number of sensors for Greedy, Simulated Annealing, and Saturate; lower is better. Left: environmental monitoring; right: precipitation data.]
• Compare to the state of the art [Sacks et al. '88, Wiens '05, …]: highly tuned simulated annealing heuristics (7 parameters)
• Saturate is competitive and faster, and better on larger problems

18. Maximum vs. average variance
[Figure: marginal variance vs. number of sensors; maximum and average variance when optimizing the average (Greedy) vs. the maximum (Saturate). Left: environmental monitoring; right: precipitation data; lower is better.]
• Minimizing the worst case leads to a good average-case score, but not vice versa

19. Outbreak detection
[Figure: detection time (minutes) vs. number of sensors on water networks; maximum and average detection time (DT) for Saturate vs. Greedy and Simulated Annealing; lower is better.]
• Results are even more prominent on water network monitoring (12,527 nodes)

20. Robust experimental design
• Learn the parameters θ of a nonlinear function: y_i = f(x_i, θ) + w
• Choose stimuli x_i to facilitate the MLE of θ
• Difficult optimization problem!
• Common approach: linearization! y_i ≈ f(x_i, θ_0) + ∇f_{θ_0}(x_i)^T (θ − θ_0) + w
• Allows a nice closed-form (fractional) solution
• But how should we choose θ_0? (see the linearization sketch below)
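A minimal numpy sketch of this linearization for the Michaelis-Menten model used later in the experiments; the parameter values below are made up for illustration:

```python
import numpy as np

def f(x, theta):
    """Michaelis-Menten model f(x, theta) = theta1 * x / (theta2 + x)."""
    t1, t2 = theta
    return t1 * x / (t2 + x)

def grad_f(x, theta):
    """Jacobian of f with respect to theta at stimulus x."""
    t1, t2 = theta
    return np.array([x / (t2 + x),              # df/dtheta1
                     -t1 * x / (t2 + x) ** 2])  # df/dtheta2

theta0 = np.array([1.0, 0.5])  # initial guess (made up); the design depends on it
theta = np.array([1.1, 0.6])   # "true" parameters (made up)
x = 0.8
approx = f(x, theta0) + grad_f(x, theta0) @ (theta - theta0)
print(approx, f(x, theta))     # close when theta is near theta0
```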

21. Robust experimental design
• State of the art [Flaherty et al., NIPS '06]:
• Assume a perturbation of the Jacobian ∇f_{θ_0}(x_i)
• Solve a robust SDP against the worst-case perturbation
• Minimize the maximum eigenvalue of the estimation error (E-optimality)
• This paper:
• Assume a perturbation of the initial parameter estimate θ_0
• Use Saturate to perform well against all initial parameter estimates
• Minimize the MSE of the parameter estimate (Bayesian A-optimality, typically submodular!)

22. Experimental setup
• Estimate the parameters of the Michaelis-Menten model (to compare results)
• Evaluate the efficiency of designs: efficiency = (loss of the optimal design, knowing the true parameter θ_true) / (loss of the robust design, assuming a (wrong) initial parameter θ_0)

23. Robust design results
[Figure: efficiency (w.r.t. E-optimality) vs. the initial parameter estimate θ_0 relative to θ_true, under high (left) and low (right) uncertainty in θ_0; curves for Saturate, the SDP (r = 10⁻³ and r = 16.3), and the classical E-optimal design; higher is better.]
• Saturate is more efficient than the SDP when optimizing for high parameter uncertainty

24. Future (current) work
[Figure: trade-off between adversarial score and expected score for k = 5, 10, 15, 20.]
• Incorporating complex constraints (communication, etc.)
• Dealing with large numbers of objectives: constraint generation
• Improved guarantees for certain objectives (sensor failures)
• Trading off worst-case and average-case scores

25. Conclusions
• Many observation selection problems require optimizing an adversarially chosen submodular function
• The problem is not approximable to any factor!
• Presented an efficient algorithm: Saturate
• Achieves the optimal score, with a bounded increase in cost
• The guarantees are the best possible under reasonable complexity assumptions
• Saturate performs well on real-world problems
• Outperforms state-of-the-art simulated annealing algorithms for sensor placement, with no parameters to tune
• Compares favorably with SDP-based solutions for robust experimental design
