
Near-Optimal Sensor Placements in Gaussian Processes


Presentation Transcript


  1. Near-Optimal Sensor Placements in Gaussian Processes Carlos Guestrin Andreas Krause Ajit Singh Carnegie Mellon University

  2. Sensor placement applications • Monitoring of spatial phenomena • Temperature • Precipitation • Drilling oil wells • ... • Active learning, experimental design, ... • Results today not limited to 2 dimensions  [Figures: precipitation data from Pacific NW; temperature data from sensor network]

  3. Deploying sensors: a chicken-and-egg problem • With no data or assumptions about the distribution, we don't know where to place sensors • But without placements, i.e., solving the combinatorial (non-myopic) optimization, we get no data • This deployment: evenly distributed sensors • Considered in: computer science (c.f., [Hochbaum & Maass ’85]) and spatial statistics (c.f., [Cressie ’91]) • But what are the optimal placements???

  4. Strong assumption – sensing radius: each node predicts values of positions within some radius, so placement becomes a covering problem • The problem is NP-complete, but there are good algorithms with approximation guarantees (a PTAS) [Hochbaum & Maass ’85] • Unfortunately, the approach is usually not useful… the assumption is wrong on real data!  For example…

  5. Spatial correlation in precipitation data from the Pacific NW: non-local, non-circular correlations; complex positive and negative correlations  [Figure: correlation structure of the precipitation data]

  6. Complex, noisy correlations • Complex, uneven sensing “region” • Actually, noisy correlations, rather than sensing region

  7. Combining multiple sources of information • Individually, sensors are bad predictors • Combined information is more reliable • How do we combine information? • Focus of spatial statistics  [Figure annotation: “Temp here?”]

  8. Gaussian process (GP) – Intuition • GP – non-parametric; represents uncertainty; complex correlation functions (kernels) • Uncertainty after observations are made  [Figure: y – temperature vs. x – position; “more sure here”, “less sure here”]

  9. Gaussian processes • Prediction after observing a set of sensors A: posterior mean temperature and posterior variance, both defined through the kernel function  [Figures: posterior mean temperature; posterior variance]
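The prediction equations on this slide appeared as images; the standard GP conditioning formulas they refer to (with Σ denoting kernel evaluations between the indicated index sets and x_A the observed values) can be written as:

```latex
\mu_{X \mid A} = \mu_X + \Sigma_{XA}\,\Sigma_{AA}^{-1}\,(x_A - \mu_A),
\qquad
\sigma^2_{X \mid A} = \mathcal{K}(X, X) - \Sigma_{XA}\,\Sigma_{AA}^{-1}\,\Sigma_{AX}
```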

  10. Gaussian processes for sensor placement • Goal: find the sensor placement with least uncertainty after observations • The problem is still NP-complete  Need approximation  [Figures: posterior mean temperature; posterior variance]

  11. Non-myopic placements • Consider myopically selecting the most uncertain location first, then the most uncertain given A1, ..., then the most uncertain given A1 ... Ak-1 • This can be seen as an attempt to non-myopically maximize H(A1) + H(A2 | {A1}) + ... + H(Ak | {A1 ... Ak-1}) • This is exactly the joint entropy H(A) = H({A1 ... Ak})

  12. Entropy criterion (c.f., [Cressie ’91]) • A ← ∅; for i = 1 to k, add the location Xi to A that is most uncertain given the current set A (high uncertainty given A – X is different) • Entropy places sensors along borders – “wasted” information, as observed by [O’Hagan ’78] • Indirect, doesn’t consider the sensing region – no formal non-myopic guarantees   [Figures: temperature data placements under the entropy criterion; uncertainty (entropy) plot]
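The selection rule on this slide was an image; written out from the bullet's description (so treat the exact notation as a reconstruction), the greedy entropy rule is:

```latex
X_i = \arg\max_{X \in V \setminus A} \; H(X \mid A)
```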

  13. Proposed objective function: mutual information • Locations of interest V • Find locations A ⊆ V maximizing the mutual information between A and the rest of V: the uncertainty of uninstrumented locations before sensing minus their uncertainty after sensing • Intuitive greedy rule: pick a location with high uncertainty given A (X is different) and low uncertainty given the rest (X is informative) • Intuitive criterion – locations that are both different and informative • We give formal non-myopic guarantees   [Figures: temperature data placements under entropy and under mutual information]
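The objective and greedy rule on this slide were images; written out from the slide's description (treat the exact notation as a reconstruction):

```latex
F(A) = I(A;\, V \setminus A) = H(V \setminus A) - H(V \setminus A \mid A),
\qquad
X^{*} = \arg\max_{X \in V \setminus A} \; H(X \mid A) - H\bigl(X \mid V \setminus (A \cup \{X\})\bigr)
```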

  14. An important observation • Selecting T1 tells us something about T2 and T5 • Selecting T3 tells us something about T2 and T4 • Now adding T2 would not help much • In many cases, new information is worth less if we know more (diminishing returns)!  [Figure: sensors T1 ... T5]

  15. Submodular set functions • Submodular set functions are a natural formalism for this idea: f(A ∪ {X}) – f(A) ≥ f(B ∪ {X}) – f(B) for A ⊆ B • Maximization of submodular functions is NP-hard  • But…  [Figure: Venn diagram of A ⊆ B and {X}]

  16. How can we leverage submodularity? • Theorem [Nemhauser et al. ’78]: The greedy algorithm guarantees a (1 – 1/e) OPT approximation (~63% of optimal) for monotone submodular functions.
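For concreteness, a minimal sketch of the greedy algorithm the theorem refers to; `f`, `ground_set`, and `k` are placeholders for the (monotone submodular) objective, the candidate locations, and the number of sensors:

```python
# Greedy maximization (Nemhauser et al. '78 setting): repeatedly add the
# element with the largest marginal gain.

def greedy_select(f, ground_set, k):
    selected = set()
    for _ in range(k):
        # pick the element whose addition increases f the most
        best = max(
            (x for x in ground_set if x not in selected),
            key=lambda x: f(selected | {x}) - f(selected),
        )
        selected.add(best)
    return selected
```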


  18. Mutual information and submodularity • Mutual information is submodular: F(A) = I(A; V\A) • So, we should be able to use Nemhauser et al. • But mutual information is not monotone!!! • Initially, adding a sensor increases MI; later, adding sensors decreases MI • F(∅) = I(∅; V) = 0, F(V) = I(V; ∅) = 0, and F(A) ≥ 0 • Even though MI is submodular, we can’t apply Nemhauser et al. Or can we…   [Figure: mutual information vs. number of sensors, from A = ∅ to A = V]

  19. Approximate monotonicity of mutual information • If H(X|A) – H(X|V\A) ≥ 0, then MI is monotonic; but often H(X|A) << H(X|V\A), so MI is not monotonic • Solution: add a grid Z of unobservable locations • If H(X|A) – H(X|Z ∪ V\A) ≥ 0, then MI is monotonic • For a sufficiently fine Z: H(X|A) > H(X|Z ∪ V\A) – MI is approximately monotonic  [Figure: location X, sets A and V\A, and Z – unobservable]

  20. Theorem: Mutual information sensor placement • The greedy MI algorithm provides a constant-factor approximation when placing k sensors, for all ε > 0 • Relies on approximate monotonicity, which holds for a sufficiently fine discretization – poly(1/ε, k, σ, L, M), where σ is the sensor noise, L the Lipschitz constant of the kernels, and M = maxX K(X,X)  [Figure: result of our algorithm within a constant factor of the optimal non-myopic solution]
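The guarantee itself appeared as an equation image; a plausible reconstruction, matching the constant factor and the kε slack described on the surrounding slides, is:

```latex
MI\bigl(A_{\text{greedy}}\bigr) \;\ge\; \left(1 - \frac{1}{e}\right)\bigl(\mathrm{OPT} - k\,\varepsilon\bigr)
```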

  21. Different costs for different placements • Theorem 1: constant-factor approximation of optimal locations – select k sensors • Theorem 2 (cost-sensitive placements): in practice, different locations may have different costs (corridor versus inside wall); given a budget B to spend on placing sensors, we get a constant-factor approximation with the same constant (1 – 1/e), via an algorithm slightly more complicated than greedy [Sviridenko / Krause, Guestrin]

  22. Deployment results • Model learned from 54 sensors • Used the initial deployment to select 22 new sensors • Learned a new GP on test data using just these sensors • Mutual information has 3 times less variance than the entropy criterion  [Figures: posterior mean and posterior variance under the entropy and mutual information criteria; “true” temperature prediction and “true” temperature variance]

  23. Comparing to other heuristics • Greedy – the algorithm we analyze • Random placements • Pairwise exchange (PE) – start with some placement, then swap locations while improving the solution • Our bound enables a posteriori analysis for any heuristic: assume algorithm TUAFSPGP gives results that are 10% better than those of the greedy algorithm; since greedy achieves at least (1 – 1/e) ≈ 63% of the optimum, we immediately know TUAFSPGP is within 1.1 × 63% ≈ 70% of the optimum!  [Plot: mutual information, higher is better]

  24. Precipitation data  [Plots: placements under the entropy criterion vs. the mutual information criterion; higher mutual information is better]

  25. Computing the greedy rule • At each iteration, for each candidate position i ∈ {1,…,N}, we must compute the greedy MI score (the conditional-entropy difference from slide 13); see the sketch below • This requires inverting an N×N matrix – about O(N³) • Total running time for k sensors: O(kN⁴) • Polynomial! But very slow in practice  • Idea: exploit sparsity in the kernel matrix
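A minimal sketch of this per-candidate computation, assuming a dense numpy kernel matrix K (with the sensor noise already on its diagonal) and integer location indices; the names and structure are illustrative, not the authors' code:

```python
import numpy as np

def conditional_variance(K, y, S):
    """Posterior variance of location y given an observed index set S."""
    if len(S) == 0:
        return K[y, y]
    K_SS = K[np.ix_(S, S)]
    k_yS = K[y, S]
    # O(|S|^3) solve -- the source of the O(N^3) cost per candidate
    return K[y, y] - k_yS @ np.linalg.solve(K_SS, k_yS)

def greedy_mi_step(K, A, V):
    """Pick the candidate y maximizing H(y|A) - H(y | V \\ (A ∪ {y}))."""
    best, best_score = None, -np.inf
    for y in V:
        if y in A:
            continue
        rest = [v for v in V if v != y and v not in A]
        # For Gaussians, H(y|S) = 0.5*log(2*pi*e*var), so the entropy
        # difference reduces to the log-ratio of conditional variances.
        score = 0.5 * np.log(conditional_variance(K, y, list(A))
                             / conditional_variance(K, y, rest))
        if score > best_score:
            best, best_score = y, score
    return best
```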

  26. Local kernels • The covariance matrix may have many zeros! Each sensor location is correlated with only a small number of other locations • Exploiting locality: if each location is correlated with at most d others, a sparse representation and a priority-queue trick reduce the complexity from O(kN⁴) to only about O(N log N) • Usually, however, the matrix is only almost sparse
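One natural reading of the "priority queue trick" is lazy evaluation of marginal gains: submodularity means cached gains can only shrink, so a stale value is still a valid upper bound and most candidates never need to be re-scored. A minimal sketch under that assumption, with `gain(x, A)` a placeholder for the MI marginal gain and integer location IDs:

```python
import heapq

def lazy_greedy(gain, ground_set, k):
    A = set()
    # heap entries: (negative cached gain, element, iteration when the gain was computed)
    heap = [(-gain(x, A), x, 0) for x in ground_set]
    heapq.heapify(heap)
    for it in range(1, k + 1):
        while True:
            neg_g, x, stamp = heapq.heappop(heap)
            if stamp == it:        # cached gain is current -> x is the true argmax
                A.add(x)
                break
            # otherwise refresh the stale gain and push it back
            heapq.heappush(heap, (-gain(x, A), x, it))
    return A
```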

  27. Approximately local kernels • The covariance matrix may have many elements close to zero (e.g., with a Gaussian kernel), so the matrix is not sparse • What if we set them to zero? We get a sparse matrix and an approximate solution • Theorem: truncating small entries → small effect on solution quality • If |K(x,y)| ≤ ε, set it to 0; then the quality of the placements is only O(ε) worse
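A minimal sketch of the truncation step, assuming a dense numpy kernel matrix; scipy's CSR format is just one possible sparse representation:

```python
import numpy as np
from scipy.sparse import csr_matrix

def truncate_kernel(K, eps):
    """Zero out near-zero covariances so the kernel matrix becomes sparse."""
    K_trunc = K.copy()
    K_trunc[np.abs(K_trunc) <= eps] = 0.0   # if |K(x,y)| <= eps, set it to 0
    return csr_matrix(K_trunc)
```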

  28. Effect of truncated kernels on the solution – rain data: about 3 times faster, with minimal effect on solution quality  [Plots: effect on solution quality; improvement in running time; higher is better]

  29. Summary • Mutual information criterion for sensor placement in general GPs • Efficient algorithms with strong approximation guarantees: (1-1/e) OPT-ε • Exploiting local structure improves efficiency • Superior prediction accuracy for several real-world problems • Related ideas in discrete settings presented at UAI and IJCAI this year Effective algorithm for sensor placement and experimental design; basis for active learning

  30. A note on maximizing entropy • Entropy is submodular [Ko et al. ’95], but… • A function F is monotonic iff adding X cannot hurt: F(A ∪ X) ≥ F(A) • Remark: entropy in GPs is not monotonic (not even approximately), since H(A ∪ X) – H(A) = H(X|A), and as the discretization becomes finer, H(X|A) → –∞ • So the Nemhauser et al. analysis for submodular functions is not directly applicable to entropy

  31. How do we predict temperatures at unsensed locations? Interpolation? It overfits – and what about far-away points?  [Figure: temperature vs. position]

  32. How do we predict temperatures at unsensed locations? Regression, e.g., y = a + bx + cx² + dx³: few parameters, less overfitting  But how sure are we about the prediction? The regression function has no notion of uncertainty!!!   [Figure: y – temperature vs. x – position]
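A toy illustration of this point (the positions and temperatures below are made up purely for illustration): a cubic fit returns a single predicted value, with no error bars.

```python
import numpy as np

# Hypothetical sensor positions and temperature readings.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([15.0, 17.0, 16.5, 18.0, 17.2])

coeffs = np.polyfit(x, y, deg=3)   # fit y = a + b*x + c*x^2 + d*x^3
print(np.polyval(coeffs, 2.5))     # one number -- no notion of uncertainty
```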
