Secure and Highly-Available Aggregation Queries via Set Sampling

Secure and Highly-Available Aggregation Queries via Set Sampling Haifeng Yu National University of Singapore

Secure Aggregation Queries in Sensor Networks • Multi-hop sensor network with trusted base station • With the presence of malicious (byzantine) sensors • Goal: Count the # of sensors sensing smoke (i.e., satisfying a certain predicate) • Sum, Avg, and other aggregates are similar – see paper • Type-1 attack: Malicious sensors report fake readings • If the # of malicious sensors is small – damage is limited • Not the focus of our work

Secure Aggregation Queries in Sensor Networks • Type-2 attack: Malicious sensors (indirectly) corrupt the readings of other sensors – much larger damage • E.g., in tree-based aggregation • Focus of most research on secure aggregation – our focus too • [Figure: tree-based aggregation toward the base station, with one malicious sensor in the tree]

State-of-Art and Our Goal • Active area in recent years (e.g. [Chan et al.’06], [Frikken et al.’08], [Roy et al.’06], [Nath et al.’09]) • All these approaches focus on detection (i.e., safety only) • Will detect if the result is corrupted • But will not produce a correct result when under attack • Our Goal: Detecting attacks → Tolerating attacks • Safety only → Safety + Liveness • System made harmless → System made useful

Our Approach to Tolerating Attacks • Previous approaches: Fix the security holes in tree-based aggregation • Dilemma in in-network processing • Our novel approach: Use sampling • With MACs on each sample, security comes almost automatically • [Figure: tree-based aggregation example]

Our Approach to Tolerating Attacks • Previous approaches: Fix the security holes in tree-based aggregation • Dilemma in in-network processing • Our novel approach: Use sampling • With MACs on each sample, security comes almost automatically • [Figure: a sampled sensor floods the sample result (with a MAC); the result cannot be modified] • Challenge with sampling: Potentially large overhead

Background: Estimate Count via Sampling • n sensors, b sensors sensing smoke (called black sensors) • Goal: Output an (ε, δ) approximation b’ such that: Pr[|b’ − b| ≤ εb] ≥ 1 − δ • E.g.: Sample 10 sensors and 5 are black → b’ = 0.5n • Classic result: # sensors needed to sample is Θ((n/b) · (1/ε²) · log(1/δ)) • (Prohibitively) expensive for small b
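
The naive estimator from the example on this slide can be sketched as follows. This is only an illustration, not the protocol from the paper: the function name `naive_count_estimate`, the list-of-booleans input, and the seeded `random.Random` generator are all assumptions made for this sketch.

```python
import random

def naive_count_estimate(is_black, num_samples, rng):
    """Estimate the number of black sensors by sampling sensors uniformly.

    is_black: list of booleans, one per sensor (hypothetical input format).
    Returns b' = (fraction of sampled sensors that are black) * n.
    """
    n = len(is_black)
    # Sample with replacement and count how many sampled sensors are black.
    hits = sum(is_black[rng.randrange(n)] for _ in range(num_samples))
    return hits / num_samples * n

# Half of the sensors are black, so the estimate should land near 0.5 * n.
rng = random.Random(0)
sensors = [True] * 5000 + [False] * 5000
estimate = naive_count_estimate(sensors, 1000, rng)
```

When b is small, most samples miss every black sensor and the estimate is often 0, which is the overhead problem the slide points out.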

Reduce the Overhead via Set Sampling • Challenges with small b: • Need many samples to encounter black sensors • Set sampling: Sample a set of sensors together • Binary result will tell whether any sensor in the set is black (but not how many) • Efficient implementation in sensor networks – later • Should be easier to hit sets containing black sensors • How effective will this be? (How many sets do we need to sample to estimate count?)
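
To see why a set is easier to "hit" than a single sensor, the sketch below computes the chance that a set contains at least one black sensor. It assumes, purely for illustration, that a set is a uniformly random subset of distinct sensors; the paper defines its sets differently (via the sampling tree), and the function name `hit_probability` is hypothetical.

```python
def hit_probability(n, b, set_size):
    """Probability that a uniformly random set of `set_size` distinct sensors
    (out of n sensors, b of them black) contains at least one black sensor."""
    if set_size > n - b:
        return 1.0  # the set is too large to avoid every black sensor
    miss = 1.0
    for j in range(set_size):
        # Probability that the (j+1)-th chosen sensor is also non-black.
        miss *= (n - b - j) / (n - j)
    return 1.0 - miss

single = hit_probability(10000, 10, 1)     # one sensor per sample
grouped = hit_probability(10000, 10, 1000) # 1000 sensors per sample
```

Under this assumption, a single-sensor sample hits a black sensor with probability b/n, while a larger set hits far more often, at the cost of learning only a binary answer.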

Our Results • Novel algorithm for estimating count using set sampling • Defines randomized and inter-related sets, and samples them adaptively • # sets needed to sample: • Previously without set sampling: • # of samples reduced from polynomial to polylogarithmic (can be further reduced – see paper)

Our Results • Per-sensor msg complexity: • Comparable to some detection-only protocols [Roy et al.’06] • Similar msg sizes • See paper for time complexity • See paper for other aggregates (sum, avg) • Set sampling + novel algorithms using set sampling → Enables secure aggregation queries despite adversarial interference

Outline of This Talk • Background, goal, and summary of results • Simple implementation of set sampling in sensor networks • Main technical results: Novel algorithm for estimating count via set sampling

Implementing Set Sampling – Non-Secure Version • Example: sample the set {A, B, C, D} • Goal: O(1) per-sensor msg complexity for sampling a set • Request flooded from the base station: O(log n) bits • We use only O(n) (instead of O(2^n)) random sets → O(log n) bits to name a set • Reply: Single bit • Flooded back from all black sensors in the set (e.g., A and C) • Each sensor only forwards the first message received • Base station sees a binary answer • Multiple samples can be taken in one flooding • Our algorithm takes samples in O(log n) sequential stages • Only O(log n) rounds of flooding
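
The request/reply flooding on this slide can be sketched as a small simulation. This is a simplified model under stated assumptions: the network is a hypothetical adjacency dictionary, message loss and collisions are ignored, and the names `flood` and `sample_set` are invented for this sketch.

```python
from collections import deque

def flood(adjacency, sources):
    """Flood a message; each node forwards only the first copy it receives.

    adjacency: dict mapping node -> list of neighbours (hypothetical format).
    Returns (reached, broadcasts), where broadcasts[node] counts how many
    times that node sent the message.
    """
    queue = deque(dict.fromkeys(sources))  # distinct sources, order preserved
    reached = set(queue)
    broadcasts = {node: 0 for node in adjacency}
    while queue:
        node = queue.popleft()
        broadcasts[node] += 1              # each reached node sends once
        for neighbour in adjacency[node]:
            if neighbour not in reached:   # later copies are dropped
                reached.add(neighbour)
                queue.append(neighbour)
    return reached, broadcasts

def sample_set(adjacency, base, members, black):
    """Return the binary answer: is any sensor in `members` black?"""
    reached, _ = flood(adjacency, [base])  # request flood from base station
    repliers = sorted(s for s in members if s in black and s in reached)
    if not repliers:
        return False                       # silence means no black sensor
    reached_by_reply, _ = flood(adjacency, repliers)  # single-bit reply flood
    return base in reached_by_reply

# Line topology: base - A - B - C - D, sampling the set {A, B, C, D}.
network = {"base": ["A"], "A": ["base", "B"], "B": ["A", "C"],
           "C": ["B", "D"], "D": ["C"]}
answer = sample_set(network, "base", {"A", "B", "C", "D"}, {"A", "C"})
```

Because every node sends at most once per flood, the number of messages each sensor sends stays constant no matter how many black sensors reply.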

Implementing Set Sampling – Secure Design • Each set = Some distinct symmetric key K • Preload K onto all sensors in the set • Each sensor should be in only a small number of sets – O(log n) in our protocol • Request: name of K, nonce • Reply: MAC_K(nonce) • Only sensors holding K can generate a valid reply • DoS attacks possible • Can be avoided with improved design – see paper
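
A minimal sketch of the request/reply check is shown below. The slides do not name a MAC algorithm, so HMAC-SHA256 truncated to 8 bytes (the MAC size mentioned on the System Model slide) is an assumption, as are the function names `make_reply` and `verify_reply`.

```python
import hashlib
import hmac
import os

MAC_LEN = 8  # bytes; assumption matching the 8-byte MAC size in the slides

def make_reply(key, nonce):
    """Sensor side: compute MAC_K(nonce) (here: truncated HMAC-SHA256)."""
    return hmac.new(key, nonce, hashlib.sha256).digest()[:MAC_LEN]

def verify_reply(key, nonce, reply):
    """Base station side: accept only a reply produced with key K."""
    return hmac.compare_digest(make_reply(key, nonce), reply)

# The base station picks a fresh nonce for each request.
set_key = os.urandom(16)    # key K preloaded onto every sensor in the set
other_key = os.urandom(16)  # key of some other set
nonce = os.urandom(8)

reply = make_reply(set_key, nonce)
```

A sensor that does not hold K cannot produce a reply that verifies, and the fresh nonce keeps an old reply from being replayed for a new request.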

Outline of This Talk • Background, goal, and summary of results • Implement set sampling in sensor networks • Main technical meat: Novel algorithm for estimating count via set sampling • For now assume all sensors are honest • Security follows from the clean security guarantees of sampling, though some minor modifications needed – see paper

Random Sets on the Sampling Tree • Basic approach: • Construct (related) randomized sets of different sizes and adaptively sample them • Base station internally creates a sampling tree • A complete binary tree with 4n leaves • Each tree node = A distinct symmetric key = Some set of sensors • Sampling tree is an internal data structure and not the network topology

[Figure: sampling tree with sensors A and B placed at leaves] • K1, K2, K5, K10 loaded onto sensor A • K1, K3, K6, K12 loaded onto sensor B • Each sensor is associated with a uniformly random leaf (independently) • Each tree node corresponds to a set containing all the sensors in its subtree
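
The key assignment can be sketched as follows. The heap-style node numbering (root = 1, children of node k are 2k and 2k+1) is an assumption, chosen because it is consistent with the key names shown for sensors A and B; the function names are hypothetical, and the sketch assumes 4n is a power of two.

```python
import random

def keys_for_leaf(leaf_index):
    """Return the key (tree node) indices on the path from the root to a leaf."""
    path = []
    node = leaf_index
    while node >= 1:
        path.append(node)
        node //= 2          # parent in heap-style numbering
    return path[::-1]

def assign_keys(num_sensors, rng):
    """Place each sensor on an independent, uniformly random leaf and
    return the keys it must hold (one key per tree level)."""
    num_leaves = 4 * num_sensors
    if num_leaves & (num_leaves - 1):
        raise ValueError("this sketch assumes 4n is a power of two")
    # Leaves occupy indices num_leaves .. 2 * num_leaves - 1.
    return {sensor: keys_for_leaf(rng.randrange(num_leaves, 2 * num_leaves))
            for sensor in range(num_sensors)}

keys_a = keys_for_leaf(10)  # sensor A in the figure
keys_b = keys_for_leaf(12)  # sensor B in the figure
table = assign_keys(2, random.Random(0))
```

Each sensor holds one key per level, so the number of sets it belongs to grows only logarithmically with n.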

Properties of the Sampling Tree • A sensor is black if it satisfies the predicate • A key is black iff the corresponding set contains a black sensor • Quantity of interest: the fraction of black keys at level i

The fraction of black keys is monotonic as we go down the tree • Decreases by a factor of at most 2 per level • At the top: the fraction is 1 (assuming at least one black sensor) • At the bottom: the fraction is at most b/(4n) ≤ 1/4 (4n leaves!) • Lemma: There exists a level ℓ where the fraction of black keys is neither too small nor too large
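
These properties can be checked on a small simulated tree. The sketch below is an illustration only: the heap-style node numbering and the name `black_fraction_per_level` are assumptions. The interval (1/4, 1/2] used at the end is what follows from the three properties above; the constants in the paper's lemma are not preserved in this transcript and may differ.

```python
import random

def black_fraction_per_level(depth, black_leaves):
    """Fraction of black keys at each level (root = level 0, leaves = level depth).

    black_leaves: set of heap-style leaf indices holding a black sensor.
    """
    fractions = [0.0] * (depth + 1)
    level_nodes = set(black_leaves)
    for level in range(depth, -1, -1):
        fractions[level] = len(level_nodes) / (2 ** level)
        # A key is black iff some key below it is black, so move to parents.
        level_nodes = {node // 2 for node in level_nodes}
    return fractions

rng = random.Random(1)
n, b = 256, 20
depth = 10  # 4n = 1024 = 2**10 leaves
black_leaves = {rng.randrange(2 ** depth, 2 ** (depth + 1)) for _ in range(b)}
fractions = black_fraction_per_level(depth, black_leaves)

# First level whose fraction has dropped to 1/2 or below.
level = next(i for i, f in enumerate(fractions) if f <= 0.5)
```

Since the fraction starts at 1, ends at no more than 1/4, and at most halves per level, the first level at or below 1/2 must still be above 1/4.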

Why Level ℓ Helps • Fraction of black keys not too small → Efficient estimation of the fraction via naïve sampling: • A small number of samples on level ℓ yields an (ε, δ) approximation for the fraction • Fraction of black keys not too large → Can potentially estimate the final count directly from the fraction • Chernoff-type occupancy tail bound for balls into bins • See paper for details

Additional Issues: Too Few Keys on Level ℓ • Challenge: • To estimate the final count based on the fraction of black keys, the number of keys on level ℓ needs to be large enough • If not, need to track down to lower levels • Need to leverage other interesting properties of the sampling tree • See paper

Additional Issues: Finding Level ℓ • Binary search on the O(log n) levels • On each level i examined, sample a small number of random keys to roughly estimate the fraction of black keys • Extremely efficient • Challenges: • The binary search operates on estimated values (which have error and may not be monotonic) • When the fraction is small, the estimate only has an error guarantee on one side • See paper
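
The binary search can be sketched as below. This is a simplified illustration, not the paper's algorithm: it does not handle the estimation-error challenges listed on the slide, the threshold of 1/2 is an assumption, and all function names and the heap-style node numbering are hypothetical.

```python
import random

def black_keys_by_level(depth, black_leaves):
    """For each level, the set of black keys (heap-style node indices)."""
    levels = [set() for _ in range(depth + 1)]
    nodes = set(black_leaves)
    for level in range(depth, -1, -1):
        levels[level] = nodes
        nodes = {node // 2 for node in nodes}  # parents of black keys are black
    return levels

def estimate_fraction(black_keys, level, num_samples, rng):
    """Roughly estimate the black fraction by sampling random keys on a level."""
    first = 2 ** level  # keys on this level are first .. 2 * first - 1
    hits = sum(rng.randrange(first, 2 * first) in black_keys
               for _ in range(num_samples))
    return hits / num_samples

def find_level(depth, estimate, threshold=0.5):
    """Binary search for the shallowest level whose estimate is <= threshold."""
    lo, hi = 0, depth
    while lo < hi:
        mid = (lo + hi) // 2
        if estimate(mid) <= threshold:
            hi = mid
        else:
            lo = mid + 1
    return lo

rng = random.Random(7)
depth = 10
black_leaves = {rng.randrange(2 ** depth, 2 ** (depth + 1)) for _ in range(20)}
levels = black_keys_by_level(depth, black_leaves)
sampled_level = find_level(
    depth, lambda lvl: estimate_fraction(levels[lvl], lvl, 50, rng))
```

With exact fractions the search is guaranteed to land on a level in the target range; with sampled estimates it may be off by a level or so, which is the difficulty the slide defers to the paper.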

Example Numerical Results • n = 10,000 and count result (b) ranges from 0 to 10,000 • Overhead: • 5-15 sequential stages of sampling • Total 250-300 samples • Avg approximation error: (1±0.08) • Hard to get better accuracy even in trusted environments ([Nath et al.’09])… • Naive sampling: 300 samples give the same accuracy only when b > 2,000

Conclusions • Making aggregation queries secure is critical for many sensor network applications • Contribution: Detecting attacks → Tolerating attacks • Safety only → Safety + Liveness • Our approach: • Abandon in-network processing and use sampling • Use novel set sampling to reduce the overhead • Polynomial overhead → Logarithmic overhead

Related Work to Set Sampling • Decision tree complexity for threshold-t functions (i.e., whether b ≥ t) [Ben-Asher and Newman’95] [Aspnes’09] • Most results are for error-free deterministic protocols • Large lower bound: Ω(t) (implying Ω(b) for count) • No prior results for general Monte Carlo randomized algorithms

Tolerating Attacks is Difficult • Example: Byzantine consensus • Detection substantially easier than tolerance • n ≥ 3f + 1 lower bound only applies to tolerance and not detection • Pinpointing / revoking malicious sensors is hard • E.g., due to lack of public-key authentication • Active research area by itself

System Model • Multi-hop sensor network with trusted base station • Performance metric: Time complexity – see paper • Performance metric: Per-sensor msg complexity • Max number of msgs sent/received by a single sensor (captures load balance) • msg size is either 8 bytes (size of a MAC) or log(n) bits • Collision ignored – as in all prior work • Or one can apply existing algorithms…

Implementing Set Sampling – Non-Secure Version • Request size: We use at most O(n) (random) sets → O(log(n)) bits to name a set • Goal: O(1) per-sensor msg complexity for sampling a set • Request flooding – every sensor sends/receives one msg

Implementing Set Sampling – Non-Secure Version • Reply: Single bit • Goal: O(1) per-sensor msg complexity for sampling a set • [Figure: sensors A, B, C, D] B, C, D satisfy the predicate, A does not • Reply flooding – Only the first reply is forwarded • This is why set sampling is designed to be binary

(The overhead of sampling a set needs to be properly controlled – will discuss later.)

Translating the Fraction of Black Keys to b • We now have a good estimate of the fraction of black keys on level ℓ • Need to produce a good estimate of b • Treat the keys on level ℓ as bins • Throw b balls into these bins • The fraction of occupied bins has the same distribution as the fraction of black keys • This distribution is highly concentrated near its mean (Chernoff-type occupancy tail bound), assuming • The fraction is not too close to 1 • The number of bins is not too small
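
One way to turn an occupancy fraction back into a count is to invert the expected occupancy of the balls-into-bins process. The sketch below does that; it is an assumption about how the translation could work, not the estimator from the paper, and the function names are hypothetical.

```python
import math
import random

def expected_occupied_fraction(num_balls, num_bins):
    """Expected fraction of non-empty bins after throwing balls uniformly."""
    return 1.0 - (1.0 - 1.0 / num_bins) ** num_balls

def count_from_fraction(fraction, num_bins):
    """Invert the expected occupancy to estimate the number of balls.

    Requires 0 <= fraction < 1 and num_bins >= 2.
    """
    if not 0.0 <= fraction < 1.0 or num_bins < 2:
        raise ValueError("fraction must be in [0, 1) and num_bins >= 2")
    return math.log(1.0 - fraction) / math.log(1.0 - 1.0 / num_bins)

# Simulate: throw 100 balls (black sensors) into 256 bins (keys on one level).
rng = random.Random(3)
num_bins, true_b = 256, 100
occupied = {rng.randrange(num_bins) for _ in range(true_b)}
estimate = count_from_fraction(len(occupied) / num_bins, num_bins)
```

The inversion becomes unstable as the fraction approaches 1 (the logarithm blows up) and noisy when there are few bins, which matches the two assumptions listed on the slide.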

Summary of Techniques to Achieve the Results • Define randomized sets based on a complete binary tree • Interesting relationships among the sets • Sample the sets adaptively • Leverages Chernoff-type occupancy tail bounds for balls-into-bins
