
Data Centric Storage: GHT


Presentation Transcript


  1. Data Centric Storage: GHT Brad Karp UCL Computer Science CS 4C38 / Z25 17th January, 2006

  2. One View of Sensor Networks: Querying Zebra Sightings
  • User is remote, connected via a base station
  • How do users pose queries? By event name (e.g., “Zebras?”): Query(“Zebra”) returns {(“Zebra”, i, [u, v]); (“Zebra”, j, [x, y])}
  • Geographic Hash Table (GHT): in-network storage of data; data placement and query routing built on geographic routing
  [Figure: nodes i at (u, v) and j at (x, y) detect zebras; a remote user poses the query via the base station]

  3. Problem: Data Dissemination in Sensornets
  • Sensors numerous and widely dispersed
  • Sensed data must reach remote user
  • Data dissemination problem: how best can we supply measured data to users?
  • Design drivers for system: energy scarce; wireless media prone to contention

  4. Context: Directed Diffusion [Estrin et al., 2000]
  • Data-centric routing: flood queries (interests, e.g., “Zebra?”) by name
  • Return any responses along reverse paths
  [Figure: the query “Zebra?” is flooded through the network; nodes i at (u, v) and j at (x, y) return (“Zebra”, i, [u,v]) and (“Zebra”, j, [x,y]) along the reverse paths]
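The flood-and-reverse-path behaviour of directed diffusion can be sketched in a few lines. The following is a toy illustration only, assuming an unweighted neighbour graph and a single sink; the topology, node names, and function names are invented for the example, not taken from the lecture or the diffusion papers.

```python
# Minimal sketch: a sink floods an interest by name, every node remembers the
# neighbour it first heard the interest from (a "gradient"), and a detecting
# node returns its event along that reverse path.
from collections import deque

def flood_interest(graph, sink):
    """Flood an interest from `sink`; return reverse-path pointers (gradients)."""
    parent = {sink: None}
    frontier = deque([sink])
    while frontier:
        node = frontier.popleft()
        for neighbour in graph[node]:
            if neighbour not in parent:          # first copy heard wins
                parent[neighbour] = node         # remember who forwarded it
                frontier.append(neighbour)
    return parent

def return_event(parent, source):
    """Walk the gradient pointers from the detecting node back to the sink."""
    path = [source]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

# Illustrative 5-node topology; node "ap" is the access point / sink.
graph = {
    "ap": ["a"], "a": ["ap", "b", "c"], "b": ["a", "i"],
    "c": ["a"], "i": ["b"],
}
gradients = flood_interest(graph, "ap")          # flood cost ~ O(n) transmissions
print(return_event(gradients, "i"))              # -> ['i', 'b', 'a', 'ap']
```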

  5. Assumptions, Metrics, Terminology
  • Large-scale networks with known geographic boundaries
  • Users on a WAN; a few APs with WAN uplinks
  • Nodes know their own geographic locations; often needed anyway to annotate sensed data
  • Energy metrics:
    • Total usage: total number of packet txs
    • Hotspot usage: max. number of txs by any one node
  • Event: discrete, named object recognized by a sensor (e.g., “Zebra”)
  • Query: request from a user for data under the same naming scheme

  6. Outline • Motivation and Context • Canonical Data Dissemination Approaches • Geographic Hash Table (GHT) Service • Evaluation in Simulation • Summary

  7. Canonical Approach: Local Storage
  For n nodes, Q event names queried for, and Dq events detected with those names, cost (in pkts):
  • Total: Qn + Dq√n (each query flooded; each matching event returned point-to-point)
  • Hotspot: Q + Dq (at access point)
  [Figure: query “Zebra?” flooded from the access point; events stored at the detecting nodes, so i at (u,v) and j at (x,y) reply with (“Zebra”, i, [u,v]) and (“Zebra”, j, [x,y])]

  8. Canonical Approach: External Storage
  For n nodes, Dt total events detected, cost (in pkts):
  • Total: Dt√n (every event forwarded to external storage as it is detected)
  • Hotspot: Dt (at access point)
  [Figure: (“Cat”, k, [s,t]) from node k at (s, t), (“Zebra”, i, [u,v]) from node i at (u, v), and (“Zebra”, j, [x,y]) from node j at (x, y) are all forwarded to storage outside the network]

  9. Canonical Approach: Data-Centric Storage (DCS)
  For n nodes, Q names queried, Dt total events detected, and Dq of those events detected under queried names, cost (in pkts):
  • Total (full enumeration): Q√n + Dt√n + Dq√n
  • Total (summarization): Q√n + Dt√n + Q√n
  • Hotspot (full enumeration): Q + Dq (at access point)
  • Hotspot (summarization): 2Q (at access point)
  [Figure: the user's query “Zebra?” is routed to the node storing events named “Zebra”; detections by i at (u, v) and j at (x, y) were already stored there]

  10. Cost Comparison of Canonical Approaches
  • Local storage incurs greatest total message count as n grows
  • External storage always sends fewer total messages than DCS
  • When many more event types are detected than queried for, DCS incurs the least hotspot message count
  • DCS permits summarization of events (return multiple events in one packet); a cost sketch follows below
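The cost formulas on slides 7–9 are back-of-the-envelope figures in which a flood costs roughly n transmissions and a point-to-point delivery roughly √n. The sketch below simply plugs an illustrative workload into those formulas; the workload numbers and function names are invented for the example. With many more events detected than queried for, it illustrates the DCS hotspot advantage noted above.

```python
# Approximate packet costs of the three canonical approaches
# (n nodes; Q names queried; Dt events detected in total; Dq detected
# events matching queried names). Illustrative sketch only.
from math import sqrt

def local_storage(n, Q, Dt, Dq):
    return {"total": Q * n + Dq * sqrt(n), "hotspot": Q + Dq}

def external_storage(n, Q, Dt, Dq):
    return {"total": Dt * sqrt(n), "hotspot": Dt}

def dcs(n, Q, Dt, Dq, summarize=False):
    reply = Q if summarize else Dq               # one summary per query vs. one packet per event
    return {"total": (Q + Dt + reply) * sqrt(n),
            "hotspot": Q + (Q if summarize else Dq)}

# Illustrative workload: many event types detected, few queried for.
n, Q, Dt, Dq = 10_000, 10, 5_000, 50
for name, cost in [("local", local_storage(n, Q, Dt, Dq)),
                   ("external", external_storage(n, Q, Dt, Dq)),
                   ("DCS (summarized)", dcs(n, Q, Dt, Dq, summarize=True))]:
    print(f"{name:18s} total={cost['total']:>10.0f}  hotspot={cost['hotspot']:>6.0f}")
```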

  11. Outline • Motivation and Context • Canonical Data Dissemination Approaches • Geographic Hash Table (GHT) Service • Evaluation in Simulation • Summary

  12. Geographic Hash Table: A Sketch
  • Two operations:
    • Put(k, v) stores event v under key k
    • Get(k) retrieves event associated with key k
  • Hash key k into geo coordinates; store and retrieve events for that key at that location
  • Spreads key space storage load evenly across network!
  [Figure: H(“Zebra”) = (a, b); events detected by nodes i at (u, v) and j at (x, y) are stored at (a, b), where the user's query “Zebra?” is also routed]
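A minimal sketch of the Put/Get idea follows, assuming a known rectangular deployment area and using a dictionary as a stand-in for geographic routing to the hashed location. The hash function choice (SHA-256), the boundary dimensions, and the helper names are illustrative assumptions, not the GHT implementation.

```python
# Sketch: hash an event name to a geographic point; Put and Get both target
# that point, so all events stored under one key end up in one place.
import hashlib

WIDTH, HEIGHT = 1000.0, 1000.0        # assumed network boundary in metres

def hash_to_point(key: str):
    """Map an event name to deterministic (x, y) coordinates inside the area."""
    digest = hashlib.sha256(key.encode()).digest()
    x = int.from_bytes(digest[:8], "big") / 2**64 * WIDTH
    y = int.from_bytes(digest[8:16], "big") / 2**64 * HEIGHT
    return (x, y)

storage_at = {}                        # stand-in for "data held at the home node"

def put(key: str, value) -> None:
    point = hash_to_point(key)         # in GHT, a packet is geographically routed to `point`
    storage_at.setdefault(point, []).append(value)

def get(key: str):
    return storage_at.get(hash_to_point(key), [])

put("Zebra", ("Zebra", "i", (200.0, 350.0)))
put("Zebra", ("Zebra", "j", (640.0, 120.0)))
print(get("Zebra"))                    # both zebra sightings, retrieved from one location
```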

  13. Design Criteria for Scalable, Robust DCS
  • Storage system must offer persistence despite node and link failures
  • If node holding k changes, queries and data must make consistent choice of new node
  • Storage shouldn’t concentrate at any one node
  • Storage capacity should increase with node count
  • As ever, avoid traffic concentration, minimize message count

  14. GHT: Home Nodes and Perimeters
  • Likely no node exactly at H(k); hash function ignorant of topology
  • Home node: closest node to point output by H(k)
  • Home perimeter: perimeter enclosing point output by H(k)
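A toy illustration of home-node selection, under the assumption that node positions can be compared globally; in GHT itself this choice falls out of geographic routing toward H(k) rather than an explicit search. The node positions and names below are invented.

```python
# Sketch: the home node for key k is the node geographically closest to H(k).
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def home_node(nodes: dict, hashed_point) -> str:
    """Return the id of the node closest to the hashed point."""
    return min(nodes, key=lambda node_id: distance(nodes[node_id], hashed_point))

nodes = {"i": (200.0, 350.0), "j": (640.0, 120.0), "k": (810.0, 900.0)}
print(home_node(nodes, (600.0, 200.0)))   # -> 'j' for these illustrative positions
```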

  15. Consistency: Perimeter Refresh Protocol (PRP)
  • (k, v) pairs replicated at all nodes on the home perimeter
  • Non-home nodes on the home perimeter: replica nodes
  • Home node sends refresh packets every Th seconds, containing all (k, v) pairs, addressed to H(k)
  • A receiver of a refresh that is closer to H(k) than the originator consumes it and initiates its own
  • A replica node becomes home node if its own refresh returns to it
  • Upon forwarding a refresh, a node resets a takeover timer of Tt seconds; upon expiration, the node generates a refresh for k
  • Death timer: all nodes expire (k, v) pairs they cache after Td seconds; reset every time a refresh for k is received
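The refresh, takeover, and death timers can be summarised as per-node bookkeeping. The sketch below models one node's PRP state for a single key, driven by an external clock; the timer values (standing in for Th, Tt, Td) and the method names are illustrative assumptions, and perimeter forwarding itself is not modelled.

```python
# Per-node PRP bookkeeping sketch for one key.
T_HOME_REFRESH = 10.0   # Th: the home node re-sends refreshes this often
T_TAKEOVER     = 30.0   # Tt: forwarders take over if no refresh is seen
T_DEATH        = 90.0   # Td: cached (k, v) pairs expire after this long

class PrpNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.is_home = False
        self.cache = {}                   # key -> list of replicated values
        self.takeover_deadline = None
        self.death_deadline = None
        self.next_home_refresh = None

    def on_refresh_forwarded(self, key, values, now):
        """Called when this node forwards another node's refresh around the home perimeter."""
        self.cache[key] = list(values)             # keep a replica of all (k, v)
        self.takeover_deadline = now + T_TAKEOVER  # reset takeover timer (Tt)
        self.death_deadline = now + T_DEATH        # reset death timer (Td)

    def on_own_refresh_returned(self, key, now):
        """This node's own refresh travelled the whole perimeter and came back."""
        self.is_home = True                        # no closer node consumed it
        self.next_home_refresh = now + T_HOME_REFRESH

    def on_timer_tick(self, key, now, send_refresh):
        """Drive the Th/Tt/Td timers; `send_refresh(key, values)` injects a refresh."""
        if self.is_home and self.next_home_refresh is not None and now >= self.next_home_refresh:
            send_refresh(key, self.cache.get(key, []))   # periodic home refresh (Th)
            self.next_home_refresh = now + T_HOME_REFRESH
        if self.takeover_deadline is not None and now >= self.takeover_deadline:
            send_refresh(key, self.cache.get(key, []))   # no refresh seen: attempt takeover
            self.takeover_deadline = now + T_TAKEOVER
        if self.death_deadline is not None and now >= self.death_deadline:
            self.cache.pop(key, None)                    # cached (k, v) pairs expire
            self.death_deadline = None
```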

  16. Outline • Motivation and Context • Canonical Data Dissemination Approaches • Geographic Hash Table (GHT) Service • Evaluation in Simulation • Summary

  17. Simulation Parameters

  18. Query Success Rate w/Node Failures (100 Nodes)

  19. Storage per Node w/Node Failures (100 Nodes)

  20. Further Scaling and Robustness Results
  • Mean and maximum storage load per node decrease as node population increases
  • Query success rate above 96% for mobility rates of 0.1 m/s and 1 m/s
  • Query success rate degrades gracefully as alternation between up/down states accelerates
  • Validation of relative message costs of three canonical approaches in simulations of up to 100,000 nodes

  21. Follow-On Work in DCS
  • Mapping geographic boundaries of a network; support hashing to inside a network with changing boundaries
  • DCS without geographic routing: GEM [NeSo03]
  • Range queries for GHT using K-D trees: DIM [LiGo03]
  • Assigning coordinates for geographic routing using only topological knowledge (not, e.g., GPS) [RaRa03]
  • Dealing with non-uniform node distributions; multiple hash functions [GaEs03]

  22. DCS: Summary
  • Three canonical approaches will be useful in data dissemination for sensor networks: local storage, external storage, and data-centric storage
  • Summarization is a key advantage of the DCS approach in reducing hotspot usage and total usage; the home node is a useful aggregation point
  • Sensor applications with many nodes and many event types, not all of them queried, are those where DCS offers the most attractive performance vs. the other canonical approaches
  • GHT spreads storage load evenly across the sensor network
  • GHT offers robust persistence under node failures and mobility, because it binds data to fixed locations rather than to “volatile” nodes
