Data Management in Sensor Networks

Jinbao Li, Zhipeng Cai, and Jianzhong Li

Introduction
  • The purpose of data management in sensor networks is to separate the logical view (name, access, operation) from the physical view of the data
  • Users and applications need not be concerned about the details of sensor networks, but the logical structures of queries
  • From the data management point of view, the data management system of a sensor network can be seen as a distributed database system, but it is different from traditional ones
  • The data management system of a sensor network organizes and manages the sensed information from the monitored area and answers queries from users or applications
  • A sensor network data management system is a system for extracting, storing and managing sensor data
  • This chapter discusses the methods and techniques of data management in sensor networks, including
    • the difference between data management systems in sensor networks and in traditional distributed database systems
    • the architecture of a data management system in a sensor network
    • the data model and the query language
    • the storing and indexing techniques for sensor data
    • the query processing techniques
    • two examples of data management systems in sensor networks: TinyDB and Cougar
1. Difference Between Data Management Systems In Sensor Networks and In Distributed Database Systems
WSN vs. Distributed Database
  • In traditional distributed database systems, data management and query processing are merely applications running on top of the network, and the network details are not a concern. In a sensor network, however, the network details must be taken into account
  • The data produced by a sensor is an infinite data stream, and infinite data streams cannot be managed by traditional database systems.
WSN vs. Distributed Database (cont.)
  • The data sensed by sensors is not accurate
  • The data management system of a sensor network needs to reduce power consumption to extend the network lifetime
  • Traditional database systems do not have the ability to process long-running queries
WSN vs. Distributed Database (cont.)
  • Query processing techniques in a traditional database system are not suitable for sensor networks
    • query optimization technique
    • locks and unlocks the data
    • infinite and uncertain data streams
    • real-time query processing
WSN vs. Distributed Database (cont.)
  • The amount of data from sensors is very large and not all of it can be stored
    • sensor networks need power-efficient, in-network, distributed data processing algorithms
Four system models
    • centralized model
    • semi-distributed model
    • distributed model
    • hierarchical model
2.1 Centralized Model
  • In a centralized model, query processing and access to sensor networks are separated
  • The centralized approach proceeds in two steps
    • data is extracted from the sensor network in a predefined way and is stored in a database located on a central server
    • query processing takes place on the centralized database
Centralized Model (cont.)
  • Disadvantages:
    • the central server is the performance bottleneck and single point of failure
    • all sensors are required to send data to the central server, which incurs large communication cost
2.2 Semi-distributed Model
  • Certain computations can be performed on the raw data at each sensor node
  • Two representative systems for this model: Fjord and Cougar
  • Fjord, part of Telegraph (a developing project at UC Berkeley) [13], is an adaptive dataflow system
  • Fjord has two major components: adaptive query processing engine and sensor proxy
  • Fjord is a query processing engine combining push and pull mechanisms
    • In Fjord, data streams (an infinite sequence of [tuple, timestamp] pairs) are pushed to the query processing engine instead of being pulled as in traditional database systems
    • At the same time, non-sensor data is pulled by the query processing engine
  • Fjord integrates Eddy to adaptively change the execution plans according to computing environments on a tuple-per-tuple basis.
Fjord (cont.)
  • The second major component of Fjord is the sensor proxy, an interface between a single sensor and the query processor, as shown in Figure 1
    • A sensor node only needs to deliver data to its sensor proxy. Then the sensor proxy delivers the data to the query processor.
    • the sensor proxy directs sensors to perform certain local computations to aggregate samples in a predefined way.
  • the sensor proxy actively monitors sensors, evaluates user needs and current power conditions, and appropriately programs and controls the sensors' sampling and delivery rates to achieve acceptable sensor battery lifetime and performance
  • Cougar is a sensor database project developed at Cornell University [7, 4]
  • The basic idea of this project is to push as much computation as possible into the sensor network to reduce the communication between sensor nodes and front-end server(s)
  • In this model, the query workload determines the data that should be extracted from sensors. Only relevant data are extracted from the sensor network
2.3 Distributed Model
  • In this model, each sensor is assumed to have high storage, computation and communication capabilities
  • Distributed Hash Table (DHT)
    • Each sensor samples, senses and detects events. Then a hash function is applied on the event key, and events are stored at a "home" sensor node which is the closest to the hash value of the event key.
    • To process a query, the same hash function is applied first. Then the query is sent to the node with the closest hash value for further processing
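The put/query symmetry above can be sketched in a few lines. This is an illustrative sketch only: the SHA-1-based hash, the 32-bit ID space, and the numeric node IDs are assumptions for the example, not part of any specific DHT implementation.

```python
import hashlib

def key_hash(event_key):
    """Map an event key (a string) to a point in a 32-bit hash space."""
    digest = hashlib.sha1(event_key.encode()).digest()
    return int.from_bytes(digest[:4], "big")

def home_node(node_ids, event_key):
    """The "home" node is the node whose ID is closest to the key's hash value."""
    h = key_hash(event_key)
    return min(node_ids, key=lambda node: abs(node - h))

# Storing an event and querying for it apply the same hash function,
# so both operations arrive at the same home node.
nodes = [912031, 88121144, 2401917472, 3719104051]
store_at = home_node(nodes, "bird-sighting")
query_at = home_node(nodes, "bird-sighting")
assert store_at == query_at
```

Because the hash is deterministic, no lookup table is needed anywhere in the network: any node can compute where an event lives.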
Distributed Hash Table (cont.)
  • This model pushes all computation and communication to sensor nodes.
  • The problem with this model is that sensors are assumed to have almost the same communication and computation capabilities as normal computers
  • DHT supports only key-based queries; other kinds of queries incur a large communication cost.
2.4 Hierarchical model
  • This model includes two layers: sensor network layer and proxy network layer
  • This model combines in-network programming, adaptive query processing and efficient content-based search techniques
  • Sensors have three functions: receiving commands from proxy, performing local computation and delivering data to proxy
  • Sensor nodes receive control commands including sampling rate, delivery rate and operations that need to be performed from the proxy layer.
  • Proxies have five functions: receiving queries from users, issuing control commands and other information to sensors, receiving data from sensors, processing queries, and delivering query results to users
  • each proxy processes the query in a decentralized way and delivers the results to the users.
  • computation and communication loads are distributed among all proxies.
3.1 Data Model
  • The data model in TinyDB simply extends the traditional relational data model. It defines the sensed data as a single, infinitely long logical table
  • This table conceptually contains one row for each reading generated by any sensor, and hence the table can be thought of as streaming infinitely over time
  • Table 1 is an example of the relational table for TinyDB.
Data Model (cont.)
  • A sensor network is looked at as a large distributed database system in the Cougar system developed by Cornell University
  • Each sensor corresponds to a node in a distributed database system and stores part of the data
  • Cougar does not send data at each sensor to a central node for storage and processing. It tries to process data separately within the sensor network
3.2 Query Language
  • Query schemes proposed include snapshot query, continuous query, event-based query, life-cycle based query and accuracy-based query
  • TinyDB's query language is based on SQL, and we will refer to it as TinySQL
  • The TinySQL query language supports selection, projection, setting the sampling rate, group aggregation, user-defined aggregation, event triggers, lifetime queries, setting storage points and simple joins
  • The grammar of the TinySQL query language is as follows:

SELECT select-list

[FROM sensors]

WHERE predicate

[GROUP BY gb-list]

[HAVING predicate]

[TRIGGER ACTION command-name[(param)]]


TinySQL (cont.)
  • examples of TinyDB queries:

SELECT room_no, AVERAGE(light), AVERAGE(volume)

FROM sensors

GROUP BY room_no

HAVING AVERAGE(light) > l AND AVERAGE(volume) > v



FROM sensors

WHERE temp > thresh



TinySQL (cont.)

SELECT nodeID, light

FROM sensors

WHERE light > 200

Query Language in Cougar
  • Cougar also provides an SQL-like query language
  • The grammar of the query language in Cougar is as follows:

SELECT select-list

FROM [Sensordata S]

[WHERE predicate]

[GROUP BY attributes]

[HAVING predicate]

DURATION time-interval

EVERY time-span

Query Language in Cougar (cont.)
  • An example is given below:

SELECT AVG(R.concentration)

FROM ChemicalSensor R

WHERE R.loc IN region

HAVING AVG(R.concentration)>0.6

DURATION (now, now+3600)


Storage and Index Techniques in Sensor Networks
  • For sensor networks, one of the most challenging problems is naming data
  • In data-centric storage systems, every data item generated by a sensor is stored at some sensor(s) in the network according to its name
  • Using the same naming, it is easy to locate the corresponding data in the sensor network
4.1 Data Centric Naming
  • hierarchical naming: data generated by a camera sensor may be given a path-like hierarchical name
  • attribute-value naming scheme, for example:

type = camera

value = image.jpg

location = "CS Dept, Univ. of Southern California"

  • These naming schemes implicitly define a set of ways in which the data may be accessed

4.2 The Performance of Data-centric Storage Systems
  • Data names are used when storing and retrieving data in data-centric storage, which uses a mapping between the name of a data item and a sensor to decide where the data is stored
  • Figure 4 describes such a data-centric storage algorithm [16, 18]. Assume sensor nodes A and B want to insert a data item named bird-sighting and this name is hashed to node C; the data is then routed to node C by the routing protocol.
  • Similarly, a query uses the name of the data to determine the location where the data is stored, and the query is sent to that sensor.
The Performance of Data-centric Storage Systems (cont.)
  • Besides data centric storage, we consider two alternatives:
    • an External Storage scheme in which all events are stored at a node outside the network;
      • for external storage, the cost of accessing events is zero, since all events are available at one node
      • there is an energy cost in sending events to this node, and significant energy is spent at nodes near the external node in receiving all these events (these nodes become hot-spots)
    • a Local Storage scheme where each event is stored at the node at which it is generated;
      • local storage incurs zero communication cost in storing the data, but incurs a large communication cost, a network flood, in accessing the data
The Performance of Data-centric Storage Systems (cont.)
  • analysis shows that the data-centric storage scheme becomes more preferable as the size of the network increases, or when many more events are generated than can be usefully queried
  • Consider a network of n nodes, in which the cost of sending a message to all nodes (e.g., a flood) is O(n) and the cost of sending a message to a designated node is O(√n). See Table 3.

Let Dtotal denote the total number of detected events, Q the number of queries, and Dq the number of events returned as answers to the Q queries
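These asymptotic costs admit a quick back-of-the-envelope comparison. The sketch below counts total messages with constant factors dropped (flood: n, point-to-point: √n); the exact cost formulas are an assumption based on the analysis summarized above, not a protocol measurement.

```python
import math

def external_storage_cost(n, d_total, q, d_q):
    # Every detected event is routed to the external node: d_total point-to-point
    # sends. Access is free, but all this traffic converges on one node (hot-spot).
    return d_total * math.sqrt(n)

def local_storage_cost(n, d_total, q, d_q):
    # Storing is free, but each query floods the whole network;
    # matching events are then routed back point-to-point.
    return q * n + d_q * math.sqrt(n)

def data_centric_cost(n, d_total, q, d_q):
    # Events, queries and answers are all routed point-to-point to/from home nodes.
    return (d_total + q + d_q) * math.sqrt(n)

# Many detected events, few of them queried: data-centric storage beats
# local storage, and the gap widens as n grows.
n, d_total, q, d_q = 10_000, 1_000, 100, 50
assert data_centric_cost(n, d_total, q, d_q) < local_storage_cost(n, d_total, q, d_q)
```

External storage has the lowest total message count in this model, but it concentrates all traffic at one access point, which is exactly the hot-spot problem noted above.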

4.3 Mechanisms for Data-centric Storage
  • the essence of a data-centric storage system is captured at its interface, which supports a put() operation that stores data by name within the network, and a get() operation that retrieves data by name
  • In this section, we describe a system called a Geographic Hash Table (GHT)
  • In a GHT, event names are randomly hashed to a geographic location (i.e., an (x, y) coordinate). This mapping is many-to-one
  • A put() operation stores an event at the node which is the closest to the hashed location, and a get() operation retrieves one or more events from that node
  • Applications can determine which parts of an event name are used to compute the geographic hash value
  • Data is routed to the hashed node by the GPSR routing protocol
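A minimal sketch of the put()/get() interface over a geographic hash. Everything here is illustrative: the SHA-1-based hash, the assumed 100×100 deployment area, and the in-memory storage dict stand in for the real network, and GPSR routing is abstracted into simply picking the node closest to the hashed location.

```python
import hashlib

AREA_W, AREA_H = 100.0, 100.0  # assumed deployment area

def geographic_hash(event_name):
    """Hash an event name to an (x, y) location inside the deployment area."""
    digest = hashlib.sha1(event_name.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * AREA_W
    y = int.from_bytes(digest[4:8], "big") / 2**32 * AREA_H
    return x, y

def closest_node(nodes, loc):
    """The home node: the node geographically closest to the hashed location
    (in GHT, this is where the GPSR perimeter traversal ends up)."""
    return min(nodes, key=lambda p: (p[0] - loc[0]) ** 2 + (p[1] - loc[1]) ** 2)

def put(storage, nodes, event_name, event):
    home = closest_node(nodes, geographic_hash(event_name))
    storage.setdefault(home, []).append(event)

def get(storage, nodes, event_name):
    home = closest_node(nodes, geographic_hash(event_name))
    return storage.get(home, [])

nodes = [(10, 20), (80, 15), (45, 70), (90, 85)]
storage = {}
put(storage, nodes, "bird-sighting", {"count": 2})
assert get(storage, nodes, "bird-sighting") == [{"count": 2}]
```

put() and get() compute the same hashed location, so a retrieval always reaches the node where the event was stored, with no directory lookups.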
GPSR: an Overview
  • GPSR is a geographic routing protocol that was originally designed for mobile ad-hoc networks
  • Given the coordinates of a node, GPSR routes a packet to that node using location information only
  • GPSR contains two different algorithms: greedy forwarding and perimeter forwarding
GPSR (cont.)
  • Greedy forwarding
    • Assume each node in a network knows its own location, and that of its neighbors
    • When a node receives a message destined to location D, it forwards the message to the neighbor closest to D, provided that neighbor is closer to D than the node itself
    • Such a neighbor might not always exist; in this case, GPSR invokes perimeter routing at that node
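Greedy forwarding reduces to one distance comparison per neighbor. A minimal sketch (the node coordinates are hypothetical; returning None models the void case, where real GPSR would switch to perimeter routing):

```python
import math

def greedy_next_hop(current, neighbors, dest):
    """Return the neighbor closest to dest among those strictly closer to dest
    than the current node, or None if no such neighbor exists (a void:
    GPSR would invoke perimeter routing here)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    closer = [n for n in neighbors if dist(n, dest) < dist(current, dest)]
    return min(closer, key=lambda n: dist(n, dest)) if closer else None

# A hop toward (5, 0): neighbor (1, 0) makes progress; (0, 1) and (-1, 0) do not.
assert greedy_next_hop((0, 0), [(1, 0), (0, 1), (-1, 0)], (5, 0)) == (1, 0)
# No neighbor closer to the destination than the current node: a void.
assert greedy_next_hop((0, 0), [(-1, 0), (0, -1)], (5, 0)) is None
```

The strict "closer than myself" test is what guarantees progress and rules out forwarding loops in greedy mode.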
GPSR (cont.)
  • perimeter routing
    • When a packet finds itself at a node which has no neighbors closer to the destination than itself, we say that the packet has encountered a void
    • Voids can result from irregular deployment of nodes, as well as from radio-opaque obstacles
    • A natural technique to route around a void is the right-hand rule (Figure 5)
    • According to this rule, a traversal walks around the perimeter of the void
    • When this traversal reaches a node that is closer to the destination D than the node where perimeter routing began, greedy forwarding resumes.
GPSR (cont.)
  • Assume that GHT hashes an event to a destination location d, and, without loss of generality, that no node exists at that location (Figure 6)
  • When a packet returns in perimeter mode to the node that originated the perimeter traversal, the corresponding event is stored at that node.
GHT Robustness: Perimeter Refresh
  • GHTs use a simple perimeter-refresh protocol to maintain the home node associations.
  • For a given event, a home node will send a refresh message destined to the corresponding location periodically. This refresh message will traverse the perimeter around the specified location
  • Each node also associates a timer with the event. If the timer expires, nodes use this as an indication of the home node failure, and initiate a refresh message themselves.
  • In this manner, home node failures are detected, and the correct new home node is discovered.
GHT Robustness: Perimeter Refresh (cont.)
  • GHTs are able to detect and adjust for the arrival of new nodes into the system.
  • By GPSR's forwarding rules, a new node will initiate a new perimeter traversal that will pass through the previous home node.
  • When the perimeter traversal returns to the new home node, the association between the event and that node will have been completed
GHT Scaling: Structured Replication
  • If many events are hashed to the same location, the home node can become a hot-spot which will affect the performance and the network lifetime
  • To avoid this, structured replication hierarchically decomposes the geographical region enclosing the sensor network in a manner shown in Figure 7
  • A node that generates an event would, instead of storing the event at the root (or home) node, store it at the nearest mirror
  • the query is sent directly to the root, which then forwards the query to its children, and so on, until it arrives at all the mirrors
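With a regular grid decomposition, finding the nearest mirror is pure arithmetic. The sketch below assumes a rectangular deployment area decomposed at depth d into 4^d subregions, with the root's offset replicated in every subregion; the coordinates are hypothetical examples.

```python
def nearest_mirror(root, node, width, height, depth):
    """Structured replication at decomposition depth `depth`: the root location
    is mirrored at the same relative offset inside each of the 4**depth
    subregions, and a node stores events at the mirror in its own subregion."""
    cells = 2 ** depth                      # subregions per axis
    cw, ch = width / cells, height / cells  # subregion size
    off_x, off_y = root[0] % cw, root[1] % ch       # root's offset in its cell
    cell_x = int(node[0] // cw)                     # node's cell column
    cell_y = int(node[1] // ch)                     # node's cell row
    return (cell_x * cw + off_x, cell_y * ch + off_y)

# Depth-1 decomposition of a 100x100 area into 4 quadrants: a node at (10, 15)
# stores its events at the root's mirror in its own quadrant, (20, 30),
# rather than at the distant root (70, 80).
assert nearest_mirror((70, 80), (10, 15), 100, 100, 1) == (20.0, 30.0)
```

A query then walks the hierarchy the other way: it starts at the root and is forwarded down to the mirrors in each subregion, as described above.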
4.6 Distributed Multi-dimensional Indices
  • In this section, we discuss the design of a distributed index structure called Distributed Index for Multidimensional data (DIM) [24, 16] for supporting multi-dimensional queries in sensor networks
  • The key to resolving multi-dimensional range queries efficiently is data locality: events with comparable attribute values are stored nearby
  • The basic insight underlying DIM is that data locality can be obtained by a locality-preserving geographic hash function
  • The geographic hash function finds a locality-preserving mapping from the multi-dimensional space to a 2-d geographic space. This mapping is inspired by k-d trees
4.6 Distributed Multi-dimensional Indices (cont.)
  • In DIM, each network node is mapped to a unique zone
  • Consider a DIM that aims to support m distinct attributes. Let us denote these attributes A1, …, Am. Furthermore, assume that all attribute values have been normalized to be between 0 and 1
  • The DIM hashing scheme assigns a k-bit zone code to an event as follows. For 1 ≤ i ≤ m, if Ai < 0.5, the i-th bit of the zone code is assigned 0, else 1. For m+1 ≤ i ≤ 2m, if Ai-m < 0.25 or 0.5 ≤ Ai-m < 0.75, the i-th bit of the zone code is assigned 0, else 1
    • As an example, consider the event (0.3, 0.8). For this event, the 5-bit zone code is 01110.
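The bit-assignment rule above is equivalent to round-robin binary refinement of each attribute's interval, in the style of a k-d tree. A minimal sketch that reproduces the worked example:

```python
def zone_code(event, k):
    """Compute a k-bit zone code for an event with attributes normalized to
    [0, 1): attributes take turns contributing one bit, and each bit halves
    that attribute's current interval (k-d tree style)."""
    m = len(event)
    lo, hi = [0.0] * m, [1.0] * m
    bits = []
    for i in range(k):
        a = i % m                       # attribute contributing the i-th bit
        mid = (lo[a] + hi[a]) / 2
        if event[a] < mid:
            bits.append("0")
            hi[a] = mid                 # keep the lower half of the interval
        else:
            bits.append("1")
            lo[a] = mid                 # keep the upper half
    return "".join(bits)

# The example from the text: event (0.3, 0.8) gets the 5-bit zone code 01110.
assert zone_code((0.3, 0.8), 5) == "01110"
```

Because nearby attribute values share long code prefixes, events with comparable values land in the same or adjacent zones, which is exactly the locality property DIM relies on.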
4.6 Distributed Multi-dimensional Indices (cont.)
  • To insert an event e, DIM computes the zone code of e. GPSR is then used to route e to the zone Z whose zone code shares the longest matching prefix with e's zone code.
  • The range of the query [0.3, 0.5] × [0.4, 0.8] intersects a set of zones, and the query can be resolved by routing the request to the nodes that own those zones.
Most queries about the data in sensor networks can be classified into three types:
    • Historical query
    • Snapshot query
    • Continuous query
5.1 Centralized and Distributed Query Processing
  • Centralized query processing contains two steps:
    • First, it periodically retrieves data from the sensor network and stores the data at a centralized database.
    • Second, it processes the queries on the centralized database.
    • These two steps can be executed at the same time
  • A centralized approach is suitable for historical query processing
  • centralized query processing can be applied only when sensors have a sufficient power supply and a low sampling rate
5.1 Centralized and Distributed Query Processing (cont.)
  • In Distributed Query Processing, a query determines which data should be retrieved from the sensor network and the aggregations in the query are processed in-network
  • Plan A in Figure 12 shows the centralized query processing technique. Every sensor returns the current temperature at a user-defined sampling rate
  • Plan A wastes the limited energy, increases the load of the sensor network, produces unnecessary transmissions and causes congestion in the network.
  • In Plan B, only qualified data are involved in aggregation and only partial aggregates are sent to the central database to derive the final results. Consequently, this reduces the communication traffic in sensor networks and saves bandwidth
5.2 Aggregation Processing in Queries
  • In a centralized aggregation, a client host first gathers readings from all the sensors and then computes the aggregate result
  • In the distributed approach, aggregation is achieved by the collaboration of many sensor nodes. While routing the data, sensor nodes compute the whole or partial aggregate value of the data they transfer. Finally, those aggregate results are sent to the client host.
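For a decomposable aggregate such as AVG, each node forwards only a small partial state instead of its raw readings. A minimal sketch of the (sum, count) decomposition (the readings and the tree shape are illustrative):

```python
def local_partial(readings):
    """A node summarizes its own readings as a (sum, count) partial aggregate."""
    return (sum(readings), len(readings))

def merge(p, q):
    """Partial aggregates merge pairwise while being routed up the tree."""
    return (p[0] + q[0], p[1] + q[1])

def finalize(partial):
    """Only the client host (or tree root) turns the partial state into AVG."""
    s, c = partial
    return s / c

# Two leaves send partials to a parent, which merges them with its own
# readings and forwards a single (sum, count) pair instead of four readings.
leaf_a = local_partial([20.0, 21.0])
leaf_b = local_partial([24.0])
parent = merge(merge(leaf_a, leaf_b), local_partial([23.0]))
assert finalize(parent) == 22.0
```

MIN, MAX, COUNT and SUM decompose the same way; holistic aggregates such as MEDIAN do not, and need different techniques.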
5.3 Continuous Query Processing
  • After a continuous query is proposed, the global query processor decomposes the query into a set of sub-queries and sends them to the corresponding sensors to process
  • Continuously Adaptive Continuous Queries over Streams (CACQ) can be used as a local query processor in sensor networks
  • It will be introduced in two aspects: single continuous queries and multiple continuous queries
A sensor network data management system is a system for extracting, storing and managing sensor data
  • It mainly concerns query optimization and processing of sensor data.
  • We present two examples of a sensor network data management system: TinyDB of UC Berkeley and COUGAR of Cornell.
6.1 Sensor Network Data Management System - TinyDB
  • Given a query specifying your data interests, TinyDB collects that data from motes in the environment, filters it, aggregates it together, and routes it out to a PC
  • Some of the features of TinyDB include:
    • Metadata Management
    • High Level Queries
    • Network Topology
    • Multiple Queries
    • Incremental Deployment
TinyDB (cont.)
  • The TinyDB system can be broadly classified into two subsystems: Sensor Network Software and Client Interface
  • Each node in the sensor network must be installed with the Sensor Network Software (TinyQP)
  • Sensor Network Software is the heart of TinyDB; it runs on each mote in the network and consists of several major pieces:
Sensor Network Software
  • Sensor Catalog and Schema Manager
    • The catalog is responsible for tracking the set of attributes, or types of readings (e.g., light, sound, voltage) and properties (e.g., network parent, node ID) available on each sensor
  • Query Processor
    • The query processor uses the catalog to fetch the values of local attributes, receives sensor readings from neighboring nodes over the radio, combines and aggregates these values together, filters out undesired data, and outputs values to parents
Sensor Network Software (cont.)
  • Memory Manager
    • TinyDB extends TinyOS with a small, handle-based dynamic memory manager
  • Network Topology Manager
    • TinyDB manages the connectivity of motes in the network, to efficiently route data and query sub-results through the network.
    • The network topology is maintained as a routing tree, with Mote #0 at the root
    • Query messages flood down the tree in a straightforward fashion. Data messages flow back up the tree
Client Interface
  • There are two kinds of user interface in TinyDB. The first is an SQL-like query language, called TinySQL. The second is a Java-based Client Interface, which supports client programming
  • The TinyDB Java-based Client Interface consists of a set of Java classes and applications
  • TinyOS components that can be used in TinyDB:
    • Clock, GenericComm, Leds, RandomLFSR, …
6.2 Cougar System
  • The Cougar system consists of three components: the QueryProxy, FrontEnd, and GUI
  • QueryProxy, which is the core of Cougar, runs on each sensor node in the network to parse and execute queries. It is a small piece of software for query processing in a sensor network
    • Communications within the sensor network are transmitted using Directed Diffusion (flooding) and are formatted as XML
QueryProxy of Cougar
  • Cougar divides the nodes in the sensor network into several clusters. Each cluster contains several nodes but only one leader
  • The QueryProxy system has a hierarchical structure with cluster leaders communicating with the FrontEnd and with the other sensor nodes in their clusters