Data-centric view of sensornets: An Overview


Puru Kulkarni

Vijay Sundaram

Bhuvan Urgaonkar

Motivation
  • Ubiquitous presence of sensor networks
    • Communication, computation, limited storage, sensing capabilities
    • Used to sense, actuate, control
    • Sensors everywhere = Data everywhere!
  • Require an infrastructure for data access and storage
  • Sensors sense/generate data
  • Users/Applications interested in data or some measure of data
  • Common user operations are:
    • Queries and Monitoring
    • Actuate and Control
Typical Queries
  • Historical
    • What is the average rainfall over past 2 days?
  • Current
    • What is the current temperature in Rm# 226?
  • Long Running
    • Temperature in Rm# 226 over the next 4 hours every 30 seconds
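The three query classes above can be sketched as operations over timestamped readings. This is only an illustration: the `Reading` tuple, the function names, and the sample values are assumptions, not part of the presentation.

```python
from typing import NamedTuple

class Reading(NamedTuple):
    time_s: float   # seconds since an arbitrary start (hypothetical unit)
    value: float

def historical_average(readings, now_s, window_s):
    """Historical query: average of the readings inside the past window."""
    recent = [r.value for r in readings if now_s - window_s <= r.time_s <= now_s]
    return sum(recent) / len(recent) if recent else None

def current_value(readings):
    """Current query: the most recent reading, if any."""
    return max(readings, key=lambda r: r.time_s).value if readings else None

def long_running_schedule(start_s, duration_s, period_s):
    """Long-running query: the times at which a sample should be reported."""
    count = int(duration_s // period_s)
    return [start_s + i * period_s for i in range(1, count + 1)]

# Synthetic readings, one per hour.
readings = [Reading(0, 2.0), Reading(3600, 4.0), Reading(7200, 6.0)]
# "Over the next 4 hours every 30 seconds"
schedule = long_running_schedule(0, 4 * 3600, 30)
```

Under these assumptions the long-running example produces 480 reports (4 × 3600 / 30), which hints at why such queries are usually evaluated as streams rather than as repeated one-shot requests.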
  • How to identify relevant sensors?
  • Computation vs. Communication tradeoff
    • Where to process query?
      • inside the sensor network (route query)
        • Need new techniques
      • at a centralized location (route data)
        • Large amounts of data transfer (not efficient)
        • Data gathering may not reflect query rate
    • How to process query?
      • queries on streaming data
DataSpace: Querying and Monitoring Deeply Networked Collections in Physical Space
T. Imielinski and S. Goel, Rutgers University
  • Billions of objects populate space
  • Each produces and locally stores data
  • Location aware
  • Can be selectively monitored, queried and controlled
  • Physical world enhanced with data
  • Dataspace
    • Data lives on the object
    • Users access not only “local” information but can navigate entire dataspace
    • Spatial world divided in 3-D datacubes
      • e.g., CS Bldg., street, block, etc.
    • Communication, messaging and computation techniques for querying and monitoring required
Querying and Monitoring
  • Queries are spatially driven
  • Steps:
    • Identify relevant datacubes
    • Identify relevant nodes (dataflocks)
      • Datacube directory service
    • Aggregation for queries on several datacubes
      • e.g.: Information about Manhattan taxi cabs
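The steps above (find the datacubes, find the nodes in them, aggregate) can be sketched as follows. The cube size, the `DatacubeDirectory` class, and the sample nodes are hypothetical; the presentation does not specify how the directory service is implemented.

```python
from collections import defaultdict

CUBE_SIZE = 100.0  # hypothetical edge length of a datacube, in metres

def datacube_of(x, y, z):
    """Map a 3-D position to the integer index of the datacube containing it."""
    return (int(x // CUBE_SIZE), int(y // CUBE_SIZE), int(z // CUBE_SIZE))

class DatacubeDirectory:
    """Toy directory service: datacube index -> {node id: latest reading}."""
    def __init__(self):
        self._nodes = defaultdict(dict)

    def register(self, node_id, position, reading):
        self._nodes[datacube_of(*position)][node_id] = reading

    def nodes_in(self, cube):
        return dict(self._nodes.get(cube, {}))

def average_over(directory, cubes):
    """Aggregate one value across all nodes in several datacubes."""
    values = [v for c in cubes for v in directory.nodes_in(c).values()]
    return sum(values) / len(values) if values else None

directory = DatacubeDirectory()
directory.register("a", (10, 20, 0), 20.0)
directory.register("b", (150, 20, 0), 24.0)
directory.register("c", (160, 30, 0), 28.0)
```

A query spanning several datacubes, such as the taxi cab example, would pass every relevant cube index to `average_over`.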
Architecting DataSpace
  • Network as DataSpace engine
    • multicast mechanisms (each node has an IP address!)
    • group membership based on
      • physical location
      • attribute (temperature, #vehicles etc)
    • multicast fits selective node addressing criteria to access relevant data
      • e.g.: what is average temperature in CS Bldg?
      • The query reaches only sensors that are in the CS Bldg datacube and have the corresponding group address
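Selective addressing through group membership can be simulated in a few lines. This is a sketch under stated assumptions: a group is modelled as a (datacube, attribute) pair, and the `Node` class and sample sensors are invented for illustration. No real multicast sockets are involved.

```python
class Node:
    def __init__(self, name, datacube, attributes):
        self.name = name
        self.datacube = datacube
        self.attributes = attributes  # e.g. {"temperature": 21.5}

    def groups(self):
        """A node joins one group per (datacube, attribute) pair it can serve."""
        return {(self.datacube, attr) for attr in self.attributes}

def multicast(nodes, datacube, attribute):
    """Deliver a query only to members of the matching group."""
    group = (datacube, attribute)
    return [n for n in nodes if group in n.groups()]

nodes = [
    Node("s1", "CS Bldg", {"temperature": 20.0}),
    Node("s2", "CS Bldg", {"temperature": 22.0, "vehicles": 3}),
    Node("s3", "Library", {"temperature": 18.0}),
]
# "What is the average temperature in CS Bldg?"
members = multicast(nodes, "CS Bldg", "temperature")
average = sum(n.attributes["temperature"] for n in members) / len(members)
```

The library sensor never sees the query, which is the point of addressing by location and attribute rather than by node identity.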
Network as DataSpace engine

[Figure: DataSpace address, with one part based on the location of the datacube and one part based on the attribute of interest]
  • Space Handle encodes datacube information
  • Subject Handle encodes the attributes that are part of a multicast group
  • Dataspace address is an IPv6 multicast address

E.g.: Space handle: 224.4.5

Subject handle: 8

Dataspace address:

Geographic Routing Infrastructure
  • Route message based on physical location rather than IP address
    • Use GPS coordinates for locations
  • Avoids use of multicast for routing queries to datacubes
  • Once the query reaches a region, use multicast
Geographic Routing Infrastructure
  • Geo-router (routes based on datacube location)
  • Geo-node (issue query to nodes in datacube)
  • Geo-host (processes geographic messages)
  • Approach
    • Route query to datacube
    • Geo-nodes route query within datacube
      • multicast with a TTL of 1
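The two-stage approach can be sketched as below. The slides do not say how geo-routers choose the next hop, so this example assumes simple greedy forwarding (always hand the query to the neighbour closest to the target). The router positions, links, and local sensor lists are all hypothetical.

```python
import math

# Hypothetical geo-router positions and links.
positions = {"A": (0, 0), "B": (5, 0), "C": (10, 0), "D": (10, 5)}
links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Hypothetical sensor nodes one hop away from each geo-node.
local_nodes = {"C": ["c1"], "D": ["d1", "d2"]}

def greedy_route(source, target_xy):
    """Forward to the neighbour closest to the target until no neighbour is closer."""
    path = [source]
    current = source
    while True:
        best = min(links[current], key=lambda n: math.dist(positions[n], target_xy))
        if math.dist(positions[best], target_xy) >= math.dist(positions[current], target_xy):
            return path
        current = best
        path.append(current)

def deliver(geo_node, ttl=1):
    """With a TTL of 1 the query reaches only nodes one hop from the geo-node."""
    return local_nodes.get(geo_node, []) if ttl >= 1 else []

path = greedy_route("A", (10, 6))
recipients = deliver(path[-1])
```

Greedy forwarding can get stuck when no neighbour is closer to the target; a deployable scheme would need a recovery rule, which this sketch omits.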
  • The Sensor Network as a Database
      • Govindan, Hellerstein, Hong, Madden, Franklin, Shenker
  • Querying the Physical World
      • Bonnet, Gehrke, Seshadri
Sensornet Database architecture
  • Given a routing and access mechanism, how to process queries?
  • Provide a DB-view to users/apps
    • well understood programming interface
    • common data operations use computation in network
      • help energy-efficiency
    • allow users to be unaware of actual network, but treat it as a database
    • Sensor Network + Data => Sensor Network Database
What is required?
  • Core DB operations tailored for sensor networks
  • Design appropriate building blocks for DB operations
    • Join, aggregation, grouping, selection etc
Sensornet Database Architecture
  • Two important ideas:
  • in-network implementations of primitive database query operators such as grouping, aggregation, and joins
    • group communication and routing protocols with possible processing at intermediate nodes implement the operator in an application independent way
Sensornet Database Architecture
  • Relax the semantics of database queries to allow approximate results
    • relaxation enables energy-efficient implementations even given the expected high level of network dynamics
  • A sensor network is a proxy for a continuous real-world phenomenon, and by nature samples that phenomenon discretely at some rate, with some degree of error.
In-network Implementation
  • JOIN operator
    • selection over cross-product of a pair of tables
    • Tuples generated at different nodes might be joined at a single node
    • Some JOIN implementations are blocking
  • Blocking is infeasible in sensor networks
    • tables can contain unbounded streams of data
    • amount of memory available is limited
  • Need to retool these operations
    • Pipelining
    • Partitioning
Non-Blocking Pipelined Joins
  • Symmetric hash-join:
    • Maintains two hash tables (keyed by the column(s) used for the join)
    • On an input tuple, looks up matching tuples from other input’s hash table
    • Outputs any matching results
  • Ripple joins:
    • Statistically sample the two tables to be joined, in order to produce a stream of joined tuples
    • Relative rates at which the two tables are sampled adapt to match the variance produced by the data in each
    • low-energy approach to obtain approximate answers
  • Partitioning:
    • tuples are partitioned based on their join-column values and redistributed on the fly across multiple nodes;
    • the work of joining the individual partitions is done in parallel by each of the nodes
  • Partitions can be defined by value, geographically, or by sensor type, and a node (or nodes) can be designated to perform the join for the partition
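A minimal sketch of the symmetric hash-join described above: each arriving tuple is stored in its own side's hash table and immediately probed against the other side's table, so results stream out without waiting for either input to finish. The class name, the `room` join column, and the sample tuples are assumptions for illustration.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Non-blocking equi-join over two tuple streams, 'left' and 'right'."""
    def __init__(self, key):
        self.key = key
        # One hash table per input, keyed by the join column.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, row):
        """Store the row, probe the other input's table, return new matches."""
        other = "right" if side == "left" else "left"
        k = row[self.key]
        self.tables[side][k].append(row)
        matches = self.tables[other].get(k, [])
        if side == "left":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]

join = SymmetricHashJoin(key="room")
out = []
out += join.insert("left", {"room": 226, "temp": 21.0})
out += join.insert("right", {"room": 226, "light": 300})
out += join.insert("right", {"room": 101, "light": 120})
out += join.insert("left", {"room": 226, "temp": 21.5})
```

Both hash tables grow with the streams, so on memory-limited nodes this sketch would need a bound such as a time window or eviction policy, which is not shown here.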
In-network Implementation
  • Aggregation operators
      • summarization of one or more columns into a single numerical value, e.g., SUM, COUNT, AVERAGE, MIN, MAX, etc.
      • query is flooded in the network and the responses are routed on the reverse path trees
      • results aggregated across several nodes
      • E.g: to calculate AVERAGE each node returns (SUM, COUNT) values to parent
      • Can be a very common operator
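The AVERAGE example can be sketched as a recursive walk over the reverse-path tree, where each node merges its own reading with the (SUM, COUNT) pairs returned by its children. The tree shape and the readings are hypothetical.

```python
# Hypothetical routing tree (parent -> children) with one reading per node.
children = {"root": ["n1", "n2"], "n1": ["n3", "n4"], "n2": [], "n3": [], "n4": []}
reading = {"root": 20.0, "n1": 22.0, "n2": 18.0, "n3": 24.0, "n4": 26.0}

def partial_state(node):
    """Return (SUM, COUNT) for the subtree rooted at node."""
    total, count = reading[node], 1
    for child in children[node]:
        child_sum, child_count = partial_state(child)
        total += child_sum
        count += child_count
    return total, count

# Only the root turns the partial state into the final answer.
total, count = partial_state("root")
average = total / count
```

Nodes forward (SUM, COUNT) rather than their local averages because averages of subtrees with different sizes cannot be combined correctly without the counts.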
Distributed Sensnet DBs
  • How to represent devices in DBs on sensornets?
    • ADTs (Abstract Data Types)
    • Methods correspond to sensing functionality
    • Virtual Relations (VRs) store local data
    • Network used for query operations
Virtual Relation
  • A VR has the following attributes:
    • Inputs to an ADT (device) function
    • Arguments to an ADT function
    • Output of the function
    • Timestamp of the function
Virtual Relation
  • Some VR properties
    • records are never updated or deleted
    • is naturally partitioned over the sensnet (each device takes care of its set of VR records)
  • What does this mean? – a distributed DB
  • Records from the VRs (distributed over the devices) are processed using distributed query execution plans
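One way to picture a virtual relation is as an append-only log of function invocations kept on each device. The sketch below is an assumption-laden illustration: the field names, the `device_id` field, and the `get_temperature` function are invented, and the presentation does not prescribe this layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a record is never updated once written
class VRRecord:
    device_id: str      # hypothetical field identifying the owning device
    arguments: tuple    # arguments passed to the ADT function
    output: float       # value the function returned
    timestamp: float    # when the function was invoked

class VirtualRelationFragment:
    """The append-only part of a VR that one device is responsible for."""
    def __init__(self, device_id):
        self.device_id = device_id
        self._records = []

    def invoke(self, method, arguments, timestamp):
        """Call a sensing method and append the result as a new record."""
        record = VRRecord(self.device_id, tuple(arguments), method(*arguments), timestamp)
        self._records.append(record)
        return record

    def scan(self):
        """Return all records; there is no update or delete operation."""
        return tuple(self._records)

def get_temperature(offset):  # hypothetical device function
    return 20.0 + offset

fragment = VirtualRelationFragment("sensor-226")
fragment.invoke(get_temperature, (1.5,), timestamp=0.0)
fragment.invoke(get_temperature, (2.0,), timestamp=30.0)
```

A distributed query plan would then combine the `scan()` results of many such fragments, one per device.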
Approximate Results
  • Energy-efficiency can be achieved using approximate aggregates
  • Uniform sampling:
    • Tuples are uniformly sampled and the resulting average is assumed to represent the actual average
    • Packet loss might invalidate the statistical assumptions that the confidence intervals for such estimates depend on.
  • Logarithmic sampling
    • The number of respondents (or the size of memory needed for the count) scales logarithmically with the size of the network
    • Provides looser error bounds but uses significantly less memory or communication.
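Uniform sampling, the first technique above, can be sketched as follows: only a random subset of nodes responds, and the sample mean stands in for the true mean. The synthetic readings, sample size, and seed are assumptions. Logarithmic sampling is not shown.

```python
import random

def sampled_average(values, sample_size, rng):
    """Estimate the mean from a uniform random sample without replacement."""
    sample = rng.sample(values, sample_size)
    return sum(sample) / len(sample)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
values = [20.0 + (i % 10) for i in range(1000)]  # synthetic readings in 20..29
exact = sum(values) / len(values)
estimate = sampled_average(values, 50, rng)  # only 50 of 1000 nodes respond
```

The estimate is only as good as the uniformity of the sample; if lost packets are correlated with the readings, the responders are no longer a uniform sample and the usual error bounds do not apply.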
Complex query evaluation
  • R x S x T
    • What order to follow?
      • (RxS)xT or Rx(SxT) or (RxT)xS
    • Decided by query optimizer
      • Usually depends on table size
  • With Sensornet DB
      • Need adaptive policy to route tuples based on
        • Energy consumption
        • Topology
        • Loss rates
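The size-based choice made by a conventional optimizer can be sketched by estimating the intermediate result of each possible first join. The table sizes and selectivities below are made up, and the adaptive sensornet policy (energy, topology, loss rates) is deliberately not modelled.

```python
# Hypothetical table sizes and pairwise join selectivities.
size = {"R": 1000, "S": 10, "T": 100}
selectivity = {
    frozenset(("R", "S")): 0.01,
    frozenset(("S", "T")): 0.05,
    frozenset(("R", "T")): 0.001,
}

def intermediate_size(a, b):
    """Estimated number of tuples produced by joining tables a and b first."""
    return size[a] * size[b] * selectivity[frozenset((a, b))]

plans = {
    "(RxS)xT": intermediate_size("R", "S"),
    "Rx(SxT)": intermediate_size("S", "T"),
    "(RxT)xS": intermediate_size("R", "T"),
}
# Prefer the order whose first join yields the smallest intermediate result.
best_plan = min(plans, key=plans.get)
```

In a sensornet the cost function would also have to account for where each table's tuples live and how expensive it is to move them, so the cheapest plan can change as the network changes.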
  • Explosion of data from sensor networks needs an infrastructure for access, storage etc
  • Organizing sensors
    • Datacubes
    • Other techniques ?
  • Identifying relevant sensors is a prerequisite to fetching data
    • Dataspace provided two solutions
    • Other approaches ?
  • Sensornets as Distributed DB
    • Provide a database view to sensornet data
    • Pros
      • App development easy
      • In-network processing helps resource usage
    • Cons
      • Distributed DB can be difficult
      • Requires retooling DB operations for sensornets
      • Other approaches?
Representations for Device Functions
  • Internal Representation
  • We can’t use traditional OO DB methods
    • They all demand immediate access
    • Given the asynchronous nature of sensornets, this is unacceptable
  • Direction of sensor networks progress
    • Small form-factor devices
    • On-board computation
    • Wireless communication
    • Increased sensing capabilities
    • Improved OS and networking functionalities
  • Prediction:
    • Every device (> $1) will have some sensor
    • Ubiquitous presence of sensor networks
  • Typical sensor networks usage:
    • Sense, collect and convey data
    • Provides a ubiquitous computing platform
    • Applications query/monitor sensed data
      • Ecosystem dynamics
      • Temperature/weather sensing
      • Automobile traffic analysis
    • Data-centric network, generated data more important than node identity
  • Addressing
    • Identify relevant sensors
  • How to access/process data?
    • Communicate data and process centrally
    • Compute query at node and perform DB operations
  • Interface for querying/monitoring and control
What to do with data?
  • Answer queries/give useful info
  • How ??
    • Centralized approach
      • Communicate data
      • Store and process all data at central location (traditional DB approach)
      • Is all temporal data to be stored?
      • Communication overhead?
What to do with data?
  • De-centralized approach
    • Communicate query (query routing)
    • Required data attribute of node
    • Node stores and communicates data to queries
    • Processing at node
    • Computation overhead
      • Computation overhead smaller than communication!
    • How to aggregate data?
    • How to route queries?
    • How to map nodes to addresses for communication purposes?
Need for Decentralization
  • Centralized (Traditional databases)
    • Inefficient use of resources
      • Large amounts of data communicated to central location
      • All sensors send data all the time
    • Dissociates access to device from query load
    • Communication more expensive than computation
  • Decentralized (Distributed DBs)
    • Data on devices
    • In-network query processing
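A rough message count illustrates why the centralized approach is considered wasteful. The routing tree below is hypothetical, every reading or partial result is assumed to fit in one message, and the cost of disseminating the query itself is ignored.

```python
# Hypothetical routing tree: child -> parent (the base station is "root").
parent = {"n1": "root", "n2": "root", "n3": "n1", "n4": "n1", "n5": "n3"}

def depth(node):
    """Number of hops from node to the root."""
    hops = 0
    while node != "root":
        node = parent[node]
        hops += 1
    return hops

# Route data: every reading is relayed hop by hop to the central location.
centralized_messages = sum(depth(n) for n in parent)
# Route query and aggregate in network: each node sends one partial result to its parent.
in_network_messages = len(parent)
```

In this toy tree the gap is 9 messages versus 5, and it widens as the tree gets deeper, since centralized collection pays for every hop of every reading.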
Pipelining Benefits
  • Provide streamed partial answers, hence, can enable query refinement
  • Schemes like ripple joins form a low energy approach to obtain approximate answers and can be used together with sampling
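Streamed partial answers can be illustrated with a generator that emits an updated result after every input tuple. This is a generic sketch of pipelining, not an implementation of ripple joins, and the input values are made up.

```python
def running_average(stream):
    """Yield an updated average after every tuple instead of blocking until the end."""
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count

partials = list(running_average([20.0, 22.0, 24.0]))
```

A user watching the partial answers could stop or refine the query as soon as the estimate looks good enough, which saves the energy of processing the remaining tuples.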