Data query and processing Dr. WenZhan Song Professor, Computer Science


Goals of this chapter • Information processing in sensor networks is application-specific and diverse, but it is very important because energy and bandwidth are limited. • Here we give some examples: • TinyDB Acquisitional Query Processor • ALFC data compression

TinyDB: The Design of an Acquisitional Query Processor for Sensor Networks, ACM SIGMOD 2003

Table of contents • Introduction and Motivation • TinyDB Components • Acquisitional Query Language • Power-aware Optimization • Power-sensitive Dissemination and Routing • Processing Queries • Summary

Introduction and Motivation

Introduction • Query processing system for extracting information from a network of TinyOS sensors • Incorporates acquisitional techniques designed to minimize power consumption • Simple, SQL-like interface to specify the data you want to extract, along with additional parameters • Collects data from motes, filters it, aggregates it, and routes it out to a PC

Introduction • To use TinyDB, install its TinyOS components onto each mote in the sensor network • A simple Java API for writing PC applications that query and extract data from the network • A simple graphical query builder and result display that use the API

Introduction • Queries are submitted on the PC • Parsed and optimized on the PC • Disseminated and processed in the network • Results flow back through the routing tree A query and its results propagating through the network

Motivation • The primary goal of TinyDB is to allow data-driven applications to be developed and deployed much more quickly. • TinyDB frees you from the burden of writing low-level code for sensor devices, including the very tricky sensor network interfaces. • Acquire and deliver desired data while conserving as much power as possible

TinyDB Components

TinyDB Components • The system can be classified into two subsystems: 1. Sensor Network Software: • Sensor Catalog and Schema Manager • Query Processor • Memory Manager • Network Topology Manager

TinyDB Components 2. Java-based Client Interface: • A network interface class that allows applications to inject queries and listen for results. • Classes to build and transmit queries. • A class to receive and parse query results. • A class to extract information about the attributes and capabilities of devices. • A GUI to construct queries.

TinyDB Components • A graph and table GUI to display individual sensor results. • A GUI to visualize dynamic network topologies. • An application that uses queries as an interface on top of a network of sensors.

Acquisitional Query Language

Acquisitional Query Language Basic Language Features • Queries in TinyDB consist of a SELECT-FROM-WHERE clause supporting selection, join, projection, aggregation, sampling, windowing, and sub-queries via materialization points. • Sensor data is viewed as a single table with one column per sensor type. Tuples are appended to this table periodically, at well-defined sample intervals. • Query example: SELECT nodeid, light, temp FROM sensors SAMPLE INTERVAL 1s FOR 10s (every node reports its id, light, and temperature readings once per second for 10 seconds) • Query results stream to the root of the network in an online fashion via the multi-hop topology, where they may be logged or output to the user. The output is a sequence of tuples, each of which includes a timestamp.

Acquisitional Query Language • Because the sensors table is conceptually an unbounded, continuous data stream of values, certain blocking operations are not allowed over it unless a bounded subset of the stream, or window, is specified. • Joins are allowed between two storage points on the same node, or between a storage point and the sensors relation, in which case sensors is used as the outer relation in a nested-loops join. • Example: SELECT COUNT(*) FROM sensors AS s, recentlight AS rl WHERE rl.nodeid = s.nodeid AND s.light < rl.light SAMPLE INTERVAL 10s • Every 10 s, this query counts how many of the recent light readings buffered in the recentlight storage point were brighter than the current reading.
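A minimal sketch of the nested-loops join in the COUNT(*) example, with sensors as the outer relation. The `Reading` class and the in-memory lists standing in for the sensors stream and the recentlight storage point are hypothetical; on a mote, the outer relation is just that node's current sample.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    nodeid: int
    light: int


def count_brighter_recent(sensors, recentlight):
    """Evaluate COUNT(*) FROM sensors AS s, recentlight AS rl
    WHERE rl.nodeid = s.nodeid AND s.light < rl.light
    as a nested-loops join with `sensors` as the outer relation."""
    count = 0
    for s in sensors:            # outer relation: the current sample(s)
        for rl in recentlight:   # inner relation: rows buffered in the storage point
            if rl.nodeid == s.nodeid and s.light < rl.light:
                count += 1
    return count


# Node 1 currently reads 200; the storage point holds two node-1 rows and one node-2 row.
current = [Reading(nodeid=1, light=200)]
buffered = [Reading(1, 250), Reading(1, 180), Reading(2, 300)]
print(count_brighter_recent(current, buffered))  # 1 (only the 250 reading is brighter)
```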

Acquisitional Query Language • When a storage point and an outer query deliver data at different rates, a simple rate-matching construct allows either interpolation between successive samples or specification of an aggregation function to combine multiple rows. • TinyDB supports grouped aggregation queries, which reduce the quantity of data that must be transmitted through the network. • When a query is issued in TinyDB, it is assigned an identifier that is returned to the issuer. This identifier can be used to stop the query. Alternatively, a query can be limited to run for a specific time period (the FOR clause) or can include a stopping condition expressed as an event.
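The two rate-matching options can be sketched as follows, assuming timestamped `(time, value)` pairs: linear interpolation when the storage point is slower than the outer query, and an aggregation function over a window when it is faster. The function names and the edge-holding behavior are illustrative assumptions, not TinyDB's actual syntax or implementation.

```python
from __future__ import annotations

from bisect import bisect_right
from statistics import mean


def interpolate(samples: list[tuple[float, float]], t: float) -> float:
    """Linearly interpolate a value at time t from (time, value) pairs sorted
    by strictly increasing time; outside the sampled range, hold the edge value."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def aggregate_window(samples, start, end, fn=mean):
    """Combine all samples with start <= time < end using fn (e.g. mean or max)."""
    values = [v for ts, v in samples if start <= ts < end]
    return fn(values) if values else None


slow = [(0.0, 10.0), (10.0, 20.0)]                        # sampled every 10 s
print(interpolate(slow, 5.0))                             # 15.0
fast = [(0.0, 1.0), (1.0, 5.0), (2.0, 3.0), (3.0, 9.0)]   # sampled every 1 s
print(aggregate_window(fast, 0.0, 3.0, max))              # 5.0
```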

Acquisitional Query Language Event-Based Queries • TinyDB supports events as a mechanism for initiating data collection. Events in TinyDB are generated either by another query or by the operating system. • Query example: ON EVENT bird-detect(loc): SELECT AVG(light), AVG(temp), event.loc FROM sensors AS s WHERE dist(s.loc, event.loc) < 10m SAMPLE INTERVAL 2s FOR 30s (when a bird is detected, nodes within 10 m of the detection report average light and temperature every 2 s for 30 s) • Events are central in ACQP, as they allow the system to be dormant until some external condition occurs, instead of continually polling or blocking on an iterator waiting for data to arrive.
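A sketch of what one epoch of the bird-detect query computes, evaluated centrally over a list of readings for clarity. TinyDB evaluates this in the network and, with SAMPLE INTERVAL 2s FOR 30s, repeats it for 15 epochs; the `NodeReading` class and planar (x, y) coordinates in meters are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeReading:
    loc: tuple[float, float]  # (x, y) position in meters
    light: float
    temp: float


def bird_detect_epoch(event_loc, readings, radius_m=10.0):
    """One epoch of the event query: AVG(light), AVG(temp), event.loc over
    the nodes whose distance to the event location is below radius_m."""
    nearby = [r for r in readings if math.dist(r.loc, event_loc) < radius_m]
    if not nearby:
        return None
    avg_light = sum(r.light for r in nearby) / len(nearby)
    avg_temp = sum(r.temp for r in nearby) / len(nearby)
    return avg_light, avg_temp, event_loc


readings = [
    NodeReading((0.0, 0.0), 100.0, 20.0),
    NodeReading((3.0, 4.0), 200.0, 22.0),   # 5 m from the event: included
    NodeReading((20.0, 0.0), 999.0, 99.0),  # 20 m away: excluded
]
print(bird_detect_epoch((0.0, 0.0), readings))  # (150.0, 21.0, (0.0, 0.0))
```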

Acquisitional Query Language Lifetime-Based Queries • Specifying a lifetime is a much more intuitive way for users to reason about power consumption: users are not concerned with small adjustments to the sample rate and how such adjustments influence power consumption, but they are very concerned with the lifetime of the network executing the queries. • To satisfy a lifetime clause, TinyDB performs lifetime estimation. The goal of lifetime estimation is to compute a sampling and transmission rate given the number of Joules of energy remaining. • The rates for a single node at the root of the sensor network are computed via a simple cost-based formula that considers the costs of accessing sensors, the selectivities of operators, expected communication rates, and current battery voltage.

Acquisitional Query Language • Example of a lifetime computation for simple queries of the form: SELECT a_1, ..., a_numSensors FROM sensors WHERE p LIFETIME l hours • 1st: determine the available power per hour, $P_h = E_{\text{remaining}} / l$, where $E_{\text{remaining}}$ is the energy (in Joules) left in the battery • 2nd: compute the energy $e_s$ to collect and transmit one sample, including the costs to forward data for our children • 3rd: compute the maximum transmission rate, $t_{\max} = P_h / e_s$ samples per hour
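The three steps can be sketched as below. The slide does not spell out the per-sample energy formula, so the model for `e_s` here is a hypothetical simplification: read every sensor in the SELECT list, receive and relay the children's messages, and send our own result with probability equal to the predicate's selectivity. All parameter names and numbers are illustrative.

```python
def lifetime_sample_rate(energy_remaining_j, lifetime_h, sensor_costs_j,
                         selectivity, child_msgs_per_sample,
                         e_trans_j, e_rcv_j):
    """Samples per hour a node can afford so that energy_remaining_j lasts
    lifetime_h hours, under a simplified (hypothetical) per-sample energy model."""
    # 1st: available power per hour, P_h.
    p_h = energy_remaining_j / lifetime_h
    # 2nd: energy e_s to collect and transmit one sample: read every sensor in
    # the SELECT list, receive and relay the children's messages, and send our
    # own result with probability `selectivity` (the fraction passing predicate p).
    e_s = (sum(sensor_costs_j)
           + child_msgs_per_sample * (e_rcv_j + e_trans_j)
           + selectivity * e_trans_j)
    # 3rd: the maximum rate the hourly budget can sustain.
    return p_h / e_s


# Hypothetical numbers chosen for easy arithmetic, not measured mote costs.
rate = lifetime_sample_rate(energy_remaining_j=3600.0, lifetime_h=100.0,
                            sensor_costs_j=[0.5, 0.5], selectivity=0.5,
                            child_msgs_per_sample=1.0, e_trans_j=1.0, e_rcv_j=0.5)
print(rate)  # 12.0 samples per hour
```

Doubling the requested lifetime halves the affordable rate, which is exactly the trade-off the LIFETIME clause exposes to the user.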

Acquisitional Query Language • Coordinate the transmission rate across all nodes in the routing tree: reason: because sensors need to sleep between relaying samples, it is important that senders and receivers synchronize their wake cycles. method: nodes are allowed to transmit only when their parents in the routing tree are awake and listening, which is usually the same time the parents are transmitting.

Acquisitional Query Language • Problem: the user has no control over the sample rate. reason: some applications require the ability to monitor physical phenomena at a particular granularity. method: allow an optional MIN SAMPLE RATE r clause. If the sample rate computed for the specified lifetime is greater than r, sampling proceeds at the computed rate. Otherwise, sampling is fixed at rate r, and the transmission rate is recomputed assuming different rates for sampling and transmission. • Note: power consumption must be re-estimated periodically.
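A sketch of the MIN SAMPLE RATE rule under an assumed cost model in which each sample costs `e_sample` joules to acquire and each transmitted result costs `e_msg`; `p_h` is the hourly energy budget from the lifetime estimate. The function name and cost split are hypothetical.

```python
from typing import Optional, Tuple


def choose_rates(p_h: float, e_sample: float, e_msg: float,
                 min_sample_rate: Optional[float] = None) -> Tuple[float, float]:
    """Return (sample_rate, transmit_rate) in samples/hour.

    Simplified model (an assumption): each sample costs e_sample joules to
    acquire and each transmitted result costs e_msg joules."""
    computed = p_h / (e_sample + e_msg)  # one transmission per sample
    if min_sample_rate is None or computed >= min_sample_rate:
        return computed, computed
    # Sampling is pinned at r = min_sample_rate; whatever budget remains pays
    # for transmissions, so fewer than r results are sent each hour.
    leftover = p_h - min_sample_rate * e_sample
    if leftover <= 0:
        raise ValueError("MIN SAMPLE RATE alone exhausts the energy budget")
    return min_sample_rate, leftover / e_msg


print(choose_rates(36.0, 1.0, 2.0))        # (12.0, 12.0)
print(choose_rates(36.0, 1.0, 2.0, 20.0))  # (20.0, 8.0)
```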

Power-aware Optimization

Power-aware Optimization • Cost-based optimizer that chooses the plan with the lowest overall power consumption • The cost is dominated by sampling the physical sensors and transmitting query results rather than by applying individual operators. • Focus on ordering joins, selections, and sampling operations.

Power-aware Optimization Metadata Management: • Each node in TinyDB maintains a catalog of metadata that describes its local attributes, events, and user-defined functions. • Periodically copied to the root of the network for use by the optimizer. Metadata fields kept with each attribute

Power-aware Optimization Ordering of Sampling and Predicates: • Sampling is often an expensive operation in terms of power. • The metadata information is used in query optimization to order the sampling and predicates. Energy costs of accessing various common sensors

Power-aware Optimization Ordering of Sampling and Predicates: Consider the query below: SELECT accel, mag FROM sensors WHERE accel > c1 AND mag > c2 SAMPLE INTERVAL 1s Compare the following options of ordering: 1. sample accelerometer and magnetometer, then apply the selection 2. sample magnetometer, apply selection over its reading first; then sample accelerometer 3. sample accelerometer, apply selection over its reading first; then sample magnetometer
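The three orderings can be compared by expected energy per epoch: if a sensor's predicate is applied right after sampling it, the second sensor is only sampled when the first predicate passes. The costs and selectivities below are hypothetical, not values from the energy-cost table; in TinyDB they would come from the per-attribute metadata.

```python
from itertools import permutations


def expected_cost(order, cost, selectivity):
    """Expected sampling energy per epoch when attributes are sampled in `order`
    and each attribute's predicate is applied right after it is sampled; later
    sensors are only sampled if every earlier predicate passed."""
    total, p_reach = 0.0, 1.0
    for attr in order:
        total += p_reach * cost[attr]
        p_reach *= selectivity[attr]
    return total


# Hypothetical per-sample costs and predicate selectivities.
cost = {"accel": 1.0, "mag": 10.0}
selectivity = {"accel": 0.5, "mag": 0.5}

sample_all_first = sum(cost.values())                # option 1: 11.0
for order in permutations(cost):                     # options 3 and 2
    print(order, expected_cost(order, cost, selectivity))
best = min(permutations(cost), key=lambda o: expected_cost(o, cost, selectivity))
print(best)  # ('accel', 'mag')
```

With these numbers, sampling the cheap accelerometer first and skipping the expensive magnetometer half the time (option 3) costs 6.0, versus 10.5 for option 2 and 11.0 for option 1.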

Power-aware Optimization Event Query Batching to Conserve Power: • Multiple instances of the internal query may run at the same time, which wastes power • A multi-query optimization technique based on rewriting alleviates the burden of running multiple copies of the same query • The advantage of this approach is that only one query runs at a time, no matter how frequently events of type e are triggered. • For frequent event-based queries, rewriting them as a join between an event stream and the sensors stream can significantly reduce the rate at which a sensor must acquire samples.

Power-aware Optimization Event Query Batching to Conserve Power: • Original event-based query: ON EVENT e(nodeid): SELECT a1 FROM sensors AS s WHERE s.nodeid = e.nodeid SAMPLE INTERVAL d FOR k • If event e fires repeatedly, multiple instances of this query run at the same time and waste a lot of energy. • Rewritten as a join of the event and sensor streams to reduce cost: SELECT s.a1 FROM sensors AS s, events AS e WHERE s.nodeid = e.nodeid AND e.type = e AND s.time - e.time <= k AND s.time > e.time SAMPLE INTERVAL d
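A small simulation, under simplifying assumptions (integer times, one node, instances that never share samples), comparing the physical samples acquired by per-event query instances with the rewritten join. The function names are hypothetical.

```python
def per_event_samples(event_times, d, k):
    """Samples acquired when each event spawns its own instance of
    ON EVENT ... SAMPLE INTERVAL d FOR k (instances sample independently)."""
    return len(event_times) * int(k // d)


def batched_join(event_times, d, k, horizon):
    """Rewritten query: one sensor stream sampled every d up to `horizon`,
    joined with events where e.time < s.time <= e.time + k.
    Returns (samples_acquired, joined (sample_time, event_time) pairs)."""
    sample_times = [i * d for i in range(1, int(horizon // d) + 1)]
    pairs = [(s, e) for s in sample_times for e in event_times if 0 < s - e <= k]
    return len(sample_times), pairs


events = list(range(10))                        # event e fires at t = 0..9
print(per_event_samples(events, d=1, k=10))     # 100 physical samples
n, pairs = batched_join(events, d=1, k=10, horizon=20)
print(n, len(pairs))                            # 20 100
```

Both plans produce 100 result tuples, but the join reuses each physical sample for every active event, so only 20 samples are acquired.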

Power-sensitive Dissemination and Routing

Power-sensitive Dissemination and Routing Motivation: • When each sensor hears a query, it must decide if the query applies locally or needs to be broadcast to its children in the routing tree. • If a node knows none of its children will ever satisfy the value of some selection predicate, it need not forward the query down the routing tree, which can save the costs of disseminating, executing, and forwarding results for the query.

Power-sensitive Dissemination and Routing Semantic Routing Tree (SRT): • allows each node to efficiently determine whether any of the nodes below it will need to participate in a given query over some constant attributes. • An SRT is an index over a constant attribute that can be used to locate nodes that have data relevant to the query.

Power-sensitive Dissemination and Routing How to use SRT: • When a query q with a predicate over A arrives at node n, n checks whether any child’s value of A overlaps the query range of A in q: • If yes, prepare to receive results and forward the query • If no, do not forward q • Does query q apply locally? • If yes, execute the query • If no, simply ignore it
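A sketch of SRT-based dissemination for a range predicate over a constant attribute A. The `SRTNode` class is hypothetical, and it recomputes each child's subtree range on demand for brevity, whereas an SRT node would store its children's ranges.

```python
from dataclasses import dataclass, field


@dataclass
class SRTNode:
    node_id: int
    value: float                      # constant attribute A, e.g. an x-coordinate
    children: list["SRTNode"] = field(default_factory=list)

    def subtree_range(self):
        """(min, max) of A over this node's subtree. An SRT stores each
        child's range at the parent; recomputing keeps the sketch short."""
        lo = hi = self.value
        for child in self.children:
            clo, chi = child.subtree_range()
            lo, hi = min(lo, clo), max(hi, chi)
        return lo, hi


def disseminate(node, qlo, qhi, received=None, executed=None):
    """Propagate a query with predicate qlo <= A <= qhi down the tree."""
    received = [] if received is None else received
    executed = [] if executed is None else executed
    received.append(node.node_id)
    if qlo <= node.value <= qhi:          # does q apply locally?
        executed.append(node.node_id)
    for child in node.children:
        clo, chi = child.subtree_range()
        if clo <= qhi and qlo <= chi:     # child's range overlaps the query range
            disseminate(child, qlo, qhi, received, executed)
    return received, executed


root = SRTNode(0, 5.0, [SRTNode(1, 1.0, [SRTNode(3, 2.0)]),
                        SRTNode(2, 10.0, [SRTNode(4, 12.0)])])
print(disseminate(root, 9.0, 11.0))  # ([0, 2], [2])
```

Nodes 1, 3, and 4 never hear the query, which is where the dissemination and result-forwarding savings come from.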

Power-sensitive Dissemination and Routing Gray nodes must produce or forward results in the query

Power-sensitive Dissemination and Routing SRT Summary: • Provide an efficient mechanism for disseminating queries and collecting query results for queries over constant attributes. • Reduce the number of nodes that must disseminate queries and forward the continuous stream of results from children by nearly an order of magnitude.

Processing Queries

Processing Queries Communication Scheduling • Parent nodes must receive their children’s readings before aggregating. • The epoch is subdivided into fixed-length time intervals, and nodes are assigned to intervals based on their position in the routing tree

Processing Queries Aggregate Queries • During its interval, each node: • computes its partial state record from its children’s values and its local readings • outputs the partial state record to the network • Information reaches the root during interval 1

Processing Queries Aggregate Queries Partial state records flowing up the tree during an epoch using interval-based communication.
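A sketch of interval-based communication for an AVG query. It assumes that a node at depth d sends during interval d, with intervals counting down so the root hears from its children in interval 1; the (sum, count) partial state record for AVG is a common choice and an assumption here.

```python
def avg_over_epoch(parent, readings):
    """AVG over all nodes with interval-based communication.

    parent: node -> parent node (the root maps to None)
    readings: node -> local reading
    Returns (average, schedule) where schedule lists (interval, node) sends."""
    def depth(n):
        d = 0
        while parent[n] is not None:
            n, d = parent[n], d + 1
        return d

    depths = {n: depth(n) for n in parent}
    psr = {n: [readings[n], 1] for n in parent}   # partial state record: [sum, count]
    schedule = []
    # Intervals count down: nodes at depth d send in interval d, so every child
    # has already sent by the time its parent computes and sends its own record.
    for interval in range(max(depths.values()), 0, -1):
        for n in sorted(m for m, dep in depths.items() if dep == interval):
            p = parent[n]
            psr[p][0] += psr[n][0]
            psr[p][1] += psr[n][1]
            schedule.append((interval, n))
    root = next(n for n, p in parent.items() if p is None)
    total, count = psr[root]
    return total / count, schedule


parent = {0: None, 1: 0, 2: 0, 3: 1}
readings = {0: 10, 1: 20, 2: 30, 3: 40}
print(avg_over_epoch(parent, readings))  # (25.0, [(2, 3), (1, 1), (1, 2)])
```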

Processing Queries Policies for Selection Queries Three prioritization schemes: • Naive scheme • no tuple is considered more valuable than any other • tuples are queued in FIFO order, and new tuples are dropped if the queue is full • Winavg • works like the basic FIFO scheme • when the queue is full, the two results at the head of the queue are averaged to make room for the new tuple

Processing Queries Policies for Selection Queries Three prioritization schemes (continued): • Delta scheme • relies on the intuition that the largest changes are probably interesting • each tuple’s score represents its change from the last delivered value • the tuple with the highest score is always delivered, which allows out-of-order delivery • when the queue is full, the tuple with the lowest score is discarded
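The three policies can be sketched as bounded queues with `enqueue`/`dequeue`. The class names are hypothetical, and the delta score here is fixed at enqueue time as the absolute difference from the last delivered value (every tuple scores infinity until something has been delivered); this is one reasonable reading of the slide, not TinyDB's exact code.

```python
from collections import deque


class NaiveQueue:
    def __init__(self, capacity):
        self.q, self.capacity = deque(), capacity

    def enqueue(self, v):
        if len(self.q) < self.capacity:   # when full, the new tuple is dropped
            self.q.append(v)

    def dequeue(self):
        return self.q.popleft() if self.q else None


class WinAvgQueue(NaiveQueue):
    def enqueue(self, v):
        if len(self.q) >= self.capacity and len(self.q) >= 2:
            a, b = self.q.popleft(), self.q.popleft()
            self.q.appendleft((a + b) / 2)  # average the two oldest tuples
        if len(self.q) < self.capacity:
            self.q.append(v)


class DeltaQueue:
    def __init__(self, capacity):
        self.items, self.capacity, self.last_sent = [], capacity, None

    def _score(self, v):
        return float("inf") if self.last_sent is None else abs(v - self.last_sent)

    def enqueue(self, v):
        self.items.append((self._score(v), v))
        if len(self.items) > self.capacity:
            self.items.remove(min(self.items))  # discard the smallest change

    def dequeue(self):
        if not self.items:
            return None
        best = max(self.items)                  # deliver the largest change first
        self.items.remove(best)
        self.last_sent = best[1]
        return best[1]


naive, winavg = NaiveQueue(2), WinAvgQueue(2)
for v in [1, 3, 5]:
    naive.enqueue(v)
    winavg.enqueue(v)
print(list(naive.q), list(winavg.q))      # [1, 3] [2.0, 5]

delta = DeltaQueue(2)
delta.enqueue(10)
print(delta.dequeue())                    # 10
for v in [11, 30, 12]:                    # scores 1, 20, 2: the 11 is discarded
    delta.enqueue(v)
print(delta.dequeue(), delta.dequeue())   # 30 12
```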

Processing Queries Performance comparison • Setting: the sample rate is faster than the maximum delivery rate • Environment: a single mote running TinyDB • Winavg versus Delta scheme: • Delta stays closer to the original signal because it tends to emphasize the extremes • Winavg tends to dampen them

Processing Queries Performance comparison An acceleration signal (top) approximated by a delta (middle) and an average (bottom), K=4.

Processing Queries RMS Error comparison RMS Error for Different Prioritization Schemes and Signals (1000 Samples, Sample Interval = 64 ms)

Processing Queries Policies for Aggregate Queries • Snooping • allows nodes to locally suppress their aggregate values by listening to neighboring nodes • Example: MAX aggregation • node n overhears a neighbor’s MAX value a and compares it with its local partial MAX • if a is greater than the local partial MAX, n assigns its partial MAX a low score or suppresses it altogether • if a is less than the local partial MAX, n assigns its partial MAX a high score

Processing Queries Example of snooping Snooping reduces the data nodes must send in aggregate queries. Here node 2’s value can be suppressed if it is less than the maximum value snooped from nodes 3, 4, and 5.
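The MAX snooping decision reduces to a single comparison against the overheard values. This sketch takes the suppression branch; the function name and readings are hypothetical.

```python
def snoop_max(local_partial_max, snooped_values):
    """Decide what to do with a node's partial MAX after overhearing neighbors.

    Returns ("suppress", None) if some neighbor already reported a larger
    value (the alternative is to send it with a low score), otherwise
    ("send", local_partial_max) as a high-priority transmission. Ties are
    sent, since the slide leaves them unspecified."""
    if any(a > local_partial_max for a in snooped_values):
        return "suppress", None
    return "send", local_partial_max


# Node 2 overhears nodes 3, 4, and 5 (hypothetical readings).
print(snoop_max(7, [3, 9, 5]))   # ('suppress', None)
print(snoop_max(12, [3, 9, 5]))  # ('send', 12)
```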

Processing Queries Adapting Rates • When a query is optimized, its transmission and sample rates are set • based on network conditions, sample rates, and lifetime • this is a static decision • The Need for Adaptivity • network contention and power consumption vary over time • Adaptive backoff: transmission and sample rates are changed at run time

Processing Queries Adapting Rates • It is not safe to assume that the network channel is uncontested • TinyDB reduces the number of packets sent as channel contention rises

Processing Queries Power Consumption • Compute the predicted battery voltage at the current point in processing a query • Compare the current voltage with the predicted voltage • If they deviate, re-estimate the power consumption characteristics and re-run the lifetime calculation
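A sketch of the re-estimation loop. The slides do not give TinyDB's voltage model, so everything here is an assumption: voltage is predicted to fall linearly, voltage drop is taken as proportional to energy spent, and energy spent is taken as proportional to the sample rate. The tolerance and function names are hypothetical.

```python
def check_voltage(v_start, v_now, hours_elapsed, predicted_drop_v_per_h,
                  tolerance_v=0.02):
    """Compare the measured battery voltage with a linear prediction.
    Returns (drop_rate_to_use, needs_reestimate)."""
    predicted = v_start - predicted_drop_v_per_h * hours_elapsed
    if abs(v_now - predicted) <= tolerance_v:
        return predicted_drop_v_per_h, False
    observed = (v_start - v_now) / hours_elapsed   # re-estimated drop rate
    return observed, True


def rerun_rate(rate, predicted_drop, observed_drop):
    """Scale the sample rate so consumption matches the original lifetime goal,
    assuming voltage drop is proportional to energy spent and energy to rate."""
    return rate * predicted_drop / observed_drop


drop, stale = check_voltage(3.0, 2.9, 10.0, predicted_drop_v_per_h=0.005)
if stale:
    print(round(rerun_rate(10.0, 0.005, drop), 3))  # 5.0 samples per hour
```

Here the battery drained twice as fast as predicted, so the sketch halves the sample rate to keep the original lifetime target.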

Processing Queries Measuring power consumption
