Korea University of Technology and Education, Computer Engineering, Kim Hong-yeon. TinyDB: An Acquisitional Query Processing System for Sensor Networks. Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong. DKE
Introduction. • Acquisitional issues. • Query Optimization. • The significant costs of sampling sensors. • Query Dissemination. • The physical co-location of sampling and processing. • Query Execution. • Choices of when to sample. • Proposed approach. • Incorporates acquisitional techniques designed to minimize power consumption. • The query structure is simple and SQL-like. • Additional parameters. • Collects data from motes, filters it, and aggregates it.
Introduction. • Basic architecture. • Queries are submitted at a powered PC, parsed, optimized, and sent into the sensor network. • Queries are disseminated and processed inside the network. • Results flow back up the routing tree.
Introduction. • Features of TinyDB. • Declarative SQL-like query interface. • Metadata catalog management. • Multiple concurrent queries. • Network monitoring. • In-network, distributed query processing. • Extensible framework for attributes, commands and aggregates. • Goal. • The primary goal of TinyDB is to allow data-driven applications to be developed and deployed much more quickly.
Basic Language. • Basic Language Features. • Queries in TinyDB, as in SQL, consist of a SELECT-FROM-WHERE-GROUPBY clause supporting selection, join, projection, and aggregation. • Sensor data is viewed as a single table (sensors) with one column per sensor type (temperature, humidity, light, …). • Tuples are appended to this table periodically, at well-defined sample intervals. • For example, • SELECT nodeid, light, temp FROM sensors SAMPLE INTERVAL 1s FOR 10s • Meaning: Report the node ID, light, and temperature readings once per second for 10 seconds.
Basic Language. • Materialization point. • A table stored on the nodes. • For example, • CREATE STORAGE POINT recentLight SIZE 8 AS (SELECT nodeid, light FROM sensors SAMPLE PERIOD 10s) • Meaning: Store the latest eight light readings, taking one reading every 10 seconds. • DROP clause.
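A storage point of SIZE 8 behaves like a bounded buffer that keeps only the newest rows. The Python sketch below models that behavior with `collections.deque`. The `StoragePoint` class and its method names are hypothetical illustrations, not TinyDB's API.

```python
from collections import deque


class StoragePoint:
    """Hypothetical model of a fixed-size materialization point."""

    def __init__(self, size):
        # A deque with maxlen evicts the oldest row when a new one overflows it.
        self.rows = deque(maxlen=size)

    def append(self, nodeid, light):
        self.rows.append((nodeid, light))


recent_light = StoragePoint(size=8)
for reading in range(10):  # ten simulated sample periods
    recent_light.append(nodeid=1, light=reading * 10)

print(list(recent_light.rows))
```

After ten samples only the last eight rows remain, so the readings 0 and 10 have been evicted.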
Basic Language. • Joins. • Joins are allowed between two materialization points or between a materialization point and the sensors table. • For example, • SELECT COUNT(*) FROM sensors AS s, recentLight AS r WHERE r.nodeid = s.nodeid AND s.light < r.light SAMPLE PERIOD 10s • Meaning: Count the number of recent light readings (from 0 to 8 samples) that were brighter than the current reading. The current reading is sampled once every 10 seconds.
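The join condition can be read as a filter over the stored rows for each new sample. The following Python sketch evaluates it for one sample. The function name and the sample data are made up for illustration.

```python
def count_brighter(recent, current):
    """Count stored rows from the same node whose light exceeds the current reading."""
    nodeid, light = current
    return sum(
        1
        for (r_node, r_light) in recent
        if r_node == nodeid and light < r_light  # r.nodeid = s.nodeid AND s.light < r.light
    )


recent_light = [(1, 40), (1, 75), (1, 60), (2, 90)]  # hypothetical stored rows
current = (1, 55)                                    # hypothetical new sample

print(count_brighter(recent_light, current))  # prints 2
```

The row from node 2 is ignored because the join matches on `nodeid`, so only the readings 75 and 60 are counted.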
Basic Language. • Aggregation. • Aggregation can be performed on grouped values as in ordinary SQL (aggregation queries). • It reduces the quantity of data that must be transmitted through the network. • For example (using microphone sensors), • SELECT AVG(volume), room FROM sensors WHERE floor = 6 GROUP BY room HAVING AVG(volume) > threshold SAMPLE PERIOD 30s • Meaning: Every 30 seconds, find the rooms on floor 6 where the average volume is over some threshold.
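To make the WHERE, GROUP BY, and HAVING stages concrete, here is a minimal Python sketch that evaluates the query over one epoch of readings held in memory. The function name, the dictionary layout, and the sample values are assumptions for illustration. TinyDB itself evaluates such queries inside the network.

```python
from collections import defaultdict


def loud_rooms(readings, floor, threshold):
    """Return {room: average volume} for rooms on `floor` whose average exceeds `threshold`."""
    sums = defaultdict(lambda: [0.0, 0])  # room -> [running sum, count]
    for r in readings:
        if r["floor"] == floor:           # WHERE floor = 6
            acc = sums[r["room"]]         # GROUP BY room
            acc[0] += r["volume"]
            acc[1] += 1
    averages = {room: s / n for room, (s, n) in sums.items()}
    return {room: avg for room, avg in averages.items() if avg > threshold}  # HAVING


readings = [  # hypothetical readings for one epoch
    {"room": "A", "floor": 6, "volume": 50},
    {"room": "A", "floor": 6, "volume": 70},
    {"room": "B", "floor": 6, "volume": 20},
    {"room": "B", "floor": 6, "volume": 30},
    {"room": "C", "floor": 5, "volume": 90},
]

print(loud_rooms(readings, floor=6, threshold=40))  # prints {'A': 60.0}
```

Keeping a running sum and count per group means a node only has to pass on two numbers per room rather than every raw reading, which is where the reduction in transmitted data comes from.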
Basic Language. • Query identifiers. • When a query is issued in TinyDB, it is assigned an identifier that is returned to the issuer. • This identifier can be used to stop the query. • Two kinds of query stop without it. • A query limited to run for a specific time period. • A query that includes a stopping condition as an event.
Event-Based Queries. • Event-Based Queries. • TinyDB supports events as a mechanism for initiating data collection. • Events in TinyDB are generated either by another query or by a lower-level part of the operating system. • For example, • ON EVENT bird-detect(loc): SELECT AVG(light), AVG(temp), event.loc FROM sensors AS s WHERE dist(s.loc, event.loc) < 10m SAMPLE PERIOD 2s FOR 30s • Meaning: Every time a bird-detect event occurs, the query is issued from the detecting node, and the average light and temperature are collected from nearby nodes once every 2 seconds for 30 seconds.
Event-Based Queries. • Events are central in ACQP, as they allow the system to be dormant until some external condition occurs, instead of continually polling or blocking on an iterator waiting for some data to arrive.
Event-Based Queries. • Generating an event from a query. • For example, • SELECT nodeid, temp FROM sensors WHERE temp > threshold OUTPUT ACTION SIGNAL hot(nodeid, temp) SAMPLE PERIOD 10s • Meaning: Sample the temperature every 10 seconds and signal the event ‘hot’ whenever it is above some threshold.
Lifetime-Based Queries. • Goal. • To satisfy a lifetime clause, TinyDB performs lifetime estimation. • The goal of lifetime estimation is to compute a sampling and transmission rate given a number of Joules of energy remaining. • For example, • SELECT a1, …, an FROM sensors LIFETIME l hours • Steps. • Determine the available power p_h per hour from the remaining energy and the requested lifetime. • Compute the energy to collect and transmit one sample, including the costs to forward data for the node's children. • Compute the maximum transmission rate from the available power per hour and the energy per sample.
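The three steps above can be sketched as a small calculation. This is a minimal illustration, not TinyDB's estimator: the function names, the per-sample cost model (sensor cost plus one transmission plus one receive-and-forward per child), and all numeric values are assumptions. Energies are in millijoules.

```python
def energy_per_sample(sensor_costs, tx_cost, rx_cost, num_children):
    """Energy to acquire one sample, transmit it, and relay one sample per child.

    The cost model is an assumption of this sketch.
    """
    return sum(sensor_costs) + tx_cost + num_children * (rx_cost + tx_cost)


def max_sample_rate(remaining_energy, lifetime_hours, per_sample_energy):
    """Maximum samples per hour that the remaining energy allows over the lifetime."""
    energy_per_hour = remaining_energy / lifetime_hours  # step 1: budget per hour
    return energy_per_hour / per_sample_energy           # step 3: rate


# Hypothetical values: 1000 J remaining, a 100-hour lifetime, two sensors, two children.
e_s = energy_per_sample(sensor_costs=[1, 2], tx_cost=3, rx_cost=2, num_children=2)
rate = max_sample_rate(remaining_energy=1_000_000, lifetime_hours=100, per_sample_energy=e_s)

print(e_s, rate)  # prints 16 625.0
```

With these made-up numbers a node could afford 625 samples per hour. Adding children raises the per-sample energy and lowers the affordable rate.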
Types of Queries in Sensor Networks. • Network health queries. • Meta-queries over the network itself. • For example, • SELECT nodeid, voltage FROM sensors WHERE voltage < threshold SAMPLE PERIOD 10 minutes • Actuation queries. • The user specifies an external command that should be invoked in response to a tuple satisfying the query. • For example, • SELECT nodeid, temp FROM sensors WHERE temp > threshold OUTPUT ACTION power-on(nodeid) SAMPLE PERIOD 30s
Power-aware Optimization. • TinyDB uses a simple cost-based optimizer to choose a query plan that will yield the lowest overall power consumption. • The cost of a plan. • The cost of a plan is dominated by the cost of sampling the physical sensors and transmitting query results, rather than the cost of applying individual operators. • Focus. • The optimizer focuses on ordering joins, selections, and sampling operations that run on individual nodes.
Power-aware Optimization. • Metadata Management. • Each node in TinyDB maintains a catalog of metadata that describes its local attributes, events, and user-defined functions. • This metadata is periodically copied to the root of the network for use by the optimizer.
Power-aware Optimization. • Ordering of Sampling and Predicates. • Sampling is often an expensive operation in terms of power. • If a predicate discards a tuple of the sensors table, then subsequent predicates need not examine the tuple, so the samples those predicates would require can be skipped. • The metadata information is used in query optimization to order the sampling and predicates. • Besides predicates in the WHERE clause, expensive sampling operators must also be ordered appropriately with respect to the SELECT, GROUP BY, and HAVING clauses.
Power-aware Optimization. • Ordering of Sampling and Predicates. • For example, • Consider the query below: • SELECT accel, mag FROM sensors WHERE accel > c1 AND mag > c2 SAMPLE INTERVAL 1s
Power-aware Optimization. • Event Query Batching to Conserve Power. • It is possible for multiple instances of the internal query to be running at the same time, which wastes power. • A multi-query optimization technique based on rewriting alleviates the burden of running multiple copies of the same query. • The advantage of this approach is that only one query runs at a time no matter how frequently the events of type ‘e’ are triggered. • For frequent event-based queries, rewriting them as a join between an event stream and the sensors stream can significantly reduce the rate at which a sensor must acquire samples.
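For the accel/mag query, the benefit of ordering can be estimated by weighting each sample's cost by the probability that earlier predicates let the tuple through. The Python sketch below compares the two orders. The relative costs and selectivities are hypothetical numbers chosen for illustration, not measurements, and `expected_cost` is an illustrative helper rather than TinyDB's optimizer.

```python
from itertools import permutations


def expected_cost(order, sample_cost, selectivity):
    """Expected energy per tuple when attributes are sampled and filtered in `order`."""
    cost, p_reach = 0.0, 1.0
    for attr in order:
        cost += p_reach * sample_cost[attr]  # pay only if earlier predicates passed
        p_reach *= selectivity[attr]         # fraction of tuples surviving this predicate
    return cost


sample_cost = {"accel": 1.0, "mag": 10.0}  # hypothetical relative sampling costs
selectivity = {"accel": 0.5, "mag": 0.5}   # hypothetical fraction of tuples that pass

for order in permutations(sample_cost):
    print(order, expected_cost(order, sample_cost, selectivity))

best = min(permutations(sample_cost),
           key=lambda o: expected_cost(o, sample_cost, selectivity))
```

Under these assumptions, sampling accel first costs 6.0 per tuple while sampling mag first costs 10.5, so the cheaper sensor with a selective predicate should run first.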
Power-sensitive Dissemination and Routing. • Event Query Batching to Conserve Power.
Power sensitive dissemination and routing. • After the query has been optimized, it is disseminated into the network. • As each node hears the query, it must decide if the query applies locally and/or needs to be broadcast to its children in the routing tree. • If a query does not apply at a particular node, and the node does not have any children for which the query applies, then the entire sub-tree rooted at that node can be excluded from the query. • This saves the costs of disseminating, executing, and forwarding results for the query across several nodes, significantly extending the nodes' lifetime. • Proposed data structure. • Semantic Routing Tree (SRT).
Power sensitive dissemination and routing. • Semantic Routing Trees. • An SRT is a routing tree designed to allow each node to efficiently determine if any of the nodes below it will need to participate in a given query over some constant attributes. • Conceptually, an SRT is an index over a constant attribute A that can be used to locate nodes that have data relevant to the query. • When a query q with a predicate over A arrives at a node n, n checks to see if any child's value of A overlaps the query range of A in q. • If so, it prepares to receive results and forwards the query. • If no child overlaps, the query is not forwarded. • Also, if the query applies locally, n begins executing the query itself. • If the query does not apply at n or at any of its children, it is simply forgotten.
Power sensitive dissemination and routing. • Figure 8. A semantic routing tree in use for a query. • Gray arrows : flow of the query down the tree. • Gray nodes must produce or forward results in the query.
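The forwarding rule of a semantic routing tree can be sketched as a recursive walk that only descends into children whose attribute interval overlaps the query range. This is a simplified Python model: the `Node` class, the function names, and the example tree are hypothetical. A real node would cache each child's interval rather than recompute it on every query.

```python
def overlaps(lo1, hi1, lo2, hi2):
    """True when the closed intervals [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1


class Node:
    def __init__(self, value, children=()):
        self.value = value              # this node's constant attribute value
        self.children = list(children)

    def interval(self):
        """Range of attribute values in the subtree rooted at this node."""
        lo = hi = self.value
        for child in self.children:
            c_lo, c_hi = child.interval()
            lo, hi = min(lo, c_lo), max(hi, c_hi)
        return lo, hi


def disseminate(node, q_lo, q_hi, participants):
    """Collect the values of nodes that execute the query, pruning subtrees that cannot match."""
    if q_lo <= node.value <= q_hi:      # the query applies locally
        participants.append(node.value)
    for child in node.children:
        c_lo, c_hi = child.interval()
        if overlaps(c_lo, c_hi, q_lo, q_hi):
            disseminate(child, q_lo, q_hi, participants)
    return participants


# Hypothetical tree: root 5 with subtrees covering [1, 4] and [8, 12].
root = Node(5, [Node(3, [Node(1), Node(4)]), Node(10, [Node(8), Node(12)])])

print(disseminate(root, 7, 9, []))  # prints [8]
```

For the range [7, 9] the whole subtree under the node with value 3 is skipped, because its interval [1, 4] does not overlap the query.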
Processing Queries. • Communication Scheduling & Aggregate Queries. • The basic idea is to subdivide the epoch into a number of intervals, and assign nodes to intervals based on their position in the routing tree. • During a node's interval, if it is aggregating, it computes the partial state record consisting of the combination of any child values with its own. • After this computation, it transmits either its partial state record or raw sensor readings up the network.
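The scheduling idea can be sketched by giving deeper nodes earlier intervals, so every child has reported before its parent's interval begins. The Python model below is an illustration under assumptions: the rule "interval = maximum depth minus node depth", the (sum, count) partial state record, and the example tree are all choices made for the sketch.

```python
def schedule(depths):
    """Map each node to a transmit interval; deeper nodes transmit earlier."""
    max_depth = max(depths.values())
    return {node: max_depth - d for node, d in depths.items()}


def run_epoch(parent, depths, readings):
    """Merge (sum, count) partial state records up the tree in interval order."""
    state = {n: (readings[n], 1) for n in readings}  # each node starts with its own reading
    slots = schedule(depths)
    for node in sorted(slots, key=slots.get):        # earliest interval first
        p = parent[node]
        if p is not None:                            # the root has no parent to send to
            s, c = state[node]
            ps, pc = state[p]
            state[p] = (ps + s, pc + c)              # parent merges the child's record
    root = next(n for n, p in parent.items() if p is None)
    return state[root]


# Hypothetical routing tree: A is the root, B and C are its children, D is below B.
parent = {"A": None, "B": "A", "C": "A", "D": "B"}
depths = {"A": 0, "B": 1, "C": 1, "D": 2}
readings = {"A": 10, "B": 20, "C": 30, "D": 40}

total, count = run_epoch(parent, depths, readings)
print(total, count, total / count)  # prints 100 4 25.0
```

Each node sends a single record per epoch regardless of how many descendants it has, which is the saving that in-network aggregation provides.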
Processing Queries. • Interval-based communication.
Processing Queries. • Prioritizing Data Delivery. • When the queue overflows, the system must decide if it should discard the overflow tuple, discard some other tuple already in the queue, or combine two tuples via some aggregation policy. • Policies for Selection Queries. • Naïve scheme. • No tuple is considered more valuable than any other. • The queue is FIFO, and tuples are dropped if they do not fit in the queue. • Winavg scheme. • This works like the naïve scheme, except that instead of dropping results when the queue fills, the two results at the head of the queue are averaged to make room for new results.
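The difference between the naïve and winavg schemes shows up only when the queue is full. The Python sketch below models both with `collections.deque`. The function names, the capacity, and the numeric values are assumptions, and tuples are reduced to single numbers for brevity.

```python
from collections import deque


def enqueue_naive(queue, capacity, value):
    """FIFO queue that drops the incoming result when it does not fit."""
    if len(queue) < capacity:
        queue.append(value)


def enqueue_winavg(queue, capacity, value):
    """When full, average the two results at the head to free one slot (assumes capacity >= 2)."""
    if len(queue) >= capacity:
        first = queue.popleft()
        second = queue.popleft()
        queue.appendleft((first + second) / 2)
    queue.append(value)


naive, winavg = deque(), deque()
for v in [10, 20, 30, 40]:  # hypothetical readings, queue capacity 3
    enqueue_naive(naive, 3, v)
    enqueue_winavg(winavg, 3, v)

print(list(naive))   # prints [10, 20, 30]
print(list(winavg))  # prints [15.0, 30, 40]
```

The naïve queue loses the newest reading, while winavg keeps it at the price of blurring the two oldest readings into one.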
Processing Queries. • Policies for Selection Queries. • Delta scheme. • A tuple is assigned an initial score relative to its difference from the most recent value. • This scheme relies on the intuition that the largest changes (highest scores) are probably the most interesting. • The tuple with the lowest score is evicted when the queue overflows. • Comparison. • A single mote running TinyDB. • The sample rate is set a fixed factor faster than the maximum delivery rate.
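A minimal Python sketch of the delta scheme's eviction rule follows. It assumes the score is the absolute difference from a reference value passed in as `last_value`. The function name, the list-based queue, and the sample readings are illustrative choices.

```python
def enqueue_delta(queue, capacity, value, last_value):
    """Append (score, value); on overflow evict the entry with the lowest score."""
    queue.append((abs(value - last_value), value))
    if len(queue) > capacity:
        queue.remove(min(queue, key=lambda entry: entry[0]))


queue = []
for v in [21, 35, 19, 50]:  # hypothetical readings; the reference value is 20
    enqueue_delta(queue, capacity=3, value=v, last_value=20)

print(queue)  # prints [(15, 35), (1, 19), (30, 50)]
```

The reading 21 differs least from the reference value, so it is the one evicted when the fourth reading arrives.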
Processing Queries. • Figure 13.
Processing Queries. • RMS (Root Mean Square) error.
Processing Queries. • Policies for Aggregate Queries. • Snooping. • This technique allows nodes to suppress their local aggregate values by listening to the answers that neighboring nodes report and exploiting the semantics of aggregate functions. • For example: MAX aggregation. • A node hears the value a neighbor reports for a MAX query and compares it with its local partial MAX. • If the neighbor's value is greater than the local partial MAX, the node assigns its partial MAX a low score or suppresses it altogether. • If the neighbor's value is less than the local partial MAX, the node assigns its partial MAX a high score.
Processing Queries. • Policies for Aggregate Queries. • For example : MAX aggregation (Figure 14). • Here node 2’s value can be suppressed if it is less than the maximum value snooped from nodes 3, 4, and 5.
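The suppression test for MAX reduces to a comparison against everything the node has overheard. In the Python sketch below, the function name and the values are hypothetical. Treating a tie as grounds for suppression is an assumption of the sketch.

```python
def should_report_max(local_max, snooped_values):
    """Return False when some neighbor already reported a value at least as large."""
    return all(local_max > v for v in snooped_values)


snooped = [35, 52, 47]  # hypothetical values overheard from nodes 3, 4, and 5

print(should_report_max(40, snooped))  # prints False: 52 already exceeds 40
print(should_report_max(60, snooped))  # prints True: 60 may be the new maximum
```

A node that has overheard nothing always reports, since `all` over an empty list is true.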
Processing Queries. • Adapting Rates. • When initially optimizing a query, TinyDB's optimizer chooses a transmission and sample rate based on current network load conditions and the requested sample rates and lifetimes. • Adaptivity is needed in two contexts: • Network contention and power consumption. • Adaptive back-off: the transmission and sample rates change.
Processing Queries. • Adapting Rates.
Processing Queries. • Power Consumption. • Compute a predicted battery voltage for a time t seconds into the query. • Compare the current voltage to the predicted voltage. • Re-estimate the power consumption characteristics of the device.