
Lecture 4: Mobile Data Dissemination



  1. Lecture 4: Mobile Data Dissemination §4.1 Mobile Data Delivery §4.2 Data Broadcast (Push) §4.3 Data Indexing §4.4 Pull-Based Dissemination

  2. Mobile Data Delivery • Two basic modes of data access: • Data dissemination: from server to a large population of clients. • Preferred for utilizing the high-bandwidth downstream channel to serve many clients. • Can broadcast information of common interest, such as stock quotations, traffic conditions, special events, or available seats at a theater performance. • Dedicated data access: from server to individual client and from client back to server. • Conveyed over dedicated channels with limited bandwidth. • Client can query (dedicated data query) and update (dedicated data update) data items in the database. • Dedicated data accesses are more expensive. • It is possible to handle a dedicated data query, such as querying for the availability of seats in a concert, through data dissemination. • Dedicated data update is still needed in m-commerce applications. • Clients make changes to database state in activities such as buying or selling a stock, or reserving a seat in a show.

  3. Mobile Data Delivery • Two modes for data dissemination: • Push • Server continuously pushes items over the broadcast channel. • Client tunes in to the channel and waits for the required data item. • The set of items broadcast depends on the anticipated access needs of clients. • Server-initiated. • More scalable with the number of clients, but less flexible. • Implemented through the construction of a broadcast program. • Pull • Client requests data items. • Server schedules the appropriate data for dissemination among the set of requested items ready for reply. • Client-initiated. • Less scalable with the number of clients, but more flexible. • Implemented by an online scheduling algorithm.

  4. Mobile Data Delivery

  5. Mobile Data Delivery • Data-oriented: classified by the nature of the data delivered. • Publication only • Data are sent to clients regardless of whether they are requested. • Clients can either filter them or ignore their presence. • Demand-driven • Only a selected set of data items is sent to clients. • Clients make requests to the server, which sends the items over a dedicated channel or multiplexes them over the broadcast channel. • Hybrid • Data can be published or delivered on demand. • Static allocation implies that specific data items are always published and specific items are always delivered on demand, decided beforehand. For example, weather and traffic reports are published, but the expected delay of a bus can be delivered on demand. • Dynamic allocation implies that the set of items for publishing may change over time.

  6. Mobile Data Delivery • Mechanics-oriented: classified by the mechanism used to deliver data. • Initiation • Initiated by client: closer to on-demand; good for a small number of clients. • Initiated by server: closer to publication; good for a large number of clients. • Schedulability • Event-driven: send data only upon occurrence of certain events, such as a request or a value change. • Schedule-driven: send data according to a predetermined schedule. • Communication type • Unicast: send data to an individual client (normally client-initiated). • 1:N: send data to a collection of clients, either broadcast (to all) or multicast (to a selected set).

  7. Mobile Data Delivery • Organization-oriented: classified by how data is structured for clients. • Broadcast program: clients listen to all items on the channel. • Organized program: data items are organized into a flat program (each selected item broadcast once per cycle), a non-flat program (selected items broadcast with different frequencies), or a randomized program (all broadcast cycles are different). • Ad hoc program: corresponds to on-demand requests. The server may broadcast a single item for each query or batch the answers to several queries together. • Selective tuning: clients should listen only to potentially useful items and doze to save energy. • Indexed: the client consults a broadcast index to determine what to listen for and when. The index can be either precise or imprecise. • Non-indexed: client and server agree on the time a resultant item will appear, and the client can doze and wake up accordingly, e.g., answer available after 200 ms.

  8. Mobile Data Delivery • Bandwidth-oriented: classified by how bandwidth is allocated to different data types, mechanisms, or structures. • Static allocation • A certain percentage of the bandwidth (or number of channels) is allocated to published data and the rest to on-demand data, or to server-initiated versus client-initiated requests. • The allocation is determined at the beginning. • Simple, but not adaptive to changing needs. • Dynamic allocation • The number of channels or amount of bandwidth for different needs changes over time. • The changing need is determined by monitoring the data access pattern.

  9. Mobile Data Delivery • Performance metrics • Responsiveness: how fast required data items are available. • Characterized by response time (moment when the first data item is available) and access time (moment when all data items are available). • Data affinity: how hungry the client is for required data items. • A more comprehensive measure than response time and access time. • Characterized by aggregated data affinity (the integral of missing data items over time). • Tuning efficiency: how long a client stays in active mode for tuning. • Characterized by the tuning time (total time spent listening to the channels in active mode). • Power efficiency: how much energy is consumed per request. • Characterized by queries per watt or useful data per watt. • Packet efficiency: how well channel bandwidth is used for requests. • Characterized by queries per Hz or useful data per Hz. • Adaptivity: how the system adjusts to changes in system characteristics. • Definitely bad for static schemes (e.g., static channel allocation, static object classification, static broadcast programs).

  10. §4.2 Data Broadcast – Push-Based Dissemination • A broadcast program consists of a set of data items, ordered by broadcast time. • Periodic / aperiodic • Whether the same broadcast program is repeated with the same cycle length. • Pull-based dissemination is essentially aperiodic. • Equal-sized items / unequal-sized items • Whether data items are of the same size. • Flat program / non-flat program • Whether each included data item is broadcast only once per cycle. • Flat programs are easy to handle and index, but non-flat programs can help reduce access time.

  11. Data Broadcast • Flat broadcast program: A B C D (all items are equally important). • Skewed non-flat broadcast program (broadcast items with higher access need more frequently): A A A A B C D D. • Regular non-flat broadcast program (disperse the repeated items evenly): A B A D A C A D.

  12. Performance Metrics • For a regular non-flat broadcast program, the simplified forms are: • The average waiting time for an object equals half of the broadcast period (spacing) of that object. • The overall average waiting time equals the sum, over all objects, of the product of each object's average waiting time and its access probability. • The overall average access time equals the overall average waiting time + 1 (a request is processed in the next time unit, which is needed to receive the item).
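In symbols (a restatement of the formulas above, using the spacing si and access probability qi of item i, the same notation as Wong's algorithm on slide 15):

```latex
% Regular (equal-spacing) non-flat program, unit-size items:
% s_i = broadcast period (spacing) of item i, q_i = its access probability.
\mathrm{AWT}_i = \frac{s_i}{2}, \qquad
\mathrm{AWT} = \sum_i q_i \, \frac{s_i}{2}, \qquad
\mathrm{AAT} = \mathrm{AWT} + 1 .
```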

  13. Data Broadcast • We would like to broadcast the following set of 33 items (items 1 to 33), divided into three groups (figure omitted): • Group A (3 items) are hot items. • Group B (6 items) are ordinary items. • Group C (24 items) are cold items.

  14. Data Broadcast • Generating a flat broadcast program. • Broadcast in the standard order: • 2 9 26 5 6 7 8 10 11 1 3 4 12 13 14 15 16 17 … 33 2 9 26 … • Performance: • Cycle length = 33. • Average waiting time = 16.5. • Average access time = 17.5. • Average tuning time = 17.5. • The same performance results if the broadcast program goes 1 2 3 4 5 6 7 8 9 10 … • Reason: • Access frequency (probability) has not been taken into account. • The broadcast program is cyclic in nature.
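These numbers are easy to check numerically. Below is a minimal sketch of ours (not from the lecture) that computes the overall average waiting time of any cyclic schedule, assuming unit-size items and requests arriving uniformly in continuous time, so a request landing in a gap of g slots waits g/2 on average:

```python
def average_waiting_time(schedule, prob):
    """schedule: one broadcast cycle as a list of item ids;
    prob: item id -> access probability.
    If the copies of an item split the cycle into gaps g_1..g_k (summing
    to L), its expected wait is sum(g_j^2) / (2L): a request lands in gap
    g_j with probability g_j/L and then waits g_j/2 on average."""
    L = len(schedule)
    total = 0.0
    for item, q in prob.items():
        pos = [i for i, x in enumerate(schedule) if x == item]
        gaps = [((pos[(j + 1) % len(pos)] - pos[j] - 1) % L) + 1
                for j in range(len(pos))]
        total += q * sum(g * g for g in gaps) / (2 * L)
    return total

flat = list(range(1, 34))                 # the 33-item flat program
p = {i: 1 / 33 for i in flat}
print(average_waiting_time(flat, p))      # 16.5; access time = 16.5 + 1 = 17.5
```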

  15. Data Broadcast • Generating a regular non-flat broadcast program. • All items are spaced regularly (equal spacing). • Wong's Algorithm: Input: cycle length L and access probabilities q1 to qN (sorted in descending order) for N items. Output: broadcast schedule. • Select integer frequencies fi such that the copies fill the cycle and, for each pair i and j, the ratio fi / fj is close to √(qi / qj) (the square-root rule). • for i = 1 to N do select integer si as close to L / fi as possible (spacing of item i). • for i = 1 to N−1 do assign the fi copies of item i to broadcast cycle positions, with the objective of matching the equal-spacing criterion. • Assign copies of item N to the remaining slots in the broadcast.
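A minimal executable sketch of the algorithm (the rounding of the frequencies and the greedy slot placement are naive choices of ours; the algorithm as stated leaves them open):

```python
import math

def wong_schedule(L, q):
    """q: access probabilities of N items, sorted in descending order.
    Frequencies follow the square-root rule f_i ~ sqrt(q_i), scaled so the
    copies roughly fill the L slots; items are then placed as close to
    equal spacing s_i = L / f_i as possible."""
    n = len(q)
    scale = L / sum(math.sqrt(x) for x in q)
    f = [max(1, round(math.sqrt(x) * scale)) for x in q]   # integer frequencies
    schedule, placed = [None] * L, 0
    for i in range(n - 1):                    # items 1 .. N-1
        for k in range(f[i]):
            if placed == L:
                break
            t = int(k * L / f[i])             # ideal equally spaced slot
            while schedule[t % L] is not None:
                t += 1                        # fall back to the next free slot
            schedule[t % L] = i
            placed += 1
    # item N fills whatever is left
    return [x if x is not None else n - 1 for x in schedule]

# Slide-16 setup: 3 hot, 6 ordinary and 24 cold items, L = 60; the
# square-root rule then yields f = 6 per hot, 3 per ordinary, 1 per cold item.
q = [4/21] * 3 + [1/21] * 6 + [1/168] * 24
print(wong_schedule(60, q))
```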

  16. Data Broadcast • Assume that the access probabilities for items in groups A, B and C are qA = 4/21, qB = 1/21, qC = 1/168. • Thus qA/qB = 4, qA/qC = 32, qB/qC = 8. • The ratios of their square roots are √4 : √32 : √8 = 2 : 5.66 : 2.83. • Choosing fA = 6, fB = 3, fC = 1, we have fA/fB = 2, fA/fC = 6, fB/fC = 3. These ratios are close enough to the square roots of the access-probability ratios above. • Cycle length L = 6×3 + 3×6 + 1×24 = 60. • Spacings si for items in A, B and C are 10, 20 and 60. • The broadcast schedule: (figure omitted; schedule over time). • What is the access time? Answer: 10 + 1. • Note: you could choose other values for the frequencies fA, fB, fC, but the cycle length would be different. • Difficulty: it is hard to choose good frequencies.
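Checking the 10 + 1 figure with the slide-12 formula AWT = Σ qi si / 2:

```latex
% 3 hot items (s_A = 10), 6 ordinary (s_B = 20), 24 cold (s_C = 60):
\mathrm{AWT} = 3\cdot\tfrac{4}{21}\cdot\tfrac{10}{2}
             + 6\cdot\tfrac{1}{21}\cdot\tfrac{20}{2}
             + 24\cdot\tfrac{1}{168}\cdot\tfrac{60}{2}
             = \tfrac{60}{21} + \tfrac{60}{21} + \tfrac{90}{21} = 10,
\qquad \mathrm{AAT} = 10 + 1 = 11 .
```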

  17. Broadcast Disk • An item-based broadcast program is expensive to generate, especially when there are many items. • Regular programs respect the equal-spacing property, a necessary condition for optimal access time, with easier management. • Regular programs with close-to-optimal performance can be generated by collecting items of similar access probability into partitions. • Each partition is viewed as an individual disk. • Broadcasting hot partitions more often translates into a disk spinning at a faster speed. • This is called the broadcast disk. • Put items onto different disks with different spinning speeds. • Multiplex multiple disks onto the same broadcast channel. • This is also called a multi-disk broadcast.

  18. Broadcast Disk • Assuming equal-sized data items. • For simplicity, each item or each group of items is a page. • Acharya, Franklin and Zdonik's Algorithm: • Order the pages from hottest to coldest. • Divide the pages into partitions, so that each partition contains pages with similar access probability. (Each partition is called a disk. Let there be Num_disk disks.) • Select the broadcast frequency fi for each disk i. • Compute Max_num_chunk = LCM of the frequencies. • Compute Num_chunki = Max_num_chunk / fi. • Divide disk i into Num_chunki chunks. • for i = 0 to Max_num_chunk − 1 do • for j = 1 to Num_disk do broadcast the (i mod Num_chunkj + 1)-st chunk of disk j.
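The generator is small enough to write out. Here is a sketch in Python (math.lcm requires Python 3.9+); the layout at the bottom is the slide-19 example and reproduces the broadcast order shown there:

```python
from math import lcm

def broadcast_disk(disks, freq):
    """disks: list of disks, each a list of chunks (a chunk = list of pages);
    freq: relative broadcast frequency per disk, hottest disk highest.
    Each disk j must have max_num_chunk / freq[j] chunks."""
    max_num_chunk = lcm(*freq)              # minor cycles per broadcast cycle
    order = []
    for i in range(max_num_chunk):          # one minor cycle per iteration
        for j, disk in enumerate(disks):
            num_chunk = max_num_chunk // freq[j]
            order.append(disk[i % num_chunk])   # (i mod Num_chunk_j + 1)-st chunk
    return order

# Slide-19 example: disks with 2, 3 and 4 chunks (frequencies 6:4:3, LCM 12).
disks = [[[1], [2]], [[3], [4], [5]], [[6], [7], [8], [9]]]
print([c for chunk in broadcast_disk(disks, [6, 4, 3]) for c in chunk])
# -> 1, 3, 6, 2, 4, 7, 1, 5, 8, 2, 3, 9, 1, 4, 6, 2, 5, 7, ...
```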

  19. Broadcast Disk • Assume that there are three disks, with 2, 3 and 4 chunks respectively: disk 1 holds chunks 1–2, disk 2 holds chunks 3–5, disk 3 holds chunks 6–9. • The broadcast order would be 1, 3, 6, 2, 4, 7, 1, 5, 8, 2, 3, 9, 1, 4, 6, 2, 5, 7, 1, 3, 8, 2, 4, 9, 1, 5, 6, 2, 3, 7, … • (Per-disk view: disk 1 cycles 1 2, disk 2 cycles 3 4 5, disk 3 cycles 6 7 8 9.)

  20. Broadcast Disk • With the same example as before, the access probabilities for items in groups A, B and C are qA = 4/21, qB = 1/21, qC = 1/168. • The 33 items are collected into three disks A, B and C. • Assume that the frequency ratio is 4:2:1. • Maximum number of chunks on a disk = 4 (LCM of 4, 2, 1). • Numbers of chunks for disks A, B and C are 4/4 = 1, 4/2 = 2 and 4/1 = 4 respectively. • Each chunk in disks A, B and C contains 3/1 = 3 items, 6/2 = 3 items and 24/4 = 6 items respectively. • The broadcast schedule (L = 48): (figure omitted; the first slot is chunk 1 of disk A). • What is the access time? Answer: 10.29 + 1.
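The same slide-12 check explains the 10.29:

```latex
% Frequencies 4:2:1 over L = 48 give spacings s_A = 12, s_B = 24, s_C = 48:
\mathrm{AWT} = 3\cdot\tfrac{4}{21}\cdot\tfrac{12}{2}
             + 6\cdot\tfrac{1}{21}\cdot\tfrac{24}{2}
             + 24\cdot\tfrac{1}{168}\cdot\tfrac{48}{2}
             = \tfrac{72 + 72 + 72}{21} \approx 10.29 .
```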

  21. Broadcast Disk • Repeating the example with the access probabilities for items in groups A, B and C being qA = 4/21, qB = 1/21, qC = 1/168. • Assume that the frequency ratio is 6:3:1 instead. • Maximum number of chunks on a disk = 6. • Numbers of chunks for the disks are 1, 2 and 6. • Chunk sizes for A, B and C are 3, 3 and 4. • Unused slots, left when a disk does not divide evenly into whole chunks, may be used for auxiliary information such as an index. • The broadcast schedule (L = 60): (figure omitted; schedule over time). • What is the access time? Answer: 10 + 1 — the spacings are again 10, 20 and 60, so the computation above gives AWT = 10, matching Wong's schedule of slide 16.

  22. Broadcast Disk – an improvement • Keeping the data items in the broadcast sorted is useful, e.g., to serve range queries requesting a number of items. • A simple solution: • Sort the items in each partition or chunk in key order. • Associate with each chunk a small description of the key range it covers. • The client can doze off during irrelevant chunks. • A better solution: • Superimpose the repeated (hot) items over the standard items. • Determine how many times each hot item is to be broadcast and the number of segments, as in the broadcast disk algorithm. • Sort the original set of all items by key value. • Divide them into segments of approximately equal size. • Allocate the repeated items into the appropriate segments, while trying to observe the spacing property. • Result: using the range indicator of each segment, the client can determine which segments to listen to when answering range queries.

  23. Broadcast Disk – an improvement • Assuming 4:2:1 frequencies, the figure (omitted) shows the initial partition (4 segments), the remaining items (3 more copies for each item in A and 1 more for each item in B), and the final broadcast schedule (cycle length = 48). • All slots can be fully occupied, and unfilled ones can be skipped.

  24. §4.3 Data Indexing • Indexing indicates the moment data items become available. • Data indexing is very important, especially in data broadcast. • Selective tuning based on indexing can save access cost. • Losing an index can cause a data item to be missed, forcing the client to wait for another cycle. • To reduce tuning time, an indexing structure is needed.

  25. Methods in Indexing • Indexing on the air • The index is broadcast over the broadcast channel. • Clients can tune in to get data or meta-data (the index). • Regular broadcast programs can be associated with small indexing information. • Indexing by directory • The index is pre-loaded onto the client. • Simple and efficient (no broadcasting overhead). • Problem: how to update the directory? • Indexing by computing • The index can be computed by the client, if • data items are equal-sized with consecutive key values (e.g., from 1 to N), and • the broadcast algorithm is known to clients. (A one-line example follows.)
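For indexing by computing, a sketch under the assumptions just stated (a flat cyclic program broadcasting keys 1..N in order; the function name is ours):

```python
def slots_until(key, current_slot, N):
    """Flat cyclic program broadcasting items 1..N in key order, one slot
    each (item k occupies slot k-1 of every cycle): the client computes how
    long to doze before item `key` starts, with no transmitted index."""
    return (key - 1 - current_slot) % N

# Probing at slot 10 of a 33-item cycle and wanting item 5:
print(slots_until(5, 10, 33))   # 27 slots of dozing, then tune in
```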

  26. Terms in Indexing • Index segment: a collection of consecutive index pages or buckets. • Data segment: a collection of consecutive data pages or buckets. • File: concatenation of all data segments. • Bcast (broadcast cycle): a file with all interleaving index and data segments. • Tuning time: time a client spends listening to the broadcast. • Access time: time a client waits until all required data items are downloaded. • Probe time: time a client waits until the first indexing structure arrives (i.e., until it knows the where-am-I context). • Bcast wait: time spent from getting an index to getting all items. • The file is of constant length, but the bcast length is a design parameter. • More index pages → shorter tuning time, but larger access time.

  27. Terms in Indexing • Bucket information • ID/type: offset from the start of the bcast; data bucket or index bucket. • Bcast/index pointers: offsets to the start of the next bcast / next index segment. • Index of data: (attribute value, offset) pairs in an index bucket. • Data access protocol • Initial probe: listen to the channel and locate the start of the next index segment. • Index access: tune in to the index segment to get the start of the required data segment or bucket. • Data access: tune in to the data bucket for the needed data item. • (figure: a bcast as alternating index segments and data segments)

  28. Simple Indexing • Associate a simple multilevel index (the complete index) with all data items at the beginning of a file. • The client listens to the index and looks up the data page. • Doze until the data page appears and tune in for the item. • Disadvantage: • the client must wait for the index first and only then tune for the data, • with an expected access time of (L + Li)/2 (L = cycle length, Li = index length).

  29. B+-tree Indexing • Recall the structure of a B+-tree. • (figure: interior nodes and leaf nodes)

  30. B+-tree Indexing • (figure: the B+-tree laid out on the broadcast timeline; interior nodes hold keys with pointers given as broadcast offsets of those keys, leaves hold keys with their data)

  31. (1,m) Indexing • Improve on simple indexing by providing the complete index segment m times, interleaved in the bcast. • The client can tune in for the next index segment and then for the data. • Expected access time is about L(m+1)/2m (e.g., m = 5). • (figure: bcast with m interleaved index segments)
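A plausible reading of the L(m+1)/2m figure (our derivation; it ignores the extra slots occupied by the index copies themselves): the client first waits for the next of the m equally spaced index segments, then for the data item.

```latex
\mathbb{E}[\text{access}]
  \approx \underbrace{\frac{L}{2m}}_{\text{probe: next index segment}}
        + \underbrace{\frac{L}{2}}_{\text{bcast wait: the data item}}
  = \frac{L\,(m+1)}{2m}
```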

  32. (1,m) Indexing • Another example of index replication. (figure)

  33. (1,m) Indexing • An improved version. (figure)

  34. Distributed Indexing • Observe that, in the middle of a bcast, it is useless to index data items that have already passed by in the broadcast. • See the negative offsets in the previous example. • Improve on (1,m) indexing by not replicating all parts of the index. • Index nodes are mixed within the broadcast. • Index nodes are broadcast in depth-first order. • One needs only to provide the index for data broadcast between the current and the next index segment. • Based on the B+-tree indexing scheme, with replication along the path from the root to the current branch of the tree. • Variations may perform no replication or full replication of the path.

  35. Distributed Indexing • Add two more pointers to the front and back of each node. • The generalized B+-tree index and data nodes would be: • (figure: an index node I holds keys v1 < v2 < v3 and child pointers b1, b2, b3, b4, framed by an extra front pointer (x, offset to the next bcast cycle) and an extra back pointer (y, offset to the next replicated index node); a data node, e.g., the data for v1, carries the same two extra pointers)

  36. Distributed Indexing • Each node/bucket contains either data or an index. • For replicated nodes, add two additional pointers: (x, bcastptr) and (y, indexptr). • Items of key less than x have been missed, and we should follow the bcast pointer to the next broadcast. • Items of key greater than y are not in the current branch of the data segment, so go to the next segment. • Items of key x ≤ K ≤ y can be found in the current branch of the data segment.

  37. Distributed Indexing • An example with 243 data items. • 81 are shown. • 3 data items per node.

  38. Example of Distributed Indexing • The three branches under an index node are broadcast without replication. • I and a1, a2, a3 are repeated three times each, once per branch rooted at b1, b2, …, b9. • 1st copy of index bucket a1 (for the branch at b1): x = (1, first I of the next bcast), y = (81, second I of this bcast). • 2nd copy of index bucket a1 (for the branch at b2): x = (28, first I of the next bcast), y = (81, second I of this bcast). • (figure: the bcast over time)

  39. Data Access with Distributed Indexing • 1. Tune in to the current bucket, read the index pointer, and get the offset to the next index bucket. • 2. Tune in at the beginning of that index bucket and read it. • If the item has been missed (key < x), take the first offset (x-ptr) to the next bcast, doze, and go to step 2. • If the item is not in the current segment (key > y), take the second offset (y-ptr) to the next higher-level index, doze, and go to step 3. • If the item is in the current segment, go to step 3. • 3. Probe the index bucket and follow the multi-level pointers to the required data bucket. • 4. Tune in to the required data bucket for the item with key K. (A toy simulation of this control flow follows.)
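A toy, self-contained simulation of this control flow (ours, not the lecture's bucket format): the multi-level tree is flattened to one index bucket per data segment, each carrying the (x, y) pair and a forward pointer, which is enough to exercise steps 1–4:

```python
DATA, INDEX = "data", "index"

def make_bcast(n_items, seg_size):
    """One bcast: before each data segment, an index bucket with the segment's
    key range x..y and the slot of the next index bucket; data buckets also
    carry that slot so an initial probe can find the index (step 1)."""
    bcast = []
    for start in range(0, n_items, seg_size):
        keys = range(start + 1, min(start + seg_size, n_items) + 1)
        nxt = len(bcast) + 1 + len(keys)        # slot of the next index bucket
        bcast.append({"kind": INDEX, "x": keys[0], "y": keys[-1], "next": nxt})
        bcast += [{"kind": DATA, "key": k, "next": nxt} for k in keys]
    return bcast

def access(bcast, probe, key):
    """Slots elapsed from the initial probe until `key` is received."""
    L, t = len(bcast), probe
    if bcast[t % L]["kind"] == DATA:            # step 1: hop to the next index
        t += (bcast[t % L]["next"] - t) % L
    while True:                                 # step 2: read an index bucket
        b = bcast[t % L]
        if key < b["x"] or key > b["y"]:        # missed / wrong segment:
            t += (b["next"] - t) % L or L       # doze to the next index bucket
        else:                                   # steps 3-4: doze to the data
            t += 1 + (key - b["x"])
            assert bcast[t % L]["key"] == key
            return t - probe

bcast = make_bcast(n_items=81, seg_size=27)
print(access(bcast, probe=50, key=45))          # wraps into the next bcast
```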

  40. Distributed Indexing • If the initial probe lands in the replicated part, the cases are simpler. • Example: access item 45 with the initial probe at index bucket a1. • Tune to a1, b2, c5, d15 and the data bucket for item 45. • Example: access item 65 with the initial probe at index bucket a1. • Tune to a1, b3, c8, d22 and the data bucket for item 65. • If the initial probe lands in the data or non-replicated part, we may sometimes miss the data. • Example: access item 45 with the initial probe at data bucket 3. • Tune to data 3, which points to the second a1, then b2, c5, d15 and the data bucket for item 45. • Example: access item 25 with the initial probe at data bucket 3. • Tune to data 3, which points to the second a1 (item already missed), next I (next cycle), a1, b1, c3, d9 and the data bucket for item 25.

  41. Fault-Tolerant Broadcast • Unreliable wireless channels can create major problems. • Missing a data item (likely) implies tuning for the next broadcast cycle. • Missing an indexing structure can cause the item to be missed. • Solutions • Fault tolerance for data • Broadcast items more frequently. • Broadcast items over several channels. • Use error-correcting codes. • Fault tolerance for the index • Replicate important parts of the index.

  42. Fault-Tolerance to Index • Naive solution by index replication • Duplicate the indexing structure in each index bucket. • For example, “hello” can be transmitted as “hheelllloo”. • This is effective assuming independent bucket failures. • The problems are • the high overhead of doubling the index length, and • that bursty communication failures are common. • Desirable properties for a solution • The degree of replication should be small. • Re-accessing partial index should be minimized (ensure that good index already received remains useful). • Allow tuning to an existing version of the index bucket within the same bcast, not the next replicated version.

  43. Fault-Tolerance to Index • Inter-index replication schemes • Create redundant pointers to the next more complete indexing structure. • Simpler to implement. • Violate the third property: the client may have to tune to the next broadcast cycle. • Examples: all-children replication and self replication. • Intra-index replication schemes • Create additional pointers into the existing indexing structure. • More difficult to implement, since pointers must point forward. • Allow staying on the same broadcast cycle to look for data. • Examples: extended tree-based indexing or distributed indexing.

  44. All Children Replication (ACR) • A simple solution: for each indexing entry to a child, provide an additional pointer to the next version of the same child. • For example, a pointer to child b1 also carries an additional pointer to (b1)2, the next version of b1. • Features: • Partial index is not wasted: the search progresses down the index tree and jumps to the parallel version of a failed index at the next level, which reduces tuning time. • High cost of replication. • Unable to handle two successive failures: the client cannot get to an index bucket within the same version.

  45. Data Access Protocol of ACR • 1. Tune in for a good bucket and follow the offset to the next index bucket. • 2. Tune in for the index bucket. If corrupted, repeat step 1. • Follow the intended index and keep the spare index. • If the item has been missed, doze off until the next bcast. • If the item is not in the segment, doze off until the next higher-level index. • If the item is in the segment, go to step 3. • 3. Get the required index bucket. • If correct, doze off until the next index bucket down the level, and keep the spare index. • If incorrect, use the spare index to doze off until the next version of the same index bucket; reset the spare pointer to null; repeat step 3. • If out of spare pointers, go to step 1 for the next bcast. • 4. Tune in for the required data bucket. • If there is an error, use the spare pointer and repeat step 4. • If out of spare pointers, go to step 1 for the next bcast.

  46. Self Replication (SR) • Instead of replicating the pointers to all children, just replicate the pointer to the next version of the node itself. • Features: • Partial index is not wasted: the search progresses across the index tree and jumps to the parallel version of a failed index at the same level. • Cannot survive two successive failures: the client cannot get to an index bucket within the same version. • Indexing cost is lower, at the expense of one more tuning bucket upon the first failure.

  47. Tree-Based Index Replication • Make use of the tree structure to provide pointer replication. • Add a (maxvalue, pointer) pair to each index node (except the root). • Constructing the pair (maxvalue, pointer): • The maxvalue of a node is the largest data value obtainable in the subtree of the node. E.g., the maxvalue of b1 is 27; the maxvalue of c2 is 18. • Pointer: (see the next slide).

  48. Tree-Based Index Replication • Definitions • For a node, an ancestor's maxvalue is greater than or equal to the node's maxvalue. • E_ancestors of a rightmost node: the ancestors with the same maxvalue. • E.g., b1 and c3 are the E_ancestors of d9. • F_ancestor of a rightmost node: the E_ancestor furthest away (closest to the root). • Constructing the pointers: • For each child except the rightmost: point (by offset) to its right sibling. • For the rightmost child: point to the right sibling of its F_ancestor. (A small construction sketch follows.)
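A minimal sketch of this construction on an in-memory tree (ours; node names are illustrative, and the pointer is a node reference where a real bcast would store a slot offset). The F_ancestor rule falls out of the recursion: the right sibling chosen at the highest non-rightmost ancestor is handed down to all of its rightmost descendants.

```python
class Node:
    def __init__(self, name, children=None, maxkey=None):
        self.name, self.children = name, children or []
        # leaf: maxvalue given; interior: maxvalue of the rightmost child
        self.maxvalue = maxkey if maxkey is not None else self.children[-1].maxvalue
        self.replica_ptr = None        # target of the replicated (maxvalue, pointer)

def add_pointers(node, f_ancestor_sibling=None):
    """Non-rightmost children point to their right sibling; a rightmost child
    points to the right sibling of its F_ancestor, carried down recursively.
    For the globally rightmost node the target is None (i.e., next bcast)."""
    for i, child in enumerate(node.children):
        if i + 1 < len(node.children):
            child.replica_ptr = node.children[i + 1]
            add_pointers(child, node.children[i + 1])
        else:
            child.replica_ptr = f_ancestor_sibling
            add_pointers(child, f_ancestor_sibling)

# Toy slice of the slide-37 tree: b1 = (c1 c2 c3), with b2 to its right.
c1 = Node("c1", maxkey=9); c2 = Node("c2", maxkey=18); c3 = Node("c3", maxkey=27)
b1 = Node("b1", [c1, c2, c3]); b2 = Node("b2", maxkey=54)
a1 = Node("a1", [b1, b2])
add_pointers(a1)
print(c3.replica_ptr.name)  # b2: F_ancestor of c3 is b1, whose right sibling is b2
```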

  49. Tree-Based Index Replication • Data access protocol • Tune in to the current bucket and listen for an error-free bucket. • If it is an index bucket, get the next-bucket pointer, doze until that bucket, and repeat step 1. • If it is a data bucket, locate the object; if unsuccessful, go to step 1. • Upon reading a corrupted bucket, listen to the next bucket. • Since every index bucket contains a pointer to the next index bucket in the pre-order traversal, no data item will be missed unless the bucket containing the required data item is itself corrupted. • Much higher degree of fault tolerance than the two previous schemes, which cannot tolerate two successive failures.

  50. Tree-Based Index Replication • Example: tuning for item 54 when the first probe is at bucket d1. • Error free: d1, second a1, b2, c6, d18, bucket containing 54. • Bucket d1 and the next bucket both corrupted: d1, data bucket under d1, d2, second a1, c6, d18, bucket containing 54. • Example: tuning for item 27 when the first probe is at bucket d1. • Error free: d1, d2, d3, c2, c3, d9, bucket containing 27; or d1, second a1, next I (next cycle), a1, b1, c3, d9, bucket containing 27 (not using the additional pointers). • d1 corrupted: d1, data bucket under d1, d2, d3, c2, c3, d9, bucket containing 27. • c2 corrupted: d1, d2, d3, c2, d4, d5, d6, c3, d9, bucket containing 27.
