
Project Proposals

Simonas Šaltenis

Aalborg University

• Nykredit Center for Database Research

• Department of Computer Science, Aalborg University

Outline

• An overview of the R-tree and the TPR-tree

• Project proposals:

• Update-Efficient TPR-tree

• Time-parameterized SS-tree

WIM workshop, Gl. Vrå Slot, December 6-8, 2001

Spatial Indexing With the R-Tree

• Example

[Figure: an example R-tree over points p1 to p13, with bounding rectangles R1 to R7, a query rectangle, and leaf entries holding pointers to data tuples.]

R-tree Properties

• Leaf entry = <n-dimensional point, rid>

• Non-leaf entry = <n-dimensional MBR, ptr to a child node>

• MBR – the Minimum Bounding Rectangle of all points in the subtree pointed to by ptr

• The R-tree is a balanced tree: all leaves are at the same depth from the root

• Through the insertion and deletion algorithms, nodes are kept at least m% full (except the root)

• m is usually chosen to be 40%.

• m is the minimum fill factor; depending on the workload, the average fill factor is usually ≈ 70%.

R-Tree – a Grow-Post tree

Bounding predicate (BP) = something that describes the entries in a subtree

• Building blocks of algorithms:

• Consistent(BP, Q) – returns true if results of query Q can be under BP (in the R-tree: the MBR intersects Q)

• PickSplit(node) – splits a page of entries into two groups

• Penalty(BP, E) – returns an estimate of how much “worse” BP becomes if E is inserted under it

• Union( ) – computes a BP of a collection of entries (in the R-tree: computes an MBR, i.e. the minimum and maximum in all dimensions)

[Figure: a Grow-Post tree, with internal nodes holding bounding predicates BP1, BP2, ..., BPn above the leaf nodes.]
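For the R-tree case, three of the building blocks above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tuple-based `MBR` type and the function names `consistent`, `union`, `area`, and `penalty` are hypothetical, a point is treated as a degenerate rectangle, and Penalty is taken to be plain area enlargement.

```python
from typing import Sequence, Tuple

# Assumed representation: an MBR is (low corner, high corner);
# a point is an MBR whose two corners coincide.
MBR = Tuple[Tuple[float, ...], Tuple[float, ...]]

def consistent(bp: MBR, q: MBR) -> bool:
    """True if bounding rectangle bp intersects query rectangle q."""
    (bl, bh), (ql, qh) = bp, q
    return all(l <= qhi and qlo <= h
               for l, h, qlo, qhi in zip(bl, bh, ql, qh))

def union(entries: Sequence[MBR]) -> MBR:
    """Smallest rectangle covering all entries: min and max in every dimension."""
    lows = tuple(min(c) for c in zip(*(e[0] for e in entries)))
    highs = tuple(max(c) for c in zip(*(e[1] for e in entries)))
    return lows, highs

def area(bp: MBR) -> float:
    """Product of the side lengths of bp."""
    result = 1.0
    for l, h in zip(*bp):
        result *= h - l
    return result

def penalty(bp: MBR, e: MBR) -> float:
    """Area enlargement needed for bp to also cover e."""
    return area(union([bp, e])) - area(bp)
```

PickSplit is left out because it depends on the chosen split heuristic.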

Range Query in R-trees

• Answering range query Q in R-trees

• Start at the root

• If the current node is a non-leaf, for each entry <MBR, ptr>: if Consistent(MBR, Q), search the subtree identified by ptr

• If the current node is a leaf, for each entry <E, rid>: if E overlaps Q, rid identifies a point that overlaps Q

• Note: We may have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)

• Worst-case performance O(n)!

• But in practice, R-trees exhibit good query performance for various data sets

• What about insertion and deletion?
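The range-query procedure on this slide can be sketched recursively. The `Node` class, the `Rect` type, and the function names are assumptions made for the example; leaf entries store each point as a degenerate rectangle next to its rid.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (low corner, high corner)

@dataclass
class Node:
    is_leaf: bool
    # Leaf entries: (point as degenerate rect, rid).
    # Non-leaf entries: (MBR, child Node).
    entries: List[tuple] = field(default_factory=list)

def intersects(a: Rect, b: Rect) -> bool:
    """True if rectangles a and b overlap in every dimension."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def range_query(node: Node, q: Rect) -> List[int]:
    """Collect the rids of all points that overlap q, starting at node."""
    result = []
    for bp, target in node.entries:
        if not intersects(bp, q):
            continue                      # prune this entry
        if node.is_leaf:
            result.append(target)         # target is a rid
        else:
            result.extend(range_query(target, q))  # may follow several subtrees
    return result
```

Because the loop may descend into more than one child, a query whose rectangle overlaps many MBRs visits many nodes, which is where the O(n) worst case comes from.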


Insert Entry E = <point, ptr>

• Insertion algorithm

1. cn = root

2. If cn is a leaf, go to step 5.

3. From all entries in cn, choose the entry e with the smallest Penalty(e.BP, E). (In R-trees, choose an entry whose MBR needs the least enlargement to cover E; resolve ties by going to the smallest-area child.)

4. cn = e.ptr; go to step 2.

5. Insert E into cn. Call PropogateUp(cn).

• PropogateUp(cn)

• If cn is overfull, call PickSplit(cn) to produce cn1 and cn2, replace cn’s old entry in its parent by e1 = Union(cn1) and e2 = Union(cn2), and call PropogateUp on cn’s parent.

• Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e and call PropogateUp on cn’s parent.

• Create a new root with two entries whenever the root is split.
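The choice made at each level of the descent (least area enlargement, ties broken by smallest area) can be sketched as below. The names `area`, `enlarge`, and `choose_entry` are hypothetical, and the sketch covers only the choice among one node's entries, not splitting or upward propagation.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (low corner, high corner)

def area(r: Rect) -> float:
    """Product of the side lengths of r."""
    a = 1.0
    for lo, hi in zip(r[0], r[1]):
        a *= hi - lo
    return a

def enlarge(r: Rect, p: Tuple[float, ...]) -> Rect:
    """Smallest rectangle covering both r and point p."""
    lows = tuple(min(lo, x) for lo, x in zip(r[0], p))
    highs = tuple(max(hi, x) for hi, x in zip(r[1], p))
    return lows, highs

def choose_entry(mbrs: List[Rect], p: Tuple[float, ...]) -> int:
    """Index of the MBR needing least enlargement to cover p; ties go to the smallest area."""
    return min(
        range(len(mbrs)),
        key=lambda i: (area(enlarge(mbrs[i], p)) - area(mbrs[i]), area(mbrs[i])),
    )
```

Sorting on the pair (enlargement, area) gives the tie-breaking rule without a separate pass.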

Heuristics for Penalty

• Heuristics of least area enlargement and smallest area are used in the R-tree’s Penalty.

[Figure: the example R-tree (points p1 to p13, rectangles R1 to R7) before and after a new point p14 is inserted into a leaf.]

Deletion in R-trees

• Delete entry E

• Using the search procedure, find the leaf cn where entry E is located

• Remove E from cn. Call PropogateUp(cn).

• PropogateUp(cn)

• If cn is underfull, deallocate the node cn, remove cn’s entry in its parent, call PropogateUp on cn’s parent, and reinsert all of cn’s entries or merge them into some other node

• Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e and call PropogateUp on cn’s parent.

• No additional heuristics are involved in Delete; underfull nodes are handled using Insert as a subroutine.


Modeling Continuous Movement

• In conventional databases, data is assumed constant unless explicitly modified.

• With continuous movement, this is problematic.

• Outdated, inaccurate data

• Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions.

• We use linear functions to capture the present and future positions.

• Updates are necessary only when the parameters of the functions change.

• For example, given a reference time, the current and anticipated future position of a two-dimensional point can be described by four parameters.
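A linear time-parameterized position can be sketched as follows. The class name `MovingPoint` and its fields are assumptions made for the example; the four parameters are taken to be the two coordinates at the reference time and the two velocity components.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class MovingPoint:
    t_ref: float               # reference time at which pos was reported
    pos: Tuple[float, float]   # position at t_ref
    vel: Tuple[float, float]   # velocity (units per time unit)

    def at(self, t: float) -> Tuple[float, float]:
        """Predicted position at time t: pos + vel * (t - t_ref)."""
        return tuple(x + v * (t - self.t_ref)
                     for x, v in zip(self.pos, self.vel))
```

As long as the object keeps its reported velocity, `at` stays accurate and no update is needed; an update is issued only when `pos` or `vel` has to change.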

Queries

[Figure: objects o1 to o4 plotted as position x against time t.]

• Type 1: objects that intersect a given rectangle at time t

• Type 2: objects that intersect a given rectangle sometime from t1 to t2

• Type 3: objects that intersect a given moving rectangle sometime between t1 and t2

• We can expect that most queries will be concentrated in the sliding window [CT, CT+W], i.e. CT <= t, t1, t2 <= CT + W
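For a single linearly moving point, a Type 2 query can be answered by intersecting, dimension by dimension, the time intervals during which the point lies inside the rectangle. The sketch below assumes positions are given at reference time 0; the function names are hypothetical. A Type 1 query is the special case t1 == t2.

```python
from typing import Optional, Tuple

def inside_interval(x0: float, v: float, lo: float, hi: float,
                    t1: float, t2: float) -> Optional[Tuple[float, float]]:
    """Part of [t1, t2] during which lo <= x0 + v*t <= hi, or None if empty."""
    if v == 0:
        return (t1, t2) if lo <= x0 <= hi else None
    ta, tb = (lo - x0) / v, (hi - x0) / v      # times the two bounds are crossed
    start, end = max(t1, min(ta, tb)), min(t2, max(ta, tb))
    return (start, end) if start <= end else None

def intersects_window(pos, vel, low, high, t1: float, t2: float) -> bool:
    """True if the moving point is inside rectangle [low, high] at some time in [t1, t2]."""
    start, end = t1, t2
    for x0, v, lo, hi in zip(pos, vel, low, high):
        iv = inside_interval(x0, v, lo, hi, start, end)
        if iv is None:
            return False                        # never inside in this dimension
        start, end = iv                         # narrow the candidate time interval
    return True
```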

Time-Parameterized Rectangles

• The TPR-tree is based on the R-tree.

• Moving points are bounded with time-parameterized rectangles.

• The rectangles are bounding from now on.

• The R-tree allows overlap.

• The tree employs conservative bounding rectangles.

• At any time t > tc we can get a valid R-tree: TPBR-tree(t) = R-tree
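One way to build a conservative bounding rectangle is sketched below. It assumes the usual construction, which the slide does not spell out: each lower bound moves with the minimum velocity of the enclosed points and each upper bound with the maximum. Positions are taken at reference time 0, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (position at time 0, velocity), one value per dimension.
Point = Tuple[Tuple[float, ...], Tuple[float, ...]]

def conservative_tpbr(points: List[Point]):
    """Bounds at time 0 plus bound velocities (min velocity below, max velocity above)."""
    dims = range(len(points[0][0]))
    low = tuple(min(p[0][d] for p in points) for d in dims)
    high = tuple(max(p[0][d] for p in points) for d in dims)
    v_low = tuple(min(p[1][d] for p in points) for d in dims)
    v_high = tuple(max(p[1][d] for p in points) for d in dims)
    return low, high, v_low, v_high

def rect_at(tpbr, t: float):
    """Ordinary rectangle obtained by evaluating the moving bounds at time t >= 0."""
    low, high, v_low, v_high = tpbr
    return (tuple(l + v * t for l, v in zip(low, v_low)),
            tuple(h + v * t for h, v in zip(high, v_high)))
```

Evaluating `rect_at` at any later time gives a rectangle that still contains every point, so the tree can be read as an ordinary R-tree at that time. The rectangle is generally larger than the minimum one, which is the price of never having to adjust it between updates.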

Insertion: Grouping Points

• How to group moving points (Penalty and PickSplit)?

• The R-tree’s algorithms minimize characteristics of MBRs such as area, overlap, and margin.

• How does that work for moving points?

[Figure: moving points numbered 1 to 7.]

Insertion in the TPR-Tree

• The bounding rectangle characteristics (area, overlap, and margin) are functions of time.

• The goal is to minimize these for all time points from now to now+H.

• Minimizing the characteristics for time now + H/2 does not work (e.g., the area of a conservative bounding rectangle is not linear in time).

• We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals:

$\int_{now}^{now+H} A(t)\,dt$, where A(t) is, e.g., the area of an MBR

• What H to use?

• H depends on the update rate and on how far queries may reach into the future (W).
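A small calculation shows why the midpoint value is not a substitute for the integral. The sketch assumes a two-dimensional rectangle whose width and height grow linearly, takes now = 0, and uses hypothetical parameter names (`w0`, `dw` for width and its growth rate, `h0`, `dh` for height).

```python
def area_at(w0: float, dw: float, h0: float, dh: float, t: float) -> float:
    """Area at time t of a rectangle with width w0 + dw*t and height h0 + dh*t."""
    return (w0 + dw * t) * (h0 + dh * t)

def area_integral(w0: float, dw: float, h0: float, dh: float, H: float) -> float:
    """Closed form of the integral of area_at over t in [0, H]."""
    return (w0 * h0 * H
            + (w0 * dh + h0 * dw) * H ** 2 / 2
            + dw * dh * H ** 3 / 3)

# Example: unit square growing by 1 per time unit in both dimensions, H = 3.
exact = area_integral(1, 1, 1, 1, 3)          # integral of (1 + t)^2 over [0, 3]
midpoint = 3 * area_at(1, 1, 1, 1, 1.5)       # H times the area at H/2
```

Here `exact` is 21.0 while `midpoint` is 18.75: the area is quadratic in t, so the value at H/2 underestimates the integral.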

Outline

• An overview of the R-tree and the TPR-tree

• Project proposals:

• Update-Efficient TPR-tree

• Time-parameterized SS-tree


Update-Efficient TPR-tree

• Handling hyper-dynamic data

• 500,000 objects; on the average each object updates its positional info three times per hour

• => ~400 updates per second

• Update – deletion followed by an insertion

• Observations:

• Usually an object’s positional information does not change too drastically between updates

• Most of the update cost is due to the search phase of a deletion (several paths down the tree may be followed)

• We assume that the object reports its previous positional information, so that we know what to delete.

• We need to spend I/Os on making bounding predicates as “tight” as possible, although we may be willing to sacrifice query performance
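The update-rate estimate on this slide follows from simple arithmetic, shown here with hypothetical variable names:

```python
objects = 500_000
updates_per_object_per_hour = 3
seconds_per_hour = 3600

# Total updates per hour divided by the number of seconds in an hour.
updates_per_second = objects * updates_per_object_per_hour / seconds_per_hour
```

The result is a little under 417 updates per second, which the slide rounds to about 400. Since each update is a deletion followed by an insertion, the index performs roughly twice that many tree operations per second.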


• Lazy Update R-tree (LUR-tree):

• A hash table (on object IDs) is used to access leaf pages directly (without the search phase of deletion).

• An update is one operation:

• Go to the hash table with an object’s ID and get the pointer to the leaf page

• Update the object’s information in this page or, if the object’s information changed too “drastically”, insert it from the top of the tree using the normal insertion procedure
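The lazy-update idea can be sketched with a dictionary standing in for the hash table. Several things here are assumptions made for illustration: a change counts as "drastic" when the new position leaves the leaf's current MBR, leaf MBRs are fixed, and `insert` is a simplified stand-in for the normal top-down insertion. The class names `Leaf` and `LazyIndex` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]

@dataclass
class Leaf:
    low: Point                       # lower corner of the leaf's MBR
    high: Point                      # upper corner of the leaf's MBR
    points: Dict[int, Point] = field(default_factory=dict)

    def covers(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.low, p, self.high))

class LazyIndex:
    def __init__(self, leaves: List[Leaf]):
        self.leaves = leaves
        self.leaf_of: Dict[int, Leaf] = {}   # hash table: object id -> leaf page

    def insert(self, oid: int, p: Point) -> None:
        # Stand-in for top-down insertion: use the first leaf whose MBR covers p.
        # Raises StopIteration if no leaf covers p (not handled in this sketch).
        leaf = next(l for l in self.leaves if l.covers(p))
        leaf.points[oid] = p
        self.leaf_of[oid] = leaf

    def update(self, oid: int, p: Point) -> str:
        leaf = self.leaf_of[oid]             # direct access, no search phase
        if leaf.covers(p):                   # small change: update in place
            leaf.points[oid] = p
            return "in-place"
        del leaf.points[oid]                 # "drastic" change: re-insert from the top
        self.insert(oid, p)
        return "reinserted"
```

In the in-place case no ancestor MBR needs to change, which is what makes the update cheap; the open problems on the next slide concern the cases this sketch avoids.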

Problems to Solve

• Problems (that you have to try to solve, refining and applying these ideas to the TPR-tree):

• How do we update bounding rectangles in ancestor nodes?

• Possible solution: hash table storing the full path from the root to the leaf

• When do we do a real insertion and when an update in place?

• What do we do when nodes are split/merged? (Can we spend so many I/Os maintaining our hash table?)

• Possible solution: Lazy updating of the hash table and use of pointers to split-off nodes as in R-link trees.


Time-Parameterized SS-trees

• SS-tree – a Grow-Post tree, where bounding predicates are spheres:

• Good for Nearest Neighbor queries

• Compact description of a bounding predicate (independent of dimensionality)

• Project – explore time-parameterized SS-trees. Issues to be addressed:

• Writing the Consistent method

• Writing the Penalty method

• Experimentally comparing with TPR-tree for range queries and NN queries
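As a possible starting point for the Consistent method, the sketch below handles only a Type 1 (timeslice) query. It rests on an assumed representation that is not part of the proposal: the sphere's center moves linearly and its radius grows linearly, with parameters given at reference time 0. All names are hypothetical.

```python
from typing import Sequence, Tuple

def sphere_at(center: Sequence[float], velocity: Sequence[float],
              radius: float, growth: float, t: float) -> Tuple[tuple, float]:
    """Center and radius at time t under the assumed linear model."""
    c = tuple(x + v * t for x, v in zip(center, velocity))
    return c, radius + growth * t

def consistent(center, velocity, radius, growth, low, high, t: float) -> bool:
    """True if the bounding sphere at time t intersects the rectangle [low, high]."""
    c, r = sphere_at(center, velocity, radius, growth, t)
    # Squared distance from the center to the nearest point of the rectangle.
    d2 = sum(max(lo - x, 0.0, x - hi) ** 2 for x, lo, hi in zip(c, low, high))
    return d2 <= r * r
```

The bounding predicate takes one center, one velocity vector, and two scalars, so the sphere itself is described by a single radius regardless of dimensionality. Type 2 and Type 3 queries need a test over a time interval and are not covered here.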
