Generalized Search Trees

J. M. Hellerstein, J. F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. on VLDB, Sep. 1995

Presented by Ihab Ilyas

- Motivation.
- Database Search Trees.
- Generalized Search Tree.
- Properties.
- Methods.
- Applications.

New Data Types

Extending search trees to maximum flexibility

- New applications (multimedia, CAD tools, document libraries, etc.)

- Specialized Search Trees
- Example: Spatial Search Trees (R-Trees)
- Problem: each new application requires building a new tree structure from scratch.

- Search Trees for Extensible Data Types
- Example: extending B+-trees to index any ordinal data
- Problem: extends the set of data types, but not the set of supported queries.

- A third direction for extending search trees
- Extensible both in data types supported and in the queries applied on this data.
- Allows new data types to be indexed in a manner that supports the queries natural to the data type.

- Unifies previously disparate structures for currently common data types.
- Example: B+-trees and R-trees can both be implemented as extensions of GiST, giving a single code base for indexing multiple dissimilar applications.

[Figure: canonical rough picture of a database search tree. Internal nodes hold keys (Key1, Key2, …) and sit above leaf nodes chained in a linked list.]

- Search Key: a search key may be an arbitrary predicate that holds for each datum below the key.
- Search Tree: A hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy.

Definition: A GiST is a balanced multi-way tree of variable fan-out between kM and M, where k is the minimum fill factor, $2/M \le k \le 1/2$.

The exception is the root node, which can have fan-out from 2 to M.

- Leaf nodes: (p,ptr)
- p: Predicate used as a search key.
- ptr: the identifier of some tuple of the database.

- Non-leaf nodes: (p,ptr)
- p: Predicate used as a search key.
- ptr: Pointer to another tree node.

- Every node contains between kM and M index entries unless it is the root.
- For each index entry (p,ptr) in a leaf node, p holds for the tuple identified by ptr.
- For each index entry (p,ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr.
- All leaves appear on the same level.
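The fan-out and balance properties above are easy to check mechanically. The sketch below is a minimal Python version built on hypothetical `Entry`/`Node` classes. It checks only the structural properties. Whether each predicate actually holds for the tuples below it depends on the key class and is not checked. It also assumes that a root which is itself a leaf may hold fewer than 2 entries.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Entry:
    p: Any    # key predicate; its representation is up to the key class
    ptr: Any  # tuple id in a leaf, child Node in a non-leaf node

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def check_invariants(root: Node, M: int, k: float) -> bool:
    """Check fan-out bounds and that all leaves are on one level."""
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        n = len(node.entries)
        if is_root:
            # root fan-out is 2..M; assumption: a lone leaf root may hold fewer
            lo = 0 if node.is_leaf else 2
        else:
            lo = k * M  # every other node holds between kM and M entries
        if not lo <= n <= M:
            return False
        if node.is_leaf:
            leaf_depths.add(depth)
            return True
        return all(visit(e.ptr, depth + 1, False) for e in node.entries)

    return visit(root, 0, True) and len(leaf_depths) <= 1
```

For example, with M = 4 and k = 1/2, a root with two leaves of 2 and 3 entries passes. The check fails if one leaf drops to a single entry, or if the leaves end up on different levels.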

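The ability of orthogonal classification (recall the R-tree):

[Figure: an entry (p,ptr) points to a node holding (p’,ptr’), below which are the entries (p1,ptr1) and (p2,ptr2). Both p and p’ hold for p1, p2, but p’ → p is not required.]

In an R-tree, a parent's bounding box always encloses its children's boxes. A GiST only requires each key to hold for the tuples reachable below it. A child key p’ therefore need not imply its parent key p, and keys at different levels can classify the data orthogonally.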

- Key Methods: the methods the user can specify to configure the GiST. The methods encapsulate the structure and behavior of the object class used for keys in the tree.
- Tree Methods: Provided by the GiST, and may invoke the required key methods.

E is an entry of the form (p,ptr), q is a query predicate, and P is a set of entries.

- Consistent(E,q): false if $p \land q$ is guaranteed unsatisfiable, true otherwise.
- Union(P): returns a predicate r that holds for all tuples stored below the entries in P, e.g., any r with $(p_1 \lor \dots \lor p_n) \to r$.
- Compress(E): returns (p’,ptr), where p’ is a compressed representation of p.
- Decompress(E): given a compressed entry (p’,ptr) with p’ = Compress(p), returns (r,ptr) where $p \to r$. The compression may be lossy, since we do not require $p \leftrightarrow r$.

- Penalty(E1,E2): returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric represents the increase in size from E1.p to Union({E1,E2}).
- PickSplit(P): given a set P of M+1 entries, splits P into two sets of entries P1, P2, each of size at least kM. The choice of the minimum fill factor is controlled here.
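One way to picture the key methods is as an abstract interface that each key class fills in. The sketch below is a minimal Python version under stated assumptions. The names `GiSTKey`, `consistent`, `union`, `penalty`, `pick_split`, `compress` and `decompress` are hypothetical. Entries are plain `(p, ptr)` tuples. `SetKey` is a toy key class invented for illustration, in which a predicate is a finite set S meaning "the value is in S".

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple

Entry = Tuple[Any, Any]  # (p, ptr)

class GiSTKey(ABC):
    """Hypothetical interface for the user-supplied key methods."""

    @abstractmethod
    def consistent(self, e: Entry, q: Any) -> bool:
        """False only if p ∧ q is guaranteed unsatisfiable."""

    @abstractmethod
    def union(self, P: List[Entry]) -> Any:
        """A predicate holding for every tuple below the entries in P."""

    @abstractmethod
    def penalty(self, e1: Entry, e2: Entry) -> float:
        """Cost of inserting e2 into the subtree rooted at e1."""

    @abstractmethod
    def pick_split(self, P: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
        """Split M+1 entries into two groups of at least kM entries each."""

    # Compression is optional: the default is the (lossless) identity.
    def compress(self, e: Entry) -> Entry:
        return e

    def decompress(self, e: Entry) -> Entry:
        return e

class SetKey(GiSTKey):
    """Toy key class: p is a frozenset S meaning 'value in S'; q is a value."""

    def consistent(self, e, q):
        return q in e[0]

    def union(self, P):
        return frozenset().union(*(p for p, _ in P))

    def penalty(self, e1, e2):
        # number of new values e1's predicate would have to absorb
        return len(e2[0] - e1[0])

    def pick_split(self, P):
        # order by smallest member, then cut in half
        ordered = sorted(P, key=lambda e: min(e[0]))
        half = len(ordered) // 2
        return ordered[:half], ordered[half:]
```

The tree methods only ever call these six methods. Supporting a new data type therefore means writing one such subclass, not a new tree.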

- Search: controlled by the Consistent method.
- Insert: controlled by the Penalty and PickSplit methods.
- Delete: controlled by the Consistent method.
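Search is a recursive descent that prunes every subtree whose key is not Consistent with the query. Because sibling keys may overlap, the search can follow several children at once. The sketch below is a minimal Python version that assumes a hypothetical `Node` class and passes Consistent in as a function of the key predicate (not the whole entry). The demo keys are closed intervals with a hypothetical `overlap` test.

```python
from typing import Any, Callable, List, Tuple

class Node:
    def __init__(self, is_leaf: bool, entries: List[Tuple[Any, Any]]):
        self.is_leaf = is_leaf
        self.entries = entries  # (p, ptr) pairs

def search(node: Node, q: Any, consistent: Callable[[Any, Any], bool]) -> List[Any]:
    """Collect the ptr of every leaf entry whose key is Consistent with q."""
    hits = []
    for p, ptr in node.entries:
        if not consistent(p, q):
            continue  # p ∧ q unsatisfiable: prune this subtree
        if node.is_leaf:
            hits.append(ptr)  # candidate; the caller may recheck q on the tuple
        else:
            hits.extend(search(ptr, q, consistent))  # may visit several children
    return hits

# Hypothetical key class for the demo: closed intervals, range queries.
def overlap(p, q):
    return p[0] <= q[1] and q[0] <= p[1]

leaf1 = Node(True, [((1, 1), "t1"), ((4, 4), "t4")])
leaf2 = Node(True, [((6, 6), "t6"), ((9, 9), "t9")])
root = Node(False, [((1, 4), leaf1), ((6, 9), leaf2)])
print(search(root, (3, 7), overlap))  # ['t4', 't6']
```

A query range that straddles both subtrees, such as (3, 7), descends into both leaves.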

[Figure: inserting a new entry (q,ptr), starting at the root R. At each level the new entry descends into the subtree with the smaller penalty: first Penalty = m vs. Penalty = n with m < n, then Penalty = i vs. Penalty = j with j < i. The chosen leaf is full, so it is split according to PickSplit.]
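The insertion shown above has three steps:

1. Descend from the root, following the entry with the minimum Penalty at each level.
2. Add the new entry to the chosen leaf, and split any overfull node with PickSplit.
3. On the way back up, adjust the parent keys with Union. If the root splits, the tree grows one level.

The sketch below is a simplified Python version. `Node`, `choose_leaf` and `insert` are hypothetical names. Penalty and Union take key predicates rather than whole entries. Compression and insertion at non-leaf levels are omitted. The demo uses toy closed-interval key methods (`iv_*`, also hypothetical) with M = 3.

```python
from typing import Any, List, Tuple

class Node:
    def __init__(self, is_leaf: bool, entries: List[Tuple[Any, Any]]):
        self.is_leaf = is_leaf
        self.entries = entries  # (p, ptr) pairs

def choose_leaf(root, entry, penalty):
    """Follow the minimum-Penalty entry at each level, recording the path."""
    path, node = [], root
    while not node.is_leaf:
        i = min(range(len(node.entries)),
                key=lambda j: penalty(node.entries[j][0], entry[0]))
        path.append((node, i))
        node = node.entries[i][1]
    return path, node

def insert(root, entry, M, penalty, union, pick_split):
    """Insert entry, splitting overfull nodes with PickSplit; return the root."""
    path, node = choose_leaf(root, entry, penalty)
    node.entries.append(entry)
    sibling = None
    if len(node.entries) > M:
        left, right = pick_split(node.entries)
        node.entries, sibling = left, Node(node.is_leaf, right)
    for parent, i in reversed(path):
        # adjust the parent's key so it still holds for everything below it
        parent.entries[i] = (union([p for p, _ in node.entries]), node)
        if sibling is not None:
            parent.entries.insert(i + 1, (union([p for p, _ in sibling.entries]), sibling))
            sibling = None
            if len(parent.entries) > M:
                left, right = pick_split(parent.entries)
                parent.entries, sibling = left, Node(False, right)
        node = parent
    if sibling is not None:  # the root itself split: the tree grows one level
        root = Node(False, [(union([p for p, _ in node.entries]), node),
                            (union([p for p, _ in sibling.entries]), sibling)])
    return root

# Demo with hypothetical closed-interval keys (lo, hi).
def iv_penalty(p1, p2):
    # growth in length needed for p1 to also cover p2
    return (max(p1[1], p2[1]) - min(p1[0], p2[0])) - (p1[1] - p1[0])

def iv_union(ps):
    return (min(lo for lo, _ in ps), max(hi for _, hi in ps))

def iv_split(entries):
    # order by key, first half goes left
    ordered = sorted(entries, key=lambda e: e[0])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]

root = Node(True, [])
for v in range(1, 11):
    root = insert(root, ((v, v), f"t{v}"), 3, iv_penalty, iv_union, iv_split)
print([p for p, _ in root.entries])  # [(1, 4), (5, 10)]
```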

- GiST over Z (B+-trees)
- GiST over polygons in R² (R-trees)

Here p has the form Contains([xp,yp),v): the indexed value v lies in the half-open range $[x_p, y_p)$.

- Consistent(E,q) returns true if:
- q = Contains([xq,yq),v): $(x_p < y_q) \land (y_p > x_q)$
- q = Equal(xq,v): $x_p \le x_q < y_p$

- Union(P) returns $[\min(x_1, \dots, x_n),\ \max(y_1, \dots, y_n))$.

- Penalty(E,F), where $E.p = [x_1, y_1)$ and $F.p = [x_2, y_2)$:
- If E is the leftmost pointer on its node, returns $\max(y_2 - y_1, 0)$
- If E is the rightmost pointer on its node, returns $\max(x_1 - x_2, 0)$
- Otherwise, returns $\max(y_2 - y_1, 0) + \max(x_1 - x_2, 0)$

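- PickSplit(P): let the first $\lfloor |P|/2 \rfloor$ entries in order go to the left node and the remaining $\lceil |P|/2 \rceil$ entries go to the right node. With $|P| = M+1$, each node receives at least $M/2$ entries, so $k = 1/2$.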

- Compress(E): if E is the leftmost key on a non-leaf node, return a 0-byte object; otherwise return E.p.x.
- Decompress(E):
- If E is the leftmost key on a non-leaf node, let $x = -\infty$; otherwise let $x = E.p.x$.
- If E is the rightmost key on a non-leaf node, let $y = +\infty$. If E is any other entry in a non-leaf node, let $y$ be the value stored in the next key. Otherwise (E is in a leaf), let $y = x + 1$. Return ([x,y), ptr).
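The B+-tree key methods above translate almost line for line into code. The sketch below is a Python version under these assumptions, all of them hypothetical representation choices:

- A key [x, y) is stored as the pair `(x, y)`.
- Queries are tagged tuples: `("contains", xq, yq)` or `("equal", xq)`.
- The leftmost/rightmost position is passed to Penalty as flags.
- Compress/Decompress work on a whole node's ordered key list at once, with `None` standing in for the 0-byte object.

```python
import math
from typing import List, Optional, Tuple

Range = Tuple[float, float]  # half-open [x, y)

def consistent(p: Range, q: tuple) -> bool:
    xp, yp = p
    if q[0] == "contains":            # q = Contains([xq, yq), v)
        _, xq, yq = q
        return xp < yq and yp > xq
    if q[0] == "equal":               # q = Equal(xq, v)
        _, xq = q
        return xp <= xq < yp
    raise ValueError(f"unknown query kind: {q[0]!r}")

def union(P: List[Range]) -> Range:
    return (min(x for x, _ in P), max(y for _, y in P))

def penalty(e: Range, f: Range, leftmost: bool, rightmost: bool) -> float:
    (x1, y1), (x2, y2) = e, f
    if leftmost:                      # leftmost key has x = -inf: count right growth only
        return max(y2 - y1, 0)
    if rightmost:                     # rightmost key has y = +inf: count left growth only
        return max(x1 - x2, 0)
    return max(y2 - y1, 0) + max(x1 - x2, 0)

def pick_split(P: List[Range]) -> Tuple[List[Range], List[Range]]:
    ordered = sorted(P)
    half = len(ordered) // 2          # first floor(|P|/2) entries go left
    return ordered[:half], ordered[half:]

def compress_node(keys: List[Range], is_leaf: bool) -> List[Optional[float]]:
    # leftmost key of a non-leaf node is stored as "nothing" (None);
    # every other key keeps only its lower bound x
    return [None if (i == 0 and not is_leaf) else x
            for i, (x, _) in enumerate(keys)]

def decompress_node(stored: List[Optional[float]], is_leaf: bool) -> List[Range]:
    out = []
    for i, s in enumerate(stored):
        if is_leaf:
            out.append((s, s + 1))    # a leaf key is one integer: [x, x+1)
            continue
        x = -math.inf if i == 0 else s
        y = math.inf if i == len(stored) - 1 else stored[i + 1]
        out.append((x, y))
    return out
```

On a B+ node whose keys partition the line, compress_node and decompress_node round-trip exactly. Take instead an internal key [10, 15) whose next key starts at 20. It decompresses to [10, 20), a weaker predicate that still satisfies $p \to r$: this is the lossy case allowed by Decompress.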

Here the key has the form (xul,yul,xlr,ylr): the upper-left and lower-right corners of a rectangle (y grows upward, so yul ≥ ylr).

- Query predicates are:
- Contains ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))
Returns true if $(x_{ul1} \le x_{ul2}) \land (y_{ul1} \ge y_{ul2}) \land (x_{lr1} \ge x_{lr2}) \land (y_{lr1} \le y_{lr2})$

- Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))
Returns true if $(x_{ul1} \le x_{lr2}) \land (y_{ul1} \ge y_{lr2}) \land (x_{ul2} \le x_{lr1}) \land (y_{lr1} \le y_{ul2})$

- Equal ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))
Returns true if $(x_{ul1} = x_{ul2}) \land (y_{ul1} = y_{ul2}) \land (x_{lr1} = x_{lr2}) \land (y_{lr1} = y_{lr2})$
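The three query predicates are direct coordinate comparisons. The sketch below is a Python version that assumes a hypothetical `Rect` tuple with y growing upward, matching the conditions above.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    xul: float  # upper-left corner
    yul: float
    xlr: float  # lower-right corner
    ylr: float  # y grows upward, so yul >= ylr

def contains(r1: Rect, r2: Rect) -> bool:
    """True if r1 contains r2."""
    return (r1.xul <= r2.xul and r1.yul >= r2.yul
            and r1.xlr >= r2.xlr and r1.ylr <= r2.ylr)

def overlaps(r1: Rect, r2: Rect) -> bool:
    """True if the x-extents and the y-extents both intersect."""
    return (r1.xul <= r2.xlr and r1.yul >= r2.ylr
            and r2.xul <= r1.xlr and r1.ylr <= r2.yul)

def equal(r1: Rect, r2: Rect) -> bool:
    return r1 == r2  # all four coordinates equal
```

Because the comparisons are non-strict, rectangles that merely share an edge count as overlapping.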


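- Consistent(E,q)
- p has the form Contains((xul1,yul1,xlr1,ylr1), v), and q is Contains, Overlaps or Equal applied to (xul2,yul2,xlr2,ylr2)
- Returns true if Overlaps((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)). For all three query types, a qualifying tuple can exist below E only if the two rectangles overlap.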

- Union(P) returns the coordinates of the minimum bounding rectangle of all rectangles in P, i.e., the smallest rectangle enclosing them all.

- Penalty(E,F)
- Compute q = Union({E,F}) and return area(q) − area(E.p), i.e., the area enlargement of E's rectangle.

- PickSplit(P)
- A variety of R-tree split algorithms exist (e.g., the linear and quadratic splits of the original R-tree) for dividing the entries of an over-full node.

- Compress(E)
- Form the bounding rectangle of the polygon E.p

- Decompress(E)
- The identity function
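The R-tree key methods can be sketched in Python as follows, under these assumptions:

- `Rect` is a hypothetical tuple with y growing upward.
- Consistent receives the query's rectangle directly.
- A polygon is given as a list of (x, y) vertices.

PickSplit is omitted because it is delegated to an existing R-tree split algorithm.

```python
from typing import List, NamedTuple, Sequence, Tuple

class Rect(NamedTuple):
    xul: float  # upper-left corner
    yul: float
    xlr: float  # lower-right corner
    ylr: float  # y grows upward, so yul >= ylr

def area(r: Rect) -> float:
    return (r.xlr - r.xul) * (r.yul - r.ylr)

def overlaps(r1: Rect, r2: Rect) -> bool:
    return (r1.xul <= r2.xlr and r1.yul >= r2.ylr
            and r2.xul <= r1.xlr and r1.ylr <= r2.yul)

def consistent(p: Rect, q_rect: Rect) -> bool:
    # Contains, Overlaps and Equal queries all reduce to an overlap test:
    # if p's rectangle misses q_rect, nothing below p can qualify.
    return overlaps(p, q_rect)

def union(P: Sequence[Rect]) -> Rect:
    # smallest rectangle enclosing every rectangle in P
    return Rect(min(r.xul for r in P), max(r.yul for r in P),
                max(r.xlr for r in P), min(r.ylr for r in P))

def penalty(e: Rect, f: Rect) -> float:
    # area enlargement needed for e to also cover f
    return area(union([e, f])) - area(e)

def compress(polygon: List[Tuple[float, float]]) -> Rect:
    # assumption: the polygon is given as its list of (x, y) vertices
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return Rect(min(xs), max(ys), max(xs), min(ys))

def decompress(r: Rect) -> Rect:
    return r  # identity: the bounding rectangle is stored as is
```

Here Compress is lossy in exactly the sense allowed by Decompress. The stored rectangle r satisfies $p \to r$ for the polygon p, but not the converse.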