G eneral i zed s earch t rees
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

G eneral i zed S earch T rees PowerPoint PPT Presentation


  • 57 Views
  • Uploaded on
  • Presentation posted in: General

G eneral i zed S earch T rees. J.M Hellerstein, J.F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21 st Int’l Conf. On VLDB, Sep. 1995. Presented By Ihab Ilyas. Topics. Motivation. Database Search Trees. Generalized Search Tree. Properties. Methods.

Download Presentation

G eneral i zed S earch T rees

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


G eneral i zed s earch t rees

Generalized Search Trees

J.M Hellerstein, J.F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. On VLDB, Sep. 1995

Presented ByIhab Ilyas


Topics

Topics

  • Motivation.

  • Database Search Trees.

  • Generalized Search Tree.

  • Properties.

  • Methods.

  • Applications.


Motivation

New Data types

Extending search trees to maximum flexibility

Motivation

  • New applications (Multimedia, CAD tools, document libraries…etc.)


Before gist

Before GiST

  • Specialized Search Trees

    • Example: Spatial Search Trees ( R-Trees)

    • Problem: New Applications implies new tree structure from scratch

  • Search Trees For Extensible Data Types

    • Example: Extending B+ to index any ordinal data

    • Problem: Extending data but not the set of queries supported.


G eneral i zed s earch t rees

GiST

  • A third direction for extending search trees

  • Extensible both in data types supported and in the queries applied on this data.

  • Allows new data types to be indexed in a manner that supports the queries natural to the data type.


Gist cont

GiST (Cont.)

  • Unifies previously disparate structures for currently common data types.

    • Examples: B+ and R trees can be implemented as extensions to GiST. Single code base for indexing multiple dissimilar applications.


Database search trees

Key1 Key2 ….

Database Search Trees

  • Canonical rough picture of database search tree

Internal Nodes

Leaf nodes (Linked List)


Search trees cont

Search Trees (cont.)

  • Search Key: A search key may be arbitrary predicate that holds for each datum below the key.

  • Search Tree: A hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy.


Generalized search tree

Generalized Search Tree

Definition: A GiST is a balancedmulti-way tree of variable fan-out between kM and M Where k is the fill factor.

With the exception of the root node that can have fan-out from 2 to M.


Gist cont1

GiST (Cont.)

  • Leaf nodes: (p,ptr)

    • p: Predicate used as a search key.

    • ptr: the identifier of some tuple of the database.

  • Non-leaf nodes: (p,ptr)

    • p: Predicate used as a search key.

    • ptr: Pointer to another tree node.


Properties

Properties

  • Every node contain between kM and M unless it is the root.

  • For each index entry (p,ptr) in a leaf node, p holds for the tuple

  • For each index entry (p,ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr.

  • All leaves appear on the same level.


Note on properties

The ability of orthogonal classification.. Recall R-Tree

Note on Properties

p holds for p1,p2

…. (p,ptr) …..

p’ holds for p1,p2

p’  p

Not Required

…. (p’,ptr’) …..

…. (p1,ptr1) …..

…. (p2,ptr2)


Gist methods

GiST Methods

  • Key Methods: the methods the user can specify to configure the GiST. The methods encapsulate the structure and behavior of the object class used for keys in the tree.

  • Tree Methods: Provided by the GiST, and may invoke the required key methods.


Key methods

Key Methods

E is an entry of the form (p,ptr) , q is a query, P a set of entries

  • Consistent(E,q): False if p^q guaranteed unsatisfiable, true otherwise.

  • Union(P): returns predicate r that holds for all predicates in P

  • Compress(E): returns (p’,ptr).

  • Decompress(E): returns (r,ptr) where pr. This a lossy compression as we do not require p r


Key methods cont

Key Methods (Cont.)

  • Penalty(E1,E2): returns domain specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric is representation of the increase of size from E1.p to Union(E1,E2).

  • PickSplit(P): M+1 entries, splits P into two sets of entries P1,P2, each of the size kM. The choice of the minimum fill factor is controlled here.


Tree methods

Tree Methods

  • Search: Controlled by the Consistent Method.

  • Insert: Controlled by the Penalty and PickSplit.

  • Delete: Controlled by the Consistent


Example

R

(p,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

New (q,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

(p,ptr)

(q,ptr)

(p,ptr)

Example

New (q,ptr)

Penalty = m

Penalty = n

m < n

Penalty =i

Penalty = j

j < i

Full.. Then split according to PickSplit


Applications

Applications

  • GiST Over Z (B+ Trees)

  • GiST Over Polygons in R2 (R Trees)


B trees using gist

B+ Trees Using GiST

p here is on the form Contains([xp,yp),v)

  • Consistent(E,q) returns true if

    • If q= Contains([xq,yq),v): (xp<yq)^(yp>xq)

    • If q= Equal (xq,v): xp xq <yp

  • Union(P) returns [Min(x1,x2,…,xn),MAX(y1,y2,….,yn)).


B trees using gist cont

B+ Trees Using GiST (Cont.)

  • Penalty(E,F)

    • If E is the leftmost pointer on its node, returns MAX(y2-y1,0)

    • If E is the rightmost pointer on its node, returns MAX(x1-x2,0)

    • Otherwise, returns MAX(y2-y1,0)+MAX(x1-x2,0)

  • PickSplit(P) let the first entries in order to go to the left node and the remaining in the right node.


B trees using gist cont1

B+ Trees Using GiST (Cont.)

  • Compress(E) if E is the leftmost key on a non-leaf node return 0 bytes otherwise, returns E.p.x

  • Decompress(E)

    • if E is the leftmost key on a non-leaf node let x= - otherwise let x=E.p.x

    • If E is the rightmost key on a non-leaf node let y= . If E is other entry in a non-leaf node, let y = the value stored in the next key. Otherwise, let y = x+1


R trees using gist

R - Trees Using GiST

The key here is in the form (xul,yul,xlr,ylr)

  • Query predicates are:

    • Contains ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))

      Returns true if (xul1xul2) ^(yul1yul2) ^ (xlr1xlr2) ^ (ylr1ylr2)

    • Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))

      Returns true if (xul1xlr2) ^(yul1ylr2) ^ (xul2xlr1) ^ (ylr1yul2)

    • Equal ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))

      Returns true if (xul1=xul2) ^(yul1=yul2) ^ (xlr1=xlr2) ^ (ylr1=ylr2)


R trees using gist cont

R – Trees Using GiST(Cont.)

  • Consistent(E,q)

    • p contains (xul1,yul1,xlr1,ylr1), and q is either Contains, Overlap or Equal (xul2,yul2,xlr2,ylr2)

    • Returns true if Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2))

  • Union(P) returns coordinates of the maximum bounding rectangles of all rectangles in P.


R trees using gist cont1

R – Trees Using GiST (Cont.)

  • Penalty(E,F)

    • Compute q= Union(E,F) and return area(q) – area(E.p)

  • PickSplit(P)

    • Variety of algorithms are provided to best split the entries in a over-full node.


R trees using gist cont2

R – Trees Using GiST (Cont.)

  • Compress(E)

    • Form the bounding rectangle of E.p

  • Decompress(E)

    • The identity function


  • Login