Trees for spatial indexing

Trees for spatial indexing, Part 2: SAMs (spatial access methods)

SAMs • R-Tree • R*-Tree • X-Tree • TV-Tree

Answering a question • The kd-trie is similar to the kd-tree; in the article the term was used for the kd-tree. • The split position is not the middle of the region; it is chosen at the median point. • Because we work with points, the elements can always be separated.
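The median split described above can be sketched as follows. This is a minimal illustration, not the article's code: the function name `build_kd_tree`, the dictionary node layout, and the round-robin choice of split axis are assumptions made for the example.

```python
def build_kd_tree(points, depth=0):
    """Build a kd-tree by splitting at the median point (illustrative sketch)."""
    if not points:
        return None
    # Assumption: the split axis cycles through the dimensions level by level.
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # index of the median point along this axis
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kd_tree(ordered[:mid], depth + 1),
        "right": build_kd_tree(ordered[mid + 1:], depth + 1),
    }


tree = build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

Because the median is a data point, each half receives about the same number of points regardless of how the points are spread in space.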

UB-Tree range queries • The algorithm: • Find the regions that intersect the query box q. • If such a region is a page, all of its objects that intersect q belong to the answer. • Then take the last subcube in this region and look at its sibling; if the sibling intersects q, repeat the same loop on it. • Then move up to the parent of B and search again.

R-Tree • A B+-tree-like structure specialized for spatial indexing. • The performance of the R*-Tree degrades as the dimensionality increases. • The R-tree access method is prohibitively slow for more than 5 dimensions.

Problems of (R-Tree based) index structures • It has been shown that overlap increases as the dimensionality increases. • Intuitively, overlap means that for some point queries there are multiple paths to search.

Definition of overlap • Intuitively, overlap is the percentage of the volume that is covered by more than one directory hyperrectangle. • This intuitive definition of overlap is directly correlated with query performance, because overlap implies multiple search paths.

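Definition of overlap (2) • $\mathrm{Overlap} = \frac{\left\| \bigcup_{i,j,\, i \neq j} (R_i \cap R_j) \right\|}{\left\| \bigcup_i R_i \right\|}$ • That is, the volume covered by the pairwise intersections of the MBRs, divided by the volume of the union of all MBRs. • But overlap in highly populated areas is much more critical than overlap in sparsely populated areas. • $\mathrm{WeightedOverlap} = \frac{\left| \{ p \mid p \in \bigcup_{i,j,\, i \neq j} (R_i \cap R_j) \} \right|}{\left| \{ p \mid p \in \bigcup_i R_i \} \right|}$, where p ranges over the data points.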

Example (figure) • Overlap = (1/4) / 2 = 1/8 = 12.5 % • WeightedOverlap = 2/6 = 1/3 ≈ 33 %
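To make the two measures concrete, here is a small sketch that evaluates both definitions for axis-aligned rectangles. The rectangle representation `(lo, hi)`, the function names, and the sample data are assumptions chosen for illustration and are unrelated to the figure above. The volume is computed exactly by cutting space along all rectangle edges, which is only practical for a handful of rectangles in low dimensions.

```python
from itertools import product


def contains(rect, point):
    """True if the point lies inside the axis-aligned rectangle (lo, hi)."""
    lo, hi = rect
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))


def overlap(rects):
    """Volume covered by >= 2 rectangles divided by volume covered by >= 1."""
    dims = len(rects[0][0])
    # The rectangle edges cut space into grid cells; each cell lies either
    # completely inside or completely outside every rectangle.
    cuts = [sorted({r[side][d] for r in rects for side in (0, 1)})
            for d in range(dims)]
    multi = total = 0.0
    for cell in product(*(range(len(c) - 1) for c in cuts)):
        lo = [cuts[d][i] for d, i in enumerate(cell)]
        hi = [cuts[d][i + 1] for d, i in enumerate(cell)]
        centre = [(a + b) / 2 for a, b in zip(lo, hi)]
        volume = 1.0
        for a, b in zip(lo, hi):
            volume *= b - a
        covering = sum(contains(r, centre) for r in rects)
        if covering >= 1:
            total += volume
        if covering >= 2:
            multi += volume
    return multi / total if total else 0.0


def weighted_overlap(rects, points):
    """Points covered by >= 2 rectangles divided by points covered by >= 1."""
    in_any = in_multi = 0
    for p in points:
        covering = sum(contains(r, p) for r in rects)
        in_any += covering >= 1
        in_multi += covering >= 2
    return in_multi / in_any if in_any else 0.0


# Hypothetical data: two 2 x 1 rectangles sharing a 1 x 1 strip,
# with most points clustered inside the shared strip.
rects = [((0, 0), (2, 1)), ((1, 0), (3, 1))]
points = [(0.5, 0.5), (1.5, 0.5), (1.2, 0.2), (1.8, 0.9), (2.5, 0.5)]
plain = overlap(rects)
weighted = weighted_overlap(rects, points)
```

For this data the plain overlap is 1/3 while the weighted overlap is 3/5, because three of the five points sit in the shared strip.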

Overlap / WeightedOverlap • Which measure is appropriate depends on the kind of data. • For uniformly distributed data points, the overlap measure can be used. • Real data is often clustered, so the weighted overlap is more accurate.

X-Tree • Goal: avoid overlap in the directory. • The X-Tree is a hybrid of a linear, array-like directory and a hierarchical, R-Tree-like directory. • In low dimensions, a hierarchical organization of the directory is the most efficient. • In high dimensions, a linear organization is more efficient.

X-Tree • The X-Tree has 3 types of nodes: data nodes, normal directory nodes, and supernodes. • Supernodes avoid splits in the directory that would produce overlap, so search is faster. • This is not the same as an R*-Tree with larger blocks, because larger nodes are created only when necessary.

X-Tree structure (figure: supernodes, normal directory nodes, data nodes)

Creation of supernodes • They are only created if there is no other possibility to avoid overlap during insertion.
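The rule above can be sketched as an overflow handler: try to split, and fall back to a supernode when no acceptable split exists. This is a deliberately simplified illustration, not the X-tree's actual split algorithm. The candidate splits (halving the entries along each axis), the names `handle_overflow` and `max_overlap`, and the `(lo, hi)` rectangle representation are all assumptions.

```python
def mbr(rects):
    """Minimum bounding rectangle of a list of (lo, hi) rectangles."""
    dims = range(len(rects[0][0]))
    lo = tuple(min(r[0][d] for r in rects) for d in dims)
    hi = tuple(max(r[1][d] for r in rects) for d in dims)
    return lo, hi


def intersection_volume(a, b):
    """Volume of the intersection of two rectangles (0.0 if disjoint)."""
    volume = 1.0
    for d in range(len(a[0])):
        side = min(a[1][d], b[1][d]) - max(a[0][d], b[0][d])
        if side <= 0:
            return 0.0
        volume *= side
    return volume


def handle_overflow(entries, max_overlap=0.0):
    """Return ('split', left, right) or ('supernode', entries).

    Assumption: only balanced splits along one axis are tried, and a split
    is acceptable when the two halves overlap by at most max_overlap.
    """
    best = None
    for axis in range(len(entries[0][0])):
        ordered = sorted(entries, key=lambda r: r[0][axis])
        half = len(ordered) // 2
        left, right = ordered[:half], ordered[half:]
        cost = intersection_volume(mbr(left), mbr(right))
        if best is None or cost < best[0]:
            best = (cost, left, right)
    if best[0] <= max_overlap:
        return ("split", best[1], best[2])
    # No overlap-free split found: keep all entries in one enlarged node.
    return ("supernode", entries)


separable = [((0, 0), (1, 1)), ((0, 2), (1, 3)), ((5, 0), (6, 1)), ((5, 2), (6, 3))]
entangled = [((0, 0), (4, 4)), ((1, 1), (5, 5)), ((2, 2), (6, 6)), ((3, 3), (7, 7))]
split_result = handle_overflow(separable)
super_result = handle_overflow(entangled)
```

With the sample data, the first node splits cleanly along the first axis, while the second has no overlap-free split on either axis and is kept as a supernode.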

TV-Tree (Telescopic-Vector tree) • The basic idea of the TV-tree is to use feature vectors that contract and extend dynamically (as in classification).

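TV-Tree • An m-contraction of x is the vector $A_m x$, where $A_m$ is a contraction matrix. • A natural choice for $A_m$ has ones on the diagonal and zeros elsewhere: $A_m = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}$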
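As a small illustration of the contraction above, the sketch below builds a diagonal contraction matrix and applies it to a feature vector. It assumes that the matrix has m rows and n columns, so that the product keeps the first m features; the function names are hypothetical.

```python
def contraction_matrix(m, n):
    """m x n matrix with ones on the diagonal and zeros elsewhere (assumed shape)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(m)]


def contract(x, m):
    """m-contraction of x, computed as the matrix-vector product A_m x."""
    a_m = contraction_matrix(m, len(x))
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a_m]


features = [3, 1, 4, 1, 5]
short = contract(features, 2)  # keeps only the first two features
```

With this choice of matrix, contracting is the same as truncating the vector, which is what lets the tree shorten or lengthen vectors cheaply.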

Multiple shapes • For example, we can use a sphere, because it needs only a center and a radius r; it represents the set of points whose Euclidean distance from the center is ≤ r. • The Euclidean distance is the special case of the Lp metrics with p = 2. • With the L1 metric (Manhattan distance), the sphere has a diamond shape. • The TV-tree works with any Lp-sphere.
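The difference between the shapes can be checked with a short sketch of the Lp distance and a sphere membership test. The function names are chosen for illustration only.

```python
def lp_distance(a, b, p=2):
    """Lp distance between two points; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def in_lp_sphere(point, center, radius, p=2):
    """True if the point lies within the Lp-sphere around the center."""
    return lp_distance(point, center, p) <= radius


center = (0.0, 0.0)
probe = (0.6, 0.6)
inside_circle = in_lp_sphere(probe, center, 1.0, p=2)   # Euclidean ball
inside_diamond = in_lp_sphere(probe, center, 1.0, p=1)  # L1 "diamond"
```

The probe point lies inside the Euclidean unit circle but outside the L1 diamond of the same radius, since its Manhattan distance from the center is 1.2.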

TV-Tree principle • The TV-tree treats the attributes asymmetrically, favoring the first few features over the rest. • The TV-Tree can use any type of MBR (minimum bounding region): rectangle, cube, sphere, etc. • In particular, it can use any Lp-sphere.

TV-Tree node structure • Each node represents the MBR of all its descendants (say, an Lp-sphere). • Each region is represented by a center, which is a telescopic vector, and a radius. • Such a region is called a TMBR (telescopic minimum bounding region).
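A membership test for such a region might look like the sketch below. It relies on an assumption that the slide does not spell out: the center's leading dimensions are inactive and must match the point exactly, the trailing `active` dimensions are compared with an Lp distance against the radius, and any further dimensions of the point are ignored. The name `in_tmbr` and its parameters are hypothetical.

```python
def in_tmbr(point, center, radius, active, p=2):
    """Test whether a point falls inside a telescopic Lp-sphere (sketch).

    Assumption: the last `active` dimensions of the center are active and
    the dimensions before them are inactive. Point dimensions beyond
    len(center) are not examined.
    """
    n = len(center)
    inactive = n - active
    # Inactive dimensions must agree exactly with the center.
    if tuple(point[:inactive]) != tuple(center[:inactive]):
        return False
    # Active dimensions are compared with the Lp distance (in p-th powers).
    dist_p = sum(abs(x - c) ** p
                 for x, c in zip(point[inactive:n], center[inactive:n]))
    return dist_p <= radius ** p


center = (1, 2, 5)  # dimensions 1 and 2 inactive, dimension 3 active
hit = in_tmbr((1, 2, 5.5, 9), center, radius=1, active=1)
miss_inactive = in_tmbr((1, 3, 5, 0), center, radius=1, active=1)
miss_radius = in_tmbr((1, 2, 7, 0), center, radius=1, active=1)
```

The exact comparison on the inactive dimensions is reasonable for discrete feature values; with floating-point features a tolerance would be needed.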

TV-1-Tree example

TV-2-Tree example

TMBR (figure: regions labeled with their active dimensions: y; x, z; z; x, y; x)

What is the best number of active dimensions? • They found that the best number of active dimensions was two.

TV-Tree conclusion • Overlap is accepted, and with it multiple paths to search. • The branch for a new point is chosen according to the following criteria:
