These lecture slides discuss methods for processing multi-dimensional queries using B+-trees and R-trees. They cover single-attribute and multi-attribute queries, with practical examples such as querying student records by height and weight. Query optimization techniques are explored, including minimum bounding rectangles (MBRs) and spatial indexing strategies. Different indexing approaches, such as extending B+-trees and extending dynamic hashing, are compared with respect to problems such as dead space and the storage of non-point objects. The aim is to make multi-dimensional data retrieval more efficient.
File Processing: Multi-dimensional Index
2014 Spring, Pusan National University
Ki-Joune Li
Multi-Dimensional Index
• Multi-attribute query vs. single-attribute query
  • Single attribute: only ONE attribute specifies the query condition
    • Example: Find students whose record is in [3.5, 4.5]
  • Multi-attribute: several attributes specify the query condition
    • Example: Find students whose height is greater than 180 cm and whose weight is less than 70 kg
• Each attribute corresponds to a dimension
• Multi-attribute query = multi-dimensional query
Processing Multi-dimensional Queries
• Example: Find students whose height > 180 cm and weight < 70 kg
• Method 1: Using a B+-tree
  • Step 1: Use a B+-tree to find students taller than 180 cm
  • Step 2: From the result of Step 1, select students lighter than 70 kg
• Height first and then weight, or weight first and then height?
[Figure: height/weight plane; the result is the region height > 180 cm, weight < 70 kg]
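Method 1 can be sketched as follows. This is a minimal illustration, not the slides' implementation: a sorted Python list searched with `bisect` stands in for the B+-tree (it plays the role of the tree's ordered leaf level), and the student records, `build_height_index`, and `method1` are hypothetical.

```python
import bisect

# Hypothetical student records: (student_id, height_cm, weight_kg).
STUDENTS = [
    (1, 175, 68),
    (2, 182, 65),
    (3, 185, 80),
    (4, 190, 69),
    (5, 168, 55),
]

def build_height_index(records):
    """Stand-in for a B+-tree on height: entries kept in key order,
    like the leaf level of the tree."""
    return sorted((height, sid, weight) for sid, height, weight in records)

def method1(index, min_height, max_weight):
    """Method 1: range search on the height index, then filter by weight."""
    keys = [entry[0] for entry in index]
    start = bisect.bisect_right(keys, min_height)  # first height > min_height
    step1 = index[start:]                          # Step 1: taller than min_height
    # Step 2: keep only the Step 1 records lighter than max_weight.
    return sorted(sid for height, sid, weight in step1 if weight < max_weight)

print(method1(build_height_index(STUDENTS), 180, 70))  # → [2, 4]
```

The order question on the slide matters for cost: whichever attribute is searched through the index first determines how many records Step 2 has to test, so searching first on the more selective condition leaves less work for the filter.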
Processing Multi-dimensional Queries
• Method 2: Using two B+-trees
  • Step 1: Result1 ← students taller than 180 cm, found with the B+-tree on height
  • Step 2: Result2 ← students lighter than 70 kg, found with the B+-tree on weight
  • Step 3: Result ← Result1 ∩ Result2
• Comparison of Method 1 and Method 2
[Figure: Result1 (height > 180) and Result2 (weight < 70) intersected to give the result]
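Method 2 can be sketched in the same spirit, again assuming sorted lists searched with `bisect` as stand-ins for the two B+-trees (one on height, one on weight). The records and helper names (`build_index`, `ids_greater_than`, `ids_less_than`) are hypothetical; Step 3 is a plain set intersection of student ids.

```python
import bisect

# Hypothetical student records: (student_id, height_cm, weight_kg).
STUDENTS = [(1, 175, 68), (2, 182, 65), (3, 185, 80), (4, 190, 69), (5, 168, 55)]
HEIGHT, WEIGHT = 1, 2  # field positions in a record

def build_index(records, field):
    """Stand-in for a B+-tree on one attribute: sorted (key, student_id) pairs."""
    return sorted((rec[field], rec[0]) for rec in records)

def ids_greater_than(index, bound):
    """Ids of entries whose key is strictly greater than bound."""
    keys = [key for key, _ in index]
    return {sid for _, sid in index[bisect.bisect_right(keys, bound):]}

def ids_less_than(index, bound):
    """Ids of entries whose key is strictly less than bound."""
    keys = [key for key, _ in index]
    return {sid for _, sid in index[:bisect.bisect_left(keys, bound)]}

height_index = build_index(STUDENTS, HEIGHT)
weight_index = build_index(STUDENTS, WEIGHT)
result1 = ids_greater_than(height_index, 180)  # Step 1
result2 = ids_less_than(weight_index, 70)      # Step 2
result = result1 & result2                     # Step 3: Result1 ∩ Result2
print(sorted(result))  # → [2, 4]
```

Here Result2 contains four of the five students although only two survive the intersection, which hints at the trade-off in the comparison: Method 2 searches two indexes and may collect many ids that are discarded in Step 3, while Method 1 searches one index and then has to test each candidate record.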
Processing Multi-dimensional Queries
• Method 3: Unified index for several attributes
  • One index for several attributes (e.g. height and weight together)
  • Multi-dimensional space
  • Two approaches
    • Extending B+-tree
    • Extending dynamic hashing
[Figure: one index for height and weight over the two-dimensional height/weight space]
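Extending Hashing: Grid Approach
• Fixed grid method: the space is divided into fixed cells, and a block pointer array holds one block pointer per cell
• Grid file: variable grid
[Figure: height/weight space partitioned by a fixed grid and by a variable grid (grid file); a query rectangle overlaps several cells]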
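A minimal sketch of the fixed grid method, under assumed parameters: the domain (height in [150, 200) cm, weight in [40, 100) kg), the 10 cm by 12 kg cell size, and the names `cell_of` and `FixedGrid` are all hypothetical. A dictionary of lists stands in for the block pointer array and its blocks; a query visits every cell its rectangle overlaps and then filters the exact matches.

```python
from collections import defaultdict

# Hypothetical domain: height in [150, 200) cm, weight in [40, 100) kg,
# cut into fixed cells of 10 cm x 12 kg (a 5 x 5 grid).
H_MIN, W_MIN = 150, 40
CELL_H, CELL_W = 10, 12
ROWS, COLS = 5, 5

def cell_of(h, w):
    """Map a point to its (row, col) cell, clamping values at the domain edges."""
    i = min(max((h - H_MIN) // CELL_H, 0), ROWS - 1)
    j = min(max((w - W_MIN) // CELL_W, 0), COLS - 1)
    return int(i), int(j)

class FixedGrid:
    def __init__(self):
        # Stand-in for the block pointer array: one bucket (block) per cell.
        self.blocks = defaultdict(list)

    def insert(self, sid, h, w):
        self.blocks[cell_of(h, w)].append((sid, h, w))

    def range_query(self, h_lo, h_hi, w_lo, w_hi):
        """Visit every cell overlapping the closed query box, then filter.
        Returns (sorted matching ids, number of cells visited)."""
        i0, j0 = cell_of(h_lo, w_lo)
        i1, j1 = cell_of(h_hi, w_hi)
        hits, visited = [], 0
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                visited += 1  # one block access per overlapping cell
                hits += [sid for sid, h, w in self.blocks.get((i, j), [])
                         if h_lo <= h <= h_hi and w_lo <= w <= w_hi]
        return sorted(hits), visited

grid = FixedGrid()
for sid, h, w in [(1, 175, 68), (2, 182, 65), (3, 185, 80), (4, 190, 69), (5, 168, 55)]:
    grid.insert(sid, h, w)
print(grid.range_query(181, 200, 40, 69))  # → ([2, 4], 6)
```

Because the cells are fixed, the number of cells visited depends only on the query's size, not on how many objects lie there; skewed data leaves many cells nearly empty, which motivates the variable grid of the grid file.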
Extending Hashing: Grid File
[Figure: grid file directory with block pointers and a query rectangle; labeled points (x1, y1) and (x2, y2)]
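A grid file can be sketched with one list of split points per dimension (its "linear scales", which need not be evenly spaced) and a directory mapping each cell to a block; several cells may share one block. The scales, the directory contents, and the function names below are hypothetical values chosen only to illustrate the lookup, not taken from the slides.

```python
import bisect

# Hypothetical grid file state: variable split points per axis and a
# directory mapping each (height cell, weight cell) to a block id.
H_SCALE = [170, 185]  # height partitions: < 170, [170, 185), >= 185
W_SCALE = [60]        # weight partitions: < 60, >= 60
DIRECTORY = {
    (0, 0): "A", (0, 1): "A",  # two cells share block A
    (1, 0): "B", (1, 1): "C",
    (2, 0): "D", (2, 1): "D",  # two cells share block D
}

def cell_range(scale, lo, hi):
    """Indices of the partitions of one axis that overlap [lo, hi]."""
    return range(bisect.bisect_right(scale, lo), bisect.bisect_right(scale, hi) + 1)

def blocks_for_query(h_lo, h_hi, w_lo, w_hi):
    """Distinct blocks to read for a rectangular query (each block read once)."""
    return sorted({DIRECTORY[(i, j)]
                   for i in cell_range(H_SCALE, h_lo, h_hi)
                   for j in cell_range(W_SCALE, w_lo, w_hi)})

print(blocks_for_query(181, 200, 40, 69))  # → ['B', 'C', 'D']
```

Deduplicating the block ids matters: the query overlaps four directory cells, but two of them point to the same block D, so only three blocks are read.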
Problem 1: Dead Space
• Dead space: empty space with no objects
• Example: a query over an area that contains no objects still costs 5 block accesses
• How can dead space be reduced?
[Figure: query rectangle lying in dead space]
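Minimum Bounding Rectangle
• MBR (Minimum Bounding Rectangle) around the objects stored in each block
• The same query needs only 1 disk access
[Figure: MBRs around the objects and the query rectangle]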
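The effect of MBRs on dead space can be sketched by counting block accesses two ways: testing the query against the region each block is responsible for, or against the MBR of the objects the block actually holds. The four blocks, their points, and the query below are hypothetical; they do not reproduce the slide's figure, only the same idea.

```python
# Rectangles are (xmin, ymin, xmax, ymax); blocks and points are hypothetical.

def mbr(points):
    """Minimum bounding rectangle of a non-empty list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def overlaps(a, b):
    """True if closed rectangles a and b intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

BLOCKS = [
    {"region": (0, 0, 10, 10),   "points": [(1, 1), (2, 3)]},
    {"region": (10, 0, 20, 10),  "points": [(18, 2), (19, 4)]},
    {"region": (0, 10, 10, 20),  "points": [(2, 18)]},
    {"region": (10, 10, 20, 20), "points": [(12, 12), (13, 11)]},
]

def accesses(query, use_mbr):
    """Number of blocks that must be read for the query."""
    count = 0
    for block in BLOCKS:
        box = mbr(block["points"]) if use_mbr else block["region"]
        if overlaps(box, query):
            count += 1  # this block would be read from disk
    return count

query = (8, 8, 12, 12)
print(accesses(query, use_mbr=False), accesses(query, use_mbr=True))  # → 4 1
```

The query touches all four grid regions, but most of that area is dead space; only one block's MBR actually reaches the query, so only that block has to be read.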
Problem 2: Non-Point Objects
• Where should an object with spatial extent be stored? It may overlap several grid cells.
Minimum Bounding Rectangle
• MBR (Minimum Bounding Rectangle, also called minimum bounding box)
  • Defined by its corners (X1min, X2min) and (X1max, X2max)
  • A two-dimensional geometric simplification of objects
  • Covers not the whole space, only the region occupied by objects
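Computing the MBR of a non-point object follows directly from the corner definition above: take the minimum and maximum of each coordinate over the object's vertices. The polygon `lake` and the helper names are hypothetical.

```python
# A non-point object (e.g. a polygon) given as a list of (x1, x2) vertices.

def mbr(vertices):
    """Return ((x1min, x2min), (x1max, x2max)) enclosing all vertices."""
    x1s = [v[0] for v in vertices]
    x2s = [v[1] for v in vertices]
    return (min(x1s), min(x2s)), (max(x1s), max(x2s))

def area(box):
    """Area of an MBR given as ((x1min, x2min), (x1max, x2max))."""
    (x1min, x2min), (x1max, x2max) = box
    return (x1max - x1min) * (x2max - x2min)

lake = [(2, 1), (6, 2), (7, 5), (3, 6), (1, 4)]  # hypothetical polygon
print(mbr(lake))        # → ((1, 1), (7, 6))
print(area(mbr(lake)))  # → 30
```

Indexing the object by this single rectangle answers "where to store it": the object goes wherever its MBR belongs. Because the MBR only approximates the shape, objects found through it are candidates that still need an exact geometry test.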
Extending B+-tree: R-tree
• B+-tree vs. R-tree
  • B+-tree: interval (1-D rectangle)
  • R-tree: multi-dimensional interval (rectangle)
• R-tree: a rectangle B+-tree
• Each node
  • Stores MBRs (Minimum Bounding Rectangles) instead of intervals (or delimiters)
  • No linked list for external (leaf) nodes
  • Some overlap between MBRs is unavoidable
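R-tree search can be sketched as a recursive descent that follows every entry whose MBR overlaps the query. Because sibling MBRs may overlap, the search can enter more than one child, unlike a B+-tree search, which follows a single path. The `Entry`/`Node` classes and the two-leaf example tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def overlaps(a: Rect, b: Rect) -> bool:
    """True if closed rectangles a and b intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Entry:
    mbr: Rect
    child: Optional["Node"] = None  # set in internal nodes
    obj_id: Optional[str] = None    # set in leaf nodes

@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)

def search(node: Node, query: Rect) -> List[str]:
    """Ids of objects whose MBR overlaps the query; may visit several children."""
    result = []
    for e in node.entries:
        if overlaps(e.mbr, query):
            if node.leaf:
                result.append(e.obj_id)
            else:
                result.extend(search(e.child, query))
    return result

# Hypothetical two-level tree; the two root MBRs overlap along x = 4.
leaf1 = Node(True, [Entry((1, 1, 2, 2), obj_id="a"), Entry((3, 3, 4, 5), obj_id="b")])
leaf2 = Node(True, [Entry((4, 4, 6, 6), obj_id="c"), Entry((8, 1, 9, 2), obj_id="d")])
root = Node(False, [Entry((1, 1, 4, 5), child=leaf1), Entry((4, 1, 9, 6), child=leaf2)])

print(search(root, (3.5, 3.5, 4.5, 4.5)))  # → ['b', 'c']
```

The query falls where the two root MBRs overlap, so both leaves are visited; keeping this overlap small is what the splitting strategies on the following slides aim for.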
Extending B+-tree: R-tree
• Example
[Figure: R-tree with root node, nested MBRs, and a query rectangle]
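Split like B-tree
• Split the node's MBR in the case of overflow; the new MBR is inserted into the parent, so splits can propagate upward as in a B-tree
• Line sweeping: compare Cost-X and Cost-Y
[Figure: overflowing node divided by a splitting line into a new MBR]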
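The line-sweeping idea can be sketched like this: sort the overflowing node's rectangles along x, slide a splitting line over every position that leaves at least `min_fill` entries on each side, and record the cheapest split (Cost-X); repeat along y (Cost-Y) and keep the cheaper axis. The slides do not fix the cost function; this sketch assumes the sum of the two groups' MBR areas, and the names and example rectangles are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr_of(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def best_split_on_axis(rects: List[Rect], axis: int, min_fill: int):
    """Sweep a splitting line along one axis (0 = x, 1 = y).
    Returns (cost, left_group, right_group) with the lowest cost."""
    order = sorted(rects, key=lambda r: (r[axis], r[axis + 2]))
    best = None
    for k in range(min_fill, len(order) - min_fill + 1):
        left, right = order[:k], order[k:]
        cost = area(mbr_of(left)) + area(mbr_of(right))  # assumed cost: total area
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best

def split(rects: List[Rect], min_fill: int = 2):
    """Compare Cost-X and Cost-Y and keep the cheaper split."""
    assert len(rects) >= 2 * min_fill, "need enough entries for two groups"
    cost_x = best_split_on_axis(rects, 0, min_fill)
    cost_y = best_split_on_axis(rects, 1, min_fill)
    return cost_x if cost_x[0] <= cost_y[0] else cost_y

# Hypothetical overflowing node: a cluster near x = 0..2 and one near x = 10..11.
RECTS = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 0, 11, 1), (10, 1, 11, 2)]
cost, left, right = split(RECTS)
print(cost, right)  # → 6 [(10, 0, 11, 1), (10, 1, 11, 2)]
```

Here the best x split costs 6 while the best y split costs 22, so the node is cut by a vertical line that separates the two clusters.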
Splitting Strategy
• 50:50 split
• Instead of a 50:50 split, other cost measures
  • Area
  • Perimeter
  • Overlapping area
• Goals
  1. Make the groups as COMPACT as possible
  2. Preserve spatial proximity as much as possible
[Figure: a good split vs. a bad split]
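The three cost measures can be computed for any candidate split, which makes the difference between a good and a bad split concrete. The four rectangles and the two candidate splits below are hypothetical; the measures are the ones listed on the slide (total area and total perimeter of the two group MBRs, and the area where the two MBRs overlap).

```python
Rect = tuple  # (xmin, ymin, xmax, ymax)

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def perimeter(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap_area(a, b):
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def split_costs(group1, group2):
    a, b = mbr_of(group1), mbr_of(group2)
    return {"area": area(a) + area(b),
            "perimeter": perimeter(a) + perimeter(b),
            "overlap": overlap_area(a, b)}

p, q, r, s = (0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10)
print(split_costs([p, q], [r, s]))  # good split
print(split_costs([p, r], [q, s]))  # bad split
```

The good split (nearby rectangles kept together) yields area 8, perimeter 16, and no overlap; the bad split yields area 162, perimeter 72, and overlap 64. All three measures agree here, which matches the two goals: compact groups and preserved spatial proximity.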
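R*-tree: An Improvement of R-tree
• Re-insertion strategy on overflow: instead of splitting immediately, delete some entries of the overflowing node and re-insert them, which makes the node more compact
• The most popular multi-dimensional index
[Figure: a newly inserted object causes overflow; an entry is deleted and re-inserted, giving a more compact node]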
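The choice of entries to re-insert can be sketched as follows: rank the overflowing node's entries by the distance of their centers from the center of the node's MBR and remove the farthest ones. The fraction of 30% is the value usually cited for the R*-tree and is treated here as an assumption; the function names and the example node are hypothetical, and the actual re-insertion from the root is not shown.

```python
import math

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsert(entries, fraction=0.3):
    """Split an overflowing node's entries into (kept, to_reinsert).
    Entries whose centers lie farthest from the node MBR's center are removed;
    fraction=0.3 is an assumed default."""
    node_center = center(mbr_of(entries))
    by_distance = sorted(entries, key=lambda r: math.dist(center(r), node_center),
                         reverse=True)
    p = max(1, int(fraction * len(entries)))
    return by_distance[p:], by_distance[:p]

# Hypothetical overflowing node: four nearby entries and one far-away point entry.
node = [(0, 0, 1, 1), (2, 0, 3, 1), (0, 2, 1, 3), (2, 2, 3, 3), (21, 21, 21, 21)]
kept, to_reinsert = pick_for_reinsert(node)
print(to_reinsert)    # → [(21, 21, 21, 21)]
print(mbr_of(kept))   # → (0, 0, 3, 3)
```

Removing the outlier shrinks the node's MBR from (0, 0, 21, 21) to (0, 0, 3, 3); when the removed entry is re-inserted, it can land in a node closer to it, which is how re-insertion keeps nodes compact.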