Spatial Access Methods & Query Processing

Spatial Access Methods & Query Processing. Matei Lunca, GIA 2004. Richardson, Van Oosterom - Advances in Spatial Data Handling. Contents: extending RDBMS for GIS/GIA, trees, query types, the curse of dimensionality, approximate matches.

Presentation Transcript


  1. Spatial Access Methods & Query Processing • Matei Lunca • GIA 2004 • Richardson, Van Oosterom - Advances in Spatial Data Handling

  2. Contents • Extending RDBMS for GIS/GIA • Trees • Query types • The curse of dimensionality • Approximate matches

  3. Geographic Information Retrieval • Spatial Access Methods • Algorithms for storing and finding spatial data; 3+-D data with strong interrelations, which therefore cannot be stored in ordinary structures such as B-Trees • Query Processing • Data structures and database searches in this context • GIS queries such as “buffer around a river”

  4. Extending RDBMS for GIS/GIA • In GIS, objects are organized by location and extent in space • Because spatial objects can be arbitrarily complex, access methods for simple 2D objects such as minimum bounding rectangles are needed • Curse of dimensionality!
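
As an illustration of the minimum bounding rectangle mentioned on this slide, the following minimal sketch computes the MBR of a 2D object and tests two MBRs for overlap. The names `mbr` and `rects_intersect` and the `(xmin, ymin, xmax, ymax)` tuple layout are assumptions made for this example, not taken from the presentation.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def mbr(points: Iterable[Point]) -> Rect:
    """Return the minimum bounding rectangle of a set of 2D vertices."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

Comparing two rectangles takes four comparisons regardless of how many vertices the underlying objects have, which is why an index can work on MBRs and leave the exact geometry for a later step.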

  5. Requirements of spatial access methods • Dynamic: random access and queries must be supported • Space efficient: complex spatial data often cannot be partitioned because of relations between objects, so data blocks may be large and may not fit in memory • Efficiency independent of operators/distribution: needed when multiple databases storing different types of data are joined • Compatible with concurrency

  6. Practical requirements • Costs of computing and communicating data • Minimize external access costs (I/O) • Indexing = Trees • Pointers at leaves/nodes • Searching = going down tree • Fast for range queries • Hashing = address buckets • No ordering needed

  7. Challenges in Indexing • Most DB support • B+-Trees • Hash tables • Few DB support • R-Trees • Region quadtree • Why is implementation so difficult? • Integration with query optimizer • Providing query operators that utilize the index • Cost model (efficiency known before implementation) • Concurrency control and recovery techniques

  8. Space Driven VS Data Driven • Space Driven Trees • Decomposition independent from data insertion order • Region quadtree • Data Driven Trees • Space decomposed based on input data • Point quadtree • K-D Tree
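
To make the data-driven case concrete, here is a minimal sketch of a K-D tree whose splits are chosen from the inserted points themselves, together with a range query. The class and function names (`KDNode`, `build_kdtree`, `range_query`) are assumptions for illustration; the median split used here is one common construction, not necessarily the variant the slides have in mind.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the point stored at this node
    axis: int                         # dimension this node splits on
    left: Optional["KDNode"] = None   # points with coordinate <= point[axis]
    right: Optional["KDNode"] = None  # points with coordinate >= point[axis]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a K-D tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def range_query(node: Optional[KDNode], lo: Point, hi: Point) -> List[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] in every dimension."""
    if node is None:
        return []
    hits = []
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        hits.append(node.point)
    # Descend only into subtrees that can still contain matching points.
    if lo[node.axis] <= node.point[node.axis]:
        hits += range_query(node.left, lo, hi)
    if node.point[node.axis] <= hi[node.axis]:
        hits += range_query(node.right, lo, hi)
    return hits
```

Because the split positions come from the data, a different set of points produces a different decomposition of space. A space-driven structure such as the region quadtree would instead halve the space at fixed positions.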

  9. Space/Data Driven Structures • Space driven structures – Grids • Twin grid file • Shuffles points between the primary and secondary file to minimize the total size • Multilayer grid file • Uses two or more grid files, storing objects in the first grid file where no splitting across hyperplanes is needed • Data driven structures - R-Tree

  10. Trees • X-Tree • TR*-Tree • IQ-, PX- & MDX-Trees • TV-Tree • VAM-Split Trees

  11. Trees: X-Tree • Adapts R*-Trees to high-dimensional data • Overlap-free split based on split history • R/R*-Trees lead to high overlap, which diminishes the advantages of hierarchical partitions • When the split algorithm would lead to an unbalanced directory, the X-Tree omits the split and the node becomes a supernode • Supernodes are nodes enlarged to a multiple of the block size; they are scanned linearly, which avoids splits that would result in an inefficient structure

  12. Trees: X-Tree (2) • Dynamically use overlap-minimizing splits • Supernodes accessed sequentially if no good split decision found for a directory node

  13. Trees: TR*-Tree • Improved R*-Tree • Represents the exact geometry of spatial attributes • Reduces memory operations • Stores the components of one decomposed object • Internal node • Pointer to child node • Minimum bounding rectangle of trapezoids in child • Leaf node • Trapezoids

  14. Trees: TR*-Tree (2) • Representation of Bavaria

  15. Trees: IQ-, PX- & MDX-Trees • IQ-Tree • Index structure for query processing in high-dimensional data spaces • Compresses data to improve query processing • PX-Tree & Multi-Disc X-Tree • Parallel access method • Short response time & high query throughput

  16. Trees: TV-Tree • R-Tree-like + varying length feature vector • Telescope vector • Divide attributes into • Those common to all subtree items • Those used for branching • Those ignored • Knowledge about the behaviour of single attributes (their selectivity) is necessary

  17. Trees: VAM-Split Trees • VAM-Split R-Tree • VAM-Split KD-Tree • Static index structures • All objects must be available when index is created • Splits are performed at maximum variance value • Built in memory before permanently stored on disk • Size limited to the amount of (virtual) memory available
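
The idea of splitting on the dimension with the greatest variance can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and cutting at the median position is an assumption, since the slide does not state how the split position is chosen.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def max_variance_dimension(points: Sequence[Point]) -> int:
    """Return the index of the dimension in which the points vary most."""
    dims = range(len(points[0]))
    return max(dims, key=lambda d: pvariance([p[d] for p in points]))


def vam_split(points: Sequence[Point]) -> Tuple[int, List[Point], List[Point]]:
    """Split the points into two halves along the maximum-variance dimension.

    Assumption: the cut is made at the median position of the sorted points.
    """
    d = max_variance_dimension(points)
    ordered = sorted(points, key=lambda p: p[d])
    mid = len(ordered) // 2
    return d, ordered[:mid], ordered[mid:]
```

The sketch sorts the complete point set, which matches the slide's remark that these are static structures: all objects must be available, and fit in memory, when the index is built.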

  18. Other Trees • The Cell Tree • Levels of data split by arbitrary hyperplanes • Concave objects decomposed into convex pieces, which are indexed in every cell that they overlap • The K-D Tree • Levels of data are split along different dimensions into non-overlapping cells • Objects indexed in all cells they intersect

  19. Other Trees (2) • Generalized BD Tree • Stores objects as hierarchy of minimum bounding boxes • The P-Tree • Hyperplanes split space hierarchically by polytopes = multidimensional boxes with nonrectangular sides • R-Tree special case in which all polytopes are boxes • R-files • Divide space into hierarchy of nested boxes in which objects are indexed in lowest cell which contains them

  20. Cost Models • Curse of dimensionality: performance deteriorates as dimensionality grows • Cost model for query processing in high-dimensional data spaces, used for careful optimization of the parameters of an index • Data space quantization • Data compression - VA File, IQ Tree • Reduce I/O by representing attributes in fewer bits • Page size • Dimension assignment
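
Representing attributes in fewer bits can be sketched as uniform quantization of each coordinate into a small grid-cell number. The functions below are hypothetical and assume a uniform grid over a known value range `[lo, hi]`; they illustrate the principle and are not the actual VA-File or IQ-Tree encoding.

```python
from typing import Sequence, Tuple


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to one of 2**bits equally wide cells."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so that value == hi stays in range


def cell_bounds(idx: int, lo: float, hi: float, bits: int) -> Tuple[float, float]:
    """Return the interval of original values covered by cell idx."""
    width = (hi - lo) / (1 << bits)
    return lo + idx * width, lo + (idx + 1) * width


def approximate(point: Sequence[float], lo: float, hi: float, bits: int) -> Tuple[int, ...]:
    """Quantize every coordinate of a point with the same number of bits."""
    return tuple(quantize(v, lo, hi, bits) for v in point)
```

With 2 bits per dimension, a point stored as two 64-bit floats shrinks to 4 bits. The cell bounds still give lower and upper limits on distances, so many points can be ruled out without reading their exact coordinates.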

  21. High-dimensional data spaces & massive data sets • Exotic data, cardinality/dimensionality++ • Terabyte, petabyte • Common problem: overfit the data • Common challenge: fit model/pattern robustly • Compression, statistics, stochastic analysis, discrete mathematics, harmonic analysis • Complexity & noisiness lead to constructing statistical/fuzzy models

  22. The Pyramid-Technique • Maps data from D-dimensional space to 1D so B+-Trees can be used to manage data • Data space is divided into 2·D pyramids (two per dimension) that meet at the center of the data space • Pyramids are partitioned into data pages of the B+-Tree • No inverse transformation needed because both the D-dimensional point and its 1D key are stored

  23. The Pyramid-Technique (2) • Complex queries • Pyramid value calculated from query input • Querying the tree with this value • Result = D-dimensional points sharing the pyramid value, which must be scanned for the search item • Efficient query processing only for fewer than 8 dimensions
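
The mapping from a D-dimensional point to a one-dimensional key can be sketched as below. The sketch assumes points normalized to the unit hypercube [0, 1]^D and uses the pyramid value as it is commonly described: the pyramid number plus the point's height within that pyramid. A sorted Python list stands in for the B+-Tree, and `PyramidIndex` is a hypothetical name for this example.

```python
import bisect
from typing import List, Sequence, Tuple


def pyramid_value(point: Sequence[float]) -> float:
    """Map a point in [0, 1]^D to a 1D key: pyramid number + height."""
    d = len(point)
    # The dimension with the largest deviation from the center picks the pyramid.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    height = abs(0.5 - point[j_max])
    return pyramid + height


class PyramidIndex:
    """Sorted list of (key, point) pairs standing in for a B+-Tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, Tuple[float, ...]]] = []

    def insert(self, point: Sequence[float]) -> None:
        bisect.insort(self._entries, (pyramid_value(point), tuple(point)))

    def contains(self, point: Sequence[float]) -> bool:
        key = pyramid_value(point)
        pos = bisect.bisect_left(self._entries, (key,))
        # Scan all stored points that share the key and compare the full point.
        while pos < len(self._entries) and self._entries[pos][0] == key:
            if self._entries[pos][1] == tuple(point):
                return True
            pos += 1
        return False
```

Several different points can share one pyramid value, which is why the lookup compares the stored D-dimensional point after finding the key. Storing the point next to the key is also what makes an inverse transformation unnecessary.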

  24. Query processing • Direct VS indirect spatial search • Direct = locating objects in a geographic area • Indirect = queries based on non-spatial attributes • Show the geography that complies with non-spatial requirements

  25. Query processing steps • Query input • Filter step • Spatial index • Candidate set • Refinement step • Load spatial extent • Test spatial extent • Hits/false drops • Query result output
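
The filter and refinement steps listed above can be sketched for a point query as follows. In this illustration a linear scan over bounding rectangles stands in for the spatial index, and the exact test is a standard ray-casting point-in-polygon check. All names (`mbr`, `in_mbr`, `point_in_polygon`, `point_query`) are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(polygon: Sequence[Point]) -> Rect:
    xs, ys = zip(*polygon)
    return (min(xs), min(ys), max(xs), max(ys))


def in_mbr(pt: Point, r: Rect) -> bool:
    return r[0] <= pt[0] <= r[2] and r[1] <= pt[1] <= r[3]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_query(pt: Point, objects: Dict[str, Sequence[Point]]) -> Tuple[List[str], List[str]]:
    """Return (hits, false_drops) for a point query."""
    # Filter step: cheap test on approximations yields the candidate set.
    candidates = [name for name, poly in objects.items() if in_mbr(pt, mbr(poly))]
    # Refinement step: load and test the exact spatial extent of each candidate.
    hits = [name for name in candidates if point_in_polygon(pt, objects[name])]
    false_drops = [name for name in candidates if name not in hits]
    return hits, false_drops
```

For a triangle with vertices (0, 0), (4, 0) and (0, 4), the point (3, 3) passes the filter step because it lies inside the bounding rectangle, but the refinement step rejects it as a false drop.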

  26. Graphical Query Example

  27. Graphical Query Example

  28. Query types • Point query/point-in-polygon query • Parameter: coordinates • What objects exist at these coordinates? • Window/range query • Parameter: region defined by coordinates • What objects are located in this region? • Distance and Buffer Zone queries • Parameters: buffer object and distance • What objects are there within the given distance of the buffer object?
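
A window query and a buffer query over point objects might look like the following brute-force sketch. It assumes point objects stored in a dictionary and a polyline as the buffer object (for example a river). The function names are hypothetical, and a real system would answer both queries through a spatial index rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def window_query(points: Dict[str, Point], xmin: float, ymin: float,
                 xmax: float, ymax: float) -> List[str]:
    """Names of all points located inside the query window."""
    return [name for name, (x, y) in points.items()
            if xmin <= x <= xmax and ymin <= y <= ymax]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffer_query(points: Dict[str, Point], polyline: Sequence[Point],
                 distance: float) -> List[str]:
    """Names of all points within `distance` of the polyline."""
    segments = list(zip(polyline, polyline[1:]))
    return [name for name, p in points.items()
            if min(point_segment_distance(p, a, b) for a, b in segments) <= distance]
```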

  29. Query types (2) • Path queries (network structure required) • Parameters: network locations • What is the shortest route from A to B? • Join and Range queries • Spatial objects and relationships • Spatial predicates: points, windows, buffers, paths • Overlaying roads and waterworks GIS layers and displaying the result according to relative height (river, bridge, aqueduct) is a spatial join
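
A path query of the kind "What is the shortest route from A to B?" can be answered with Dijkstra's algorithm on the network structure. The sketch below assumes the network is given as a dictionary of adjacency dictionaries with non-negative edge weights; `shortest_route` and the graph layout are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def shortest_route(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm; returns (total cost, path) or None if unreachable."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None
```

In a GIS setting the edge weights would typically be derived from the geometry of the network, for example segment lengths or travel times.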

  30. Query types (3) • Feature approach – feature vectors • Neighborhood search • Spatial-Query-by-Sketch • Multimedia (2D) search instead of alphanumeric

  31. Spatial-Query-by-Sketch Sketcho 1.1b

  32. Spatial-Query-by-Sketch Sketcho 1.1b

  33. Spatial-Query-by-Sketch Sketcho 1.1b

  34. Spatial-Query-by-Sketch Sketcho 1.1b

  35. Similarity search • Approximate surface by parametric functions • Assigning appropriate class to query object • Section Coding – each polygon’s circumcircle is decomposed into sectors & normalized • Similarity = distance between feature vectors

  36. Similarity search (2) • Shape Histograms (feature vectors!) • Bins = complete & disjoint cells of space • Shell Model • Concentric uniform shells around the center • Independent of rotation around the center • Sector Model • Distribute uniformly on surface (Voronoi)
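
The shell model can be sketched as a histogram of how many sample points of a shape fall into each concentric shell around its center. The example below is a 2D simplification with equally wide shells around the centroid, and it assumes the shape is given as a list of sample points. The function names and the Euclidean distance between histograms are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def shell_histogram(points: Sequence[Point], num_shells: int) -> List[int]:
    """Count sample points per concentric shell around the centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    r_max = max(radii) or 1.0  # avoid division by zero for a single point
    hist = [0] * num_shells
    for r in radii:
        # The outermost radius is clamped into the last shell.
        idx = min(int(r / r_max * num_shells), num_shells - 1)
        hist[idx] += 1
    return hist


def histogram_distance(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Similarity as Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
```

Only the distance to the center enters the histogram, so rotating a shape around its center leaves the feature vector unchanged. This is the rotation independence noted on the slide.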

  37. Shape Histograms

  38. Special Query Types • Spatial continuous queries • In dynamic environments continuous polling is necessary, because otherwise query results become meaningless • Result, expiry time given current motion vector, and change that can cause expiration • Spatio-temporal queries • Spatiotemporal Database Systems (STDBS) track and present data about moving objects, such as GPS positions • Probabilistic models are also available that attempt to predict future values in order to give a faster response

  39. Query pre-processing • Pre-optimize index structure • With specific knowledge: if we use a TIN for river network studies, valleys are more important and could be stored at high nodes in tree • Avoid characteristic areas: don’t store exact geometry of a chasm, but no-go denomination

  40. Query processing strategies • Parallel searches (nice split) • In varying data structures • Shape-based strategy • Models the direction region • Converts processing of direction predicates into processing of topological operations between open shapes and closed geometry objects • Eliminates computation related to world boundary

  41. Approximate ? Search/Match

  42. Screenshots - LTRMP

  43. Key points • Defining/representing spatial context • Space Driven VS Data Driven • Each application has its own technique • Tree • Hashing • 3D histogram • Approximate/Fuzzy approach

  44. Multitude of Practical Uses

  45. Responsibility/Accountability?

  46. Questions?
