
Spatial Access Methods & Query Processing

Matei Lunca GIA 2004

Richardson & van Oosterom (eds.) - Advances in Spatial Data Handling

Contents
  • Extending RDBMS for GIS/GIA
  • Trees
  • Query types
  • The curse of dimensionality
  • Approximate matches
Geographic Information Retrieval
  • Spatial Access Methods
    • Algorithms for storing and retrieving spatial data: multi-dimensional (3+ D) data with strong interrelations, which therefore cannot be stored in ordinary structures such as B-Trees
  • Query Processing
    • Data structures and DB search operations in this context
    • GIS queries such as “buffer around a river”
Extending RDBMS for GIS/GIA
  • In a GIS, objects are organized by their location and extent in space
    • Because spatial objects can be arbitrarily complex, access methods work on approximations of 2D objects such as minimum bounding rectangles (MBRs)
  • Curse of dimensionality!
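To make the MBR approximation concrete, here is a minimal Python sketch (not from the slides): an object's exact geometry is reduced to its minimum bounding rectangle, and the cheap rectangle-overlap test is what an index evaluates before touching the exact geometry. The `(xmin, ymin, xmax, ymax)` tuple layout and the function names are assumptions for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def mbr(points: Iterable[Point]) -> Rect:
    """Minimum bounding rectangle of an arbitrary point set (polyline, polygon, ...)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_intersects(a: Rect, b: Rect) -> bool:
    """Two axis-aligned rectangles overlap iff their extents overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

Because the MBR test can report overlap where the exact shapes do not touch, it only ever produces candidates; the exact geometry still has to be checked afterwards.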
Requirements of spatial access methods
  • Dynamic
    • Random access (insertions and deletions in any order) and queries must be supported
  • Space efficient
    • Because of relations between objects, complex spatial data often cannot be partitioned; data blocks may therefore be large and not fit in memory
  • Efficiency independent of operators and data distribution
    • Needed when multiple databases storing different types of data are joined
  • Compatible with concurrency control
Practical requirements
  • Costs of computing and communicating data
  • Minimize external access costs (I/O)
    • Indexing = Trees
      • Pointers at leaves/nodes
      • Searching = going down the tree
      • Fast for range queries
    • Hashing = computing bucket addresses directly
      • No ordering needed
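A minimal sketch of the hashing alternative, assuming a fixed grid in which each cell is one bucket: the bucket address is computed directly from the coordinates, so no ordering of keys is needed. The `GridHash` class and its `cell_size` parameter are hypothetical names chosen for this example.

```python
from collections import defaultdict

class GridHash:
    """Fixed-grid bucket hashing for 2D points (hypothetical illustration)."""

    def __init__(self, cell_size=10.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)  # (col, row) -> [(oid, x, y), ...]

    def bucket_key(self, x, y):
        # Floor division maps every coordinate to its grid cell, negatives included.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, oid, x, y):
        self.buckets[self.bucket_key(x, y)].append((oid, x, y))

    def point_query(self, x, y):
        # Exactly one bucket is inspected; .get avoids creating empty buckets.
        bucket = self.buckets.get(self.bucket_key(x, y), [])
        return [oid for oid, px, py in bucket if (px, py) == (x, y)]
```

A point query touches a single bucket, while a range query would have to enumerate every cell the range covers, which is where tree indexes tend to do better.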
Challenges in Indexing
  • Most DBMSs support
    • B+-Trees
    • Hash tables
  • Few DBMSs support
    • R-Trees
    • Region quadtree
  • Why is implementation so difficult?
    • Integration with the query optimizer
    • Providing query operators that utilize the index
    • Cost model (efficiency must be estimable in advance)
    • Concurrency control and recovery techniques
Space Driven vs. Data Driven
  • Space Driven Trees
    • Decomposition independent of data insertion order
    • Region quadtree
  • Data Driven Trees
    • Space decomposed based on input data
    • Point quadtree
    • K-D Tree
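As an example of a data-driven structure, the following Python sketch builds a K-D tree whose split positions come from the input points themselves (the median along a cycling axis), then answers a window query by pruning subtrees that cannot intersect the window. It is a simple in-memory illustration, not a disk-based implementation; names such as `KDNode` and `range_query` are assumed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KDNode:
    point: tuple
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build(points, depth=0):
    """Data-driven decomposition: split at the median point along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def range_query(node, lo, hi, out=None):
    """Collect points p with lo <= p <= hi (per coordinate), pruning impossible subtrees."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= c <= h for c, l, h in zip(p, lo, hi)):
        out.append(p)
    if lo[a] <= p[a]:  # left subtree holds coordinates <= p[a]
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:  # right subtree holds coordinates >= p[a]
        range_query(node.right, lo, hi, out)
    return out
```

Inserting the same points in a different order gives the same tree here only because `build` sorts them first; with incremental insertion, a data-driven tree's shape depends on the insertion order, unlike a region quadtree.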
Space/Data Driven Structures
  • Space driven structures – Grids
    • Twin grid file
      • Shuffles points between the primary and secondary file to minimize the total size
    • Multilayer grid file
      • Uses two or more grid files; each object is stored in the first grid file in which it does not have to be split across hyperplanes
  • Data driven structures - R-Tree
  • X-Tree
  • TR*-Tree
  • IQ-, PX- & MDX-Trees
  • TV-Tree
  • VAM-Split Trees
Trees: X-Tree
  • Adapts R*-Trees to high-dimensional data
    • Overlap-free split based on split history
    • In high dimensions, R/R*-Trees lead to high overlap
      • which diminishes the advantages of hierarchical partitioning
    • When a split would lead to an unbalanced directory, the X-Tree omits the split and the node becomes a supernode
      • Supernodes are nodes enlarged to a multiple of the block size; they avoid splits that would make the structure inefficient and are scanned linearly instead
Trees: X-Tree (2)
  • Dynamically use overlap-minimizing splits
  • Create supernodes, accessed sequentially, if no good split can be found for a directory node
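The split-or-supernode decision can be sketched as follows. This is a simplified 2D illustration, not the X-Tree's actual split algorithm: it tries axis-sorted splits that respect a minimum fill, measures the overlap of the two resulting MBRs relative to their union, and falls back to a supernode when even the best split overlaps too much. The thresholds `max_overlap=0.2` and `min_fill=0.4` are assumed values chosen for the example.

```python
def mbr(rects):
    """Minimum bounding rectangle of a group of (xmin, ymin, xmax, ymax) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def overlap_ratio(a, b):
    """Intersection area of two MBRs divided by the area of their union."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def split_or_supernode(entries, max_overlap=0.2, min_fill=0.4):
    """Return ("split", [g1, g2]) or ("supernode", [entries]) (simplified sketch)."""
    n = len(entries)
    lo = max(1, int(n * min_fill))  # minimum number of entries per resulting node
    best = None
    for axis in (0, 1):
        ordered = sorted(entries, key=lambda r: r[axis])
        for k in range(lo, n - lo + 1):
            g1, g2 = ordered[:k], ordered[k:]
            ratio = overlap_ratio(mbr(g1), mbr(g2))
            if best is None or ratio < best[0]:
                best = (ratio, g1, g2)
    if best is None or best[0] > max_overlap:
        return "supernode", [entries]  # enlarge the node instead of splitting it
    return "split", [best[1], best[2]]
```

Well-separated entries are split, while heavily overlapping ones stay together in an enlarged node that is scanned sequentially.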
Trees: TR*-Tree
  • Improved R*-Tree
    • Represents the exact geometry of spatial attributes
    • Reduces memory operations
    • Stores the components of one decomposed object
  • Internal node
    • Pointer to child node
    • Minimum bounding rectangle of the trapezoids in the child
  • Leaf node
    • Trapezoids
Trees: TR*-Tree (2)
  • [Figure: representation of Bavaria]
Trees: IQ-, PX- & MDX-Trees
  • IQ-Tree
    • Index structure for query processing in high-dimensional data spaces
    • Compresses data to improve query processing
  • PX-Tree & Multi-Disc X-Tree
    • Parallel access method
    • Short response time & high query throughput
Trees: TV-Tree
  • R-Tree-like structure with variable-length feature vectors
  • Telescope vector
    • Divide attributes into
      • Those common to all subtree items
      • Those used for branching
      • Those ignored
  • Knowledge about the behaviour of single attributes (their selectivity) is necessary
Trees: VAM-Split Trees
  • VAM-Split R-Tree
  • VAM-Split KD-Tree
  • Static index structures
    • All objects must be available when index is created
  • Splits are performed in the dimension with the maximum variance
  • Built in memory before permanently stored on disk
    • Size limited to the amount of (virtual) memory available
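A static, in-memory bulk load in the VAM-Split spirit can be sketched like this: all points are known up front, each split picks the dimension with the largest variance and cuts at the median, and recursion stops once a group fits in one page. Splitting at the exact median (rather than at a page-capacity-aligned approximate median) and the `capacity` parameter are simplifying assumptions.

```python
from statistics import pvariance

def vam_split(points, capacity):
    """Recursively split a static point set into leaf pages of at most `capacity` points."""
    if len(points) <= capacity:
        return [points]
    dims = len(points[0])
    # Choose the dimension with maximum variance ...
    axis = max(range(dims), key=lambda j: pvariance([p[j] for p in points]))
    # ... and split (approximately) at its median.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return vam_split(ordered[:mid], capacity) + vam_split(ordered[mid:], capacity)
```

Because the whole point set is sorted in memory before any page is written, the index size is bounded by the available (virtual) memory, as the slide notes.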
Other Trees
  • The Cell Tree
    • Levels of data split by arbitrary hyperplanes
    • Concave objects decomposed into convex pieces, which are indexed in every cell that they overlap
  • The K-D Tree
    • Levels of data are split along different dimensions into non-overlapping cells
    • Objects indexed in all cells they intersect
Other Trees (2)
  • Generalized BD Tree
    • Stores objects as a hierarchy of minimum bounding boxes
  • The P-Tree
    • Hyperplanes split space hierarchically into polytopes (multidimensional boxes with nonrectangular sides)
    • The R-Tree is the special case in which all polytopes are boxes
  • R-files
    • Divide space into a hierarchy of nested boxes; each object is indexed in the lowest cell that contains it
Cost Models
  • Performance deteriorates because of the curse of dimensionality
    • A cost model for query processing in high-dimensional data spaces allows careful optimization of an index's parameters
  • Data space quantization
    • Data compression - VA File, IQ Tree
    • Reduce I/O by representing attributes with fewer bits
  • Page size
  • Dimension assignment
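Data space quantization can be illustrated with a VA-File-style approximation. In this minimal sketch, assuming coordinates normalized to [0, 1] and a uniform grid of `2**bits` cells per dimension, each coordinate is stored as a small cell number, and the distance from a query to a cell gives a lower bound that lets many points be discarded without reading their exact coordinates. Real VA-Files may place the cell boundaries according to the data distribution; uniform cells are an assumption here.

```python
def approximate(point, bits):
    """Quantize each coordinate in [0, 1] to one of 2**bits uniform cells."""
    cells = 1 << bits
    return tuple(min(int(c * cells), cells - 1) for c in point)

def lower_bound(query, approx, bits):
    """Smallest possible Euclidean distance from `query` to any point in the approximated cell."""
    cells = 1 << bits
    total = 0.0
    for q, a in zip(query, approx):
        lo, hi = a / cells, (a + 1) / cells
        if q < lo:
            d = lo - q
        elif q > hi:
            d = q - hi
        else:
            d = 0.0
        total += d * d
    return total ** 0.5
```

With 2 bits per dimension, a 2D point needs 4 bits instead of two 64-bit floats; only points whose lower bound beats the current best candidate need their exact coordinates fetched.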
High-dimensional data spaces & massive data sets
  • Exotic data with ever-growing cardinality and dimensionality
  • Terabyte- to petabyte-scale data sets
    • Common problem: overfitting the data
    • Common challenge: fitting models/patterns robustly
  • Compression, statistics, stochastic analysis, discrete mathematics, harmonic analysis
  • Complexity and noise lead to statistical/fuzzy models
The Pyramid-Technique
  • Maps data from D-dimensional space to 1D so that B+-Trees can be used to manage the data
    • The data space is divided into 2·D pyramids
    • Each pyramid is partitioned into data pages of the B+-Tree
  • No inverse transformation is needed, because the D-dimensional points are stored together with their 1D keys
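The 1D key can be computed as in the following sketch, which follows our understanding of the pyramid-value definition for points normalized to [0, 1]^D: find the dimension in which the point lies farthest from the centre, which identifies one of the 2·D pyramids, and add the point's height (distance from the centre in that dimension) to the pyramid number. The function name is assumed.

```python
def pyramid_value(v):
    """Map a point in [0, 1]^D to a 1D key: pyramid number + height within that pyramid."""
    d = len(v)
    # Dimension in which the point is farthest from the centre (0.5, ..., 0.5).
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..D-1 lie on the "low" side, D..2D-1 on the "high" side.
    i = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])
    return i + height
```

The integer part selects the pyramid and the fractional part orders points within it, so the resulting keys can be inserted into an ordinary B+-Tree.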
The Pyramid-Technique (2)
  • Complex queries
    • The pyramid value is calculated from the query input
    • The tree is queried with this value
    • Result = the D-dimensional points sharing this pyramid value, which must be scanned for the search item
  • Query processing is efficient only for fewer than 8 dimensions
Query processing
  • Direct vs. indirect spatial search
    • Direct = locating objects in a geographical area
    • Indirect = queries based on non-spatial attributes
      • Show the geography that satisfies non-spatial requirements
Query processing steps
  • Query input
  • Filter step
    • Spatial index
    • Candidate set
  • Refinement step
    • Load spatial extent
    • Test spatial extent
  • Hits/false drops (false drops = candidates rejected by the refinement step)
  • Query result output
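The filter and refinement steps can be sketched for a buffer query ("which line segments lie within `dist` of a point?"). In this illustration the filter step is a linear scan over MBRs expanded by the buffer distance, standing in for a real spatial index; the refinement step computes the exact point-to-segment distance. Candidates that fail refinement are the false drops. The object names and the dictionary layout are assumptions.

```python
import math

def seg_mbr(seg):
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def expanded_contains(m, p, dist):
    """Filter test: is p inside the MBR grown by `dist` on every side?"""
    return m[0] - dist <= p[0] <= m[2] + dist and m[1] - dist <= p[1] <= m[3] + dist

def point_segment_distance(p, seg):
    """Exact geometry test: distance from p to the closest point of the segment."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.dist(p, (x1, y1))
    t = max(0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / length2))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))

def buffer_query(objects, p, dist):
    # Filter step: cheap MBR test produces the candidate set.
    candidates = [oid for oid, seg in objects.items()
                  if expanded_contains(seg_mbr(seg), p, dist)]
    # Refinement step: load and test the exact extent of each candidate.
    hits = [oid for oid in candidates
            if point_segment_distance(p, objects[oid]) <= dist]
    return candidates, hits
```

For example, a diagonal segment whose MBR lies near the query point can pass the filter yet fail the exact test, which is precisely a false drop.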
Query types
  • Point query/point-in-polygon query
    • Parameter: coordinates
    • What objects exist at these coordinates?
  • Window/range query
    • Parameter: region defined by coordinates
    • What objects are located in this region?
  • Distance and Buffer Zone queries
    • Parameters: buffer object and distance
    • What objects lie within the given distance of the buffer object?
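The point-in-polygon query can be answered with the classic ray-casting (even-odd) test, sketched below in Python: a horizontal ray from the query point crosses the polygon boundary an odd number of times exactly when the point is inside, which also works for concave polygons. Points lying exactly on an edge are not handled consistently by this simple version; the polygon names in the usage are made up.

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray casting; `polygon` is a list of (x, y) vertices in order."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside

def point_query(polygons, pt):
    """Names of all polygons containing pt (linear scan; an index would filter first)."""
    return [name for name, poly in polygons.items() if point_in_polygon(pt, poly)]
```

In practice this exact test is the refinement step after an MBR filter has reduced the polygons to a small candidate set.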
Query types (2)
  • Path queries (network structure required)
    • Parameters: network locations
    • What is the shortest route from A to B?
  • Join and Range queries
    • Spatial objects and relationships
    • Spatial predicates: points, windows, buffers, paths
    • Example of a spatial join: overlaying the road and waterworks GIS layers and displaying the result according to relative height (river, bridge, aqueduct)
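A path query on a network can be answered with Dijkstra's algorithm. The sketch below assumes the network is an adjacency list of `(neighbour, length)` pairs with non-negative lengths; the node names are placeholders.

```python
import heapq
import math

def shortest_route(graph, source, target):
    """Dijkstra's algorithm; returns (length, [source, ..., target]) or (inf, [])."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

The network structure the slide mentions is exactly what `graph` encodes; without explicit connectivity, Euclidean proximity alone cannot answer route queries.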
Query types (3)
  • Feature approach – feature vectors
  • Neighborhood search
  • Spatial-Query-by-Sketch
    • Multimedia (2D) search instead of alphanumeric queries
Similarity search
  • Approximate the surface by parametric functions
  • Assign the appropriate class to the query object
  • Section Coding – each polygon’s circumcircle is decomposed into sectors & normalized
  • Similarity = distance between feature vectors
Similarity search (2)
  • Shape Histograms (feature vectors!)
    • Bins = complete & disjoint cells of space
    • Shell Model
      • Concentric uniform shells around the center
      • Independent of rotation around the center
    • Sector Model
      • Sectors distributed uniformly over the surface (Voronoi cells)
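The shell model can be sketched in 2D as follows: points of a shape are binned by their distance from the shape's centroid into concentric shells of equal width, the counts are normalized into a feature vector, and similarity is the Euclidean distance between two such vectors. Using the centroid as centre, equal-width shells up to an assumed `r_max`, and clamping points beyond `r_max` into the last shell are all choices made for this illustration.

```python
import math

def shell_histogram(points, n_shells, r_max):
    """Normalized shell-model histogram of a 2D point set around its centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    width = r_max / n_shells
    hist = [0.0] * n_shells
    for x, y in points:
        r = math.hypot(x - cx, y - cy)
        hist[min(int(r / width), n_shells - 1)] += 1  # clamp r >= r_max into last shell
    return [h / len(points) for h in hist]

def histogram_distance(h1, h2):
    """Similarity = Euclidean distance between feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
```

Since only radii enter the histogram, rotating a shape about its centroid leaves its feature vector unchanged, which is the rotation independence noted above.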
Special Query Types
  • Spatial continuous queries
    • In dynamic environments continuous polling is necessary, because otherwise query results become meaningless
    • Answer = result + its expiry time given the current motion vector + the change that can cause the expiration
  • Spatio-temporal queries
    • Spatiotemporal Database Systems (STDBS) track and present data about moving objects, e.g. from GPS
    • Probabilistic models are also available that attempt to predict future values in order to respond faster
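The expiry time of a window-query result can be sketched under a simplifying assumption: each object moves linearly with a constant velocity (its current motion vector). For one object, the slab method gives the time interval during which it lies inside the window; the result changes at the next entry or exit after t = 0. The whole result expires at the earliest such change, and the object causing it is reported too. Function names and the data layout are hypothetical.

```python
import math

def inside_interval(pos, vel, rect):
    """Interval [t_enter, t_exit] during which pos + t*vel lies in rect, or None."""
    t_enter, t_exit = -math.inf, math.inf
    for p, v, lo, hi in ((pos[0], vel[0], rect[0], rect[2]),
                         (pos[1], vel[1], rect[1], rect[3])):
        if v == 0:
            if not (lo <= p <= hi):
                return None  # never inside along this axis
        else:
            t1, t2 = (lo - p) / v, (hi - p) / v
            t_enter = max(t_enter, min(t1, t2))
            t_exit = min(t_exit, max(t1, t2))
    return (t_enter, t_exit) if t_enter <= t_exit else None

def expiry_time(pos, vel, rect):
    """Earliest t > 0 at which this object's membership in the window result changes."""
    interval = inside_interval(pos, vel, rect)
    if interval is None:
        return math.inf
    t_enter, t_exit = interval
    if t_enter > 0:
        return t_enter  # currently outside, will enter
    if t_exit > 0:
        return t_exit  # currently inside, will leave (inf if it never leaves)
    return math.inf  # already left the window for good

def result_expiry(objects, rect):
    """(expiry time, object causing it) for a dict oid -> (position, velocity)."""
    return min(((expiry_time(p, v, rect), oid) for oid, (p, v) in objects.items()),
               default=(math.inf, None))
```

Returning the expiry time along with the result lets a client skip re-querying until that time, unless some object changes its motion vector first.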
Query pre-processing
  • Pre-optimize index structure
    • Using domain knowledge: e.g. if a TIN is used for river network studies, valleys are more important and could be stored at higher nodes in the tree
    • Avoid characteristic areas: instead of storing the exact geometry of a chasm, mark it as a no-go area
Query processing strategies
  • Parallel searches (given a good split of the data)
    • In different data structures
  • Shape-based strategy
    • Models the direction region
    • Converts processing of direction predicates into processing of topological operations between open shapes and closed geometry objects
    • Eliminates computation related to the world boundary
  • Defining/representing spatial context
  • Space Driven vs. Data Driven
  • Each application calls for its own technique
    • Tree
    • Hashing
    • 3D histogram
    • Approximate/Fuzzy approach