Temple University – CIS Dept. CIS616– Principles of Data Management - PowerPoint PPT Presentation

Presentation Transcript

  1. Temple University – CIS Dept. CIS616 – Principles of Data Management • V. Megalooikonomou • Spatial Access Methods (SAMs) (based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)

  2. General Overview • Multimedia Indexing • Spatial Access Methods (SAMs) • k-d trees • Point Quadtrees • MX-Quadtree • z-ordering • R-trees

  3. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  4. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (like??)

  5. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)

  9. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ within ε)

  10. SAMs - motivation • Q: applications?

  11. SAMs - motivation • traditional DB • GIS [Figure: axes labeled age and salary]

  13. SAMs - motivation • CAD/CAM • find elements too close to each other

  14. SAMs - motivation CAD/CAM

  15. SAMs - motivation [Figure: sequences S1, ..., Sn over days 1 to 365, mapped to points F(S1), ..., F(Sn); feature axes labeled "e.g., avg" and "e.g., std"]

  16. SAMs: solutions • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees • (grid files) • Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)

  17. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  18. k-d trees • Used to store k dimensional point data • It is not used to store region data • A 2-d tree (i.e., for k=2) stores 2-dimensional point data while a 3-d tree stores 3-dimensional point data, etc.

  19. 2-d trees – node structure • Binary trees • Info: information field • Xval,Yval: coordinates of a point associated with the node • Llink, Rlink: pointers to children • Properties (N: node): • If level N even -> • for all nodes M in the subtree rooted at N.Llink: M.Xval < N.Xval • for all nodes P in the subtree rooted at N.Rlink: P.Xval >= N.Xval • If level N odd -> • Similarly use Yvals

  20. 2-d trees – Example

  21. 2-d trees: Insertion/Search • To insert a node N into the tree pointed to by T • If N and T agree on Xval, Yval then overwrite T • Else, branch left if N.Xval < T.Xval, right otherwise (even levels) • Similarly for odd levels (branching on Yvals)
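The insertion and search rules above can be sketched in Python. This is a minimal sketch, not the course's reference code: the names `Node`, `insert`, and `search` are chosen here for illustration, the root is assumed to be at level 0 (even), and the coordinates attached to the town names are made up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    info: Any                        # information field
    xval: float                      # coordinates of the point stored in the node
    yval: float
    llink: Optional["Node"] = None   # left child
    rlink: Optional["Node"] = None   # right child


def insert(t, info, x, y, level=0):
    """Insert point (x, y) into the 2-d tree rooted at t; return the subtree root."""
    if t is None:
        return Node(info, x, y)
    if (t.xval, t.yval) == (x, y):
        t.info = info                # same coordinates: overwrite
        return t
    # Even levels branch on Xval, odd levels on Yval.
    go_left = x < t.xval if level % 2 == 0 else y < t.yval
    if go_left:
        t.llink = insert(t.llink, info, x, y, level + 1)
    else:
        t.rlink = insert(t.rlink, info, x, y, level + 1)
    return t


def search(t, x, y):
    """Return the node holding (x, y), or None if the point is absent."""
    level = 0
    while t is not None:
        if (t.xval, t.yval) == (x, y):
            return t
        go_left = x < t.xval if level % 2 == 0 else y < t.yval
        t = t.llink if go_left else t.rlink
        level += 1
    return None


# Illustrative data: the coordinates are made up for this sketch.
root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 40), ("Sinj", 4, 4)]:
    root = insert(root, name, x, y)
```

Because the branching test depends only on the nodes already present, inserting the same points in a different order generally produces a differently shaped tree.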

  22. 2-d trees – Example of Insertion • Splitting of region by Banja Luka • Splitting of region by Derventa • Splitting of region by Toslic • Splitting of region by Sinj

  23. 2-d trees: Deletion • Deletion of point (x,y) from T: let N be the node that stores (x,y) • If N is a leaf node: easy (just remove it) • Otherwise either Tl (left subtree of N) or Tr (right subtree of N) is non-empty • Find a “candidate replacement” node R in Tl or Tr • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti (the subtree, Tl or Tr, in which R was found) • Recursion guaranteed to terminate - Why?

  24. 2-d trees: Deletion • Finding candidate replacement nodes for deletion • Replacement node R must bear the same spatial relation to all nodes in Tl and Tr as node N
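One common way to satisfy the replacement condition is sketched below; it is an assumption of this example, not necessarily the procedure the lecture had in mind. The replacement R is the node with the smallest coordinate along N's splitting dimension, taken from Tr when Tr is non-empty; when only Tl exists, R is the minimum of Tl and what remains of Tl is re-attached as the right subtree. The helper names `coord`, `find_min`, and `delete` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    info: Any
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def coord(n, dim):
    """Coordinate of node n along dim (0 = x, 1 = y)."""
    return n.xval if dim == 0 else n.yval


def find_min(t, dim, level):
    """Node with the smallest coordinate along dim in the subtree rooted at t."""
    if t is None:
        return None
    # If t itself splits on dim, everything in its right subtree is >= t,
    # so only the left subtree can hold something smaller.
    subtrees = [t.llink] if level % 2 == dim else [t.llink, t.rlink]
    best = t
    for s in subtrees:
        c = find_min(s, dim, level + 1)
        if c is not None and coord(c, dim) < coord(best, dim):
            best = c
    return best


def delete(t, x, y, level=0):
    """Delete point (x, y) from the subtree rooted at t; return the new subtree root."""
    if t is None:
        return None
    dim = level % 2
    if (t.xval, t.yval) != (x, y):
        if (x, y)[dim] < coord(t, dim):
            t.llink = delete(t.llink, x, y, level + 1)
        else:
            t.rlink = delete(t.rlink, x, y, level + 1)
        return t
    if t.llink is None and t.rlink is None:
        return None                          # leaf: just drop it
    # Take the replacement R from Tr if possible, otherwise from Tl.
    source = t.rlink if t.rlink is not None else t.llink
    r = find_min(source, dim, level + 1)
    t.info, t.xval, t.yval = r.info, r.xval, r.yval
    # Whatever remains of the source is >= R along dim, so it becomes Tr.
    t.rlink = delete(source, r.xval, r.yval, level + 1)
    if source is t.llink:
        t.llink = None
    return t
```

Each recursive call works on a strictly smaller subtree, which is one way to see why the recursion terminates.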

  25. 2-d trees: Range Queries • Q: Given a point (xc, yc) and a distance r, find all points in the 2-d tree that lie within the circle of radius r centered at (xc, yc) • A: Each node N in a 2-d tree implicitly represents a region RN – If the circle (specified by the query) has no intersection with RN, then there is no need to search the subtree rooted at node N
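A minimal Python sketch of this pruning rule follows. It assumes that the region RN is tracked as a bounding rectangle `(xlo, xhi, ylo, yhi)` passed down the recursion, starting from the whole plane; `circle_meets_rect` and `range_query` are names invented for this example.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    info: Any
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def circle_meets_rect(xc, yc, r, xlo, xhi, ylo, yhi):
    """True if the circle overlaps the axis-aligned rectangle."""
    nx = min(max(xc, xlo), xhi)      # rectangle point nearest to the center
    ny = min(max(yc, ylo), yhi)
    return (nx - xc) ** 2 + (ny - yc) ** 2 <= r * r


def range_query(t, xc, yc, r, level=0,
                region=(-math.inf, math.inf, -math.inf, math.inf), found=None):
    """Collect all nodes whose point lies within distance r of (xc, yc)."""
    if found is None:
        found = []
    # Prune: the region of t does not intersect the query circle.
    if t is None or not circle_meets_rect(xc, yc, r, *region):
        return found
    if (t.xval - xc) ** 2 + (t.yval - yc) ** 2 <= r * r:
        found.append(t)
    xlo, xhi, ylo, yhi = region
    if level % 2 == 0:               # t splits its region on x
        left, right = (xlo, t.xval, ylo, yhi), (t.xval, xhi, ylo, yhi)
    else:                            # t splits its region on y
        left, right = (xlo, xhi, ylo, t.yval), (xlo, xhi, t.yval, yhi)
    range_query(t.llink, xc, yc, r, level + 1, left, found)
    range_query(t.rlink, xc, yc, r, level + 1, right, found)
    return found
```

The child rectangles are treated as closed on both sides, which is slightly conservative (it may visit a subtree that turns out to be empty of answers) but never misses a point.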

  26. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  27. Point Quadtrees • Represent point data • Always split regions into 4 parts • 2-d tree: a node N splits a region into two by drawing one line through the point (N.xval, N.yval) • Point quadtree: a node N splits a region by drawing a horizontal and a vertical line through the point (N.xval, N.yval) • Four parts: NW, SW, NE, and SE quadrants • Q: Quadtree nodes have 4 children?

  28. Point Quadtrees • Nodes in point quadtrees represent regions

  29. Point quadtrees - Insertion • Splitting of region by Banja Luka • Splitting of region by Derventa • Splitting of region by Toslic • Splitting of region by Tuzla • Splitting of region by Sinj

  30. Point Quadtrees - Insertion
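Point quadtree insertion can be sketched as follows. The slides do not say which quadrant receives points that lie exactly on a splitting line, so this sketch assumes ties go to the east and north sides; `QNode`, `quadrant`, and `insert` are illustrative names, and y is assumed to grow northward.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QNode:
    info: Any
    xval: float
    yval: float
    nw: Optional["QNode"] = None
    sw: Optional["QNode"] = None
    ne: Optional["QNode"] = None
    se: Optional["QNode"] = None


def quadrant(n, x, y):
    """Name of the quadrant of n that contains (x, y); ties go east/north (assumption)."""
    if x < n.xval:
        return "nw" if y >= n.yval else "sw"
    return "ne" if y >= n.yval else "se"


def insert(t, info, x, y):
    """Insert point (x, y) into the point quadtree rooted at t; return the subtree root."""
    if t is None:
        return QNode(info, x, y)
    if (t.xval, t.yval) == (x, y):
        t.info = info                # same coordinates: overwrite
        return t
    q = quadrant(t, x, y)
    setattr(t, q, insert(getattr(t, q), info, x, y))
    return t
```

Each node has four child slots, but any of them may be empty, so a node has between zero and four children.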

  31. Point quadtrees: Deletion • Deletion of point (x,y) from T: let N be the node that stores (x,y) • If N is a leaf node: easy • Otherwise at least one subtree (N.NW, N.SW, N.NE, N.SE) is non-empty • Find a “candidate replacement” node R in one of the subtrees such that: • Every other node R1 in N.NW is to the NW of R • Every other node R2 in N.SW is to the SW of R • etc… • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti (the subtree in which R was found) • In general, it may not always be possible to find such a replacement node • Q: What happens in the worst case?

  32. Point quadtrees: Deletion • Deletion of point (x,y) from T: let N be the node that stores (x,y) • If N is a leaf node: easy • Otherwise at least one subtree (N.NW, N.SW, N.NE, N.SE) is non-empty • Find a “candidate replacement” node R in one of the subtrees such that: • Every other node R1 in N.NW is to the NW of R • Every other node R2 in N.SW is to the SW of R • etc… • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti (the subtree in which R was found) • In general, it may not always be possible to find such a replacement node • Q: What happens in the worst case? A: May require all nodes to be reinserted

  33. Point quadtrees: Range Searches • Each node in a point quadtree represents a region • Do not search regions that do not intersect the circle defined by the query
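The same pruning idea can be sketched for point quadtrees. As before, this is only an illustration under stated assumptions: each node's region is carried as a rectangle `(xlo, xhi, ylo, yhi)`, y grows northward, and `QNode`, `circle_meets_rect`, and `range_search` are names made up here.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QNode:
    info: Any
    xval: float
    yval: float
    nw: Optional["QNode"] = None
    sw: Optional["QNode"] = None
    ne: Optional["QNode"] = None
    se: Optional["QNode"] = None


def circle_meets_rect(xc, yc, r, xlo, xhi, ylo, yhi):
    """True if the circle overlaps the axis-aligned rectangle."""
    nx = min(max(xc, xlo), xhi)
    ny = min(max(yc, ylo), yhi)
    return (nx - xc) ** 2 + (ny - yc) ** 2 <= r * r


def range_search(t, xc, yc, r,
                 region=(-math.inf, math.inf, -math.inf, math.inf), found=None):
    """Collect all nodes whose point lies within distance r of (xc, yc)."""
    if found is None:
        found = []
    # Do not search regions that do not intersect the query circle.
    if t is None or not circle_meets_rect(xc, yc, r, *region):
        return found
    if (t.xval - xc) ** 2 + (t.yval - yc) ** 2 <= r * r:
        found.append(t)
    xlo, xhi, ylo, yhi = region
    children = [
        (t.nw, (xlo, t.xval, t.yval, yhi)),
        (t.sw, (xlo, t.xval, ylo, t.yval)),
        (t.ne, (t.xval, xhi, t.yval, yhi)),
        (t.se, (t.xval, xhi, ylo, t.yval)),
    ]
    for child, child_region in children:
        range_search(child, xc, yc, r, child_region, found)
    return found
```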

  34. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  35. MX-Quadtrees • Drawbacks of 2-d trees, point quadtrees: • shape of tree depends upon the order in which objects are inserted into the tree • splits may be uneven depending upon where the point (N.xval, N.yval) is located inside the region (represented by N) • MX-quadtrees: shape (and height) of tree independent of number of nodes and order of insertion

  36. MX-Quadtrees • Assumption: the map is represented as a grid of size (2^k x 2^k) for some k • When a region gets “split” it splits down the middle

  37. MX-Quadtrees - Insertion After insertion of A, B, C, and D respectively

  39. MX-Quadtrees - Deletion • Fairly easy – why? • All points are represented at the leaf level • Total time for deletion: O(k)
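The following sketch shows why both insertion and deletion take O(k) steps on a 2^k x 2^k grid: every point sits exactly k levels below the root, and each step halves the current region. It assumes integer cell coordinates in [0, 2^k), k >= 1, y growing northward, and a dictionary of children per node; `MXNode`, `insert`, and `delete` are illustrative names only.

```python
class MXNode:
    def __init__(self):
        self.children = {}    # quadrant name ("NW", "SW", "NE", "SE") -> MXNode
        self.info = None      # set only on nodes at the leaf level


def _step(x, y, x0, y0, half):
    """Pick the quadrant of the square at origin (x0, y0) that contains cell (x, y)."""
    east = x >= x0 + half
    north = y >= y0 + half
    name = ("N" if north else "S") + ("E" if east else "W")
    return name, x0 + (half if east else 0), y0 + (half if north else 0)


def insert(root, k, x, y, info):
    """Store info in cell (x, y) of the 2^k x 2^k grid (assumes k >= 1)."""
    node, x0, y0, size = root, 0, 0, 1 << k
    while size > 1:                           # exactly k iterations
        half = size // 2                      # regions always split down the middle
        name, x0, y0 = _step(x, y, x0, y0, half)
        node = node.children.setdefault(name, MXNode())
        size = half
    node.info = info


def delete(root, k, x, y):
    """Remove the point in cell (x, y); return False if the cell was empty."""
    path = []                                 # (parent, quadrant name) pairs
    node, x0, y0, size = root, 0, 0, 1 << k
    while size > 1:
        half = size // 2
        name, x0, y0 = _step(x, y, x0, y0, half)
        if name not in node.children:
            return False
        path.append((node, name))
        node = node.children[name]
        size = half
    # Drop the leaf, then collapse ancestors that became empty.
    for parent, name in reversed(path):
        del parent.children[name]
        if parent.children:
            break
    return True
```

The split positions depend only on the grid, never on the stored points, which is why the shape of the tree does not depend on insertion order.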

  40. MX-Quadtrees –Range Queries • Same as in point quadtrees • One difference: • Checking to see if a point is in the circle defined by the range query needs to be performed at the leaf level (points are stored at the leaf level)

  41. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  42. z-ordering Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page) Hint: reduce the problem to 1-d points(!!) Q1: why? A: Q2: how?

  43. z-ordering Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page) Hint: reduce the problem to 1-d points (!!) Q1: why? A: B-trees! Q2: how?

  44. z-ordering Q2: how? A: assume finite granularity; z-ordering = bit-shuffling = N-trees = Morton keys = geo-coding = ...
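Bit-shuffling can be illustrated with a short Python sketch for the 2-d case. Which coordinate contributes the more significant bit of each pair is a convention; this sketch assumes x comes first. The function names `z_value` and `z_to_xy` are made up for the example.

```python
def z_value(x, y, bits):
    """Interleave the low `bits` bits of x and y (x bit first in each pair)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


def z_to_xy(z, bits):
    """Inverse of z_value: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i + 1)) & 1) << i
        y |= ((z >> (2 * i)) & 1) << i
    return x, y


# 4x4 grid (2 bits per axis): x = 10 and y = 11 in binary interleave to 1101.
print(format(z_value(0b10, 0b11, 2), "04b"))
```

The resulting integers are ordinary 1-d keys, so they can be stored in a B-tree.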

  45. z-ordering Q2: how? A: assume finite granularity (e.g., 2^32 x 2^32; 4x4 here) Q2.1: how to map n-d cells to 1-d cells?

  46. z-ordering Q2.1: how to map n-d cells to 1-d cells?

  47. z-ordering Q2.1: how to map n-d cells to 1-d cells? A: row-wise Q: is it good?

  48. z-ordering Q: is it good? A: great for ‘x’ axis; bad for ‘y’ axis
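A small sketch makes the asymmetry concrete. It assumes cells are numbered row by row, so the 1-d position of cell (x, y) is y * width + x; `row_wise` is a name chosen for this example.

```python
def row_wise(x, y, width):
    """1-d position of cell (x, y) when cells are numbered row by row."""
    return y * width + x


width = 4
# Neighbors along x are adjacent in the 1-d order...
print(row_wise(1, 0, width) - row_wise(0, 0, width))
# ...but neighbors along y are a full row apart.
print(row_wise(0, 1, width) - row_wise(0, 0, width))
```

On a 2^32 x 2^32 grid the second gap is 2^32 positions, so two cells that touch vertically land far apart on disk.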

  49. z-ordering Q: How about the ‘snake’ curve?

  50. z-ordering Q: How about the ‘snake’ curve? A: still problems (on a 2^32 x 2^32 grid)
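To see where the snake curve still breaks down, the sketch below assumes even-numbered rows run left to right and odd-numbered rows run right to left; `snake` is an illustrative name.

```python
def snake(x, y, width):
    """1-d position of cell (x, y) on a snake (boustrophedon) curve."""
    offset = x if y % 2 == 0 else width - 1 - x
    return y * width + offset


width = 4
# At the turning end of a row, vertical neighbors are adjacent in 1-d...
print(snake(3, 1, width) - snake(3, 0, width))
# ...but at the other end they are almost two full rows apart.
print(snake(0, 1, width) - snake(0, 0, width))
```

In general the worst vertical gap is 2 * width - 1, so on a 2^32-wide grid some touching cells remain about 2^33 positions apart.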