1 / 76

Spatial and Temporal Data Mining

Spatial and Temporal Data Mining. V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos). General Overview. Multimedia Indexing spatial access methods problem dfn k-d trees point quadtrees MX-quadtrees z-ordering R-trees.

vivian
Download Presentation

Spatial and Temporal Data Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)

  2. General Overview • Multimedia Indexing • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  3. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (like??)

  4. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)

  5. SAMs – Detailed Outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  6. SAMs - more detailed outline • R-trees • main idea; file structure • (algorithms: insertion/split) • (deletion) • (search: range, nn, spatial joins) • variations (packed; hilbert;...)

  7. R-trees z-ordering: cuts regions to pieces -> dup. elim. how could we avoid that? Idea: Minimum Bounding Rectangles

  8. R-trees [Guttman 84] Main idea: allow parents to overlap! => guaranteed 50% utilization => easier insertion/split algorithms. (only deal with Minimum Bounding Rectangles - MBRs)

  9. R-trees • eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page I C A G H F B J E D

  10. R-trees A D F G B E C H I J • eg., w/ fanout 4: P1 P3 I C A G H F B J E P4 P2 D

  11. R-trees P1 A H D F P2 G B E I P3 C J P4 • eg., w/ fanout 4: P1 P3 I C A G H F B J E P4 P2 D

  12. R-trees - format of nodes P1 A P2 B P3 C P4 • {(MBR; obj-ptr)} for leaf nodes x-low; x-high y-low; y-high ... obj ptr ...

  13. R-trees - format of nodes P1 A P2 B P3 C P4 x-low; x-high y-low; y-high ... node ptr ... • {(MBR; node-ptr)} for non-leaf nodes

  14. R-trees - range search? P1 A H D F P2 G B E I P3 C J P4 P1 P3 I C A G H F B J E P4 P2 D

  15. R-trees - range search? P1 A H D F P2 G B E I P3 C J P4 P1 P3 I C A G H F B J E P4 P2 D

  16. R-trees - range search Observations: • every parent node completely covers its ‘children’ • a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (i.e., no need for dup. elim.) • a point query may follow multiple branches. • everything works for any dimensionality

  17. SAMs - more detailed outline • R-trees • main idea; file structure • algorithms: insertion/split • deletion • search: range, nn, spatial joins • performance analysis • variations (packed; hilbert;...)

  18. R-trees - insertion P1 D A H F P2 G B E I P3 C J P4 • eg., rectangle ‘X’ P1 P3 I C A G H F B X J E P4 P2 D

  19. R-trees - insertion P1 H D A F P2 G B E I P3 C J P4 • eg., rectangle ‘X’ P1 P3 I C A G H F B X J E P4 P2 D X

  20. R-trees - insertion P1 D A H F P2 G B E I P3 C J P4 • eg., rectangle ‘Y’ P1 P3 I C A G H F B J Y E P4 P2 D

  21. R-trees - insertion P1 H D A F P2 G B E I P3 C J P4 • eg., rectangle ‘Y’: extend suitable parent. P1 P3 I C A G H F B J Y E P4 P2 D Y

  22. R-trees - insertion • eg., rectangle ‘Y’: extend suitable parent. • Q: how to measure ‘suitability’?

  23. R-trees - insertion • eg., rectangle ‘Y’: extend suitable parent. • Q: how to measure ‘suitability’? • A: by increase in area (volume) (more details: later, under ‘performance analysis’) • Q: what if there is no room? how to split?

  24. R-trees - insertion P1 D H A F P2 G B E I P3 C J P4 • eg., rectangle ‘W’ P1 P3 K I C A G W H F B J K E P4 P2 D

  25. R-trees - insertion • eg., rectangle ‘W’ - focus on ‘P1’ - how to split? P1 K C A W B

  26. R-trees - insertion • eg., rectangle ‘W’ - focus on ‘P1’ - how to split? P1 • (A1: plane sweep, • until 50% of rectangles) • A2: ‘linear’ split • A3: quadratic split • A4: exponential split K C A W B

  27. R-trees - insertion & split seed2 R • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ seed1

  28. R-trees - insertion & split • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ • Q: how to measure ‘closeness’?

  29. R-trees - insertion & split • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ • Q: how to measure ‘closeness’? • A: by increase of area (volume)

  30. R-trees - insertion & split seed2 R • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ seed1

  31. R-trees - insertion & split seed2 R • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ seed1

  32. R-trees - insertion & split • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ • smart idea: pre-sort rectangles according to delta of closeness (ie., schedule easiest choices first!)

  33. R-trees - insertion - pseudocode • decide which parent to put new rectangle into (‘closest’ parent) • if overflow, split to two, using (say,) the quadratic split algorithm • propagate the split upwards, if necessary • update the MBRs of the affected parents.

  34. R-trees - insertion - observations • many more split algorithms exist (next!)

  35. SAMs - more detailed outline • R-trees • main idea; file structure • algorithms: insertion/split • deletion • search: range, nn, spatial joins • performance analysis • variations (packed; hilbert;...)

  36. R-trees - deletion • delete rectangle • if underflow • ??

  37. R-trees - deletion • delete rectangle • if underflow • temporarily delete all siblings (!); • delete the parent node and • re-insert them

  38. SAMs - more detailed outline • R-trees • main idea; file structure • algorithms: insertion/split • deletion • search: range, nn, spatial joins • performance analysis • variations (packed; hilbert;...)

  39. R-trees - range search pseudocode: check the root for each branch, if its MBR intersects the query rectangle apply range-search (or print out, if this is a leaf)

  40. R-trees - nn search P1 P3 I C A G H F B J q E P4 P2 D skip

  41. R-trees - nn search P1 P3 I C A G H F B J q E P4 P2 D skip • Q: How? (find near neighbor; refine...)

  42. R-trees - nn search skip • A1: depth-first search; then, range query P1 P3 I C A G H F B J E P4 q P2 D

  43. R-trees - nn search skip • A1: depth-first search; then, range query P1 P3 I C A G H F B J E P4 q P2 D

  44. R-trees - nn search skip • A1: depth-first search; then, range query P1 P3 I C A G H F B J E P4 q P2 D

  45. R-trees - nn search skip • A2: [Roussopoulos+, sigmod95]: • priority queue, with promising MBRs, and their best and worst-case distance • main idea:

  46. R-trees - nn search P1 P3 I C A G H F B J E P4 P2 D skip consider only P2 and P4, for illustration q

  47. R-trees - nn search skip best of P4 => P4 is useless for 1-nn worst of P2 H J E P4 q P2 D

  48. R-trees - nn search skip • what is really the worst of, say, P2? worst of P2 E q P2 D

  49. R-trees - nn search skip • what is really the worst of, say, P2? • A: the smallest of the two red segments! q P2

  50. R-trees - nn search skip • variations: [Hjaltason & Samet] incremental nn: • build a priority queue • scan enough of the tree, to make sure you have the k nn • to find the (k+1)-th, check the queue, and scan some more of the tree • ‘optimal’ (but, may need too much memory)

More Related