
Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins


Presentation Transcript


  1. Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins Wendy Osborn and Saad Zaamout

  2. Outline • Introduction • Related Work • Algorithm • Performance Evaluation • Conclusion and Future Work

  3. Spatial Data Canadian Cow Country..... *borrowed from www.mapquest.ca

  4. Distributed Database Montreal Calgary Toronto *borrowed from docs.google.com

  5. Research Problem • Efficient processing of a distributed spatial query • Cost considerations: • data transmission • CPU • I/O

  6. Related Work • Spatial join • Kang et al. (2002) • Spatial semijoins • Tan, Ooi, Abel (1995, 2000) • Karam and Petry (2006) • Limitations • Existing approaches handle only two-site distributed spatial queries

  7. The Algorithm - Assumptions • Each site has one participating spatial relation • Each spatial relation has one spatial attribute • All MBRs in a relation are unique • relation cardinality = number of MBRs in relation • Each spatial relation is indexed by an R-tree

  8. Spatial Semijoin Implementation • “Project” the spatial attribute from relation R • obtain (MBR, ID) pairs from the leaf nodes of the R-tree • Transmit the spatial attribute (R_SA) to the site of relation S • Perform the semijoin between R_SA and S • Transmit the identifiers from R_SA whose MBRs qualify in the query back to the site of relation R
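
The steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the join predicate is MBR intersection, replaces the R-tree leaf scan with a plain list scan, and uses hypothetical names (`MBR`, `project_spatial_attribute`, `spatial_semijoin`) and made-up sample data.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Minimum bounding rectangle (hypothetical representation)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: MBR, b: MBR) -> bool:
    """True when the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def project_spatial_attribute(relation):
    """'Project' the spatial attribute: keep only (MBR, ID) pairs.

    A list scan stands in for reading the R-tree leaf nodes.
    """
    return [(row["mbr"], row["id"]) for row in relation]


def spatial_semijoin(spatial_attribute, s_relation):
    """Run at the site of S.

    Returns the IDs from R's spatial attribute that qualify, plus the
    tuples of S that qualify. Intersection is an assumed predicate.
    """
    qualifying_ids = []
    qualifying_s = {}
    for mbr, rid in spatial_attribute:
        matched = False
        for row in s_relation:
            if intersects(mbr, row["mbr"]):
                matched = True
                qualifying_s[row["id"]] = row
        if matched:
            qualifying_ids.append(rid)
    return qualifying_ids, list(qualifying_s.values())


# Made-up sample relations for illustration only.
R = [{"id": 1, "mbr": MBR(0, 0, 2, 2), "name": "a"},
     {"id": 2, "mbr": MBR(10, 10, 12, 12), "name": "b"}]
S = [{"id": 7, "mbr": MBR(1, 1, 3, 3), "name": "c"},
     {"id": 8, "mbr": MBR(20, 20, 21, 21), "name": "d"}]

sa = project_spatial_attribute(R)       # shipped from R's site to S's site
ids, s_tuples = spatial_semijoin(sa, S)  # ids are shipped back to R's site
```

Only the (MBR, ID) pairs travel to the site of S and only identifiers travel back, which is where the transmission saving over shipping whole tuples comes from.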

  9. Algorithm - Example [Figure: relations R1 to R4 at four sites, labeled with cardinalities 100, 200, 600, and 800, plus the query site QS]

  10. Algorithm - Overview • Sort and group by spatial attribute cardinality • Transmit spatial attributes • Execute spatial semijoins • Transmit qualifying tuples to query site

  11. Algorithm – Stage 1 • All sites (i.e. relations) are sorted in ascending order of spatial attribute cardinality • Divided into two groups • P – the first n/2 sites • Q – the remaining n/2 sites

  12. Algorithm - Stage 2 • Transmit spatial attribute from sites in P to sites in Q in the following manner: • Spatial attribute with smallest cardinality in P sent to site with smallest cardinality in Q • Spatial attribute with next smallest cardinality in P sent to site with next smallest cardinality in Q • and so on…
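
Stages 1 and 2 amount to a sort, a split, and a positional pairing. The sketch below is illustrative only: the function name `plan_transmissions` and the site names A to D are hypothetical, the cardinalities are sample values, and an even number of sites is assumed so that both groups hold n/2 sites.

```python
def plan_transmissions(cardinalities):
    """Return groups P and Q and the (sender, receiver) pairs.

    cardinalities maps a site name to the cardinality of its spatial
    attribute. Assumes an even number of sites.
    """
    # Stage 1: ascending order of spatial attribute cardinality.
    sites = sorted(cardinalities, key=cardinalities.get)
    half = len(sites) // 2
    p, q = sites[:half], sites[half:]
    # Stage 2: smallest in P sends to smallest in Q, and so on.
    return p, q, list(zip(p, q))


# Hypothetical sites with sample cardinalities.
sizes = {"A": 600, "B": 100, "C": 800, "D": 200}
p, q, pairs = plan_transmissions(sizes)
print(p, q, pairs)
```

Because P holds the smaller relations, the spatial attributes that cross the network are always the smaller half of the input.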

  13. Algorithm Example [Figure: relations R1 to R4 divided into groups P and Q; spatial attributes (SA) are sent from P to Q, where SA = MBR + ID]

  14. Algorithm – Stage 3 • Spatial semijoin performed between spatial attribute and relation at each site in Q • Result: • set of tuples from relation that qualify in the semijoin • set of identifiers from spatial attribute whose MBRs qualify in the semijoin • Identifiers shipped back to originating site in P

  15. Algorithm Example [Figure: relations R1 to R4 in groups P and Q; qualifying identifiers (ID) are sent from Q back to P]

  16. Algorithm – Stage 4 [Figure: the sites of R1, R2, R3, and R4 each transmit their qualifying tuples (QT) to the query site QS]

  17. Performance Evaluation • comparison vs. naïve approach • six-site distributed spatial query • 100, 200, 400, 600, 800, 1000 tuples • each tuple has the following structure: • MBR, identifier, region name, population, line slope indicator

  18. Cost Calculations • Data sizes: • Character – 1 byte • Integer – 2 bytes • Long integer and double float – 8 bytes • Cost of transmitting an identifier • cost(ID) = sizeof(int) • Cost of transmitting a spatial attribute value (MBR) • cost(MBR) = 4 * sizeof(double) + sizeof(int) • Cost of transmitting a tuple • cost(tuple) = cost(MBR) + 20 * sizeof(char) + sizeof(longint) + sizeof(int)
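
As a worked check of the per-item costs, the sketch below plugs in the data sizes from this slide. It reads a tuple as the sum of its fields from slide 17: the MBR with its identifier, a region name taken to be 20 characters, a long-integer population, and an integer slope indicator. The constant names are hypothetical.

```python
# Data sizes in bytes, as listed on the slide.
SIZEOF_CHAR = 1
SIZEOF_INT = 2
SIZEOF_LONG = 8
SIZEOF_DOUBLE = 8

# An identifier is a single integer.
COST_ID = SIZEOF_INT

# A spatial attribute value: four doubles for the MBR plus its identifier.
COST_MBR = 4 * SIZEOF_DOUBLE + SIZEOF_INT

# A full tuple: spatial attribute, 20-character region name (assumed
# length), long-integer population, integer slope indicator.
COST_TUPLE = COST_MBR + 20 * SIZEOF_CHAR + SIZEOF_LONG + SIZEOF_INT

print(COST_ID, COST_MBR, COST_TUPLE)
```

Under these sizes an identifier costs 2 bytes, a spatial attribute value 34 bytes, and a full tuple 64 bytes, so shipping a spatial attribute value costs roughly half as much as shipping the whole tuple.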

  19. Cost Calculations • Cost of performing a semijoin and transmitting tuples to query site: cost(X, Y, Z) = number_of_tuples(Y) * cost(MBR) + number_of_qualifiers(X) * cost(ID) + cost(tuple) + number_of_qualifiers(Z) * cost(tuple) • Calculated for all n/2 semijoins

  20. Two-site Query Test

  21. Four- and Six-site Query Test • For the six-site query – 100, 200, 400, 600, 800, 1000 tuples • Optimized = 127,456 bytes • Naïve = 198,400 bytes • % improvement = 36%
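
The six-site figures can be reproduced with a few lines of arithmetic. The sketch assumes the naïve approach ships every tuple of every relation to the query site, and uses a 64-byte tuple (34 bytes for the MBR and identifier, 20 characters, one 8-byte long integer, one 2-byte integer). Under those assumptions the naïve total matches the slide. The optimized total is taken from the slide, not recomputed.

```python
TUPLE_BYTES = 64  # assumed per-tuple size derived from the slide 18 data sizes
relation_sizes = [100, 200, 400, 600, 800, 1000]

# Naive strategy (assumed): every tuple travels to the query site.
naive_bytes = sum(relation_sizes) * TUPLE_BYTES

# Reported on the slide for the semijoin-based strategy.
optimized_bytes = 127_456

improvement = (naive_bytes - optimized_bytes) / naive_bytes * 100
print(naive_bytes, round(improvement))
```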

  22. Conclusions • For multiple-site queries, our algorithm outperforms the naïve approach in all cases • The greater the difference in relation sizes, the greater the reduction in data transmission

  23. Future Work • CPU and I/O costs • Evaluate two-site queries vs. existing strategies • A real distributed database • Development of more multi-site distributed spatial query processing strategies

  24. THANK YOU! ?
