
Optimization of Spatial Joins on Mobile Devices

N. Mamoulis (1), P. Kalnis (2), S. Bakiras (3), X. Li (2). (1) Department of Computer Science and Information Systems, University of Hong Kong. (2) Department of Computer Science, National University of Singapore. (3) Department of Electrical and Electronic Engineering, University of Hong Kong.


Presentation Transcript


  1. Optimization of Spatial Joins on Mobile Devices • N. Mamoulis (1), P. Kalnis (2), S. Bakiras (3), X. Li (2) • (1) Department of Computer Science and Information Systems, University of Hong Kong • (2) Department of Computer Science, National University of Singapore • (3) Department of Electrical and Electronic Engineering, University of Hong Kong

  2. Motivation • Users are equipped with a mobile device (e.g., a PDA) • They pose ad-hoc spatial queries • Queries combine data from remote servers, e.g., “Find hotels which are within 500m of a seafood restaurant” • Servers do not collaborate with each other • The query is executed on the mobile device [Figure labels: Restaurants, Hotels]

  3. Cost • Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time • Goal: minimize the amount of transferred data

  4. Mediators? • Services may only allow end-user connections (e.g., subscribers only) • Access through mediators may be more expensive • Requests are ad-hoc; existing mediators may not support them [Figure labels: Restaurants, Hotels, Mediator]

  5. Solution • Integrate statistics retrieval with the query processing phase • Issue aggregate queries to estimate the data distribution • Partition the space recursively to achieve sub-linear transfer cost • Choose the physical operator independently for each partition

  6. Related Work • Hash-based methods (e.g., PBSM): require all data to be transferred • R-tree based methods (e.g., [Tan et al., TKDE, 2000]): require access to the internal index • Mediators: HERMES (statistics from previous queries); DISCO, Garlic (statistics during initialization); Tukwila (optimizes parts of the execution tree)

  7. Operators • WINDOW query: return all objects intersecting a window w • COUNT query: return the number of objects intersecting w • ε-RANGE query: return all objects within range ε from a point p • Note: we do not have access to the servers' internal indices
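
The three operators can be sketched as a small in-memory stand-in for a remote server. This is only an illustration under stated assumptions: the class name `SpatialServer`, the point-only data model, and the closed-window semantics are hypothetical and not taken from the slides. The client is meant to see nothing but these three calls.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed


class SpatialServer:
    """Hypothetical stand-in for a remote server; only three calls are exposed."""

    def __init__(self, points: List[Point]) -> None:
        self._points = list(points)  # the client never touches this directly

    def window(self, w: Window) -> List[Point]:
        """WINDOW query: all objects intersecting window w."""
        xmin, ymin, xmax, ymax = w
        return [(x, y) for x, y in self._points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def count(self, w: Window) -> int:
        """COUNT query: number of objects intersecting w (one small reply)."""
        return len(self.window(w))

    def eps_range(self, p: Point, eps: float) -> List[Point]:
        """eps-RANGE query: all objects within distance eps of point p."""
        return [q for q in self._points
                if hypot(q[0] - p[0], q[1] - p[1]) <= eps]


hotels = SpatialServer([(1.0, 1.0), (5.0, 5.0), (9.0, 2.0)])
print(hotels.count((0.0, 0.0, 6.0, 6.0)))   # prints 2
print(hotels.eps_range((5.0, 4.0), 1.5))    # prints [(5.0, 5.0)]
```

COUNT returns a single number regardless of how many objects fall in the window, which is what makes it a cheap way to probe the data distribution before deciding what to download.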

  8. Hash-based spatial join (HBSJ) • Each partition must fit in memory

  9. Recursive evaluation • Retrieve statistics for each subpart

  10. Nested loop spatial join (NLSJ) • Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV • NLSJ: 2 RCV + 2 SND + 2 RES

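  11. Cost Model • TCP/IP: MTU = MSS + BH, where MSS is the maximum payload per packet and BH the header bytes per packet • c1: download |Rw| objects from R and |Sw| objects from S and join them on the PDA • c2: download |Rw| objects from R, send them as window queries to S and retrieve the results • c3: the symmetric case of c2: download |Sw| objects from S, send them as window queries to R and retrieve the results • c4: repartition w, retrieve detailed statistics and apply the algorithm recursively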
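
To make the packet-level accounting concrete, here is a minimal sketch of how c1 and c2 could be estimated in bytes. Everything numeric is an assumption for illustration: a 1500-byte MTU, 40 header bytes per packet, 20-byte objects and queries, and a caller-supplied guess for the number of matches per window query. The function names are hypothetical, and an empty reply is treated as free, which is a simplification.

```python
from math import ceil


def wire_bytes(payload: int, mtu: int = 1500, header: int = 40) -> int:
    """Bytes on the wire for `payload` bytes, given MTU = MSS + BH."""
    if payload <= 0:
        return 0
    mss = mtu - header                      # usable payload per packet
    return payload + ceil(payload / mss) * header


def cost_c1(n_r: int, n_s: int, obj_bytes: int = 20) -> int:
    """c1: download both sides of the window and join on the device."""
    return wire_bytes(n_r * obj_bytes) + wire_bytes(n_s * obj_bytes)


def cost_c2(n_r: int, est_matches: int,
            obj_bytes: int = 20, query_bytes: int = 20) -> int:
    """c2: download R's objects, then send one window query per object to S.

    `est_matches` is an assumed average result size per query.
    """
    download = wire_bytes(n_r * obj_bytes)
    per_query = wire_bytes(query_bytes) + wire_bytes(est_matches * obj_bytes)
    return download + n_r * per_query


print(wire_bytes(1460))     # prints 1500: exactly one full packet
print(cost_c1(2, 1000))     # prints 20640
print(cost_c2(2, 3))        # prints 400
```

With only two objects on the R side, probing S per object is far cheaper than downloading all 1000 objects of S, which is the kind of asymmetry the per-partition operator choice is meant to exploit.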

  12. MobiJoin algorithm
      MobiJoin(w, |Rw|, |Sw|)
        if |Rw| = 0 or |Sw| = 0 then return
        compute c1, c2, c3, c4
        cmin = min(c1, c2, c3, c4)
        if cmin = c4 then
          impose a regular grid over w
          for each cell w’ in w
            retrieve |Rw’| and |Sw’|
            MobiJoin(w’, |Rw’|, |Sw’|)
        else follow action specified by cmin
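
The control flow of the pseudocode can be sketched in runnable form as follows. This is a toy under loudly stated assumptions, not the authors' implementation: the `Server` class is a hypothetical in-memory stand-in, costs are counted in objects rather than bytes, c4 uses a made-up heuristic (grid statistics plus half of the best non-recursive cost), recursion is capped at depth 8, and the join is a distance join in which S is probed with each window grown by ε so that pairs straddling cell borders are not lost.

```python
from math import inf
from typing import List, Set, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed


def in_win(p: Point, w: Window) -> bool:
    return w[0] <= p[0] <= w[2] and w[1] <= p[1] <= w[3]


def near(a: Point, b: Point, eps: float) -> bool:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= eps * eps


def grow(w: Window, eps: float) -> Window:
    return (w[0] - eps, w[1] - eps, w[2] + eps, w[3] + eps)


class Server:
    """Hypothetical remote dataset answering only WINDOW and COUNT queries."""

    def __init__(self, pts: List[Point]) -> None:
        self.pts = pts

    def window(self, w: Window) -> List[Point]:
        return [p for p in self.pts if in_win(p, w)]

    def count(self, w: Window) -> int:
        return len(self.window(w))


def mobijoin(R: Server, S: Server, w: Window, n_r: int, n_s: int, eps: float,
             out: Set[Tuple[Point, Point]], buffer: int = 64, k: int = 2,
             depth: int = 0) -> None:
    if n_r == 0 or n_s == 0:
        return                                        # prune: nothing can join
    c1 = n_r + n_s if n_r + n_s <= buffer else inf    # HBSJ must fit in memory
    c2 = 2 * n_r                                      # NLSJ with R as outer (toy)
    c3 = 2 * n_s                                      # NLSJ with S as outer (toy)
    # Assumed heuristic: grid statistics plus half the best direct cost.
    c4 = 2 * k * k + 0.5 * min(c1, c2, c3) if depth < 8 else inf
    choice = [c1, c2, c3, c4].index(min(c1, c2, c3, c4))
    if choice == 0:       # download both sides, join on the device
        rs, ss = R.window(w), S.window(grow(w, eps))
        out.update((r, s) for r in rs for s in ss if near(r, s, eps))
    elif choice == 1:     # probe S once per object of R
        for r in R.window(w):
            box = (r[0] - eps, r[1] - eps, r[0] + eps, r[1] + eps)
            out.update((r, s) for s in S.window(box) if near(r, s, eps))
    elif choice == 2:     # probe R once per object of S
        for s in S.window(grow(w, eps)):
            box = (s[0] - eps, s[1] - eps, s[0] + eps, s[1] + eps)
            out.update((r, s) for r in R.window(box)
                       if in_win(r, w) and near(r, s, eps))
    else:                 # repartition with a k x k grid and recurse
        xs = [w[0] + i * (w[2] - w[0]) / k for i in range(k)] + [w[2]]
        ys = [w[1] + j * (w[3] - w[1]) / k for j in range(k)] + [w[3]]
        for i in range(k):
            for j in range(k):
                cell = (xs[i], ys[j], xs[i + 1], ys[j + 1])
                mobijoin(R, S, cell, R.count(cell), S.count(grow(cell, eps)),
                         eps, out, buffer, k, depth + 1)


R = Server([(10, 10), (12, 11), (900, 900)])
S = Server([(11, 10), (500, 500), (901, 899)])
world = (0, 0, 1024, 1024)
pairs: Set[Tuple[Point, Point]] = set()
mobijoin(R, S, world, R.count(world), S.count(grow(world, 2)), 2, pairs)
print(sorted(pairs))
```

Results are collected in a set because cells are closed, so an object lying exactly on a grid line is seen by two cells. A more faithful cost model would replace the object counts with byte estimates and derive c4 from the retrieved statistics.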

  13. Iceberg Spatial Semi-Join
      SELECT H.id
      FROM Hotels H, Restaurants R
      WHERE dist(H.location, R.location) ≤ ε
      GROUP BY H.id
      HAVING COUNT(*) ≥ m
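
The semantics of the query can be sketched with a brute-force evaluation over in-memory data. The function name, the dictionary layout for hotels, and the sample coordinates are all hypothetical; the sketch shows only what the query returns, not how it would be evaluated across remote servers.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def iceberg_semijoin(hotels: Dict[int, Point], restaurants: List[Point],
                     eps: float, m: int) -> List[int]:
    """Return ids of hotels with at least m restaurants within distance eps."""
    result = []
    for hid, (hx, hy) in hotels.items():
        # COUNT(*) for this hotel's group: restaurants within eps of it.
        n = sum(1 for rx, ry in restaurants
                if (hx - rx) ** 2 + (hy - ry) ** 2 <= eps * eps)
        if n >= m:                      # HAVING COUNT(*) >= m
            result.append(hid)
    return result


hotels = {1: (0, 0), 2: (10, 10), 3: (50, 50)}
restaurants = [(1, 0), (0, 1), (10, 11), (80, 80)]
print(iceberg_semijoin(hotels, restaurants, eps=2, m=2))   # prints [1]
```

One observation that follows directly from the HAVING clause: a window whose ε-extended area contains fewer than m restaurants cannot contribute any hotel to the answer, so a single COUNT query is enough to discard it.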

  14. Experimental setup • Implementation: server on Unix; client on an HP iPAQ PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC) • Datasets: synthetic (1K – 10K points, varying skew); real (roads and railways of Germany) • Algorithms compared: NLSJ (only nested loop spatial join); HBSJ (only hash-based spatial join)

  15. Varying the distance threshold ε • PDA buffer = 5%

  16. Varying the data skew • Uniform data => MobiJoin reduces to HBSJ

  17. Varying the PDA’s buffer size • Panels: Packets, Bytes • Large buffer => HBSJ fails to prune the empty areas

  18. Iceberg queries • Panels: Uniform data, Skewed data • Real dataset (35K) joins a synthetic dataset (1K)

  19. Conclusions • Distributed spatial joins on mobile devices: no mediator, non-collaborative servers, limited set of supported operators • MobiJoin: dynamically optimizes the entire process of statistics retrieval and query execution for a single ad-hoc query • Future work: support multi-way spatial joins; improve the accuracy of the cost model
