
Query Processing of Massive Trajectory Data based on MapReduce

Query Processing of Massive Trajectory Data based on MapReduce. Qiang Ma, Bin Yang (Fudan University), Weining Qian, Aoying Zhou (ECNU). Presented By: Xin Cao (Aalborg University). Outline: Introduction, Preliminary, Trajectory Processing, Execution Overview, Storage, Indexing Methods

Presentation Transcript


  1. Query Processing of Massive Trajectory Data based on MapReduce Qiang Ma, Bin Yang (Fudan University) Weining Qian, Aoying Zhou (ECNU) Presented By: Xin Cao (Aalborg University)

  2. Outline • Introduction • Preliminary • Trajectory Processing • Execution Overview • Storage • Indexing Methods • Query Processing • Experimental Study • Future Works

  3. Introduction • Location-based services play an important role. • Large volumes of trajectory data in diverse formats have been accumulated. • Traditional centralized technologies may not cope with such large amounts of trajectory data. • Cloud computing technologies, such as GFS and MapReduce, provide a promising paradigm for handling the explosion of trajectory data.

  4. Challenge • Huge data volume that is frequently updated and rapidly growing. • Trajectory data is “continuous”, i.e., sequentially ordered. • The data is highly skewed. • MapReduce is good at offline data analysis, but not efficient for online queries.

  5. Our Contributions • Extend the MapReduce framework to manage massive sequential data, such as trajectories of moving objects. • Study which query processing methods are appropriate for large clusters. • Provide two scalable indexing methods to support efficient query processing.

  6. Preliminary • Data Model: line segments model • A trajectory is a polyline in three-dimensional space. • Query Types • Spatio-temporal Range Query: Q(Es, Et) → {Sk} • Trajectory-based Query: Q(O, Et) → {Sk}
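
The data model and the two query types on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Es is an axis-aligned rectangle, Et is a time interval, and O is an object identifier, and it uses a bounding-box test as a conservative spatial filter. The names `Segment`, `range_query`, and `trajectory_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed form of Es
Interval = Tuple[float, float]            # (t_start, t_end), assumed form of Et


@dataclass(frozen=True)
class Segment:
    """One segment S_k of an object's polyline, from (x1, y1, t1) to (x2, y2, t2), t1 <= t2."""
    oid: str
    x1: float
    y1: float
    t1: float
    x2: float
    y2: float
    t2: float


def overlaps_time(s: Segment, et: Interval) -> bool:
    """True if the segment's time span intersects the interval Et."""
    return s.t1 <= et[1] and s.t2 >= et[0]


def overlaps_space(s: Segment, es: Rect) -> bool:
    """Bounding-box test: a conservative filter, not an exact line/rectangle intersection."""
    xmin, ymin, xmax, ymax = es
    return (min(s.x1, s.x2) <= xmax and max(s.x1, s.x2) >= xmin and
            min(s.y1, s.y2) <= ymax and max(s.y1, s.y2) >= ymin)


def range_query(segments: List[Segment], es: Rect, et: Interval) -> List[Segment]:
    """Q(Es, Et) -> {Sk}: segments overlapping the region Es during Et."""
    return [s for s in segments if overlaps_space(s, es) and overlaps_time(s, et)]


def trajectory_query(segments: List[Segment], oid: str, et: Interval) -> List[Segment]:
    """Q(O, Et) -> {Sk}: segments of object O during Et."""
    return [s for s in segments if s.oid == oid and overlaps_time(s, et)]
```

Both functions scan every segment, which is the baseline that the indexing methods later in the talk aim to avoid.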

  7. Trajectory Processing • Execution Overview

  8. Storage • Data are grouped by key and organized into data chunks in GFS-style storage. • The whole data set is divided into several parts; each part is called a partition and is stored in one data chunk. • Each trajectory is assigned to at least one partition according to its spatio-temporal information.
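
One way to picture the assignment of trajectory data to "at least one partition" is a static uniform grid over x, y, and time. The slides do not say which partitioning strategy is used, so the grid, the cell sizes, and the function name `partitions_for_segment` below are assumptions made only for illustration.

```python
import math
from itertools import product


def partitions_for_segment(seg, cell_xy=10.0, cell_t=60.0):
    """Return the grid cells (i, j, k) overlapped by a segment's spatio-temporal bounding box.

    seg is ((x1, y1, t1), (x2, y2, t2)). The cell sizes are hypothetical parameters.
    A segment that crosses a cell boundary is assigned to more than one partition.
    """
    (x1, y1, t1), (x2, y2, t2) = seg

    def cells(a, b, size):
        lo, hi = min(a, b), max(a, b)
        return range(math.floor(lo / size), math.floor(hi / size) + 1)

    return list(product(cells(x1, x2, cell_xy),
                        cells(y1, y2, cell_xy),
                        cells(t1, t2, cell_t)))
```

A segment that stays inside one cell maps to a single partition, while a segment crossing a boundary is replicated into each cell it touches.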

  9. Storage • A good spatio-temporal partitioning keeps the amount of data per chunk fairly uniform. • Static partitioning strategies are easy to control and well suited to distributed scheduling, but may lead to load imbalance. • Dynamic strategies can resolve load imbalance, but re-splitting data can cause long-distance migration of large volumes of data within the cluster. • Appropriate strategies should be trained.
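
How uniform a partitioning is can be summarized by the standard deviation of the number of segments per partition. The sketch below is an assumption about how such a measure could be computed, not the metric definition used in the experiments; `partition_load_stddev` is a hypothetical name.

```python
from collections import Counter
from statistics import pstdev


def partition_load_stddev(assignments):
    """Population standard deviation of segment counts per partition.

    assignments is an iterable of (partition_id, segment) pairs. Partitions that
    received no segment do not appear in the input and are therefore not counted.
    """
    counts = Counter(pid for pid, _ in assignments)
    return pstdev(list(counts.values()))
```

A value of zero means every non-empty partition holds the same number of segments; larger values indicate skew.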

  10. PMI (Partition based Multilevel Index) • Aims to speed up spatio-temporal range queries. • Generates all candidate partitions by invoking the space partitioning strategy. • Segments are stored together as key/value pairs: <PartitionID, Sk>. • Each data chunk contains only trajectory segments that belong to the same partition. • A multilevel index can be built locally on each node, using traditional centralized methods.
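
The PMI construction can be imitated in plain Python as a map step that emits <PartitionID, Sk> pairs, a grouping step that stands in for the MapReduce shuffle, and a per-partition local index. Everything here is a simplified assumption: the partitioning is a uniform x/y grid with a hypothetical cell width `CELL`, and the "local index" is just a list sorted by start time rather than a real multilevel index.

```python
import math
from collections import defaultdict
from itertools import product

CELL = 10.0  # hypothetical grid cell width


def candidate_partitions(seg):
    """Grid cells (used as PartitionIDs) overlapped by the segment's x/y bounding box."""
    (x1, y1, _), (x2, y2, _) = seg
    xs = range(math.floor(min(x1, x2) / CELL), math.floor(max(x1, x2) / CELL) + 1)
    ys = range(math.floor(min(y1, y2) / CELL), math.floor(max(y1, y2) / CELL) + 1)
    return list(product(xs, ys))


def build_pmi(segments):
    """Group segments by PartitionID, then sort each chunk by segment start time."""
    chunks = defaultdict(list)
    for seg in segments:                       # map: emit <PartitionID, Sk>
        for pid in candidate_partitions(seg):
            chunks[pid].append(seg)            # shuffle: group by key
    # Local "index": a time-sorted list, standing in for a centralized index structure.
    return {pid: sorted(segs, key=lambda s: s[0][2]) for pid, segs in chunks.items()}
```

Each value in the returned dictionary plays the role of one data chunk, so a range query only needs to read the chunks whose cells overlap the query region.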

  11. OII (Object Inverted Index) • Aims to speed up trajectory-based queries. • Collects all historical trajectories of each object. • Entries are stored together as key/value pairs: <OID, {PartitionID, T}>. • Entries are accessed by key (the object identifier).
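
The OII can be pictured as an inverted index from object identifier to a posting list of (PartitionID, T) entries. The slide does not define T; the sketch below assumes it is the time interval the object spends in that partition, and `build_oii` is a hypothetical name.

```python
from collections import defaultdict


def build_oii(records):
    """Build <OID, {PartitionID, T}> from (oid, partition_id, (t_start, t_end)) records.

    Posting lists are sorted by start time so that an object's history reads in order.
    """
    oii = defaultdict(list)
    for oid, pid, interval in records:
        oii[oid].append((pid, interval))
    for postings in oii.values():
        postings.sort(key=lambda p: p[1][0])
    return dict(oii)
```

Looking up an object is then a single key access, which returns the partitions that need to be read for that object.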

  12. Data Insertion

  13. Query Processing • Trajectory-based Queries • Given any object ID, the system can locate the object's trajectory according to the OII. • Range Queries
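
A trajectory-based query can be read as a two-step lookup: consult the OII by object ID to find the relevant partitions, then read only those chunks. The sketch below assumes an OII of the form {OID: [(PartitionID, (t_start, t_end)), ...]} and chunks stored as {PartitionID: [segment, ...]}; these layouts, the dictionary-based segments, and the sample data are illustrative assumptions.

```python
def trajectory_query(oii, chunks, oid, et):
    """Return the segments of object `oid` that overlap the time interval `et`."""
    t_lo, t_hi = et
    # Step 1: the OII lookup by key yields the partitions holding this object within Et.
    pids = {pid for pid, (a, b) in oii.get(oid, []) if a <= t_hi and b >= t_lo}
    # Step 2: read only those chunks and keep the object's segments that overlap Et.
    hits = [
        seg
        for pid in sorted(pids)
        for seg in chunks.get(pid, [])
        if seg["oid"] == oid and seg["t1"] <= t_hi and seg["t2"] >= t_lo
    ]
    return sorted(hits, key=lambda s: s["t1"])


# Hypothetical sample data: two partitions, two objects.
oii = {"car7": [("p1", (0, 10)), ("p2", (10, 20))]}
chunks = {
    "p1": [{"oid": "car7", "t1": 0, "t2": 10}, {"oid": "bus3", "t1": 5, "t2": 6}],
    "p2": [{"oid": "car7", "t1": 10, "t2": 20}],
}
result = trajectory_query(oii, chunks, "car7", (12, 15))
```

With this data, the query for `car7` during (12, 15) touches only partition `p2`, so the chunk for `p1` is never read.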

  14. Experimental Study • Settings • Hadoop version 0.19.0 • 8 PC nodes • Ubuntu Linux version 8.04 • Pentium IV 1.7 GHz CPU • 512 MB memory • Java SDK 1.42 • Experimental data: Network-based Generator

  15. Experiments – Load Balance • Standard Deviation of Partitioning • Load Balance of PRADASE

  16. Experiments – Data Importing and Index Creation • Data Importing with PMI • Data Importing with OII

  17. Experiments – Query Processing • Spatio-temporal Range Query Processing with PMI • Trajectory-based Query Processing with OII

  18. Future Works • More heuristic partitioning methods. • Reducing data migration between nodes. • Efficient real-time query processing on Cloud infrastructure.

  19. Thanks!
