
Efficient OLAP Operations in Spatial Data Warehouses


  1. Efficient OLAP Operations in Spatial Data Warehouses
     Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao
     Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

  2. Motivating Scenario
     • The spatial dimension at the finest granularity consists of a set of regions (e.g., road segments in traffic supervision systems, areas covered by cells in mobile communication systems).
     • The raw data provide the set of objects that fall in each region at every timestamp (e.g., cars in a road segment, users serviced by a cell).
     • Queries ask for aggregate data over regions that satisfy some spatio-temporal condition (e.g., find the current traffic in all areas within a 1 km range around each hospital).
     • Unlike in traditional OLAP, there are no predefined hierarchies.

  3. The aggregate R-tree (aR-tree)
     • An R-tree with aggregate data for every entry.
     • The same idea can be applied to other access methods (e.g., quadtrees).
     • Other aggregate functions may be used (e.g., avg, max).
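To make the entry structure concrete, here is a minimal Python sketch, assuming axis-aligned rectangles stored as (x1, y1, x2, y2) tuples and a count aggregate; the class and function names are hypothetical. The entry that points to a node gets an MBR enclosing all child MBRs and an aggregate equal to the sum of the children's aggregates.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed MBR representation: (x1, y1, x2, y2).
Rect = Tuple[float, float, float, float]


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Entry:
    mbr: Rect
    aggr: int                       # e.g. number of cars below this entry
    child: Optional["Node"] = None  # None for leaf entries (individual regions)


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def make_parent_entry(node: Node) -> Entry:
    """Build the entry pointing to `node`: its MBR encloses every child MBR
    and its aggregate is the sum of the children's aggregates."""
    mbr = node.entries[0].mbr
    for e in node.entries[1:]:
        mbr = union(mbr, e.mbr)
    return Entry(mbr=mbr, aggr=sum(e.aggr for e in node.entries), child=node)
```

For a max aggregate, `sum` would be replaced by `max`; avg could be supported by storing both a sum and a count per entry.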

  4. Why keep spatiotemporal aggregate information
     • For efficient query processing (e.g., the number of objects inside an area can be found by a window query instead of a spatial join).
     • Aggregate information is all that some applications need or know (e.g., traffic systems record the number of cars in an area, not their IDs).
     • Storing historical information about individual objects may raise privacy issues (keeping all past locations of mobile phone users may be illegal).
     • Although the actual data may be highly volatile and have extreme space requirements, the summarized data are less voluminous and may remain fairly constant for long intervals.

  5. aR-trees and OLAP operations
     The aR-tree corresponds to a lattice. There may be multiple dimensions.

  6. Query Processing: Single Window
     "Find the total number of cars on all road segments inside a query window."
     Start from the root of the aR-tree. For each entry, one of the following three conditions holds:
     • The entry is disjoint from the query window; thus, the corresponding node cannot contain any cars contributing to the answer and is not retrieved.
     • The entry is inside the query window, in which case the required aggregate value is already stored with the entry and the corresponding node does not need to be accessed.
     • The entry partially overlaps the query window, in which case the corresponding node must be followed recursively.
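The three-case traversal can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: rectangles are (x1, y1, x2, y2) tuples, the aggregate is a count, and a leaf entry (a road segment) that only partially overlaps the window is assumed not to contribute, since the query asks for segments inside the window. The `stats` counter is a hypothetical addition that records how many nodes are read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Entry:
    mbr: Rect
    aggr: int
    child: Optional["Node"] = None  # None for leaf entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, w: Rect) -> bool:
    return a[2] < w[0] or w[2] < a[0] or a[3] < w[1] or w[3] < a[1]


def inside(a: Rect, w: Rect) -> bool:
    """True if rectangle a lies completely inside window w."""
    return w[0] <= a[0] and w[1] <= a[1] and a[2] <= w[2] and a[3] <= w[3]


def window_aggregate(node: Node, w: Rect, stats: dict) -> int:
    stats["nodes"] += 1  # one node access
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, w):
            continue                       # case 1: cannot contribute, skip
        if inside(e.mbr, w):
            total += e.aggr                # case 2: stored aggregate, no visit
        elif e.child is not None:
            total += window_aggregate(e.child, w, stats)  # case 3: recurse
        # Partially overlapping leaf entries are assumed not to count.
    return total
```

With a window that encloses a whole subtree, the answer is read from the root entry alone, so only one node is accessed.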

  7. Query Processing: Multiple Windows
     "Find the total number of cars on road segments inside each city suburb."
     • Without aR-trees, the query can be processed as a multiway spatial join (suburbs, cars, road segments).
     • With aR-trees, it is processed as a pairwise join between the suburbs and the aR-tree.
     • If the query windows (i.e., suburbs) fit in memory, we propose an extension of the single-window technique that considers all windows in parallel.
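The parallel extension for in-memory windows could look roughly like the sketch below; the exact algorithm in the paper may differ. The idea assumed here is that each node is visited at most once per traversal, carrying the subset of windows that partially overlap its parent entry, instead of running one single-window query per suburb. The function name `multi_window_aggregate` and the tuple-based rectangles are hypothetical, and partially overlapping leaf entries are again assumed not to count.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Entry:
    mbr: Rect
    aggr: int
    child: Optional["Node"] = None  # None for leaf entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, w: Rect) -> bool:
    return a[2] < w[0] or w[2] < a[0] or a[3] < w[1] or w[3] < a[1]


def inside(a: Rect, w: Rect) -> bool:
    return w[0] <= a[0] and w[1] <= a[1] and a[2] <= w[2] and a[3] <= w[3]


def multi_window_aggregate(node: Node, windows: List[Rect], active: List[int],
                           results: List[int], stats: dict) -> None:
    """Accumulate into results[i] the aggregate for windows[i], for every
    index i in `active` (windows partially overlapping the parent entry)."""
    stats["nodes"] += 1
    for e in node.entries:
        descend = []  # windows that need the child node of this entry
        for i in active:
            w = windows[i]
            if disjoint(e.mbr, w):
                continue
            if inside(e.mbr, w):
                results[i] += e.aggr       # answered from the entry itself
            elif e.child is not None:
                descend.append(i)
        if descend:  # a single visit serves all partially overlapping windows
            multi_window_aggregate(e.child, windows, descend, results, stats)


def all_windows(root: Node, windows: List[Rect], stats: dict) -> List[int]:
    results = [0] * len(windows)
    multi_window_aggregate(root, windows, list(range(len(windows))), results, stats)
    return results
```

In this sketch, two windows that both partially overlap the same subtree share one access to its node, whereas processing them as separate single-window queries would read that node twice.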

  8. Experimental Settings
     • TIGER dataset (130,000 road segments).
     • We randomly selected 5,000 seed points located on roads. For each seed point, we generated a cluster of 250 points (i.e., car positions) with a Gaussian distribution; therefore, the total number of cars was 1.25M.
     • The distribution of the queries follows the distribution of the roads.
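The cluster generation described above can be sketched with Python's standard `random` module. The slides do not give the standard deviation or how seeds were placed on roads, so `sigma` is a hypothetical parameter and the seeds are simply passed in as a list of points.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def generate_cars(seeds: List[Point], per_cluster: int, sigma: float,
                  rng: random.Random) -> List[Point]:
    """Generate one Gaussian cluster of `per_cluster` car positions around
    each seed point (seeds are assumed to be pre-selected road locations)."""
    cars = []
    for sx, sy in seeds:
        for _ in range(per_cluster):
            cars.append((rng.gauss(sx, sigma), rng.gauss(sy, sigma)))
    return cars
```

With the reported settings, 5,000 seeds × 250 points per cluster = 1,250,000 cars, matching the 1.25M figure; a scaled-down run with 5 seeds yields 1,250 points.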

  9. Evaluation for Single-Window Queries
     • Raw data approach: join the cars and streets datasets.
     • Fact table approach: an R-tree indexes the fact table (i.e., similar to aR-trees, but with no aggregate information in the intermediate nodes).
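The difference between the fact-table approach and the aR-tree can be illustrated by counting node accesses. In this hypothetical sketch, both traversals run on the same tree shape: the fact-table version ignores intermediate aggregates and must descend to the leaves even for entries fully inside the window, while the aR-tree version stops at such entries. Function names, the tuple-based rectangles, and the rule that partially overlapping leaf entries do not count are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Entry:
    mbr: Rect
    aggr: int = 0                   # measure (leaves) or stored aggregate
    child: Optional["Node"] = None  # None for leaf entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, w: Rect) -> bool:
    return a[2] < w[0] or w[2] < a[0] or a[3] < w[1] or w[3] < a[1]


def inside(a: Rect, w: Rect) -> bool:
    return w[0] <= a[0] and w[1] <= a[1] and a[2] <= w[2] and a[3] <= w[3]


def fact_table_count(node: Node, w: Rect, stats: dict) -> int:
    """R-tree on the fact table: only leaf entries carry measures."""
    stats["nodes"] += 1
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, w):
            continue
        if e.child is None:
            if inside(e.mbr, w):
                total += e.aggr
        else:
            total += fact_table_count(e.child, w, stats)  # must always descend
    return total


def ar_tree_count(node: Node, w: Rect, stats: dict) -> int:
    """aR-tree: entries fully inside w are answered from their aggregate."""
    stats["nodes"] += 1
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, w):
            continue
        if inside(e.mbr, w):
            total += e.aggr
        elif e.child is not None:
            total += ar_tree_count(e.child, w, stats)
    return total
```

For a window covering the whole tree, both functions return the same total, but the fact-table traversal reads every node while the aR-tree reads only the root.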

  10. Evaluation for Multiple-Window Queries
     • aR-tree (single queries): a set of single-window queries processed using the single_aggregation algorithm of aR-trees.
     • Fact table (join): a join between the R-tree index of the fact table and the query windows, which fit in memory.
     • Fact table (single): indexed nested loops using the R-tree index of the fact table.

  11. Applications to spatio-temporal data
     Query: "Find the total number of objects in the regions intersecting some window qs during a time interval qt."

  12. The aggregate 3DR-tree (a3DR-tree)
     • Each entry has the form <r.MBR, r.pointer, r.lifespan, r.aggr[]>; that is, for each region it keeps the aggregate value and the time interval during which this value is valid.
     • Whenever the aggregate information about a region changes, a new entry is created.
     • Advantage: the a3DR-tree integrates the spatial and temporal dimensions in the same structure (and is, therefore, expected to be more efficient than column scanning for queries that involve both conditions).
     • Disadvantage: it wastes space by storing the MBR each time there is an aggregate change.
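The entry format <r.MBR, r.pointer, r.lifespan, r.aggr[]> and the query of slide 11 can be illustrated with a minimal Python sketch. It rests on several assumptions that the slides do not state: lifespans are inclusive integer intervals, `region_id` is a hypothetical stand-in for r.pointer, a single count replaces the aggr[] array, and the answer sums the per-timestamp counts over qt. It also scans a flat list of entries rather than traversing a 3D R-tree; an actual a3DR-tree would index each (MBR, lifespan) pair as a 3D box.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Interval = Tuple[int, int]                # inclusive [start, end] timestamps


@dataclass
class A3DEntry:
    mbr: Rect           # r.MBR: repeated in every version of the region
    region_id: int      # hypothetical stand-in for r.pointer
    lifespan: Interval  # r.lifespan: when this aggregate value is valid
    aggr: int           # r.aggr: a single count instead of an array


def build_entries(mbr: Rect, region_id: int,
                  changes: List[Tuple[int, int]], now: int) -> List[A3DEntry]:
    """Create one entry per aggregate change. `changes` holds
    (timestamp, new_count) pairs sorted by time, each differing from the
    previous count; the last value stays valid until `now`."""
    entries = []
    for i, (t, count) in enumerate(changes):
        end = changes[i + 1][0] - 1 if i + 1 < len(changes) else now
        entries.append(A3DEntry(mbr, region_id, (t, end), count))
    return entries


def intersects(a: Rect, b: Rect) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def st_aggregate(entries: List[A3DEntry], qs: Rect, qt: Interval) -> int:
    """Sum of per-timestamp counts over regions intersecting qs during qt
    (a linear scan standing in for an a3DR-tree traversal)."""
    total = 0
    for e in entries:
        if not intersects(e.mbr, qs):
            continue
        # Number of timestamps shared by the entry's lifespan and qt.
        overlap = min(e.lifespan[1], qt[1]) - max(e.lifespan[0], qt[0]) + 1
        if overlap > 0:
            total += e.aggr * overlap
    return total
```

For a region whose count is 5 at timestamps 1 and 2 and 2 from timestamp 3 onward, a query with qt = [1, 3] returns 5 + 5 + 2 = 12 under these assumptions. Each change also produces a new entry carrying a copy of the same MBR, which is the space overhead listed as the disadvantage above.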

  13. The aggregate RB-tree

  14. Query Example
     Find all objects in some region overlapping the query window qs during the time interval [1, 3].

  15. The aggregate 3DRB-tree

  16. Conclusions and directions for future work
     • Spatio-temporal OLAP is a very promising direction of work.
     • Incorporation of multi-version structures for dynamic dimensions.
     • Formalization and analysis of when aggregation multi-trees are preferable.
