QED : An Efficient Framework for Temporal Region Query Processing

Yi-Hong Chu (朱怡虹) • Network Database Laboratory, Dept. of Electrical Engineering, National Taiwan University

Introduction • Dense Region Query • Data records are viewed as points in the d-dimensional data space spanned by their d attributes. • Goal: locate the regions whose density is higher than that of their surroundings. [Figure: a dense region in the Age vs. Salary (×1000) plane]

Grid-based Approach • The data space is divided into non-overlapping rectangular grid cells. • Density of a cell: the percentage of all data points that fall in the cell. • A dense region is a maximal set of connected dense cells. [Figure: dense cells and the resulting dense region in the Age (0 to 100) vs. Salary (×1000) plane]
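The grid-based definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the fixed cell `width`, the threshold name `rho`, and the choice of face adjacency as the meaning of "connected" are all assumptions made for the example.

```python
from collections import Counter, deque

def cell_of(point, width):
    """Map a point to the index tuple of the grid cell containing it."""
    return tuple(int(x // width) for x in point)

def dense_cells(points, width, rho):
    """Return the cells whose share of all points is at least rho."""
    counts = Counter(cell_of(p, width) for p in points)
    total = len(points)
    return {c for c, n in counts.items() if n / total >= rho}

def dense_regions(cells):
    """Group dense cells into maximal sets connected through shared faces
    (face adjacency is an assumption of this sketch)."""
    remaining, regions = set(cells), []
    while remaining:
        start = remaining.pop()
        region, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        region.add(nb)
                        queue.append(nb)
        regions.append(region)
    return regions
```

For instance, with (Age, Salary) points and `width=10`, two neighbouring cells that each hold at least a fraction `rho` of the points end up in the same dense region.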

Motivation • Previous work tends to ignore the time dimension of the data and executes queries over the entire database. • However, different dense regions may be discovered when different time periods are considered (the density of a cell is the percentage of data points it contains, so it changes with the set of records taken into account). • Discovering dense regions over different time intervals is crucial for users to find the interesting patterns hidden in the data.

Example • Some dense regions exist only in certain time intervals and go undiscovered when all data records are taken into account. • Middle-aged people: [Figure: <A> the number of customers in each time slot; <B> the number of middle-aged people in each time slot]

Temporal Dense Region Query • Dense region discovery restricted to user-specified time intervals, e.g., each Sunday in May. • Time slots: • Derived by segmenting the data points at a chosen time granularity, e.g., hour, week, or month. • Let users specify a variety of time periods of interest. • Problem Definition: • Given a set of time slots and a density threshold ρ, find the dense regions in the queried time slots.
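As a rough illustration of the problem definition, the sketch below buckets records into time slots and evaluates the density threshold only over the queried slots. It is a brute-force baseline under stated assumptions, not the QED method: the day-level granularity, the `(timestamp, point)` record layout, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

def slot_of(ts):
    """Time slot key at day granularity (an assumed choice of granularity)."""
    return ts.date()

def build_slot_counts(records, width):
    """Per-slot cell counts; each record is a (timestamp, point) pair."""
    slots = defaultdict(Counter)
    for ts, point in records:
        cell = tuple(int(x // width) for x in point)
        slots[slot_of(ts)][cell] += 1
    return slots

def temporal_dense_cells(slots, queried, rho):
    """Dense cells computed only from the records in the queried slots."""
    merged = Counter()
    for s in queried:
        merged.update(slots.get(s, Counter()))
    total = sum(merged.values())
    return {c for c, n in merged.items() if total and n / total >= rho}
```

Querying a single slot can return a cell that is not dense over all slots together, which is the effect described in the motivation.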

QED Framework • Challenge • The queried time intervals are unknown in advance. • QED (Querying tEmporal Dense region) • Offline Maintaining Phase • Construct a summary data structure, the RF-tree, for each time slot. • Online Clustering Phase • Answer various user queries based on the RF-trees.

QED Framework [Figure: a temporal dense region query is answered in the online query processing phase by combining W1, W2, and W3 from the offline maintaining phase into the query result]

Offline Maintaining Phase: Construct the RF-trees • Basic Idea: • A group of cells with nearly the same density value can be summarized by their average density value. • Uniform Region • A region whose cells all have nearly the same density value.

Uniform Region • Entropy-based approach • Entropy of a region • Maximum entropy of a region • Uniform region

Example (Uniform Region) • Case 1: [Figure: Region A] • Case 2: [Figure: Region A]
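The formulas for the entropy-based test did not survive in these slides, so the following is only a sketch of one plausible reading: Shannon entropy of the point distribution over a region's cells, compared against its maximum value log2(n) for n cells. The `tolerance` parameter and both function names are assumptions, not the authors' definitions.

```python
import math

def region_entropy(counts):
    """Shannon entropy of the point distribution over a region's cells."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def is_uniform(counts, tolerance=0.05):
    """Assumed test: entropy within `tolerance` of its maximum log2(n).
    A single cell or an empty region is treated as trivially uniform."""
    if len(counts) < 2 or sum(counts) == 0:
        return True
    max_entropy = math.log2(len(counts))
    return region_entropy(counts) >= (1 - tolerance) * max_entropy
```

Under this reading, a region whose four cells hold 10 points each reaches the maximum entropy of 2 bits and counts as uniform, while a region holding 37, 1, 1, and 1 points does not.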

Construct the RF-tree • Recursively partition the data space until uniform regions are found. • Each leaf node is one of two kinds: • A single cell • A uniform region • RF (Region Feature):
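The recursive construction can be sketched as follows. The contents of the RF and the partitioning rule are not recoverable from the slides, so this example assumes a hypothetical RF of (number of cells, number of points), halves the longest axis at each step, and uses an assumed entropy-based uniformity test with a 5% tolerance.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple            # inclusive lower cell index per dimension
    hi: tuple            # exclusive upper cell index per dimension
    n_cells: int         # hypothetical RF field: number of cells covered
    n_points: int        # hypothetical RF field: points inside the region
    children: list = field(default_factory=list)

def cells_in(lo, hi):
    """Enumerate the cell index tuples inside the box [lo, hi)."""
    if not lo:
        yield ()
        return
    for i in range(lo[0], hi[0]):
        for rest in cells_in(lo[1:], hi[1:]):
            yield (i,) + rest

def is_uniform(counts, tolerance=0.05):
    """Assumed entropy test: entropy within tolerance of log2(n)."""
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return True
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h >= (1 - tolerance) * math.log2(len(counts))

def build(counts, lo, hi):
    """Split [lo, hi) until every leaf is one cell or a uniform region."""
    cell_counts = [counts.get(c, 0) for c in cells_in(lo, hi)]
    node = Node(lo, hi, len(cell_counts), sum(cell_counts))
    if is_uniform(cell_counts):
        return node
    # Assumed split rule: halve the longest axis.
    axis = max(range(len(lo)), key=lambda a: hi[a] - lo[a])
    mid = (lo[axis] + hi[axis]) // 2
    left_hi = hi[:axis] + (mid,) + hi[axis + 1:]
    right_lo = lo[:axis] + (mid,) + lo[axis + 1:]
    node.children = [build(counts, lo, left_hi), build(counts, right_lo, hi)]
    return node
```

Here `counts` maps cell index tuples to point counts for one time slot. Because the split depends only on the region bounds, trees built for different slots partition space along the same boundaries.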

Online Query Processing Phase • Step1: Combine the RF-trees of the queried time slots. • Step2: Execute the query on the combined RF-tree.

Step1: Combine the RF-trees • Three cases for combining the corresponding regions in two RF-trees. • Case 1 : Both are uniform regions • Case 2 : Both are non-uniform regions • Case 3 : Only one is a uniform region
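The slides name the three cases but not what happens in each, so the sketch below fills them in with assumptions: uniform leaves are merged by adding their point counts, non-uniform regions are merged child by child, and a uniform region is first spread evenly over the finer partition of its non-uniform counterpart. It also assumes both trees were built over the same grid with the same split rule, and the `Node` fields are a hypothetical RF.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    n_cells: int                 # hypothetical RF field: cells covered
    n_points: float              # hypothetical RF field: points in region
    children: list = field(default_factory=list)   # empty for a leaf

def spread(leaf, shape):
    """Re-express a uniform leaf on the finer partition used by `shape`,
    giving every cell the same share of the leaf's points."""
    if not shape.children:
        return Node(leaf.n_cells, leaf.n_points)
    per_cell = leaf.n_points / leaf.n_cells
    kids = [spread(Node(k.n_cells, per_cell * k.n_cells), k)
            for k in shape.children]
    return Node(leaf.n_cells, leaf.n_points, kids)

def combine(a, b):
    """Merge two RF-trees built over the same grid with the same split rule."""
    if not a.children and not b.children:        # case 1: both uniform
        return Node(a.n_cells, a.n_points + b.n_points)
    if a.children and b.children:                # case 2: both non-uniform
        kids = [combine(x, y) for x, y in zip(a.children, b.children)]
        return Node(a.n_cells, a.n_points + b.n_points, kids)
    if not a.children:                           # case 3: only one uniform
        a = spread(a, b)
    else:
        b = spread(b, a)
    return combine(a, b)
```

Storing point counts rather than percentages keeps the merge a plain addition; densities are derived at query time from the combined totals.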

Step2: Execute the query • All leaf nodes in the combined RF-tree are examined to discover the dense cells in the data space. • Each leaf node is one of two kinds: • A single cell • A uniform region: compare its average density with the density threshold ρ. • The leaf nodes containing dense cells are put into a queue for further dense region discovery.
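A minimal sketch of the leaf examination, assuming a hypothetical RF of (number of cells, number of points) per node and taking the average density of a leaf as its points per cell divided by the total number of points in the combined tree. The subsequent grouping of queued leaves into connected dense regions is not shown.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    n_cells: int                 # hypothetical RF field: cells covered
    n_points: float              # hypothetical RF field: points in region
    children: list = field(default_factory=list)   # empty for a leaf

def dense_leaves(root, rho):
    """Queue the leaves whose average per-cell density reaches rho."""
    total = root.n_points
    queue, stack = deque(), [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        elif total and node.n_points / node.n_cells / total >= rho:
            queue.append(node)
    return queue
```

A single cell is handled by the same comparison, since its average density is simply its own density.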

Conclusion • The problem of temporal dense region queries is explored in order to discover dense regions in the queried time slots. • We propose the QED framework to execute temporal dense region queries. • QED efficiently supports queries with different density thresholds and time slots by using the concept of time slots and the proposed RF-tree.

References • Yi-Hong Chu, Kun-Ta Chuang, and Ming-Syan Chen. QED: An Efficient Framework for Temporal Dense Region Processing. In Proc. of PAKDD, 2005. • W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. of VLDB, 1997. • D.-S. Cho, B.-H. Hong, and J. Max. Efficient Region Query Processing by Optimal Page Ordering. In Proc. of ADBIS-DASFAA, 2000.

Thank You~ Q & A
