Online Interval Skyline Queries on Time Series

Bin Jiang, Jian Pei


Outline

  • Problem Definition

  • An On-the-fly Method

    • Interval Skyline Query Answering Algorithm

    • Online Interval Skyline Query Algorithm

      • Radix Priority Search Tree

  • A View-Materialization Method

    • Non-redundant skyline time series---NRSky[i:j]

  • Experiments


Problem Definition

  • Notions

    • Time Series: A time series s consists of a set of (value, timestamp) pairs. We denote the value of s at timestamp i by s[i], and write s as the sequence of values s[1], s[2], …

    • Time Interval: a range in time, denoted as [i : j]. We write

      k ∈ [i : j] if i ≤ k ≤ j; [i′ : j′] ⊆ [i : j] if i ≤ i′ ≤ j′ ≤ j.

Some Notions in This Paper


Problem Definition

  • Interval Skyline

    • Given a set S of time series and an interval [i:j], the interval skyline, denoted by Sky[i:j], is the set of time series that are not dominated by any other time series in [i:j].

Suppose S = {S1, S2, S3}.

S1 and S2 are in Sky[16:22], while S3 is dominated by S2.

[Figure: plot of the time series S1, S2, and S3]
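The definition above can be made concrete with a brute-force check. The following is a minimal sketch, assuming that larger values are preferred and that p dominates q in [i:j] when p is at least as large as q at every timestamp of the interval and strictly larger at one or more of them (the transcript does not spell the dominance relation out). The function names and the sample values are hypothetical.

```python
def dominates(p, q, i, j):
    """True if p dominates q on timestamps i..j (1-based, inclusive).

    Assumption: larger values are better.
    """
    ks = range(i - 1, j)  # timestamps i..j map to list indices i-1..j-1
    return all(p[k] >= q[k] for k in ks) and any(p[k] > q[k] for k in ks)


def interval_skyline(series, i, j):
    """Names of the series that no other series dominates in [i:j]."""
    return {
        name
        for name, s in series.items()
        if not any(
            dominates(o, s, i, j)
            for other, o in series.items()
            if other != name
        )
    }


# Hypothetical data: s2 dominates s3, while s1 and s2 are incomparable.
S = {"s1": [1, 5, 2], "s2": [4, 3, 3], "s3": [3, 2, 3]}
print(interval_skyline(S, 1, 3))
```

This compares every pair of series at every timestamp of the interval, which is the cost the later slides try to avoid.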


Problem Definition

  • Interval Skyline

    Property 1: If there exist timestamps k1, …, kl (i ≤ k1 < … < kl ≤ j) such that [formula missing] and s is the only such time series, then s is in Sky[i:j].


Problem Definition

  • Problem Definition

    • Given a set of time series S such that each time series is in the base interval W, we want to maintain a data structure D such that any interval skyline query in an interval [i:j] ⊆ W can be answered efficiently using D.

  • Methods

    • An On-The-Fly Method

      • Original Interval Skyline Query Algorithm

      • Online Interval Skyline Query Algorithm

    • A View-Materialization Method


An Interval Skyline Query Algorithm

  • Idea

    Using the maximum value and minimum value of the time series, we can determine the domination of some time series without checking the details.


An Interval Skyline Query Algorithm

  • Algorithm

  • Initialize the current skyline set Sky to empty;

  • Sort the time series into a list L in descending order of their maximum values;

  • Set maxmin to the maximum of the minimum values of the time series in Sky;

  • For each time series s in L that satisfies s.max ≥ maxmin, determine whether it dominates or is dominated by time series in Sky; if it is not dominated:

  • add it to Sky;

  • delete the time series it dominates from Sky;

  • update maxmin;

  • Return Sky;
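The steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes larger values are preferred, takes each series' maximum over the whole series and its minimum over the query interval, and stops scanning as soon as a series' maximum falls below maxmin. All names and data values are hypothetical.

```python
def dominates(p, q):
    """p dominates q: never smaller, strictly larger at least once."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def skyline_maxmin(series, i, j):
    """Interval skyline in [i:j] (1-based, inclusive) with max/min pruning."""
    window = {name: s[i - 1:j] for name, s in series.items()}
    # Sort by the maximum value of each whole series, descending.
    order = sorted(series, key=lambda n: max(series[n]), reverse=True)
    sky = []
    maxmin = float("-inf")
    for name in order:
        if max(series[name]) < maxmin:
            # Some skyline member's minimum in [i:j] already exceeds every
            # value of this series, and of all series after it in the list.
            break
        s = window[name]
        if any(dominates(window[q], s) for q in sky):
            continue  # dominated by a current skyline member
        # Remove the skyline members that s dominates, then add s.
        sky = [q for q in sky if not dominates(s, window[q])]
        sky.append(name)
        maxmin = max(min(window[q]) for q in sky)
    return sky


# Hypothetical values over timestamps 1..3; the query interval is [2:3].
S = {
    "s1": [0, 4, 4],
    "s2": [0, 7, 1],
    "s3": [0, 2, 6],
    "s4": [0, 3, 3],
    "s5": [0, 5, 4],
}
print(skyline_maxmin(S, 2, 3))
```

With this data, s1 passes the maxmin test but is dominated by s5, and s4 is cut off by the maxmin test without any pairwise comparison.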


An Interval Skyline Query Algorithm

  • Example

Goal: compute the skyline in interval [2:3]

Steps:

1. s2 -> Sky, maxmin = 1

2. s3 -> Sky, maxmin = 2

3. s5 -> Sky, maxmin = 4

4. s5 dominates s1, so s1 is discarded; maxmin = 4

5. s4.min=3<4=maxmin, s4 is discarded.

Return Sky={s2,s3,s5}


An Interval Skyline Query Algorithm

  • Disadvantage

    Computing the max value of each time series and min[i:j] for the query interval [i:j] from scratch is costly.

  • Improvement Idea

  • Use a radix priority search tree to maintain min[i:j]

  • Use a sketch to keep the max value of each time series


Online Interval Skyline Query Algorithm

  • Radix Priority Search Tree

    Radix Priority Search Tree is a two-dimensional data structure, a hybrid of a heap on one dimension and a binary search tree on the other dimension.

  • Advantages:

    • Insertion in O(h)

    • Deletion in O(h)

    • Query in O(h)

  • h: the height of the tree


Online Interval Skyline Query Algorithm

  • Radix Priority Search Tree

    • Build

      • Use the timestamps as the binary tree dimension X and the data value as the heap dimension Y;

      • Map W into a fixed domain of X, {0,1,...,w-1};

      • The height of the tree is O(logw)

    • Update (W → W′)

      One insertion: the value of s at the new most recent timestamp

      One deletion: the value of s at the timestamp that leaves the window
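To illustrate how such a tree can serve min[i:j] queries, here is a minimal sketch of a radix priority search tree over the fixed domain {0, ..., w-1}. It assumes a min-heap on the value dimension, at most one point per x position, and no handling of wrap-around when timestamps are mapped into the fixed domain. The class and method names are hypothetical.

```python
import math


class Node:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.left = self.right = None


class RadixPST:
    """Min-heap on y; radix (midpoint) search tree on integer x in [0, w)."""

    def __init__(self, w):
        self.w = w
        self.root = None

    def insert(self, x, y):
        self.root = self._insert(self.root, 0, self.w, x, y)

    def _insert(self, node, lo, hi, x, y):
        if node is None:
            return Node(x, y)
        if y < node.y:  # the new point takes this heap slot
            node.x, x = x, node.x
            node.y, y = y, node.y
        mid = (lo + hi) // 2  # push the displaced point down by its x
        if x < mid:
            node.left = self._insert(node.left, lo, mid, x, y)
        else:
            node.right = self._insert(node.right, mid, hi, x, y)
        return node

    def delete(self, x):
        self.root = self._delete(self.root, 0, self.w, x)

    def _delete(self, node, lo, hi, x):
        if node is None:
            return None
        if node.x == x:
            return self._pull_up(node)
        mid = (lo + hi) // 2
        if x < mid:
            node.left = self._delete(node.left, lo, mid, x)
        else:
            node.right = self._delete(node.right, mid, hi, x)
        return node

    def _pull_up(self, node):
        # Refill the vacated slot with the smaller-y child, recursively.
        l, r = node.left, node.right
        if l is None and r is None:
            return None
        if r is None or (l is not None and l.y <= r.y):
            node.x, node.y = l.x, l.y
            node.left = self._pull_up(l)
        else:
            node.x, node.y = r.x, r.y
            node.right = self._pull_up(r)
        return node

    def min_in(self, qlo, qhi):
        """Smallest y among points with qlo <= x <= qhi (inf if none)."""
        return self._min(self.root, 0, self.w, qlo, qhi)

    def _min(self, node, lo, hi, qlo, qhi):
        if node is None or qhi < lo or qlo >= hi:
            return math.inf
        if qlo <= node.x <= qhi:
            return node.y  # heap order: nothing below is smaller
        mid = (lo + hi) // 2
        return min(self._min(node.left, lo, mid, qlo, qhi),
                   self._min(node.right, mid, hi, qlo, qhi))


tree = RadixPST(4)
for x, y in enumerate([5, 3, 8, 1]):
    tree.insert(x, y)
print(tree.min_in(0, 2))
```

Because the x range halves at every level, the depth stays within about log2(w) + 1, which is where the O(log w) bounds for insertion, deletion, and query come from.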


Maintain max values Using Sketches

  • Sketches

    • A pair (v, t) is maintained if there is no other pair (v1, t1) such that v1 > v and t1 > t;

    • These pairs form the skyline of points in the interval;

    • The expected number of points in the skyline is O(logw);

    • With the sketches, finding the maximum value in W costs O(1) time ;

W=[1,3]

Sketches : (4,1),(3,2),(2,3)

W=[1,4]

Sketches : (5,4)
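The sketch rule can be illustrated with a double-ended queue that drops every pair made obsolete by a later, larger value. This is a minimal sketch under the rule stated above; the class and method names are hypothetical.

```python
from collections import deque


class MaxSketch:
    """Keeps the pairs (v, t) for which no later pair has a larger value."""

    def __init__(self):
        self.pairs = deque()  # values are non-increasing from front to back

    def append(self, value, t):
        # A new pair removes every older pair with a strictly smaller value.
        while self.pairs and self.pairs[-1][0] < value:
            self.pairs.pop()
        self.pairs.append((value, t))

    def expire(self, oldest_t):
        # Drop pairs whose timestamp has left the window.
        while self.pairs and self.pairs[0][1] < oldest_t:
            self.pairs.popleft()

    def maximum(self):
        return self.pairs[0][0]  # the front pair holds the window maximum


sketch = MaxSketch()
for t, v in [(1, 4), (2, 3), (3, 2)]:
    sketch.append(v, t)
print(list(sketch.pairs))  # pairs kept for W = [1,3]
sketch.append(5, 4)
print(list(sketch.pairs))  # pairs kept for W = [1,4]
```

The maximum is always at the front of the queue, so reading it takes constant time, and each pair is appended and removed at most once.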


Online Interval Skyline Query Algorithm

  • Complexity

    • Space

      • Radix priority search tree O(w)

      • Sketch of the max values O(logw)

        Total: O(nw)

    • Time

      • Radix priority search tree O(logw)

      • Sketch of the max values O(logw)

        Total: O(nlogw)


A View-Materialization Method

  • Non-redundant interval skylines

    A time series s is called a non-redundant skyline time series in interval [i:j] if

    • s is in the skyline in interval [i:j]

    • s is not in the skyline in any subinterval [i′:j′] ⊂ [i:j]

      A time series has at most w non-redundant skyline intervals. This follows from the pigeonhole principle: if there were more than w such intervals, at least two of them would share the same starting timestamp, so the longer one would contain the shorter one and would not be a minimal skyline interval.
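The definition can be checked by brute force on a small example. The sketch below enumerates every interval in [1:w], computes its skyline directly, and keeps a series for an interval only if it appears in no proper subinterval's skyline. It assumes larger values are preferred; the function names and data are hypothetical, and the enumeration is meant for illustration, not efficiency.

```python
def dominates(p, q, i, j):
    ks = range(i - 1, j)  # timestamps i..j (1-based, inclusive)
    return all(p[k] >= q[k] for k in ks) and any(p[k] > q[k] for k in ks)


def skyline(series, i, j):
    return {
        n for n, s in series.items()
        if not any(dominates(o, s, i, j)
                   for m, o in series.items() if m != n)
    }


def non_redundant_skylines(series, w):
    """Map each interval (i, j) within [1:w] to its non-redundant members."""
    sky = {(i, j): skyline(series, i, j)
           for i in range(1, w + 1) for j in range(i, w + 1)}
    nr = {}
    for (i, j), members in sky.items():
        proper_subs = [(a, b)
                       for a in range(i, j + 1) for b in range(a, j + 1)
                       if (a, b) != (i, j)]
        nr[(i, j)] = {n for n in members
                      if all(n not in sky[ab] for ab in proper_subs)}
    return nr


# Hypothetical data with w = 2.
S = {"a": [3, 1], "b": [1, 3], "c": [2, 2]}
print(non_redundant_skylines(S, 2))
```

Here a and b each enter the skyline at a single timestamp, so in [1:2] only c is non-redundant: it needs both timestamps to avoid being dominated.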


Useful Theories


A View-Materialization Method

  • Idea

    Suppose all non-redundant interval skylines are materialized. To answer a query on [i:j], we can take the union of the non-redundant skylines over all subintervals of [i:j] and remove the time series that fail Lemma 2.

    • Algorithm
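A rough sketch of this query procedure follows. Lemma 2 is not reproduced in this transcript, so the sketch substitutes a direct dominance check among the candidates as a stand-in for that filter. It assumes larger values are preferred, and the dictionary layout for the materialized skylines, the function names, and the data are all hypothetical.

```python
def dominates(p, q, i, j):
    ks = range(i - 1, j)  # timestamps i..j (1-based, inclusive)
    return all(p[k] >= q[k] for k in ks) and any(p[k] > q[k] for k in ks)


def query_from_views(series, nr, i, j):
    """Answer Sky[i:j] from materialized non-redundant skylines `nr`."""
    # Union the non-redundant skylines of every subinterval of [i:j].
    candidates = set()
    for (a, b), members in nr.items():
        if i <= a and b <= j:
            candidates |= members
    # Stand-in for the Lemma 2 filter: drop candidates dominated in [i:j].
    return {
        n for n in candidates
        if not any(dominates(series[m], series[n], i, j)
                   for m in candidates if m != n)
    }


# Hypothetical data: a and b tie at timestamp 1, a is larger at timestamp 2.
S = {"a": [3, 3], "b": [3, 1]}
NR = {(1, 1): {"a", "b"}, (2, 2): {"a"}, (1, 2): set()}
print(query_from_views(S, NR, 1, 2))
```

The example shows why a filter is needed at all: b belongs to the skyline of the subinterval [1:1] because of a tie, yet a dominates it over [1:2], so the union alone would over-report.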


A View-Materialization Method

  • Example

W= [2:4]

Goal: compute the interval skyline in [3:4]

Steps:

1. s3->Sky

2. s4->Sky

3. s1->Sky(s2 is dominated by s1)

Return Sky={s1,s3,s4}

How to maintain the non-redundant skylines ?


Maintain Non-Redundant Interval Skylines

  • Steps


Maintain Non-Redundant Interval Skylines

  • Step1

    • Use the on-the-fly algorithm to obtain the interval skyline in the new interval W′.

    • Find possible false negatives.


Maintain Non-Redundant Interval Skylines

  • Step2-Shared Divide-and-Conquer Algorithm

    • This algorithm is an extension of the divide-and-conquer algorithm (DC).

    • In SDC, a space is defined as a time interval. Each timestamp represents a dimension.

    • The related spaces (intervals) are organized as a path, e.g., [j:j], [j-1:j], ..., [i:j] (i < j).
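For orientation, the plain divide-and-conquer skyline that SDC builds on can be sketched as below. This is a generic DC skyline over points, not the shared variant described in the slides: it treats each point as the tuple of a time series' values over an interval (one dimension per timestamp), assumes larger values are preferred, and uses a simple merge that filters each half's skyline against the other. Names and data are hypothetical.

```python
def dominates(p, q):
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def _dc(pts):
    if len(pts) <= 1:
        return list(pts)
    mid = len(pts) // 2
    # Divide: solve each half independently.
    left, right = _dc(pts[:mid]), _dc(pts[mid:])
    # Merge: keep a point only if the other half's skyline does not dominate it.
    return ([p for p in left if not any(dominates(q, p) for q in right)]
            + [q for q in right if not any(dominates(p, q) for p in left)])


def dc_skyline(points):
    """Skyline of a list of equal-length tuples, by divide and conquer."""
    return _dc(sorted(points))  # sort once so halves split on dimension 1


# Hypothetical two-dimensional points (two timestamps per series).
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1)]
print(sorted(dc_skyline(points)))
```

The merge is correct because any dominated point is also dominated by some skyline point of one of the two halves, so checking against the two partial skylines is enough.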


Divide-and-Conquer Algorithm

[Figure: divide step and merge step of the divide-and-conquer algorithm on points P1 to P5 in dimensions A and B, with split values mA and mB and partitions S1, S2, S11, S12, S21, S22]


SDC Algorithm

  • Comparisons

  • Results


Maintain Non-Redundant Interval Skylines

  • Step3-Remove “redundant time series”


Experiments

  • Parameters


Experiments

  • Synthetic Data Sets

    • Data Sets Properties

    • Query Efficiency


Experiments

  • Synthetic Data Sets

    • Update Efficiency

    • Space Cost


Experiments

  • Stock Data Sets

    • Query Time


Q&A
