Mining Time-Series Databases

Mohamed G. Elfeky

Introduction
  • A Time-Series Database is a database that contains data for each point in time.
  • Examples:
    • Weather Data
    • Stock Prices
What to Mine?
  • Full Periodic Patterns
    • Every point in time contributes to the cyclic behavior of the time-series for each period.
    • e.g., describing the weekly stock prices pattern considering all the days of the week.
  • Partial Periodic Patterns
    • Describing the behavior of the time-series at some but not all points in time.
    • e.g., discovering that the stock prices are high every Saturday and low every Tuesday.
Mining Partial Periodic Patterns
  • Problem Definition
  • Methods
    • Apriori
    • Max-Subpattern Hit Set

Jiawei Han, Guozhu Dong, and Yiwen Yin – ICDE 1999

Problem Definition
  • The time-series is: S = D1 D2 … Dn, where each Di is a set of features.
  • A pattern is: s = s1 … sp, where each si is either the letter * or a non-empty subset of the feature set L.
  • |s| = p is the period of the pattern s.
  • The L-length of s is the number of positions si that are not *.
  • If s has L-length j, it is called a j-pattern.
  • A subpattern of s is: s’ = s’1 … s’p such that for each position i, s’i is either * or a subset of si.
Problem Definition (Cont.)
  • Each segment of the form D_{i|s|+1} … D_{i|s|+|s|}, for i = 0, 1, 2, …, is called a period segment.
  • A period segment matches s if for each position j, either sj is * or sj is a subset of D_{i|s|+j}.
  • The frequency count of s in a time-series S is the number of period segments of S that match s.
  • The confidence of s is its frequency count divided by m, the maximum number of period segments of length |s| in S (m = ⌊n/|s|⌋).
  • A pattern is called frequent if its confidence is not less than a minimum threshold.
Problem Definition (Example)
  • The pattern a*{a,c}de has length 5 and L-length 4, so it is called a 4-pattern.
  • The patterns: a*{a,c}** and **cde are subpatterns of the above pattern.
  • In the series a{b,c}baebaced, the pattern a*b, whose period is 3, has frequency count 2: of the period segments a{b,c}b, aeb, and ace, the first two match. Its confidence is 2/3, where 3 is the maximum number of periods of length 3 in the series.
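
The definitions above can be checked with a small sketch. It assumes a pattern and a series are both stored as lists of sets, with * stored as the empty set; the helper names (parse, l_length, matches, frequency_count, confidence) are illustrative and do not come from the slides.

```python
from fractions import Fraction


def parse(text):
    """Turn 'a{b,c}*' into a list of frozensets; '*' becomes the empty set."""
    items, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i)
            items.append(frozenset(text[i + 1:j].split(",")))
            i = j + 1
        else:
            items.append(frozenset() if text[i] == "*" else frozenset(text[i]))
            i += 1
    return items


def l_length(pattern):
    """Number of positions that are not '*'."""
    return sum(1 for s in pattern if s)


def matches(pattern, segment):
    """True if every non-* position of the pattern is a subset of the segment."""
    # '*' is stored as the empty set, which is a subset of any set.
    return all(s <= d for s, d in zip(pattern, segment))


def frequency_count(pattern, series):
    p = len(pattern)
    m = len(series) // p  # maximum number of whole period segments
    return sum(matches(pattern, series[i * p:(i + 1) * p]) for i in range(m))


def confidence(pattern, series):
    m = len(series) // len(pattern)
    return Fraction(frequency_count(pattern, series), m)


series = parse("a{b,c}baebaced")
print(l_length(parse("a*{a,c}de")))           # 4
print(frequency_count(parse("a*b"), series))  # 2
print(confidence(parse("a*b"), series))       # 2/3
```

The trailing, incomplete segment (the final d) is ignored, which is why the denominator is 3 rather than 10/3.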
Apriori Method
  • Apriori Property:

Each subpattern of a frequent pattern of period p is itself a frequent pattern of period p.

  • Method:
    • Find F1, the set of frequent 1-patterns of period p.
    • Find all frequent i-patterns of period p, for i from 2 to p, level by level in the Apriori manner: generate the candidate i-patterns from the frequent (i−1)-patterns, and terminate when the candidate i-pattern set is empty.
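
A minimal sketch of this level-wise search follows. It is not the authors' code: it assumes each period segment is flattened into a set of (offset, letter) pairs and treats a pattern as a set of such pairs, so the level k counts letters rather than non-* positions (the two agree whenever each position holds at most one letter). The names apriori_partial_periodic and render are hypothetical.

```python
from itertools import combinations


def apriori_partial_periodic(series, period, min_conf):
    """Level-wise search; a pattern is a frozenset of (offset, letter) pairs."""
    m = len(series) // period
    # One "transaction" per period segment: every (offset, letter) it contains.
    segments = [
        frozenset((j, f) for j in range(period) for f in series[i * period + j])
        for i in range(m)
    ]

    def frequent_among(candidates):
        counts = {c: sum(c <= seg for seg in segments) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_conf * m}

    # F1: frequent patterns with a single letter.
    level = frequent_among({frozenset([item]) for seg in segments for item in seg})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-patterns that have k letters.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subpattern must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = frequent_among(candidates)
        result.update(level)
        k += 1
    return result


def render(pattern, period):
    """Write a set of (offset, letter) pairs in the a{b,c}* notation."""
    out = []
    for i in range(period):
        letters = sorted(f for j, f in pattern if j == i)
        if not letters:
            out.append("*")
        elif len(letters) == 1:
            out.append(letters[0])
        else:
            out.append("{" + ",".join(letters) + "}")
    return "".join(out)


# The series a{b,c}baebaced, one set of features per point in time.
series = [{"a"}, {"b", "c"}, {"b"}, {"a"}, {"e"}, {"b"}, {"a"}, {"c"}, {"e"}, {"d"}]
found = apriori_partial_periodic(series, period=3, min_conf=0.5)
for pattern in sorted(found, key=lambda s: render(s, 3)):
    print(render(pattern, 3), found[pattern])
```

With a minimum confidence of 0.5 this reports a**, *c*, **b, ac*, and a*b; the candidate *cb is counted once and rejected, and acb is pruned without being counted because *cb is not frequent.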
Max-Subpattern Hit Set Method
  • Definitions
  • Algorithm
  • Implementation Data Structure
Definitions
  • A candidate max-pattern Cmax is the maximal pattern that can be generated from F1 (the set of frequent 1-patterns).
  • Example:
    • If F1 = {a***, *b** , *c** , **d*},
    • Then Cmax = a{b,c}d*
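
As a rough illustration, Cmax can be formed by merging the frequent 1-patterns position by position. The sketch assumes F1 is given as (offset, letter) pairs and that an empty set stands for *; candidate_max_pattern is a hypothetical helper name.

```python
def candidate_max_pattern(f1, period):
    """Merge frequent 1-patterns, given as (offset, letter) pairs, into Cmax."""
    cmax = [set() for _ in range(period)]  # an empty set stands for '*'
    for offset, letter in f1:
        cmax[offset].add(letter)
    return cmax


# F1 = {a***, *b**, *c**, **d*} written as (offset, letter) pairs
f1 = [(0, "a"), (1, "b"), (1, "c"), (2, "d")]
cmax = candidate_max_pattern(f1, period=4)
print(cmax)  # the four positions of a{b,c}d*
```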
Definitions (Cont.)
  • A subpattern of Cmax is hit in a period segment Si if it is the maximal subpattern of Cmax that Si matches.
  • Example:
    • For Cmax = a{b,c}d* and Si = a{b,c}ce,
    • The hit subpattern is: a{b,c}**
  • The hit set H is the set of all hit subpatterns of Cmax in S.
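
Under the same set-per-position representation (an assumption, with the empty set standing for *), the hit subpattern of a segment is the position-wise intersection of Cmax with that segment. The sketch below reproduces the example above and records the hit in a counter; hit_subpattern and hit_set are illustrative names.

```python
from collections import Counter


def hit_subpattern(cmax, segment):
    """Maximal subpattern of cmax matched by segment: intersect each position."""
    return tuple(frozenset(c & d) for c, d in zip(cmax, segment))


cmax = [{"a"}, {"b", "c"}, {"d"}, set()]         # a{b,c}d*
segment = [{"a"}, {"b", "c"}, {"c"}, {"e"}]      # a{b,c}ce
hit = hit_subpattern(cmax, segment)              # a{b,c}**

hit_set = Counter()
hit_set[hit] += 1  # the hit set maps each hit subpattern to its number of hits
```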
Algorithm
  • Scan S once to find F1 and form the candidate max-pattern Cmax.
  • Scan S again, and for each period segment, add its max-subpattern to the hit set with a count of 1 if it is not already there; otherwise, increase its count by 1.
  • Derive the frequent patterns from the hit set.
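
The three steps can be sketched end to end as follows. This is a simplified illustration rather than the published algorithm: patterns are sets of (offset, letter) pairs, the hit set is a plain counter instead of a tree, and the last step naively enumerates every subpattern of Cmax, which is only reasonable when Cmax is small. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_with_hit_set(series, period, min_conf):
    m = len(series) // period
    segments = [
        frozenset((j, f) for j in range(period) for f in series[i * period + j])
        for i in range(m)
    ]

    # Scan 1: count single letters per offset, keep the frequent ones as Cmax.
    item_counts = Counter(item for seg in segments for item in seg)
    cmax = frozenset(i for i, n in item_counts.items() if n >= min_conf * m)

    # Scan 2: the max-subpattern of each segment is its intersection with Cmax.
    hits = Counter(seg & cmax for seg in segments)
    hits.pop(frozenset(), None)  # segments that hit nothing carry no information

    # Derive: a pattern's frequency count is the total count of the hits
    # that contain it (naive enumeration of all subpatterns of Cmax).
    frequent = {}
    letters = sorted(cmax)
    for k in range(1, len(letters) + 1):
        for combo in combinations(letters, k):
            pattern = frozenset(combo)
            count = sum(n for hit, n in hits.items() if pattern <= hit)
            if count >= min_conf * m:
                frequent[pattern] = count
    return cmax, hits, frequent


# The series a{b,c}baebaced with period 3 and minimum confidence 0.5.
series = [{"a"}, {"b", "c"}, {"b"}, {"a"}, {"e"}, {"b"}, {"a"}, {"c"}, {"e"}, {"d"}]
cmax, hits, frequent = mine_with_hit_set(series, period=3, min_conf=0.5)
print(sorted(cmax))  # [(0, 'a'), (1, 'c'), (2, 'b')]
print(len(hits))     # 3
```

Only two passes over the series are needed; everything after the second scan works on the hit set alone.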
Implementation Data Structure

Max-Subpattern Tree

  • The root node is: Cmax.
  • A child node is a subpattern of the parent node with one non-* letter missing. The link is labeled with this letter.
  • A node containing only 2 non-* letters has no children, since its children would be 1-patterns, which are already in F1.
  • Each node has a count field which registers its number of hits.
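Max-Subpattern Tree (Example)

[Figure: max-subpattern tree rooted at Cmax = a{b,c}d*. Second-level nodes: acd*, abd*, a{b,c}**, *{b,c}d*. Third-level nodes: *bd*, *{b,c}**, a*d*, ac**, ab**, *cd*. Each link is labeled with the missing letter (a, b, c, or d). Node counts shown, in extraction order: 10; 0, 50, 40, 32; 2, 18, 8, 0, 5, 19. The assignment of counts to nodes is not recoverable from this transcript.]
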

Max-Subpattern Tree (Construction)
  • Find w, the max-subpattern in the current segment.
    • Search for w in the tree, starting from the root and following the path that corresponds to the missing non-* letters in order.
    • If the node w is found, increase its count by 1. Otherwise, create a new node w (with count 1) and its missing ancestors on the followed path (with count 0).
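
[Figure: inserting the max-subpattern w = *cd* into a tree that contains only the root. The root a{b,c}d* (count 0) gets the child *{b,c}d* (count 0) via link a, which in turn gets the child *cd* (count 1) via link b.]
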
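
The construction step can be sketched as below. The Node class and insert function are illustrative, not taken from the slides; links are keyed by (offset, letter) pairs instead of bare letters, an assumption made so that the same letter at two positions stays unambiguous, and "in order" is taken to mean sorted order of those pairs.

```python
class Node:
    def __init__(self, pattern):
        self.pattern = pattern  # frozenset of (offset, letter) pairs
        self.count = 0          # number of hits registered at this node
        self.children = {}      # missing (offset, letter) -> child Node


def insert(root, w):
    """Register one hit of the max-subpattern w, a subset of root.pattern."""
    node = root
    # Follow (and create where needed) the path labeled by the letters of
    # Cmax that are missing from w, taken in a fixed order.
    for item in sorted(root.pattern - w):
        if item not in node.children:
            # Missing ancestors are created with count 0.
            node.children[item] = Node(node.pattern - {item})
        node = node.children[item]
    node.count += 1
    return node


root = Node(frozenset({(0, "a"), (1, "b"), (1, "c"), (2, "d")}))  # a{b,c}d*
leaf = insert(root, frozenset({(1, "c"), (2, "d")}))              # w = *cd*
print(leaf.count)  # 1
```

Inserting *cd* into a tree that holds only the root creates *{b,c}d* under link a with count 0 and then *cd* under link b with count 1; a second insert of the same w only increments the existing node.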

Max-Subpattern Tree (Traversal)
  • After the second scan, the tree contains all the max-subpatterns of the time-series together with their hit counts.
  • Now the tree must be traversed to compute the confidence value of each subpattern.
  • The frequency count of each node is the sum of its count and those of all its reachable ancestors.
  • For example:
    • The frequency count of *cd* is 78.
    • The frequency count of a*d* is 105.
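
The two totals above can be reproduced with a short sketch. The per-node counts used here are an assumption: the transcript does not reliably attribute the figure's counts to nodes, so the values were chosen to be consistent with the totals 78 and 105. Nodes are flattened into a dictionary, and the "reachable ancestors" of a pattern are taken to be all nodes whose pattern contains it; items and frequency_count are hypothetical helper names.

```python
def items(text):
    """Turn 'a{b,c}d*' into a frozenset of (offset, letter) pairs."""
    out, pos, i = set(), 0, 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i)
            out |= {(pos, f) for f in text[i + 1:j].split(",")}
            i = j + 1
        else:
            if text[i] != "*":
                out.add((pos, text[i]))
            i += 1
        pos += 1
    return frozenset(out)


def frequency_count(pattern, node_counts):
    """Own count plus the counts of every node whose pattern contains it."""
    return sum(n for node, n in node_counts.items() if pattern <= node)


# Assumed hit counts for the nodes involved (not confirmed by the source).
node_counts = {
    items("a{b,c}d*"): 10,
    items("acd*"): 50,
    items("abd*"): 40,
    items("*{b,c}d*"): 0,
    items("*cd*"): 18,
    items("a*d*"): 5,
}

print(frequency_count(items("*cd*"), node_counts))  # 78
print(frequency_count(items("a*d*"), node_counts))  # 105
```

Dividing each frequency count by the number of period segments then gives the confidence that is compared against the minimum threshold.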