An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases

Yi-Cheng Chen, Ji-Chiang Jiang, Wen-Chih Peng and Suh-Yin Lee

Department of Computer Science

National Chiao Tung University

Hsinchu, Taiwan 300

{ejen.cs95g, perrys0620.cs96g}@nctu.edu.tw wcpeng@cs.nctu.edu.tw sylee@csie.nctu.edu.tw

CIKM, 2010

- 1. INTRODUCTION
- 2. PROBLEM DEFINITION
- 3. INCISION STRATEGY
- 4. COINCIDENCE REPRESENTATION
- 5. CTMiner ALGORITHM
- 6. EXPERIMENTAL RESULTS
- 7. CONCLUSION AND FUTURE WORK

- All related research in this domain is based on Allen's temporal logic.
- Allen's logic defines 13 temporal relations between any two event intervals.
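The 13 relations can be made concrete with a small classifier. This is a minimal sketch, not part of the presented method: the function name `allen_relation` and the relation labels are illustrative choices, and each interval is assumed to be a `(start, finish)` pair with start < finish.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Each interval is a (start, finish) pair with start < finish.
    The label strings are illustrative names for the 13 relations.
    """
    (s1, f1), (s2, f2) = a, b
    if f1 < s2:
        return "before"
    if f2 < s1:
        return "after"
    if f1 == s2:
        return "meets"
    if f2 == s1:
        return "met-by"
    if s1 == s2 and f1 == f2:
        return "equals"
    if s1 == s2:
        return "starts" if f1 < f2 else "started-by"
    if f1 == f2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and f1 < f2:
        return "during"
    if s1 < s2 and f2 < f1:
        return "contains"
    # Only the two partial-overlap cases remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"


print(allen_relation((2, 7), (5, 10)))    # overlaps
print(allen_relation((8, 14), (10, 13)))  # contains
```

A pattern over n intervals needs up to n(n-1)/2 such pairwise relations, which is the complexity the representations compared in the slides try to manage.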

Comparison with previous work:

- Kam et al.: hierarchical representation.
- Höppner: scans the database with a sliding window.
- Papapetrou: Hybrid-DFS algorithm.
- Wu et al.: TPrefixSpan.
- Patel et al.: augmented representation (with additional counting information) and IEMiner.

Proposed:

- Incision strategy
- Coincidence representation
- CTMiner (Coincidence Temporal Miner)

Event interval and event sequence

- Let E = {e1, e2, …, ek} be the set of event symbols.
- An event interval is a triplet (ei, si, fi), where ei ∈ E and si, fi are time points with si < fi.
- Event start: ei.ts; event finish: ei.tf.
- An event sequence is a list {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)}, where si ≤ si+1 and si < fi.

Temporal database

- A database D = {r1, r2, …, rm} consists of records ri, where 1 ≤ i ≤ m.
- Each record ri consists of a sequence-id and an event interval (an event symbol with its start time and finish time).
- Records in D with the same sequence-id are grouped together.
- D can therefore be viewed as a collection of event sequences.
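The grouping step can be sketched as follows. This is an illustration under assumptions: the record layout `(sequence_id, symbol, start, finish)` and the function name `group_into_sequences` are hypothetical, and the rows for sequence 1 are made-up sample data (only the sequence 2 intervals come from the slides).

```python
from collections import defaultdict


def group_into_sequences(records):
    """Group flat records into event sequences keyed by sequence-id.

    Each record is assumed to be (sequence_id, symbol, start, finish).
    Intervals in each sequence are ordered by nondecreasing start time.
    """
    sequences = defaultdict(list)
    for seq_id, symbol, start, finish in records:
        if not start < finish:
            raise ValueError(f"interval {symbol} must satisfy start < finish")
        sequences[seq_id].append((symbol, start, finish))
    for intervals in sequences.values():
        # list.sort is stable, so ties on start keep their input order.
        intervals.sort(key=lambda interval: interval[1])
    return dict(sequences)


records = [
    (2, "D", 8, 14),
    (1, "A", 2, 7),    # hypothetical sample row
    (2, "B", 1, 5),
    (2, "E", 10, 13),
    (1, "B", 5, 10),   # hypothetical sample row
    (2, "F", 10, 13),
]
database = group_into_sequences(records)
print(database[2])
```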

Time set and time sequence

- Let q = {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)} be an event sequence.
- The set T = {s1, f1, s2, f2, …, si, fi, …, sn, fn} is called the time set corresponding to sequence q.
- Ordering all elements of T and eliminating duplicates gives the time sequence Ts = {t1, t2, t3, …, tk}, where ti ∈ T and ti < ti+1.

Example: the 4 event intervals (ei, si, fi) in sequence 2 are (B, 1, 5), (D, 8, 14), (E, 10, 13), (F, 10, 13).

Corresponding time set T = {s1, f1, s2, f2, s3, f3, s4, f4} = {1, 5, 8, 14, 10, 13, 10, 13}

Time sequence Ts = {t1, t2, t3, …, tk} = {1, 5, 8, 10, 13, 14}
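The time sequence of sequence 2 can be reproduced in a few lines. This is a minimal sketch; the function name `time_sequence` and the tuple layout `(symbol, start, finish)` are assumptions made for illustration, and D is taken as (D, 8, 14) to match the time set.

```python
def time_sequence(event_sequence):
    """Return the sorted, duplicate-free endpoints of an event sequence."""
    time_set = [t for _, start, finish in event_sequence for t in (start, finish)]
    return sorted(set(time_set))


sequence_2 = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
print(time_sequence(sequence_2))  # [1, 5, 8, 10, 13, 14]
```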

Event slice

- Let L = {+, −, *, Φ}, and let Q = {q1, q2, …, qi, …} be a set of event sequences, where qi = {(e1, s1, f1), …, (ej, sj, fj), …, (en, sn, fn)}.

Example: the event interval D = (D, 8, 14) in sequence 2 is cut into three slices:

- start slice D+ = (D, 8, 10)
- intermediate slice D* = (D, 10, 13)
- finish slice D− = (D, 13, 14)

The event interval B has only one intact slice, B = (B, 1, 5).
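The slicing in this example can be sketched in code. The formal definition is incomplete in the slides, so the rule used here is inferred from the example only: an interval is cut at every point of the time sequence that lies strictly inside it. The function name `slice_interval` and the label strings are hypothetical.

```python
def slice_interval(interval, ts):
    """Cut one event interval at the time-sequence points strictly inside it.

    Returns (label, start, finish) slices. An uncut interval yields a single
    intact slice labeled with the bare symbol; otherwise the first slice is
    labeled '+', the last '-', and any slice in between '*'.
    """
    symbol, start, finish = interval
    cuts = [start] + [t for t in ts if start < t < finish] + [finish]
    pieces = list(zip(cuts, cuts[1:]))
    if len(pieces) == 1:
        return [(symbol, start, finish)]
    slices = []
    for index, (a, b) in enumerate(pieces):
        if index == 0:
            label = symbol + "+"
        elif index == len(pieces) - 1:
            label = symbol + "-"
        else:
            label = symbol + "*"
        slices.append((label, a, b))
    return slices


ts = [1, 5, 8, 10, 13, 14]
print(slice_interval(("D", 8, 14), ts))
print(slice_interval(("B", 1, 5), ts))
```

For D this yields the start, intermediate, and finish slices listed above, and for B the single intact slice.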

- Incision example

The incision strategy completely avoids generating intermediate slices. Even with the intermediate slices trimmed, the relationship between any two intervals can still be expressed correctly.

- Slices that occur simultaneously are grouped together to form coincidences.
- Concatenating all coincidences describes an event sequence effectively.
- This simplifies the processing of complex pairwise relationships between intervals.
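The combination of incision and coincidence grouping can be sketched as below. This is an interpretation of the slides, not the authors' implementation: it assumes that a coincidence collects the slices covering one pair of adjacent time points, that intermediate slices are simply dropped, and that empty gaps produce no coincidence. The function name `coincidence_representation` and the output format are hypothetical.

```python
def coincidence_representation(event_sequence):
    """Build a list of coincidences (tuples of slice labels) for one sequence.

    For each pair of adjacent time points (a, b), every interval covering
    [a, b] contributes an intact, start ('+'), or finish ('-') slice.
    Intermediate slices are never generated (the incision step).
    """
    ts = sorted({t for _, s, f in event_sequence for t in (s, f)})
    coincidences = []
    for a, b in zip(ts, ts[1:]):
        group = []
        for symbol, s, f in event_sequence:
            if not (s <= a and b <= f):
                continue  # interval does not cover this segment
            if s == a and f == b:
                group.append(symbol)        # intact slice
            elif s == a:
                group.append(symbol + "+")  # start slice
            elif f == b:
                group.append(symbol + "-")  # finish slice
            # otherwise it would be an intermediate slice, which is trimmed
        if group:
            coincidences.append(tuple(group))
    return coincidences


sequence_2 = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
print(coincidence_representation(sequence_2))
# [('B',), ('D+',), ('E', 'F'), ('D-',)]
```

Under these assumptions, the fact that E and F fall between D+ and D− still expresses that D contains both of them, even though D* was never generated.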

- Good scalability
- Nonambiguity
- Simple is good
- Compact space usage

min_sup = 2

- Runtime performance on synthetic data sets

- Real world dataset analysis

- The coincidence representation is nonambiguous and has several advantages over existing representations.

- Future work: mining closed and maximal temporal patterns, incremental temporal pattern mining, and methods for data streams.