
CSE 8392 SPRING 1999 DATA MINING: ADVANCED TOPICS Temporal Data

Professor Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275. Phone: (214) 768-3087; fax: (214) 768-3085; email: mhd@seas.smu.edu; www: http://www.seas.smu.edu/~mhd


Presentation Transcript


  1. CSE 8392 SPRING 1999 DATA MINING: ADVANCED TOPICS Temporal Data • Professor Margaret H. Dunham • Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275 • (214) 768-3087 • fax: (214) 768-3085 • email: mhd@seas.smu.edu • www: http://www.seas.smu.edu/~mhd • April 1999

  2. TEMPORAL DATA OVERVIEW (Fayyad, Ch9)
     • Databases historically have contained non-temporal data
     • Records represent attributes at a single point in time (snapshot)
     • Analysis of temporal (time-varying) records presents a unique set of challenges and possibilities
     • Transaction Time
     • Valid Time
     • Other time interpretation???
     • Examples
       • NASA satellites: 1 TB of data per day
       • Patient monitoring
       • Financial market monitoring
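The difference between transaction time and valid time on slide 2 can be illustrated with a small record type. This is only a sketch under the usual definitions (valid time: when a fact holds in the modeled world; transaction time: when the fact was stored in the database). The `SalaryFact` class, its fields, and the sample rows are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalaryFact:
    employee: str
    salary: int
    valid_from: date   # valid time: when the fact became true in the real world
    recorded_on: date  # transaction time: when the row was stored in the database


def salary_as_of(facts, employee, valid_on, known_on):
    """Salary in effect on `valid_on`, using only rows recorded by `known_on`."""
    candidates = [
        f for f in facts
        if f.employee == employee
        and f.valid_from <= valid_on
        and f.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # The latest valid_from wins; ties go to the most recently recorded row.
    best = max(candidates, key=lambda f: (f.valid_from, f.recorded_on))
    return best.salary


# Hypothetical data: a raise effective 1 March 1999 that was entered on 20 March.
FACTS = [
    SalaryFact("ann", 50000, date(1998, 1, 1), date(1998, 1, 5)),
    SalaryFact("ann", 55000, date(1999, 3, 1), date(1999, 3, 20)),
]
```

Asking for the salary valid on 10 March 1999 gives a different answer depending on whether the database is consulted as of 15 March (before the raise was entered) or as of 1 April, which is the kind of distinction a snapshot database cannot express.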

  3. Pattern Detection in Temporal Data
     • Detection of patterns is fuzzy
       • No exact match
       • Approximation required
     • Humans are good at detecting such patterns, but machines are not
     • Related fields of research offer helpful techniques
       • Spelling correctors
       • Statistics
       • Signal processing
       • Genetic algorithms
       • Speech recognition

  4. Pattern-Based Similarity Search (R[2])
     • Identifying companies with similar growth patterns
     • Finding similar weather patterns
     • Sequence matching for temporal databases
       • Whole Matching - target and sequence have the same length
       • Subsequence Matching - target may be shorter than sequences in database. Must match starting point.
     • Similar to pattern matching in texts
     • Approach differences:
       • Technique used
       • Similarity measure
       • Use of scaling or translation
       • Optimization (reduce search space or number of comparisons)
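The two matching modes on slide 4 can be sketched as a brute-force search. This is a minimal illustration, not the method of the cited reference: the function names are made up for the example, Euclidean distance is just one possible similarity measure, and no scaling, translation, or search-space optimization is applied.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def whole_match(target, sequences, threshold):
    """Whole matching: indices of stored sequences of the same length as
    `target` whose distance to it is at most `threshold`."""
    return [
        i for i, seq in enumerate(sequences)
        if len(seq) == len(target) and euclidean(target, seq) <= threshold
    ]


def best_subsequence(target, sequence):
    """Subsequence matching: slide `target` over a longer `sequence` and
    return (start_offset, distance) for the closest window."""
    n = len(target)
    if n == 0 or n > len(sequence):
        raise ValueError("target must be non-empty and no longer than sequence")
    best = None
    for start in range(len(sequence) - n + 1):
        d = euclidean(target, sequence[start:start + n])
        if best is None or d < best[1]:
            best = (start, d)
    return best
```

The subsequence search compares the target against every window of the stored sequence, which is the cost that the "Optimization" bullet aims to reduce.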

  5. Pattern Matching Similarity Measures (R[2])
     • Problem: Given Target X=<x1, x2, … , xn> and Sequence Y=<y1, y2, … , yN>, find D(X,Y).
     • May assume n=N, or n<N and look at all subsequences of length n.
     • Euclidean Distance - Form. 7.1 p 878
     • Linear Correlation - Form. 7.2 p 878
     • Discrete Fourier Transform - Form. 7.3 p 878
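The three measures named on slide 5 can be sketched with their standard textbook definitions. The formulas referenced on the slide (7.1 to 7.3) are not reproduced in this transcript, so the versions below are assumptions and may differ in normalization from the cited ones. The function names are illustrative, and equal-length inputs (n = N) are assumed.

```python
import cmath
import math


def euclidean(x, y):
    """D(X, Y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_correlation(x, y):
    """Pearson correlation coefficient, in [-1, 1]; 1 means same shape
    up to positive scaling and translation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def dft(x):
    """Naive O(n^2) discrete Fourier transform with 1/sqrt(n) scaling,
    so that Euclidean distances are preserved (Parseval's theorem)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        / math.sqrt(n)
        for f in range(n)
    ]


def dft_distance(x, y, k):
    """Euclidean distance over the first k DFT coefficients only.
    With the scaling above this never exceeds euclidean(x, y)."""
    fx, fy = dft(x), dft(y)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx[:k], fy[:k])))
```

Because the truncated DFT distance is a lower bound on the true Euclidean distance, it can be used to discard candidates cheaply before computing the full distance.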

  6. Dynamic Time Warping (DTW) (Fayyad, Ch9)
     • Uses dynamic time warping to investigate time series data
     • Involves matrix calculations
     • Requires distance measurements |x - y| or (x - y)^2
     • Warping Path determined based on minimum cumulative distances found
     • DTW imposed restrictions (p234): Monotonic; Continuous; Windowed; Slope; Boundary
     • Example - p 235
     • Normalization
       • Convert raw scores (distances) to determine relative scores
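The matrix calculation and warping path on slide 6 can be sketched with the standard dynamic-programming formulation of DTW. This is an assumption-laden illustration rather than the example from the cited chapter: the function name and the `window` parameter are made up here, the local distance defaults to |x - y|, and only the monotonic, continuous, boundary, and window restrictions are enforced (no slope constraint, no normalization).

```python
import math


def dtw(x, y, window=None, dist=lambda a, b: abs(a - b)):
    """Return (cumulative_distance, warping_path) between sequences x and y.

    `window` limits how far the path may stray from the diagonal
    (|i - j| <= window); None means no limit.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # The band must be at least |n - m| wide or the end point is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))

    # cost[i][j] = minimum cumulative distance aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in x only
                                 cost[i][j - 1],      # advance in y only
                                 cost[i - 1][j - 1])  # advance in both

    # Backtrack from the end to the start, always taking the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` yields a cumulative distance of 0 because the repeated 2 in the second sequence is absorbed by the warping, whereas a point-by-point comparison is not even defined for sequences of different length.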

  7. Telecom Alarm Databases
     • Dissertation by Mika Klemettinen, Univ. of Helsinki, January 22, 1999
     • Alarm - message generated by telecom network entity describing a problem.
       • Uses management network
     • Correlation - Combining information from multiple alarms to interpret together. (p 16, 17)
     • Telecommunications Alarm Sequence Analyzer (TASA) - Recognize pattern defined by sequence of alarm messages. Based on pattern, an action is taken. Window is associated with pattern. (p 20)
     • Episode Rule (p 29) - Generalization of Association Rule
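The idea of a pattern tied to a window, and of an episode rule as a generalization of an association rule, can be sketched as window-based counting over an alarm sequence. This is a simplified illustration and not TASA itself: the alarm types, the integer timestamps, the window convention (every window of the given width that overlaps the sequence, stepped one time unit at a time), and all function names are assumptions. Confidence is taken as frequency of the full episode divided by frequency of its prefix, by analogy with association rules.

```python
def occurs_serially(events, episode):
    """True if the alarm types in `episode` appear in this order
    (not necessarily adjacent) in the time-ordered `events`."""
    pos = 0
    for _, alarm in events:
        if pos < len(episode) and alarm == episode[pos]:
            pos += 1
    return pos == len(episode)


def episode_frequency(events, episode, width):
    """Fraction of sliding windows of the given width that contain the episode."""
    if not events or width <= 0:
        raise ValueError("need at least one event and a positive width")
    events = sorted(events)
    first, last = events[0][0], events[-1][0]
    # Windows cover [start, start + width) and overlap the sequence at both ends.
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        window = [e for e in events if start <= e[0] < start + width]
        if occurs_serially(window, episode):
            hits += 1
    return hits / len(starts)


def rule_confidence(events, prefix, episode, width):
    """Confidence of the rule 'prefix => episode' within windows of `width`."""
    base = episode_frequency(events, prefix, width)
    if base == 0:
        raise ValueError("the prefix never occurs, so confidence is undefined")
    return episode_frequency(events, episode, width) / base


# Hypothetical alarm log: (time, alarm_type) pairs.
ALARMS = [(1, "A"), (2, "B"), (4, "A"), (5, "C"), (7, "A"), (8, "B")]
```

With a window of width 3, alarm A appears in 9 of the 10 windows and "A followed by B" in 4 of them, so the rule "A => A then B" has a confidence of 4/9 on this toy log.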

  8. Summary
     • Pattern prediction with temporal data is challenging
       • Generalization of nontemporal DM: Classification, Prediction, Association Rules
       • Complicated by temporal relationship
     • “Big data” poses significant challenges
       • Must sift data, then
       • Detect meaningful patterns
     • Issues
       • Testing and validation
       • Prior trends may not be indicative of future patterns
