1 / 112

Sensor data mining and forecasting

Sensor data mining and forecasting. Christos Faloutsos CMU christos@cs.cmu.edu. Outline. Problem definition - motivation Linear forecasting - AR and AWSOM Coevolving series - MUSCLES Fractal forecasting - F4 Other projects graph modeling, outliers etc. Problem definition.

oni
Download Presentation

Sensor data mining and forecasting

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sensor data mining and forecasting Christos Faloutsos CMU christos@cs.cmu.edu

  2. Outline Problem definition - motivation Linear forecasting - AR and AWSOM Coevolving series - MUSCLES Fractal forecasting - F4 Other projects graph modeling, outliers etc C. Faloutsos

  3. Problem definition • Given: one or more sequences x1 , x2 , … , xt , … (y1, y2, … , yt, … … ) • Find • forecasts; patterns • clusters; outliers C. Faloutsos

  4. Motivation - Applications • Financial, sales, economic series • Medical • ECGs +; blood pressure etc monitoring • reactions to new drugs • elderly care C. Faloutsos

  5. Motivation - Applications (cont’d) • ‘Smart house’ • sensors monitor temperature, humidity, air quality • video surveillance C. Faloutsos

  6. Motivation - Applications (cont’d) • civil/automobile infrastructure • bridge vibrations [Oppenheim+02] • road conditions / traffic monitoring C. Faloutsos

  7. Automobile traffic 2000 1800 1600 1400 1200 1000 800 600 400 200 0 Stream Data: automobile traffic # cars time C. Faloutsos

  8. Motivation - Applications (cont’d) • Weather, environment/anti-pollution • volcano monitoring • air/water pollutant monitoring C. Faloutsos

  9. Stream Data: Sunspots #sunspots per month time C. Faloutsos

  10. Motivation - Applications (cont’d) • Computer systems • ‘Active Disks’ (buffering, prefetching) • web servers (ditto) • network traffic monitoring • ... C. Faloutsos

  11. Stream Data: Disk accesses #bytes time C. Faloutsos

  12. Settings & Applications • One or more sensors, collecting time-series data C. Faloutsos

  13. Settings & Applications Each sensor collects data (x1, x2, …, xt, …) C. Faloutsos

  14. Settings & Applications Sensors ‘report’ to a central site C. Faloutsos

  15. Settings & Applications Problem #1: Finding patterns in a single time sequence C. Faloutsos

  16. Settings & Applications Problem #2: Finding patterns in many time sequences C. Faloutsos

  17. Problem #1: Goal: given a signal (eg., #packets over time) Find: patterns, periodicities, and/or compress count lynx caught per year (packets per day; temperature per day) year C. Faloutsos

  18. Problem#1’: Forecast Given xt, xt-1, …, forecast xt+1 90 80 70 60 Number of packets sent ?? 50 40 30 20 10 0 1 3 5 7 9 11 Time Tick C. Faloutsos

  19. Problem #2: • Given: A set of correlatedtime sequences • Forecast ‘Sent(t)’ C. Faloutsos

  20. Differences from DSP/Stat • Semi-infinite streams • we need on-line, ‘any-time’ algorithms • Can not afford human intervention • need automatic methods • sensors have limited memory / processing / transmitting power • need for (lossy) compression C. Faloutsos

  21. Important observations Patterns, rules, compression and forecasting are closely related: • To do forecasting, we need • to find patterns/rules • good rules help us compress • to find outliers, we need to have forecasts • (outlier = too far away from our forecast) C. Faloutsos

  22. Pictorial outline of the talk C. Faloutsos

  23. Outline Problem definition - motivation Linear forecasting AR AWSOM Coevolving series - MUSCLES Fractal forecasting - F4 Other projects graph modeling, outliers etc C. Faloutsos

  24. Mini intro to A.R. C. Faloutsos

  25. Forecasting "Prediction is very difficult, especially about the future." - Nils Bohr http://www.hfac.uh.edu/MediaFutures/thoughts.html C. Faloutsos

  26. Problem#1’: Forecast • Example: give xt-1, xt-2, …, forecast xt 90 80 70 60 Number of packets sent ?? 50 40 30 20 10 0 1 3 5 7 9 11 Time Tick C. Faloutsos

  27. Linear Regression: idea 85 Body height 80 75 70 65 60 55 50 45 40 15 25 35 45 Body weight • express what we don’t know (= ‘dependent variable’) • as a linear function of what we know (= ‘indep. variable(s)’) C. Faloutsos

  28. Linear Auto Regression: C. Faloutsos

  29. 90 80 70 ?? 60 50 40 30 20 10 0 1 3 5 7 9 11 Time Tick Problem#1’: Forecast • Solution: try to express xt as a linear function of the past: xt-2, xt-2, …, (up to a window of w) Formally: C. Faloutsos

  30. Linear Auto Regression: 85 ‘lag-plot’ 80 75 70 65 Number of packets sent (t) 60 55 50 45 40 15 25 35 45 Number of packets sent (t-1) • lag w=1 • Dependent variable = # of packets sent (S[t]) • Independent variable = # of packets sent (S[t-1]) C. Faloutsos

  31. More details: • Q1: Can it work with window w>1? • A1: YES! xt xt-1 xt-2 C. Faloutsos

  32. More details: • Q1: Can it work with window w>1? • A1: YES! (we’ll fit a hyper-plane, then!) xt xt-1 xt-2 C. Faloutsos

  33. More details: • Q1: Can it work with window w>1? • A1: YES! (we’ll fit a hyper-plane, then!) xt xt-1 xt-2 C. Faloutsos

  34. Even more details • Q2: Can we estimate a incrementally? • A2: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Chen+94], or [Yi+00], for details) • Q3: can we ‘down-weight’ older samples? • A3: yes (RLS does that easily!) C. Faloutsos

  35. Mini intro to A.R. C. Faloutsos

  36. goal: capture arbitrary periodicities with NO human intervention on a semi-infinite stream How to choose ‘w’? C. Faloutsos

  37. Outline Problem definition - motivation Linear forecasting AR AWSOM Coevolving series - MUSCLES Fractal forecasting - F4 Other projects graph modeling, outliers etc C. Faloutsos

  38. Problem: • in a train of spikes (128 ticks apart) • any AR with window w < 128 will fail What to do, then? C. Faloutsos

  39. Answer (intuition) • Do a Wavelet transform (~ short window DFT) • look for patterns in every frequency C. Faloutsos

  40. Intuition • Why NOT use the short window Fourier transform (SWFT)? • A: how short should be the window? freq time w’ C. Faloutsos

  41. main idea: variable-length window! wavelets f t C. Faloutsos

  42. Advantages of Wavelets • Better compression (better RMSE with same number of coefficients - used in JPEG-2000) • fast to compute (usually: O(n)!) • very good for ‘spikes’ • mammalian eye and ear: Gabor wavelets C. Faloutsos

  43. f value t time Wavelets - intuition: • Q: baritone/silence/ soprano - DWT? C. Faloutsos

  44. f value t time Wavelets - intuition: • Q: baritone/soprano - DWT? C. Faloutsos

  45. W1,3 t W1,1 W1,4 W1,2 t t t t frequency W2,1 W2,2 = t t W3,1 t V4,1 t time AWSOM xt C. Faloutsos

  46. W1,3 t W1,1 W1,4 W1,2 t t t t frequency W2,1 W2,2 t t W3,1 t V4,1 t time AWSOM xt C. Faloutsos

  47. Wl,t-2 Wl,t-1 Wl,t Wl’,t’-2 Wl’,t’-1 AWSOM - idea Wl,t l,1Wl,t-1l,2Wl,t-2 … Wl’,t’ l’,1Wl’,t’-1l’,2Wl’,t’-2 … Wl’,t’ C. Faloutsos

  48. More details… • Update of wavelet coefficients • Update of linear models • Feature selection • Not all correlations are significant • Throw away the insignificant ones (“noise”) (incremental) (incremental; RLS) (single-pass) C. Faloutsos

  49. Results - Synthetic data AWSOM AR Seasonal AR • Triangle pulse • Mix (sine + square) • AR captures wrong trend (or none) • Seasonal AR estimation fails C. Faloutsos

  50. Results - Real data • Automobile traffic • Daily periodicity • Bursty “noise” at smaller scales • AR fails to capture any trend • Seasonal AR estimation fails C. Faloutsos

More Related