Sensor data mining and forecasting
Christos Faloutsos, CMU
Outline
Problem definition - motivation
Linear forecasting - AR and AWSOM
Coevolving series - MUSCLES
Fractal forecasting - F4
Other projects
graph modeling, outliers etc
C. Faloutsos
x1, x2, …, xt, … (y1, y2, …, yt, …, …)
[Figure: Automobile traffic, # cars (0 to 2000) vs. time]
[Figure: # sunspots per month vs. time]
[Figure: # bytes vs. time]
Each sensor collects data (x1, x2, …, xt, …)
Sensors ‘report’ to a central site
Problem #1: Finding patterns in a single time sequence
Problem #2: Finding patterns in many time sequences
Goal: given a signal (e.g., #packets over time)
Find: patterns, periodicities, and/or compress
[Figure: count vs. year, lynx caught per year (likewise: packets per day; temperature per day)]
Given xt, xt-1, …, forecast xt+1
[Figure: number of packets sent (0 to 90) vs. time tick (1 to 11); the next value is marked '??']
Patterns, rules, compression and forecasting are closely related:
Linear forecasting - AR
"Prediction is very difficult, especially about the future." - Niels Bohr
http://www.hfac.uh.edu/MediaFutures/thoughts.html
[Figure: number of packets sent (0 to 90) vs. time tick (1 to 11); the next value is marked '??']
[Figure: scatter plot of body height vs. body weight]
[Figure: value (0 to 90) vs. time tick (1 to 11); the next value is marked '??']
Express xt as a linear function of the past: xt-1, xt-2, …, xt-w (up to a window of w)
Formally: $x_t \approx a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_w x_{t-w} + \text{noise}$
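A minimal sketch of fitting such a linear model by least squares, assuming NumPy. The function names `fit_ar` and `forecast_next` and the toy series are illustrative and do not come from the slides.

```python
import numpy as np

def fit_ar(x, w):
    """Least-squares fit of x[t] ~ a[0]*x[t-1] + ... + a[w-1]*x[t-w]."""
    x = np.asarray(x, dtype=float)
    # Column i holds the values lagged by i+1 ticks, aligned with the targets x[w:].
    X = np.column_stack([x[w - i - 1 : len(x) - i - 1] for i in range(w)])
    a, *_ = np.linalg.lstsq(X, x[w:], rcond=None)
    return a

def forecast_next(x, a):
    """One-step-ahead forecast from the last len(a) values."""
    recent = np.asarray(x, dtype=float)[-len(a):][::-1]  # most recent first
    return float(recent @ a)

# Toy series that follows x[t] = 0.5*x[t-1] + 0.3*x[t-2] exactly (made-up numbers).
series = [1.0, 2.0]
for _ in range(18):
    series.append(0.5 * series[-1] + 0.3 * series[-2])

coeffs = fit_ar(series, w=2)
next_value = forecast_next(series, coeffs)
```

On this noise-free toy series the fit should recover the two generating coefficients; on real sensor data the window w has to be chosen, which is the difficulty the later slides address.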
[Figure: 'lag-plot' of number of packets sent (t) vs. number of packets sent (t-1)]
[Figure: lag plots with axes xt, xt-1, xt-2]
goal: capture arbitrary periodicities
with NO human intervention
on a semi-infinite stream
Linear forecasting - AWSOM
What to do, then?
[Figure: freq vs. time, with window w']
main idea: variable-length window!
[Figure: f vs. t, alongside value vs. time]
[Figure: coefficients W1,1 to W1,4, W2,1, W2,2, W3,1 and V4,1 of xt, laid out by frequency and time]
$W_{l,t} = \beta_{l,1} W_{l,t-1} + \beta_{l,2} W_{l,t-2} + \dots$
$W_{l',t'} = \beta_{l',1} W_{l',t'-1} + \beta_{l',2} W_{l',t'-2} + \dots$
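One way to read the per-level equations is as a separate small autoregression on each level's coefficients. The sketch below assumes Haar wavelets (the slides do not name the wavelet family), a batch least-squares fit rather than the incremental update, and a fixed order k = 2. The names `haar_levels` and `fit_level_ar` are hypothetical, and this is not the AWSOM implementation.

```python
import numpy as np

def haar_levels(x):
    """Return ([W_1, W_2, ...], V): Haar detail coefficients per level and the final smooth part."""
    v = np.asarray(x, dtype=float)
    details = []
    while len(v) >= 2 and len(v) % 2 == 0:
        even, odd = v[0::2], v[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients of this level
        v = (even + odd) / np.sqrt(2.0)              # smooth part, input to the next level
    return details, v

def fit_level_ar(w_l, k=2):
    """Least-squares fit of W[l,t] ~ beta[0]*W[l,t-1] + ... + beta[k-1]*W[l,t-k]."""
    X = np.column_stack([w_l[k - i - 1 : len(w_l) - i - 1] for i in range(k)])
    beta, *_ = np.linalg.lstsq(X, w_l[k:], rcond=None)
    return beta

# Made-up periodic signal with a little noise, 64 ticks long.
rng = np.random.default_rng(0)
t = np.arange(64)
signal = np.sin(2 * np.pi * t / 8) + 0.1 * rng.standard_normal(64)

details, smooth = haar_levels(signal)
# Only fit levels that still have enough coefficients for a meaningful regression.
betas = {level + 1: fit_level_ar(w) for level, w in enumerate(details) if len(w) >= 8}
```

Each level has half as many coefficients as the previous one, so a short regression at a coarse level spans a long stretch of the original signal.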
(incremental)
(incremental; RLS)
(single-pass)
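The "(incremental; RLS)" note refers to Recursive Least Squares, which updates regression coefficients one observation at a time instead of refitting from scratch. A minimal textbook sketch follows, assuming NumPy. The class name `RLS`, the initial scale `delta`, and the forgetting factor `lam` are illustrative choices, not values from the slides.

```python
import numpy as np

class RLS:
    """Recursive Least Squares: one O(n_features^2) update per new observation."""

    def __init__(self, n_features, delta=1e6, lam=1.0):
        self.coef = np.zeros(n_features)
        self.P = delta * np.eye(n_features)  # running estimate of the inverse of X^T X
        self.lam = lam                       # forgetting factor; 1.0 means no forgetting

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        self.coef = self.coef + gain * (y - x @ self.coef)  # correct by the prediction error
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return self.coef

# Stream made-up observations of y = 2*x1 - 1*x2 through the filter.
rng = np.random.default_rng(1)
model = RLS(n_features=2)
for _ in range(50):
    row = rng.standard_normal(2)
    model.update(row, 2.0 * row[0] - 1.0 * row[1])
```

Each update touches only the current row, so the stream never has to be stored or revisited.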
[Figure panels: AWSOM, AR, Seasonal AR]
Space: $O(\lg N + m k^2) \approx O(\lg N)$
Time: $O(k^2) \approx O(1)$
Coevolving series - MUSCLES
Q: what should we do?
Least Squares, with
MUSCLES outperforms AR & “yesterday”
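A hedged sketch of the least-squares idea for co-evolving sequences: estimate one sequence's current value from the recent past of all sequences plus the current values of the others. The "Least Squares, with" line is truncated in this copy, so the exact choice of regressors here is an assumption, and `muscles_design` is a hypothetical helper name.

```python
import numpy as np

def muscles_design(series, target, w):
    """Build a regression problem for column `target` of `series` (shape: ticks x sequences)."""
    S = np.asarray(series, dtype=float)
    n_ticks, n_seq = S.shape
    others = [j for j in range(n_seq) if j != target]
    rows = []
    for t in range(w, n_ticks):
        past = S[t - w : t, :].ravel()  # all sequences at ticks t-w .. t-1
        now = S[t, others]              # the other sequences at tick t
        rows.append(np.concatenate([past, now]))
    return np.array(rows), S[w:, target]

# Made-up pair of sequences: s0 depends on the current s1 and on its own previous value.
rng = np.random.default_rng(2)
s1 = rng.standard_normal(100)
s0 = np.zeros(100)
for t in range(1, 100):
    s0[t] = 0.7 * s1[t] + 0.2 * s0[t - 1]

X, y = muscles_design(np.column_stack([s0, s1]), target=0, w=1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # regressor order: s0[t-1], s1[t-1], s1[t]
```

A plain AR model on s0 alone cannot see s1[t], which is where the extra accuracy of using the companion sequences comes from in this toy case.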
Fractal forecasting - F4
[Figure: value vs. time]
Given a time series {xt}, predict its future course, that is, xt+1, xt+2, ...
[Figure: lag plot of xt vs. xt-1 with Lag = 1 and k = 4 NN; the 4-NN of the new point are marked]
Interpolate these to get the final prediction
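The nearest-neighbor step on the lag plot can be sketched as follows, assuming NumPy, Euclidean distance, and plain averaging of the neighbors' successors. The function name `knn_forecast` and the toy series are illustrative, and this is not the F4 implementation.

```python
import numpy as np

def knn_forecast(x, lag=1, k=4):
    """Forecast the value after x[-1] from the successors of the k nearest delay vectors."""
    x = np.asarray(x, dtype=float)
    # Row j is the delay vector (x[j], ..., x[j+lag-1]); its successor is x[j+lag].
    vectors = np.column_stack([x[i : len(x) - lag + i] for i in range(lag)])
    successors = x[lag:]
    query = x[-lag:]                          # the "new point" on the lag plot
    dist = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return float(successors[nearest].mean())  # simplest interpolation: the average

# Made-up history from a noise-free logistic parabola with a = 3.8.
history = [0.3]
for _ in range(1999):
    history.append(3.8 * history[-1] * (1.0 - history[-1]))

prediction = knn_forecast(history, lag=1, k=4)
```

The method needs no model of the dynamics: it only assumes that points close together on the lag plot have similar successors.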
Embedding dimensionality = 3
Intrinsic dimensionality = 1
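The gap between embedding and intrinsic dimensionality can be checked numerically: count the pairs of points closer than r and take the slope of log(# pairs) against log(r). The sketch below assumes NumPy and a brute-force pair count. The function name `correlation_dimension` and the test data are made up for illustration.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Slope of log(# pairs within r) vs. log(r), fitted over the given radii."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(pts), k=1)]  # each unordered pair once
    counts = np.array([(pair_dist < r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

# Made-up data: 400 points on a straight line embedded in 3-D space.
rng = np.random.default_rng(0)
line_points = np.outer(rng.random(400), [1.0, 1.0, 1.0])
dim = correlation_dimension(line_points, np.logspace(-2, -1, 8))
```

For points on a line the slope comes out close to 1 even though each point has three coordinates.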
[Figure: log(# pairs) vs. log(r)]
The Logistic Parabola: xt = a xt-1 (1 - xt-1) + noise
[Figure: x(t) vs. time, with lag plots over x(t), x(t-1) and x(t-2)]
[Figure: fractal dimension vs. lag (L), with an epsilon band and the chosen lag marked 'Choose this']
How do we interpolate between the k nearest neighbors?
A3.1: Average
A3.2: Weighted average (weights drop with distance - how?)
A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)
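The three options can be compared side by side in a small sketch, assuming NumPy. The inverse-distance weights are one possible answer to the "how?" above, not the slides' choice. The third option is shown as a local linear fit through `np.linalg.lstsq`, which uses an SVD-based solver; whether this matches the method of [Sauer94] is an assumption. The function name `interpolate` is hypothetical.

```python
import numpy as np

def interpolate(neigh_x, neigh_y, query, method="average"):
    """Combine the successors neigh_y of the k nearest neighbors neigh_x into one prediction."""
    y = np.asarray(neigh_y, dtype=float)
    X = np.asarray(neigh_x, dtype=float).reshape(len(y), -1)
    q = np.asarray(query, dtype=float).reshape(-1)
    if method == "average":
        return float(y.mean())
    if method == "weighted":
        dist = np.linalg.norm(X - q, axis=1)
        weights = 1.0 / (dist + 1e-12)  # assumed weighting: inverse distance
        return float((weights * y).sum() / weights.sum())
    if method == "linear":
        A = np.column_stack([X, np.ones(len(y))])  # local linear model with intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.append(q, 1.0) @ coef)
    raise ValueError(f"unknown method: {method}")

# Made-up neighbors lying on the line y = 2x + 1, queried at x = 0.3.
xs = [0.0, 0.1, 0.2, 0.5]
ys = [1.0, 1.2, 1.4, 2.0]
estimates = {m: interpolate(xs, ys, [0.3], m) for m in ("average", "weighted", "linear")}
```

With neighbors spread unevenly around the query, the plain average is pulled toward the crowded side, while the local linear fit follows the trend.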
[Figure: xt vs. xt-1]
A4: YES!
[Figure: P vs. H]
Example: Lotka-Volterra equations
dH/dt = r H - a H*P
dP/dt = b H*P - m P
H is count of prey (e.g., hare)
P is count of predators (e.g., lynx)
Suppose only P(t) is observed (t=1, 2, …).
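To see what observing only P(t) looks like, the equations can be integrated numerically and the predator count sampled at regular ticks. The sketch below assumes NumPy and a fourth-order Runge-Kutta step. All parameter values and starting counts are made up for illustration.

```python
import numpy as np

def lotka_volterra(H0, P0, r=1.0, a=0.1, b=0.02, m=0.5, dt=0.01, steps=5000):
    """Integrate dH/dt = r H - a H P, dP/dt = b H P - m P with RK4; returns rows of (H, P)."""
    def deriv(s):
        H, P = s
        return np.array([r * H - a * H * P, b * H * P - m * P])

    s = np.array([H0, P0], dtype=float)
    out = np.empty((steps + 1, 2))
    out[0] = s
    for i in range(steps):
        k1 = deriv(s)
        k2 = deriv(s + 0.5 * dt * k1)
        k3 = deriv(s + 0.5 * dt * k2)
        k4 = deriv(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        out[i + 1] = s
    return out

trajectory = lotka_volterra(H0=40.0, P0=9.0)
observed_P = trajectory[::100, 1]  # only the predator count, one sample per time unit
lag_pairs = np.column_stack([observed_P[:-1], observed_P[1:]])  # points (P(t-1), P(t))
```

The rows of `lag_pairs` are the points of a lag plot of P, built without ever looking at H.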
[Figure: P vs. H, and lag plot of P(t) vs. P(t-1)]
Logistic Parabola: xt = a xt-1 (1 - xt-1) + noise
Models population of flies [R. May/1976]
[Figure: x(t) vs. time, and its lag-plot]
ARIMA: fails
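A small experiment suggests why a linear model struggles here. The sketch fits a straight line and a parabola to the lag plot of a noise-free logistic series, assuming NumPy and a = 3.8 (an illustrative value). A one-lag linear fit stands in for ARIMA; it is a simplification, not an ARIMA implementation.

```python
import numpy as np

def logistic_series(a=3.8, x0=0.3, n=1000):
    """Noise-free logistic parabola x[t] = a * x[t-1] * (1 - x[t-1])."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = a * x[t - 1] * (1.0 - x[t - 1])
    return x

x = logistic_series()
prev, cur = x[:-1], x[1:]  # the lag plot: x(t) against x(t-1)

linear = np.polyfit(prev, cur, 1)     # straight line through the lag plot
quadratic = np.polyfit(prev, cur, 2)  # parabola through the lag plot

mse_linear = float(np.mean((np.polyval(linear, prev) - cur) ** 2))
mse_quadratic = float(np.mean((np.polyval(quadratic, prev) - cur) ** 2))
```

The parabola reproduces the series almost exactly, while the straight line leaves a large residual, because the lag plot is curved rather than linear.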
[Figure: value vs. timesteps, marked 'Our Prediction from here']
[Figure: comparison of prediction to correct values (value vs. timesteps)]
LORENZ: Models convection currents in the air
dx / dt = a (y - x)
dy / dt = x (b - z) - y
dz / dt = xy - c z
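A series like this can be generated by integrating the three equations numerically. The sketch below assumes NumPy, a fourth-order Runge-Kutta step, and the commonly used parameter values a = 10, b = 28, c = 8/3; the slides do not state which values were used.

```python
import numpy as np

def lorenz_deriv(s, a=10.0, b=28.0, c=8.0 / 3.0):
    """Right-hand side of dx/dt = a(y - x), dy/dt = x(b - z) - y, dz/dt = xy - cz."""
    x, y, z = s
    return np.array([a * (y - x), x * (b - z) - y, x * y - c * z])

def lorenz_series(start=(1.0, 1.0, 1.0), dt=0.01, steps=3000):
    """Integrate the Lorenz equations with RK4; returns rows of (x, y, z)."""
    s = np.array(start, dtype=float)
    out = np.empty((steps + 1, 3))
    out[0] = s
    for i in range(steps):
        k1 = lorenz_deriv(s)
        k2 = lorenz_deriv(s + 0.5 * dt * k1)
        k3 = lorenz_deriv(s + 0.5 * dt * k2)
        k4 = lorenz_deriv(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        out[i + 1] = s
    return out

states = lorenz_series()
x_series = states[:, 0]  # a single observed coordinate, usable as a time series
```

Only one coordinate is kept at the end, mirroring the setting where a forecaster sees a single measured value over time.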
[Figure: comparison of prediction to correct values (value vs. timesteps)]
[Figure: value vs. time]
[Figure: comparison of prediction to correct values (value vs. timesteps)]
[Figure: count vs. degree; avg: 3.3]
[Figure: log(count) vs. log((out) degree)]
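Such a log-log plot can be summarized by fitting a straight line to log(count) against log(degree). A minimal sketch follows, assuming NumPy; the helper names and the synthetic degree list are made up for illustration.

```python
import numpy as np

def degree_counts(degrees):
    """Return (degree values, number of nodes with that degree), ignoring degree 0."""
    values, counts = np.unique(np.asarray(degrees), return_counts=True)
    keep = values > 0
    return values[keep], counts[keep]

def loglog_slope(values, counts):
    """Slope and intercept of the least-squares line through (log degree, log count)."""
    slope, intercept = np.polyfit(np.log(values), np.log(counts), 1)
    return float(slope), float(intercept)

# Synthetic degree list in which the count falls off roughly as degree^-2.
degree_list = []
for d in range(1, 11):
    degree_list.extend([d] * int(round(1000 * d ** -2.0)))

values, counts = degree_counts(degree_list)
slope, intercept = loglog_slope(values, counts)
```

A heavily skewed distribution like this one is also why the average degree says little about a typical node.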
Effective Diameter
Count vs Indegree
Count vs Outdegree
Hop-plot
Stress
“Network value”
Eigenvalue vs Rank
NO
MAYBE
YES
finds outliers quickly,
with no human intervention
www.cs.cmu.edu/~christos
http://www.postech.ac.kr/~bkyi/