Dataset Shift Detection in Non-Stationary Environments using EWMA Charts
Prof. Girijesh Prasad
Co-authors: Haider Raza, Yuhua Li
School of Computing & Intelligent Systems @ Magee, Faculty of Computing & Engineering, Derry~Londonderry
Outline: Motivation
Dataset shift detection: (Shewhart 1939), (Alippi et al. 2011b), (Alippi & Roveri 2008a; Alippi & Roveri 2008b), (Torres et al.), (M Krauledat 2008), (Sugiyama et al. 2009)
Is the assumption that training and test data follow the same distribution really true?
No, not always!
Reason: non-stationary environments!
What is the challenge?
Dataset shift appears when the training and test joint distributions are different, that is, when $P_{tra}(y, x) \neq P_{tst}(y, x)$ (Torres, 2012).
*Note: relationship between covariates (x) and class label (y):
X→Y: predictive model (e.g., spam filtering)
Y→X: generative model (e.g., fault detection)
Types of Dataset Shift
Prior probability shift appears only in Y→X problems.
Concept shift appears in both X→Y and Y→X problems.
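For reference, the three shift types can be summarised with the usual definitions from the unifying framework of Moreno-Torres et al. (2012), which appears to be the work cited above as (Torres, 2012). Here $P_{tra}$ and $P_{tst}$ denote the training and test distributions; this is a sketch of the standard definitions, not a reproduction of the original slide.

```latex
\begin{aligned}
&\text{Covariate shift } (X \rightarrow Y): && P_{tra}(y \mid x) = P_{tst}(y \mid x), && P_{tra}(x) \neq P_{tst}(x) \\
&\text{Prior probability shift } (Y \rightarrow X): && P_{tra}(x \mid y) = P_{tst}(x \mid y), && P_{tra}(y) \neq P_{tst}(y) \\
&\text{Concept shift } (X \rightarrow Y): && P_{tra}(y \mid x) \neq P_{tst}(y \mid x), && P_{tra}(x) = P_{tst}(x) \\
&\text{Concept shift } (Y \rightarrow X): && P_{tra}(x \mid y) \neq P_{tst}(x \mid y), && P_{tra}(y) = P_{tst}(y)
\end{aligned}
```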
Detecting abrupt and gradual shifts in time-series data is called dataset shift detection.
Types of Shift-Detection
Types of Control Charts
The EWMA statistic is
$z(i) = \lambda\, x(i) + (1 - \lambda)\, z(i-1)$   (1)
where $\lambda$ is the smoothing constant ($0 < \lambda \le 1$).
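The EWMA recursion $z(i) = \lambda x(i) + (1-\lambda) z(i-1)$ can be sketched in a few lines of NumPy. The function name `ewma` and the choice of starting value (the first observation when no `z0` is given) are assumptions for illustration; a common alternative is to start from the in-control process mean.

```python
import numpy as np


def ewma(x, lam, z0=None):
    """Return z(i) = lam * x(i) + (1 - lam) * z(i-1) for every sample of x.

    z0 is the starting value z(0); if omitted, the first observation is used
    (an assumption; the process mean is another common choice).
    """
    if not 0 < lam <= 1:
        raise ValueError("lambda must satisfy 0 < lambda <= 1")
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    prev = x[0] if z0 is None else z0
    for i, xi in enumerate(x):
        prev = lam * xi + (1 - lam) * prev  # weighted update of the statistic
        z[i] = prev
    return z
```

For example, `ewma([2.0, 4.0], lam=0.5, z0=0.0)` gives z = [1.0, 2.5]. A small λ gives more weight to past values (smoother, slower to react), while λ = 1 reproduces the raw observations, which is the Shewhart-chart limit.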
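The EWMA is linked to the first-order integrated moving average (ARIMA(0,1,1)) model
$x(i) = x(i-1) + \epsilon(i) - \theta\, \epsilon(i-1)$
where $\epsilon(i)$ is a sequence of i.i.d. random signals with zero mean and constant variance.
Equation (1) with $\lambda = 1 - \theta$ is the optimal one-step-ahead prediction for this process.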
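The optimality claim can be checked numerically: if the EWMA $z(i) = \lambda x(i) + (1-\lambda) z(i-1)$ is run on an ARIMA(0,1,1) series with $\lambda = 1 - \theta$, the one-step-ahead errors $x(i) - z(i-1)$ reproduce the innovations $\epsilon(i)$ exactly, so no predictor can do better. This is a minimal sketch; the helper names, $\theta = 0.7$, Gaussian innovations and the zero initial conditions ($x(0) = \epsilon(0) = 0$, $z(0) = x(0)$) are assumptions chosen to make the identity exact.

```python
import numpy as np


def simulate_ima11(n, theta, sigma=1.0, seed=0):
    """Simulate x(i) = x(i-1) + eps(i) - theta * eps(i-1), with x(0) = eps(0) = 0."""
    rng = np.random.default_rng(seed)
    eps = np.concatenate(([0.0], rng.normal(0.0, sigma, n)))
    x = np.zeros(n + 1)
    for i in range(1, n + 1):
        x[i] = x[i - 1] + eps[i] - theta * eps[i - 1]
    return x, eps


def one_step_errors(x, lam):
    """EWMA one-step-ahead errors x(i) - z(i-1), starting from z(0) = x(0)."""
    z = x[0]
    errs = []
    for xi in x[1:]:
        errs.append(xi - z)               # error of the forecast z(i-1) for x(i)
        z = lam * xi + (1 - lam) * z      # update to z(i)
    return np.array(errs)


theta = 0.7
x, eps = simulate_ima11(500, theta)
errs = one_step_errors(x, lam=1 - theta)
print(np.allclose(errs, eps[1:]))  # True: the errors equal the innovations
```

The identity follows from $err(i) = (1-\lambda)\,err(i-1) + \epsilon(i) - \theta\,\epsilon(i-1)$, which reduces to $err(i) = \epsilon(i)$ when $\theta = 1 - \lambda$ and $err(1) = \epsilon(1)$.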
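The one-step-ahead errors are calculated as $err(i) = x(i) - z(i-1)$, where $z(i-1)$ is the EWMA prediction of $x(i)$ from Equation (1).
If the one-step-ahead error is normally distributed with standard deviation $\sigma_{err}$, then
$P\left[-Z_{\alpha/2}\,\sigma_{err} \le err(i) \le Z_{\alpha/2}\,\sigma_{err}\right] = 1 - \alpha$
where $Z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution.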
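A control chart on the one-step-ahead errors follows directly: $x(i)$ is flagged when it leaves the band $z(i-1) \pm L\,\sigma_{err}$, i.e. when $|err(i)| > L\,\sigma_{err}$. The slides do not give the SD-EWMA details (how $\sigma_{err}$ is estimated, or the values of $\lambda$ and $L$), so the function name `ewma_shift_points`, the defaults $\lambda = 0.2$ and $L = 3$, and the use of a fixed $\sigma_{err}$ estimated from the first `n_train` points are all hypothetical choices in this minimal sketch.

```python
import numpy as np


def ewma_shift_points(x, lam=0.2, L=3.0, n_train=50):
    """Return indices i where x(i) falls outside [z(i-1) - L*s, z(i-1) + L*s].

    s is the standard deviation of the one-step-ahead errors, estimated
    (an assumption) from the first n_train points and then held fixed.
    """
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    z[0] = x[0]
    for i in range(1, len(x)):
        z[i] = lam * x[i] + (1 - lam) * z[i - 1]  # EWMA statistic
    err = x[1:] - z[:-1]                          # err(i) = x(i) - z(i-1), i >= 1
    s = err[: n_train - 1].std(ddof=1)            # sigma_err from the training part
    ucl = z[:-1] + L * s                          # upper control limit for x(i)
    lcl = z[:-1] - L * s                          # lower control limit for x(i)
    out = (x[1:] > ucl) | (x[1:] < lcl)
    return np.flatnonzero(out) + 1                # convert to indices into x
```

For instance, on 300 samples from N(0, 1) followed by 100 samples from N(10, 1), index 300 is flagged. Because this sketch never resets the chart after an alarm, several consecutive points after a shift may also be flagged while $z$ catches up, and occasional false alarms are expected at roughly the rate implied by $L$.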
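Dataset 1 - Jumping Mean (D1): $y(t) = 0.6\, y(t-1) - 0.5\, y(t-2) + \epsilon_t$
where $\epsilon_t$ is noise with mean $\mu$ and standard deviation 1.5. The initial values are set as $y(1) = y(2) = 0$.
A change point is inserted every 100 time steps by setting the noise mean $\mu$ at time $t$ as
$\mu_N = 0$ for $N = 1$, and $\mu_N = \mu_{N-1} + N/16$ for $N = 2, 3, \ldots$
where $N$ is a natural number such that $100(N-1) + 1 \le t \le 100N$.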
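A generator for a D1-style series can be sketched as follows, assuming the autoregressive model $y(t) = 0.6\,y(t-1) - 0.5\,y(t-2) + \epsilon_t$ with Gaussian noise of standard deviation 1.5, $y(1) = y(2) = 0$, and mean schedule $\mu_1 = 0$, $\mu_N = \mu_{N-1} + N/16$ in 100-step segments. The function name `jumping_mean`, the Gaussian noise, and the default of 10 segments (the series length is not stated on the slides) are assumptions.

```python
import numpy as np


def jumping_mean(n_segments=10, seg_len=100, sigma=1.5, seed=0):
    """Simulate an AR(2) series whose noise mean jumps every seg_len steps."""
    rng = np.random.default_rng(seed)
    # mu_1 = 0, mu_N = mu_{N-1} + N/16 for N >= 2 (N is the 1-based segment index)
    means = np.empty(n_segments)
    mu = 0.0
    for N in range(1, n_segments + 1):
        if N > 1:
            mu += N / 16
        means[N - 1] = mu
    n = n_segments * seg_len
    y = np.zeros(n)                      # y(1) = y(2) = 0 (indices 0 and 1 here)
    for t in range(2, n):                # 0-based time; segment index is t // seg_len
        eps = rng.normal(means[t // seg_len], sigma)
        y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 2] + eps
    return y, means
```

With these defaults, the first four segment means are 0, 0.125, 0.3125 and 0.5625, so the jumps grow as $N$ increases.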
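Dataset 2 - Scaling Variance (D2): A change point is inserted every 100 time steps by setting the noise standard deviation $\sigma$ at time $t$ as
$\sigma = 1$ for $N = 1, 3, 5, \ldots$, and $\sigma = \ln(e + N/4)$ for $N = 2, 4, 6, \ldots$
where $N$ is a natural number such that $100(N-1) + 1 \le t \le 100N$.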
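A D2-style series can be sketched in the same way, with the noise standard deviation alternating between 1 (odd segments $N$) and $\ln(e + N/4)$ (even segments) every 100 steps. The slides do not restate the underlying model here, so reusing the AR(2) model $y(t) = 0.6\,y(t-1) - 0.5\,y(t-2) + \epsilon_t$ with zero-mean Gaussian noise, the function name `scaling_variance`, and the 10-segment default are all assumptions.

```python
import math

import numpy as np


def scaling_variance(n_segments=10, seg_len=100, seed=0):
    """Simulate an AR(2) series whose noise standard deviation changes every seg_len steps."""
    rng = np.random.default_rng(seed)
    # sigma_N = 1 for odd N, ln(e + N/4) for even N (N is the 1-based segment index)
    sigmas = np.array([1.0 if N % 2 == 1 else math.log(math.e + N / 4)
                       for N in range(1, n_segments + 1)])
    n = n_segments * seg_len
    y = np.zeros(n)
    for t in range(2, n):
        eps = rng.normal(0.0, sigmas[t // seg_len])  # zero-mean noise (assumption)
        y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 2] + eps
    return y, sigmas
```

Because only the spread changes while the mean stays fixed, this kind of shift is harder for a mean-tracking chart to pick up than the D1 jumps.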
Dataset 3 - Positive Auto-correlated (D3): The dataset consists of 2000 data points; the non-stationarity occurs in the middle of the data stream, where the data shift from one normal distribution to another. Here $\mathcal{N}(x; \mu, \sigma)$ denotes the normal distribution with mean $\mu$ and standard deviation $\sigma$.
Dataset 4 - Auto-correlated (D4): The dataset is a time series of 2000 data points generated with MATLAB's 1-D digital filter function (filter), which implements a standard difference equation in direct form II transposed. The denominator coefficient of the filter is changed from 2 to 0.5 after the first 1000 points.
Real-world Dataset: EEG-Based Brain Signals
The real-world data used here are from BCI Competition III, dataset IV-b. This dataset contains 2 classes and
118 EEG channels (0.05-200 Hz) recorded at a 1000 Hz sampling rate, down-sampled to 100 Hz, with 210 training trials
and 420 test trials.
Figure: PDF plot of data from 3 different sessions of the training dataset. The plot shows that the distribution changes from session to session through a shift in the mean.
Figure: Shift detection based on SD-EWMA, Dataset 1 (jumping mean): (a) a shift is detected at every 100th point. (b) Zoomed view of (a): the shift is detected at the 401st sample, when the upper control limit is crossed.
Figure: Shift detection based on SD-EWMA: (a) Dataset 2 (scaling variance): shifts are detected at 3 points.
(b) Dataset 3 (positive auto-correlated): the shift is detected after 1000 observations.
(c) Dataset 4 (auto-correlated): the shift is detected after 1000 observations.
Table: SD-EWMA shift detection in time-series data
Table: Simulation results on different tests
Figure 4: A window of 2000 samples obtained from the real-world dataset.
Table 4: SD-EWMA shift detection in BCI data
Thank you!