Some Ideas for Detecting Spurious Observations Based on Mixture Models

### Some Ideas for Detecting Spurious Observations

Jim Lynch

NISS/SAMSI & University of South Carolina

Work with Dave Dickey and Francisco Vera

Very Preliminary Ideas

Primarily motivated by Dave’s American Airlines data and Proschan’s (1963) paper on pooling to explain a decreasing failure rate and, to a lesser extent, by M. J. Bayarri’s talk on multiple testing

Outline

- 1. Introduction
- 2. Mixture Models
- 3. Some Ideas
- 4. Simulations
- 5. The American Airlines Data

Introduction: Some Motivation - AA Data (Largest Log Vol Removed)

- Some Time Series Diagnostics Suggest That Log Volume Ratio is an MA(1)
- Fit an MA(1) to the log Vol Ratio to the AA Data
- Look At The Residuals

Introduction

- Detecting spurious observations is an important area of research and has implications for anomaly detection (AD).
- The term spurious observation is used to distinguish it from an outlier, since outliers are usually extreme observations in the data while a spurious observation need not be.
- E.g., one could imagine that sophisticated intruders into computer systems would make sporadic intrusions and try to mimic normal behavior as closely as possible.

Introduction

- Goal
- To develop approaches to detect very transient spurious events where the objectives are
- To detect when there are spurious events present and, if possible,
- To identify them

Introduction

- The Basic Data Analytic Model
- $X_1,\dots,X_n$ iid $\sim f_p = (1-p) f_0 + p f_1$
- $f_0$ is the background model
- $f_1$ models the spurious behavior
- The likelihood is then $L(p) = \prod_{i=1}^{n} \left[(1-p) f_0(X_i) + p f_1(X_i)\right]$
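The basic mixture likelihood can be evaluated numerically. The following is a minimal sketch, assuming normal densities for $f_0$ and $f_1$ and a small made-up data set (the densities, the data, and the function names are illustrative assumptions, not taken from the talk). It computes the log likelihood in $p$ and finds the mle by a crude grid search.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_loglik(p, xs, f0, f1):
    """log L(p) = sum_i log[(1 - p) f0(x_i) + p f1(x_i)]."""
    return sum(math.log((1.0 - p) * f0(x) + p * f1(x)) for x in xs)

# Assumed densities, for illustration only.
def f0(x):
    return normal_pdf(x, 0.0)   # background model

def f1(x):
    return normal_pdf(x, 3.0)   # spurious model

xs = [-0.35, 0.58, -0.93, 0.12, 3.4]   # hypothetical data

# Crude grid search for the mle of p on [0, 0.999].
grid = [k / 1000 for k in range(1000)]
p_hat = max(grid, key=lambda p: mixture_loglik(p, xs, f0, f1))
print(p_hat, mixture_loglik(p_hat, xs, f0, f1))
```

The log likelihood is concave in $p$, so a grid search is adequate for a sketch; an EM iteration or a one-dimensional root finder on the score would be the usual choice in practice.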

Introduction

- A “More Realistic” Model
- Generate a configuration C with probability p(C)
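- Given $C$, for $i \in C$, the $X_i$ are iid $\sim f_0$ and, for $i \in C^c$, the $X_i$ are iid $\sim f_1$
- $C$ and $C^c$ model a spatial or temporal (e.g., a change-point) pattern
- You are “pooling” observations based on the configuration $C$
- The likelihood is then $L = \sum_{C} p(C) \prod_{i \in C} f_0(X_i) \prod_{i \in C^c} f_1(X_i)$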

Introduction: Some Approaches for Analyzing the “MR” Model

- Envision that the data are the effects of pooling observations from f0 and f1.
- Treat the data as if it is from a mixture model and use a mixture model to determine the mle, p*, of the mixing proportion.
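- Use $p^*$ to test $H_0: p=0$ versus $H_1: p>0$. (Under $H_0$ and the mixture model, $n^{1/2} p^*$ converges in distribution to $X$, where $X=0$ with probability .5 and $X=|N(0, I_0^{-1})|$ with probability .5.)
- If $H_0$ is rejected, see if the mixture model can give insights into the configuration $C_j$.
- E.g., do an empirical Bayes analysis with prior $p(C_j) = (1-p^*)^{j} (p^*)^{n-j}$, where $j$ is the number of observations in $C_j$. Then the posterior satisfies $p(C_j \mid X_1,\dots,X_n) \propto \prod_{i \in C_j} (1-p^*) f_0(X_i) \prod_{i \in C_j^c} p^* f_1(X_i)$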
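Because the empirical Bayes prior is a product over observations, the posterior over configurations factorizes, and each observation gets its own posterior probability of having come from $f_1$: $p^* f_1(X_i) / [(1-p^*) f_0(X_i) + p^* f_1(X_i)]$. A minimal sketch of that calculation follows; the normal densities, the value of `p_star`, and the data are assumptions chosen only for illustration.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_spurious(x, p_star, f0, f1):
    """P(observation came from f1 | x) under the product prior."""
    num = p_star * f1(x)
    return num / ((1.0 - p_star) * f0(x) + num)

# Assumed densities and plug-in mixing proportion (hypothetical values).
def f0(x):
    return normal_pdf(x, 0.0)

def f1(x):
    return normal_pdf(x, 3.0)

p_star = 0.1
xs = [-0.5, 0.2, 1.5, 3.1]   # hypothetical data

probs = [posterior_spurious(x, p_star, f0, f1) for x in xs]
for x, pr in zip(xs, probs):
    print(x, pr)
```

At a point where $f_0$ and $f_1$ are equal (here $x = 1.5$), the posterior probability simply returns the prior weight `p_star`.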

Introduction: Another Approach

- Since $f_1$ models the spurious (sporadic) behavior, $p \approx 0$.
- $p \approx 0$ suggests using the locally most powerful (LMP) test statistic for testing $H_0: p=0$ versus $H_1: p>0$ as the basis for discovering whether spurious observations are present.
- The test statistic is essentially related to the gradient plot introduced by Lindsay (1983) to determine when a finite mixture mle is the global mixture mle in the mixed distribution model.
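For a concrete feel, the LMP statistic for $H_0: p=0$ is the sum of $f_1(X_i)/f_0(X_i) - 1$ over the data. The sketch below assumes $f_0 = N(0,1)$ and $f_1 = N(\mu,1)$ with $\mu = 1$ and uses made-up data. Under these assumed normals each term has mean 0 and variance $I_0 = e^{\mu^2} - 1$ under $H_0$, which gives a rough normal approximation for a p-value; that approximation is an added convenience here, not something stated in the talk.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def lmp_statistic(xs, f0, f1):
    """Sum over the data of (f1(x) - f0(x)) / f0(x)."""
    return sum(f1(x) / f0(x) - 1.0 for x in xs)

mu = 1.0   # assumed shift of the spurious component

def f0(x):
    return normal_pdf(x, 0.0)

def f1(x):
    return normal_pdf(x, mu)

xs = [0.3, -1.2, 0.8, 2.4, -0.1, 1.9, 0.4, -0.6]   # hypothetical data

S = lmp_statistic(xs, f0, f1)

# Variance of one term under H0 for the assumed normal pair.
i0 = math.exp(mu * mu) - 1.0
z = S / math.sqrt(len(xs) * i0)

# One-sided upper-tail normal p-value, P(Z > z).
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
print(S, z, p_value)
```

The likelihood ratio $f_1/f_0$ is heavy tailed, so for small $n$ the normal approximation is crude and a simulated null distribution would be safer.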

Introduction: Another Approach

- The basis of this approach
- Use the gradient plot to determine whether the one-point mixture mle is the global mixture mle.
- When it isn’t, this suggests that some spurious behavior is present.
- One can then use the components of the mle mixed distribution to calculate “assignment probabilities” for the data, indicating which observations might be considered spurious.
- The examples indicate that detecting the presence of spurious observations seems to be considerably simpler than identifying which ones they are

Introduction: Mining Data Graphs

- Data (Maguire, Pearson and Wynn, 1952): Time Between Accidents with 10 or more fatalities
- At the right are the gradient plots for the 2 and 3 point mixture mle’s and the assignment function for the 3 pt mle (mixing over exponentials)
- The 2 and 3 pt mixture mle’s
- 2-point mle: means m = (592.9, 166.2), weights p = (.175, .825)
- 3-point mle: means m = (595.5, 171.6, 29.1), weights p = (.171, .806, .023)
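The assignment function for a mixture of exponentials gives, for an observed time $x$, the posterior probability of each component. A minimal sketch using the 3-point mle reported above; the evaluation points 5, 100 and 1500 are hypothetical values chosen for illustration, not observations from the accident data.

```python
import math

def exp_pdf(x, mean):
    """Exponential density with the given mean, evaluated at x >= 0."""
    return math.exp(-x / mean) / mean

def assignment_probs(x, means, weights):
    """Posterior probability of each mixture component given x."""
    joint = [w * exp_pdf(x, m) for m, w in zip(means, weights)]
    total = sum(joint)
    return [j / total for j in joint]

# 3-point mixture mle reported on the slide.
means = [595.5, 171.6, 29.1]
weights = [0.171, 0.806, 0.023]

for x in (5.0, 100.0, 1500.0):   # hypothetical inter-accident times
    print(x, [round(pr, 3) for pr in assignment_probs(x, means, weights)])
```

Very long gaps are attributed mostly to the component with the largest mean, while short gaps raise the posterior weight of the small-mean component above its prior weight.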

Mixture Models

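- $X_1,\dots,X_n$ iid $\sim f_p = (1-p) f_0 + p f_1$
- $f_0$ is the background model
- $f_1$ models the spurious behavior
- Since the spurious observations are sporadic/transient, $p \approx 0$
- Denote the log likelihood by $\phi(f(X_1),\dots,f(X_n)) = \phi(f) = \log \prod_i f(X_i)$
- Denote the gradient function of $\phi$ by $\Phi(f_1; f_0) = \lim_{\epsilon \downarrow 0} \frac{\phi((1-\epsilon) f_0 + \epsilon f_1) - \phi(f_0)}{\epsilon} = \sum_{i=1}^{n} \frac{f_1(X_i) - f_0(X_i)}{f_0(X_i)}$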

Mixture Models – LMP

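- Lemma: The locally most powerful test for testing $H_0: p=0$ versus $H_1: p>0$ is based on the gradient function $\Phi(f_1; f_0)$.
- Proof: The LMP test for testing $H_0: p=p_0$ versus $H_1: p>p_0$ is based on the statistic $\left.\frac{\partial}{\partial p} \log \prod_i f_p(X_i)\right|_{p=p_0} = \sum_{i=1}^{n} \frac{f_1(X_i) - f_0(X_i)}{f_{p_0}(X_i)}$

For $p_0=0$ this reduces to $\sum_{i=1}^{n} \frac{f_1(X_i) - f_0(X_i)}{f_0(X_i)} = \Phi(f_1; f_0)$.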

Mixture Model

- The function $\Phi(f_1; f_0)$
- Plays a prominent role in the analysis of data from mixture models, where it is essentially the gradient function.
- Introduced by Lindsay (1983a&b and 1995) to determine when the mle for the mixing distribution with a finite number of points was the global mixture mle.

Mixture Model: Framework

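- Family of densities $\{f_\theta : \theta \in \Theta\}$.
- $\mathcal{M}$ is the set of probability measures on $\Theta$.
- The mixed distribution over the family with mixing distribution $Q$ is given by $f_Q(x) = \int f_\theta(x)\, dQ(\theta)$
- For $X_1,\dots,X_n$ iid from $f_Q$, the likelihood and log likelihood are given by
- $L(Q) = \prod_i f_Q(X_i)$ and $\phi(\mathbf{f}_Q) = \log \prod_i f_Q(X_i)$
- where $\mathbf{f}_Q = (f_Q(X_1),\dots,f_Q(X_n))$.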

Mixture Model: Framework

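- The Directional Derivative
- Of the log likelihood at $f_{Q_0}$ in the direction of $f_{Q_1}$: $\Phi(f_{Q_1}; f_{Q_0}) = \sum_{i=1}^{n} \frac{f_{Q_1}(X_i) - f_{Q_0}(X_i)}{f_{Q_0}(X_i)}$
- For a point mass at $\theta$, write $D(\theta; Q) = \Phi(f_\theta; f_Q) = \sum_{i=1}^{n} \left[\frac{f_\theta(X_i)}{f_Q(X_i)} - 1\right]$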

Mixture Model: A Diagnostic

- Theorem 4.1 of Lindsay (1983a)
- A. The following three conditions are equivalent:
- $Q^*$ maximizes $L(Q)$
- $Q^*$ minimizes $\sup_\theta D(\theta; Q)$
- $\sup_\theta D(\theta; Q^*) = 0$.
- B. Let $f^* = f_{Q^*}$. The point $(f^*, f^*)$ is a saddle point of the gradient function $\Phi$, i.e.,

$\Phi(f_{Q'}; f^*) \le 0 = \Phi(f^*; f^*) \le \Phi(f^*; f_{Q''})$ for $Q', Q'' \in \mathcal{M}$.

- C. The support of $Q^*$ is contained in the set of $\theta$ for which $D(\theta; Q^*) = 0$.

Simulations: n=10; 5 points N(0,1), 5 points N(1,1) (each row: component label, value)

- 0 -0.34964
- 0 -1.77582
- 0 -0.92900
- 0 0.58061
- 0 -0.36032
- 1 2.51937
- 1 0.59549
- 1 1.16238
- 1 0.76632
- 1 1.57752
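Lindsay's diagnostic can be tried on the ten simulated values listed above. The sketch below assumes the unit-variance normal location family $N(\theta, 1)$, for which the one-point mle is a point mass at the sample mean, and evaluates $D(\theta; Q) = \sum_i [f_\theta(X_i)/f_Q(X_i) - 1]$ on a grid of $\theta$ values. The grid range and the function names are illustrative choices.

```python
import math

# The ten simulated values from the slide (labels dropped).
xs = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
      2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def gradient(theta, xs, support, weights):
    """D(theta; Q) for a discrete Q with the given support and weights."""
    total = 0.0
    for x in xs:
        f_q = sum(w * normal_pdf(x, t) for t, w in zip(support, weights))
        total += normal_pdf(x, theta) / f_q - 1.0
    return total

# One-point mle in the N(theta, 1) family: point mass at the sample mean.
x_bar = sum(xs) / len(xs)

# Evaluate the gradient function for theta from -3 to 4 in steps of 0.01.
grid = [-3.0 + 0.01 * k for k in range(701)]
values = [gradient(t, xs, [x_bar], [1.0]) for t in grid]
max_d = max(values)
theta_at_max = grid[values.index(max_d)]
print(x_bar, theta_at_max, max_d)
```

By part A of the theorem, a strictly positive maximum of the gradient function means the one-point fit is not the global mixture mle; a maximum of zero (attained at the sample mean) means it is.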

AA Data

- Francisco will discuss this and some other simulations in a moment.

Closing Comments

- Is there an analogue (or alternative) of these ideas for the SCAN (or for the SCAN framework)?
- As an alternative, view the problem as having several (two) mechanisms creating observations
- background
- infectious material is present.
- Just consider that the data are a pooling from all these sites. See if the data are a 2-component mixture. If they are, try to “assign” the sites to these components. (You might use a thresholding of the assignment function to do this, or p in the LMP test statistic.)
- Instead of the assignment function, consider the following, based on the LMP test statistic. Define $L_i = (f_1(X_i) - f_0(X_i))/f_0(X_i)$. Let $L_{(1)} < L_{(2)} < \dots < L_{(n)}$ be the ordered values and let $j(i)$ denote the inverse rank, i.e., $L_{(i)} = L_{j(i)}$. For mixture or scanning purposes, consider the sets $C_i = \{j(n), \dots, j(n-i+1)\} = \{k : L_{(n-i+1)} \le L_k\}$. For mixtures with mle $p^*$, assign $C_i$ to $f_1$ and $C_i^c$ to $f_0$, where $i \approx np^*$. For scanning purposes, look through the increasing sequence of sets $C_i$ for a spatial pattern to emerge.
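The ranking construction can be sketched as follows, using the ten simulated values from the Simulations slide with $f_0 = N(0,1)$ and $f_1 = N(1,1)$. The value `p_star = 0.3` is a hypothetical plug-in, not an estimate from the talk, and indices are 0-based positions in the list.

```python
import math

xs = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
      2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def lmp_scores(xs, f0, f1):
    """L_i = (f1(x_i) - f0(x_i)) / f0(x_i) for each observation."""
    return [(f1(x) - f0(x)) / f0(x) for x in xs]

def top_sets(scores):
    """Nested sets C_1, C_2, ..., C_n: indices of the i largest scores."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return [order[:i] for i in range(1, len(order) + 1)]

def assign_to_f1(scores, p_star):
    """Indices assigned to f1: the set C_i with i close to n * p_star."""
    i = round(len(scores) * p_star)
    return set(top_sets(scores)[i - 1]) if i > 0 else set()

def f0(x):
    return normal_pdf(x, 0.0)

def f1(x):
    return normal_pdf(x, 1.0)

scores = lmp_scores(xs, f0, f1)
p_star = 0.3   # hypothetical mixing-proportion estimate
flagged = assign_to_f1(scores, p_star)
print(sorted(flagged))
```

For scanning, one would step through `top_sets(scores)` in order and look at where the flagged indices fall in space or time rather than fixing a single cut-off.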

REFERENCES

Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretical Approach. Academic Press, NY.

Grego, J., Hsi, Hsiu-Li, and Lynch, J. D. (1990). A strategy for analyzing mixed and pooled exponentials. Applied Stochastic Models and Data Analysis, 6, 59-70.

Lindsay, B. G. (1983a). The geometry of mixture likelihoods: a general theory. Ann. Statist., 11, 86-94.

Lindsay, B. G. (1983b). The geometry of mixture likelihoods, Part II: the exponential family. Ann. Statist., 11, 783-792.

Lindsay, B. G. (1995). Mixture Models: Theory, Geometry & Applications. NSF-CBMS lecture series, IMS/ASA.

Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. (1952). The time interval between industrial accidents. Biometrika, 39, 168-180.

Proschan, F. (1963). Theoretical explanation of decreasing failure rate. Technometrics, 5, 375-383.
