Simple Method for Outlier Detection in Fitting Experimental Data under Interval Error

Simple Method for Outlier Detectionin Fitting Experimental Dataunder Interval Error Sergei Zhilin, sergei@asu.ru Altai State University,Barnaul, Russia

Plan • Fitting under interval error • Simple method for outlier detection • Geometric correction of satellite images • Connections between the proposed approach and other theories • Conclusions

x1 f(x,) x2  y + … xp  Fitting under Interval Error • Black box approach Output variable ymeasured with error Input variables x = (x1,…,xp) measured without error Modeling function with known structure Model parametersto be estimated Measurement error

Fitting under Interval Error • Classical statistical approach often assumes that the measurement error is normal • In real-life applications the error is rather interval than normal • “Interval” means “unknown but bounded”: •   [j, j], where j is error bound in j-th measurement, j=1,…,n • There are no other assumptions about the error

Each row (xj,yj,j) of the measurements table constrains possible values of the parameter with the set • Values of the parameter  consistent with all constraints form the uncertainty set Fitting under Interval Error • The structure of the modeling functionf (x,) is assumed fixed

Fitting under Interval Error • Fitting data with the model y = 1 + 2x In (x, y) domain In (1, 2) domain Uncertainty set A is unbounded = not enough data to build the model y 2 Setof feasible models Uncertainty set A Set of feasible models Uncertainty set A 1 x

Interval estimates of  • Point estimates of  Fitting under Interval Error • Problems that may be stated with respect to the uncertainty set A • Model parameters estimation

Interval estimate of y • Point estimate of y Fitting under Interval Error • Problems that may be stated with respect to the uncertainty set A • Prediction of the output variable value for fixed values of input variables

Fitting under Interval Error • All the above problems make sense only if the uncertainty set is not empty • Possible reasons of the emptiness of the uncertainty set • Presence of outliers in the data set • Wrong structure assumed for the modeling function

Simple method for outlier detection • Core idea • An outlier may be treated as a measurement with the underestimated error (i.e. the actual measurement error is greater than the declared error j for it) • What are the lower bounds j' for actual errors which provide non-empty uncertainty set?

j' Simple method for outlier detection • How much must we stretch the declared error interval in order to «correct» an outlier? In variables domain In parameters domain Let j' = wj·j wj =? y 2 j 1 x

(3) (3) (4) (5) Simple method for outlier detection • Weights wj may be found from the following optimization problem (1) (2) Uncertainty set constraints with movable bounds …or “freeze” some of error intervals We can only enlarge error intervals… Some of the measurements are obtained with equal errors

Simple method for outlier detection Data with outliers which give empty uncertainty set Looks like outlier caused by a blunder. Let’s try to exclude it. Not so explicit.We need to examine the precision of method C 1st attemptSolution of LPP (1)-(3) • Example y = 1 + 2x y x

Simple method for outlier detection 2nd attemptSolution of (1) subject to (2)-(3) and w8 = w9 Is the precision of the method C overestimated on ~14%? • Example y = 1 + 2x Summary In order to correct inconsistent data set we have to answer the following questions: 1. Is the outlier #3 really caused by a blunder? 2. Is the outlier #8 caused by a blunder OR is the precision of the method C overestimated? y x

u x + + + v y + + + + + + + Ground Control Points # Source coordinates Target coordinates x y u v 1 2935 3072 14486.30 5991.49 2 2045 2745 14349.30 5927.55 14 1795 2714 14309.60 5919.30 Geometric correction ofsatellite images Distorted image Target coordinate system Geometric transformation + + + Pointed by operatoron the screen with the error ≥ 1 pixel Obtained usinghigh-precision methods (GPS, large-scale maps) After correction of outliers and building transformation, target image is built Outliers are detected «on the fly» and operator is noticed about error

Positional uncertainty (x  x)+(y  y), pixels Geometric correction ofsatellite images Resulting image with ground control points Resulting image with positional uncertainty map

Connections with other theories • Proposed approach andinconsistent linear programming problems • When outliers are presented in the data, most of the problems with respect to the uncertainty set may be stated as inconsistent linear programming problems • Simple outlier detection method may be regarded as one of the possible ways to correct an inconsistent linear programming problem by building a minimal cost approximation by a proper linear programming problem.

(3) (3') Connections with other theories • Proposed approach and robust estimation (1) (2) Uncertainty set constraints with movable bounds • Solution (*, w*) of (1)-(3') gives • * isM-estimator for parameters  (known as L1) • Weight function: W(x) = 1/|x|. • Residuals: wj*·j. We can only enlarge error intervals… We allow to scale error intervals freely (to expand and to contract)

Conclusions • Outlier detection is necessary tool in fitting experimental data • Interval error model provides effective means of solving outliers detection problem • Proposed approach is based on the simple idea and may be simply implemented • Proposed approach provides flexible way to express and take into account a priori information

Simple Method for Outlier Detection in Fitting Experimental Data under Interval Error