http://www.fsv.cuni.cz Charles University Founded 1348

World Congress of the Bernoulli Society, Barcelona, 25. - 31. 7. 2004
LEAST WEIGHTED SQUARES FOR PANEL DATA
Jan Ámos Víšek
http://samba.fsv.cuni.cz/~visek/bernoulli
Institute of Economic Studies, Faculty of Social Sciences, Charles University, Prague
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic

Topic of presentation
● Paradigm of robust estimation (which the Least Weighted Squares fulfill)
● Definition of the Least Weighted Squares
● Their properties
● Algorithm for their evaluation

Robust regression
Requirements on an estimator of regression coefficients naturally inherited from classical statistics
Nearly "automatically" fulfilled for "classical" estimators, hence frequently unduly ignored in robust regression
● Unbiasedness
● Consistency
● Asymptotic normality
● Reasonably high efficiency
Nearly impossible to fulfill for robust estimators, hence abandoned
● Scale- and regression-equivariance
E.g. simple M-estimators lack this property; for discussion see
Bickel, P.J. (1975): One-step Huber estimates in the linear model. JASA 70, 428-433.
Jurečková J., P.K. Sen (1984): On adaptive scale-equivariant M-estimators in linear models. Statistics and Decisions, vol. 2 (1984), Suppl. Issue No. 1.
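For reference, the two equivariance properties named above are usually written as follows. This is a sketch in notation assumed here (it is not taken from the slide): T(X, Y) stands for a generic estimator of the regression coefficients computed from the design matrix X and the response vector Y, and p is the number of coefficients.

```latex
% Scale equivariance: rescaling the response rescales the estimate.
\[ T(X, cY) = c \, T(X, Y) \quad \text{for every constant } c \]
% Regression equivariance: adding a linear function of the carriers shifts the estimate.
\[ T(X, Y + Xb) = T(X, Y) + b \quad \text{for every } b \in \mathbb{R}^{p} \]
```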

Robust regression
Requirements on an estimator of regression coefficients naturally stemming from principles of robustness
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, W. A. Stahel (1986): Robust Statistics - The Approach Based on Influence Functions. New York: J. Wiley & Sons.
● Quite low gross-error sensitivity
● Low local shift sensitivity
● Preferably finite rejection point
● High breakdown point
Let's call these four points Hampel's paradigm.
Applications have indicated that "high" should be replaced by "controllable", see e.g.
Víšek, J. Á. (2003): Development of the Czech export in nineties. In: Consolidation of governing and business in the Czech republic and EU I., 193 - 220, ISBN 80-86732-00-2, MatFyz Press.
If interested, ask me to send it by e-mail.
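To illustrate why the breakdown point appears in this list, here is a minimal sketch showing that a single contaminated response can move the ordinary least squares slope arbitrarily far. The data, the helper names `ols_line` and `slope_with_outlier`, and the contamination values are all hypothetical and chosen only for illustration.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [float(i) for i in range(10)]
y_clean = [1.0 + 2.0 * xi for xi in x]


def slope_with_outlier(value):
    """Replace the last response by `value` and refit by OLS."""
    y = y_clean[:-1] + [value]
    return ols_line(x, y)[1]


# The first value is the uncontaminated response; the others are gross errors.
for value in (19.0, 1e2, 1e4, 1e6):
    print(value, slope_with_outlier(value))
```

The slope grows without bound as the single contaminated value grows, which is the behaviour a positive breakdown point is meant to exclude.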

Robust regression
Requirements on an estimator of regression coefficients naturally stemming from ..... - a comment
A high breakdown point may sometimes be self-destructive.
Let us agree, for a while, that the majority of data determines the "true" model. Then even a small change of one observation can cause a large change of the estimate.
What is the problem? The method relies too much on the selected "true" points!
Hence, it may be preferable to reject observations "smoothly".

Robust regression
Requirements on an estimator of regression coefficients (nearly) inevitable for successful applications
Víšek, J.Á. (2000): A new paradigm of point estimation. Proc. of Data Analysis 2000/II, Modern Statistical Methods - Modeling, Regression, Classification and Data Mining, ISBN 80-238-6590-0, 195 - 230.
If interested, ask me to send it by e-mail.
● Available diagnostics, sensitivity studies and accompanying procedures
● Existence of an implementation of the algorithm with acceptable complexity and reliability of evaluation
● An efficient and acceptable heuristics
Let's discuss them point by point.

Requirements on a robust estimator of regression coefficients (nearly) inevitable for successful applications
Available diagnostics, sensitivity studies and ......
Víšek, J.Á. (2003): Estimating contamination level. Proc. Fifth Pannonian Sympos. on Math. Statist., Visegrad, Hungary 1985, 401 - 414.
Víšek, J.Á. (1998): Robust specification test. Proc. Prague Stochastics'98 (eds. M. Hušková, P. Lachout, Union of Czechoslovak Mathematicians and Physicists), 1998, 581 - 586.
Víšek, J.Á. (2001): Durbin-Watson statistic for the least trimmed squares. Bulletin of the Czech Econometric Society, vol. 8, 14/2001, 1 – 40.
Víšek, J.Á. (2002): White test for the least weighted squares. COMPSTAT 2002, Berlin, Short Communications and Poster (CD), ISBN 3-00-009819-4 (eds. S. Klinke, P. Ahrend, L. Richter).
Víšek, J.Á. (2003): Durbin-Watson statistic in robust regression. Probability and Mathematical Statistics, vol. 23, Fasc. 2 (2003), 435 - 483.
Kalina, J. (2003): Autocorrelated disturbances of robust regression. European Young Statistician Meeting 2003 – to appear.
If interested, ask me to send them by e-mail.

Requirements on a robust estimator of regression coefficients (nearly) inevitable for successful applications
Available diagnostics, sensitivity studies and accompanying procedures
Jurečková J., J. Á. Víšek (1984): Sensitivity of Chow-Robbins procedure to the contamination. Commun. Statist. - Sequential Analys. 1984, 3 (2), 175 - 190.
Víšek, J.Á. (1986): Sensitivity of the test error probabilities with respect to the level of contamination in general model of contaminacy. J. Statist. Planning and Inference 14 (1986), 281 - 299.
Víšek, J.Á. (1996): Sensitivity analysis of M-estimates. Ann. Inst. Statist. Math., 48 (1996), 469 - 495.
Víšek, J.Á. (1997): Contamination level and sensitivity of robust tests. Handbook of Statist. 15, 633 – 642 (eds. G. S. Maddala & C. R. Rao) Amsterdam: Elsevier Science B. V.
Víšek, J.Á. (2002): Sensitivity analysis of M-estimates of nonlinear regression model: Influence of data subsets. Ann. Inst. Statist. Math., 54, 2, 261 - 290.
If interested, ask me for reprints.

Requirements on a robust estimator of regression coefficients (nearly) inevitable for successful applications
Available diagnostics, sensitivity studies and accompanying procedures
Víšek, J.Á. (1998): Robust instruments. Proc. Robust'98 (ed. J. Antoch & G. Dohnal) Union of Czechoslovak Mathematicians and Physicists, 195 - 224.
Víšek, J.Á. (2000): Robust instrumental variables and specification test. Proc. PRASTAN 2000, ISBN 80-227-1486-0, 133 - 164.
Víšek, J.Á. (1996): Selecting regression model. Probability and Mathematical Statistics 21, 2 (2001), 467 – 492.
Víšek, J.Á. (1997): Robustifying instrumental variables. Submitted to COMPSTAT 2004.
If interested, ask me to send them by e-mail.

Requirements on a robust estimator of regression coefficients (nearly) inevitable for successful applications
Existence of an implementation of the algorithm with acceptable complexity and reliability of evaluation
Hettmansperger, T.P., S. J. Sheather (1992): A Cautionary Note on the Method of Least Median Squares. The American Statistician 46, 79-83.
Engine knock data - treated by the Least Median of Squares
Number of observations: 16
Response variable: number of knocks of an engine
Explanatory variables:
- the timing of sparks
- air / fuel ratio
- intake temperature
- exhaust temperature
A small change (7.2%) of one value in the data caused a large change of the estimates. The results were due to the bad algorithm they used. They are on the next page.

Requirements on a robust estimator of regression coefficients (nearly) inevitable for successful applications
Existence of an implementation of the algorithm with ....
[Table: Engine knock data - results by Hettmansperger and Sheather; row: minimized squared residual]
A new algorithm, based on the simplex method, was available nearly immediately, although it was published a bit later.
Boček, P., P. Lachout (1995): Linear programming approach to LMS-estimation. Mem. vol. Comput. Statist. & Data Analysis 19 (1995), 129 - 134.
It indicates that the reliability of the algorithm and of its implementation is crucial.

Requirements on a robust estimator of regression coefficients (nearly) inevitable for successful applications
An efficient and acceptable heuristics (?)
In 1989 Martin et al. studied estimators minimizing their maximal bias:
Martin, R. D., V. J. Yohai, R. H. Zamar (1989): Min-max bias robust regression. Ann. Statist. 17, 1608 - 1630.
- the maximum was taken over some set of underlying d.f.'s and the minimum over possible estimators,
- it seems a quite acceptable heuristics; unfortunately, it does not work,
- for an example of data for which the min-max estimator failed, see Víšek, J.Á. (2000): On the diversity of estimates. CSDA 34, (2000) 67 - 89,
- the problem is that the method implicitly takes the maximum over an "unexpected" set of d.f.'s.
But papers like
Hansen, L. P. (1982): Large sample properties of generalized method of moments estimators. Econometrica, 50, no 4, 1029 - 1054.
hint that, in the case of sufficient "demand for data-processing", we may "cope" without any heuristics.

The least weighted squares
Víšek, J.Á. (2000): Regression with high breakdown point. ROBUST 2000, 324 – 356, ISBN 80-7015-792-5.
If interested, ask me to send it by e-mail.
Weight function: non-increasing, absolutely continuous
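The defining formula of this slide is not preserved in the transcript. As a hedged reminder, the form commonly used in the least weighted squares literature is sketched below; the symbols (residuals r_i, weight function w, sample size n, dimension p) are notation assumed here, not copied from the slide.

```latex
% Residual of the i-th observation for a candidate coefficient vector beta.
\[ r_i(\beta) = Y_i - X_i^{\top}\beta, \qquad
   r^2_{(1)}(\beta) \le r^2_{(2)}(\beta) \le \dots \le r^2_{(n)}(\beta) \]
% Weights are attached to the order statistics of the squared residuals,
% with w : [0,1] -> [0,1] non-increasing and absolutely continuous.
\[ \hat{\beta}^{(LWS,n,w)} = \arg\min_{\beta \in \mathbb{R}^{p}}
   \sum_{i=1}^{n} w\!\left(\frac{i-1}{n}\right) r^2_{(i)}(\beta) \]
```

With a weight function equal to 1 up to some fraction of the observations and 0 afterwards, this reduces to the least trimmed squares (ignoring the continuity requirement), which is why the LWS algorithm can be built as a modification of the LTS one.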

The least weighted squares
Both in the framework of random carriers
Mašíček, L. (2003): Diagnostika a sensitivita robustního odhadu. (Diagnostics and sensitivity of robust estimators, in Czech.) Dissertation at the Faculty of Mathematics, Charles University.
Mašíček, L. (2003): Consistency of the least weighted squares estimator. To appear in Kybernetika.
as well as for deterministic ones
Plát, P. (2003): Nejmenší vážené čtverce. (The Least Weighted Squares, in Czech.) Diploma thesis at the Faculty of Nuclear and Physical Engineering, the Czech Technical University, Prague.
we have consistency, asymptotic normality and a Bahadur representation of the Least Weighted Squares.
There are also some optimality results:
Mašíček, L. (2003): Optimality of the least weighted squares estimator. To appear in the Proceedings of ICORS'2003.

The least weighted squares
There is also an algorithm for evaluating the LEAST WEIGHTED SQUARES. It is a modification of the algorithm for the LEAST TRIMMED SQUARES, which was described and tested in:
Víšek, J.Á. (1996): On high breakdown point estimation. Computational Statistics (1996) 11: 137 - 146.
Víšek, J.Á. (2000): On the diversity of estimates. CSDA 34, (2000) 67 - 89.
If interested, ask me to send a copy.
Čížek, P., J. Á. Víšek (2000): The least trimmed squares. User Guide of Explore, Humboldt University.
(Of course, the algorithm for LTS is available in the package EXPLORE.)

The least weighted squares - algorithm
Put [initial setting; formula not preserved]
A: Select randomly p + 1 observations and find the regression plane through them.
Evaluate the squared residuals for all observations, order these squared residuals from the largest one to the smallest, multiply them by the weights and evaluate the sum of these products.
Is this sum of weighted squared residuals smaller than the sum from the previous step?
- Yes: Order the observations in the same order as the squared residuals, apply the classical weighted least squares to them with the weights and find a new regression plane.
- No: Continue with B.

The least weighted squares - algorithm (continued)
B: Have we already found 20 identical models (an arbitrary reasonable number), or have we exhausted the a priori given number of repetitions?
- Yes: End of evaluation.
- No: Return to A.
In the cases when we were able to pass through all n! orderings of the observations (fewer than 10 observations), i.e. when we were able to find the LEAST WEIGHTED SQUARES estimator precisely, the algorithm returned the same value.
The algorithm is available in MATLAB.
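The loop through steps A and B can be sketched as follows. This is a minimal illustration, not the MATLAB implementation mentioned above. It is restricted to a straight line y = a + b*x (so p + 1 = 2 starting points), and it sorts the squared residuals in ascending order so that non-increasing weights down-weight the largest residuals, which is an assumption about how the ordering on the slide pairs with the weights. All function names, parameters, data and the tolerance `tol` are hypothetical.

```python
import random


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b) or None."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return None  # degenerate design, no unique line
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def sq_residuals(x, y, a, b):
    return [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]


def objective(x, y, a, b, weights):
    """Sum over i of weights[i] times the i-th smallest squared residual."""
    return sum(w * r for w, r in zip(weights, sorted(sq_residuals(x, y, a, b))))


def lws_line(x, y, weights, max_starts=500, identical=20, seed=0, tol=1e-12):
    rng = random.Random(seed)
    n = len(x)
    best, best_obj, hits = None, float("inf"), 0
    for _start in range(max_starts):
        # Step A: line through two randomly selected observations.
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        obj = objective(x, y, a, b, weights)
        for _step in range(100):  # safety cap on refinement steps
            r2 = sq_residuals(x, y, a, b)
            order = sorted(range(n), key=r2.__getitem__)
            w_obs = [0.0] * n
            for rank, k in enumerate(order):
                w_obs[k] = weights[rank]  # weight by rank of squared residual
            fit = wls_line(x, y, w_obs)
            if fit is None:
                break
            new_obj = objective(x, y, fit[0], fit[1], weights)
            if new_obj >= obj - tol:
                break  # no further decrease: go to step B
            (a, b), obj = fit, new_obj
        # Step B: count how often the best model so far has been reached.
        if obj < best_obj - tol:
            best, best_obj, hits = (a, b), obj, 1
        elif abs(obj - best_obj) <= tol:
            hits += 1
        if hits >= identical:
            break
    return best


# Hypothetical data on y = 1 + 2x with two gross errors.
x = [float(i) for i in range(12)]
y = [1.0 + 2.0 * xi for xi in x]
y[10], y[11] = 100.0, -50.0
weights = [1.0] * 8 + [0.5, 0.25, 0.0, 0.0]  # non-increasing weights
print(lws_line(x, y, weights))
```

Each refinement step cannot increase the weighted sum, because the weighted least squares fit minimizes the sum for the current assignment of weights, and re-sorting the residuals can only lower it further. Counting repeated hits of the best model is one simple reading of "20 identical models"; other stopping rules are possible.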

GMM weighted estimation
Unifying GMM and robust approach
Brief repetition of the already introduced framework
- An assumed "causal" model
- Observed data
We would like to estimate the model consistently; it seems that we have to believe that the disturbances are orthogonal to the model!
It is evident that this cannot be generally true, so we look for some instruments that are close to the model but orthogonal to the disturbances!
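The formulas of this slide are not preserved. As a hedged reminder of the classical (unweighted) setting only, the orthogonality of instruments and disturbances and its empirical counterpart are usually written as below. The symbols Z_i for the instruments, e_i for the disturbances and beta^0 for the true coefficients are notation assumed here; the weighted version developed in the talk is not reproduced.

```latex
% Assumed linear model with disturbances e_i.
\[ Y_i = X_i^{\top}\beta^{0} + e_i, \qquad i = 1, \dots, n \]
% Orthogonality condition: instruments are uncorrelated with the disturbances.
\[ \mathsf{E}\left[ Z_1 \left( Y_1 - X_1^{\top}\beta^{0} \right) \right] = 0 \]
% Empirical counterpart, solved (or minimized in norm) with respect to beta.
\[ \frac{1}{n} \sum_{i=1}^{n} Z_i \left( Y_i - X_i^{\top}\beta \right) = 0 \]
```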

GMM weighted estimation
Unifying GMM and robust approach (continued)
- Disturbances
- Instruments
- Weight function: non-increasing, absolutely continuous
- Order statistics of the squared disturbances

GMM weighted estimation
Unifying GMM and robust approach (continued)
- Ranks of the squared disturbances
- Orthogonality conditions
- Kronecker product
- This equality defines a function

GMM weighted estimation
Unifying GMM and robust approach (continued)
- Residuals
- Order statistics of the squared residuals
- Ranks of the squared residuals
- Empirical counterpart to the orthogonality conditions

GMM weighted estimation
Unifying GMM and robust approach (continued)
- Empirical counterpart to the orthogonality conditions and its covariance matrix

THANKS for ATTENTION
