1 / 22

Flash Estimation and Variable Selection Techniques

Flash Estimation and Variable Selection Techniques. “Raiders of the Lost Arch Model”. Foreword: The Flash Estimation Network. In 2006, Eurostat launched a call for proposal for the production of reliable flash estimates for IPI (t+30), GDP (t+30) and LCI (t+45).

perraultj
Download Presentation

Flash Estimation and Variable Selection Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Flash Estimation and Variable Selection Techniques “Raiders of the Lost Arch Model”

  2. Foreword: The Flash Estimation Network • In 2006, Eurostat launched a call for proposal for the production of reliable flash estimates for IPI (t+30), GDP (t+30) and LCI (t+45). • Joint answer from 4 NSI’s accepted by Eurostat: France, Germany, Italy and United-Kingdom • Objectives: Check the feasibility and propose methodolog(ies)y for Eurostat and Member States • A 2-year project: • First year, preparatory work: survey to check for existing experiences, databases, bibliography, methodologies; • Second year, run simulations, propose methods and estimates, prepare programs and transfer of knowledge • Here, we focus on the methodology

  3. Foreword • Lots of papers on the same subject. • What is really different? The methodology used • No formula in this presentation but quite technical: lots of statistical techniques are mixed in this strategy to look for a model « Any sufficiently advanced technology is undistinguishable from magic » Arthur C. Clarke, Profiles to the Future, 1961

  4. Outline • What is a “flash estimate”? • What is a “good” model? • The variable selection problem • Variable selection techniques • Where should I dig? A method based on HCA • An example: nowcasting the Eurozone GDP • Data and process • Comparing “usual” and HCA results • Conclusions

  5. What is a “flash estimate”? • Very difficult and sensitive question • Lots of discussions (not to say arguments) • Why? Soft and hard data • An agreement • Pure AR models (no new information) can be used as benchmarks but are not accepted for FE • Still a disagreement • Can a model be based on soft data (BCS) only? • A compromise • FE models must incorporate hard data (as much as possible). Much easier for quarterly data.

  6. What is a “good” model? • Reminder: the model will be used in production! • Main characteristics • Simple (few variables) • Interpretable • Good “statistical properties” including robustness • And last (and least) good nowcasts • Other considerations • Stay close to the user database as much as possible in order to minimize “interactions” during the production process. • Provide the user with a “built-in” but flexible methodology (and program) based on his/her tools.

  7. The variable selection problem • Example: nowcasting the EA GDP • Potential explanatory variables: • IPI, New orders, Energy prices, HICP, Unemployment, Business surveys (Industry, Retail trade, Construction, Services) etc. Easy to find at least 20 variables (monthly and/or quarterly) • 13 countries + EA, 2 lags? • It comes to 20*(13+1)*3=840 potential variables • More than 20 billions possible models with 4 explanatory variables!!!!

  8. Variable selection methods • You must (Do you?) reduce the set of explanatory variables • Trial and Error approach DOES NOT work • NIESR approach (J. Mitchell) • Drastically restrict the set (Expert knowledge) and then evaluate all possible models • GETS (General to specific) approach • Start from an overparametrized model and use statistical tests to reduce it • Dynamic Factor Analysis approach • Summarize the set of variables with uncorrelated factors and use some of them in the model • Cluster Analysis approach

  9. Step 0: Selecting a batch of explanatory variables • Here, we take all the information present in the Euro-Indicator database. • But with some a priori selection • If the Y-variable is SA, we will use SA indicators • You may (or not) exclude some variables (small countries, not reliable series etc.) • Or better, flag them using some a priori knowledge, example: • 0 : I would exclude it • 1 : I do not know, • 2 : could be interesting, • 3 : I would appreciate to see it in a model

  10. Step 1: The dependant variable • Checking the main characteristics of the dependant variable • seasonality, trading-day effect, outliers etc. • Size of the irregular, revision analysis • what can we really expect in terms of precision?

  11. The Y-variable : Euro-area GDP • Quarterly growth rate of the SA series (EA CLV) • Irregular SD: 0.142

  12. Step 2: Preparing the data • Y-variable • Checking for log-transformation • Removing outliers: they will be “re-incorporated” at the end of the flash-estimate process • Removing seasonality if present (annual growth rate) or modeling it • X-variables • Checking for log-transformation and differencing (with Tramo) • Removing seasonality, outliers and TD effects: Seasonal and TD dynamics may be different between variables • Forecasting missing months when nowcasting a quarterly Y-variable

  13. Looking for X-variables • Example • We want to flash-estimate 2007Q4 at 30 days. • Checking date 30/01/2008 • Variables from Euro-Indicators • Monthly and Quarterly variables; • 668 Monthly; 162 Quarterly (B1GM_UK) • For monthly variables, we know at least 1 or 2 months of the quarter to “nowcast” • Cleaning and “forecasting” of X-variables

  14. Step 3: First selection with Cluster analysis • Variable selection • Cluster analysis: clusters of “similar X-variables” (various clustering criteria available) • Selecting a few variables (Medoids and/or Dynamic factors) in each cluster using Hsiao-Granger causality test, coherence between growth-rates, correlation • Example: 6 clusters, 2 variables per cluster, 2 lags  36 potential variables 60,000 models with 4 explanatory variables

  15. Step 4: First selection of models • Looking for the n “best models” using usual OLS regression analysis • Several possible criteria: • R-square, Mallow’s Cp, Adjusted R-square etc. • First statistical evaluation of the models but still “no dynamics” • Possible second selection of the model using “economic” criteria or expert knowledge (which models are “relevant”?)

  16. Step 5: Statistical evaluation of the models • In-depth statistical evaluation of the model using PROC AUTOREG (autoregressive error) • Statistical criteria: • R-square, AIC, BIC, Reset tests, Chow tests, Durbin-Watson tests, Arch tests, Stationarity tests, Godfrey tests etc. • Coherence between growth rate signs • One-step and 2-step ahead forecasting error • Dynamic evaluation of the models using the real-time database

  17. Step 6: Choosing the model(s) • The choice is now based on statistical criteria and “expert knowledge”: • Does the model have good statistical properties? • Is the model simple and robust enough? • Is the model “relevant” from the Economic point of view? USUALLY NOT !!! • But …… could be improved changing a variable for another one, more “relevant” (“flagged” variables) from the same cluster.

  18. First results

  19. Improving the models • Third model: IO_PL, IP_EA13, UE_NL • IO_PL belongs to the same cluster than T_EXP_EA13 • UE_NL close to UN_FR or UN_EA13 • And we have a new model • T_EXP_EA13, IP_EA13, UN_EA13 • RMSE=0.15543

  20. Comparing with « traditional » approach • We estimate 9 different models • Model1: IP, CPI, RET, CARS • Model2: GDP-1, SentInd • Model3: qsind, qsserv • Model4: qsind, qsret, qsconst • Model5: share-1, compe12-2, spread • Model6: GDP-1, cli • Model7: IP, CPI, RET • Model8: GDP-1, cli, CARS • Model9: qsind, qsret, qsserv, qsconst

  21. Results

  22. Examples

More Related