Building Statistical Forecast Models


### Building Statistical Forecast Models

Wes Wilson

MIT Lincoln Laboratory

April, 2001

Experiential Forecasting
• Idea: Base the Forecast on observed outcomes in previous similar situations (training data)
• Possible ways to evaluate and condense the training data:
  • Categorization
    • Seek comparable cases, usually expert-based
  • Statistical
    • Correlation and significance analysis
  • Fuzzy Logic
    • Combines Expert and Statistical analysis
• Belief: Incremental changes in predictors relate to incremental changes in the predictand
• Issues:
  • Requirements on the Training Data
  • Development Methodology
  • Automation
Outline
• Regression-based Models
• Predictor Selection
• Data Quality and Clustering
• Measuring Success
• An Example
Statistical Forecast Models
• Multi-Linear Regression

$$F = w_0 + \sum_i w_i P_i$$

$w_i$ = Predictor Weighting

$w_0$ = Conditional Climatology minus the weighted mean predictor values, $\sum_i w_i \bar{P}_i$

• GAM: Generalized Additive Models

$$F = w_0 + \sum_i w_i f_i(P_i)$$

$f_i$ = Structure Function, determined during regression

• PGAM: Pre-scaled Generalized Additive Models

$$F = w_0 + \sum_i w_i f_i(P_i)$$

$f_i$ = Structure Function, determined prior to regression

• The constant term $w_0$ is conditional climatology less the weighted mean bias of the scaled predictors
Models Based on Regression
• Training Data for one predictor:
  • $P$: vector of predictor values
  • $E$: vector of observed events
• Residual:
  • $R^2 = \| F(P) - E \|^2$
• Regression solutions are obtained by adjusting the parametric description of the forecast model (parameters $w$) until the objective function $J(w) = R^2$ is minimized
• Multi-Linear Regression (MLR):
  • $J(w) = \| A w - E \|^2$
  • MLR is solved by matrix algebra; the most stable solution is provided by the singular value decomposition (SVD) of $A$
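A minimal sketch of minimizing $J(w) = \|Aw - E\|^2$ with NumPy, assuming synthetic data. `numpy.linalg.lstsq` solves the problem through an SVD-based LAPACK routine rather than by inverting $A^T A$. The training-set mean of $E$ stands in for conditional climatology here (an assumption for illustration); the final check shows that the fitted constant term equals that mean minus the weighted mean predictor values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 150, 3                                    # m training cases, n predictors (synthetic)
P = rng.normal(size=(m, n))                      # predictor values, one column per predictor
E = 2.0 + P @ np.array([0.8, -0.5, 0.2]) + 0.3 * rng.normal(size=m)  # observed events

# Design matrix: a column of ones for w0, then one column per predictor
A = np.column_stack([np.ones(m), P])

# Minimize J(w) = ||A w - E||^2; lstsq works from an SVD of A
w, _, rank, sv = np.linalg.lstsq(A, E, rcond=None)
J = float(np.sum((A @ w - E) ** 2))

# With a constant column, least squares gives w0 = mean(E) - sum_i w_i * mean(P_i)
w0_check = E.mean() - w[1:] @ P.mean(axis=0)
print(w.round(3), round(J, 3), np.isclose(w[0], w0_check))
```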
Regression and Correlation
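• Training Data for one predictor:
  • $P$: vector of predictor values
  • $E$: vector of observed events
• Error Residual: $R^2 = \| F(P) - E \|^2$
• Correlation Coefficient: $r(P, E) = \dfrac{\Delta P \cdot \Delta E}{\sigma_{\Delta P}\, \sigma_{\Delta E}}$, where $\Delta P = P - \bar{P}$ and $\Delta E = E - \bar{E}$ are the deviations from the mean, and $\sigma_{\Delta P} = \|\Delta P\|$, $\sigma_{\Delta E} = \|\Delta E\|$
• Fundamental Relationship: Let $F_0$ be a forecast equation with error residuals $E_0 = E - F_0$ ($\|E_0\| = R_0$), where $E_0$ has zero mean (as it does when $F_0$ is a least-squares fit with a constant term). Let $W_0 + W_1 P$ be a BLUE (best linear unbiased estimate) correction for $E_0$, and let $F = F_0 + W_0 + W_1 P$. The error residual $R_F$ of $F$ satisfies
  • $R_F^2 = R_0^2 \left[ 1 - r(P, E_0)^2 \right]$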
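The fundamental relationship can be checked numerically. This sketch assumes synthetic data, uses the sample mean of $E$ as a stand-in for $F_0$ (so that $E_0 = E - F_0$ has zero mean), and fits the correction $W_0 + W_1 P$ by ordinary least squares, which is the BLUE under the usual Gauss-Markov assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
E = rng.normal(size=m)                    # observed events (synthetic)
P = 0.6 * E + rng.normal(size=m)          # a predictor partly correlated with E

F0 = np.full(m, E.mean())                 # stand-in forecast; E0 then has zero mean
E0 = E - F0                               # error residuals of F0
R0_sq = float(E0 @ E0)

# Least-squares correction W0 + W1 * P fitted to the residuals E0
X = np.column_stack([np.ones(m), P])
(W0, W1), *_ = np.linalg.lstsq(X, E0, rcond=None)
F = F0 + W0 + W1 * P
RF_sq = float(np.sum((F - E) ** 2))

r = np.corrcoef(P, E0)[0, 1]
print(RF_sq, R0_sq * (1 - r ** 2))       # the two values agree up to rounding
```

If $E_0$ had a nonzero mean, the constant $W_0$ would also remove that bias, and $R_F^2$ would fall below $R_0^2[1 - r^2]$.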
Model Training Considerations
• Assumption: The training data are representative of what is expected during the implementation period
• Simple models are less likely to capture undesirable (non-stationary) short-term fluctuations in the training data
• The climatology of the training period should match that expected in the intended implementation period (decade scale)
• It is irrational to expect that short training periods can lead to models with long-term skill
• Plan for repeated model tuning
• Design self-tuning into the system
• It is desirable to have many more training cases than model parameters

“The only way to prepare for the future is to prepare to be surprised; that doesn’t mean we have to be flabbergasted.” (Kenneth Boulding)

GAM
• An established statistical technique that uses the training data to define nonlinear scaling of the predictors
• The standard implementation represents the structure functions as B-splines with many knots, which requires a large set of training data
• The forecast equations are determined by linear regression including the nonlinear scaling of the predictors

$$F = w_0 + \sum_i w_i f_i(P_i)$$

• The objective is to minimize the error residual
• The structure functions are influenced by all of the predictors, and may change if the predictor mix is altered
• If a GAM has $p$ predictors and $k$ knots per structure function, then the regression model has $kp + 1$ (linear) regression parameters
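To make the $kp + 1$ parameter count concrete, here is a sketch that expands each predictor into $k$ piecewise-linear (degree-1 B-spline, "hat") basis functions and fits all columns in one linear regression. The degree-1 basis, evenly spaced knots, and helper names `hat_basis` and `gam_design` are assumptions chosen for brevity; a standard GAM uses higher-order B-splines and usually a smoothing penalty. Because each predictor's hat functions sum to 1, they duplicate the constant column, so the design matrix is rank-deficient and `lstsq` returns the minimum-norm solution.

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) basis: one column per knot."""
    eye = np.eye(len(knots))
    # Interpolating the j-th unit vector gives the hat function centered on knot j
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])

def gam_design(P, k):
    """Constant column plus k hat-function columns per predictor: k*p + 1 columns."""
    m, p = P.shape
    blocks = [np.ones((m, 1))]
    for i in range(p):
        knots = np.linspace(P[:, i].min(), P[:, i].max(), k)
        blocks.append(hat_basis(P[:, i], knots))
    return np.hstack(blocks)

rng = np.random.default_rng(2)
m, p, k = 300, 3, 6
P = rng.uniform(-1.0, 1.0, size=(m, p))
E = P[:, 0] ** 2 + np.sin(3 * P[:, 1]) + 0.1 * rng.normal(size=m)  # synthetic, nonlinear

X_gam = gam_design(P, k)
X_lin = np.column_stack([np.ones(m), P])
w_gam, *_ = np.linalg.lstsq(X_gam, E, rcond=None)
w_lin, *_ = np.linalg.lstsq(X_lin, E, rcond=None)

def rss(X, w):
    return float(np.sum((X @ w - E) ** 2))

rss_gam, rss_lin = rss(X_gam, w_gam), rss(X_lin, w_lin)
print(X_gam.shape, round(rss_gam, 2), round(rss_lin, 2))
```

The hat basis reproduces any linear function on the knot interval, so the in-sample residual can only match or improve on plain MLR; with 19 parameters instead of 4, it also needs far more training cases to avoid overfitting.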
PGAM: Pre-scaled GAM
• A new statistical technique that permits the use of training sets much smaller than those required for GAM
• Once the structure functions are selected, the forecast equations are determined by linear regression of the pre-scaled predictors

$$F = w_0 + \sum_i w_i f_i(P_i)$$

• Determination of the structure functions is based on enhancing the correlation of the (scaled) predictor with the error residual of conditional climatology
  • Maximize $r(f_i(P_i), \Delta E)$, where $\Delta E$ is the error residual of conditional climatology
• The structure function is determined for each predictor separately
• Composite predictors should be scaled as composites
• The structure functions often have interpretations in terms of scientific principles and forecasting techniques
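The sketch below illustrates the idea of choosing $f_i$ to raise $r(f_i(P_i), \Delta E)$ one predictor at a time. It is not the PGAM procedure from these slides, which is not specified here; it swaps in a simple stand-in: bin the predictor by quantiles and map each bin to the mean of $\Delta E$ in that bin. For a fixed set of bins, that step function has the largest correlation with $\Delta E$ of any function constant on those bins. The helper name `prescale_by_bins` and the bin count are hypothetical.

```python
import numpy as np

def prescale_by_bins(p, delta_e, n_bins=10):
    """Hypothetical pre-scaling: f(P) = mean of delta_e within P's quantile bin."""
    inner = np.quantile(p, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]  # interior bin edges
    idx = np.digitize(p, inner)                                      # bin index 0..n_bins-1
    sums = np.bincount(idx, weights=delta_e, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)

    def f(x):
        return means[np.digitize(x, inner)]

    return f

rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, 500)
# Synthetic residual from climatology with a U-shaped dependence on P
delta_e = P ** 2 - np.mean(P ** 2) + 0.05 * rng.normal(size=500)

f = prescale_by_bins(P, delta_e)
r_raw = np.corrcoef(P, delta_e)[0, 1]
r_scaled = np.corrcoef(f(P), delta_e)[0, 1]
print(f"r(P, dE) = {r_raw:.2f}   r(f(P), dE) = {r_scaled:.2f}")
```

A U-shaped dependence is invisible to linear correlation but is captured once the predictor is pre-scaled; the fitted $f$ is also easy to inspect, in line with the remark that structure functions often have physical interpretations.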
Predictors
• Every Method Involves a Choice of Predictors
• The Great Predictor Set: Everything relevant and available
• Possible Reduction based on Correlation Analysis
• Predictor Selection Strategies:
  • Sequential Deletion
  • Ensemble Decision (SVD)
• Changing the predictor list changes the model weights; for GAM, it also changes the structure functions
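A sketch of the correlation-based reduction of the predictor set: compute $r$ between each candidate predictor and the residual from conditional climatology and keep those above a threshold. Screening on $|r|$ and the default threshold of 0.25 (the value used in the satellite example) are assumptions; the function name `screen_by_correlation` and the data are hypothetical.

```python
import numpy as np

def screen_by_correlation(P, delta_e, threshold=0.25):
    """Keep predictor columns whose |r| with delta_e exceeds `threshold`."""
    dP = P - P.mean(axis=0)                 # deviations from the mean, one column per predictor
    dE = delta_e - delta_e.mean()
    r = (dP.T @ dE) / (np.linalg.norm(dP, axis=0) * np.linalg.norm(dE))
    return np.flatnonzero(np.abs(r) > threshold), r

rng = np.random.default_rng(9)
P = rng.normal(size=(149, 20))              # synthetic: 149 cases, 20 candidate predictors
delta_e = P[:, 0] + P[:, 5] + rng.normal(size=149)  # synthetic residual from climatology
keep, r = screen_by_correlation(P, delta_e)
print(keep, r[keep].round(2))
```

With only 149 cases, a few unrelated predictors can pass a 0.25 threshold by chance, which is one reason the later selection steps still matter.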
Computing Solutions for the Basic Regression Problem
• Setting: Predictor List $\{P_i\}_{i=1}^{n}$ and observed outcomes $b$ over the $m$ trials of the training set
• Basic Linear Regression Problem (solved in the least-squares sense)

$$A w = b$$

where the columns of the $m \times n$ matrix $A$ are the lists of observed predictor values over the trials

• Normal Equations: $A^T A w = A^T b$
• Linear Algebra: $w = (A^T A)^{-1} A^T b$
• Optimization: Find $w$ to minimize $R^2 = \| A w - b \|^2$
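The three formulations give the same $w$ in exact arithmetic, but they differ numerically. This sketch, on hypothetical data with two nearly collinear predictors, compares the normal equations with NumPy's SVD-based `lstsq` and shows that forming $A^T A$ squares the condition number.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 4
A = rng.normal(size=(m, n))
A[:, 3] = A[:, 2] + 1e-3 * rng.normal(size=m)   # two nearly collinear predictors
b = A @ np.array([1.0, -2.0, 0.5, 0.5]) + 0.01 * rng.normal(size=m)

w_normal = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations: A^T A w = A^T b
w_svd, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based least squares on A itself

cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)              # forming A^T A squares the condition number
print(f"cond(A) = {cond_A:.3g}, cond(A^T A) = {cond_AtA:.3g}")
```

Here both answers still agree closely; as collinearity grows, the squared condition number makes the normal equations lose accuracy first, which is why the SVD route is preferred.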
SVD – Singular Value Decomposition


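• $A = U \Sigma V^T$, where $U$ ($m \times m$) and $V$ ($n \times n$) are orthogonal matrices
• and $\Sigma = [\, S \mid 0 \,]^T$, where $S$ is an $n \times n$ diagonal matrix with positive diagonal entries
• $U^T A w = \Sigma V^T w = U^T b$
• Set $\tilde{w} = V^T w$, $\tilde{b} = U^T b$
• Restatement of the Basic Problem:
  • $\Sigma V^T w = \tilde{b}$ (original problem space), or $\Sigma \tilde{w} = \tilde{b}$ ($V^T$-transformed problem space)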
• Since U is orthogonal, the error residual is not altered by this restatement of the problem

CAUTION: Analysis of residuals can be misleading unless the dynamic ranges of the predictor values have been standardized.

Structure of the Error Residual Vector


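• Truncated Problem: For $i > k$, set $\tilde{w}_i = 0$. This increases the error residual to
  • $R_k^2 = \sum_{i=k+1}^{m} \tilde{b}_i^2 = R_*^2 + \sum_{i=k+1}^{n} \tilde{b}_i^2$
• The singular values $s_i$ are usually decreasing
• $s_n > 0$; otherwise, reduce the predictor list
• For $i \le n$, $\tilde{w}_i = \tilde{b}_i / s_i$
• For $i > n$, there is no solution: the equations $0 = \tilde{b}_i$ cannot be satisfied. This is the portion of the problem that is not resolved by these predictors
• Magnitude of the unresolved portion of the problem: $R_*^2 = \sum_{i=n+1}^{m} \tilde{b}_i^2$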

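$$
\Sigma \tilde{w} = \tilde{b}: \qquad
\begin{bmatrix}
s_1 &        &        &        \\
    & s_2    &        &        \\
    &        & \ddots &        \\
    &        &        & s_n    \\
0   & \cdots & \cdots & 0      \\
\vdots &     &        & \vdots \\
0   & \cdots & \cdots & 0
\end{bmatrix}
\begin{bmatrix} \tilde{w}_1 \\ \tilde{w}_2 \\ \vdots \\ \tilde{w}_n \end{bmatrix}
=
\begin{bmatrix} \tilde{b}_1 \\ \tilde{b}_2 \\ \vdots \\ \tilde{b}_n \\ \tilde{b}_{n+1} \\ \vdots \\ \tilde{b}_m \end{bmatrix}
$$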
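A sketch of the transformed problem on hypothetical random data: compute the full SVD with `numpy.linalg.svd`, solve $\tilde{w}_i = \tilde{b}_i / s_i$, map back with $w = V\tilde{w}$, and compare the residuals with the formulas for $R_*^2$ and $R_k^2$. The predictors are not standardized here, so per the caution above the individual $\tilde{b}_i$ should not be over-interpreted.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 60, 5
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n) + 0.5 * rng.normal(size=m)

U, s, Vt = np.linalg.svd(A)       # full SVD: U is m x m, s = (s_1 >= ... >= s_n)
b_t = U.T @ b                     # b~ = U^T b, all m components
w_t = b_t[:n] / s                 # w~_i = b~_i / s_i for i <= n (needs s_n > 0)
w = Vt.T @ w_t                    # back to the original problem space: w = V w~

R_star_sq = float(np.sum(b_t[n:] ** 2))   # unresolved part: i = n+1..m

def truncated_residual(k):
    """R_k^2 when w~_i is set to 0 for i > k."""
    return R_star_sq + float(np.sum(b_t[k:n] ** 2))

print(np.isclose(np.sum((A @ w - b) ** 2), R_star_sq))
print([round(truncated_residual(k), 2) for k in range(n + 1)])
```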

Controlling Predictor Selection
• SVD/PC (principal component) analysis provides guidance
• Truncation in $\tilde{w}$ space reduces the degrees of freedom
• Truncation does not null predictors, since zero components of $\tilde{w}$ do not lead to zero components of $w = V \tilde{w}$
• Seek a linear forecast model of the form
  • $F(a) = a^T w = \sum_i w_i a_i$, where $a$ is a vector of predictor values
• Predictor Nulling:
  • The $i$th predictor is eliminated from the problem if $w_i = 0$
• Benefits of predictor nulling:
  • Provides simple models
  • Eliminates designated predictors (missing data problem)
  • Quantifies the incremental benefit provided by essential predictors (sensor benefit problem)
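The difference between truncation and nulling can be seen directly. In this sketch (hypothetical data and helper names), truncating to 2 SVD components still leaves every $w_i$ nonzero, while nulling refits with designated columns removed, which forces those $w_i$ to exactly 0. The rise in residual when a predictor is nulled is one way to quantify its incremental benefit.

```python
import numpy as np

def nulled_fit(A, b, keep):
    """Least-squares weights with w_i forced to 0 for predictors not in `keep`."""
    w = np.zeros(A.shape[1])
    sol, *_ = np.linalg.lstsq(A[:, keep], b, rcond=None)
    w[keep] = sol
    return w

def truncated_fit(A, b, k):
    """Keep the first k components of w~ = V^T w; set the rest to 0."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    w_t = (U[:, :k].T @ b) / s[:k]
    return Vt[:k].T @ w_t          # w = V w~ (only the first k columns of V contribute)

def rss(A, b, w):
    return float(np.sum((A @ w - b) ** 2))

rng = np.random.default_rng(5)
m, n = 80, 4
A = rng.normal(size=(m, n))
b = A @ np.array([2.0, 1.0, 0.3, 0.2]) + 0.5 * rng.normal(size=m)

w_trunc = truncated_fit(A, b, k=2)               # 2 degrees of freedom, but no zero weights
w_full = nulled_fit(A, b, [0, 1, 2, 3])
w_null = nulled_fit(A, b, [0, 1, 2])             # predictor 3 eliminated (e.g., missing data)
benefit = rss(A, b, w_null) - rss(A, b, w_full)  # incremental benefit of predictor 3
print(w_trunc.round(3), w_null.round(3), round(benefit, 3))
```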
Predictor Selection Process
• Gross Predictor Selection (availability & correlation)
• SVD for problem sizing and gross error estimation
• Truncation and Predictor Nulling → maximal model(s) (there may be more than one good solution)
• Successive Elimination in the Original Problem Space → minimal model (continue until SD starts to grow rapidly)
• Successive Augmentation in the Original Problem Space
• At this point, the good solutions are bracketed between the maximal and the minimal models; exhaustive searches are probably feasible, and cross-validation is wise.
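A sketch of successive elimination in the original problem space: repeatedly drop the predictor whose removal raises the residual SD least, and stop before a removal would raise it by more than a set fraction. The stopping rule `max_growth=0.05` is a hypothetical stand-in for "until SD starts to grow rapidly", and the function names and data are illustrative only.

```python
import numpy as np

def residual_sd(A, b, cols):
    """Residual SD of a least-squares fit with a constant term and the given columns."""
    X = np.column_stack([np.ones(len(b))] + [A[:, c] for c in cols])
    w, *_ = np.linalg.lstsq(X, b, rcond=None)
    return float(np.std(X @ w - b))

def successive_elimination(A, b, max_growth=0.05):
    """Drop predictors one at a time; stop when the best removal raises SD by > max_growth."""
    cols = list(range(A.shape[1]))
    sd = residual_sd(A, b, cols)
    while len(cols) > 1:
        trials = [(residual_sd(A, b, [c for c in cols if c != d]), d) for d in cols]
        best_sd, drop = min(trials)
        if best_sd > sd * (1 + max_growth):
            break
        cols.remove(drop)
        sd = best_sd
    return cols, sd

rng = np.random.default_rng(8)
A = rng.normal(size=(200, 6))
b = 3.0 * A[:, 0] - 2.0 * A[:, 1] + 0.5 * rng.normal(size=200)  # only predictors 0, 1 matter
cols, sd = successive_elimination(A, b)
print(sorted(cols), round(sd, 3))
```

The result is in-sample only; as the slide notes, cross-validation is the safer test once the maximal and minimal models bracket the candidates.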
Creating 15z Satellite Forecast Models (1)
• 149 marine stratus days from 1996 to 2000
• 51 sectors and 3 potential predictors per sector (153 in total)
• Compute the correlation for each predictor with the residual from conditional climatology
• Retaining only predictors with correlation greater than 0.25 reduces the predictor list to 45 predictors
• Separate analyses for two data sets: Raw and PGAM
• Truncate each when the SD reduction drops below 1.5%
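A sketch of the truncation rule, assuming "SD" means the residual standard deviation and that truncation stops at the first SVD component whose addition reduces it by less than 1.5%. Predictors are standardized first, following the earlier caution. The data are synthetic with the same dimensions as the example (149 cases, 45 predictors); the function names are hypothetical.

```python
import numpy as np

def standardize(A):
    """Center each predictor and scale it to unit standard deviation."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

def truncation_level(A, b, min_reduction=0.015):
    """Return (k, sd): stop at the first component whose addition reduces
    the residual SD by less than min_reduction."""
    A = standardize(A)
    b = b - b.mean()                       # centering plays the role of w0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    b_t = U.T @ b                          # first n components of U^T b
    total = float(b @ b)
    # R_k^2 for k = 0..n: total minus the part resolved by the first k components
    rss = total - np.concatenate([[0.0], np.cumsum(b_t ** 2)])
    sd = np.sqrt(np.maximum(rss, 0.0) / len(b))
    for k in range(len(s)):
        if (sd[k] - sd[k + 1]) / sd[k] < min_reduction:
            return k, sd
    return len(s), sd

rng = np.random.default_rng(6)
A = rng.normal(size=(149, 45))
b = A[:, :3] @ np.array([1.0, 0.8, 0.6]) + 0.5 * rng.normal(size=149)
k, sd = truncation_level(A, b)
print(k, sd[0].round(3), sd[k].round(3))
```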

[Figure panels: Raw Data (SVD Raw 6); PGAM Data (SVD PGAM 6)]

Creating 15z Satellite Forecast Models (2)

[Figure panels: Sigma PC 6 = 1.134; Sigma PC 6 = 0.999; Sigma = 1.148; Sigma = 0.999]

• SVD Truncate 6  Pred.Nulling
• In the Truncation space:

Null to 7 predictors with acceptable error growth

• Maximal Problems (R-8,P-7)
• Minimal Problems (R-5,P-4)
• Neither problem would accept augmentation according to the strict cross-validation test
• Different predictors were selected
Data Quality and Clustering
• DQA is similar to that for NWP (numerical weather prediction)
  • need to do it for the training set
  • probably need to work to tighter standards
• Data Clustering
  • During training: manual ++
  • For implementation: fully automated
• Conditional Climatology based on Clustering
Satellite Statistical Model (MIT/LL)
• 1-km visible channel (brightness)
• Data pre-processing
  • remapping to a 2-km grid
  • 3×3 median smoother
  • normalized for sun angle
  • calibrated for lens graying
• Grid points grouped into sectors
  • topography
  • physical forcing
  • operational areas
• Sector statistics
  • Brightness
  • Coverage
  • Texture
• 4-year data archive, 153 predictors
• PGAM Regression Analysis
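A sketch of the pre-processing chain and sector statistics, under loudly stated assumptions: 2-km remapping is taken as a 2×2 block average, sun-angle normalization as division by the cosine of the solar zenith angle, and brightness, coverage, and texture as the sector mean, fraction above a cloud threshold, and standard deviation. Lens-graying calibration is omitted because its form is not given. All function names, the threshold, and the sector mask are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def remap_to_2km(img_1km):
    """Average 2x2 blocks of a 1-km grid onto a 2-km grid (odd edges are cropped)."""
    h, w = (img_1km.shape[0] // 2) * 2, (img_1km.shape[1] // 2) * 2
    return img_1km[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def median3x3(img):
    """3x3 median smoother; borders are handled by repeating edge pixels."""
    padded = np.pad(img, 1, mode="edge")
    return np.median(sliding_window_view(padded, (3, 3)), axis=(-2, -1))

def normalize_sun_angle(img, solar_zenith_deg):
    """Assumed correction: divide by cos(solar zenith angle)."""
    return img / np.cos(np.radians(solar_zenith_deg))

def sector_stats(img, sector_mask, cloud_threshold):
    """Assumed definitions of brightness, coverage, and texture for one sector."""
    vals = img[sector_mask]
    return {
        "brightness": float(vals.mean()),                    # mean brightness
        "coverage": float(np.mean(vals > cloud_threshold)),  # fraction of bright pixels
        "texture": float(vals.std()),                        # spatial variability
    }

rng = np.random.default_rng(7)
raw = rng.uniform(0.2, 0.4, size=(64, 64))
raw[10, 10] = 5.0                                       # an isolated bad pixel
img = normalize_sun_angle(median3x3(remap_to_2km(raw)), solar_zenith_deg=30.0)
mask = np.zeros(img.shape, dtype=bool)
mask[:16, :16] = True                                   # hypothetical sector
print(img.shape, sector_stats(img, mask, cloud_threshold=0.35))
```

The median smoother removes the isolated bad pixel that survives block averaging, which is the usual reason to prefer it over a mean filter for this step.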

[Figure: Sectorization]

Consensus Forecast

[Diagram: Day Characterization (wind direction, inversion height, forcing influences); Forecast Weighting Function; COBEL; Local SFM; Regional SFM; Satellite SFM; Consensus Forecast]

Conclusions
• PGAM, SVD/PC, and Predictor Nulling provide a systematic way to approach the development of linear forecast models via regression
• This methodology provides a way to investigate the elimination of specific predictors, which could be useful in the development of contingency models
• We are investigating full automation