

Building Statistical Forecast Models



Wes Wilson

MIT Lincoln Laboratory

April, 2001

- Idea: Base the forecast on observed outcomes in previous similar situations (training data)
- Possible ways to evaluate and condense the training data
- Categorization: seek comparable cases, usually expert-based
- Statistical: correlation and significance analysis
- Fuzzy Logic: combines expert and statistical analysis

- Categorization
- Belief: Incremental changes in predictors relate to incremental changes in the predictand
- Issues
- Requirements on the Training Data
- Development Methodology
- Automation

- Regression-based Models
- Predictor Selection
- Data Quality and Clustering
- Measuring Success
- An Example

- Multi-Linear Regression
$F = w_0 + \sum_i w_i P_i$

$w_i$ = predictor weighting

$w_0$ = conditional climatology minus the weighted mean predictor values

- GAM: Generalized Additive Models
$F = w_0 + \sum_i w_i f_i(P_i)$

$f_i$ = structure function, determined during regression

- PGAM: Pre-scaled Generalized Additive Models
$F = w_0 + \sum_i w_i f_i(P_i)$

$f_i$ = structure function, determined prior to regression

- The constant term $w_0$ is conditional climatology less the weighted mean bias of the scaled predictors
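The statement about the constant term can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic, "conditional climatology" is taken to be the mean of the observed events over the training set, the fit is ordinary least squares, and the two structure functions (`np.sqrt`, `np.log1p`) are hypothetical stand-ins for the $f_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
P = rng.uniform(0.0, 10.0, size=(m, 2))            # two synthetic predictors

# Hypothetical pre-selected structure functions f_1, f_2 (not from the source).
scaled = np.column_stack([np.sqrt(P[:, 0]), np.log1p(P[:, 1])])
E = 1.5 + 0.8 * scaled[:, 0] - 0.4 * scaled[:, 1] + rng.normal(0.0, 0.1, m)

# Linear regression of E on the scaled predictors plus a constant column.
A = np.column_stack([np.ones(m), scaled])
coef, *_ = np.linalg.lstsq(A, E, rcond=None)
w0, w = coef[0], coef[1:]

# Assumption: conditional climatology = training-set mean of the events.
climatology = E.mean()
w0_from_means = climatology - w @ scaled.mean(axis=0)

print(w0, w0_from_means)   # the two values agree to rounding error
```

With a constant column in the regression, the fitted residuals have zero mean, which is why the constant term equals the climatological mean less the weighted means of the scaled predictors.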

- Training Data for one predictor
- $P$ vector of predictor values
- $E$ vector of observed events
- Error Residual: $R^2 = \| F_P - E \|^2$
- Least-squares objective: $J(w) = \| A w - E \|^2$

- Correlation Coefficient $r(P, E) = \dfrac{\Delta P \cdot \Delta E}{\sigma_{\Delta P}\, \sigma_{\Delta E}}$
- Fundamental Relationship. Let $F_0$ be a forecast equation with error residuals $E_0$ ($\|E_0\| = R_0$). Let $W_0 + W_1 P$ be a BLUE (best linear unbiased estimate) correction for $E_0$, and let $F = F_0 + W_0 + W_1 P$. The error residual $R_F$ of $F$ satisfies
- $R_F^2 = R_0^2 \left[ 1 - r(P, E_0)^2 \right]$
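The fundamental relationship can be verified on synthetic data. This is a minimal sketch, assuming the residuals of the initial forecast have zero mean (as they do when that forecast was fitted with a constant term) and that the BLUE correction is the ordinary least-squares line; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 500
P = rng.normal(size=m)                    # one candidate predictor
E0 = 0.6 * P + rng.normal(size=m)         # residuals of some initial forecast F0
E0 -= E0.mean()                           # assumption: residuals have zero mean

# Least-squares correction W0 + W1 * P fitted to the residuals E0.
A = np.column_stack([np.ones(m), P])
(W0, W1), *_ = np.linalg.lstsq(A, E0, rcond=None)

remaining = E0 - (W0 + W1 * P)            # residuals of F = F0 + W0 + W1 * P
RF2 = remaining @ remaining
R02 = E0 @ E0
r = np.corrcoef(P, E0)[0, 1]

print(RF2, R02 * (1.0 - r**2))            # both sides of the relationship
```

The practical reading is that a predictor can reduce the squared residual by at most the fraction $r^2$, so correlation with the current residual is a natural screening statistic.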

- Assumption: The training data are representative of what is expected during the implementation period
- Simple models are less likely to capture undesirable (non-stationary) short-term fluctuations in the training data
- The climatology of the training period should match that expected in the intended implementation period (decade scale)
- It is irrational to expect that short training periods can lead to models with long-term skill
- Plan for repeated model tuning
- Design self-tuning into the system

- It is desirable to have many more training cases than model parameters

"The only way to prepare for the future is to prepare to be surprised; that doesn't mean we have to be flabbergasted." (Kenneth Boulding)

- An established statistical technique, which uses the training data to define nonlinear scaling of the predictors
- Standard implementation represents the structure functions as B-splines with many knots, which requires the use of a large set of training data
- The forecast equations are determined by linear regression including the nonlinear scaling of the predictors
$F = w_0 + \sum_i w_i f_i(P_i)$

- The objective is to minimize the error residual
- The structure functions are influenced by all of the predictors, and may change if the predictor mix is altered
- If a GAM model has $p$ predictors and $k$ knots per structure function, then the regression model has $kp + 1$ (linear) regression parameters
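To make the parameter count concrete, the sketch below fits an additive model by linear regression on a spline basis. It is an illustration only: it assumes a degree-1 B-spline (hat-function) basis with one basis function per knot, synthetic data, and no smoothing penalty, which is simpler than a standard GAM implementation.

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat-function) basis: one column per knot."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])

rng = np.random.default_rng(2)
m, p, k = 400, 3, 5                        # cases, predictors, knots per predictor
P = rng.uniform(0.0, 1.0, size=(m, p))
E = np.sin(3 * P[:, 0]) + P[:, 1] ** 2 - 0.5 * P[:, 2] + rng.normal(0.0, 0.05, m)

knots = np.linspace(0.0, 1.0, k)
A = np.column_stack([np.ones(m)] + [hat_basis(P[:, i], knots) for i in range(p)])
# Each hat basis sums to one, so its columns are collinear with the constant
# column; lstsq returns the minimum-norm solution in that case.
coef, *_ = np.linalg.lstsq(A, E, rcond=None)
n_params = A.shape[1]                      # k * p + 1 regression parameters

# Plain multi-linear regression for comparison (p + 1 parameters).
A_lin = np.column_stack([np.ones(m), P])
coef_lin, *_ = np.linalg.lstsq(A_lin, E, rcond=None)

sd_gam = np.std(E - A @ coef)
sd_lin = np.std(E - A_lin @ coef_lin)
print(n_params, sd_gam, sd_lin)
```

The additive fit reaches a lower residual than the linear fit here, but it spends $kp + 1$ parameters to do so, which is the reason a large training set is needed.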

- A new statistical technique, which permits the use of training sets that are decidedly smaller than those for GAM
- Once the structure functions are selected, the forecast equations are determined by linear regression of the pre-scaled predictors
$F = w_0 + \sum_i w_i f_i(P_i)$

- Determination of the structure functions is based on enhancing the correlation of the (scaled) predictor with the error residual of conditional climatology
- Maximize $r( f_i(P_i), \Delta E )$
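One way to picture the pre-scaling step is a search over a small family of candidate structure functions, keeping the one whose scaled predictor correlates best with the residual of conditional climatology. The source does not say how the structure functions are parameterized, so the candidate family, the use of the absolute correlation, and the use of the training mean as conditional climatology are all assumptions of this sketch.

```python
import numpy as np

# Hypothetical candidate family; the source does not specify one.
candidates = {
    "identity": lambda p: p,
    "sqrt": np.sqrt,
    "log1p": np.log1p,
    "square": np.square,
}

def pick_structure_function(P_i, delta_E):
    """Return the candidate name maximizing |r(f(P_i), delta_E)| and all scores."""
    scores = {
        name: abs(np.corrcoef(f(P_i), delta_E)[0, 1])
        for name, f in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(3)
P_i = rng.uniform(0.0, 20.0, size=300)               # synthetic non-negative predictor
E = 2.0 + 1.2 * np.log1p(P_i) + rng.normal(0.0, 0.05, 300)
delta_E = E - E.mean()                               # residual of (assumed) climatology

best, scores = pick_structure_function(P_i, delta_E)
print(best, scores)
```

Because each structure function is fixed before the regression, only one weight per predictor remains to be fitted, which is consistent with the claim that PGAM tolerates smaller training sets than GAM.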

- Every Method Involves a Choice of Predictors
- The Great Predictor Set: Everything relevant and available
- Possible Reduction based on Correlation Analysis
- Predictor Selection Strategies
- Sequential Addition
- Sequential Deletion
- Ensemble Decision ( SVD )

- Changing the predictor list changes the model weights; for GAM, it also changes the structure functions
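Sequential addition can be sketched as a greedy loop that adds, at each step, the predictor giving the largest drop in residual standard deviation. The stopping rule (stop when the relative SD reduction falls below 1.5 %) is an assumption chosen for illustration, and the data are synthetic.

```python
import numpy as np

def residual_sd(A, b):
    """Standard deviation of the least-squares residual of A w = b."""
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.std(b - A @ w)

def sequential_addition(X, b, min_gain=0.015):
    """Greedy forward selection; returns chosen column indices and final SD."""
    m, n = X.shape
    chosen = []
    sd = np.std(b - b.mean())                  # constant-only model
    while len(chosen) < n:
        trials = {}
        for j in range(n):
            if j not in chosen:
                A = np.column_stack([np.ones(m), X[:, chosen + [j]]])
                trials[j] = residual_sd(A, b)
        best = min(trials, key=trials.get)
        if sd - trials[best] < min_gain * sd:  # gain too small: stop
            break
        chosen.append(best)
        sd = trials[best]
    return chosen, sd

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 6))
b = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(0.0, 0.5, 1000)
chosen, sd = sequential_addition(X, b)
print(chosen, sd)
```

Note that the weights are refitted every time the predictor list changes, which mirrors the remark that changing the list changes the model weights.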

- Setting: Predictor list $\{P_i\}_{i=1}^{n}$ and observed outcomes $b$ over the $m$ trials of the training set
- Basic Linear Regression Problem
$A w = b$

where the columns of the $m \times n$ matrix $A$ are the lists of observed predictor values over the trials

- Normal Equations: $A^T A\, w = A^T b$
- Linear Algebra: $w = (A^T A)^{-1} A^T b$
- Optimization: Find $w$ to minimize $R^2 = \| A w - b \|^2$
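The three formulations give the same weights when $A$ has full column rank. A minimal sketch on synthetic data, comparing the normal equations with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 120, 4                               # trials, predictors (synthetic sizes)
A = rng.normal(size=(m, n))
b = A @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0.0, 0.1, m)

# Normal equations: (A^T A) w = A^T b
w_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Direct minimization of || A w - b ||^2
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

R2 = np.sum((A @ w_normal - b) ** 2)        # squared error residual
gradient = A.T @ (A @ w_normal - b)         # vanishes at the minimizer
print(w_normal, w_lstsq, R2)
```

Solving the normal equations squares the condition number of $A$, so for poorly conditioned predictor sets an orthogonal factorization such as the SVD is usually the safer numerical route.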

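- $A = U S V^T$, where $U$ and $V$ are orthogonal matrices
- and $S = [\, \Sigma \mid 0 \,]^T$, where $\Sigma = \mathrm{diag}(s_1, \dots, s_n)$ is diagonal with positive diagonal entries

- $S V^T w = U^T b$, or $S \tilde{w} = \tilde{b}$ with $\tilde{w} = V^T w$ and $\tilde{b} = U^T b$

CAUTION: Analysis of residuals can be misleading unless the dynamic ranges of the predictor values have been standardized

- The $s_i$ are usually decreasing
- $s_n > 0$, or reduce the predictor list
- For $i \le n$, $\tilde{w}_i = \tilde{b}_i / s_i$
- For $i > n$, there is no solution. This is the portion of the problem that is not resolved by these predictors
- Magnitude of the unresolved portion of the problem: $R_*^2 = \sum_{i=n+1}^{m} \tilde{b}_i^2$

- Truncated Problem: For $i > k$, set $\tilde{w}_i = 0$. This increases the error residual to
- $R_k^2 = \sum_{i=k+1}^{m} \tilde{b}_i^2 = R_*^2 + \sum_{i=k+1}^{n} \tilde{b}_i^2$

$S \tilde{w} = \tilde{b}$:

$$
\begin{bmatrix} \mathrm{diag}(s_1, s_2, s_3, \dots, s_n) \\ 0 \end{bmatrix}
\begin{bmatrix} \tilde{w}_1 \\ \tilde{w}_2 \\ \tilde{w}_3 \\ \vdots \\ \tilde{w}_n \end{bmatrix}
=
\begin{bmatrix} \tilde{b}_1 \\ \tilde{b}_2 \\ \tilde{b}_3 \\ \vdots \\ \tilde{b}_n \\ \tilde{b}_{n+1} \\ \vdots \\ \tilde{b}_m \end{bmatrix}
$$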
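The SVD solution and the residual bookkeeping can be sketched as follows. The data are synthetic, the predictor columns are standardized first, and the names `b_t` and `w_t` denote the rotated vectors $U^T b$ and $V^T w$; these names are illustrative, not from the source.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 60, 5
A = rng.normal(size=(m, n))
A = (A - A.mean(axis=0)) / A.std(axis=0)          # standardize dynamic ranges
b = A @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(0.0, 0.3, m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: m x m, s: n values, Vt: n x n
b_t = U.T @ b                    # rotated right-hand side
w_t = b_t[:n] / s                # rotated weights (requires every s_i > 0)
w = Vt.T @ w_t                   # weights in the original predictor space

R_star2 = np.sum(b_t[n:] ** 2)   # portion not resolved by these predictors

def truncated_weights(k):
    """Keep the first k rotated weights, zero the rest, rotate back."""
    w_cut = w_t.copy()
    w_cut[k:] = 0.0
    return Vt.T @ w_cut

def truncated_residual2(k):
    """Squared residual after truncation: sum of the discarded rotated b entries."""
    return np.sum(b_t[k:] ** 2)

for k in range(n + 1):
    print(k, truncated_residual2(k))
```

Printing the residual for each $k$ shows how much each additional component buys, which is the quantity used to decide where to truncate.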

- SVD / PC analysis provides guidance
- Truncation in w space reduces the degrees of freedom
- Truncation does not provide nulling of predictors, since zero components of the rotated weights $\tilde{w} = V^T w$ do not lead to zero components of $w = V \tilde{w}$
- Seek a linear forecast model of the form
- $F(a) = a^T w = \sum_i w_i a_i$, where $a$ is a vector of predictor values

- The ith predictor is eliminated from the problem if wi = 0

- Provides simple models
- Eliminate designated predictors (missing data problem)
- Quantifies the incremental benefit provided by essential predictors (sensor benefit problem)
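The difference between truncation and nulling can be seen numerically. The source does not spell out the nulling algorithm, so this sketch makes an assumption: a predictor is eliminated simply by refitting without its column, and its incremental benefit is read off as the growth in residual SD. Data and names are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 80, 4, 2
A = rng.normal(size=(m, n))
b = A @ np.array([1.0, 0.8, -0.6, 0.4]) + rng.normal(0.0, 0.2, m)

# Truncation: zero the last n - k weights in the rotated (SVD) space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
w_rot = (U.T @ b) / s
w_rot[k:] = 0.0
w_trunc = Vt.T @ w_rot            # rotated back: generally no zero entries

def fit_without(A, b, dropped):
    """Refit with one predictor removed; its weight is exactly zero."""
    keep = [j for j in range(A.shape[1]) if j != dropped]
    w_keep, *_ = np.linalg.lstsq(A[:, keep], b, rcond=None)
    w = np.zeros(A.shape[1])
    w[keep] = w_keep
    return w, np.std(b - A @ w)

w_full, *_ = np.linalg.lstsq(A, b, rcond=None)
sd_full = np.std(b - A @ w_full)
w_drop, sd_drop = fit_without(A, b, dropped=3)

print(w_trunc)                    # every predictor still carries weight
print(w_drop, sd_drop - sd_full)  # predictor 3 nulled, with its SD cost
```

The same refit also addresses the missing-data case: if a sensor is unavailable, the model without its column is the contingency model.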

- Gross Predictor Selection (availability & correlation)
- SVD for problem sizing and gross error estimation
- Truncation and Predictor Nulling: maximal model(s)
(there may be more than one good solution)

- Successive Elimination in the Original Problem Space:
minimal model (until SD starts to grow rapidly)

- Successive Augmentation in the Original Problem Space
- At this point, the good solutions are bracketed between the maximal and the minimal models; exhaustive searches are probably feasible, cross validation is wise.
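A cross-validation check for a proposed augmentation might look like the sketch below. The source does not define its cross-validation test, so leave-one-out validation and the acceptance rule (accept only if the held-out RMS error decreases) are assumptions; the data are synthetic.

```python
import numpy as np

def loo_cv_sd(X, b, columns):
    """Leave-one-out RMS error of a linear model on the given predictor columns."""
    m = len(b)
    A = np.column_stack([np.ones(m), X[:, columns]])
    errors = np.empty(m)
    for t in range(m):
        train = np.arange(m) != t                    # hold out case t
        w, *_ = np.linalg.lstsq(A[train], b[train], rcond=None)
        errors[t] = b[t] - A[t] @ w
    return np.sqrt(np.mean(errors ** 2))

def accepts_augmentation(X, b, current, candidate):
    """Assumed rule: accept the candidate only if held-out error decreases."""
    return loo_cv_sd(X, b, current + [candidate]) < loo_cv_sd(X, b, current)

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 3))
b = 1.5 * X[:, 0] + rng.normal(0.0, 0.5, 120)

print(loo_cv_sd(X, b, []), loo_cv_sd(X, b, [0]))
print(accepts_augmentation(X, b, [0], 2))            # candidate unrelated to b
```

Since the candidate models between the minimal and maximal models are few, the same function can be evaluated for every bracketed subset when an exhaustive search is feasible.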

- 149 marine stratus days from 1996 to 2000
- 51 sectors and 3 potential predictors per sector (153)
- Compute the correlation for each predictor with the residual from conditional climatology
- Retain only predictors whose correlation is greater than 0.25; this reduces the predictor list to 45 predictors
- Separate analysis for two data sets, Raw and PGAM
- Truncate each when SD reduction drops below 1.5 %
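The correlation screening step can be sketched as below. The array sizes mirror the counts quoted above (149 days, 153 candidate predictors), but the data are synthetic and do not reproduce the study. The use of the absolute correlation and of the training mean as conditional climatology are assumptions.

```python
import numpy as np

def screen_by_correlation(X, E, threshold=0.25):
    """Keep columns whose |correlation| with the climatology residual exceeds threshold."""
    delta_E = E - E.mean()        # assumption: climatology = training mean
    r = np.array([np.corrcoef(X[:, j], delta_E)[0, 1] for j in range(X.shape[1])])
    keep = np.flatnonzero(np.abs(r) > threshold)
    return keep, r

rng = np.random.default_rng(9)
X = rng.normal(size=(149, 153))                       # synthetic stand-in data
E = X[:, 0] + X[:, 1] + rng.normal(0.0, 0.5, 149)     # only two informative columns

keep, r = screen_by_correlation(X, E)
print(len(keep), keep)
```

With far more candidates than informative predictors, a few uninformative columns can pass the threshold by chance, which is one reason the screened list is reduced further by SVD truncation and nulling.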

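[Figure: SVD truncation at 6 components for the Raw and PGAM data sets (panels "Raw Data", "SVD Raw 6", "PGAM Data", "SVD PGAM 6"). Values recovered from the panels, in extraction order: Sigma PC 6 = 1.134; Sigma PC 6 = 0.999; Sigma = 1.148; Sigma = 0.999.]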

- SVD Truncate 6 Pred.Nulling
- In the Truncation space:
Null to 7 predictors with acceptable error growth

- Maximal Problems (R-8,P-7)
- Minimal Problems (R-5,P-4)
- Neither problem would accept augmentation according to the strict cross-validation test
- Different predictors were selected

- DQA is similar to NWP
- need to do the training set
- probably need to work to tighter standards

- Data Clustering
- During training - manual ++
- For implementation - fully automated

- Conditional Climatology based on Clustering

- 1-km visible channel (brightness)
- Data pre-processing
- re-mapping to 2 km grid
- 3x3 median smoother
- normalized for sun angle
- calibrated for lens graying

- Grid points grouped into sectors
- topography
- physical forcing
- operational areas

- Sector statistics
- Brightness
- Coverage
- Texture

- 4 year data archive, 153 predictors
- PGAM Regression Analysis

[Diagram labels: Sectorization; Day Characterization (wind direction, inversion height, forcing influences); COBEL; Local SFM; Regional SFM; Satellite SFM; Forecast Weighting Function; Consensus Forecast]

- PGAM, SVD/PC, and Predictor Nulling provide a systematic way to approach the development of linear forecast models via regression
- This methodology provides a way to investigate the elimination of specific predictors, which could be useful in the development of contingency models
- We are investigating full automation