Statistical Forecasting [Part 1]. 69EG3137 – Impacts & Models of Climate Change. Details for Today: DATE: 25th November 2004. BY: Mark Cresswell. FOLLOWED BY: Literature Tutorial. Lecture Topics: What is statistical forecasting? Simple linear regression. Multiple linear regression.
In nature, observed phenomena are linked to one another by physical processes. These links are referred to as causality, or causal links.
In statistical forecasting, we can exploit this causality mathematically by replicating the pattern of change observed for a particular set of conditions. The physical processes represent the forcing and the observed pattern (of weather!) is the direct result.
Different sets of conditions (forcing) give rise to specific, replicable patterns of weather.
Example: Unusually warm sea-surface temperature (SST) conditions in the Indian Ocean are usually associated with a greater than normal frequency (and magnitude) of tropical cyclones. The forcing here is the increased SSTs, whilst the observed pattern is enhanced cyclogenesis.
Model: Since we know there is a causal link (enhanced energy flux, more evaporation over the ocean, greater convection, etc.) we can determine a statistical relationship from historical observations.
Model: The previous example illustrated how a specific forcing can be seen to alter future weather conditions. We can summarise this relationship mathematically in a regression equation
This type of equation is known as a regression model.
Model: The regression model informs us of the dependence one variable has on another. Usually, we will select variables that are correlated with one another
We must be careful, however, when using relationships based purely on a correlation, as association is not causation.
In children, shoe size may be strongly correlated with reading skills. This does not mean that children who learn to read new words sprout longer feet!
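The shoe-size example can be illustrated with a small calculation of the Pearson correlation coefficient. This is only a sketch: the function name `pearson_r` and all of the numbers are made up for illustration, chosen so that both shoe size and reading score grow with age.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the lecture): six children aged 5 to 10.
shoe_size = [28, 30, 31, 33, 34, 36]
reading_score = [20, 35, 45, 60, 70, 85]

r_shoe_reading = pearson_r(shoe_size, reading_score)
print("correlation between shoe size and reading score:", round(r_shoe_reading, 3))
```

The coefficient comes out close to 1 even though neither variable causes the other; in this invented data set both simply increase with age.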
The regression equation used to calculate one variable from another is fitted by the method of least squares. Normally we can insert a line of best fit through a scatter-plot of X and Y data pairs. The line that gives the smallest r.m.s. (root mean square) error in predicting Y from X is the regression line.
The regression line is often referred to as the least squares line. We can use the slope and intercept of a least squares line to derive the constants m (slope) and b (intercept) used in our linear regression model equation:
$Y = mX + b$
The intercept is the height of the least squares line when X is zero. The slope is the rate at which Y increases per unit increase in X
Thus, for any given value of our X variable (and values for m and b which we calculate from observations) we can estimate a value for Y
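A minimal sketch of this calculation is given below, assuming the standard closed-form least squares solution for the slope and intercept. The function names and the SST and cyclone numbers are hypothetical and serve only to show the mechanics.

```python
def fit_least_squares(xs, ys):
    """Return slope m and intercept b of the least squares line Y = m*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the least squares line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b

def rms_error(xs, ys, m, b):
    """Root mean square error of the predictions m*x + b against ys."""
    return (sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Hypothetical observations: SST anomaly (deg C) and seasonal cyclone count.
sst_anomaly = [-0.5, -0.2, 0.0, 0.3, 0.6, 0.9]
cyclone_count = [7, 8, 9, 10, 12, 13]

m, b = fit_least_squares(sst_anomaly, cyclone_count)
estimate = m * 0.5 + b  # estimated Y for an SST anomaly of 0.5 deg C
print("slope:", m, "intercept:", b, "estimate:", estimate)
print("r.m.s. error of the fitted line:", rms_error(sst_anomaly, cyclone_count, m, b))
```

Any other choice of m and b gives an r.m.s. error at least as large as that of the fitted line, which is what makes it the least squares line.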
Often, we will not look at the contribution of a single variable in isolation – but instead a number of predictors will be included in a forecast
Different predictors may be causally related to the same weather phenomena.
If K is the number of predictor variables then:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_K x_K$
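The coefficients b0 to bK can be estimated by least squares in the same way as for a single predictor. The sketch below solves the normal equations with plain Gaussian elimination; this is one possible approach, not necessarily the one used in practice, and the function names and data values are assumptions made for illustration.

```python
def solve_linear_system(a, rhs):
    """Solve a * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_multiple_regression(predictors, y):
    """predictors: list of rows [x1, ..., xK]. Returns [b0, b1, ..., bK]."""
    design = [[1.0] + list(row) for row in predictors]  # column of 1s for b0
    k1 = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k1)]
           for i in range(k1)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k1)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, row):
    """Evaluate y-hat = b0 + b1*x1 + ... + bK*xK for one set of predictors."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))

# Hypothetical data with K = 2 predictors per observation.
predictors = [[0.2, 1.0], [0.5, 0.8], [0.1, 1.5], [0.9, 0.4], [0.7, 1.1]]
observed_y = [8.0, 9.5, 8.2, 11.0, 10.6]

coeffs = fit_multiple_regression(predictors, observed_y)
print("coefficients b0..bK:", coeffs)
print("estimate for x1=0.6, x2=0.9:", predict(coeffs, [0.6, 0.9]))
```

The normal equations become ill-conditioned when predictors are strongly correlated with one another, so a library least squares routine would normally be preferred for real data.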
Linear regression models provide a “fit” for our estimate of y for a number of observations of x
A straight-line fit (simple linear regression) will not go through all points. A multiple linear regression does not fit a curve either: with K predictors it fits a plane (or hyperplane). It can still give a better fit and a more accurate estimate of y, because the additional predictors account for variation that a single x leaves unexplained.
Multivariate models are often used in weather prediction to estimate future change based on historical observations of a trend.
Not all objective statistical forecast procedures are based on regression
Some methods were in use prior to the advent of fairly accurate numerical weather prediction (NWP) forecasts (12-48 hr range). One such method is analog forecasting (AF).
Analog forecasting is still in use for long-range (seasonal) forecasting, although the climatological database it uses is deemed too short for it to be competitive in ordinary short-range weather forecasting.
The idea underlying AF is to search the archives of climatological synoptic data for maps closely resembling current observations, and assume that the future evolution of the atmosphere will be similar to the flows that followed the historical analogs
The method is intuitive and benefits from the judgement of experienced weather forecasters.
AF is limited, however, as the atmosphere apparently does not exactly repeat itself, so matches can only be approximate.
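The analog search can be sketched as a nearest-match lookup in an archive. The example below assumes that similarity is measured by the r.m.s. difference between fields, which is only one possible choice; the function names and the tiny three-point "maps" are hypothetical.

```python
import math

def rms_distance(field_a, field_b):
    """Root mean square difference between two fields on the same grid."""
    return math.sqrt(
        sum((p - q) ** 2 for p, q in zip(field_a, field_b)) / len(field_a)
    )

def analog_forecast(archive, current, lead=1):
    """Find the archived field closest to `current` and return its index
    together with the field observed `lead` steps after it."""
    # Only archive entries with a known successor can serve as analogs.
    candidates = range(len(archive) - lead)
    best = min(candidates, key=lambda i: rms_distance(archive[i], current))
    return best, archive[best + lead]

# Hypothetical archive: pressure (hPa) at three grid points on successive days.
archive = [
    [1012.0, 1008.0, 1004.0],
    [1010.0, 1006.0, 1003.0],
    [1005.0, 1009.0, 1013.0],
    [1002.0, 1007.0, 1015.0],
    [1011.0, 1011.0, 1010.0],
]
current = [1005.5, 1008.5, 1012.5]

best_index, forecast = analog_forecast(archive, current)
print("best analog is archive day", best_index)
print("forecast (the field that followed the analog):", forecast)
```

The match is never exact, so the distance to the best analog gives a rough indication of how far the forecast can be trusted.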
The Model Output Statistics (MOS) approach is a preferred method of incorporating NWP forecast information into statistical weather forecasts.
The MOS approach can build directly into the regression equations the influence of specific characteristics of different NWP models at different projection times into the future.
To develop MOS forecast equations it is necessary to have a developmental data set composed of historical records of the predictand, together with archived records of the forecasts produced by the NWP model for the same days on which the predictand was observed
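As a rough sketch of how such a developmental data set is used, the example below regresses observed values on archived NWP forecasts for the same days and then applies the fitted equation to a new forecast. A single predictor is assumed for brevity, whereas operational MOS equations typically use several; the function name and all temperatures are invented.

```python
def fit_mos_equation(nwp_forecasts, observations):
    """Least squares fit of observation = intercept + slope * NWP forecast."""
    n = len(nwp_forecasts)
    mean_f = sum(nwp_forecasts) / n
    mean_o = sum(observations) / n
    sfo = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(nwp_forecasts, observations))
    sff = sum((f - mean_f) ** 2 for f in nwp_forecasts)
    slope = sfo / sff
    intercept = mean_o - slope * mean_f
    return slope, intercept

# Hypothetical developmental data set: archived NWP temperature forecasts
# (deg C) and the temperatures actually observed on the same days.
archived_nwp = [10.0, 12.0, 15.0, 18.0, 20.0]
observed = [8.5, 10.4, 13.1, 16.2, 17.9]

slope, intercept = fit_mos_equation(archived_nwp, observed)

todays_nwp = 16.0  # new raw model forecast (hypothetical)
mos_forecast = intercept + slope * todays_nwp
print("raw NWP forecast:", todays_nwp, "MOS forecast:", mos_forecast)
```

Because the invented model runs consistently warmer than the observations, the fitted equation lowers the raw forecast, which is the kind of systematic model characteristic that MOS is designed to absorb.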
Sometimes we might need to compare sets of variables against patterns of change – and synthesise them
It might be the case that environmental change (shifts in weather patterns) is due to more than one variable. In order to determine the spatial limits of their influence (in a geographical sense) we can use a spatially dependent correlation scheme called Principal Components Analysis (PCA). The technique allows data reduction.
PCA became popular as a technique following papers by Lorenz in the mid-1950s, who called it Empirical Orthogonal Function (EOF) analysis. Both names refer to the same set of procedures.
The purpose of PCA is to reduce a data set containing a large number of variables to a new data set containing far fewer new variables – but which nevertheless represent a large fraction of the variability contained in the original data
PCA yields a number of principal components, which constitute a compact representation of the original data.
PCA can yield substantial insights into both the spatial and temporal variations exhibited by the field or fields being analysed
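The data reduction can be sketched for the first principal component alone. The example below builds the covariance matrix of a small data set and finds its leading eigenvector by power iteration, a simple substitute for the full eigen-decomposition a statistics package would perform. The function names and the station anomalies are hypothetical.

```python
import math

def covariance_matrix(data):
    """Return the covariance matrix and the mean-centred data.
    data: list of rows, one row per time, one column per variable."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centred

def leading_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov by power iteration.
    Assumes the starting vector is not orthogonal to the leading eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Hypothetical temperature anomalies at three stations over six months.
data = [
    [1.2, 1.0, 0.9],
    [-0.8, -0.6, -0.9],
    [0.3, 0.5, 0.2],
    [-1.1, -1.3, -0.8],
    [0.9, 0.7, 1.1],
    [-0.5, -0.3, -0.5],
]

cov, centred = covariance_matrix(data)
pc1, variance1 = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_fraction = variance1 / total_variance
# Time series of the first principal component (one value per month).
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centred]
print("PC1 loadings:", pc1)
print("fraction of variance explained by PC1:", explained_fraction)
```

In this invented data set the three stations vary together, so a single new variable (the PC1 score) captures most of the variability of the original three.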
Canonical Correlation Analysis (CCA) is a statistical technique that identifies a sequence of pairs of patterns in two multivariate data sets, and constructs sets of transformed variables by projecting the original data onto these patterns.
The patterns are chosen such that the new variables defined by projection of the two data sets onto these patterns exhibit maximum correlation
CCA is an extension of multiple regression models. It is often applied to fields, such as SST or the heights of pressure surfaces.
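The maximum-correlation criterion can be written compactly. The notation here is an assumption introduced for illustration: x and y are the two (mean-centred) data vectors, a and b are the weight patterns, and S_xx, S_yy and S_xy are the covariance and cross-covariance matrices of the two data sets.

```latex
% Transformed (canonical) variables obtained by projecting onto the patterns:
v = \mathbf{a}^{T}\mathbf{x}, \qquad w = \mathbf{b}^{T}\mathbf{y}

% The first pair of patterns maximises the correlation between v and w:
(\mathbf{a}_1, \mathbf{b}_1) = \arg\max_{\mathbf{a},\,\mathbf{b}}
\frac{\mathbf{a}^{T} S_{xy}\, \mathbf{b}}
     {\sqrt{\mathbf{a}^{T} S_{xx}\, \mathbf{a}}\;\sqrt{\mathbf{b}^{T} S_{yy}\, \mathbf{b}}}
```

Each subsequent pair of patterns maximises the same ratio, subject to its transformed variables being uncorrelated with those of all earlier pairs.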