Regression analysis

Relating two data matrices/tables to each other

Purpose: prediction and interpretation

(Figure: two data tables, X-data and Y-data, related by the regression model.)

Typical examples
  • Spectroscopy: Predict chemistry from spectral measurements
  • Product development: Relating sensory to chemistry data
  • Marketing: Relating sensory data to consumer preferences
Topics covered
  • Simple linear regression
  • The selectivity problem: a reason why multivariate methods are needed
  • The collinearity problem: a reason why data compression is needed
  • The outlier problem: why and how to detect
Simple linear regression
  • One y and one x. Use x to predict y.
  • Use a linear model/equation and fit it by least squares
Data structure

X-variable   Y-variable
    2            7
    4            6
    1            8
    .            .
    .            .
    .            .

(Objects: the same number of rows in the x- and y-columns.)

Simple linear regression

Least squares (LS) is used for estimation of the regression coefficients.

y = b0 + b1x + e

(Figure: fitted regression line with intercept b0 and slope b1.)
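A minimal sketch of this fit, assuming NumPy; the data values are made up for illustration:

    import numpy as np

    # Illustrative data: one x-variable and one y-variable (same number of objects).
    x = np.array([2.0, 4.0, 1.0, 3.0, 5.0])
    y = np.array([7.0, 6.0, 8.0, 6.5, 5.0])

    # Least-squares estimates of b0 (intercept) and b1 (slope) in y = b0 + b1*x + e.
    X = np.column_stack([np.ones_like(x), x])       # design matrix [1, x]
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

    y_hat = b0 + b1 * x                             # fitted line
    e = y - y_hat                                   # residuals
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")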

Regression analysis

(Figure: workflow. Data (X, Y) and a model enter the regression analysis; the result is used for prediction on future X and for interpretation. Pre-processing and possible outliers must be considered.)


The selectivity problem

A reason why multivariate methods are needed

Multiple linear regression
  • Provides
    • predicted values
    • regression coefficients
    • diagnostics
  • If there are many highly collinear variables
    • unstable regression equations
    • difficult to interpret coefficients: many and unstable
Collinearity: the problem of correlated X-variables

y = b0 + b1x1 + b2x2 + e

Regression in this case means fitting a plane to the data (open circles in the figure). The two x's are highly correlated, which leads to an unstable equation/plane in the direction with little variability.
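A small sketch of this instability, assuming NumPy; the data are simulated for illustration: two almost identical x-variables are fitted twice, once with a tiny perturbation of y, and the individual coefficients change drastically.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 30
    x1 = rng.normal(size=n)
    x2 = x1 + 0.01 * rng.normal(size=n)             # x2 nearly equal to x1 -> high collinearity
    y = 1.0 + 2.0 * x1 + 0.5 * rng.normal(size=n)

    def mlr_coefficients(x1, x2, y):
        X = np.column_stack([np.ones_like(x1), x1, x2])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        return b                                    # [b0, b1, b2]

    # The plane is well determined along the data, but unstable in the direction
    # with little variability: b1 and b2 shift strongly under a tiny change in y.
    print(mlr_coefficients(x1, x2, y))
    print(mlr_coefficients(x1, x2, y + 0.01 * rng.normal(size=n)))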

Possible solutions
  • Select the most important wavelengths/variables (stepwise methods)
  • Compress the variables to the most dominating dimensions (PCR, PLS)
  • We will concentrate on the latter (can be combined)
Data compression
  • We will first discuss the situation with one y-variable
  • Focus on ideas and principles
  • Provides regression equation (as above) and plots for interpretation
Model for data compression methods

X = TPᵀ + E
y = Tq + f
(centred X and y)

  • T – scores, the carriers of information from X to y
  • P, q – loadings
  • E, f – residuals (noise)
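A minimal sketch of this model in code, assuming NumPy; compression_regression and predict are illustrative helper names, and the principal-component scores are used as T (a PCR-style choice):

    import numpy as np

    def compression_regression(X, y, n_components):
        """Fit X = T Pᵀ + E and y = T q + f on centred data."""
        x_mean, y_mean = X.mean(axis=0), y.mean()
        Xc, yc = X - x_mean, y - y_mean

        # Loadings P from the SVD of centred X; scores T carry the information from X to y.
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_components].T                     # loadings (variables x components)
        T = Xc @ P                                  # scores   (objects   x components)

        q, *_ = np.linalg.lstsq(T, yc, rcond=None)  # y-loadings: regression of y on the scores
        E = Xc - T @ P.T                            # X-residuals (noise)
        f = yc - T @ q                              # y-residuals (noise)
        return T, P, q, E, f, x_mean, y_mean

    def predict(X_new, P, q, x_mean, y_mean):
        """Predict y for new X-data with the fitted compression model."""
        return (X_new - x_mean) @ P @ q + y_mean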

Regression by data compression

(Figure: PCA is used to compress the X-data (x1, x2, x3) onto PC1; the scores tᵢ are then regressed on y with loading q, i.e. regression on scores.)

(Figure: schematic comparison. MLR regresses y directly on x1–x4. PCR first compresses x1–x4 into scores t1, t2 using X alone, then regresses y on the scores. PLS also compresses x1–x4 into scores t1, t2, but uses y as well when the scores are determined.)

PCR and PLS

For each factor/component

  • PCR
    • Maximize variance of linear combinations of X
  • PLS
    • Maximize covariance between linear combinations of X and y

Each factor is subtracted before the next is computed
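A sketch of how such factors can be computed for one y, written as a NIPALS-style PLS1 loop (an assumed implementation, not necessarily the one used in the course): each weight vector maximises covariance between a linear combination of X and y, and each factor is subtracted (deflated) before the next is computed.

    import numpy as np

    def pls1_components(X, y, n_components):
        """Return scores T, weights W, X-loadings P and y-loadings q (PLS1 sketch)."""
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        n_obj, n_var = Xc.shape
        T = np.zeros((n_obj, n_components))
        W = np.zeros((n_var, n_components))
        P = np.zeros((n_var, n_components))
        q = np.zeros(n_components)

        for a in range(n_components):
            w = Xc.T @ yc                           # direction of maximal covariance with y
            w /= np.linalg.norm(w)
            t = Xc @ w                              # scores for this factor
            p = Xc.T @ t / (t @ t)                  # X-loadings
            q_a = yc @ t / (t @ t)                  # y-loading
            Xc = Xc - np.outer(t, p)                # subtract (deflate) the factor from X...
            yc = yc - t * q_a                       # ...and from y before computing the next
            T[:, a], W[:, a], P[:, a], q[a] = t, w, p, q_a
        return T, W, P, q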

Principal component regression (PCR)
  • Uses principal components
  • Solves the collinearity problem, stable solutions
  • Provides plots for interpretation (scores and loadings)
  • Well understood
  • Outlier diagnostics
  • Easy to modify
  • But uses only X to determine components
PLS-regression
  • Easy to compute
  • Stable solutions
  • Provides scores and loadings
  • Often fewer components than PCR
  • Sometimes better predictions
PCR and PLS for several Y-variables
  • PCR: each Y-variable is regressed onto the (same) principal components
  • PLS: the algorithm is easily modified; it maximises covariance between linear combinations of X and Y
  • For both methods: Regression equations and plots
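A brief usage sketch with scikit-learn's PLSRegression, which accepts several Y-variables directly; the data and dimensions below are placeholders:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 10))                   # e.g. chemistry or spectral variables
    Y = rng.normal(size=(40, 3))                    # several y-variables (e.g. sensory attributes)

    pls = PLSRegression(n_components=2).fit(X, Y)
    Y_hat = pls.predict(X)                          # regression equations: all Y-columns at once
    scores = pls.transform(X)                       # X-scores for score plots
    loadings = pls.x_loadings_                      # X-loadings for interpretation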
Validation is important
  • Measure quality of the predictor
  • Determine A, the number of components
  • Compare methods
Prediction testing

(Calibration: estimate the coefficients on the calibration samples. Testing/validation: predict y for the test samples using those coefficients.)

Cross-validation

(Calibrate, i.e. find y = f(x) and estimate the coefficients, on part of the data; predict y for the left-out samples using those coefficients; repeat over the splits.)
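A sketch of prediction testing with scikit-learn; the split sizes echo the wheat example below (12 calibration and 26 test samples), but the data here are synthetic placeholders:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    X = rng.normal(size=(38, 6))                    # placeholder spectra, 6 wavelengths
    y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=38)

    # Calibration set to estimate the coefficients, separate test set to predict y.
    X_cal, X_test, y_cal, y_test = train_test_split(X, y, test_size=26, random_state=0)
    model = PLSRegression(n_components=3).fit(X_cal, y_cal)

    rmsep = np.sqrt(np.mean((y_test - model.predict(X_test).ravel()) ** 2))
    print(f"RMSEP on the test set: {rmsep:.3f}")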
Validation
  • Compute RMSEP (root mean square error of prediction)
  • Plot RMSEP versus the number of components
      • Choose the number of components with the best RMSEP properties
  • Compare for different methods
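A sketch of this validation loop, assuming scikit-learn and placeholder data: cross-validated RMSEP is computed for each candidate number of components, and the number with the best RMSEP is chosen (here simply the minimum; in practice fewer components may be preferred when the values are similar):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import KFold, cross_val_predict

    rng = np.random.default_rng(3)
    X = rng.normal(size=(30, 8))
    y = X[:, :3] @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=30)

    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    rmsep = {}
    for A in range(1, 7):                           # candidate numbers of components
        y_cv = cross_val_predict(PLSRegression(n_components=A), X, y, cv=cv)
        rmsep[A] = float(np.sqrt(np.mean((y - y_cv.ravel()) ** 2)))

    best_A = min(rmsep, key=rmsep.get)              # component number with lowest RMSEP
    print(rmsep, "-> chosen A:", best_A)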
(Figure: RMSEP versus number of components, with the MLR result for comparison. NIR calibration of protein in wheat: 6 NIR wavelengths, 12 calibration samples, 26 test samples.)

(Figure: conceptual illustration of important phenomena, estimation error and model error.)

Prediction vs. cross-validation
  • Prediction testing: measures the prediction ability of the predictor at hand. Requires a lot of data.
  • Cross-validation: measures a property of the method. Better suited for smaller data sets.
Validation
  • One should also plot measured versus predicted y-values
  • The correlation can be computed, but it can sometimes be misleading
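A short sketch of such a plot, assuming matplotlib; the measured and predicted values are placeholders:

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder values standing in for measured and predicted protein.
    y_measured = np.array([10.2, 11.5, 12.1, 10.8, 13.0, 12.4])
    y_predicted = np.array([10.5, 11.2, 12.3, 10.6, 12.7, 12.6])

    plt.scatter(y_measured, y_predicted)
    lo, hi = y_measured.min(), y_measured.max()
    plt.plot([lo, hi], [lo, hi])                    # ideal 1:1 line
    plt.xlabel("Measured y")
    plt.ylabel("Predicted y")
    r = np.corrcoef(y_measured, y_predicted)[0, 1]  # correlation (can be misleading on its own)
    plt.title(f"Measured vs. predicted (r = {r:.2f})")
    plt.show()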
(Figure: example plot of measured versus predicted y; measured and predicted protein from the NIR calibration.)

Outlier detection
  • Instrument error or noise
  • Drift of signal (over time)
  • Misprints
  • Samples outside normal range (different population)
Outlier detection
  • Outliers can be detected because we have explicit models:
    • a model for the spectral data (X = TPᵀ + E)
    • a model for the relationship between X and y (y = Tq + f)
Outlier detection tools
  • Residuals
    • X- and y-residuals
    • X-residuals as before; the y-residual is the difference between measured and predicted y
  • Leverage
    • the leverage value hᵢ for each object
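A sketch of these diagnostics in code, assuming scores T, loadings P and y-loadings q from a fitted data-compression model (for example the compression_regression sketch above); the leverage formula includes the 1/n term for centred data:

    import numpy as np

    def outlier_diagnostics(Xc, yc, T, P, q):
        """X-residuals, y-residuals and leverage h_i from centred data and a fitted model."""
        E = Xc - T @ P.T                            # X-residuals
        x_residual_ss = np.sum(E ** 2, axis=1)      # residual sum of squares per object

        y_residual = yc - T @ q                     # measured minus predicted y

        # Leverage h_i: 1/n plus the squared, scaled distance from the centre in score space.
        n_obj = T.shape[0]
        h = 1.0 / n_obj + np.sum(T ** 2 / np.sum(T ** 2, axis=0), axis=1)
        return x_residual_ss, y_residual, h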