Linear regression and correlation

PowerPoint Slideshow about ' Linear Regression and Correlation' - ananda

Presentation Transcript


Equation of the Regression Line

Least squares regression line of Y on X
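The slide's equation did not survive extraction. As a minimal sketch, assuming the usual notation y = b0 + b1*x for the fitted line (the names `fit_line`, `b0`, `b1` and the sample data are illustrative, not from the slides), the least squares slope and intercept can be computed as:

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross products of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f} x")
```

The line is the one that minimizes the sum of squared vertical distances between the data points and the line.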




Residuals

  • Using the fitted line, it is possible to obtain an estimate of the y coordinate for each observed x

  • The “error” in the fit, the difference between the observed y and the estimated y, is termed the “residual error”
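A minimal sketch of the residual calculation, assuming a line y = b0 + b1*x has already been fitted (the coefficients below are the least squares values for this made-up data set; all names are illustrative):

```python
import math


def residuals(xs, ys, b0, b1):
    """Observed y minus the y predicted by the line b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up data and its least squares coefficients.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
b0, b1 = 0.8, 2.3

resid = residuals(xs, ys, b0, b1)
ss_resid = sum(e ** 2 for e in resid)             # residual sum of squares
s_resid = math.sqrt(ss_resid / (len(xs) - 2))     # residual SD, n - 2 df
print(resid, ss_resid, s_resid)
```

For a least squares fit with an intercept, the residuals sum to zero, which is a quick sanity check on the coefficients.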





Other ways to evaluate residuals

  • Lag plots: plot each residual against a time-delayed copy of the residuals to look for temporal structure.

  • Look for skew in residuals

  • Kurtosis in residuals: error not distributed “normally”.
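These three checks can be reduced to numbers. The sketch below assumes moment-based definitions of skewness and excess kurtosis and uses the lag-1 autocorrelation as a numeric stand-in for a lag plot; the function name and sample residuals are hypothetical.

```python
def residual_diagnostics(residuals):
    """Return (skewness, excess kurtosis, lag-1 autocorrelation)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    m2 = sum(d ** 2 for d in dev) / n   # second central moment
    m3 = sum(d ** 3 for d in dev) / n   # third central moment
    m4 = sum(d ** 4 for d in dev) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    # Correlation between each residual and the next one in time order.
    lag1 = sum(a * b for a, b in zip(dev, dev[1:])) / sum(d ** 2 for d in dev)
    return skew, excess_kurtosis, lag1


# Made-up residuals that drift steadily upward in time.
skew, kurt, lag1 = residual_diagnostics([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, kurt, lag1)
```

Values near zero for all three are what one hopes to see; a large lag-1 value suggests the residuals are not independent in time.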


[Figure: model residuals for the constrained and freely moving conditions, comparing the pairwise model with the independent model; axes run from -0.3 to 0.3]

Parametric interpretation of regression: linear models

  • Conditional Populations and Conditional Distributions

    • A conditional population of Y values associated with a fixed, or given, value of X.

    • A conditional distribution is the distribution of values within the conditional population above

  • $\mu_{Y|X}$: population mean Y value for a given X

  • $\sigma_{Y|X}$: population SD of Y value for a given X


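The linear model

  • Assumptions:

    • Linearity: the population mean of Y for a given X is a linear function of X, $\mu_{Y|X} = \beta_0 + \beta_1 X$

    • Constant standard deviation: the population SD of Y for a given X is the same for every X, $\sigma_{Y|X} = \sigma$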


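Statistical inference concerning the model parameters

  • The fitted intercept $b_0$ estimates the population intercept $\beta_0$

  • The fitted slope $b_1$ estimates the population slope $\beta_1$

  • The residual SD $s_{Y|X}$ estimates the population SD $\sigma_{Y|X}$

  • You can make statistical inference on the model parameters themselves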


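Standard error of slope

  • 95% Confidence interval for the population slope $\beta_1$: $b_1 \pm t_{0.025}\, SE_{b_1}$

where $SE_{b_1} = \dfrac{s_{Y|X}}{\sqrt{\sum_i (x_i - \bar{x})^2}}$, $b_1$ is the fitted slope, $s_{Y|X}$ is the residual SD, and $t_{0.025}$ is taken with $n - 2$ degrees of freedom.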


Hypothesis testing: is the slope significantly different from zero?

$H_0$: $\beta_1 = 0$

Using the test statistic: $t_s = b_1 / SE_{b_1}$

df = n - 2
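The standard error, confidence interval and test statistic for the slope can be sketched together. This assumes the usual formulas (SE of the slope = residual SD divided by the square root of the sum of squared x deviations) and that the critical t value is read from a table by the caller; `slope_inference` and the data are illustrative only.

```python
import math


def slope_inference(xs, ys, t_crit):
    """Return (b1, se_b1, ci_low, ci_high, t_s, df) for the fitted slope.

    t_crit is the two-sided critical t value for n - 2 degrees of freedom,
    supplied by the caller (for example from a printed table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s_yx = math.sqrt(ss_resid / df)    # residual SD
    se_b1 = s_yx / math.sqrt(sxx)      # standard error of the slope
    t_s = b1 / se_b1                   # test statistic for H0: slope = 0
    return b1, se_b1, b1 - t_crit * se_b1, b1 + t_crit * se_b1, t_s, df


# Made-up data; 4.303 is the two-sided 5% critical t value for 2 df.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
b1, se_b1, ci_low, ci_high, t_s, df = slope_inference(xs, ys, t_crit=4.303)
print(b1, se_b1, (ci_low, ci_high), t_s, df)
```

The null hypothesis of zero slope is rejected when the absolute value of `t_s` exceeds the critical value, which is equivalent to the confidence interval excluding zero.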


Coefficient of Determination

  • r², or coefficient of determination: the fraction of the variance in the data that is accounted for by the linear model.



Correlation Coefficient

  • r is symmetric under exchange of x and y.
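A short sketch, on made-up data, of two properties stated above: r is unchanged when x and y are swapped, and r² equals the fraction of variance explained by the fitted line, 1 - SS(resid)/SS(total). The function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
r_xy = pearson_r(xs, ys)
r_yx = pearson_r(ys, xs)   # same value: r does not care which variable is "x"

# Fraction of variance explained by the least squares line of y on x.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_total = sum((y - y_bar) ** 2 for y in ys)
r2_from_fit = 1.0 - ss_resid / ss_total
print(r_xy, r_yx, r_xy ** 2, r2_from_fit)
```

Note that the regression line itself is not symmetric: the line of Y on X differs from the line of X on Y, even though r is the same.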


What’s this?

The adjusted R² corrects R² to compensate for the fact that adding even uncorrelated variables to the regression improves R².
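As a sketch, assuming the slide refers to the standard adjusted R² that regression software reports, the correction for n observations and p predictor variables is 1 - (1 - R²)(n - 1)/(n - p - 1). The function name and the numbers are illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Same raw R^2, but more predictors gives a lower adjusted value.
one_predictor = adjusted_r2(0.25, n=446, p=1)
five_predictors = adjusted_r2(0.25, n=446, p=5)
print(one_predictor, five_predictors)
```

The penalty grows with the number of predictors, so a variable that adds nothing real to the fit lowers the adjusted value even though the raw R² cannot decrease.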


Statistical inference on correlations

  • Like the slope, one can define a t-statistic for correlation coefficients: $t = r\sqrt{\dfrac{n-2}{1-r^2}}$, with $n - 2$ degrees of freedom



STA example

  • r = 0.25.

  • Is this correlation significant?

  • N = 446, df = N - 2 = 444, t = 0.25*sqrt(444/(1-0.25^2)) ≈ 5.44
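The calculation can be checked with a few lines. This sketch assumes the standard statistic t = r*sqrt((n-2)/(1-r²)) with n - 2 degrees of freedom and treats 0.25 as the correlation coefficient r itself, as the arithmetic on the slide does; the function name is illustrative.

```python
import math


def correlation_t(r, n):
    """t statistic and degrees of freedom for testing H0: correlation = 0."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r ** 2)), df


t_s, df = correlation_t(0.25, 446)
print(f"t = {t_s:.2f} with {df} degrees of freedom")
```

With several hundred degrees of freedom the two-sided 5% critical value is close to 1.96, so a statistic above 5 is significant even though the correlation itself is modest.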


When is Linear Regression Inadequate?

  • Curvilinearity

  • Outliers

  • Influential points


Curvilinearity


Outliers

  • Can reduce correlations and unduly influence the regression line

  • You can “throw out” some clear outliers

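  • A variety of tests can be used, for example Grubbs’ test:

    • Compute the Z value of the suspect point (its distance from the sample mean in units of the sample SD)

    • Look up the critical Z value in a table

    • Is your Z value larger?

    • If so, the difference is significant and the data point can be discarded.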
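A minimal sketch of the Grubbs statistic, assuming the two-sided form G = max|x - mean| / SD. The critical value depends on the sample size and significance level and has to come from a table, so it is deliberately not hard-coded here; the function name and data are hypothetical.

```python
import math


def grubbs_statistic(values):
    """Return (G, index) for the point farthest from the sample mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / sd, idx


# Made-up data with one suspicious value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
g, idx = grubbs_statistic(data)
print(f"G = {g:.3f} for data[{idx}] = {data[idx]}")
# Compare g with the tabulated critical value for n = len(data):
# if g is larger, the point is flagged as an outlier.
```

The test assumes the remaining data are approximately normal and is meant to be applied to one suspect point at a time.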


Influential points

  • Points that have a lot of influence on the fitted model

  • Not really an outlier, as the residual is small.
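One common way to flag such points is leverage, which is a diagnostic brought in here for illustration rather than something the slides name. For simple linear regression the leverage of point i is 1/n + (x_i - mean x)² / Σ(x - mean x)². A minimal sketch with made-up x values:

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]


# The last x value lies far from the others.
xs = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(xs)
print([round(v, 3) for v in h])
```

Leverages always sum to the number of fitted parameters (two here), so one value close to 1 means that single point nearly determines the line, which is why its own residual stays small.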


Conditions for inference

  • Design conditions

    • Random subsampling model: for each x observed, y is viewed as randomly chosen from distribution of Y values for that X

    • Bivariate random sampling: each observed (x,y) pair must be independent of the others. Experimental structure must not include pairing, blocking, or an internal hierarchy.

  • Conditions on parameters

    • $\sigma_{Y|X}$ (the population SD of Y for a given X) is not a function of X

  • Conditions concerning population distributions

    • Same SD for all levels of X

    • Independent observations

    • Normal distribution of Y for each fixed X

    • Random Samples



MANOVA and ANCOVA


MANOVA

  • Multivariate Analysis of Variance

  • Developed as a theoretical construct by S.S. Wilks in 1932

  • Key to assessing differences in groups across multiple metric dependent variables, based on a set of categorical (non-metric) variables acting as independent variables.


MANOVA vs ANOVA

  • ANOVA

    Y1 = X1 + X2 + X3 +...+ Xn

    (metric DV) (non-metric IV’s)

  • MANOVA

    Y1 + Y2 + ... + Yn = X1 + X2 + X3 +...+ Xn

    (metric DV’s) (non-metric IV’s)


ANOVA Refresher

Reject the null hypothesis if the test statistic is greater than the critical F value with k-1 numerator and N-k denominator degrees of freedom. If you reject the null, at least one of the group means is different from the others.
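A sketch of the one-way ANOVA test statistic for k groups and N observations in total, assuming the usual definition F = [SS(between)/(k-1)] / [SS(within)/(N-k)]. The function name and data are illustrative, and the critical F value still has to be looked up separately.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up data: three groups of three observations.
f, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f, df1, df2)
```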


MANOVA Guidelines

  • Assumptions the same as ANOVA

  • Additional condition of multivariate normality

    • all variables and all combinations of the variables are normally distributed

  • Assumes equal covariance matrices (standard deviations between variables should be similar)


Example

  • The first group receives technical dietary information interactively from an on-line website. Group 2 receives the same information from a nurse practitioner, while group 3 receives the information from a video tape made by the same nurse practitioner.

  • Users rate the instruction on usefulness, difficulty and importance

  • Note: three groups (levels of the categorical independent variable) and three metric dependent variables


Hypotheses

  • H0: There is no difference between the treatment group (online learners) and the oral learners and visual learners.

  • HA: There is a difference.



MANOVA Output 2

Individual ANOVAs not significant


MANOVA output

Overall multivariate effect is significant




Once more, with feeling: ANCOVA

  • Analysis of covariance

  • Hybrid of regression analysis and ANOVA style methods

  • Suppose you have pre-existing effect differences between subjects

  • With two experimental conditions, A and B, you could test half your subjects with AB (A then B) and the other half with BA, using a repeated measures design


Why use?

  • Suppose there exists a particular variable that *explains* some of what’s going on in the dependent variable in an ANOVA style experiment.

  • Removing the effects of that variable can help you determine if categorical difference is “real” or simply depends on this variable.

  • In a repeated measures design, suppose the following situation: sequencing effects, where performing A first impacts outcomes in B.

    • Example: A and B represent different learning methodologies.

  • ANCOVA can compensate for systematic biases among samples (if sorting produces unintentional correlations in the data).
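The idea of "removing the effects" of a covariate can be sketched as adjusted group means: fit one pooled within-group slope of the dependent variable on the covariate, then shift each group mean to the value it would have at the overall covariate mean. This covers only the adjustment step, not the full ANCOVA F test, and the function name and data are hypothetical.

```python
def adjusted_means(groups):
    """groups: list of (covariate values, response values), one pair per group.

    Returns each group's response mean adjusted to the grand covariate mean,
    assuming a common within-group slope.
    """
    all_x = [x for xs, _ in groups for x in xs]
    grand_x = sum(all_x) / len(all_x)
    sxx_within = 0.0
    sxy_within = 0.0
    group_means = []
    for xs, ys in groups:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx_within += sum((x - x_bar) ** 2 for x in xs)
        sxy_within += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        group_means.append((x_bar, y_bar))
    b_within = sxy_within / sxx_within   # pooled within-group slope
    return [y_bar - b_within * (x_bar - grand_x) for x_bar, y_bar in group_means]


# Made-up data: raw response means differ (4 vs 10), but only because the
# covariate differs between the groups.
group_a = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
group_b = ([4.0, 5.0, 6.0], [8.0, 10.0, 12.0])
adj = adjusted_means([group_a, group_b])
print(adj)
```

In this constructed case the adjusted means coincide, so the apparent group difference depends entirely on the covariate.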


Example


Results


Second Example

  • How do the amount spent on groceries and the amount one intends to spend depend on a subject’s sex?

  • H0: no dependence

  • Two analyses:

    • MANOVA to look at the dependence

    • ANCOVA to determine whether there is significant covariance between intended spending and actual spending


MANOVA


Results


ANCOVA


ANCOVA Results

So if you remove the amount the subjects intend to spend from the equation, there is no significant difference in spending. The spending difference is not a result of “impulse buys”, it seems.


Principal Component Analysis

  • Say you have time series data, characterized by multiple channels or trials. Is there a set of factors underlying the data that explains it (is there a simpler explanation for the observed behavior)?

  • In other words, can you infer the quantities that are supplying variance to the observed data, rather than testing *whether* known factors supply the variance?




Note how a single component explains almost all of the variance in the 8 EMGs recorded.

The next step would be to correlate these components with some other parameter in the experiment.


[Figure: largest PC; neural firing rates]


  • Some additional uses:

    • Say you have a very large data set, but believe there are some common features uniting that data set

    • Use a PCA type analysis to identify those common features.

    • Retain only the most important components to describe “reduced” data set.
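A minimal sketch of finding the largest principal component and the fraction of variance it explains. It uses power iteration on the covariance matrix, which is one simple technique chosen for illustration (statistics packages typically use an eigenvalue or singular value decomposition); the function name and three-channel data are made up.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, explained_fraction) for the largest PC.

    rows: list of observations, each a list with one value per channel.
    Assumes the data are not constant and that the starting vector is not
    orthogonal to the leading component.
    """
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix between channels.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Variance along v (Rayleigh quotient) relative to the total variance.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# Three channels driven by one underlying factor t: (t, 2t, -t).
rows = [[t, 2.0 * t, -t] for t in (1.0, 2.0, 3.0, 4.0)]
direction, explained = first_principal_component(rows)
print(direction, explained)
```

Because all three channels here are scaled copies of one factor, a single component accounts for all of the variance; with real recordings the explained fraction indicates how many components are worth retaining.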

