
Understanding the General Linear Model


Presentation Transcript


  1. Regression Understanding the General Linear Model

  2. Relations among variables • A goal of science is the prediction and explanation of phenomena • In order to do so, we must find events that are related in some way, such that knowledge about one leads to knowledge about the other • In psychology we seek to understand the relationships among variables that serve as indicators of human nature, in order to better understand ourselves and why we do the things we do

  3. Correlation • While we could just use our N of 1 personal experience to try to understand human behavior, a scientific (and better) way of understanding the relationship between variables is to assess correlation • Two variables take on different values, but if they are related in some fashion they will covary • They may do so in a way in which their values tend to move in the same direction, or they may tend to move in opposite directions • The underlying statistic assessing this is covariance, which is at the heart of every statistical procedure you are likely to use inferentially

  4. Covariance and Correlation • Covariance as a statistical construct is unbounded and thus difficult to interpret in its raw form • Correlation (Pearson’s r) is a measure of the direction and degree of a linear association between two variables • Correlation is the standardized covariance between two variables

  5. Regression • Regression allows us to use the information about covariance to make predictions • Given a particular value of X, we can predict Y with some level of accuracy • The basic model is that of a straight line (the general linear model) • Only one straight line can be drawn once the slope and Y intercept are specified • The formula for a straight line is: • Y = bX + a • Y = the calculated value of the variable on the vertical axis • a = the intercept • b = the slope of the line • X = a value of the variable on the horizontal axis • Once this line is specified, we can calculate the corresponding value of Y for any value of X entered • In more general terms, Y = Xb + e, where these elements are vectors and/or matrices (of the outcome, data, coefficients, and error respectively), is the general linear model to which most of the techniques in psychological research adhere
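As a minimal sketch of the straight-line model and its matrix form, the following Python snippet uses made-up intercept, slope, and X values (all hypothetical) to compute predictions both ways:

```python
# Sketch of Y = bX + a and the matrix form Y = Xb, with hypothetical values
import numpy as np

a, b = 2.0, 0.5                      # hypothetical intercept and slope
x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical predictor values
y_hat = b * x + a                     # predicted Y for each X

# General linear model form: the design matrix carries a column of 1s for the intercept
X = np.column_stack([np.ones_like(x), x])   # n x 2 design matrix
coef = np.array([a, b])                      # parameter vector [intercept, slope]
print(y_hat)          # [2.5 3.  3.5 4. ]
print(X @ coef)       # identical predictions via the matrix product
```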

  6. The Line of Best Fit • Real data do not conform perfectly to a straight line • The best fit straight line is the one that minimizes the amount of variation of the data points from the line • The common, though by no means the only acceptable, method is to derive a least squares regression line, which minimizes the squared deviations from it • The equation for this line can be used to predict or estimate an individual’s score on Y on the basis of his or her score on X

  7. Least Squares Modeling • [Path diagram: predictor Variables X and Z with paths A, B, and C leading to the Criterion Variable Y] • When the relations among variables are expressed in this manner, we call the relevant equation(s) mathematical models • The intercept and weight values are called the parameters of the model • While typical regression analysis by itself does not determine causal relations, the assumption indicated by such a model is that the variable on the left-hand side of the previous equation is being caused by the variable(s) on the right side • The arrows explicitly go from the predictors to the outcome, not vice versa*

  8. Parameter Estimation • The process of obtaining the correct parameter values (assuming we are working with the right model) is called parameter estimation • Often, theories specify the form of the relationship rather than the specific values of the parameters • The parameters themselves, assuming the basic model is correct, are typically estimated from data • We refer to the estimation process as “calibrating the model” • A method is required for choosing parameter values that will give us the best representation of the data possible • In estimating the parameters of our model, we are trying to find a set of parameters that minimizes the error variance • With least-squares estimation, we want the sum of squared errors to be as small as it possibly can be
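The least-squares criterion can be illustrated with numpy's built-in solver; the data values below are hypothetical, and `np.linalg.lstsq` is used only as one convenient way to obtain the parameters that minimize the sum of squared errors:

```python
# Minimal sketch of least-squares estimation on hypothetical data
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])

X = np.column_stack([np.ones_like(x), x])             # design matrix with intercept column
params, ss_error, _, _ = np.linalg.lstsq(X, y, rcond=None)
a, b = params
print(a, b)          # estimated intercept and slope
print(ss_error)      # minimized sum of squared residuals
```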

  9. Estimates of the constant (a) and coefficient (b) in the simple setting • Estimating the slope (the regression coefficient) requires first estimating the covariance: b = cov(X, Y) / s²_X = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² • Estimating the Y intercept: a = Ȳ - bX̄ • where Ȳ and X̄ are the means of the Y and X values respectively, and b is the estimated slope • These calculations ensure that the regression line passes through the point on the scatterplot defined by the two means

  10. In terms of the Pearson r • The slope can equivalently be written as b = r(s_Y / s_X), where s_Y and s_X are the standard deviations of Y and X
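A small sketch, again with hypothetical data, of the two equivalent slope formulas (covariance over variance, and r times the ratio of standard deviations) together with the intercept:

```python
# Sketch of the closed-form slope/intercept estimates on hypothetical data
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]        # sample covariance
b = cov_xy / np.var(x, ddof=1)              # slope = cov(X, Y) / var(X)
a = y.mean() - b * x.mean()                 # intercept: line passes through (x̄, ȳ)

r = np.corrcoef(x, y)[0, 1]
b_from_r = r * y.std(ddof=1) / x.std(ddof=1)   # slope in terms of Pearson r
print(b, b_from_r)                              # the two slope estimates agree
print(a)
```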

  11. What can the model explain? Variance Components • Total variability in the dependent variable (observed - mean) comes from two sources • Variability predicted by the model, i.e. what variability in the dependent variable is due to the independent variable • How far our predicted values are from the mean of Y • Error or residual variability, i.e. variability not explained by the independent variable • The difference between the predicted values and the observed values • s²_Y = s²_Ŷ + s²_(Y - Ŷ) • Total variance = predicted variance + error variance

  12. R-squared - the coefficient of determination • We can also show this graphically using a Venn diagram, with r² as the proportion of variability shared by two variables (X and Y) • The larger the area of overlap, the greater the strength of the association between the two variables • The square of the correlation, r², is the fraction of the variation in the values of Y that is explained by the regression of Y on X • r² = variance of the predicted values ŷ divided by the variance of the observed values y

  13. Predicted variance and r² • r² = s²_Ŷ / s²_Y, i.e. the predicted variance as a proportion of the total variance
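The variance decomposition and r² can be checked numerically; this sketch reuses the hypothetical data from above and verifies that total SS = model SS + error SS and that r² equals the squared Pearson correlation:

```python
# Sketch of the variance decomposition and r² on hypothetical data
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x

ss_total = np.sum((y - y.mean()) ** 2)       # total variability in Y
ss_model = np.sum((y_hat - y.mean()) ** 2)   # variability predicted by the model
ss_error = np.sum((y - y_hat) ** 2)          # residual variability

print(np.isclose(ss_total, ss_model + ss_error))   # True: the SS partition holds
print(ss_model / ss_total, np.corrcoef(x, y)[0, 1] ** 2)   # r² equals squared Pearson r
```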

  14. The Accuracy of Prediction • How good a fit does our line represent? • The error associated with a prediction (of a Y value from a known X value) is a function of the deviations of Y about the predicted point • The standard error of estimate provides an assessment of the accuracy of prediction: the standard deviation of Y predicted from X • In terms of R², we can see that the more variance we account for, the smaller our standard error of estimate will be
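A short, self-contained sketch (hypothetical data) of the standard error of estimate, computed both from the residuals and from R²:

```python
# Sketch of the standard error of estimate, s_yx = sqrt(SS_error / (n - 2))
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

n = len(y)
see = np.sqrt(np.sum(resid ** 2) / (n - 2))     # standard deviation of Y predicted from X
r2 = np.corrcoef(x, y)[0, 1] ** 2
see_from_r2 = y.std(ddof=1) * np.sqrt((1 - r2) * (n - 1) / (n - 2))
print(see, see_from_r2)     # the two forms agree: more variance accounted for, smaller SEE
```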

  15. Interpreting regression: Summary of the basics • Intercept • Value of Y if X is 0 • Often not meaningful, particularly if it’s practically impossible to have an X of 0 (e.g. weight) • Slope • Amount of change in Y seen with 1 unit change in X • Standardized regression coefficient • Amount of change in Y seen in standard deviation units with 1 standard deviation unit change in X • In simple regression it is equivalent to the r for the two variables • Standard error of estimate • Gives a measure of the accuracy of prediction • R2 • Proportion of variance explained by the model

  16. Extension: The General Linear Model with Categorical Predictors

  17. Extension • Regression can actually handle different types of predictors, and in the social sciences we are often interested in differences between groups • For now we will concern ourselves with the two independent groups case • E.g. gender, Republican vs. Democrat, etc.

  18. Dummy coding • There are different ways to code categorical data for regression, and in general, to represent a categorical variable you need k-1* coded variables • k = number of categories/groups • Dummy coding involves using zeros and ones to identify group membership, and since we only have two groups, one group will be zero (the reference group) and the other 1 • We will revisit coding with k > 2 after we’ve discussed multiple regression

  19. Dummy coding • Example • The thing to note at this point is that we have a simple bivariate correlation/simple regression setting • The correlation between group and the DV is .76 • This is sometimes referred to as the point biserial correlation (r_pb) because of the categorical variable • However, don’t be fooled: it is calculated exactly the same way as before, i.e. you treat that 0,1 grouping variable like any other in calculating the correlation coefficient

  Group  DV
  0      3
  0      5
  0      7
  0      2
  0      3
  1      6
  1      7
  1      7
  1      8
  1      9
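Using the ten scores shown on the slide, the point-biserial correlation can be verified as an ordinary Pearson r computed on the 0/1 grouping variable (a minimal numpy sketch):

```python
# Sketch verifying the point-biserial correlation on the dummy-coded slide data
import numpy as np

group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])          # dummy-coded predictor
dv    = np.array([3, 5, 7, 2, 3, 6, 7, 7, 8, 9], float)   # outcome

r_pb = np.corrcoef(group, dv)[0, 1]          # ordinary Pearson r on the 0/1 variable
print(round(r_pb, 2), round(r_pb ** 2, 3))   # 0.76 and 0.577
```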

  20. Example • Graphical display • The R-square is .76² = .577 • The regression equation is Ŷ = 4 + 3.4(Group)
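A sketch of the fit itself, using the same ten cases; the estimates should reproduce the reference-group mean as the intercept and the mean difference as the slope:

```python
# Sketch of the simple regression on the dummy-coded slide data
import numpy as np

group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], float)
dv    = np.array([3, 5, 7, 2, 3, 6, 7, 7, 8, 9], float)

slope, intercept = np.polyfit(group, dv, deg=1)
print(intercept, slope)    # ≈ 4.0 and 3.4: the group-0 mean, and 7.4 - 4.0
```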

  21. Example • Look closely at the descriptive output compared to the coefficients. • What do you see?

  22. The constant • Note again our regression equation • Recall the definition for the slope and constant • First the constant, what does “when X = 0” mean here in this setting? • It means when we are in the 0 group • What is that value? • Y = 4, which is that group’s mean • The constant here is thus the reference group’s mean

  23. The coefficient • Now think about the slope • What does a ‘1 unit change in X’ mean in this setting? • It means we go from one group to the other • Based on that coefficient, what does the slope represent in this case (i.e. can you derive that coefficient from the descriptive stats in some way?) • The coefficient is the difference between means

  24. The regression line • The regression line covers the values represented • i.e. 0, 1, for the two groups • It passes through each of their means • Using least squares regression the regression line always passes through the mean of X and Y • The constant (if we are using dummy coding) is the mean for the zero (reference) group • The coefficient is the difference between means

  25. More to consider • Analysis of variance • Recall that in regression we are trying to account for the variance in the DV • That total variance reflects the sum of the squared deviations of values from the DV mean • Sums of squares • That breaks down into: • Variance we account for • Sums of squares predicted or model or regression • And that which we do not account for • Sums of squares ‘error’ (observed – predicted)

  26. Variance accounted for • What are our predicted values in this case? • We only have 2 values of X to plug in • We already know what Y is if X is zero, and so we’d predict the group mean of 4 for all zero values • The only other value to plug in is 1 for the rest of the cases • In other words for those in the 1 group, we’re predicting their respective mean

  27. Variance accounted for • So in order to get our model summary and F-statistic, we need: • Total variance • Predicted variance • Predicted value minus grand mean of the DV just like it has always been • Note again how our average predicted value is our group average for the DV • Error variance • Essentially each person’s score minus group mean

  28. Variance accounted for • Predicted SS = 5[(4 - 5.7)² + (7.4 - 5.7)²] • 28.9 • Error SS = (3 - 4)² + (5 - 4)² + … + (9 - 7.4)² • 21.2 • Total variance to be accounted for = (3 - 5.7)² + (5 - 5.7)² + … + (9 - 5.7)² • Or just Predicted SS + Error SS • 50.1 • Calculate R² from these values
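These sums of squares can be checked directly from the raw scores; a minimal numpy sketch of the slide's arithmetic:

```python
# Sketch checking the sums of squares for the two-group example
import numpy as np

g0 = np.array([3, 5, 7, 2, 3], float)   # group 0 scores
g1 = np.array([6, 7, 7, 8, 9], float)   # group 1 scores
dv = np.concatenate([g0, g1])
grand_mean = dv.mean()                   # 5.7

ss_pred  = 5 * (g0.mean() - grand_mean) ** 2 + 5 * (g1.mean() - grand_mean) ** 2   # 28.9
ss_error = np.sum((g0 - g0.mean()) ** 2) + np.sum((g1 - g1.mean()) ** 2)           # 21.2
ss_total = np.sum((dv - grand_mean) ** 2)                                          # 50.1

print(ss_pred, ss_error, ss_total)
print(ss_pred / ss_total)    # R² = .577
```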

  29. Regression output • Here is the summary table from our regression • The mean square is derived by dividing our sums of squares by the degrees of freedom • k - 1 for the regression • N - 1 for the total • N - k for the error • The ratio of the mean squares is the F-statistic
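The summary-table arithmetic for this two-group example (k = 2, N = 10) can be sketched as follows, reusing the sums of squares computed above:

```python
# Sketch of the ANOVA summary quantities for the two-group example
ss_pred, ss_error = 28.9, 21.2     # sums of squares from the previous step
k, n = 2, 10

ms_reg   = ss_pred / (k - 1)       # mean square for the regression/model
ms_error = ss_error / (n - k)      # mean square error
f_stat   = ms_reg / ms_error       # ratio of the mean squares
print(ms_reg, ms_error, round(f_stat, 2))   # 28.9, 2.65, 10.91
```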

  30. ANOVA = Regression • Note the title of the summary table • ANOVA • It is an ANOVA summary table because you have in fact just conducted an analysis of variance, specifically for the two group situation • ANOVA, the statistical procedure as it is so-called, is a special case of regression • Below the first table is the ANOVA, as opposed to regression output.

  31. Eta-squared = R-squared • Note the ‘partial eta-squared’ • Eta-squared has the same interpretation as R-squared and as one can see, is R-squared from our regression • SPSS calls it partial as there is often more than one grouping variable, and we are interested in unique effects (i.e. partial out the effects from other variables) • However it is actually eta-squared here, as there is no other variable effect to partial out

  32. The lowly t-test • The t-test is a special case of ANOVA • ANOVA can handle more than two groups, while the t-test is just for two • However, F = t² in the two-group setting, and the p-value is exactly the same

  33. The lowly t-test • Compare to regression • The t, standard error, CI and p-value are the same, and again the coefficient is the difference between means
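A sketch of the equivalence, using the same two groups; `scipy.stats.ttest_ind` (equal variances assumed) should give a t whose square matches the F from the regression/ANOVA above:

```python
# Sketch comparing the independent-samples t-test to the two-group F
import numpy as np
from scipy import stats

g0 = np.array([3, 5, 7, 2, 3], float)
g1 = np.array([6, 7, 7, 8, 9], float)

t, p = stats.ttest_ind(g1, g0)     # two-group t-test, equal variances assumed
print(round(t, 3), round(t ** 2, 2), round(p, 4))   # t ≈ 3.302, t² ≈ 10.91 (= F), shared p-value
```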

  34. The Statistical Language • Statistics is a language used for communicating research ideas and findings • We have various dialects with which to speak it, and of course pick freely from the words available • Sometimes we prefer to do regression and talk about the amount of variance accounted for • Sometimes we prefer to talk about mean differences and how large those are • In both cases we are interested in the effect size • Which tool we use reflects how we want to talk about our results

  35. Parameter Estimation example • Let’s assume that we believe there is a linear relationship between X and Y. • Which set of parameter values will bring us closest to representing the data accurately?

  36. Estimation example • We begin by picking some values, plugging them into the equation, and seeing how well the implied values correspond to the observed values • We can quantify what we mean by “how well” by examining the difference between the model-implied Y and the actual Y value • This difference between our observed value and the one predicted, Y - Ŷ, is often called the error in prediction, or the residual

  37. Estimation example • Let’s try a different value of b and see what happens • Now the implied values of Y are getting closer to the actual values of Y, but we’re still off by quite a bit

  38. Estimation example • Things are getting better, but certainly things could improve

  39. Estimation example • Ah, much better

  40. Estimation example • Now that’s very nice • There is a perfect correspondence between the predicted values of Y and the actual values of Y
