Lecture 8 Relationships between Scale variables: Regression Analysis




### Lecture 8 Relationships between Scale variables: Regression Analysis

Quantitative Research Methods

Gwilym Pryce

g.pryce@socsci.gla.ac.uk

Notices:
• Register
Plan:
• 1. Linear & Non-linear Relationships
• 2. Fitting a line using OLS
• 3. Inference in Regression
• 4. Omitted Variables & R²
• 5. Types of Regression Analysis
• 6. Properties of OLS Estimates
• 7. Assumptions of OLS
• 8. Doing Regression in SPSS
1. Linear & Non-linear relationships between variables
• Often of greatest interest in social science is investigation into relationships between variables:
• is social class related to political perspective?
• is income related to education?
• is worker alienation related to job monotony?
• We are also interested in the direction of causation, but this is more difficult to prove empirically:
• our empirical models are usually structured assuming a particular theory of causation
Relationships between scale variables
• The most straightforward way to investigate evidence for a relationship is to look at scatter plots:
• put the dependent variable (i.e. the “effect”) on the vertical axis
• or “y axis”
• put the explanatory variable (i.e. the “cause”) on the horizontal axis
• or “x axis”
2. Fitting a line using OLS
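• The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation:

$$\min_{a,\,b} \sum_{i} (y_i - \hat{y}_i)^2$$

Where:

$y_i$ = observed value of y

$\hat{y}_i$ = predicted value of $y_i$

= the value on the line of best fit corresponding to $x_i$

• This criterion yields estimates of the slope b and y-intercept a of the straight line $\hat{y}_i = a + b x_i$:

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$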
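The least-squares criterion can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form expressions for the slope and intercept. The function names and the small data set are hypothetical and are not taken from the lecture.

```python
def fit_ols(xs, ys):
    """Return (a, b): intercept and slope of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares of x about their means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical distances from each observation to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative data (not from the lecture)
floor_area = [50, 70, 80, 100, 120]
price = [40, 55, 58, 75, 88]

a, b = fit_ols(floor_area, price)
best = sum_squared_deviations(floor_area, price, a, b)
# Shifting the fitted line up by one unit can only increase the criterion
worse = sum_squared_deviations(floor_area, price, a + 1, b)
print(a, b, best, worse)
```

Any other choice of intercept or slope gives a sum of squared deviations at least as large as the fitted one, which is what "line of best fit" means under this criterion.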
3. Inference in Regression
• Regressions are usually run on samples, so what can we say about the population relationship between x and y?
• Repeated samples would yield a range of values for the estimate b, with b ~ N(β, s_b)
• i.e. b is normally distributed with mean = β = the population slope = the value of b if the regression were run on the whole population (s_b is the standard error of b)
• If there is no relationship in the population between x and y, then β = 0, and this is our H0
Hypothesis test on β:
• (1) H0: β = 0

(i.e. slope coefficient, if regression run on population, would = 0)

H1: β ≠ 0

• (2) α = 0.05 or 0.01 etc.
• (3) Reject H0 iff P < α
• (N.B. Rule of thumb: P < 0.05 if |tc| > 2, and P < 0.01 if |tc| > 2.6)
• (4) Calculate P and conclude.
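As a rough sketch of how the test statistic is obtained, the snippet below computes the slope, its standard error and the t ratio (slope divided by standard error), then applies the rule of thumb from the slide. The function names and data are hypothetical. The rule of thumb assumes a reasonably large sample, so with very few observations the exact P value should be used instead.

```python
import math


def slope_t_ratio(xs, ys):
    """Return (b, se_b, t): OLS slope, its standard error, and t = b / se_b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_b = math.sqrt(s2 / sxx)
    return b, se_b, b / se_b


def rule_of_thumb(t_c):
    """Apply the slide's rule of thumb to a calculated t ratio."""
    if abs(t_c) > 2.6:
        return "reject H0 at the 1% level"
    if abs(t_c) > 2:
        return "reject H0 at the 5% level"
    return "do not reject H0"


# Hypothetical illustrative data (not from the lecture)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b, t_c = slope_t_ratio(xs, ys)
print(b, se_b, t_c, rule_of_thumb(t_c))
```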
Example using SPSS output:
• (1) H0: no relationship between house price and floor area.

H1: there is a relationship

• (2), (3), (4):
• P = 1 - CDF.T(24.469, 554) = 0.000000 (i.e. t = 24.469 on 554 degrees of freedom)

Reject H0
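SPSS's CDF.T(t, df) returns the cumulative probability of Student's t distribution. The sketch below approximates the same upper-tail probability in plain Python by numerically integrating the t density with Simpson's rule. It is an illustrative approximation under that assumption, not how SPSS computes it, and the function names are hypothetical. In practice a statistics library would normally be used.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df does not overflow the gamma function
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # Half the probability lies above zero; subtract the area between 0 and t
    return max(0.0, 0.5 - area)


p_one_tail = t_upper_tail(24.469, 554)
p_two_tail = 2 * p_one_tail
print(p_one_tail, p_two_tail)
```

The slide's expression is the upper-tail probability. For a two-sided H1 it is conventionally doubled, which makes no practical difference here because both values are effectively zero.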

4. Omitted Variables & R²
• Q/ Is floor area the only factor?
• How much of the variation in Price does it explain?
R-square
• R-square tells you the proportion of the variation in y that is explained by the explanatory variable x
• 0 ≤ R² ≤ 1 (NB: you want R² to be near 1).
• If there is more than one explanatory variable, use Adjusted R²
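A minimal sketch of both measures, assuming the usual definitions: R² = 1 − RSS/TSS, and Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of explanatory variables. The function names are hypothetical.

```python
def r_squared(ys, fitted):
    """Proportion of the variation in ys explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    tss = sum((y - y_bar) ** 2 for y in ys)               # total variation
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return 1 - rss / tss


def adjusted_r_squared(r2, n, k):
    """Penalise R-square for the number of explanatory variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# A perfect fit explains all the variation; predicting the mean explains none
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
print(adjusted_r_squared(0.5, n=11, k=2))
```

Adding an explanatory variable never lowers R², which is why the adjusted version is preferred when comparing models with different numbers of variables.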
5. Types of regression analysis:
• Univariate regression: one explanatory variable
• what we’ve looked at so far in the above equations
• Multivariate regression: >1 explanatory variable
• more than one variable on the RHS
• Log-linear regression & log-log regression:
• taking logs of variables can deal with certain types of non-linearity and has useful properties (e.g. in a log-log regression the slope coefficient can be read as an elasticity)
• Categorical dependent variable regression:
• dependent variable is dichotomous -- observation has an attribute or not
• e.g. MPPI take-up, unemployed or not etc.
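To illustrate the log-log case, the sketch below generates a hypothetical constant-elasticity relationship y = 3x² and fits a straight line to the logged variables. The slope recovers the elasticity (2) and the exponentiated intercept recovers the constant (3). The data and function name are assumptions made for illustration only.

```python
import math


def fit_ols(xs, ys):
    """Return (a, b): intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical non-linear relationship: y = 3 * x**2
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]

# Taking logs of both variables makes the relationship linear:
# log(y) = log(3) + 2 * log(x)
a, b = fit_ols([math.log(x) for x in xs], [math.log(y) for y in ys])
print(b, math.exp(a))
```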
6. Properties of OLS estimators
• OLS estimates of the slope and intercept parameters have been shown to be BLUE (provided certain assumptions are met):
• Best
• Linear
• Unbiased
• Estimator
• “Best” in that they have the minimum variance among linear unbiased estimators (i.e. given repeated samples, the OLS estimates for α and β vary less between samples than those of any other linear unbiased estimator).
• “Linear” in that the estimators are linear functions of the observed values of y.
• “Unbiased” because, in repeated samples, the mean of all the estimates achieved will tend towards the population values for α and β.
• “Estimator” in that the true values of α and β cannot be known, and so we are using statistical techniques to arrive at the best possible assessment of their values, given the information available.
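The "repeated samples" idea behind unbiasedness can be sketched with a small simulation. Here the population values of α and β are chosen by us (so, unlike in real research, they are known), many samples are drawn with random errors, and the slope is re-estimated each time. All names and parameter values are hypothetical.

```python
import random
import statistics


def ols_slope(xs, ys):
    """OLS slope estimate for a single sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


random.seed(0)                      # fixed seed so the run is repeatable
TRUE_ALPHA, TRUE_BETA = 1.0, 2.0    # assumed population values
xs = list(range(30))                # fixed values of x, reused in every sample

slopes = []
for _ in range(2000):
    # Each sample: same x values, fresh zero-mean, constant-variance errors
    ys = [TRUE_ALPHA + TRUE_BETA * x + random.gauss(0, 1) for x in xs]
    slopes.append(ols_slope(xs, ys))

mean_slope = statistics.mean(slopes)   # should be close to TRUE_BETA
spread = statistics.stdev(slopes)      # sampling variability of the estimate
print(mean_slope, spread)
```

Individual estimates differ from sample to sample, but their average settles close to the population slope, which is what "unbiased" means.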
7. Assumptions of OLS:

For estimation of a and b to be BLUE and for regression inference to be correct:

• 1. Equation is correctly specified:
• Linear in parameters (can still transform variables)
• Contains all relevant variables
• Contains no irrelevant variables
• Contains no variables with measurement errors
• 2. Error Term has zero mean
• 3. Error Term has constant variance
• 4. Error Term is not autocorrelated
• i.e. not correlated with the error term from previous time periods
• 5. Explanatory variables are fixed
• observe normal distribution of y for repeated fixed values of x
• 6. No linear relationship between RHS variables
• i.e. no “multicollinearity”
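One simple, partial check on assumption 6 is to look at the correlation between pairs of explanatory variables. The sketch below computes Pearson's r for two hypothetical RHS variables. A value close to ±1 warns of multicollinearity, although a pairwise check cannot detect a linear relationship that involves three or more variables.

```python
import math


def pearson_r(xs, zs):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)


# Hypothetical explanatory variables (not from the lecture)
floor_area = [50, 70, 80, 100, 120]
rooms = [2, 3, 3, 4, 5]

r = pearson_r(floor_area, rooms)
print(r)  # close to 1: the two variables carry almost the same information
```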
8. Doing Regression analysis in SPSS
• To run regression analysis in SPSS, click on Analyze, Regression, Linear:
• Select your dependent (i.e. ‘explained’) variable and independent (i.e. ‘explanatory’) variables:
Confidence Intervals for regression coefficients
• Population slope coefficient CI: β = b ± t × s_b, where s_b is the standard error of b
• Rule of thumb (95% CI): β ≈ b ± 2 × s_b

b = 64.6 ± 2 × 3.8

= 64.6 ± 7.6

95% CI = (57, 72)
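The rule-of-thumb interval is simple arithmetic and can be reproduced as follows, using the slope (64.6) and standard error (3.8) from the slide. The function name is hypothetical, and the multiplier of 2 is the slide's approximation to the exact t value.

```python
def rule_of_thumb_ci(b, se_b, multiplier=2.0):
    """Approximate 95% confidence interval: b plus or minus 2 standard errors."""
    margin = multiplier * se_b
    return b - margin, b + margin


lower, upper = rule_of_thumb_ci(64.6, 3.8)
print(round(lower, 1), round(upper, 1))
```

Because the interval does not contain zero, it leads to the same conclusion as rejecting H0: β = 0 at roughly the 5% level.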

Confidence Intervals in SPSS: Analyze, Regression, Linear, click on Statistics and select Confidence intervals:
Past Paper: (C2) Relationships (30%)
• Suppose you have a theory that suggests that time watching TV is determined by gregariousness
• the less gregarious, the more time spent watching TV
• Use a random sample of 60 observations from the TV watching data to run a statistical test for this relationship that also controls for the effects of age and gender.
• Carefully interpret the output from this model and discuss the statistical robustness of the results.