Loading in 2 Seconds...

Lecture 8 Relationships between Scale variables: Regression Analysis

Loading in 2 Seconds...

- 113 Views
- Uploaded on

Download Presentation
## Lecture 8 Relationships between Scale variables: Regression Analysis

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Lecture 8Relationships between Scale variables: Regression Analysis

Graduate School

Quantitative Research Methods

Gwilym Pryce

g.pryce@socsci.gla.ac.uk

Notices:

- Register

Plan:

- 1. Linear & Non-linear Relationships
- 2. Fitting a line using OLS
- 3. Inference in Regression
- 4. Ommitted Variables & R2
- 5. Types of Regression Analysis
- 6. Properties of OLS Estimates
- 7. Assumptions of OLS
- 8. Doing Regression in SPSS

1. Linear & Non-linear relationships between variables

- Often of greatest interest in social science is investigation into relationships between variables:
- is social class related to political perspective?
- is income related to education?
- is worker alienation related to job monotony?
- We are also interested in the direction of causation, but this is more difficult to prove empirically:
- our empirical models are usually structured assuming a particular theory of causation

Relationships between scale variables

- The most straight forward way to investigate evidence for relationship is to look at scatter plots:
- traditional to:
- put the dependent variable (I.e. the “effect”) on the vertical axis
- or “y axis”
- put the explanatory variable (I.e. the “cause”) on the horizontal axis
- or “x axis”

… and so a straight line of best fit is not always very satisfactory:

But we can simulate a non-linear relationship by first transforming one of the variables:

2. Fitting a line using OLS

- The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation:

Where:

yi = observed value of y

= predicted value of yi

= the value on the line of best fit corresponding to xi

Regression estimates of a, b:or Ordinary Least Squares (OLS):

- This criterion yields estimates of the slope b and y-intercept a of the straight line:

3. Inference in Regression: Hypothesis tests on the slope coefficient:

- Regressions are usually run on samples, so what can we say about the population relationship between x and y?
- Repeated samples would yield a range of values for estimates of b ~ N(b, sb)
- I.e. b is normally distributed with mean = b = population mean = value of b if regression run on population
- If there is no relationship in the population between x and y, thenb = 0, & this is our H0

Hypothesis test on b:

- (1) H0: b = 0

(I.e. slope coefficient, if regression run on population, would = 0)

H1: b 0

- (2) a = 0.05 or 0.01 etc.
- (3) Reject H0 iff P < a
- (N.B. Rule of thumb: P < 0.05 if tc 2, and P < 0.01 if tc 2.6)
- (4) Calculate P and conclude.

Example using SPSS output:

- (1) H0: no relationship between house price and floor area.

H1: there is a relationship

- (2), (3), (4):
- P = 1- CDF.T(24.469,554) = 0.000000

Reject H0

4. Ommitted Variables & R2Q/ is floor area the only factor?How much of the variation in Price does it explain?

R-square

- R-square tells you how much of the variation in y is explained by the explanatory variable x
- 0 < R2 < 1 (NB: you want R2 to be near 1).
- If more than one explanatory variable, use Adjusted R2

3D Surface Plots:Construction, Price & UnemploymentQ = -246 + 27P - 0.2P2 - 73U + 3U2

5. Types of regression analysis:

- Univariate regression: one explanatory variable
- what we’ve looked at so far in the above equations
- Multivariate regression: >1 explanatory variable
- more than one equation on the RHS
- Log-linear regression & log-log regression:
- taking logs of variables can deal with certain types of non-linearities & useful properties (e.g. elasticities)
- Categorical dependent variable regression:
- dependent variable is dichotomous -- observation has an attribute or not
- e.g. MPPI take-up, unemployed or not etc.

6. Properties of OLS estimators

- OLS estimates of the slope and intercept parameters have been shown to be BLUE (provided certain assumptions are met):
- Best
- Linear
- Unbiased
- Estimator

“Best” in that they have the minimum variance compared with other estimators (i.e. given repeated samples, the OLS estimates for α and β vary less between samples than any other sample estimates for α and β).

- “Linear” in that a straight line relationship is assumed.
- “Unbiased” because, in repeated samples, the mean of all the estimates achieved will tend towards the population values for α and β.
- “Estimates” in that the true values of α and β cannot be known, and so we are using statistical techniques to arrive at the best possible assessment of their values, given the information available.

7. Assumptions of OLS:

For estimation of a and b to be BLUE and for regression inference to be correct:

- 1. Equation is correctly specified:
- Linear in parameters (can still transform variables)
- Contains all relevant variables
- Contains no irrelevant variables
- Contains no variables with measurement errors
- 2. Error Term has zero mean
- 3. Error Term has constant variance

4. Error Term is not autocorrelated

- I.e. correlated with error term from previous time periods
- 5. Explanatory variables are fixed
- observe normal distribution of y for repeated fixed values of x
- 6. No linear relationship between RHS

variables

- I.e. no “multicolinearity”

8. Doing Regression analysis in SPSS

- To run regression analysis in SPSS, click on Analyse, Regression, Linear:

Select your dependent (i.e. ‘explained’) variable and independent (i.e. ‘explanatory’) variables:

Confidence Intervals for regression coefficients

- Population slope coefficient CI:
- Rule of thumb:

e.g. regression of floor area on number of bathrooms, CI on slope:

b = 64.6 2 3.8

= 64.6 7.6

95% CI = ( 57, 72)

Confidence Intervals in SPSS:Analyse, Regression, Linear, click on Statistics and select Confidence intervals:

Our rule of thumb said 95% CI for slope = ( 57, 72). How does this compare?

Past Paper: (C2)Relationships (30%)

- Suppose you have a theory that suggests that time watching TV is determined by gregariousness
- the less gregarious, the more time spent watching TV
- Use a random sample of 60 observations from the TV watching data to run a statistical test for this relationship that also controls for the effects of age and gender.
- Carefully interpret the output from this model and discuss the statistical robustness of the results.

Reading:

- Regression Analysis:
- *Field, A. chapters on regression.
- *Moore and McCabe Chapters on regression.
- Kennedy, P. ‘A Guide to Econometrics’
- Bryman, Alan, and Cramer, Duncan (1999) “Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists”, Chapters 9 and 10.
- Achen, Christopher H. Interpreting and Using Regression (London: Sage, 1982).

Download Presentation

Connecting to Server..