Lecture 8: Relationships between Scale variables: Regression Analysis

Graduate School

Quantitative Research Methods

Gwilym Pryce

g.pryce@socsci.gla.ac.uk

Notices:
  • Register
Plan:
  • 1. Linear & Non-linear Relationships
  • 2. Fitting a line using OLS
  • 3. Inference in Regression
  • 4. Omitted Variables & R2
  • 5. Types of Regression Analysis
  • 6. Properties of OLS Estimates
  • 7. Assumptions of OLS
  • 8. Doing Regression in SPSS
1. Linear & Non-linear relationships between variables
  • Investigating relationships between variables is often of greatest interest in social science:
    • is social class related to political perspective?
    • is income related to education?
    • is worker alienation related to job monotony?
  • We are also interested in the direction of causation, but this is more difficult to prove empirically:
    • our empirical models are usually structured assuming a particular theory of causation
Relationships between scale variables
  • The most straightforward way to investigate evidence for a relationship is to look at scatter plots (see the sketch after this list):
    • it is traditional to:
      • put the dependent variable (i.e. the “effect”) on the vertical axis
        • or “y axis”
      • put the explanatory variable (i.e. the “cause”) on the horizontal axis
        • or “x axis”
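
A minimal matplotlib sketch of this convention; the variable names and numbers are illustrative, not the lecture's house-price data:

```python
import numpy as np
import matplotlib.pyplot as plt

floor_area = np.array([50, 65, 80, 95, 110, 140])   # explanatory variable ("cause")
price = np.array([60, 75, 95, 105, 130, 160])       # dependent variable ("effect")

plt.scatter(floor_area, price)
plt.xlabel("floor area (explanatory variable, x axis)")
plt.ylabel("price (dependent variable, y axis)")
plt.title("Dependent variable on the vertical axis, explanatory on the horizontal")
plt.show()
```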
2. Fitting a line using OLS
  • The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation:

$\min_{a,b} \sum_i \left( y_i - \hat{y}_i \right)^2$

Where:

$y_i$ = observed value of y

$\hat{y}_i$ = predicted value of $y_i$, i.e. the value on the line of best fit corresponding to $x_i$

Regression estimates of a, b:or Ordinary Least Squares (OLS):
  • This criterion yields estimates of the slope b and y-intercept a of the straight line:
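
For a single explanatory variable, this minimisation has the standard closed-form solution:

$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$

A minimal sketch of the same calculation in Python; the variable names and toy numbers are illustrative, not taken from the lecture's dataset:

```python
import numpy as np

def ols_fit(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

# illustrative data: floor area (sq m) vs price (£000s)
floor_area = np.array([50, 65, 80, 95, 110, 140])
price = np.array([60, 75, 95, 105, 130, 160])
a, b = ols_fit(floor_area, price)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```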
3. Inference in Regression: Hypothesis tests on the slope coefficient:
  • Regressions are usually run on samples, so what can we say about the population relationship between x and y?
  • Repeated samples would yield a range of values for the estimate of the slope: b ~ N(β, σb)
      • i.e. b is normally distributed with mean β = the value the slope coefficient would take if the regression were run on the population
  • If there is no relationship in the population between x and y, then β = 0, and this is our H0
Hypothesis test on b:
  • (1) H0: β = 0

(i.e. the slope coefficient, if the regression were run on the population, would equal 0)

H1: β ≠ 0

  • (2) α = 0.05 or 0.01 etc.
  • (3) Reject H0 iff P < α
      • (N.B. Rule of thumb: P < 0.05 if tc ≥ 2, and P < 0.01 if tc ≥ 2.6)
  • (4) Calculate P and conclude.
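
A minimal sketch of this test in Python using scipy's linregress, which reports the two-sided P-value for H0: β = 0 directly (the toy arrays stand in for real data):

```python
import numpy as np
from scipy import stats

# illustrative data standing in for a real sample
floor_area = np.array([50, 65, 80, 95, 110, 140])
price = np.array([60, 75, 95, 105, 130, 160])

res = stats.linregress(floor_area, price)   # OLS fit of price = a + b*floor_area
t_c = res.slope / res.stderr                # test statistic for H0: beta = 0
print(f"b = {res.slope:.2f}, se(b) = {res.stderr:.3f}, t = {t_c:.2f}")
print(f"two-sided P-value = {res.pvalue:.4f}")  # reject H0 if P < alpha
```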
Example using SPSS output:
  • (1) H0: no relationship between house price and floor area.

H1: there is a relationship

  • (2), (3), (4):
  • P = 1 - CDF.T(24.469, 554) = 0.000000

Reject H0
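
The same P-value calculation can be reproduced in Python; scipy's t.sf is the survival function, i.e. 1 − CDF, and for the two-sided alternative the value would be doubled (which makes no practical difference here, since both are vanishingly small):

```python
from scipy.stats import t

t_c, df = 24.469, 554           # t statistic and degrees of freedom from the SPSS output
p_one_sided = t.sf(t_c, df)     # equivalent to 1 - CDF.T(24.469, 554) in SPSS
p_two_sided = 2 * p_one_sided   # for H1: beta != 0
print(p_one_sided, p_two_sided) # both effectively zero, so reject H0
```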

4. Omitted Variables & R2. Q/ Is floor area the only factor? How much of the variation in Price does it explain?
R-square
  • R-square tells you how much of the variation in y is explained by the explanatory variable x
    • 0 ≤ R2 ≤ 1 (NB: you want R2 to be near 1).
    • If more than one explanatory variable, use Adjusted R2
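
In terms of sums of squares (standard definitions, stated here for reference):

$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad \text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$

where n is the sample size and k the number of explanatory variables; the adjustment penalises the addition of explanatory variables that contribute little.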
5. Types of regression analysis:
  • Univariate regression: one explanatory variable
    • what we’ve looked at so far in the above equations
  • Multivariate regression: >1 explanatory variable
    • more than one explanatory variable on the RHS of the equation
  • Log-linear regression & log-log regression:
    • taking logs of variables can deal with certain types of non-linearity and has useful properties, e.g. elasticities (see the note after this list)
  • Categorical dependent variable regression:
    • dependent variable is dichotomous -- observation has an attribute or not
      • e.g. MPPI take-up, unemployed or not etc.
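
For example, in a log-log specification the slope coefficient can be read as an elasticity (a standard result):

$\ln y = a + b \ln x \;\Rightarrow\; b = \frac{d\ln y}{d\ln x} = \frac{dy/y}{dx/x}$

so a 1% change in x is associated with approximately a b% change in y.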
6. Properties of OLS estimators
  • OLS estimates of the slope and intercept parameters have been shown to be BLUE (provided certain assumptions are met):
      • Best
      • Linear
      • Unbiased
      • Estimator
“Best” in that they have the minimum variance compared with other estimators (i.e. given repeated samples, the OLS estimates for α and β vary less between samples than any other sample estimates for α and β).
  • “Linear” in that a straight line relationship is assumed.
  • “Unbiased” because, in repeated samples, the mean of all the estimates achieved will tend towards the population values for α and β.
  • “Estimates” in that the true values of α and β cannot be known, and so we are using statistical techniques to arrive at the best possible assessment of their values, given the information available.
7. Assumptions of OLS:

For estimation of a and b to be BLUE and for regression inference to be correct:

  • 1. Equation is correctly specified:
    • Linear in parameters (can still transform variables)
    • Contains all relevant variables
    • Contains no irrelevant variables
    • Contains no variables with measurement errors
  • 2. Error Term has zero mean
  • 3. Error Term has constant variance
4. Error Term is not autocorrelated
    • i.e. not correlated with the error term from previous time periods
  • 5. Explanatory variables are fixed
    • observe normal distribution of y for repeated fixed values of x
  • 6. No linear relationship between RHS variables
    • i.e. no “multicollinearity”
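
A minimal sketch of how some of these assumptions might be checked in Python with statsmodels; the simulated data and variable names are illustrative, and this is not part of the lecture's SPSS workflow:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                     # e.g. floor area (standardised)
x2 = rng.normal(size=n)                     # e.g. number of bathrooms (standardised)
y = 2 + 1.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
resid = fit.resid

print("mean of residuals:", resid.mean())       # assumption 2: should be close to 0
print("Durbin-Watson:", durbin_watson(resid))   # assumption 4: values near 2 suggest no autocorrelation
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print("VIFs:", vifs)                            # assumption 6: large VIFs signal multicollinearity
```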
8. Doing Regression analysis in SPSS
  • To run regression analysis in SPSS, click on Analyse, Regression, Linear:
Select your dependent (i.e. ‘explained’) variable and independent (i.e. ‘explanatory’) variables:
Confidence Intervals for regression coefficients
  • Population slope coefficient CI: $b \pm t_{\alpha/2} \times SE(b)$, where $t_{\alpha/2}$ is the critical t value (with $n - 2$ degrees of freedom in simple regression)
  • Rule of thumb: 95% CI ≈ $b \pm 2 \times SE(b)$
e.g. regression of floor area on number of bathrooms, CI on slope:

b = 64.6 ± 2 × 3.8

  = 64.6 ± 7.6

95% CI = (57, 72)
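
A small Python sketch of the same interval; the slope (64.6) and standard error (3.8) come from the slide, while the degrees of freedom are borrowed from the earlier house-price output and are an assumption here:

```python
from scipy.stats import t

b, se, df = 64.6, 3.8, 554        # slope, SE(b); df assumed, not shown on the slide
t_crit = t.ppf(0.975, df)         # exact 95% critical value (close to the rule-of-thumb 2)
print(f"rule of thumb: ({b - 2 * se:.1f}, {b + 2 * se:.1f})")
print(f"exact CI:      ({b - t_crit * se:.1f}, {b + t_crit * se:.1f})")
```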

Confidence Intervals in SPSS: Analyse, Regression, Linear, click on Statistics and select Confidence intervals:
Past Paper: (C2) Relationships (30%)
  • Suppose you have a theory that suggests that time watching TV is determined by gregariousness
    • the less gregarious, the more time spent watching TV
  • Use a random sample of 60 observations from the TV watching data to run a statistical test for this relationship that also controls for the effects of age and gender.
  • Carefully interpret the output from this model and discuss the statistical robustness of the results.
Reading:
  • Regression Analysis:
    • *Field, A., chapters on regression.
    • *Moore and McCabe, chapters on regression.
    • Kennedy, P., A Guide to Econometrics.
    • Bryman, Alan, and Cramer, Duncan (1999) Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists, Chapters 9 and 10.
    • Achen, Christopher H. (1982) Interpreting and Using Regression. London: Sage.