Handling Categorical Data


PowerPoint Slideshow about 'Handling Categorical Data' - prince

Learning Outcomes
  • At the end of this session and with additional reading you will be able to:
    • Understand when and how to analyse frequency counts
Analysing categorical variables
  • Frequencies
    • The number of observations within a given category
Assumptions of Chi squared
  • Each observation contributes to only one cell of the contingency table
  • The expected frequencies should be greater than 5
Chi Squared II
  • Pearson’s chi squared
  • Assess the difference between observed frequencies and expected frequencies in each cell
  • This is achieved by calculating the expected values for each cell
  • Model = (RT x CT) / N, where RT is the row total, CT is the column total and N is the total number of observations
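
As a worked equation, the expected-value model and Pearson's statistic can be written out as below. The symbols O (observed count), E (expected count) and N (total number of observations), and the subscripts i and j for row and column, are notation introduced here rather than taken from the slides. Note that the product of the row and column totals is divided by N.

```latex
E_{ij} = \frac{RT_i \times CT_j}{N},
\qquad
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```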


Chi Squared III
  • Likelihood ratio
    • a comparison of the observed frequencies with those predicted by the model (expected)
  • Yates correction
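    • with a 2 x 2 contingency table Pearson’s chi squared tends to produce values that are too large, which risks a Type I error; the correction subtracts 0.5 from the absolute value of each deviation (observed minus expected) before squaring it
    • this lowers the chi squared value and so makes the result less significant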
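
The three statistics on this slide and the previous one can be compared side by side in a short sketch. The 2 x 2 counts are hypothetical and chosen only for illustration, and the helper `p_value_df1` is a name introduced here; it relies on the fact that a 2 x 2 table has 1 degree of freedom.

```python
import math

# Hypothetical 2 x 2 table of observed counts (illustrative only, not from the slides).
observed = [[30, 20],
            [15, 35]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total x column total / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pair every observed count with its expected count.
cells = [(o, e)
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

# Pearson: squared deviation divided by the expected count, summed over cells.
pearson = sum((o - e) ** 2 / e for o, e in cells)

# Yates: shrink each absolute deviation by 0.5 before squaring.
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)

# Likelihood ratio: 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing.
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)


def p_value_df1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


for name, stat in [("Pearson", pearson), ("Yates", yates),
                   ("Likelihood ratio", likelihood_ratio)]:
    print(f"{name}: {stat:.3f} (p = {p_value_df1(stat):.4f})")
```

With these counts the Yates value is smaller than the uncorrected Pearson value, so its p-value is larger, which is the "less significant" effect described above.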
The contingency table I
  • Using my case study on stop and search, suppose we wanted to ascertain whether black males were stopped more often in one month than white males
  • One variable
    • (black or white male)
    • What does this tell us?
One-way Chi Squared
  • In a simple one-way chi squared we would expect that, if we had 148 people, they would be evenly split between white and black males, so
  • expected values would be 74 (148 / 2)
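
A minimal sketch of the one-way calculation follows. The slides give only the total of 148, so the observed split used here (90 and 58) is hypothetical. An even split of 148 between two categories gives 148 / 2 = 74 expected in each.

```python
import math

# Hypothetical observed counts that sum to 148 (illustrative only).
observed = {"black": 90, "white": 58}

n = sum(observed.values())
expected = n / len(observed)  # even split across the categories

# One-way chi squared: squared deviation over expected, summed over categories.
chi_squared = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

# Upper-tail p-value; this erfc shortcut is valid only when df = 1.
p_value = math.erfc(math.sqrt(chi_squared / 2))

print(f"expected per category = {expected}")
print(f"chi squared = {chi_squared:.3f}, df = {df}, p = {p_value:.4f}")
```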
The contingency table II
  • It would be more useful to look at an additional variable, let’s say age
  • Two variables
  • Males
    • Black/white
  • Age
    • Under 18/over 18
  • Now, using the formula, calculate the expected values for the contingency table
  • Model = (RT x CT) / N (row total x column total / total number of observations)
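
The expected values for the ethnicity by age table could be worked out as in the sketch below. The slides do not give cell counts, so the counts here are hypothetical and chosen only so that they sum to 148. Each expected value is the row total times the column total, divided by the total number of observations.

```python
# Hypothetical counts (illustrative only): rows = ethnicity, columns = age group.
observed = {
    "black": {"under 18": 40, "over 18": 50},
    "white": {"under 18": 20, "over 18": 38},
}

# Row totals (RT), column totals (CT) and the grand total (N).
row_totals = {row: sum(cols.values()) for row, cols in observed.items()}
col_totals = {}
for cols in observed.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count
n = sum(row_totals.values())

# Model = (RT x CT) / N for every cell.
expected = {
    row: {col: row_totals[row] * col_totals[col] / n for col in col_totals}
    for row in row_totals
}

# Pearson's chi squared over all four cells.
chi_squared = sum(
    (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in observed
    for c in observed[r]
)

for row, cols in expected.items():
    print(row, {col: round(value, 2) for col, value in cols.items()})
print(f"chi squared = {chi_squared:.3f}")
```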


Odds ratio
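  • The odds of an event are the probability that it happens divided by the probability that it does not
  • The odds ratio compares the odds of the event in one group with the odds in another group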
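
As an illustration, the sketch below computes the odds that a stopped male is under 18 separately for each group, then divides one by the other to get an odds ratio. The counts are hypothetical and the function name `odds` is introduced here for the example.

```python
# Hypothetical counts of stops (illustrative only, not data from the slides).
stops = {
    "black": {"under 18": 40, "over 18": 50},
    "white": {"under 18": 20, "over 18": 38},
}


def odds(counts, event, non_event):
    """Odds of the event: times it happened divided by times it did not."""
    return counts[event] / counts[non_event]


odds_black = odds(stops["black"], "under 18", "over 18")
odds_white = odds(stops["white"], "under 18", "over 18")

# An odds ratio of 1 would mean the odds are the same in both groups.
odds_ratio = odds_black / odds_white

print(f"odds (black) = {odds_black:.3f}")
print(f"odds (white) = {odds_white:.3f}")
print(f"odds ratio   = {odds_ratio:.3f}")
```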
Loglinear analysis
  • Loglinear analysis works by backward elimination of a model
  • It starts with the saturated model, then removes predictors
    • just like an ANOVA, a loglinear analysis assesses the relationship between all variables and describes the outcomes in terms of interactions
Loglinear analysis II
  • With our previous example we had two variables
    • ethnicity and age
  • If we now added reason for stop and search, a loglinear analysis would first assess the three-way interaction and then assess each of the two-way interactions
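
The first step, assessing the three-way interaction, can be sketched by comparing the saturated model with a model that keeps every two-way interaction but drops the three-way term. The sketch assumes each variable has two levels, uses hypothetical counts, and fits the reduced model by iterative proportional fitting. That fitting method is a choice made for this example, not a description of how any statistics package does it, and the function names are introduced here.

```python
import math
from itertools import product

# Hypothetical 2 x 2 x 2 counts, indexed [ethnicity][age][reason] (illustrative only).
observed = [[[20, 15], [25, 30]],
            [[10, 12], [18, 18]]]

cells = list(product(range(2), repeat=3))


def margin(table, axes):
    """Sum the table over every axis not listed in `axes`."""
    totals = {}
    for cell in cells:
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0.0) + table[cell[0]][cell[1]][cell[2]]
    return totals


def fit_no_three_way(obs, iterations=200):
    """Fit the model with all two-way interactions but no three-way term."""
    fitted = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
    for _ in range(iterations):
        # Rescale the fitted table to match each observed two-way margin in turn.
        for axes in [(0, 1), (0, 2), (1, 2)]:
            target = margin(obs, axes)
            current = margin(fitted, axes)
            for i, j, k in cells:
                key = tuple((i, j, k)[a] for a in axes)
                fitted[i][j][k] *= target[key] / current[key]
    return fitted


def likelihood_ratio(obs, fitted):
    """Likelihood ratio statistic: 2 * sum(O * ln(O / E))."""
    return 2 * sum(
        obs[i][j][k] * math.log(obs[i][j][k] / fitted[i][j][k])
        for i, j, k in cells
        if obs[i][j][k] > 0
    )


# The saturated model reproduces the observed counts exactly, so its statistic is 0.
saturated_g2 = likelihood_ratio(observed, observed)

# Dropping the three-way interaction costs 1 degree of freedom in a 2 x 2 x 2 table.
fitted = fit_no_three_way(observed)
g2 = likelihood_ratio(observed, fitted)
p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2))  # valid for df = 1 only

print(f"saturated model: likelihood ratio = {saturated_g2:.3f}")
print(f"without three-way term: likelihood ratio = {g2:.3f}, p = {p_value:.4f}")
```

A non-significant result here would suggest the three-way interaction can be removed without harming the fit, after which the two-way interactions would be examined in the same manner.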
Assumptions of loglinear analysis
  • Similar to those of chi squared
    • observations should fall into one category alone
    • no more than 20% of cells should have expected frequencies less than 5
    • all cells must have expected frequencies greater than 1
      • if you don’t meet this assumption you need to decide whether to proceed with the analysis or collapse the data across variables
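
These two thresholds can be checked mechanically, as in the sketch below. It assumes the thresholds apply to expected frequencies; the function name `check_expected_counts` and the example values are hypothetical.

```python
def check_expected_counts(expected):
    """Return (share of cells with expected count below 5,
    True if any cell has an expected count of 1 or less)."""
    flat = [value for row in expected for value in row]
    share_below_5 = sum(value < 5 for value in flat) / len(flat)
    any_at_or_below_1 = any(value <= 1 for value in flat)
    return share_below_5, any_at_or_below_1


# Hypothetical expected frequencies for a 3 x 2 table (illustrative only).
expected = [[12.4, 3.6],
            [7.6, 0.8],
            [20.0, 9.1]]

share, too_small = check_expected_counts(expected)
print(f"cells below 5: {share:.0%} (assumption allows at most 20%)")
print(f"any cell at or below 1: {too_small}")

if share > 0.20 or too_small:
    print("Assumption not met: consider collapsing categories or proceed with caution.")
```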
Output I
  • Number of cases: should equal the total number of observations
  • Number of factors (variables)
  • Number of levels (sub-divisions within each variable)
  • Saturated model: the maximum interaction possible with the observed frequencies
  • Goodness of fit and likelihood ratio statistics
    • these test whether the expected frequencies are significantly different from the observed
    • these should be non-significant if the model is a good fit
Output II
  • Goodness of fit is preferred for large samples
  • Likelihood ratio is preferred for small samples
  • K-way and higher order effects asks
    • if you remove the highest order interaction, will the fit of the model be affected?
    • the next k-way effect asks: if you remove the highest order interaction followed by the next order, will the fit of the model be affected?
    • and so on until all effects are removed
Output III
  • K-way effects are zero asks the opposite
    • that is, whether removing main effects will have an effect on the model
    • the final step is the backward elimination
    • the analysis will keep going until it has eliminated every effect whose removal does not significantly affect the fit, and then advises that
      • the best model has generating class