Introduction to Categorical Descriptive Statistics


## PowerPoint Slideshow about 'Introduction to Categorical Descriptive Statistics' - tanner


### Introduction to Categorical Descriptive Statistics

Overview
• Contingency tables
• Notation
• Descriptive statistics
• Difference in proportions
• Relative risk
• Odds ratio
• SPSS
Contingency Tables
• Two dimensional tables
• Let X and Y be categorical variables
• X has I levels and Y has J levels
• A contingency or cross-classification table is a tabular representation of the frequency counts for each pair of variable levels
Notation
• Cell Counts
Cell Proportions
• Often less interested in the absolute counts within cells than in the relationship between the cell proportions
• To analyze cell proportions properly, need to know the experimental design and the relationship between the variables
• All variables can be considered response variables
• One (or more) response variable and one (or more) explanatory variable
• Prospective study
• Retrospective study
Two Variables
• Proportion notation
• $\{p_{ij}\}$ gives the joint distribution
• $\{p_{i+}\}$ and $\{p_{+j}\}$ represent the marginals
• $\{p_{j|i}\}$ is the conditional distribution of Y given level i of X
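A minimal sketch of how the joint, marginal and conditional distributions relate to a table of counts. The counts are hypothetical (they are not the General Social Survey figures), and the function name `distributions` is an illustrative choice.

```python
def distributions(counts):
    """Return joint, row-marginal, column-marginal and conditional
    (Y given X) proportions for a two-way table given as a list of rows."""
    n = sum(sum(row) for row in counts)
    joint = [[c / n for c in row] for row in counts]          # p_ij
    row_marg = [sum(row) for row in joint]                    # p_i+
    col_marg = [sum(col) for col in zip(*joint)]              # p_+j
    cond = [[c / sum(row) for c in row] for row in counts]    # p_j|i
    return joint, row_marg, col_marg, cond

# Hypothetical 2x2 counts: rows are levels of X, columns are levels of Y.
counts = [[30, 10], [20, 40]]
joint, row_marg, col_marg, cond = distributions(counts)

print("joint:", joint)
print("row marginals:", row_marg)
print("column marginals:", col_marg)
print("conditional of Y given X:", cond)
```

Each row of the conditional table sums to 1, whereas the joint table sums to 1 over all cells.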
Example
• 1982 General Social Survey report on attitudes about death penalty and gun registration
• Calculate joint, marginal and conditional distributions
Prospective Study
• Subjects either select or are selected for treatment groups and then response is studied
• Experimental
• Subjects are randomly allocated to treatment groups
• Observational
• Subjects self-select treatment group
• Principal aim is to compare conditional distribution of response for different levels of explanatory variable(s)
Example
• Findings from the Aspirin Component of the Ongoing Physicians’ Health Study
• Calculate conditional distribution
Retrospective Study
• Given response, look back at levels of possible explanatory variables
• Observational studies
• Typically “over-sample” for response level of interest
• If the overall population proportion in each response level is known, can use Bayes' theorem to calculate the conditional distribution in the direction of interest
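A sketch of the Bayes' theorem step mentioned above, under stated assumptions: the exposure rates and the population prevalence below are hypothetical numbers, and `reverse_conditional` is an illustrative name. A retrospective sample estimates P(X = i | Y = j); the function turns that into P(Y = j | X = i).

```python
def reverse_conditional(p_x_given_y, p_y):
    """p_x_given_y[j][i] = P(X = i | Y = j); p_y[j] = P(Y = j).
    Returns a table with result[i][j] = P(Y = j | X = i)."""
    n_x = len(p_x_given_y[0])
    result = []
    for i in range(n_x):
        # Joint probabilities P(X = i, Y = j) for each response level j.
        joint = [p_x_given_y[j][i] * p_y[j] for j in range(len(p_y))]
        total = sum(joint)  # P(X = i)
        result.append([v / total for v in joint])
    return result

# Hypothetical inputs: exposure rates among cases and controls,
# plus an assumed population proportion of cases.
p_x_given_y = [[0.4, 0.6],   # cases:    P(exposed), P(unexposed)
               [0.2, 0.8]]   # controls: P(exposed), P(unexposed)
p_y = [0.01, 0.99]           # P(case), P(control)

p_y_given_x = reverse_conditional(p_x_given_y, p_y)
print(f"P(case | exposed)   = {p_y_given_x[0][0]:.4f}")
print(f"P(case | unexposed) = {p_y_given_x[1][0]:.4f}")
```

Without the population proportions `p_y`, the over-sampled cases make the direct row proportions uninformative about P(Y | X).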
Example
• England-Wales 1968-1972 study on heart attacks and oral contraceptive use
• Calculate appropriate conditional distribution
Descriptive Statistics
• Comparing proportions for binary responses
• Difference of proportions
• Relative risk
• Odds ratios
• Independence
• X and Y response: $p_{ij} = p_{i+}\,p_{+j}$, for all i, j
• That is, $p_{j|i} = p_{+j}$, for all i, j
• X explanatory, Y response: $p_{j|i} = p_{j|h}$, for each j, for all i, h
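The second form of the independence condition follows from the first by the definition of the conditional distribution. A short derivation, assuming every row marginal is positive:

```latex
p_{j|i} = \frac{p_{ij}}{p_{i+}} = \frac{p_{i+}\,p_{+j}}{p_{i+}} = p_{+j},
\qquad \text{for all } i, j .
```

Since the right-hand side does not depend on i, the conditional distributions are identical across rows, which is the third form.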
Descriptive Statistics
• I x J tables
• No completely satisfactory way to summarize association
• Pairs of odds ratios
• Concentration coefficient
• Uncertainty coefficient
Difference of Proportions
• Binary response variable
• Generally, compare response for different explanatory levels
• $p_{1|i} - p_{1|h}$
• Difference lies between -1 and 1
• Independence when difference equals 0 for all i,h and response levels j
• Reasonable measure when absolute difference in proportions is relevant
• Also can compare differences between columns
Relative Risk
• Used when relative difference between proportions more relevant than absolute difference
• $p_{1|1} / p_{1|2}$
• Relative risk of 1 corresponds to independence
• Relative risk computed on the second response level, $p_{2|1} / p_{2|2}$, generally gives a different value
• Usually cannot be calculated directly from retrospective studies
Example

Risk for women having first child at 25 or older = .019 or 1.9%

Risk for women having first child before 25 = .0143 or 1.43%

Relative risk = .019/.0143 = 1.33

Increased risk = 33%
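The arithmetic in this example can be reproduced as a short sketch. The counts are the ones given in the odds example of this transcript (31 events against 1597 non-events, and 65 against 4475); the function and variable names are illustrative.

```python
def risk(events, total):
    """Proportion of subjects in a group who experienced the outcome."""
    return events / total

# Group totals are events plus non-events.
risk_older = risk(31, 31 + 1597)      # first child at 25 or older
risk_younger = risk(65, 65 + 4475)    # first child before 25

difference = risk_older - risk_younger       # difference of proportions
relative_risk = risk_older / risk_younger    # relative risk
increased_risk = (relative_risk - 1) * 100   # percent increase over baseline

print(f"risk (25 or older) = {risk_older:.4f}")
print(f"risk (before 25)   = {risk_younger:.4f}")
print(f"difference         = {difference:.4f}")
print(f"relative risk      = {relative_risk:.2f}")
print(f"increased risk     = {increased_risk:.0f}%")
```

The difference of proportions is under half a percentage point even though the relative increase is about a third, which is why the two measures answer different questions.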

Odds Ratio
• For 2x2 table,
• In row 1, odds of being column 1 instead of column 2: $O_1 = p_{1|1} / p_{2|1}$
• In row 2, odds of being column 1 instead of column 2: $O_2 = p_{1|2} / p_{2|2}$
• Odds ratio: $O_1 / O_2$
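Substituting the definition of the conditional proportions shows that the odds ratio can also be written in terms of the joint proportions. The symbol $\theta$ for the odds ratio is a notational assumption, not taken from the slides:

```latex
\theta = \frac{O_1}{O_2}
       = \frac{p_{1|1} / p_{2|1}}{p_{1|2} / p_{2|2}}
       = \frac{p_{1|1}\,p_{2|2}}{p_{2|1}\,p_{1|2}}
       = \frac{p_{11}\,p_{22}}{p_{12}\,p_{21}}
```

The last step uses $p_{j|i} = p_{ij} / p_{i+}$; the row marginals cancel, which is why this form is often called the cross-product ratio.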
Odds Ratio
• Takes values > 0
• Sometimes look at log odds ratio
• Invariant to interchanging rows and columns
• Unnecessary to specify response variable
• Unlike difference of proportions, and relative risk
Odds Ratio
• Multiplicative invariance within given row or column
• Difference of proportions and relative risk share this invariance within rows, but not within columns
• Equally valid for retrospective, prospective and cross-sectional studies
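The invariance properties listed above can be checked numerically. This is a sketch on hypothetical counts; `odds_ratio` and `relative_risk` are illustrative helper names, with rows treated as the explanatory variable for the relative risk.

```python
def odds_ratio(table):
    """Cross-product ratio (n11 * n22) / (n12 * n21) of a 2x2 table of counts."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

def relative_risk(table):
    """Ratio of the row-wise proportions in column 1: (n11/n1+) / (n21/n2+)."""
    (n11, n12), (n21, n22) = table
    return (n11 / (n11 + n12)) / (n21 / (n21 + n22))

table = [[30, 70], [10, 90]]                           # hypothetical counts
transposed = [list(col) for col in zip(*table)]        # interchange rows and columns
row_scaled = [[3 * n for n in table[0]], table[1]]     # multiply row 1 by 3
col_scaled = [[5 * row[0], row[1]] for row in table]   # multiply column 1 by 5

for name, t in [("original", table), ("transposed", transposed),
                ("row scaled", row_scaled), ("column scaled", col_scaled)]:
    print(f"{name:14s} odds ratio = {odds_ratio(t):.3f}  "
          f"relative risk = {relative_risk(t):.3f}")
```

The odds ratio is the same in all four tables. The relative risk survives the row scaling but changes under transposition and column scaling, which is the reason it usually cannot be estimated from a retrospective sample that over-samples one response level.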
Example

Odds for women having first child at 25 or older = 31/1597 = .019/.981 = .0194

Odds for women having first child before 25 = 65/4475 = .0143/.9857 = .0145

Odds ratio = .0194/.0145 = 1.34

Relationship Between Odds Ratio and Relative Risk
• Odds ratio = Relative risk × $(1 - p_{1|2}) / (1 - p_{1|1})$
• When probability of outcome of interest is small, regardless of row condition, then can use odds ratio as an estimate of relative risk
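A quick numerical check of this relationship, using the counts from the example in this transcript (31 events against 1597 non-events, and 65 against 4475). Variable names are illustrative.

```python
# Events and non-events per row.
a, b = 31, 1597    # first child at 25 or older
c, d = 65, 4475    # first child before 25

p1 = a / (a + b)   # p_{1|1}: risk in row 1
p2 = c / (c + d)   # p_{1|2}: risk in row 2

relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
via_identity = relative_risk * (1 - p2) / (1 - p1)

print(f"relative risk         = {relative_risk:.3f}")
print(f"odds ratio            = {odds_ratio:.3f}")
print(f"RR * (1-p2) / (1-p1)  = {via_identity:.3f}")
```

Both risks here are below 2%, so the correction factor is close to 1 and the odds ratio is close to the relative risk.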
Interpreting Risks and Odds
• Assess baseline risk
• Example: Men who drink 16 ounces of beer a day are three times more likely to develop rectal cancer
• Know time period of risk
• Risks accumulate with time
• Example: 1 in 9 women will develop breast cancer over their lifetime. But annual risk of women in their 30’s is 1 in 3700 and women in their 70’s is 1 in 235
• Investigate confounding factors
• Example: Older cars are almost 6 times as likely to be stolen as newer cars
• Survival rates for a standard and a new treatment at two hospitals
Relative Risk
• Hospital A:
• Risk of dying with standard treatment = 95/100 = .95
• Risk of dying with new treatment = 900/1000 = .90
• Relative risk = .95/.90 = 1.06
Relative Risk
• Hospital B:
• Risk of dying with standard treatment = 500/1000 = .5
• Risk of dying with new treatment = 5/100 = .05
• Relative risk = .5/.05 = 10.0
Combined Data
• Group data from both hospitals
• Risk of dying with standard treatment = 595/1100 = .54
• Risk of dying with new treatment = 905/1100 = .82
• Relative risk = .54/.82 = .66
What is Going On?
• When the data are combined, we lose the information that the patients in Hospital A had BOTH a higher overall death rate AND a higher likelihood of receiving the new treatment
• Misleading to summarize information over groups, especially if subjects were not randomly assigned to groups
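The reversal can be reproduced directly from the hospital counts given in the slides. This is a sketch; the dictionary layout and the helper name `relative_risk` are illustrative choices.

```python
# (deaths, patients) for each hospital and treatment, from the slides.
data = {
    "A": {"standard": (95, 100), "new": (900, 1000)},
    "B": {"standard": (500, 1000), "new": (5, 100)},
}

def relative_risk(standard, new):
    """Risk of dying on the standard treatment divided by risk on the new one."""
    return (standard[0] / standard[1]) / (new[0] / new[1])

per_hospital = {h: relative_risk(t["standard"], t["new"]) for h, t in data.items()}

# Pool deaths and patients across hospitals for each treatment.
pooled = {
    arm: tuple(sum(data[h][arm][k] for h in data) for k in (0, 1))
    for arm in ("standard", "new")
}
combined = relative_risk(pooled["standard"], pooled["new"])

# Share of each hospital's patients who received the new treatment.
new_share = {h: t["new"][1] / (t["standard"][1] + t["new"][1]) for h, t in data.items()}

for h in data:
    print(f"Hospital {h}: relative risk = {per_hospital[h]:.2f}, "
          f"share on new treatment = {new_share[h]:.2f}")
print(f"Combined: relative risk = {combined:.2f}")
```

Within each hospital the relative risk is above 1, favoring the new treatment, yet the pooled figure falls below 1 because most new-treatment patients come from the hospital with the higher death rate.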
Confounding Variables
• Television ownership versus movie attendance
Confounding Variables
• Control for income
More Examples