Introduction to Categorical Descriptive Statistics

- Contingency tables
- Notation

- Descriptive statistics
- Difference in proportions
- Relative risk
- Odds ratio

- SPSS

- Two dimensional tables
- Let X and Y be categorical variables
- X has I levels and Y has J levels

- A contingency or cross-classification table is a tabular representation of the frequency counts for each pair of variable levels

- Cell Counts

- Often the absolute counts within cells are of less interest than the relationship between the cell proportions
- To analyze cell proportions properly, need to know the experimental design and the relationship between the variables
- All variables can be considered response variables
- One (or more) response variable and one (or more) explanatory variable
- Prospective study
- Retrospective study

- Proportion notation
- $\{p_{ij}\}$ gives the joint distribution
- $\{p_{i+}\}$ and $\{p_{+j}\}$ represent the marginals
- $\{p_{j|i}\}$ is the conditional distribution of Y given level i of X

- 1982 General Social Survey report on attitudes about death penalty and gun registration
- Calculate joint, marginal and conditional distributions
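The joint, marginal and conditional distributions can be computed directly from the cell counts. The sketch below is only an illustration: the counts are hypothetical placeholders (they are not the 1982 General Social Survey figures, which are not reproduced in this outline), and the variable names are assumptions.

```python
# Hypothetical 2x2 cell counts, NOT the General Social Survey data.
# Rows are levels of X, columns are levels of Y.
counts = [[40, 10],
          [20, 30]]

n = sum(sum(row) for row in counts)

# Joint distribution: p_ij = n_ij / n
joint = [[c / n for c in row] for row in counts]

# Marginals: p_i+ sums a row, p_+j sums a column
row_marginal = [sum(row) for row in joint]
col_marginal = [sum(col) for col in zip(*joint)]

# Conditional distribution of Y given level i of X: p_j|i = p_ij / p_i+
conditional = [[p / row_marginal[i] for p in row]
               for i, row in enumerate(joint)]

for i, row in enumerate(conditional):
    print(f"Y given X = level {i + 1}:", [round(p, 3) for p in row])
```

Each row of `conditional` sums to 1, which is a quick sanity check that the division used the row marginal rather than the grand total.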

- Subjects either select or are selected for treatment groups and then response is studied
- Experimental
- Subjects are randomly allocated to treatment groups

- Observational
- Subjects self-select treatment group

- Principal aim is to compare conditional distribution of response for different levels of explanatory variable(s)

- Findings from the Aspirin Component of the Ongoing Physicians’ Health Study
- Calculate conditional distribution

- Given response, look back at levels of possible explanatory variables
- Observational studies

- Typically “over-sample” for response level of interest
- If the overall population proportion in each response level is known, Bayes' theorem can be used to calculate the conditional distribution in the direction of interest
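A minimal sketch of that Bayes' theorem step, assuming a binary response (case or control) and a binary exposure. The function name and all three input values are hypothetical and chosen only to show the arithmetic.

```python
def response_given_exposure(p_exposed_given_case,
                            p_exposed_given_control,
                            p_case):
    """Reverse a retrospective conditional with Bayes' theorem.

    Returns P(case | exposed) from P(exposed | case),
    P(exposed | control) and the population proportion P(case).
    """
    numerator = p_exposed_given_case * p_case
    denominator = numerator + p_exposed_given_control * (1 - p_case)
    return numerator / denominator


# Hypothetical inputs: exposure is twice as common among cases,
# and 1% of the population are cases.
p = response_given_exposure(0.4, 0.2, 0.01)
print(round(p, 4))
```

The retrospective sample alone cannot supply `p_case`, because the over-sampling fixes the case and control totals by design; it has to come from outside the study.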

- England-Wales 1968-1972 study on heart attacks and oral contraceptive use
- Calculate appropriate conditional distribution

- Comparing proportions for binary responses
- Difference of proportions
- Relative risk
- Odds ratios

- Independence
- X and Y response: $p_{ij} = p_{i+} p_{+j}$ for all i, j
- That is, $p_{j|i} = p_{+j}$ for all i, j

- X explanatory, Y response: $p_{j|i} = p_{j|h}$ for each j, for all i, h
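The independence condition (every joint proportion equals the product of its two marginals) can be checked mechanically on a table of counts. This is a sketch on hypothetical counts; it tests exact independence in the observed table and is not a significance test.

```python
def is_independent(counts, tol=1e-9):
    """True when p_ij equals p_i+ * p_+j in every cell of the table."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return all(
        abs(counts[i][j] / n - (row_totals[i] / n) * (col_totals[j] / n)) <= tol
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )


# Hypothetical tables: the rows of the first are proportional, so the
# conditional distribution of Y is the same in every row.
print(is_independent([[20, 30], [40, 60]]))   # proportional rows
print(is_independent([[40, 10], [20, 30]]))   # rows differ
```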

- I x J tables
- No completely satisfactory way to summarize association
- Pairs of odds ratios
- Concentration coefficient
- Uncertainty coefficient

- Binary response variable
- Generally, compare response for different explanatory levels
- $p_{1|i} - p_{1|h}$

- Difference lies between -1 and 1
- Independence when difference equals 0 for all i,h and response levels j
- Reasonable measure when absolute difference in proportions is relevant
- Also can compare differences between columns
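As a small illustration of the difference of proportions for a binary response, the sketch below compares the proportion in the first response column between two rows. The counts and the function name are hypothetical.

```python
def difference_of_proportions(row_i, row_h):
    """p_1|i - p_1|h, with the first entry of each row as response level 1."""
    p_i = row_i[0] / sum(row_i)
    p_h = row_h[0] / sum(row_h)
    return p_i - p_h


# Hypothetical rows of counts: (response level 1, response level 2)
diff = difference_of_proportions([30, 70], [10, 90])
print(round(diff, 3))
```

Swapping the two rows flips the sign of the result, and two rows with the same conditional distribution give 0.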

- Used when relative difference between proportions more relevant than absolute difference
- $p_{1|1} / p_{1|2}$
- Relative risk of 1 corresponds to independence
- Relative risk computed on the second response level, $p_{2|1} / p_{2|2}$, generally gives a different value

- Usually cannot be directly calculated from retrospective studies

Risk for women having first child at 25 or older = .019 or 1.9%

Risk for women having first child before 25 = .0143 or 1.43%

Relative risk = .019/.0143 = 1.33

Increased risk = 33%
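The arithmetic above can be reproduced from raw counts. The sketch assumes the counts quoted in this outline's odds example for the same two groups (31 with the outcome and 1597 without among women having a first child at 25 or older; 65 and 4475 among women having a first child before 25); the function name is an assumption.

```python
def relative_risk(cases_1, noncases_1, cases_2, noncases_2):
    """Ratio of the outcome risk in group 1 to the outcome risk in group 2."""
    risk_1 = cases_1 / (cases_1 + noncases_1)
    risk_2 = cases_2 / (cases_2 + noncases_2)
    return risk_1 / risk_2


# Group 1: first child at 25 or older; group 2: first child before 25.
rr = relative_risk(31, 1597, 65, 4475)
increase_percent = (rr - 1) * 100

print(round(rr, 2))               # 1.33
print(round(increase_percent))    # 33
```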

- For 2x2 table,
- In row 1, odds of being in column 1 instead of column 2: $O_1 = p_{1|1} / p_{2|1}$
- In row 2, odds of being in column 1 instead of column 2: $O_2 = p_{1|2} / p_{2|2}$
- Odds ratio: $O_1 / O_2$

- Takes values > 0
- Sometimes look at log odds ratio

- Invariant to interchanging rows and columns
- Unnecessary to specify response variable
- Unlike difference of proportions and relative risk

- Multiplicative invariance within given row or column
- Difference of proportions and relative risk share this invariance only when the multiplied row or column is a level of the explanatory variable

- Equally valid for retrospective, prospective and cross-sectional studies

Odds for women having first child at 25 or older = 31/1597 = .019/.981 = .0194

Odds for women having first child before 25 = 65/4475 = .0143/.9857 = .0145

Odds ratio = .0194/.0145 = 1.34
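The same odds ratio can be computed as a cross-product ratio of the four counts. The sketch below uses the counts from the example above and also illustrates that swapping the roles of rows and columns leaves the value unchanged; the function name is an assumption.

```python
import math


def odds_ratio(table):
    """Cross-product ratio n11 * n22 / (n12 * n21) of a 2x2 table of counts."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


# Rows: first child at 25 or older, first child before 25.
# Columns: outcome occurred, outcome did not occur.
table = [[31, 1597],
         [65, 4475]]
transposed = [list(col) for col in zip(*table)]

or_rows = odds_ratio(table)
or_cols = odds_ratio(transposed)
log_or = math.log(or_rows)

print(round(or_rows, 2), round(or_cols, 2))   # 1.34 1.34
```

The log odds ratio is symmetric about 0: reversing the order of the rows changes its sign but not its magnitude.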

- Odds ratio = Relative risk $\times\ (1 - p_{1|2}) / (1 - p_{1|1})$
- When the probability of the outcome of interest is small in every row, the odds ratio can be used as an estimate of the relative risk
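A quick numerical check of this relationship, again assuming the counts from the odds example in this outline (31 of 1628 and 65 of 4540). Because both risks are below 2%, the correction factor is close to 1 and the odds ratio nearly equals the relative risk.

```python
# Risks in each row (outcome proportion p_1|1 and p_1|2)
p1 = 31 / (31 + 1597)
p2 = 65 / (65 + 4475)

rel_risk = p1 / p2
odds_ratio_direct = (p1 / (1 - p1)) / (p2 / (1 - p2))

# Correction factor linking the two measures
factor = (1 - p2) / (1 - p1)
odds_ratio_from_rr = rel_risk * factor

print(round(factor, 4))
print(round(odds_ratio_direct, 4), round(odds_ratio_from_rr, 4))
```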

- Assess baseline risk
- Example: Men who drink 16 ounces of beer a day are three times more likely to develop rectal cancer

- Know time period of risk
- Risks accumulate with time
- Example: 1 in 9 women will develop breast cancer over their lifetime. But annual risk of women in their 30’s is 1 in 3700 and women in their 70’s is 1 in 235

- Investigate confounding factors
- Example: Older cars are almost 6 times as likely to be stolen as newer cars

- Survival rates for a standard and a new treatment at two hospitals

- Hospital A:
- Risk of dying with standard treatment = 95/100 = .95
- Risk of dying with new treatment = 900/1000 = .90
- Relative risk = .95/.90 = 1.06

- Hospital B:
- Risk of dying with standard treatment = 500/1000 = .5
- Risk of dying with new treatment = 5/100 = .05
- Relative risk = .5/.05 = 10.0

- Group data from both hospitals
- Risk of dying with standard treatment = 595/1100 = .54
- Risk of dying with new treatment = 905/1100 = .82
- Relative risk = .54/.82 = .66

- When the data are combined, we lose the information that patients in Hospital A had both a higher overall death rate and a higher likelihood of receiving the new treatment
- Misleading to summarize information over groups, especially if subjects were not randomly assigned to groups
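The reversal in the hospital example can be reproduced from the counts given above. This sketch stores (deaths, patients) for each treatment at each hospital and compares the within-hospital relative risks with the pooled one; the data structure and names are assumptions.

```python
# (deaths, patients) per treatment arm, taken from the hospital example
data = {
    "A": {"standard": (95, 100), "new": (900, 1000)},
    "B": {"standard": (500, 1000), "new": (5, 100)},
}


def relative_risk(standard, new):
    """Risk of dying on standard treatment divided by risk on new treatment."""
    return (standard[0] / standard[1]) / (new[0] / new[1])


rr_by_hospital = {h: relative_risk(arms["standard"], arms["new"])
                  for h, arms in data.items()}

# Pool deaths and patients across hospitals for each arm
pooled = {
    arm: (sum(data[h][arm][0] for h in data),
          sum(data[h][arm][1] for h in data))
    for arm in ("standard", "new")
}
rr_pooled = relative_risk(pooled["standard"], pooled["new"])

for h, rr in rr_by_hospital.items():
    print(f"Hospital {h}: relative risk = {rr:.2f}")
print(f"Pooled: relative risk = {rr_pooled:.2f}")
```

Within each hospital the relative risk is above 1, so the standard treatment has the higher death risk, yet the pooled value falls below 1 because most new-treatment patients were treated at Hospital A, where death rates were high for both treatments.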

- Television ownership versus movie attendance

- Control for income

- Discrimination in college admission
- Racial bias in death penalty sentences

- Over a given number of years the University of California, Berkeley admitted 44% of all men who applied to any one of six graduate programs and only 30% of women who applied
- Is there evidence of discrimination in graduate admissions at Berkeley?

- Results of a 1981 Florida study of whether the race of a homicide defendant affects the likelihood that the defendant would receive the death penalty

- Based on the data, is race a factor in whether the death penalty is received and, if so, how?