Chapter 3 Graphical and Numerical Summaries of Categorical Data


UNIT OBJECTIVES

At the conclusion of this unit you should be able to:

• 1) Construct graphs that appropriately describe data.
• 2) Calculate and interpret numerical summaries of a data set.
• 3) Combine numerical methods with graphical methods to analyze a data set.

### Displaying Qualitative Data

“Sometimes you can see a lot just by looking.”

Yogi Berra

Hall of Fame Catcher, NY Yankees

• 1. Make a picture: it reveals aspects not obvious in the raw data and lets you think clearly about the patterns and relationships that may be hiding in your data.
• 2. Make a picture to show important features of and patterns in the data. You may also see things you did not expect: extraordinary (possibly wrong) data values or unexpected patterns.
• 3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
• Example: Titanic passenger/crew distribution
Pie Charts: show the proportion of the whole that falls in each category
• Example: Titanic passenger/crew distribution
Example: Top 10 causes of death in the United States, 2001

For each individual who died in the United States in 2001, we record the cause of death. The table above summarizes that information.

The number of individuals who died of an accident in 2001 was approximately 100,000.

Top 10 causes of death: bar graph

Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category.
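The sketch below shows how bar heights come from raw categorical records: tally each category, then optionally divide by the total to get percentages. The cause-of-death counts are not reproduced in this text, so the example borrows the Titanic class totals used later in the chapter (885 crew, 325 first, 285 second, 706 third). The helper names `bar_heights` and `text_bar_chart` are hypothetical, and the text "chart" stands in for a real plotting library.

```python
from collections import Counter

# Titanic class totals (885 crew, 325 first, 285 second, 706 third), expanded
# into one record per person so the example starts from raw categorical data.
records = ["Crew"] * 885 + ["First"] * 325 + ["Second"] * 285 + ["Third"] * 706


def bar_heights(values, as_percent=False):
    """Return {category: bar height} as counts, or as percents of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    if as_percent:
        return {cat: 100 * n / total for cat, n in counts.items()}
    return dict(counts)


def text_bar_chart(heights, width=40):
    """Print one bar per category, scaled so the tallest bar is `width` characters."""
    tallest = max(heights.values())
    for cat, h in heights.items():
        bar = "#" * round(width * h / tallest)
        print(f"{cat:<7}{bar} {round(h, 1)}")


text_bar_chart(bar_heights(records))                   # bar height = count
text_bar_chart(bar_heights(records, as_percent=True))  # bar height = percent
```

Both calls draw bars with the same relative lengths; only the numbers printed next to them change. That is why a bar graph can show either counts or percentages.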

Top 10 causes of death in the United States, 2001

• Bar graph sorted by rank: easy to analyze.
• Sorted alphabetically: much less useful.
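Sorting by rank simply means ordering the bars by their heights, tallest first. As a sketch (the cause-of-death counts are not given in this text, so the Titanic class totals stand in), compare the two orderings. Because class is itself an ordered variable, alphabetical order happens to look sensible here. For unordered categories such as causes of death, however, it carries no information about the data.

```python
# Titanic class totals used as stand-in data (cause-of-death counts are not given here).
class_counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}

# Sorted by rank: tallest bar first, so neighboring bars are easy to compare.
by_rank = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)

# Sorted alphabetically: the order depends only on the labels, not on the counts.
alphabetical = sorted(class_counts.items())

print([name for name, _ in by_rank])       # ['Crew', 'Third', 'First', 'Second']
print([name for name, _ in alphabetical])  # ['Crew', 'First', 'Second', 'Third']
```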

Top 10 causes of death: pie chart

Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.
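A slice's size follows directly from its percent: the angle is the category's share of the whole times 360 degrees. The minimal sketch below uses the Titanic class totals as example data. The helper name `pie_slices` is hypothetical.

```python
# Titanic class totals (2201 people aboard) as example data.
class_counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}


def pie_slices(counts):
    """Return {category: (percent, angle in degrees)} for a pie chart.

    Each angle is the category's share of the whole times 360 degrees,
    so the angles always add up to a full circle.
    """
    total = sum(counts.values())
    return {cat: (100 * n / total, 360 * n / total) for cat, n in counts.items()}


for cat, (pct, angle) in pie_slices(class_counts).items():
    print(f"{cat:<7}{pct:5.1f}%  {angle:6.1f} degrees")
```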

Figure: Percent of people dying from top 10 causes of death in the United States in 2001 (panels: percent of deaths from top 10 causes; percent of deaths from all causes).

• Make sure your labels match the data.
• Make sure all percents add up to 100%.
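When a pie chart shows only some categories (for example, the top 10 causes out of all deaths), the shown percents fall short of 100%. The remainder needs its own "Other" slice. The sketch below illustrates this, with the Titanic class totals standing in for the death counts and the crew and third-class categories treated as the "shown" ones. The helper `percents_with_other` and its `total` parameter are hypothetical. Note that after rounding to one decimal, the percents can sum to 99.9 or 100.1 even when the underlying data are complete.

```python
# Titanic class totals as stand-in data.
class_counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}


def percents_with_other(counts, shown, total=None):
    """Percent of the whole for each shown category, plus an 'Other' slice.

    `total` is the size of the whole; it defaults to the sum of all counts.
    Without the 'Other' slice, the shown percents would not reach 100%.
    """
    total = total if total is not None else sum(counts.values())
    pcts = {cat: 100 * counts[cat] / total for cat in shown}
    rest = 100 - sum(pcts.values())
    if rest > 1e-9:  # ignore floating-point noise when nothing is left over
        pcts["Other"] = rest
    return pcts


top_two = percents_with_other(class_counts, ["Crew", "Third"])
print({cat: round(p, 1) for cat, p in top_two.items()})
# {'Crew': 40.2, 'Third': 32.1, 'Other': 27.7}
```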

Child poverty before and after government intervention—UNICEF, 1996
• What does this chart tell you?
• The United States has the highest rate of child poverty among developed nations (22% of children under 18).
• Its government does the least, through taxes and subsidies, to remedy the problem (compare the size of the orange bars and the percent difference between the orange and blue bars).
• Could you transform this bar graph to fit in 1 pie chart? In two pie charts? Why?

The poverty line is defined as 50% of national median income.
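The poverty-line definition translates directly into a calculation: take half the median income, then count who falls below it. The incomes below are made up purely for illustration (they are not UNICEF figures). Two further assumptions are that incomes are per household and that "below the line" means strictly below. The helper names are hypothetical.

```python
from statistics import median

# Hypothetical annual household incomes in dollars (illustration only, not UNICEF data).
incomes = [6_000, 9_000, 14_000, 18_000, 25_000, 31_000, 40_000, 52_000, 68_000, 90_000]


def poverty_line(incomes):
    """Poverty line as defined in the chart note: 50% of the median income."""
    return 0.5 * median(incomes)


def poverty_rate(incomes):
    """Percent of households whose income is strictly below the poverty line."""
    line = poverty_line(incomes)
    return 100 * sum(1 for x in incomes if x < line) / len(incomes)


print(poverty_line(incomes))  # 14000.0
print(poverty_rate(incomes))  # 20.0
```

The median here is the average of the two middle values (25,000 and 31,000), so the line is 14,000, and only the two incomes below it count as poor.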

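Contingency Tables: Categories for Two Variables
• Example: Survival and class on the Titanic

Marginal distributions

Marginal distribution of survival:

| Survival | Fraction | Percent |
|---|---|---|
| Alive | 710/2201 | 32.3% |
| Dead | 1491/2201 | 67.7% |

Marginal distribution of class:

| Class | Fraction | Percent |
|---|---|---|
| Crew | 885/2201 | 40.2% |
| First | 325/2201 | 14.8% |
| Second | 285/2201 | 12.9% |
| Third | 706/2201 | 32.1% |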
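A marginal distribution uses only the totals in one margin of the table, each divided by the grand total. The sketch below reproduces the percents above from those totals. The helper name `marginal_distribution` is hypothetical.

```python
# Marginal totals for the Titanic, as listed in this chapter.
survival = {"Alive": 710, "Dead": 1491}
ship_class = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}


def marginal_distribution(totals):
    """Turn the totals in one margin of a contingency table into percents of the grand total."""
    grand_total = sum(totals.values())  # 2201 for both margins
    return {cat: round(100 * n / grand_total, 1) for cat, n in totals.items()}


print(marginal_distribution(survival))    # {'Alive': 32.3, 'Dead': 67.7}
print(marginal_distribution(ship_class))  # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```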

Contingency Tables: Categories for Two Variables (cont.)
• Conditional distributions.

Given the class of a passenger, what is the chance the passenger survived?
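A conditional distribution restricts attention to one category of the conditioning variable. Its denominator is that category's total, not the grand total. This section gives only the first-class cells (202 of 325 first-class passengers survived, so 325 − 202 = 123 did not), so the sketch below conditions on first class alone. The helper name `conditional_distribution` is hypothetical.

```python
# First-class column derived from this chapter: 202 survived out of 325.
first_class = {"Alive": 202, "Dead": 325 - 202}


def conditional_distribution(column):
    """Distribution of one variable within a single category of the other.

    The denominator is the column total (here, all first-class passengers),
    not the grand total of the table.
    """
    column_total = sum(column.values())
    return {outcome: round(100 * n / column_total, 1) for outcome, n in column.items()}


print(conditional_distribution(first_class))  # {'Alive': 62.2, 'Dead': 37.8}
```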


Questions:

• What fraction of survivors were in first class? 202/710 ≈ 28.5%
• What fraction of everyone aboard (passengers and crew) were in first class and survived? 202/2201 ≈ 9.2%
• What fraction of the first-class passengers survived? 202/325 ≈ 62.2%
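All three questions share the numerator 202. What distinguishes them is the denominator, which fixes the group being described. The short sketch below makes that explicit, using only the counts from this section. The constant names are illustrative.

```python
# Counts from the Titanic questions in this section.
FIRST_AND_SURVIVED = 202
SURVIVORS = 710
EVERYONE_ABOARD = 2201
FIRST_CLASS = 325

# Same numerator, three different denominators: the denominator decides
# which question is being answered.
questions = {
    "Fraction of survivors in first class": (FIRST_AND_SURVIVED, SURVIVORS),
    "Fraction of everyone aboard in first class and survived": (FIRST_AND_SURVIVED, EVERYONE_ABOARD),
    "Fraction of first-class passengers who survived": (FIRST_AND_SURVIVED, FIRST_CLASS),
}

for question, (num, den) in questions.items():
    print(f"{question}: {num}/{den} = {num / den:.1%}")
```

This prints 28.5%, 9.2%, and 62.2% respectively. Only the last is a conditional distribution given class; the first conditions on survival, and the second is a joint fraction of the whole table.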

3-Way Tables
• Example: Georgia death-sentence data