Slides Prepared by JOHN S. LOUCKS St. Edward’s University

1 / 35

# Slides Prepared by JOHN S. LOUCKS St. Edward’s University - PowerPoint PPT Presentation

Slides Prepared by JOHN S. LOUCKS St. Edward’s University. y. x. Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations Part B. Exploratory Data Analysis Crosstabulations and Scatter Diagrams. Minor mods by McKim. Exploratory Data Analysis.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Slides Prepared by

JOHN S. LOUCKS

St. Edward’s University

y

x

Chapter 2Descriptive Statistics:Tabular and Graphical PresentationsPart B

• Exploratory Data Analysis
• Crosstabulations and

Scatter Diagrams

Minor mods by McKim

Exploratory Data Analysis
• The techniques of exploratory data analysis consist of
• simple arithmetic and easy-to-draw pictures that can
• be used to summarize data quickly.
• One such technique is the stem-and-leaf display.
Stem-and-Leaf Display
• A stem-and-leaf display shows both the rank order
• and shape of the distribution of the data.
• It is similar to a histogram on its side, but it has the
• advantage of showing the actual data values.
• The first digits of each data item are arranged to the
• left of a vertical line.
• To the right of the vertical line we record the last
• digit for each item in rank order.
• Each line in the display is referred to as a stem.
• Each digit on a stem is a leaf.

Example: Hudson Auto Repair

The manager of Hudson Auto

would like to have a better

understanding of the cost

of parts used in the engine

tune-ups performed in the

shop. She examines 50

customer invoices for tune-ups. The costs of parts,

rounded to the nearest dollar, are listed on the next

slide.

Example: Hudson Auto Repair

• Sample of Parts Cost for 50 Tune-ups

Stem-and-Leaf Display

5

6

7

8

9

10

2 7

2 2 2 2 5 6 7 8 8 8 9 9 9

1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9

0 0 2 3 5 8 9

1 3 7 7 7 8 9

1 4 5 5 9

a stem

a leaf

Stretched Stem-and-Leaf Display
• If we believe the original stem-and-leaf display has
• condensed the data too much, we can stretch the
• display by using two stems for each leading digit(s).
• Whenever a stem value is stated twice, the first value
• corresponds to leaf values of 0 - 4, and the second
• value corresponds to leaf values of 5 - 9.

Stretched Stem-and-Leaf Display

5

5

6

6

7

7

8

8

9

9

10

10

2

7

2 2 2 2

5 6 7 8 8 8 9 9 9

1 1 2 2 3 4 4

5 5 5 6 7 8 9 9 9

0 0 2 3

5 8 9

1 3

7 7 7 8 9

1 4

5 5 9

Stem-and-Leaf Display

• Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed
• to equal 1.
Example: Leaf Unit = 0.1

If we have data with values such as

8.6 11.7 9.4 9.1 10.2 11.0 8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1

8

9

10

11

6 8

1 4

2

0 7

Example: Leaf Unit = 10

If we have data with values such as

1806 1717 1974 1791 1682 1910 1838

a stem-and-leaf display of these data will be

Leaf Unit = 10

16

17

18

19

8

The 82 in 1682

is rounded down

to 80 and is

represented as an 8.

1 9

0 3

1 7

Crosstabulations and Scatter Diagrams
• Thus far we have focused on methods that are used
• to summarize the data for one variable at a time.
• Often a manager is interested in tabular and
• graphical methods that will help understand the
• relationship between two variables.
• Crosstabulation and a scatter diagram are two
• methods for summarizing the data for two (or more)
• variables simultaneously.

Crosstabulation

• A crosstabulation is a tabular summary of data for

two variables.

• Crosstabulation can be used when:
• one variable is qualitative and the other is
• quantitative,
• both variables are qualitative, or
• both variables are quantitative.
• In other words, any time!! (Duh.)
• The left and top margin labels define the classes for
• the two variables.

Crosstabulation

• Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style and price for the past two years is shown below.

qualitative

variable

quantitative

variable

Home Style

Price

Range

Colonial Log Split A-Frame

Total

18 6 19 12

55

45

< \$99,000

> \$99,000

12 14 16 3

30 20 35 15

Total

100

Crosstabulation
• Insights Gained from Preceding Crosstabulation
• The greatest number of homes in the sample (19)
• are a split-level style and priced at less than or
• equal to \$99,000.
• Only three homes in the sample are an A-Frame
• style and priced at more than \$99,000.

Crosstabulation

Frequency distribution

for the price variable

Home Style

Price

Range

Colonial Log Split A-Frame

Total

18 6 19 12

55

45

< \$99,000

> \$99,000

12 14 16 3

30 20 35 15

Total

100

Frequency distribution

for the home style variable

Crosstabulation: Row or Column Percentages
• Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Crosstabulation: Row Percentages

Home Style

Price

Range

Colonial Log Split A-Frame

Total

32.73 10.91 34.55 21.82

100

100

< \$99,000

> \$99,000

26.67 31.11 35.56 6.67

Note: row totals are actually 100.01 due to rounding.

(Colonial and > \$99K)/(All >\$99K) x 100 = (12/45) x 100

Crosstabulation: Column Percentages

Home Style

Price

Range

Colonial Log Split A-Frame

60.00 30.00 54.29 80.00

< \$99,000

> \$99,000

40.00 70.00 45.71 20.00

100 100 100 100

Total

(Colonial and > \$99K)/(All Colonial) x 100 = (12/30) x 100

• Data in two or more crosstabulations are often

aggregated to produce a summary crosstabulation.

• We must be careful in drawing conclusions about the
• relationship between the two variables in the
• aggregated crosstabulation.
• Simpson’ Paradox: In some cases the conclusions

based upon an aggregated crosstabulation can be

completely reversed if we look at the unaggregated

data. suggests the overall relationship between the

variables. (Yikes, just ignore that last partial sentence.)

Judge

Verdict

Total

Appealed

Luckett Kendall

129 (86%) 110 (88%)

Upheld

Reversed

239

36

• 21 (14%) 15 (12%)

150 (100%) 125 (100%)

Total (%)

275

Kendall appears to have the better rate of upheld verdicts

Judge Luckett

Verdict

Total

Appealed

Common Pleas Municipal

29 (91%) 100 (85%)

Upheld

Reversed

129

21

• 3 (9%) 18 (15%)

32 (100%) 118 (100%)

Total (%)

150

Judge Kendall

Verdict

Total

Appealed

Common Pleas Municipal

90 (90%) 20 (80%)

Upheld

Reversed

110

15

• 10 (10%) 5 (20%)

100 (100%) 25 (100%)

Total (%)

125

Luckett has the better rate of upheld verdicts in both Common Pleas and Municipal Courts

Yet the aggregate table makes it appear the Kendall has the better rate.

What appears to have happened is that Municipal Court appeals have a higher overturn rate and Luckett just happened to try most of his cases there.

What other important factor is missing from this analysis??

Scatter Diagram and Trendline

• A scatter diagram is a graphical presentation of the
• relationship between two quantitative variables.
• One variable is shown on the horizontal axis and the
• other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the
• overall relationship between the variables.
• A trendline is an approximation of the relationship.
• Example in book is number of commercials vs sales
Scatter Diagram
• A Positive Relationship

y

x

Scatter Diagram
• A Negative Relationship

y

x

Scatter Diagram
• No Apparent Relationship

y

x

Example: Panthers Football Team
• Scatter Diagram

The Panthers football team is interested

in investigating the relationship, if any,

between interceptions made and points scored.

x = Number of

Interceptions

y = Number of

Points Scored

14

24

18

17

30

1

3

2

1

3

35

30

25

20

15

10

5

0

1

0

2

3

4

Scatter Diagram

y

Number of Points Scored

x

Number of Interceptions

Example: Panthers Football Team

• Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship
• between the number of interceptions and the
• number of points scored.
• Higher points scored are associated with a higher
• number of interceptions.
• The relationship is not perfect; all plotted points in
• the scatter diagram are not on a straight line.
Tabular and Graphical Procedures

Data

Qualitative Data

Quantitative Data

Tabular

Methods

Graphical

Methods

Tabular

Methods

Graphical

Methods

• Bar Graph
• Pie Chart
• Frequency
• Distribution
• Rel. Freq. Dist.
• Percent Freq.
• Distribution
• Crosstabulation
• Dot Plot
• Histogram
• Ogive
• Scatter
• Diagram
• Frequency
• Distribution
• Rel. Freq. Dist.
• Cum. Freq. Dist.
• Cum. Rel. Freq.
• Distribution
• Stem-and-Leaf
• Display
• Crosstabulation
Keep in mind…
• Frequency Distribution shows how many.
• Rel. Freq. Distribution shows what fraction.
• Percent Freq. Distribution shows what percentage.
• Can represent any of the above in tables or charts.