
# Chapter 3: Frequency Distributions - PowerPoint PPT Presentation



## PowerPoint Slideshow about 'Chapter 3: Frequency Distributions' - iolana



### In Chapter 3:

3.1 Stemplots

3.2 Frequency Tables

3.3 Additional Frequency Charts

Start by exploring the data with Exploratory Data Analysis (EDA)

A popular univariate EDA technique is the stem-and-leaf plot

The stem of the stemplot is a number line (axis)

Each leaf represents a data point

Stemplots

You can observe a lot by looking – Yogi Berra

• 10 ages (data sequenced as an ordered array)

05 11 21 24 27 28 30 42 50 52

• Draw the stem to cover the range 5 to 52:

0|

1|

2|

3|

4|

5|

(×10, axis multiplier)

• Divide each data point into a stem-value (in this example, the tens place) and leaf-value (the ones-place, in this example)

• Place leaves next to their stem value

• Example of a leaf: 21 is plotted as 2|1 (stem 2, leaf 1)

• Plot all data points in rank order:

0|5

1|1

2|1478

3|0

4|2

5|02

(×10)

• Here is the plot horizontally

        8
        7
        4     2
    5 1 1 0 2 0
    -----------
    0 1 2 3 4 5
    -----------

Rotated stemplot
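The plotting steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function name `stemplot` and its `leaf_unit` parameter are assumptions made for this example, and it handles non-negative integers only.

```python
def stemplot(values, leaf_unit=1):
    """Build stem-and-leaf rows for non-negative integers.

    leaf_unit is the place value of a leaf (1 = ones place), so the
    axis multiplier of the stems is 10 * leaf_unit.
    """
    rows = {}
    for v in sorted(values):
        # Digits below the leaf place are truncated, not rounded.
        stem, leaf = divmod(v // leaf_unit, 10)
        rows.setdefault(stem, []).append(str(leaf))
    # Emit every stem in the range, including stems with no leaves.
    return [f"{s}|{''.join(rows.get(s, []))}"
            for s in range(min(rows), max(rows) + 1)]


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for row in stemplot(ages):
    print(row)
```

Keeping empty stems in the output matters: dropping them would compress the axis and distort the shape of the distribution.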

Central location

Interpreting Distributions

• “Shape” refers to the distributional pattern

• Here’s the silhouette of our data

        X
        X
        X     X
    X X X X X X
    -----------
    0 1 2 3 4 5
    -----------

• Mound-shaped, symmetrical, no outliers

• Do not “over-interpret” plots when n is small

Consider this large data set of IQ scores

A density curve is superimposed on the graph

Kurtosis (steepness)

• Mesokurtic (medium)

• Platykurtic (flat) → skinny tails

• Leptokurtic (steep) → fat tails

Kurtosis is not easily judged by eye

Gravitational center ≡ arithmetic mean

“Eye-ball method”: visualize where the plot would balance on a see-saw, here around 30 (takes practice)

Arithmetic method = sum values and divide by n

sum = 290

n = 10

mean = 290 / 10 = 29

Gravitational Center (Mean)

        8
        7
        4     2
    5 1 1 0 2 0
    -----------
    0 1 2 3 4 5
    -----------
          ^
    Grav. center
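The arithmetic method is easy to check in Python. The sketch below also verifies the balance-point idea: deviations from the mean sum to zero, which is why the mean acts as the gravitational center. The variable names are chosen for this example only.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)   # 290
n = len(ages)       # 10
mean = total / n    # 29.0

# Balance-point check: deviations below the mean cancel those above it.
deviations = [x - mean for x in ages]
print(total, n, mean, sum(deviations))
```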

Central location: Median

• Ordered array:

05 11 21 24 27 28 30 42 50 52

• The median has depth (n + 1) ÷ 2

• n = 10, median’s depth = (10+1) ÷ 2 = 5.5

• → falls between 27 and 28

• When n is even, average the two adjacent values: Median = (27 + 28) ÷ 2 = 27.5
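The depth rule can be sketched as follows. The helper name `median_by_depth` is hypothetical; the logic simply takes the values at the floor and ceiling of the depth (n + 1) ÷ 2 and averages them, which covers both odd and even n.

```python
import math


def median_by_depth(values):
    """Return (depth, median) using the (n + 1) / 2 depth rule."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    # Depth is 1-based; subtract 1 for Python's 0-based indexing.
    lower = ordered[math.floor(depth) - 1]
    upper = ordered[math.ceil(depth) - 1]
    # For odd n, lower and upper are the same value.
    return depth, (lower + upper) / 2


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(median_by_depth(ages))
```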

Current data range is “5 to 52”

The range is the easiest but not the best way to describe spread (better methods described later)

• Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42

• Stem = ones-place

• Leaves = tenths-place

• Truncate extra digit (e.g., 1.47 → 1.4)

1|4

2|03

3|4779

4|4

(×1)

• Center: median between 3.4 & 3.7 (underlined)

• Spread: 1.4 to 4.4

• Shape: mound, no outliers
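A minimal sketch of the truncation step for this example, assuming the data are recorded to two decimal places (that assumption is what makes the scale-by-100 step safe). The function name `truncated_stemplot` is made up for illustration.

```python
def truncated_stemplot(values):
    """Stemplot with ones-place stems and tenths-place leaves."""
    rows = {}
    for v in sorted(values):
        # Work in whole hundredths to avoid float artifacts,
        # then drop the hundredths digit (truncate, do not round).
        tenths = int(round(v * 100)) // 10   # 1.47 -> 147 -> 14
        stem, leaf = divmod(tenths, 10)      # 14 -> stem 1, leaf 4
        rows.setdefault(stem, []).append(str(leaf))
    return [f"{s}|{''.join(rows.get(s, []))}"
            for s in range(min(rows), max(rows) + 1)]


data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]
for row in truncated_stemplot(data):
    print(row)
```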

Third Illustrative Example (n = 25)

• Data: 14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38

• Regular stemplot:

1|4789

2|223466789

3|000123445678

(×10)

• Too squished to see shape

• Split stem-values into two ranges, e.g., the first “1” holds leaves 0 to 4, and the second “1” holds leaves 5 to 9

• Split-stem

1|4

1|789

2|2234

2|66789

3|00012344

3|5678

(×10)

• Negative skew now evident

• Start with between 4 and 12 stem-values

• Then, use trial and error using different stem multipliers and splits → use plot that shows shape most clearly
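Splitting each stem in two is a small change to the basic procedure. The sketch below is illustrative only: the name `split_stemplot` is an assumption, it uses the 25 values that appear in the stemplots above, and it assumes a ×10 axis multiplier so that each leaf is a single ones-place digit.

```python
def split_stemplot(values, multiplier=10):
    """Stemplot in which each stem appears twice: leaves 0-4, then 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, multiplier)
        half = 0 if leaf <= 4 else 1          # 0 = low half, 1 = high half
        rows.setdefault((stem, half), []).append(str(leaf))
    first, last = min(rows), max(rows)
    # Keep every half-stem between the first and last occupied ones.
    keys = [(s, h)
            for s in range(first[0], last[0] + 1)
            for h in (0, 1)
            if first <= (s, h) <= last]
    return [f"{s}|{''.join(rows.get((s, h), []))}" for s, h in keys]


data = [14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29,
        30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
for row in split_stemplot(data):
    print(row)
```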

Fourth Example: n = 53 body weights

Data range from 100 to 260 lbs:

• ×100 axis multiplier → only two stem-values (1×100 and 2×100) → too few

• ×100 axis-multiplier w/ split stem → 4 stem-values → might be OK(?)

• ×10 axis-multiplier → 17 stem-values (10 through 26) → see the next stemplot

Fourth Stemplot Example (n = 53)

10|0166

11|009

12|0034578

13|00359

14|08

15|00257

16|555

17|000255

18|000055567

19|245

20|3

21|025

22|0

23|

24|

25|

26|0

(×10)

Shape: positive skew, high outlier (260)

Central location: L(M) = (53 + 1) / 2 = 27, so the median is the 27th ordered value = 165 (the first leaf on the 16 stem)

Spread: from 100 to 260

1*|0000111

1t|222222233333

1f|4455555

1s|666777777

1.|888888888999

2*|0111

2t|2

2f|

2s|6

(×100)

Codes for stem values:

• * for leaves zero and one

• t for leaves two and three

• f for leaves four and five

• s for leaves six and seven

• . for leaves eight and nine

For example, 120 is: 1t|2 (×100)
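The stem codes map directly onto pairs of leaf digits, so the labelling can be sketched with one integer division. The names `CODES` and `quintuple_label` are hypothetical, and the sketch assumes a ×100 axis multiplier with the tens digit as the leaf, as in the body-weight example.

```python
CODES = "*tfs."   # index 0..4 covers leaf pairs (0,1), (2,3), (4,5), (6,7), (8,9)


def quintuple_label(value, multiplier=100):
    """Return the coded stem and leaf for one value, e.g. 120 -> '1t|2'."""
    stem, rest = divmod(value, multiplier)
    leaf = rest // (multiplier // 10)   # tens digit when multiplier is 100
    return f"{stem}{CODES[leaf // 2]}|{leaf}"


for weight in (100, 120, 165, 260):
    print(weight, quintuple_label(weight))
```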

SPSS Stemplot, n = 654

Frequency counts

    Frequency    Stem &  Leaf

         2.00        3 .  0
         9.00        4 .  0000
        28.00        5 .  00000000000000
        37.00        6 .  000000000000000000
        54.00        7 .  000000000000000000000000000
        85.00        8 .  000000000000000000000000000000000000000000
        94.00        9 .  00000000000000000000000000000000000000000000000
        81.00       10 .  0000000000000000000000000000000000000000
        90.00       11 .  000000000000000000000000000000000000000000000
        57.00       12 .  0000000000000000000000000000
        43.00       13 .  000000000000000000000
        25.00       14 .  000000000000
        19.00       15 .  000000000
        13.00       16 .  000000
         8.00       17 .  0000
         9.00 Extremes    (>=18)

    Stem width: 1
    Each leaf:  2 case(s)

3 . 0 means 3.0 years

Because n is large, each leaf represents 2 observations
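The relationship between the frequency column and the number of leaves can be sketched as below. This only imitates the pattern visible in the output above (odd counts lose their leftover case, e.g. 9 cases give 4 leaves); it is not SPSS's own code, and the function name `leaves_for` is made up for illustration.

```python
def leaves_for(frequency, cases_per_leaf=2):
    """Leaves printed for one stem when each leaf stands for several cases."""
    # Ages are whole years with stem width 1, so every leaf digit is 0.
    return "0" * (frequency // cases_per_leaf)


for stem, freq in [(3, 2), (4, 9), (9, 94)]:
    print(f"{freq:.2f} {stem:>2} . {leaves_for(freq)}")
```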

    AGE   |  Freq  Rel.Freq  Cum.Freq.
    ------+---------------------------
      3   |     2     0.3%      0.3%
      4   |     9     1.4%      1.7%
      5   |    28     4.3%      6.0%
      6   |    37     5.7%     11.6%
      7   |    54     8.3%     19.9%
      8   |    85    13.0%     32.9%
      9   |    94    14.4%     47.2%
     10   |    81    12.4%     59.6%
     11   |    90    13.8%     73.4%
     12   |    57     8.7%     82.1%
     13   |    43     6.6%     88.7%
     14   |    25     3.8%     92.5%
     15   |    19     2.9%     95.4%
     16   |    13     2.0%     97.4%
     17   |     8     1.2%     98.6%
     18   |     6     0.9%     99.5%
     19   |     3     0.5%    100.0%
    ------+---------------------------
    Total |   654   100.0%

• Frequency ≡ count

• Relative frequency ≡ proportion

• Cumulative [relative] frequency ≡ proportion less than or equal to current value
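These three definitions can be checked against the AGE table with a short sketch. The counts are copied from that table; the function name `frequency_table` and the row layout are assumptions for this example.

```python
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}


def frequency_table(counts):
    """Return rows of (value, frequency, relative freq, cumulative relative freq)."""
    n = sum(counts.values())
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]              # count of values <= current value
        rows.append((value, counts[value], counts[value] / n, running / n))
    return rows


for value, freq, rel, cum in frequency_table(counts):
    print(f"{value:>3} | {freq:>4} {rel:>7.1%} {cum:>7.1%}")
```

Accumulating the counts first and dividing once, rather than summing rounded percentages, avoids compounding rounding error in the cumulative column.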

• When data are sparse, group data into class intervals

• Class intervals can be uniform or non-uniform

• Use end-point convention, so data points fall into unique intervals: include lower boundary, exclude upper boundary


Data: 05 11 21 24 27 28 30 42 50 52
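The end-point convention (include the lower boundary, exclude the upper) can be sketched for these ages. The class width of 10 years is an assumption chosen for illustration, since the original class-interval table is not reproduced here, and `class_counts` is a hypothetical helper.

```python
def class_counts(values, width, start=0):
    """Count values in uniform classes [lower, lower + width)."""
    counts = {}
    for v in values:
        # Floor division sends a boundary value such as 30 to the class
        # that starts at 30, not to the class that ends there.
        lower = start + ((v - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for lower, freq in class_counts(ages, width=10).items():
    print(f"{lower}-{lower + 9}: {freq}")
```

Note that this sketch omits classes with no observations; a full frequency table would list them with a count of zero.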

For a quantitative measurement only.

Bars touch.

For categorical and ordinal measurements, and for continuous data in non-uniform class intervals: bars do not touch.