
Chapter 3: Frequency Distributions - PowerPoint PPT Presentation





PowerPoint Slideshow about 'Chapter 3: Frequency Distributions' - iolana


Presentation Transcript
February 12

Chapter 3: Frequency Distributions



In Chapter 3:

3.1 Stemplots

3.2 Frequency Tables

3.3 Additional Frequency Charts


Stemplots

Start by exploring the data with Exploratory Data Analysis (EDA)

A popular univariate EDA technique is the stem-and-leaf plot

The stem of the stemplot is a number line (axis)

Each leaf represents a data point

You can observe a lot by looking – Yogi Berra


Stemplot: Illustration

  • 10 ages (data sequenced as an ordered array)

    05 11 21 24 27 28 30 42 50 52

  • Draw the stem to cover the range 5 to 52:

    0|
    1|
    2|
    3|
    4|
    5|
    ×10  (axis multiplier)

  • Divide each data point into a stem-value (in this example, the tens place) and a leaf-value (in this example, the ones place)

  • Place leaves next to their stem value

  • Example of a leaf: 21 is plotted as leaf 1 on stem 2

    2|1


Stemplot illustration continued …

  • Plot all data points in rank order:

    0|5
    1|1
    2|1478
    3|0
    4|2
    5|02
    ×10

  • Here is the plot horizontally

          8
          7
          4        2
    5  1  1  0  2  0
    ----------------
    0  1  2  3  4  5
    ----------------
    Rotated stemplot
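The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `stemplot` is hypothetical, and it assumes non-negative integers with the tens place as the stem and the ones place as the leaf (the ×10 axis multiplier).

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):              # plot leaves in rank order
        stem, leaf = divmod(v, 10)        # split into tens place and ones place
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    # Walk the whole axis so that empty stems still get a row.
    return [f"{stem}|{''.join(leaves[stem])}" for stem in range(lo, hi + 1)]

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
rows = stemplot(ages)
print("\n".join(rows))
```

Run on the ten ages, this reproduces the rows 0|5, 1|1, 2|1478, 3|0, 4|2 and 5|02.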


Interpreting Distributions

Shape

Central location

Spread


Shape

  • “Shape” refers to the distributional pattern

  • Here’s the silhouette of our data

          X
          X
          X        X
    X  X  X  X  X  X
    ----------------
    0  1  2  3  4  5
    ----------------

  • Mound-shaped, symmetrical, no outliers

  • Do not “over-interpret” plots when n is small


Shape (cont.)

Consider this large data set of IQ scores

A density curve is superimposed on the graph





Kurtosis (steepness)

Mesokurtic (medium)

Platykurtic (flat) → skinny tails

Leptokurtic (steep) → fat tails

Kurtosis is not easily judged by eye


Gravitational Center (Mean)

Gravitational center ≡ arithmetic mean

“Eye-ball method”: visualize where the plot would balance on a see-saw → around 30 (takes practice)

Arithmetic method = sum the values and divide by n

sum = 290

n = 10

mean = 290 / 10 = 29

          8
          7
          4        2
    5  1  1  0  2  0
    ----------------
    0  1  2  3  4  5
    ----------------
             ^ Grav. center
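As a quick check of the arithmetic method, here is a small Python sketch using the ten ages from the stemplot illustration. The deviation check is an added illustration of why the mean acts as a balance point: the distances below the mean exactly offset the distances above it.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)      # sum of the values
n = len(ages)          # number of observations
mean = total / n       # arithmetic mean

# Deviations from the mean cancel out, so the plot "balances" at the mean.
deviations = [x - mean for x in ages]
balance = sum(deviations)

print(total, n, mean, balance)
```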


Central location: Median

  • Ordered array:

    05 11 21 24 27 28 30 42 50 52

  • The median has depth (n + 1) ÷ 2

  • n = 10, median’s depth = (10+1) ÷ 2 = 5.5

  • → falls between 27 and 28

  • When n is even, average the two adjacent values → Median = 27.5
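The depth rule can be written out directly. The following is a minimal sketch (the function name `median_by_depth` is hypothetical): it finds the depth (n + 1) ÷ 2 and averages the values on either side when the depth ends in .5.

```python
import math

def median_by_depth(values):
    """Locate the median by its depth (n + 1) / 2 in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2                        # 1-based position; ends in .5 when n is even
    lower = ordered[math.floor(depth) - 1]     # value at or just below the depth
    upper = ordered[math.ceil(depth) - 1]      # value at or just above the depth
    return depth, (lower + upper) / 2

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
depth, median = median_by_depth(ages)
print(depth, median)
```

When n is odd the floor and ceiling coincide, so the function simply returns the middle value.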


Spread: Range

For now, report the range (minimum and maximum values)

Current data range is “5 to 52”

The range is the easiest but not the best way to describe spread (better methods described later)


Stemplot – Second Example

  • Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42

  • Stem = ones-place

  • Leaves = tenths-place

  • Truncate extra digit (e.g., 1.47 → 1.4)

    1|4
    2|03
    3|4779
    4|4
    (×1)

  • Center: median between 3.4 and 3.7 (the first two leaves on stem 3)

  • Spread: 1.4 to 4.4

  • Shape: mound, no outliers
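Truncating rather than rounding is easy to get wrong with binary floating point, so the sketch below uses Python's `decimal` module. It is an illustration under stated assumptions (stem = ones place, leaf = tenths place, no empty stems in this data set), not the slide author's procedure.

```python
from decimal import Decimal, ROUND_DOWN

data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]

rows = {}
for v in sorted(data):
    # Truncate (do not round) to one decimal place, e.g. 1.47 -> 1.4
    t = Decimal(str(v)).quantize(Decimal("0.1"), rounding=ROUND_DOWN)
    stem, leaf = divmod(int(t * 10), 10)   # ones place is the stem, tenths the leaf
    rows.setdefault(stem, []).append(str(leaf))

# This data set has no empty stems, so listing the stems that occur is enough.
plot = [f"{stem}|{''.join(leaves)}" for stem, leaves in sorted(rows.items())]
print("\n".join(plot))
```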


Third Illustrative Example (n = 25)

  • Data: 14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38

  • Regular stemplot:

    1|4789
    2|223466789
    3|000123445678
    ×10

  • Too squished to see shape


Third Illustration; Split Stem

  • Split stem-values into two ranges, e.g., the first “1” holds leaves from 0 to 4, and the second “1” holds leaves from 5 to 9

  • Split-stem

    1|4
    1|789
    2|2234
    2|66789
    3|00012344
    3|5678
    ×10

  • Negative skew now evident
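The splitting rule can be sketched as follows. The function name `split_stemplot` is hypothetical, and the 25 values are read back from the leaves of the regular stemplot for this example.

```python
def split_stemplot(values):
    """Stemplot with each tens-stem split in two: leaves 0-4, then leaves 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1           # 0 = lower half, 1 = upper half
        rows.setdefault((stem, half), []).append(str(leaf))
    lo, hi = min(rows), max(rows)
    out = []
    for stem in range(lo[0], hi[0] + 1):
        for half in (0, 1):
            if lo <= (stem, half) <= hi:       # keep empty rows inside the data range
                out.append(f"{stem}|{''.join(rows.get((stem, half), []))}")
    return out

data = [14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29,
        30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
plot = split_stemplot(data)
print("\n".join(plot))
```

Each row now covers five possible leaf values instead of ten, which stretches the plot enough to show the longer tail toward the low values.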


How many stem-values?

  • Start with between 4 and 12 stem-values

  • Then, use trial and error using different stem multipliers and splits → use plot that shows shape most clearly


Fourth Example: n = 53 body weights

Data range from 100 to 260 lbs:



  • ×100 axis multiplier → only two stem-values (1×100 and 2×100) → too few

  • ×100 axis multiplier with split stem → 4 stem-values → might be OK(?)

  • ×10 axis multiplier → 17 stem-values (next slide)
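The trial-and-error step can be partly automated by counting how many stem rows a given multiplier and split would need. This is a rough sketch; `stem_count` is a hypothetical helper, and it assumes the multiplier divides evenly by the number of splits.

```python
def stem_count(lo, hi, multiplier, splits=1):
    """Count the stem rows needed to cover the values lo..hi.

    multiplier: axis multiplier (e.g. 10 or 100)
    splits:     rows per stem value (1 = no split, 2 = split, 5 = quintuple split)
    """
    width = multiplier // splits       # span of values covered by one row
    first = lo // width                # index of the row holding the minimum
    last = hi // width                 # index of the row holding the maximum
    return last - first + 1

# Body weights ranging from 100 to 260 lbs
for multiplier, splits in [(100, 1), (100, 2), (100, 5), (10, 1)]:
    print(multiplier, splits, stem_count(100, 260, multiplier, splits))
```

Candidates that fall roughly between 4 and 12 rows are the ones worth plotting first.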


Fourth Stemplot Example (n = 53)

10|0166
11|009
12|0034578
13|00359
14|08
15|00257
16|555
17|000255
18|000055567
19|245
20|3
21|025
22|0
23|
24|
25|
26|0
(×10)

Shape: Positive skew, high outlier (260)

Central location: L(M) = (53 + 1) / 2 = 27 → Median = 165 (the first leaf on stem 16)

Spread: from 100 to 260


Quintuple-Split Stem Values

1*|0000111
1t|222222233333
1f|4455555
1s|666777777
1.|888888888999
2*|0111
2t|2
2f|
2s|6
(×100)

Codes for stem values:

* for leaves zero and one
t for leaves two and three
f for leaves four and five
s for leaves six and seven
. for leaves eight and nine

For example, 120 is: 1t|2 (×100)
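The stem codes map onto pairs of leaf digits, so the row label for any value can be looked up by integer division. A minimal sketch follows; the names `CODES` and `quintuple_row` are hypothetical, and the ones digit is assumed to be truncated as in the plot above.

```python
CODES = "*tfs."   # leaves 0-1, 2-3, 4-5, 6-7, 8-9

def quintuple_row(value):
    """Return (row label, leaf) for a ×100 quintuple-split stemplot."""
    stem, rest = divmod(value, 100)    # hundreds place is the stem
    leaf = rest // 10                  # tens digit is the leaf; ones digit is dropped
    return f"{stem}{CODES[leaf // 2]}", leaf

print(quintuple_row(120))   # ('1t', 2)
print(quintuple_row(260))   # ('2s', 6)
```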


SPSS Stemplot, n = 654

Frequency counts

Frequency    Stem &  Leaf

     2.00        3 .  0
     9.00        4 .  0000
    28.00        5 .  00000000000000
    37.00        6 .  000000000000000000
    54.00        7 .  000000000000000000000000000
    85.00        8 .  000000000000000000000000000000000000000000
    94.00        9 .  00000000000000000000000000000000000000000000000
    81.00       10 .  0000000000000000000000000000000000000000
    90.00       11 .  000000000000000000000000000000000000000000000
    57.00       12 .  0000000000000000000000000000
    43.00       13 .  000000000000000000000
    25.00       14 .  000000000000
    19.00       15 .  000000000
    13.00       16 .  000000
     8.00       17 .  0000
     9.00 Extremes    (>=18)

Stem width:  1
Each leaf:   2 case(s)

3 . 0 means 3.0 years

Because n is large, each leaf represents 2 observations


Frequency Table

AGE   |  Freq  Rel.Freq  Cum.Freq.
------+---------------------------
 3    |     2    0.3%     0.3%
 4    |     9    1.4%     1.7%
 5    |    28    4.3%     6.0%
 6    |    37    5.7%    11.6%
 7    |    54    8.3%    19.9%
 8    |    85   13.0%    32.9%
 9    |    94   14.4%    47.2%
10    |    81   12.4%    59.6%
11    |    90   13.8%    73.4%
12    |    57    8.7%    82.1%
13    |    43    6.6%    88.7%
14    |    25    3.8%    92.5%
15    |    19    2.9%    95.4%
16    |    13    2.0%    97.4%
17    |     8    1.2%    98.6%
18    |     6    0.9%    99.5%
19    |     3    0.5%   100.0%
------+---------------------------
Total |   654  100.0%

  • Frequency ≡ count

  • Relative frequency ≡ proportion

  • Cumulative [relative] frequency ≡ proportion less than or equal to current value
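The three columns follow mechanically from the counts. Here is a short Python sketch that starts from the frequencies in the AGE table and derives the relative and cumulative relative frequencies; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequency (count) of each age, taken from the table
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

total = sum(counts.values())
running = list(accumulate(counts.values()))   # cumulative counts

table = []
for (age, freq), cum in zip(counts.items(), running):
    rel = 100 * freq / total          # relative frequency, in percent
    cum_rel = 100 * cum / total       # cumulative relative frequency, in percent
    table.append((age, freq, rel, cum_rel))

for age, freq, rel, cum_rel in table:
    print(f"{age:>2}    | {freq:>5} {rel:>6.1f}% {cum_rel:>7.1f}%")
```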


Class Intervals

  • When data are sparse, group data into class intervals

  • Class intervals can be uniform or non-uniform

  • Use end-point convention, so data points fall into unique intervals: include lower boundary, exclude upper boundary

  • (next slide)


Class Intervals Freq Table

Data: 05 11 21 24 27 28 30 42 50 52
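The table from the original slide did not survive in this transcript. The sketch below shows how such a table could be built for these ten ages. The 10-year class width is an assumption made for illustration, and `class_interval_table` is a hypothetical helper; the end-point convention (include lower boundary, exclude upper boundary) is the one stated above.

```python
def class_interval_table(values, width):
    """Frequency table over uniform class intervals [lower, lower + width)."""
    start = (min(values) // width) * width
    stop = (max(values) // width + 1) * width
    n = len(values)
    rows, cum = [], 0
    for lower in range(start, stop, width):
        upper = lower + width
        # Include the lower boundary, exclude the upper boundary.
        freq = sum(lower <= v < upper for v in values)
        cum += freq
        rows.append((lower, upper, freq, freq / n, cum / n))
    return rows

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
rows = class_interval_table(ages, width=10)   # width of 10 is an assumed choice

for lower, upper, freq, rel, cum_rel in rows:
    print(f"{lower:>2} to <{upper:<3} | {freq:>2} {rel:>6.1%} {cum_rel:>7.1%}")
```

Note how the ages 30 and 50 sit exactly on a boundary: under this convention they belong to the intervals that start at 30 and 50, not the ones that end there.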


Histogram

For a quantitative measurement only.

Bars touch.


Bar Chart

For categorical and ordinal measurements, and for continuous data in non-uniform class intervals → bars do not touch.