
# STA 291 Fall 2009

Lecture 4

Dustin Lueker

Bar Graph (Nominal/Ordinal Data)

• Histogram: for interval (quantitative) data

• Bar graph is almost the same, but for qualitative data

• Difference:

• The bars are usually separated to emphasize that the variable is categorical rather than quantitative

• For nominal variables (no natural ordering), order the bars by frequency, except possibly for a category “other” that is always last
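
The ordering rule for nominal bars can be sketched in a few lines of Python. This is only an illustration: the function name `order_bars`, the label `"other"`, and the sample data are assumptions, not part of the lecture.

```python
from collections import Counter

def order_bars(values, other_label="other"):
    """Return (category, count) pairs sorted by descending frequency,
    with the catch-all category moved to the end."""
    counts = Counter(values)
    other = counts.pop(other_label, None)
    # sorted() is stable, so tied categories keep their first-seen order
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if other is not None:
        ordered.append((other_label, other))
    return ordered

# Hypothetical nominal data (not from the lecture)
majors = ["biology", "other", "math", "biology", "other",
          "other", "math", "biology", "art"]

# Print a crude text bar graph, one separated bar per category
for category, count in order_bars(majors):
    print(f"{category:<8} {'#' * count}")
```

Note that "other" is printed last even though it ties for the highest count.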

STA 291 Fall 2009 Lecture 4

Pie Chart (Nominal/Ordinal Data)

• First Step

• Create a frequency distribution

We could display this data in a bar chart…

• Bar graph

• If the data is ordinal, classes are presented in the natural ordering

• Pie is divided into slices

• Area of each slice is proportional to the frequency of each class
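
As a rough sketch of the two steps above, the frequency distribution can be computed first and each slice then sized in proportion to its class frequency. The function name `pie_slices` and the sample grades are hypothetical.

```python
from collections import Counter

def pie_slices(values):
    """Map each class to (frequency, relative frequency, slice angle in degrees)."""
    counts = Counter(values)
    n = sum(counts.values())
    # A full circle is 360 degrees, so a class with share f/n gets 360 * f/n
    return {cls: (f, f / n, 360 * f / n) for cls, f in counts.items()}

# Hypothetical ordinal data (not from the lecture)
grades = ["A", "B", "B", "C", "A", "B", "C", "B"]
slices = pie_slices(grades)

# Ordinal classes are listed in their natural order
for cls in sorted(slices):
    f, rel, angle = slices[cls]
    print(f"{cls}: frequency={f}, relative={rel:.3f}, angle={angle:.1f} degrees")
```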

• Write the observations ordered from smallest to largest

• Looks like a histogram sideways

• Contains more information than a histogram, because every single observation can be recovered

• Each observation represented by a stem and leaf

• Stem = leading digit(s)

• Leaf = final digit
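
A minimal sketch of the construction, assuming non-negative whole-number observations so that the final digit is the leaf and everything before it is the stem. The function name and the data are illustrative only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative integers,
    using the final digit as the leaf and the leading digit(s) as the stem."""
    plot = defaultdict(list)
    for v in sorted(values):          # ordering first keeps each row's leaves sorted
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical observations (not from the lecture)
data = [23, 31, 35, 47, 21, 38, 30, 52, 36, 29]
plot = stem_and_leaf(data)

# Print every stem in the range, including stems with no leaves
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(d) for d in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Because each leaf is stored, every original observation can be rebuilt as `10 * stem + leaf`.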

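| Stem | Leaf | # |
|---|---|---|
| 20 | 3 | 1 |
| 19 | | |
| 18 | | |
| 17 | | |
| 16 | | |
| 15 | | |
| 14 | | |
| 13 | 135 | 3 |
| 12 | 7 | 1 |
| 11 | 334469 | 6 |
| 10 | 2234 | 4 |
| 9 | 08 | 2 |
| 8 | 03469 | 5 |
| 7 | 5 | 1 |
| 6 | 034689 | 6 |
| 5 | 0238 | 4 |
| 4 | 46 | 2 |
| 3 | 0144468999 | 10 |
| 2 | 039 | 3 |
| 1 | 67 | 2 |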

• Useful for small data sets

• Fewer than 100 observations

• Practical problem

• What if the variable is measured on a continuous scale, with measurements like 1267.298, 1987.208, 2098.089, 1199.082, etc.?

• Use common sense when choosing “stem” and “leaf”

• Can also be used to compare groups

• Back-to-Back Stem and Leaf Plots, using the same stems for both groups.

• Murder Rate Data from U.S. and Canada

• Note: it doesn’t really matter whether the smallest stem is at top or bottom of the table
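
One possible "common sense" choice for the continuous measurements quoted above is to round each value to a convenient unit before splitting it. The sketch below assumes rounding to the nearest 10, so the leaf is the tens digit and the stem counts hundreds. Both the leaf unit and the function name are assumptions for illustration, not a rule from the lecture.

```python
def stem_and_leaf_rounded(values, leaf_unit=10):
    """Round each value to the nearest leaf_unit, then split it into
    stem (leading digits) and leaf (the single digit in the leaf_unit place)."""
    plot = {}
    for x in sorted(values):
        scaled = round(x / leaf_unit)      # e.g. 1267.298 -> 127 (tens)
        stem, leaf = divmod(scaled, 10)    # 127 -> stem 12, leaf 7
        plot.setdefault(stem, []).append(leaf)
    return plot

# The four measurements quoted on the slide
measurements = [1267.298, 1987.208, 2098.089, 1199.082]
plot = stem_and_leaf_rounded(measurements)

for stem in sorted(plot):
    print(f"{stem:>3} | {''.join(str(d) for d in plot[stem])}   (leaf unit = 10)")
```

Unlike the whole-number case, rounding discards detail, so the original observations can no longer be recovered exactly from the plot.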

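Stem and Leaf Plot

| PRESIDENT | AGE | PRESIDENT | AGE | PRESIDENT | AGE |
|---|---|---|---|---|---|
| Washington | 67 | Fillmore | 74 | Roosevelt | 60 |
| Adams | 90 | Pierce | 64 | Taft | 72 |
| Jefferson | 83 | Buchanan | 77 | Wilson | 67 |
| Madison | 85 | Lincoln | 56 | Harding | 57 |
| Monroe | 73 | Johnson | 66 | Coolidge | 60 |
| Adams | 80 | Grant | 63 | Hoover | 90 |
| Jackson | 78 | Hayes | 70 | Roosevelt | 63 |
| Van Buren | 79 | Garfield | 49 | Truman | 88 |
| Harrison | 68 | Arthur | 56 | Eisenhower | 78 |
| Tyler | 71 | Cleveland | 71 | Kennedy | 46 |
| Polk | 53 | Harrison | 67 | Johnson | 64 |
| Taylor | 65 | McKinley | 58 | Nixon | 81 |
| | | | | Ford | 93 |
| | | | | Reagan | 93 |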

Summary of Graphical and Tabular Techniques

• Discrete data

• Frequency distribution

• Continuous data

• Grouped frequency distribution

• Small data sets

• Stem and leaf plot

• Interval data

• Histogram

• Categorical data

• Bar chart

• Pie chart

• Grouping intervals should be of same length, but may be dictated more by subject-matter considerations
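
A grouped frequency distribution with classes of equal length might be built as follows. The class boundaries, the half-open interval convention `[lower, upper)`, and the sample heights are all assumptions made for this sketch.

```python
def grouped_frequencies(values, start, width, n_classes):
    """Count observations in n_classes equal-width intervals
    [lower, lower + width), beginning at start."""
    counts = [0] * n_classes
    for x in values:
        k = int((x - start) // width)      # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[k] += 1
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(n_classes)]

# Hypothetical continuous measurements (not from the lecture)
heights = [61.5, 64.2, 66.0, 67.3, 68.1, 68.9, 70.0, 71.4, 72.8, 74.5]
table = grouped_frequencies(heights, start=60, width=5, n_classes=3)

for lower, upper, count in table:
    print(f"[{lower}, {upper}): {count}")
```

A value that sits exactly on a boundary, such as 70.0, is counted in the class that starts there.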

• Present large data sets concisely and coherently

• Can replace a thousand words and still be clearly understood

• Encourage the viewer to compare two or more variables

• Do not replace substance by form

• Do not distort what the data reveal

• Don’t have a scale on the axis

• Have a misleading caption

• Distort by using absolute values where relative/proportional values are more appropriate

• Distort by stretching/shrinking the vertical or horizontal axis

• Use bar charts with bars of unequal width

• Frequency distributions and histograms exist for the population as well as for the sample

• Population distribution vs. sample distribution

• As the sample size increases, the sample distribution looks more and more like the population distribution
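
A small simulation can illustrate this convergence. The sketch assumes a simple known population, the faces of a fair die with probability 1/6 each, and measures how far the sample's relative frequencies are from those population proportions. The seed, sample sizes, and function names are arbitrary choices for illustration.

```python
import random
from collections import Counter

def sample_distribution(n, rng):
    """Relative frequencies of the faces 1..6 in n rolls of a fair die."""
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    return {face: counts[face] / n for face in range(1, 7)}

def max_gap(dist):
    """Largest absolute gap between a sample proportion and the
    population proportion 1/6."""
    return max(abs(p - 1 / 6) for p in dist.values())

rng = random.Random(291)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, round(max_gap(sample_distribution(n, rng)), 4))
```

The printed gap typically shrinks as `n` grows, although any single small sample can land close to the population distribution by chance.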

• The population distribution for a continuous variable is usually represented by a smooth curve

• Like a histogram that gets finer and finer

• Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate

• Symmetric distributions

• Bell-shaped

• U-shaped

• Uniform

• Not symmetric distributions:

• Left-skewed

• Right-skewed

• Skewed

Symmetric

Right-skewed

Left-skewed

• Center of the data

• Mean

• Median

• Mode

• Dispersion of the data

• Sometimes referred to as spread

• Variance, Standard deviation

• Interquartile range

• Range
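
The dispersion measures listed above can be computed with Python's standard `statistics` module. This sketch rests on two assumptions: `statistics.variance` and `statistics.stdev` use the sample convention (dividing by n - 1), and `statistics.quantiles` uses its default "exclusive" method, which is only one of several quartile conventions. The data are hypothetical.

```python
import statistics

# Hypothetical observations (not from the lecture)
data = [2, 4, 4, 5, 7, 9, 11]

variance = statistics.variance(data)          # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)              # square root of the sample variance
q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles, default "exclusive" method
iqr = q3 - q1                                 # interquartile range
data_range = max(data) - min(data)            # largest minus smallest observation

print(f"variance={variance}, sd={std_dev:.3f}, IQR={iqr}, range={data_range}")
```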

• Mean

• Arithmetic average

• Median

• Midpoint of the observations when they are arranged in order

• Smallest to largest

• Mode

• Most frequently occurring value
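
The three definitions translate almost directly into code. This is a sketch from first principles. For an even number of observations it averages the two middle values, a common convention that the slide does not spell out, and it returns every tied mode. The data are hypothetical.

```python
from collections import Counter

def mean(values):
    """Arithmetic average."""
    return sum(values) / len(values)

def median(values):
    """Midpoint of the observations once arranged from smallest to largest."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle observations (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values that occur most frequently."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

# Hypothetical observations (not from the lecture)
data = [3, 5, 5, 6, 8, 9]
print(mean(data), median(data), modes(data))
```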

• Sample size n

• Observations $x_1, x_2, \ldots, x_n$

• Sample Mean “x-bar”: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• Population size N

• Observations $x_1, x_2, \ldots, x_N$

• Population Mean “mu”: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

• Note: This is for a finite population of size N

• Requires numerical values

• Only appropriate for quantitative data

• Does not make sense to compute the mean for nominal variables

• Can be calculated for ordinal variables, but this does not always make sense

• Should be careful when using the mean on ordinal variables

• Example “Weather” (on an ordinal scale): Sun=1, Partly Cloudy=2, Cloudy=3, Rain=4, Thunderstorm=5

• Mean (average) weather = 2.8

• Another example: “GPA = 3.8” is also a mean of observations measured on an ordinal scale
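
The caution about ordinal variables can be made concrete. In the sketch below, the five observed days are hypothetical and were chosen so that the slide's coding reproduces a mean of 2.8. The second coding is also hypothetical: it keeps the same ordering but spaces the categories differently, and the mean changes even though the data do not.

```python
def coded_mean(observations, codes):
    """Mean of ordinal observations after mapping each category to a number."""
    return sum(codes[o] for o in observations) / len(observations)

# Coding from the slide
slide_codes = {"Sun": 1, "Partly Cloudy": 2, "Cloudy": 3,
               "Rain": 4, "Thunderstorm": 5}

# Hypothetical alternative coding: same order, different spacing
alt_codes = {"Sun": 1, "Partly Cloudy": 2, "Cloudy": 3,
             "Rain": 4, "Thunderstorm": 10}

# Hypothetical observations (not from the lecture)
days = ["Sun", "Partly Cloudy", "Cloudy", "Cloudy", "Thunderstorm"]

print(coded_mean(days, slide_codes))
print(coded_mean(days, alt_codes))
```

Since an ordinal scale fixes only the order of the categories, neither coding is more correct than the other, which is why the resulting mean should be read with care.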

• Center of gravity for the data set

• Sum of the deviations of the values above the mean is equal to the sum of the deviations of the values below the mean (in absolute value)
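
The center-of-gravity idea can be checked numerically: the distances from the mean to the values above it balance the distances to the values below it. The sketch uses the data set {7, 12, 11, 18} from the lecture's worked example. The variable names are illustrative.

```python
data = [7, 12, 11, 18]
mean = sum(data) / len(data)

# Total distance from the mean for values above it and for values below it
above = sum(x - mean for x in data if x > mean)
below = sum(mean - x for x in data if x < mean)

print(f"mean={mean}, distance above={above}, distance below={below}")
```

Equivalently, the signed deviations `x - mean` always add up to zero.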

• Mean

• Sum of observations divided by the number of observations

• Example

• {7, 12, 11, 18}

• Mean = (7 + 12 + 11 + 18) / 4 = 48 / 4 = 12

• Highly influenced by outliers

• Data points that are far from the rest of the data

• Not representative of a typical observation if the distribution of the data is highly skewed

• Example

• Monthly income for five people

1,000 2,000 3,000 4,000 100,000

• Average monthly income = (1,000 + 2,000 + 3,000 + 4,000 + 100,000) / 5 = 110,000 / 5 = 22,000

• Not representative of a typical observation
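
The effect of the outlier is easy to see by comparing the mean with the median for the five incomes above. This is a sketch using the standard `statistics` module. The comparison with the outlier removed is an added illustration, not part of the slide.

```python
import statistics

# Monthly incomes from the example
incomes = [1_000, 2_000, 3_000, 4_000, 100_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Added illustration: the mean once the single extreme value is dropped
mean_without_outlier = statistics.mean(incomes[:-1])

print(f"mean={mean_income}, median={median_income}, "
      f"mean without outlier={mean_without_outlier}")
```

Four of the five people earn far less than the mean, while the median matches the middle earner.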
