PowerPoint Slideshow about 'Sections 2.1 and 2.2 Displaying Qualitative and Quantitative Data' - andrew


Lecture Unit 2: Graphical and Numerical Summaries of Data

UNIT OBJECTIVES

At the conclusion of this unit you should be able to:

  • 1) Construct graphs that appropriately describe data

  • 2) Calculate and interpret numerical summaries of a data set.

  • 3) Combine numerical methods with graphical methods to analyze a data set.

  • 4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.

  • 5) Apply software and/or calculators to automate graphical and numerical summary procedures.



Displaying Qualitative Data

Section 2.1

“Sometimes you can see a lot just by looking.”

Yogi Berra

Hall of Fame Catcher, NY Yankees


The three rules of data analysis won’t be difficult to remember

  • 1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.

  • 2. Make a picture: it shows important features of and patterns in the data. You may also see things that you did not expect: extraordinary (possibly wrong) data values or unexpected patterns.

  • 3. Make a picture: the best way to tell others about your data is with a well-chosen picture.


Bar Charts: show counts or relative frequency for each category

  • Example: Titanic passenger/crew distribution


Pie Charts: show proportions of the whole in each category

  • Example: Titanic passenger/crew distribution


Example: Top 10 causes of death in the United States, 2001

For each individual who died in the United States in 2001, we record what was the cause of death. The table above is a summary of that information.


Top 10 causes of death: bar graph

Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category.

[Figure: Top 10 causes of deaths in the United States 2001]

The number of individuals who died of an accident in 2001 is approximately 100,000.


Top 10 causes of deaths in the United States 2001

  • Bar graph sorted by rank: easy to analyze

  • Sorted alphabetically: much less useful


Top 10 causes of death: pie chart

Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

[Figure: Percent of people dying from top 10 causes of death in the United States in 2001]


Make sure your labels match the data.

Make sure all percents add up to 100.

[Figure panels: Percent of deaths from top 10 causes; Percent of deaths from all causes]


Child poverty before and after government intervention—UNICEF, 1996

  • What does this chart tell you?

    • The United States has the highest rate of child poverty among developed nations (22% of under 18).

    • Its government does the least—through taxes and subsidies—to remedy the problem (size of orange bars and percent difference between orange/blue bars).

  • Could you transform this bar graph to fit in 1 pie chart? In two pie charts? Why?

The poverty line is defined as 50% of national median income.


Unnecessary dimension in a pie chart


Contingency Tables: Categories for Two Variables

  • Example: Survival and class on the Titanic

Marginal distributions

Marginal distribution of survival

    Survived           710/2201    32.3%
    Did not survive   1491/2201    67.7%

Marginal distribution of class

    Crew     885/2201    40.2%
    First    325/2201    14.8%
    Second   285/2201    12.9%
    Third    706/2201    32.1%
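A marginal distribution is each category's count divided by the grand total. The sketch below recomputes the percentages from the counts on this slide. The class labels attached to the counts are an assumption taken from the usual Titanic table, since the transcript lists only the numbers, and the function name is illustrative.

```python
# Counts read from the slide. The class labels are an assumption
# (the usual Titanic table); the transcript lists only the numbers.
TOTAL = 2201
survival_counts = {"survived": 710, "did not survive": 1491}
class_counts = {"crew": 885, "first": 325, "second": 285, "third": 706}


def marginal_distribution(counts, total):
    """Return each category's share of the grand total, in percent."""
    return {category: 100 * n / total for category, n in counts.items()}


survival_pct = marginal_distribution(survival_counts, TOTAL)
class_pct = marginal_distribution(class_counts, TOTAL)

for name, dist in (("survival", survival_pct), ("class", class_pct)):
    print(name)
    for category, pct in dist.items():
        print(f"  {category}: {pct:.1f}%")
```

Each set of counts sums to 2201, so each marginal distribution sums to 100%.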


Marginal distribution of class: Bar chart


Marginal distribution of class: Pie chart


Contingency Tables: Categories for Two Variables (cont.)

  • Conditional distributions.

    Given the class of a passenger, what is the chance the passenger survived?



Contingency Tables: Categories for Two Variables (cont.)

Questions:

  • What fraction of survivors were in first class? 202/710

  • What fraction of passengers were in first class and survivors? 202/2201

  • What fraction of the first class passengers survived? 202/325
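The three questions share the numerator 202 (first class passengers who survived) and differ only in the denominator, which is the group being conditioned on. A small sketch using the counts from the slide (variable names are illustrative):

```python
from fractions import Fraction

# Counts taken from the slide.
first_class_survivors = 202  # in first class AND survived
survivors = 710              # all survivors
first_class = 325            # all first class passengers
everyone = 2201              # everyone on board

# Conditional on being a survivor: share that was in first class.
share_of_survivors_in_first = Fraction(first_class_survivors, survivors)
# Joint proportion: first class and survived, out of everyone.
share_first_and_survived = Fraction(first_class_survivors, everyone)
# Conditional on being in first class: share that survived.
survival_rate_in_first = Fraction(first_class_survivors, first_class)

for label, frac in [
    ("survivors who were in first class", share_of_survivors_in_first),
    ("passengers in first class who survived (joint)", share_first_and_survived),
    ("first class passengers who survived", survival_rate_in_first),
]:
    print(f"{label}: {float(frac):.3f}")
```

Choosing the denominator is the whole point: the same cell count answers three different questions depending on which total it is divided by.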


3-Way Tables

  • Example: Georgia death-sentence data


UC Berkeley Lawsuit


LAWSUIT (cont.)


Simpson’s Paradox

  • The reversal of the direction of a comparison or association when data from several groups are combined to form a single group.
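The reversal can be seen with a few lines of arithmetic. The counts below are hypothetical, invented only for illustration (they are not the data from the slides): option X has the higher success rate within each group, yet option Y has the higher rate once the groups are combined, because the two options are spread very differently across the groups.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
counts = {
    "X": {"group 1": (90, 100), "group 2": (300, 1000)},
    "Y": {"group 1": (850, 1000), "group 2": (20, 100)},
}


def rate(successes, trials):
    """Success rate within a single group."""
    return successes / trials


def combined_rate(groups):
    """Success rate after pooling all groups into one."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


for group in ("group 1", "group 2"):
    x = rate(*counts["X"][group])
    y = rate(*counts["Y"][group])
    print(f"{group}: X = {x:.2f}, Y = {y:.2f}")

print(f"combined: X = {combined_rate(counts['X']):.2f}, "
      f"Y = {combined_rate(counts['Y']):.2f}")
```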


Fly Alaska Airlines, the on-time airline!


American West Wins! You’re a Hero!


Section 2.2: Displaying Quantitative Data

Histograms

Stem and Leaf Displays


Relative Frequency Histogram of Exam Grades

[Figure: histogram; x-axis: Grade (40 to 100); y-axis: Relative frequency (0 to .30)]


Frequency Histograms

A histogram shows three general types of information:

  • It provides a visual indication of where the approximate center of the data is.

  • We can gain an understanding of the degree of spread, or variation, in the data.

  • We can observe the shape of the distribution.


All 200 m Races 20.2 secs or less


Histograms Showing Different Centers


Histograms - Same Center, Different Spread


Frequency and Relative Frequency Histograms

  • identify smallest and largest values in data set

  • divide interval between largest and smallest values into between 5 and 20 subintervals called classes

    * each data value in one and only one class

    * no data value is on a boundary


How Many Classes?


Histogram Construction (cont.)

* compute frequency or relative frequency of observations in each class

* x-axis: class boundaries; y-axis: frequency or relative frequency scale

* over each class draw a rectangle with height corresponding to the frequency or relative frequency in that class


Ex. No. of daily employee absences from work

  • 106 obs; approx. no. of classes =

    $(2 \cdot 106)^{1/3} = 212^{1/3} \approx 5.96$

    $1 + \log(106)/\log(2) = 1 + 6.73 = 7.73$

  • There is no single “correct” answer for the number of classes

  • For example, you can choose 6, 7, 8, or 9 classes; don’t choose 15 classes
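Both rules of thumb are one-line calculations. The sketch below evaluates them for the 106 observations in this example; the function names are illustrative and not part of the slides.

```python
import math


def classes_cube_root(n):
    """Rule of thumb from the slide: cube root of twice the sample size."""
    return (2 * n) ** (1 / 3)


def classes_log2(n):
    """Rule of thumb from the slide: 1 + log(n) / log(2)."""
    return 1 + math.log(n) / math.log(2)


n_obs = 106
print(f"cube-root rule: {classes_cube_root(n_obs):.2f}")
print(f"log rule:       {classes_log2(n_obs):.2f}")
```

Neither result is a whole number, which fits the point that the rules only suggest a reasonable range of class counts.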


EXCEL Histogram


Absences from Work (cont.)

  • 6 classes

  • class width: (158-121)/6 = 37/6 = 6.17, rounded up to 7

  • 6 classes, each of width 7; classes span 6(7)=42 units

  • data spans 158-121=37 units

  • classes extend beyond the span of the actual data values by 42-37=5 units

  • lower boundary of 1st class: (1/2)(5) units below 121 = 121-2.5 = 118.5
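The steps above can be sketched as a short function: round the raw width up to a whole number, then split the leftover span evenly between the two ends. The function name and signature are illustrative. The sketch assumes the raw width is not already a whole number; if it were, there would be no leftover span and the smallest and largest values would fall on class boundaries.

```python
import math


def class_boundaries(smallest, largest, n_classes):
    """Return (class width, list of class boundaries) for a histogram."""
    span = largest - smallest                  # 158 - 121 = 37
    width = math.ceil(span / n_classes)        # 6.17 rounded up to 7
    overhang = n_classes * width - span        # 42 - 37 = 5
    start = smallest - overhang / 2            # 121 - 2.5 = 118.5
    boundaries = [start + i * width for i in range(n_classes + 1)]
    return width, boundaries


width, boundaries = class_boundaries(121, 158, 6)
print("class width:", width)
print("boundaries:", boundaries)
```

Because the data are whole numbers and the boundaries end in .5, no observation can land exactly on a boundary.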


EXCEL histogram


Grades on a statistics exam

Data:

75 66 77 66 64 73 91 65 59 86 61 86 61

58 70 77 80 58 94 78 62 79 83 54 52 45

82 48 67 55


Frequency Distribution of Grades

    Class Limits     Frequency
    40 up to 50          2
    50 up to 60          6
    60 up to 70          8
    70 up to 80          7
    80 up to 90          5
    90 up to 100         2
    Total               30


Relative Frequency Distribution of Grades

    Class Limits     Relative Frequency
    40 up to 50      2/30 = .067
    50 up to 60      6/30 = .200
    60 up to 70      8/30 = .267
    70 up to 80      7/30 = .233
    80 up to 90      5/30 = .167
    90 up to 100     2/30 = .067
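The frequency and relative frequency tables can be reproduced from the 30 exam grades listed earlier. This is a minimal sketch in plain Python; it reads "40 up to 50" as including 40 and excluding 50, which is an assumption about the slide's convention, and the function name is illustrative.

```python
grades = [
    75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
    58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
    82, 48, 67, 55,
]
edges = [40, 50, 60, 70, 80, 90, 100]


def frequency_table(data, edges):
    """Count the values in each class [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


frequencies = frequency_table(grades, edges)
relative = [f / len(grades) for f in frequencies]

for i, (f, r) in enumerate(zip(frequencies, relative)):
    print(f"{edges[i]} up to {edges[i + 1]}: {f:2d}  {r:.3f}")
print("Total:", sum(frequencies))
```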


Relative Frequency Histogram of Grades

[Figure: histogram; x-axis: Grade (40 to 100); y-axis: Relative frequency (0 to .30)]


Stem and Leaf Displays

  • Have the following general appearance

    stem | leaf

    1 | 8 9

    2 | 1 2 8 9 9

    3 | 2 3 8 9

    4 | 0 1

    5 | 6 7

    6 | 4


Stem and Leaf Displays

  • Partition each number in the data into a “stem” and a “leaf”

  • Constructing a stem and leaf display

    1) determine the stem and leaf partition (5-20 stems)

    2) write the stems in a column with the smallest stem at the top; include all stems in the range of the data

    3) only 1 digit in the leaves; drop digits or round off

    4) record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps


Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39; stem: 10’s digit; leaf: 1’s digit

  • 18: stem=1; leaf=8; 18 = 1 | 8

    stem | leaf

    1 | 8 9

    2 | 1 2 8 9 9

    3 | 2 3 8 9

    4 | 0 1

    5 | 6 7

    6 | 4
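The display can be built mechanically for two-digit data. This is a minimal sketch using the tens digit as the stem and the ones digit as the leaf, as in this example; the function name is illustrative. It lists every stem between the smallest and largest, so empty stems still get a row.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display rows, one per stem from smallest to largest."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting orders the leaves in each row
        rows[v // 10].append(v % 10)  # stem = 10's digit, leaf = 1's digit
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
for line in stem_and_leaf(ages):
    print(line)
```

Adding a much larger value to `ages` produces empty rows for the stems in between, which makes the gap in the data visible.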


Suppose a 95 yr. old is hired

stem | leaf

1 | 8 9

2 | 1 2 8 9 9

3 | 2 3 8 9

4 | 0 1

5 | 6 7

6 | 4

7 |

8 |

9 | 5


Number of TD passes by NFL teams: 2009 season (stems are 10’s digit)


Pulse Rates n = 138


Advantages/Disadvantages of Stem-and-Leaf Displays

  • Advantages

    1) each measurement displayed

    2) ascending order in each stem row

    3) relatively simple (data set not too large)

  • Disadvantages

    display becomes unwieldy for large data sets


Population of 185 US cities with between 100,000 and 500,000

  • Multiply stems by 100,000



Interpreting Graphical Displays: Shape

Symmetric distribution

  • A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution

  • A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution

  • Not all distributions have a simple overall shape, especially when there are few observations.


Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

[Figure labels: Alaska, Florida]


Other Graphical Methods for Economic Data

  • Time plots

    plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

    ** Time series

    measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)




End of Section 2.2