Chapter 1: Examining Distributions

1.1 Displaying Distributions with Graphs

Many public health efforts are directed toward increasing levels of physical activity. “Physical Activity in Urban White, African American, and Mexican American Women” (Medicine and Science in Sports and Exercise [1997]) reported on physical activity patterns in urban women. The accompanying data set gives the preferred leisure-time physical activity for each of 30 Mexican American women. The following coding is used: W = walking, T = weight training, C = cycling, G = gardening, A = aerobics.

W T A W G T W W C W

T W A T T W G W W C

A W A W W W T W W T

Construct what you think is an appropriate graph to display this information.
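
Because the variable is categorical, a bar graph of the category counts is one reasonable choice. A minimal Python sketch of the tally behind such a graph follows; the variable names and the text-based bars are illustrative choices, not part of the original exercise.

```python
from collections import Counter

# Activity codes for the 30 women, copied from the three rows above.
codes = (
    "W T A W G T W W C W "
    "T W A T T W G W W C "
    "A W A W W W T W W T"
).split()

labels = {"W": "walking", "T": "weight training", "C": "cycling",
          "G": "gardening", "A": "aerobics"}

# Count how many women prefer each activity.
counts = Counter(codes)

# Crude text bar graph: one '#' per woman, most common category first.
for code, n in counts.most_common():
    print(f"{labels[code]:<16}{'#' * n} {n} ({n / len(codes):.0%})")
```

The bar heights can show either the count or the percentage; both are printed here.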

The Chronicle of Higher Education (August 31, 2001) reported graduation rates for NCAA Div. 1 schools. The rates reported are the % of full-time freshmen in fall 1993 who had earned a bachelor’s degree by August 1999.

California: 64 41 44 31 37 73 72 68 35 37 81 90 82 74 79 67 66 66 70 63

Texas: 67 21 32 88 35 71 39 35 71 63 12 46 35 39 28 65 25 24 22

Individual

Definition: object described by a set of data

Pg. 4-19

Variable

Definition: characteristic of an individual

Categorical

Definition: places an individual into one of several groups or categories

Examples: gender, race, smoker, marital status

Types of graphs used: bar graph; pie chart

Quantitative

Definition: takes numerical values that result from a measurement

Examples: age, blood pressure, salary

Types of graphs used: histogram, stemplot, time plot

Categorical Variable
Bar Graph (pictograph)

What does the height show?

count or %

Does graph need to include all categories?

no

Pg 8 #1.3

Pie Chart

Shows?

Visual for comparison with whole group

Does graph need to include all categories?

yes

Pg 8 #1.4 Can we make a pie chart from data?

Histogram
  • Divide data into classes of equal width (usually 5 to 15 classes)
  • Count number in each class
  • Draw bar graph with no space between bars
  • Example: NCAA
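
The histogram steps above can be sketched in Python for the California graduation rates. The class width of 10 and the starting point of 30 are assumed choices for illustration; other equal-width classes would also work.

```python
# California graduation rates (%) from the NCAA example.
california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]

WIDTH = 10   # class width (assumed choice)
START = 30   # left edge of the first class (assumed choice)

# Step 1 and 2: form equal-width classes [low, low + WIDTH) and count each.
class_counts = {}
for low in range(START, 100, WIDTH):
    class_counts[low] = sum(low <= x < low + WIDTH for x in california)

# Step 3: draw the bars with no gaps between neighboring classes.
for low, n in class_counts.items():
    print(f"{low}-{low + WIDTH - 1}: {'#' * n}")
```

The empty 50-59 class is kept on purpose: a gap in the data is part of the shape of the distribution.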

Interpreting histograms
  • Look for overall pattern & striking deviations
  • Describe shape, center, and spread
    • Symmetric
    • Skewed to the right – right side extends much farther out than the left side

Quantitative variable cont.
  • Stemplot
    • For small data sets
    • Quicker to make and presents more detailed info
    • Stem consists of all but final, rightmost digit, and leaf is the final digit
    • Example: NCAA
  • Time plot
    • To show a change over time
    • Example: pg 19 #1.10
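
As a rough illustration of the stem-and-leaf rule above, the following Python sketch builds a stemplot for the Texas graduation rates. The function name `stemplot` is hypothetical, and the sketch assumes non-negative whole numbers.

```python
# Texas graduation rates (%) from the NCAA example.
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]

def stemplot(values):
    """Return {stem: leaves} where the leaf is the final digit of each value."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # all but the last digit, and the last digit
        rows.setdefault(stem, []).append(str(leaf))
    # Keep stems that have no leaves so gaps in the data stay visible.
    return {s: "".join(rows.get(s, []))
            for s in range(min(rows), max(rows) + 1)}

rows = stemplot(texas)
for stem, leaves in rows.items():
    print(f"{stem} | {leaves}")
```

Unlike a histogram, the stemplot keeps every individual value readable.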

What kind of graph would be appropriate?
  • Whether a spun penny lands “heads” or “tails”
  • The number of calories in a fast food sandwich
  • The life expectancy of a nation
  • The occupational background of a Civil War general
  • The weight of an automobile
  • For whom an American voted in the 1992 Presidential election
  • The age of a bride on her wedding day
  • The average low temperature in January for Appleton
Replacing the bars of a bar chart with milk buckets to make the graph more visually interesting distorts the areas.

Another common distortion occurs when a third dimension is added to bar charts or pie charts. The 3-D version distorts the areas, and as a consequence, is much more difficult to interpret correctly.


It is common to see scatterplots with broken axes, but be cautious of time plots, bar graphs, or histograms with broken axes. Broken axes in time plots can exaggerate the magnitude of change over time.

In bar graphs and histograms, the vertical axis should never be broken. For example, starting the vertical axis at 50 exaggerates the gain: the rectangle representing 68 has more than three times the area of the rectangle representing 55.
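
The size of the distortion can be checked with a short calculation. This Python sketch compares the ratio of the two bar heights when the axis starts at 0 and when it starts at 50; the function name `bar_ratio` is a hypothetical helper, and equal bar widths are assumed so that area is proportional to height.

```python
LOW, HIGH = 55, 68   # the two values being compared

def bar_ratio(baseline):
    """Ratio of the drawn bar heights when the vertical axis starts at `baseline`."""
    return (HIGH - baseline) / (LOW - baseline)

honest = bar_ratio(0)      # axis starts at zero
distorted = bar_ratio(50)  # axis starts at 50

print(f"axis from 0:  taller bar is {honest:.2f} times the shorter bar")
print(f"axis from 50: taller bar is {distorted:.2f} times the shorter bar")
```

With the axis starting at 50, the drawn bars have heights 18 and 5, which is where the "more than three times" comes from.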

What might be wrong with the following?
  • Only 3% of the men surveyed read Cosmopolitan magazine.
  • Since most automobile accidents occur within 15 miles of a person’s residence, it is safer to make long trips.
  • A television commercial claims that “our razor blades are manufactured to such high standards that they will give you a shave that is 50% closer”.
  • A national health food magazine claims that “95% of its subscribers who follow the magazine’s recommendation and take megadoses of vitamin C are healthy and vigorous”.
  • During 1990 there were 234 accidents involving drunken drivers and 15,897 accidents involving drunken pedestrians reported in Danville. Can we conclude that it is safer in Danville to be a drunken driver than a drunken pedestrian?
  • Population – the entire group of individuals that we want information about
  • Sample – part of the population that we actually examine in order to gather information and draw conclusions
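Mean
  • Measure of center: the arithmetic average of the observations
  • x̄ used for sample mean; µ used for population mean
  • $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$ or $\bar{x} = \dfrac{1}{n}\sum x_i$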

Median
  • Midpoint of distribution
  • To find median:
  • Symmetrical distribution – mean and median are close together
  • Skewed distribution – the mean is farther out in the long tail than is the median
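
A minimal Python sketch comparing the two measures of center on the NCAA data follows. It relies on the standard-library `statistics.median`, which sorts the data and returns the middle value, or the average of the two middle values when the count is even.

```python
from statistics import mean, median

# Graduation rates (%) from the NCAA example.
california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]

centers = {}
for name, rates in (("California", california), ("Texas", texas)):
    centers[name] = (mean(rates), median(rates))
    print(f"{name}: mean = {centers[name][0]:.2f}, median = {centers[name][1]}")
```

For Texas the mean lies above the median, consistent with a longer tail toward the high rates; for California the mean lies below the median.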
Mode
  • The value that occurs most often in the data
Quartiles
  • Spread of the middle half of data
  • To calculate
    • arrange data in ascending order and locate median
    • lower quartile (Q1) is the median of the low half of data
    • upper quartile (Q3) is the median of the upper half
    • Q1 is larger than 25% of data
    • Q2 is larger than 50% of data
    • Q3 is larger than 75% of data
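
The calculation steps above can be sketched in Python as follows. The function name `quartiles` is hypothetical, and the sketch assumes the convention that the overall median is left out of both halves when the number of observations is odd; statistical software may use slightly different conventions and give slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes at least two observations.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]    # observations below the median position
    upper = xs[-half:]   # observations above the median position
    return median(lower), median(xs), median(upper)

california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]

ca_q = quartiles(california)
tx_q = quartiles(texas)
print("California Q1, median, Q3:", ca_q)
print("Texas Q1, median, Q3:", tx_q)
```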

5 number summary and boxplot
  • 5 number summary – minimum, Q1, Q2, Q3, maximum
  • Boxplot – graph of 5 number summary
    • Best used for side-by-side comparison of more than one set of data
    • Include numerical scale in the graph
Outliers
  • An unusually small or large data value
  • Calculate interquartile range (Q3 – Q1)
  • An observation is an outlier if it falls more than 1.5 times the IQR above Q3 or below Q1
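
The 1.5 × IQR rule can be sketched in Python as follows. The function name `find_outliers` is hypothetical, the quartiles use the median-of-halves convention (an assumption), and the second data set is made up purely to show a flagged value.

```python
from statistics import median

def find_outliers(data):
    """Return the observations more than 1.5 * IQR below Q1 or above Q3."""
    xs = sorted(data)
    half = len(xs) // 2
    q1, q3 = median(xs[:half]), median(xs[-half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]

# Texas graduation rates (%) from the NCAA example.
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]
made_up = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # hypothetical data, not from the slides

texas_outliers = find_outliers(texas)
made_up_outliers = find_outliers(made_up)
print("Texas outliers:", texas_outliers)
print("Made-up outliers:", made_up_outliers)
```

Under this convention none of the Texas rates is flagged, even though 12 and 88 look far from the bulk of the data.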
Standard Deviation
  • Measures spread by looking at how far the observations are from their mean
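  • Variance formula: $s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$
  • Standard deviation formula: $s = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$
  • s is used for sample data; σ is used for a population (its formula divides by n rather than n − 1 and uses µ in place of x̄)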
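
A short Python sketch of the calculation, using a small made-up data set (not from the slides) chosen so the arithmetic is easy to follow. It computes the sample standard deviation by hand and checks it against the standard-library `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

n = len(data)
x_bar = sum(data) / n                               # mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)    # squared deviations from the mean

s_squared = sum_sq_dev / (n - 1)   # sample variance (divide by n - 1)
s = sqrt(s_squared)                # sample standard deviation
sigma = sqrt(sum_sq_dev / n)       # population version (divide by n)

print(f"mean = {x_bar}, s^2 = {s_squared:.3f}, s = {s:.3f}, sigma = {sigma:.3f}")
```

The sample value s is a little larger than σ computed from the same numbers, because it divides by n − 1 instead of n.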
Choosing a summary
  • The five number summary is used for describing a skewed distribution or a distribution with outliers
  • Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers

1.3 Normal Distributions

Compact picture of the overall pattern of the data

Density curve

pg 46 & 47

Scores on national tests often have a regular distribution

symmetrical

partial area represents % of total “students” (observations)

make total area under curve equal one

Normal Distributions

pg 51-52

  • What are they?
    • Density curves that are symmetrical, single-peaked, and bell-shaped
  • Curve is described by its . . .
    • mean µ and standard deviation σ
  • Where is the mean located?
    • at the center of the curve
  • What controls how spread out the curve is?
    • Standard deviation controls the spread; the larger the σ the more spread out the data
  • Where is the σ on the curve?
    • at the points of change of curvature
Why are normal curves important?
  • Good descriptions for some distributions of real data (scores on tests, measurements of same quantity, characteristics of biological populations)
  • Good approximations to the results of many kinds of chance outcomes (tossing coin, rolling die)
68-95-99.7 rule

In a normal distribution:

  • 68% of the observations fall within 1σ of the mean
  • 95% of the observations fall within 2σ of the mean
  • 99.7% of the observations fall within 3σ of the mean
Example: light bulbs with mean x̄ = 1600 hr and standard deviation s = 100 hr
  • 68% of light bulbs last: between 1500 and 1700 hr
  • 95% of light bulbs last: between 1400 and 1800 hr
  • 99.7% of light bulbs last: between 1300 and 1900 hr
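
The rule can be checked numerically and applied to the light-bulb example with a short Python sketch. It assumes bulb lifetimes are normal with the stated mean and standard deviation, and it uses the identity P(|Z| < k) = erf(k/√2) for a standard normal Z.

```python
from math import erf, sqrt

MEAN, SD = 1600, 100   # light-bulb lifetimes in hours, from the example

shares = {}
intervals = {}
for k in (1, 2, 3):
    shares[k] = erf(k / sqrt(2))                   # share within k SDs of the mean
    intervals[k] = (MEAN - k * SD, MEAN + k * SD)  # corresponding range of lifetimes
    low, high = intervals[k]
    print(f"within {k} SD: {shares[k]:.1%} of bulbs last {low} to {high} hr")
```

The computed shares are slightly more precise than the rounded 68-95-99.7 figures.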
Standard normal curve
  • standardizing puts every normal distribution on the same scale
  • the standard normal distribution is the normal distribution with mean = 0 and standard deviation = 1
  • z-score (# of standard deviations a value is away from the mean)
    • Formula: $z = \dfrac{x - \mu}{\sigma}$
  • any question about what proportion of observations lie in some range of values can be answered by finding the area under the curve (percentage)
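
A minimal Python sketch of standardizing a value and reading off the area below it, using the standard-library `statistics.NormalDist` in place of a printed table. The function name `z_score` and the lifetime of 1750 hr are hypothetical; the mean and standard deviation reuse the light-bulb numbers.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

# Hypothetical question: what share of bulbs last less than 1750 hr?
z = z_score(1750, mu=1600, sigma=100)
area_below = standard_normal.cdf(z)   # area under the curve to the left of z

print(f"z = {z}, area below = {area_below:.4f}")
```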
What % of the population has a z-score. . .
  • Less than -1.76
    • Shaded area = .0392 or 3.92%
  • Less than 0.58
    • Shaded area = .7190 or 71.90%
  • Greater than 1.96
    • Lower area = .9750 so shaded area = .0250 or 2.50%
  • Between -1.76 and .58
    • .7190 - .0392 = .6798 or 67.98%
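
The table lookups above can be reproduced with Python's standard-library `statistics.NormalDist`, whose `cdf` method returns the area to the left of a z-score. This is only a cross-check of the table values, not a replacement for learning to read the table.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

below_low = Z.cdf(-1.76)           # area to the left of -1.76
below_high = Z.cdf(0.58)           # area to the left of 0.58
above = 1 - Z.cdf(1.96)            # area to the right of 1.96
between = below_high - below_low   # area between -1.76 and 0.58

for label, area in (("z < -1.76", below_low), ("z < 0.58", below_high),
                    ("z > 1.96", above), ("-1.76 < z < 0.58", between)):
    print(f"{label}: {area:.4f}")
```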
In a standard normal distribution, find the z-score that cuts off the
  • bottom 10%
    • closest table area is .1003, so z = -1.28
  • top 15%
    • area below is .85; closest table area is .8508, so z = 1.04
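
Working backward from an area to a z-score can also be sketched in Python. The standard-library method `NormalDist.inv_cdf` takes the area to the left and returns the matching z-score, so an upper-tail area has to be converted to a left-hand area first.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

z_bottom_10 = Z.inv_cdf(0.10)      # 10% of the area lies below this z
z_top_15 = Z.inv_cdf(1 - 0.15)     # 15% above means 85% below

print(f"bottom 10%: z = {z_bottom_10:.2f}")
print(f"top 15%:    z = {z_top_15:.2f}")
```

Rounded to two decimals, these agree with the values read from the table.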


If the probability of getting less than a certain z-value is .1190, what is the z-value?
  • z = -1.18


If the probability of getting larger than a certain z-value is .0129, what is the z-value?
  • 1 - .0129 = .9871
  • z = 2.23


In a normal distribution, µ = 25 and σ = 5. What is the probability of obtaining a value
  • greater than 30?
    • z = (30-25)/5 = 1
    • 1-.8413 = .1587 or 15.87%
  • less than 15?
    • z = (15-25)/5 = -2
    • .0228 or 2.28%
  • between 20 and 30?
    • z = -1 and z = 1
    • .8413-.1587 = .6826 or 68.26%
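
The three answers can be cross-checked in Python by building the normal distribution directly with `statistics.NormalDist`, which skips the manual standardizing step. The variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=25, sigma=5)   # the distribution described in the problem

p_above_30 = 1 - X.cdf(30)           # greater than 30
p_below_15 = X.cdf(15)               # less than 15
p_between = X.cdf(30) - X.cdf(20)    # between 20 and 30

print(f"P(X > 30)      = {p_above_30:.4f}")
print(f"P(X < 15)      = {p_below_15:.4f}")
print(f"P(20 < X < 30) = {p_between:.4f}")
```

The last value comes out as .6827 rather than the table-based .6826 because the table entries are themselves rounded to four decimals.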


The Flatt Tire Corporation claims that the useful life of its tires is normally distributed with a mean life of 28,000 miles and with a standard deviation of 4000 miles. What percentage of the tires are expected to last more than 35,000 miles?
  • z = (35000-28000) / 4000 = 1.75
  • 1 - .9599 = .0401 or 4.01%
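
The same two steps, standardize and then find the upper-tail area, can be sketched in Python. The sketch also confirms that working on the original scale gives the same answer; the constant and variable names are illustrative.

```python
from statistics import NormalDist

MEAN_LIFE, SD_LIFE = 28_000, 4_000   # miles, as claimed by the company
TARGET = 35_000

# Step 1: standardize the target mileage.
z = (TARGET - MEAN_LIFE) / SD_LIFE

# Step 2: area above z under the standard normal curve.
p_via_z = 1 - NormalDist().cdf(z)

# Same probability computed on the original scale, without standardizing.
p_direct = 1 - NormalDist(MEAN_LIFE, SD_LIFE).cdf(TARGET)

print(f"z = {z}, P(life > {TARGET}) = {p_via_z:.4f}")
```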
