Introduction to biostatistics pubhlth 540 lecture 3 numerical summary measures
Introduction to Biostatistics(Pubhlth 540) Lecture 3: Numerical Summary Measures

Acknowledgement: Thanks to Professor Pagano

(Harvard School of Public Health) for lecture material


Reading/Homework

  • See the course web site



For after all, what is man in nature?

A Nothing in relation to the infinite,

All in relation to nothing,

A central point between nothing and all,

And infinitely far from understanding either.

Blaise Pascal (1623-1662)

Pensées (1660)









Measures of central tendency

  • Population Parameters

  • Sample Statistics

  • Mean

  • Median

  • Mode



Measures of central tendency: Mean

Example: forced expiratory volume in 1 second (FEV1) in 13 adolescents with asthma

2.3, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,

2.25, 2.68, 3.00, 4.02, 2.85 (n=13)
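
A minimal Python sketch of the sample mean for the values listed above. Only 12 of the stated 13 values appear in this transcript, so the result differs slightly from the 2.95 quoted later in the lecture. The variable names `fev` and `mean_fev` are ours.

```python
# FEV values exactly as listed in the transcript (12 of the stated n = 13).
fev = [2.3, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
       2.25, 2.68, 3.00, 4.02, 2.85]

# Sample mean: the sum of the observations divided by their count.
mean_fev = sum(fev) / len(fev)
print(f"n = {len(fev)}, mean = {mean_fev:.3f}")
```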



If we collect a man's urine during twenty four hours and mix all this urine to analyze the average, we get an analysis of a urine which simply does not exist; for urine when fasting, is different from urine during digestion. A startling instance of this kind was invented by a physiologist who took urine from a railroad station urinal where people of all nations passed, and who believed he could thus present an analysis of average European urine!

Claude Bernard (1813-1878)


Mean: Examples

Approx 4 million singleton births, 1991:

Of 31,417 singleton births resulting in death:


Mean: Properties

26.4 years

Note what happens when one number, 4.02 say, becomes large, say 40.2:

2.3, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 40.2, 2.85

(versus 2.95, from before)

The mean is sensitive to every observation; it is not robust.
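
The effect can be checked with a short Python sketch using the standard `statistics` module. It uses the 12 values printed in this transcript, so the starting mean is about 2.91 rather than 2.95. The names `original` and `perturbed` are ours.

```python
from statistics import mean, median

original = [2.3, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
            2.25, 2.68, 3.00, 4.02, 2.85]

# Replace the single value 4.02 by 40.2, as on the slide.
perturbed = [40.2 if x == 4.02 else x for x in original]

# The mean moves a long way; the median does not move at all.
print(f"mean:   {mean(original):.2f} -> {mean(perturbed):.2f}")
print(f"median: {median(original):.3f} -> {median(perturbed):.3f}")
```

One extreme observation roughly doubles the mean here, while the two middle order statistics, and hence the median, are unchanged.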


Measures of central tendency: Median

More robust, but not sensitive enough.

Definition: At least 50% of the observations are greater than or equal to the median, and at least 50% of the observations are less than or equal to the median.

2.15, 2.25, 2.30: median = 2.25

2.15, 2.25, 2.30, 2.60: median = $(2.25 + 2.30)/2 = 2.275$
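
As a sketch of the usual computational convention (the function name `sample_median` is ours): sort the data, take the middle observation when n is odd, and average the two middle observations when n is even.

```python
def sample_median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median([2.15, 2.25, 2.30]))        # odd n: the middle observation
print(sample_median([2.15, 2.25, 2.30, 2.60]))  # even n: average of the two middle observations
```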


Comparing mean and median

Singleton births, 1991:

Mean = 3359, Median = 3374

Mean = 30.4, Median = 30

Mean = 49.4, Median = 7


Comparing mean and median

When to use mean or median:

Use both by all means.

Mean performs best when we have a symmetric distribution with thin tails.

If skewed, use the median.

Remember: the mean follows the tail.
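
A small illustration of "the mean follows the tail". The data below are made up for this sketch (hypothetical lengths of stay in days) and are not from the lecture.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, two are very large.
stay_days = [1, 2, 2, 3, 3, 4, 5, 7, 30, 60]

# The long right tail pulls the mean well above the median.
print(f"mean = {mean(stay_days):.1f}, median = {median(stay_days):.1f}")
```

With a long right tail the mean lands well above the median, the same pattern as the "Mean = 49.4, Median = 7" example.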


Mode

  • Mode is defined as the observation that occurs most frequently

  • When the distribution is symmetric and unimodal, all three measures of central tendency are equal
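
A sketch of finding the mode(s) by counting, using made-up data (the samples and the helper name `modes` are ours). The first sample is symmetric with one peak; the second is symmetric with two peaks, where the mean and median still agree but there are two modes.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return every value that attains the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

unimodal = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical, symmetric, one peak
bimodal = [1, 2, 2, 2, 3, 4, 4, 4, 5]   # hypothetical, symmetric, two peaks

for name, data in (("unimodal", unimodal), ("bimodal", bimodal)):
    print(name, "mean:", mean(data), "median:", median(data), "modes:", modes(data))
```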


Comparing mean, median and mode

[Figure: bimodal distribution; labels: Mean, Median; Modes]


Measures of spread

  • Range:

    • Simple to calculate

    • Very sensitive to extreme observations

  • Interquartile Range (IQR)

    • More robust than the range

  • Variance (Standard Deviation):

    • Quantifies the amount of variability around the mean
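
The three measures can be sketched with Python's standard `statistics` module, applied to the 12 FEV values printed earlier in this transcript. Quartile conventions differ between textbooks and software; `statistics.quantiles` uses its default "exclusive" method here, which is an assumption and may not match the convention used in the lecture.

```python
from statistics import quantiles, stdev

fev = [2.3, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
       2.25, 2.68, 3.00, 4.02, 2.85]

# Range: largest minus smallest observation.
data_range = max(fev) - min(fev)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = quantiles(fev, n=4)
iqr = q3 - q1

# Sample standard deviation (divisor n - 1).
sd = stdev(fev)

print(f"range = {data_range:.2f}, IQR = {iqr:.3f}, s.d. = {sd:.3f}")
```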


Measures of spread: Range

Singleton births, 1991:


Measures of spread: Variance

Standard deviation takes on the same unit as the mean
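
A sketch of the variance computed from first principles, assuming the usual sample definition with divisor n − 1 (the lecture's own formula is not preserved in this transcript). The helper name `sample_variance` is ours; the data are the 12 FEV values printed earlier.

```python
from math import sqrt
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

fev = [2.3, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
       2.25, 2.68, 3.00, 4.02, 2.85]

s2 = sample_variance(fev)
s = sqrt(s2)  # standard deviation: same units as the data and the mean
print(f"variance = {s2:.4f}, s.d. = {s:.4f}")
print(f"statistics.variance agrees: {abs(s2 - variance(fev)) < 1e-9}")
```

The variance is in squared units; taking the square root returns to the units of the observations, which is why the standard deviation is usually reported alongside the mean.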


Variance & Standard deviation

Empirical Rule:

If dealing with a unimodal and symmetric distribution, then

Mean ± 1 sd covers approx 67% of obs.

Mean ± 2 sd covers approx 95% of obs.

Mean ± 3 sd covers approx all obs.
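
For reference, the corresponding proportions for an exactly normal distribution can be computed from the error function. This sketch assumes normality, which is stronger than "unimodal and symmetric"; the function name `normal_coverage` is ours.

```python
from math import erf, sqrt

def normal_coverage(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} sd: {normal_coverage(k):.4f}")
```

The normal values come out near 68%, 95% and 99.7%, close to the approximate figures quoted in the rule.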


Variance & Standard deviation

Mother’s age: mean = 26.4 yrs, s.d. = 5.84 yrs

Table of mean ± k s.d.s


Characterizing a symmetric, unimodal distribution: mean, SD

Mother’s age: mean = 26.4 yrs, s.d. = 5.84 yrs

Table of mean ± k s.d.s
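
The table itself is not preserved in this transcript, but the interval endpoints follow directly from the quoted mean and s.d. A sketch, with names of our own choosing:

```python
mean_age = 26.4  # years, as quoted on the slide
sd_age = 5.84    # years, as quoted on the slide

def interval(k):
    """Endpoints of mean ± k standard deviations."""
    return (mean_age - k * sd_age, mean_age + k * sd_age)

for k in (1, 2, 3):
    low, high = interval(k)
    print(f"mean ± {k} s.d.: {low:.2f} to {high:.2f} years")
```

The observed proportion of mothers falling in each interval would have to come from the birth data, which are not reproduced here.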


Characterizing a distribution: Chebyshev’s inequality

Table of mean ± k s.d.s

Proportion is at least $1 - 1/k^2$ (true for any distribution)
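
A sketch of the bound as a function of k (the function name `chebyshev_lower_bound` is ours). Because it holds for any distribution, it is much weaker than the empirical rule, and it says nothing useful for k ≤ 1.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k s.d.s of the mean, for any distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula is zero or negative, so the bound is trivially 0.
    return max(0.0, 1 - 1 / k ** 2)

for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.3f}")
```

For k = 2 the guarantee is 75%, against roughly 95% under the empirical rule for symmetric, unimodal distributions.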


Summary

  • Distributions can be described using:

    • Measures of central tendency

    • Measures of dispersion

  • Measures of central tendency:

    • Mean, Median, Mode

  • Measures of dispersion:

    • Range, IQR, Variance, Standard Deviation

  • Characterizing distributions:

    • Chebyshev’s inequality

    • Empirical rule for symmetric, unimodal distributions


Questions

  • In a certain real estate market, the average price of a single family home was $325,000 and the median price was $225,000. Percentiles were computed for this distribution. Is the difference between the 90th and 50th percentile likely to be bigger than, about the same as, or less than the difference between the 50th and 10th percentile? Explain briefly.

http://www.stat.berkeley.edu/users/rice/Stat2/Chapt4.pdf


Questions

  • 1. The average high temperature for Minneapolis is closest to

    (a) 45 degrees (b) 60 degrees (c) 75 degrees (d) 85 degrees

  • 2. The SD of the high temperatures for Minneapolis is closest to

    (a) 1 degree (b) 3 degrees (c) 5 degrees (d) 20 degrees

  • 3. The average high temperature for Minneapolis is ______ the average high temperature for Belle Glade.

    (a) at least ten degrees less than (b) about the same as (c) at least ten degrees higher than

  • 4. The average high temperature for Minneapolis is ______ the average high temperature for Olga.

    (a) at least ten degrees less than (b) about the same as (c) at least ten degrees higher than

  • 5. The SD of the high temperatures for Minneapolis is ______ the SD of the high temperatures for Belle Glade.

    (a) about half of (b) about the same as (c) about twice

http://www.stat.berkeley.edu/users/rice/Stat2/Chapt4.pdf