Biostatistics I PubH 6450 - PowerPoint PPT Presentation


Biostatistics I PubH 6450

Fall 2005

PubH 6450 – Biostatistics I

Instructor: Susan Telke

email: [email protected]

Office hours: 3:20pm – 4:20pm (Tuesday and Thursday) in the lecture hall, or by appointment in A349 Mayo Building

Teaching Assistants:

Pei Li – email: [email protected]

Xiaoxiao Kong – email: [email protected]

Jianmin Liu – email: [email protected]

Xiaobo Liu – email: [email protected]

Ran Li – email: [email protected]

Jia Xu – email: [email protected]

Jay Pottala – email: [email protected]

Book for 6450

Introduction to the Practice of Statistics (Moore and McCabe)

Web Page

Information on the web:

  • General class information

  • Syllabus

  • Course notes (updated weekly)

  • Homework

  • Computer Help

Computer Labs

  • Mayo C381 (Biostatistics Lab)

    Teaching assistants will hold computer sessions in the Mayo lab to help you with your homework assignments.

  • Diehl Hall (Medical Library)

PC SAS

The primary computing environment will be PC SAS.

  • PC SAS is available in the computing lab, Mayo C381.

  • PC SAS can be purchased at the bookstore (a one-year license agreement is about $50).

  • SAS (not PC SAS) is also available: use the UNIX version of SAS by telnet to the biostatistics workstation saturn.

Exams and Homework

  • There will be weekly homework assignments

  • There will be two midterms and one final exam.

    • Students who get an “A” on all exams get an “A” in the course.

    • For all other students the midterms account for 25% each and the final accounts for 30% of the course grade. The remaining 20% is based on homework (best 9)
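As a rough illustration of the weighting above, here is a minimal Python sketch. It assumes every score is on a 0-100 scale and that all homework assignments count equally. The function name `course_score` and the example scores are hypothetical, and the sketch does not model the "A on all exams" rule.

```python
def course_score(midterm1, midterm2, final, homework):
    """Weighted course score, assuming all scores are on a 0-100 scale.

    Each midterm counts 25%, the final counts 30%, and the mean of the
    best 9 homework scores counts 20%.
    """
    best_nine = sorted(homework, reverse=True)[:9]   # keep the 9 highest homework scores
    homework_mean = sum(best_nine) / len(best_nine)
    return 0.25 * midterm1 + 0.25 * midterm2 + 0.30 * final + 0.20 * homework_mean


# Hypothetical student: ten homework scores, of which the lowest is dropped
print(round(course_score(80, 90, 85, [100, 90, 80, 95, 85, 70, 100, 90, 60, 75]), 2))
```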

Introduction to PubH 6450

  • The study of statistics explores the collection, organization, analysis and interpretation of numerical data.

  • When the focus of the analysis is on the biological and health sciences it is called Biostatistics.

Trial by Jury: A Familiar Scenario

  • You have a crime.

  • You have a suspect.

  • A police investigation collects evidence against the suspect.

  • A prosecutor presents summarized evidence to a jury.

Trial by Jury: The Process

  • The jury reaches a verdict based on its judgment of the evidence presented.

  • Rules for determining a verdict:

    • The accused is innocent until proven guilty.

    • The evidence must be sufficient to convict beyond a reasonable doubt.

    • The decision must be unanimous.

Trial by Jury: The Need

Why is the Trial by Jury process needed?

The truth is unknown or uncertain because of:

  • Variability: Every case is different.

  • Incomplete information: Some evidence may be missing.

Trial by Jury: Rationale

  • Trial by Jury is the way our society deals with uncertainties related to criminal justice.

  • Its goal is to minimize errors within the limits of human understanding.

  • It is impossible to eliminate all mistakes when verdicts are based on uncertain, incomplete evidence.

Trial by Jury: Dealing with Uncertainty

  • A hypothesis (assumption) is stated: “Every person is innocent until proven guilty.”

  • Data is collected: Evidence against the hypothesis, not against the suspect.

  • Based on the evidence, a verdict is reached on whether the hypothesis should be rejected. (If the hypothesis is rejected, the verdict is guilty.)

Trial by Jury: Elements of a Successful Trial

  • A probable cause (a crime and a suspect).

  • A thorough investigation (by police).

  • An efficient presentation (by attorneys in the D.A.’s office: organization and summarization of evidence).

  • A fair & impartial assessment by the jury.

Trial by Jury: How Does This Relate to Biostatistics?

  • A probable cause: The crime is lung cancer & the suspect is cigarette smoking.

  • A thorough investigation: A clinical trial or case-control study to gather information.

  • An efficient presentation: Using biostatistics tools to organize and summarize data.

  • A fair & impartial assessment by the jury: Making proper statistical inference based on data collected.

Areas of Biostatistics

Experimental Designs:

How will the data be collected?

Descriptive Statistics:

Organization of data

Summary statistics of data

Effective graphical representation of data

Statistical Inference:

The science of drawing statistical conclusions from specific data using knowledge of probability.

Goals

By the end of the course, you should be able to use the following aspects of statistical thinking:

  • Critically read the literature in your field that makes use of statistical analysis.

  • Read about new statistical techniques and understand how they may apply to your field.

  • Create and analyze descriptive statistics based on data.

  • Develop hypotheses and use appropriate statistics to evaluate these hypotheses.

The Language of Statistics: Definitions

  • Population: The entire group of people, animals or things about which we want information. (e.g. population of the U.S.)

  • Individuals (units): The objects described by a set of data. (e.g. People)

  • Sample: A part of the population from which we actually collect information, used to draw conclusions about the whole population. (e.g. a sample of 1000 people)


  • Variable: Any characteristic of an individual. A variable can take different values for different individuals. Also, a variable can take different values for the same individual at different times. (e.g. Height, age, gender)

Two “Types” of Variables

  • Quantitative Variable: Measurements recorded on a naturally occurring numerical scale. Operations such as adding and “averaging” make sense. (e.g. Height, time, test scores)

  • Qualitative Variable (Categorical): Variables that are classified into one of a group of categories. Arithmetic operations do NOT make sense with this type of variable. (e.g. Geographical location, gender)


  • Age in years

  • ID #

  • Temperature in degrees

  • Political party

  • Smoking status

  • Length in cm

  • Gender

  • Blood pressure

Two Methods for Describing Sets of Data

Exploratory data analysis: examining data to describe their main features.



Displaying Distributions with Graphs

  • Distribution: The distribution of a variable tells us what values it takes on and how often it takes on these values.
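You can tabulate a distribution by counting how often each value occurs. The minimal Python sketch below does this for one categorical variable: the sex of the 92 Penn State students from the histogram example in these notes (35 females, 57 males). It uses `collections.Counter` from the standard library, and the variable names are illustrative.

```python
from collections import Counter

# Sex of the 92 Penn State students in the weight example (35 females, 57 males)
sex = ["Female"] * 35 + ["Male"] * 57

counts = Counter(sex)                 # which values occur, and how often
n = sum(counts.values())
for value, count in sorted(counts.items()):
    print(f"{value:<7} count = {count:>2}  relative frequency = {count / n:.3f}")
```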

Describing Categorical Variables with Graphs

Bar Graphs

NOTE: 668 children living in crack/cocaine households were categorized based on race


Pie Chart

NOTE: 668 children living in crack/cocaine households were categorized based on race

Describing Quantitative Data

  • Stemplots

  • Histograms

  • Time Plots

  • Box Plots (section 1.2)


Stemplots

A quick, easy way to see the distribution of 40 or fewer data points

  • How to make a stemplot

    • Create Leaf

    • Order Data

    • Arrange Stems

    • Place Leaves
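The four steps can be sketched in Python as follows. This minimal sketch assumes non-negative whole-number data and takes the last digit as the leaf and the remaining digits as the stem. The function name `stemplot` is hypothetical. The example input is the 35 female weights from the Penn State histogram example in these notes, since that set has fewer than 40 values.

```python
def stemplot(data):
    """Return stemplot rows as (stem, leaves) pairs.

    Assumes non-negative whole numbers: the leaf is the last digit and
    the stem is the remaining leading digits.
    """
    ordered = sorted(data)                                # order the data
    low, high = ordered[0] // 10, ordered[-1] // 10
    rows = {stem: [] for stem in range(low, high + 1)}    # arrange all stems, even empty ones
    for value in ordered:
        stem, leaf = divmod(value, 10)                    # create the leaf (split off last digit)
        rows[stem].append(leaf)                           # place the leaf beside its stem
    return [(stem, rows[stem]) for stem in range(low, high + 1)]


# The 35 female weights (lbs) from the Penn State histogram example
female_weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
                  120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
                  115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108]

for stem, leaves in stemplot(female_weights):
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the data are sorted first, the leaves on each row come out in increasing order.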

Stemplots: An Example

Average Monthly Temperature. Source: World Almanac 1996 p.180



Histograms

  • Histograms are useful for displaying the distribution of large amounts of data.

  • Steps for creating a histogram:

    • Divide the range of the data into classes of equal width

    • Count the number of observations in each class

    • Draw the histogram
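Here is a minimal Python sketch of the three steps. It assumes whole-number data and a starting value no larger than the smallest observation. The helpers `histogram_classes` and `draw_histogram` are hypothetical, and the input values are made up purely for illustration. The sketch draws the histogram sideways as text, one `#` per observation.

```python
def histogram_classes(data, start, width):
    """Return (lower, upper, count) for classes [lower, upper); whole numbers assumed."""
    # Step 1: divide the range into classes of equal width, beginning at `start`
    n_classes = (max(data) - start) // width + 1
    counts = [0] * n_classes
    # Step 2: count the number of observations in each class
    for x in data:
        counts[(x - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


def draw_histogram(classes):
    # Step 3: draw the histogram sideways, one '#' per observation
    for lower, upper, count in classes:
        print(f"{lower:>4} - {upper - 1:<4} {'#' * count}")


# Made-up values, for illustration only
values = [3, 5, 8, 11, 12, 14, 14, 19, 23]
draw_histogram(histogram_classes(values, start=0, width=5))
```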

Histogram: An Example

  • Weights of 92 Penn State Students:

    • Females

      140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120 118 125 135 125 118 122 115 102 115 150 110 116 108 95 125 133 110 150 108

    • Males

      140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175 175 170 180 135 170 157 130 185 190 155 170 155 215 150 145 155 155 150 155 150 180 160 135 160 130 155 150 148 155 150 140 180 190 145 150 164 140 142 136 123 155
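As a sketch of how these weights could be binned, the Python below counts both groups on a common set of 10-lb classes and prints them side by side. The choice of classes starting at 90 lbs is an assumption made for illustration, and the helper `counts_on_grid` is hypothetical. Using the same classes for both groups makes the two shapes directly comparable.

```python
females = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
           120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
           115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108]
males = [140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155,
         153, 145, 170, 175, 175, 170, 180, 135, 170, 157, 130, 185,
         190, 155, 170, 155, 215, 150, 145, 155, 155, 150, 155, 150,
         180, 160, 135, 160, 130, 155, 150, 148, 155, 150, 140, 180,
         190, 145, 150, 164, 140, 142, 136, 123, 155]


def counts_on_grid(data, start, width, n_classes):
    """Count whole-number observations in classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        counts[(x - start) // width] += 1
    return counts


start, width = 90, 10                                     # first class 90-99 holds the lightest weight (95)
n_classes = (max(females + males) - start) // width + 1   # enough classes to reach the heaviest (215)
f_counts = counts_on_grid(females, start, width, n_classes)
m_counts = counts_on_grid(males, start, width, n_classes)

for i in range(n_classes):
    lower = start + i * width
    print(f"{lower:>3}-{lower + width - 1:<3} F {'#' * f_counts[i]:<10}  M {'#' * m_counts[i]}")
```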

Number of Intervals

  • There is no clear-cut rule on the number of intervals or classes that should be used.

  • Too many intervals – the data may not be summarized enough for a clear visualization of how they are distributed.

  • Too few intervals – the data may be over-summarized and some of the details of the distribution may be lost.

Pictures of Data: Histograms

  • Blood pressure data on a sample of 113 men

Histogram of the systolic blood pressure (SBP) of 113 men. Each bar spans a width of 5 mmHg on the horizontal axis. The height of each bar represents the number of individuals with SBP in that range.

Another histogram of the blood pressure of 113 men. In this graph, each bar has a width of 20 mmHg, and there are only 4 bars in total, making it hard to characterize the distribution of blood pressures in the sample.


Yet another histogram of the same BP information on 113 men. Here, the bin width is 1 mmHg, perhaps giving more detail than is necessary.

Width of Intervals

  • Unless there is a specific reason not to (e.g., showing infant deaths), the intervals should all be the same width.

  • Common width: $W = \frac{R}{k}$, where

    • R = range of the data (maximum - minimum)

    • k = the number of intervals

Considerations when Determining Width

  • The width should be chosen so that it is convenient to use or easy to recognize (multiples of 5 or 1).

  • The first interval must begin low enough to include the smallest observation.

  • If the data have x decimal places, the interval limits should also have x decimal places.
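One possible way to turn the first two considerations into a rule is sketched below in Python: round the raw width R/k up to a multiple of a convenient unit, then start the first interval at the largest multiple of the width that does not exceed the minimum. This rule is an illustrative assumption, not the method prescribed in these notes (the day-care example that follows picks W = 10 by judgment). Both helper names are hypothetical, and the decimal-places consideration is not handled.

```python
import math


def convenient_width(raw_width, unit=5):
    """Round a raw width such as R / k up to the next multiple of `unit`."""
    return math.ceil(raw_width / unit) * unit


def first_lower_limit(minimum, width):
    """Largest multiple of `width` that is <= the smallest observation."""
    return math.floor(minimum / width) * width


# R = 67 and minimum = 12, as in the day-care data example
print(convenient_width(67 / 5), convenient_width(67 / 15), first_lower_limit(12, 10))
```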

Data Example

  • Weight in pounds of 57 school children at a day-care center:

    68 63 42 27 30 36 28 32 79 27

    22 23 24 25 44 65 43 23 74 51

    36 42 28 31 28 25 45 12 57 51

    12 32 49 38 42 27 31 50 38 21

    16 24 69 47 23 22 43 27 49 28

    23 19 46 30 43 49 12

Data Example – Step 1

  • From the data we have:

    • Minimum = 12

    • Maximum = 79

    • R = 79 - 12 = 67

  • If we use k = 5 or k = 15, we get:

    • W = 67/5 = 13.4

    • W = 67/15 ≈ 4.5

  • Since the dataset is not large, we will choose W = 10 to have fewer intervals.
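You can check the Step 1 arithmetic with a few lines of Python. This is a minimal sketch using the 57 weights listed above; it applies W = R/k for k = 5 and k = 15.

```python
weights = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
           22, 23, 24, 25, 44, 65, 43, 23, 74, 51,
           36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
           12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
           16, 24, 69, 47, 23, 22, 43, 27, 49, 28,
           23, 19, 46, 30, 43, 49, 12]

minimum, maximum = min(weights), max(weights)
R = maximum - minimum                  # range of the data
for k in (5, 15):
    W = R / k                          # common width W = R / k
    print(f"k = {k:>2}: W = {R}/{k} = {W:.1f}")
```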

Data Example – Step 2

  • Next we have to construct the intervals.

  • With W = 10 and minimum = 12, we choose the first interval to start at 10.

    INTERVALS (in lbs): 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79
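You can generate the interval list mechanically. Here is a minimal Python sketch that assumes whole-number data; `make_intervals` is a hypothetical helper, not part of any library. It starts at the multiple of the width just below the minimum and keeps adding intervals until the maximum is covered.

```python
def make_intervals(minimum, maximum, width):
    """List (lower, upper) limits of equal-width intervals for whole-number data."""
    lower = (minimum // width) * width   # the first interval must contain the smallest value
    intervals = []
    while lower <= maximum:              # keep going until the largest value is covered
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


for lower, upper in make_intervals(12, 79, 10):
    print(f"{lower}-{upper}")
```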







Data Example – Step 3

Examine the values one at a time and tally the number of observations in each interval.
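The tally can be sketched in Python with `collections.Counter` from the standard library. This minimal sketch uses the 57 weights listed in the data example and the intervals 10-19 through 70-79. Each weight is mapped to the index of its interval, then the indices are counted.

```python
from collections import Counter

weights = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
           22, 23, 24, 25, 44, 65, 43, 23, 74, 51,
           36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
           12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
           16, 24, 69, 47, 23, 22, 43, 27, 49, 28,
           23, 19, 46, 30, 43, 49, 12]

start, width = 10, 10
tally = Counter((w - start) // width for w in weights)   # interval index for each child
for i in range(7):                                       # intervals 10-19 through 70-79
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}: {'|' * tally[i]} ({tally[i]})")
```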

Data Example – Step 4

Calculate the relative frequencies:

$$\text{Relative frequency} = \frac{\text{frequency in interval}}{\text{number of observations in dataset}}$$
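Here is a minimal Python sketch of the relative-frequency formula. The frequencies below come from counting the 57 listed day-care weights into the intervals 10-19 through 70-79. Dividing each by the total gives proportions that sum to 1.

```python
# Tallies of the 57 day-care weights in each 10-lb interval
intervals = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
frequencies = [5, 19, 10, 13, 4, 4, 2]

n = sum(frequencies)                        # number of observations in the dataset (57)
relative = [f / n for f in frequencies]     # relative frequency = frequency / n
for label, f, rf in zip(intervals, frequencies, relative):
    print(f"{label}: {f:>2} / {n} = {rf:.3f}")
```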


  • The horizontal scale represents the value of the variable.

  • The vertical scale represents the frequency or relative frequency in each interval.

  • The rectangular bars are joined together.

Consider Distributions

  • If the data are homogeneous, the graphs usually show a unimodal pattern with one peak in the middle.

  • The plots can be used to determine whether the data are symmetric. A symmetric distribution has the same shape on both sides of the peak.

Shapes of the Distribution

  • Three common shapes of frequency distributions:

    • Symmetrical and bell-shaped

    • Positively skewed, or skewed to the right

    • Negatively skewed, or skewed to the left

  • Three less common shapes of frequency distributions:








Timeplots

  • Data are displayed over time.

  • Data may show seasonal patterns, yearly changes, or changes in the environment over time.

  • A timeplot can give different impressions depending on the scales used on the x and y axes.

Timeplots: An Example

Time series data can display the effects of changes in government policy. The table shows data on motor vehicle deaths in the U.S. (death rate per 100 million miles driven).


  • During these years, safety requirements for motor vehicles became stricter and interstate highways replaced old roads.

  • In 1974 the national speed limit was lowered to 55 miles an hour. In the mid-1980s most states raised speed limits to 65 miles an hour. Some say lower speed limits saved lives. Is this evident in our plot?