Probability and Statistics Today’s Goals

• Understand what constitutes an acceptable sample for a scientific study
• Understand the pitfalls in observational studies
• HW #10 (due Wed. April 22) Ch 5: 27, 68, 69; plus one web problem.
• Article will be due April 24
What is statistics?
• Statistics are numbers that summarize the results of a study.
• Statistical inference formalizes the process of learning through observation.
• Statistics is the field that studies how to efficiently collect informative data, explore and interpret these data and draw conclusions based on them.
Statistical Inference
• Designing Experiments
• What data should be collected?
• Sampling questions
• Analyzing data from experiments
• Sample statistics
• Pictorial representations
• Confidence intervals
• Hypothesis testing.
• Regressions for correlation analysis
Example 1
• An engineer starting a company has developed a new coating for pipes to resist corrosion. He believes that it will perform better than standard coatings and wants to convince clients that the new coating is better.
• He obtains a measure of corrosion by applying the new coating to one section of pipe and the standard coating to a second section.
• The first section (new coating) stays corrosion free for 256 days.
• The second section (standard coating) stays corrosion free for 243 days.
• Can we conclude that the new coating works better than the old coating?
• What would you do differently?
What can we do to improve things?
• Collect multiple observations.
• Suppose we collect 100 observations of coating effectiveness, controlling for exposure so that the new and standard coatings face the same conditions.
• We then compute the mean number of corrosion free days for both treatments, the new coating and the standard coating.
• Suppose now that the mean number of corrosion free days for the new coating is 250 and the mean number of corrosion free days for the standard coating is 242.
• Can we conclude that the coating with the better mean effectiveness will definitely outperform the coating with the worse mean effectiveness?
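Whether a gap of 250 versus 242 days is convincing depends on how much individual measurements vary, which the example does not state. A minimal sketch, assuming hypothetical standard deviations and 100 observations per coating (both are assumptions, not values from the example), computes a two-sample (Welch) t statistic from the summary numbers:

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample (Welch) t statistic computed from summary statistics."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error

# The means come from the example. The standard deviations (10 and 60 days)
# and the per-coating sample size of 100 are hypothetical.
t_low_spread = welch_t(250, 10, 100, 242, 10, 100)
t_high_spread = welch_t(250, 60, 100, 242, 60, 100)
print(round(t_low_spread, 2), round(t_high_spread, 2))
```

With a small spread the 8-day gap is several standard errors wide; with a large spread it is under one standard error and could easily be chance. Even in the first case, a better mean does not guarantee that every pipe with the new coating outlasts every pipe with the standard one.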
Example 2
• A parts supplier tells us that only 1 out of every 100 batteries shipped to us will have a lifetime of less than 100 hours.
• We are shipped a lot of 100,000 batteries.
• We could look at all of the batteries, but to do so would cost us way too much money.
• What might we decide to do?
• Suppose we decided to randomly sample 200 batteries.
• And suppose 10 of the 200 batteries have a lifetime of less than 100 hours.
• What can we say to the manufacturer that might convince him or her that the shipment we received does not meet the specifications?
• In this case, we want to say something like the following:
• If only one out of every 100 batteries has a lifetime of less than 100 hours, then the probability that a random sample of 200 batteries would contain 10 or more with lifetimes of less than 100 hours is very small, less than 1 in 1,000.
• How could we have gotten such a bad sample?
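The "very small" probability in this argument can be checked with a binomial model. The sketch below assumes each sampled battery is independently short-lived with probability 0.01 (a reasonable approximation here, since 200 is small relative to the lot of 100,000); the function name `binomial_tail` is illustrative.

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X <= k - 1)."""
    below_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - below_k

# n = 200 sampled batteries, p = 0.01 claimed defect rate, k = 10 observed
tail = binomial_tail(200, 0.01, 10)
print(f"P(10 or more short-lived batteries) = {tail:.2e}")
```

The result is far below 1 in 1,000, so either the sample was extraordinarily unlucky or the supplier's claim does not hold for this shipment.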
Notation
• Population: the collection of all potential observations.
• Sample: the subset of the population that is actually observed; a collection of observations.
• Observing all units in the population (called a census) is usually expensive or infeasible.
• Statistical inference extrapolates from a sample to the population being sampled.
Examples:
• Population: Diameters of all shafts in a lot.

Sample: Diameters of the shafts that are actually measured.

• Population: Employment status of all eligible adults in the US.

Sample: Employment status of subjects who are interviewed.

• Population: Lifetimes of the items made by a certain manufacturing process.

Sample: Lifetimes of the subset of items tested

Inferential Statistics
• Make inferences about a given population based on a sample drawn from it. The inferences support decisions.
• E.g.: Deciding whether or not to accept the lot based on the diameters of the sample taken.
• Issues:
• The sample must be representative of the population.
• Random sample
• The inferences/decisions based on a sample may be in error (sampling error)
• Quantified by the tools of probability.
Exercise 1
• A consumer magazine article asks,

HOW SAFE IS THE AIR IN AIRPLANES? and then says that its study of air quality is based on measurements taken on 158 different flights of U.S.-based airlines.

• Identify the population and the sample.
Exercise 2
• For each of the following situations, identify the population and the sample and comment on whether you think that the sample is or isn't representative.
• A member of Congress wants to know what his constituents think of proposed legislation on health insurance. His staff reports that 228 letters have been received on the subject, of which 193 oppose the legislation.
• A machinery manufacturer purchases voltage regulators from a supplier. There are reports that variation in the output voltage of the regulators is affecting the performance of the finished products. To assess the quality of the supplier's production, the manufacturer subjects a sample of 5 regulators from the last shipment to careful laboratory analysis.
Exercise 2
• For the member of Congress and the 228 letters received, answer:
• True if the sample is representative
• False if not
• For the machinery manufacturer and the 5 regulators tested, answer:
• True if the sample is representative
• False if not
Sampling
• In order for the inferences to be reliable, we must have a representative sample
• A haphazard or opportunistic sample is prone to bias

Sampling designs

• Simple Random Sample (SRS):
• Independence: the selection of one unit has no influence on the selection of other units
• Lack of bias: each unit has the same chance of being chosen
• All possible samples are equally likely.
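A simple random sample can be sketched with Python's standard library, where `random.sample` draws distinct units so that every subset of the requested size is equally likely. The lot of 1,000 shaft IDs and the sample size of 20 below are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct units; every size-k subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

shaft_ids = list(range(1, 1001))  # hypothetical lot of 1,000 shafts
sample = simple_random_sample(shaft_ids, 20, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the draw reproducible for illustration; in practice the seed would not be chosen by hand.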
What might cause samples to not be random?
• A professor uses students.
• A survey is done by phone.
• who might get missed?
• A survey is done on the internet.
• who might get missed?
• A survey is left in a doctor's office.
• who might choose to complete or not complete?
Sampling Designs
• Stratified random sample
• Divide the population into relatively homogeneous subpopulations, called strata
• Take an SRS from each stratum
• e.g.: out-of-spec raw material can be estimated by taking a separate SRS from the materials supplied by each vendor
• More accurate estimate plus separate estimates for each vendor
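The stratified design above can be sketched as one simple random sample per stratum. The vendor names, item IDs, and per-stratum sample size here are all hypothetical.

```python
import random

def stratified_sample(strata, k_per_stratum, seed=None):
    """Take an SRS of size k_per_stratum from each stratum separately."""
    rng = random.Random(seed)
    return {name: rng.sample(units, k_per_stratum)
            for name, units in strata.items()}

# Hypothetical strata: raw-material items grouped by vendor
strata = {
    "vendor_A": [f"A{i}" for i in range(500)],
    "vendor_B": [f"B{i}" for i in range(200)],
    "vendor_C": [f"C{i}" for i in range(50)],
}
samples = stratified_sample(strata, 5, seed=7)
for vendor, items in samples.items():
    print(vendor, items)
```

Because each vendor is sampled on its own, even the smallest vendor is guaranteed to appear, and a separate out-of-spec estimate can be computed for each.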
Example
• You want to determine the relationship between the weather and the number of children driven to school in cars.
• Is it better to observe every day for a week; or
• five Thursdays in a row?
• Why?
• What if you had the resources to observe on 50 days?
Designing Experiments: controls
• A key question in all scientific explorations is “compared to what?”
• Having a “control” is essential for interpreting the results of a study.
• A controlled study is one of the following
• A study containing a control group
• A study in which the investigator assigns treatment and non-treatment (“control”).
• When there is no control, evidence is often called anecdotal, meaning that, while it may be true, it doesn’t prove anything.
Exercise 3
• Which of the following are based on a sample, and not just anecdotal?
• Seatbelts are a bad safety measure: 10 drivers saved their lives by being thrown free of their car wrecks last year in Massachusetts.
• Out of 102 customers at a quick-stop market, 31 made their purchases with a credit card.
• Out of 100 cars manufactured in the night shift, 95 were fully defect-free.
• Hoan says that adding a few drops of almond flavoring makes her cookies taste better.
Anecdotal vs sample
• “Seatbelts are a bad safety measure: 10 drivers saved their lives by being thrown free of their car wrecks last year in Massachusetts.”
• What kind of data would we want to look at to see if seatbelts are a good idea or not?
Designing Experiments: Randomized versus Observational
• In a randomized study the researcher assigns treatment randomly.
• In an observational study, the researcher observes differences between two groups – treatment and control – but does not assign treatment.
• Observational studies can be very hard to interpret, since the subjects have often “self-selected” into the treatment and control groups.
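Random assignment removes self-selection because chance, not the subject, decides who gets the treatment. A minimal sketch, assuming 20 hypothetical pipe sections split evenly between treatment and control:

```python
import random

def randomize(units, seed=None):
    """Shuffle the units, then split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(units)       # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

pipes = [f"pipe_{i}" for i in range(20)]  # hypothetical experimental units
treatment, control = randomize(pipes, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Every unit has the same chance of landing in either group, so any hidden characteristic is balanced between the groups on average.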
Observational Studies: Example 1
• “Pet a day keeps the doctor away”
• Those who owned pets had contact with a doctor 8.42 times in a year.
• Those who did not own pets had contact 9.49 times.
• What interpretation is being made?
• Could there be another interpretation?
• Does pet ownership cause good health; or
• Does good health cause pet ownership; or
• Does something else cause both?
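The third possibility can be illustrated with a small simulation in which pet ownership has no effect at all on doctor visits, yet owners still show fewer visits. Every number below (probabilities, means, spread) is hypothetical and chosen only for illustration; none comes from the study quoted above.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate (caring, owns_pet, visits); visits never depend on owns_pet."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        caring = rng.random() < 0.5                         # hidden third factor
        owns_pet = rng.random() < (0.7 if caring else 0.3)  # caring raises ownership
        visits = rng.gauss(8.0 if caring else 10.0, 2.0)    # caring lowers visits
        rows.append((caring, owns_pet, visits))
    return rows

def visit_gap(rows):
    """Mean doctor visits of non-owners minus mean visits of owners."""
    owners = [v for _, pet, v in rows if pet]
    others = [v for _, pet, v in rows if not pet]
    return sum(others) / len(others) - sum(owners) / len(owners)

rows = simulate()
overall_gap = visit_gap(rows)
gap_among_caring = visit_gap([r for r in rows if r[0]])
gap_among_others = visit_gap([r for r in rows if not r[0]])
print(f"overall gap: {overall_gap:.2f}")
print(f"gap within caring group: {gap_among_caring:.2f}")
print(f"gap within non-caring group: {gap_among_others:.2f}")
```

The overall gap looks like a benefit of owning a pet, but it shrinks to roughly zero once people are compared within the same level of the hidden factor.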

[Figure: three diagrams relating pet ownership and good health, each with a link marked "?"; the third adds "a desire to care for others" as a possible third factor.]

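Do storks bring babies?

[Figure: stork nests and human baby births, joined by a link marked "?".]

Do babies cause stork nests?

[Figure: human baby births and stork nests, joined by a link marked "?"; a final diagram adds "month of year" as a possible third factor.]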

Observational Studies: Example 2
• Likely to have a nervous breakdown?

Does marriage make women crazy?

Do crazy women make a point of getting married?

Is there a third factor? Say, rich women are more likely to marry and more likely to be crazy.

Observational Studies: Example 3
• A common finding is that African Americans do not go to college at the same rates as the general population.
• It has been hypothesized that this could be explained by income.
• But, it is consistently found that African Americans do not go to college at the same rates, even controlling for income.
• Discussion over?