Statistics for Science Journalists
Steve Doig, Cronkite School of Journalism
Journalists hate math
Definition of journalist: A do-gooder who hates math.
“Word person, not a numbers person.”
1936 JQ article noting habitual numerical errors in newspapers
Statistics are tools to help us work with measurements that vary
If you found a wallet with $20, would you:
(23% would keep it)
(13% would keep it)
People routinely say they have voted when they actually haven’t, that they don’t smoke when they do, and that they aren’t prejudiced.
One study six months after an election:
Washington Post poll: “Some people say the 1975 Public Affairs Act should be repealed. Do you agree or disagree that it should be repealed?”
Later Washington Post poll: “President Clinton says the 1975 Public Affairs Act should be repealed. Do you agree or disagree that it should be repealed?”
95% of the time, a random sample’s characteristics will differ from the population’s by no more than about 1/√N,
where N = sample size
*Unless the sample is a significant fraction of the population.
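A quick way to see how sample size drives the margin of error: the sketch below assumes the rule of thumb meant here is the common 1/√N approximation and compares it with the standard worst-case formula for a proportion (p = 0.5, z = 1.96). The function names are illustrative, not from the slides.

```python
import math

def rough_margin(n):
    """Rule-of-thumb 95% margin of error (as a proportion) for sample size n."""
    return 1 / math.sqrt(n)

def textbook_margin(n, p=0.5, z=1.96):
    """Standard 95% margin for a proportion p; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

# Print both margins, in percentage points, for a few common poll sizes.
for n in (100, 400, 1000, 2500):
    rough = round(100 * rough_margin(n), 1)
    exact = round(100 * textbook_margin(n), 1)
    print(f"N={n}: rule of thumb {rough} points, worst-case formula {exact} points")
```

Quadrupling the sample only halves the margin, which is why most polls stop at roughly 1,000 respondents.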
Some measures of variability:
Your percentile for a particular measure (like height or IQ) is the percentage of the population that falls below you.
Compared to other American males:
Therefore, I am older and heavier than I am tall.
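The percentile definition can be sketched in a few lines. The heights below are made-up numbers for illustration only, not data from the slides, and `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(value, population):
    """Percentage of the population that falls strictly below the given value."""
    below = sum(1 for x in population if x < value)
    return 100 * below / len(population)

# Hypothetical heights in inches for ten men (illustrative only).
heights_in = [64, 66, 67, 68, 69, 69, 70, 71, 72, 74]

# Six of the ten men are shorter than 70 inches.
print(percentile_rank(70, heights_in))  # 60.0
```

Because percentiles put different measures on the same 0 to 100 scale, they allow comparisons such as "older and heavier than I am tall."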
A standardized score (also called the z-score) is simply the number of standard deviations a particular value is either above or below the mean.
The standardized score is:
Useful for defining data points as outliers.
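A minimal sketch of the z-score calculation, using a small made-up data set. The cutoff of 2 standard deviations for flagging outliers is an assumption chosen for illustration; the slides do not specify one.

```python
import statistics

def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical measurements (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(scores)   # 5
sd = statistics.pstdev(scores)   # 2.0 (population standard deviation)

# Flag values at least 2 standard deviations from the mean (assumed cutoff).
outliers = [x for x in scores if abs(z_score(x, mean, sd)) >= 2]

print(z_score(9, mean, sd))  # 2.0
print(outliers)              # [9]
```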
For any normal curve, approximately:
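Assuming the approximations meant here are the familiar 68-95-99.7 figures, they can be checked directly from the normal curve. This sketch uses the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k/√2); `share_within` is an illustrative name.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share, as a percentage, within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```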
Correlation (also called the correlation coefficient or Pearson’s r) is the measure of strength of the linear relationship between two variables.
Think of strength as how closely the data points come to falling on a line drawn through the data.
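To make "how closely the points fall on a line" concrete, here is a minimal sketch of Pearson's r computed from scratch. Both data sets are invented for illustration: one lies exactly on a line, the other scatters around it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
on_line = [2, 4, 6, 8, 10]   # every point falls exactly on y = 2x
scattered = [2, 5, 3, 8, 7]  # same upward trend, but noisier

print(round(pearson_r(x, on_line), 2))    # 1.0
print(round(pearson_r(x, scattered), 2))  # 0.81
```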
[Figure: scatterplot panels illustrating r = +1, +.8, +.4, +.1, -.1, -.4, -.8 and -1, followed by two panels labeled r = 0 and two panels labeled r = .8]
Correlation does not imply causation.
(Churches and liquor stores, shoe size and reading ability)
Example: pollen counts and percent of population suffering allergies, intercourse and babies
Example: hotel occupancy and advertising spending, divorce and alcohol abuse
Example: birth complications and violence, gun in home and homicide, hours studied and grade, diet and cancer
Example: SAT score and GPA, hot chocolate and tissues, storks and babies, fire losses and firefighters, WWII fighter opposition and bombing accuracy
Example: divorces and drug offenses, divorces and suicides
Example: clusters of disease, brain cancer from cell phones
The only way to confirm is with a designed (randomized double-blind) experiment. But non-statistical evidence of a possible connection may include:
In addition to figuring the strength of the relationship, we can create a simple equation that describes the best-fit line (also called the “least-squares” line) through the data.
This equation will help us predict one variable, given the other.
x = horizontal axis; y = vertical axis
Equation for a line:
y = slope * x + intercept
or as it often is stated:
y = mx + b
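A minimal sketch of how the least-squares slope and intercept can be computed and then used for prediction. The data points are made up for illustration, and `best_fit_line` is a hypothetical helper, not something from the slides.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, for each one-unit change in x.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]

m, b = best_fit_line(x, y)
predicted_at_6 = m * 6 + b  # predict y for a new x using y = mx + b

print(round(m, 2), round(b, 2))   # 1.3 1.1
print(round(predicted_at_6, 2))   # 8.9
```

Predictions are most trustworthy inside the range of the observed x values; extending the line far beyond the data is a guess.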
Confusing these two:
When the incidence of some disease or condition is very low, and the test for it is not perfect, there will be a high probability that a positive test result is a false positive.
Consider this scenario:
(50% of positives are FALSE!)
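The arithmetic behind a result like this can be sketched as follows. The numbers are assumptions picked for illustration (1% of people have the condition, the test catches every real case, and 1% of healthy people test positive); they are not necessarily the scenario from the slide.

```python
def share_of_positives_that_are_false(prevalence, sensitivity, false_positive_rate):
    """Fraction of all positive test results that come from people without the condition."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return false_positives / (true_positives + false_positives)

# Assumed, illustrative inputs: rare condition, seemingly accurate test.
share_false = share_of_positives_that_are_false(
    prevalence=0.01,           # 1 in 100 people has the condition
    sensitivity=1.0,           # the test catches every real case
    false_positive_rate=0.01,  # 1 in 100 healthy people tests positive anyway
)

print(f"{100 * share_false:.0f}% of positive results are false")
```

Out of 10,000 people under these assumptions, 100 sick people test positive and so do 99 of the 9,900 healthy ones, so about half of all positives are false.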