
# Answering research questions





Stuff we’ll cover
• Type I and Type II Error
• Statistical concepts: power, effect size, significance.
• Hypothesis Testing
• Mean, median and mode
• Types of distributions and relationships
• Bayes rule (time permitting)
• Sampling (intuition only).
Types of errors
• Suppose that we are building a sensor network to detect whether people are in a given area (Josh and Isha are doing these kinds of things).
• The sensor gives us data points (independent variables) and our dependent variable is a boolean variable: Someone is here/is not here.
What problems do we face?
• Statistical error – the random error caused by fluctuations and random sampling. For example, what is the probability of snow in January in Boston? This is why we want a large sample size. If the sample covers the whole month over many years, we'll be OK.
Systematic error
• Systematic error – is error caused by the measurement methodology. For example, using a miscalibrated scope. Hopefully, if you can identify (or measure) this error you can remove it.
• Examples of systematic errors which were discussed last class: experimenter bias, sampling bias.
The basic problem that we are attempting to address is whether a given data point belongs to the class we are looking for.
• In machine learning there are two types of errors: false positives and false negatives.
Type I error
• False positives – we (falsely) conclude that someone is here (i.e., that the data point belongs to our class, and therefore we add it) although in fact the data point does not belong (the null). This is also called an error of the first kind or a type I error.
Type II error
• False negatives – the inverse. We (falsely) think that nobody is here although in fact someone is here. This is also called error of the second kind or type II error.
• Often there is a tradeoff between type I and type II errors. Obviously, we can get either kind of error down to zero, but only at the cost of the other: always answering "someone is here" gives zero false negatives but many false positives.
Which type of error is more important?
• Depends what you are looking for.
• For example, when screening for terrorists you really don't want any false negatives.
• In criminal justice, the legal maxim "Better that a thousand guilty men go free than one innocent man be imprisoned" expresses the opposite preference: false positives are considered worse.
Numerical example
• Suppose there is a rare disease that hits 0.001 of the population. Suppose you have a very accurate test which is correct 99% of the time on healthy patients and is always correct on sick patients.
• This means that the false positive rate is 1%.
• Now given someone who tests positive, what is the probability that the person actually has the disease?
The probability that a random person has the disease is 0.001, and the test always catches sick patients, so the probability that a person has the disease and tests positive is 0.001.
• OTOH, for healthy people there is a 1% chance of a false positive, so the probability of a person being healthy and testing positive is 0.999*0.01 ≈ 0.01, which is ten times more likely!
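The arithmetic above can be checked in a few lines (a sketch in Python; the rates are the ones from the slide):

```python
# Sketch of the rare-disease arithmetic (numbers from the slide).
prevalence = 0.001          # fraction of the population with the disease
sensitivity = 1.0           # the test is always correct on sick patients
false_positive_rate = 0.01  # the test is wrong 1% of the time on healthy patients

p_sick_and_positive = prevalence * sensitivity                   # 0.001
p_healthy_and_positive = (1 - prevalence) * false_positive_rate  # 0.00999

# A positive result is ~10x more likely to come from a healthy person.
ratio = p_healthy_and_positive / p_sick_and_positive
print(round(ratio, 2))  # -> 9.99

# Probability of actually being sick given a positive test:
p_sick_given_positive = p_sick_and_positive / (p_sick_and_positive + p_healthy_and_positive)
print(round(p_sick_given_positive, 3))  # -> 0.091
```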
Errors even have symbols
• Alpha – the false positive rate: the number of false positives divided by the number of negatives.
• 1 − alpha is often called the specificity of the test. A more specific test means fewer false positives but typically more false negatives.
• Beta – the false negative rate: the number of false negatives divided by the number of positives. 1 − beta is called the power.
Effect size
• Effect size measures the magnitude of the phenomenon you are observing – how big the difference really is – rather than how likely your measurements are to be mere noise.
• Example: suppose you are observing the height difference between men and women. The effect size is the difference in mean height.
The bigger the effect size, the better. It means that you can use a smaller sample size to detect that there really is a difference.
• There are tests to measure effect size.
• One of the most standard (which will be discussed in the next class) is Pearson's correlation.
Pearson correlation
• It is obtained by dividing the covariance of the two variables by the product of their standard deviations.
• Most statistics packages will let you calculate the correlation without needing to know the formula.
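As a sketch, the formula can be written out directly (the height/weight pairs below are made up for illustration; any statistics package computes the same quantity):

```python
import math

def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Made-up height (cm) / weight (kg) pairs for illustration:
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 66, 70, 78]
print(round(pearson(heights, weights), 3))  # -> 0.995, a strong positive correlation
```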
Power
• We can think of the power of a statistical test as the probability that we will find what we are looking for when it is really there, i.e., that we will avoid a type II error (power = 1 − beta).
• Calculating the power requires first specifying the effect size you want to detect. The greater the effect size, the greater the power.
Although there are no formal standards for power, most researchers who assess the power of their tests use 0.80 as a standard for adequacy. IOW, only with probability .2 will they miss a real effect and give up on the study when they shouldn't have. (The rate of false claims of success is controlled by alpha, not by power.)
• The most common way for increasing power is to increase sample size.
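To build intuition for the sample-size/power connection, here is a rough Monte Carlo sketch. It assumes unit-variance normal data and a one-sided z-test at the 5% level; the effect size and sample sizes are illustrative, not from the slides:

```python
import random
import statistics

def estimated_power(effect_size, n, trials=2000, seed=0):
    """Monte Carlo estimate of the power of a one-sided z-test for a mean.

    Assumes unit-variance normal data with true mean = effect_size;
    the hypothesis under test is mean = 0.  Purely illustrative.
    """
    rng = random.Random(seed)
    z_crit = 1.645  # one-sided 5% critical value of the standard normal
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        z = statistics.mean(sample) * (n ** 0.5)  # sigma is known to be 1
        if z > z_crit:
            hits += 1
    return hits / trials

# A bigger sample gives more power for the same effect size:
print(estimated_power(0.5, n=10))  # modest power
print(estimated_power(0.5, n=25))  # close to the conventional 0.8
```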
Power mistakes
• Power is a property of the hypothesis, not of the experiment.
• If you are checking multiple hypotheses you have to consider the power you need for each one separately.
• For example, looking at male/female height differences, male/female weight differences, and the correlation between the two. These are 3 different hypotheses and therefore 3 different powers.
Significance
• A result is called significant if it is unlikely to have occurred by random chance. "A statistically significant difference" simply means there is statistical evidence that there is a difference.
• Note that this does not measure the effect size.
For example, suppose that European men are .001 cm taller on average than American men, and that height is normally distributed. This is not a very large difference (formally, the effect size is small), but if we look at a large enough sample we can see that the effect is significant.
• Note that significance alone tells you little about effect size: whether a difference shows up as significant depends on the effect size, the sample size, and the underlying distributions.
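The tiny-effect/large-sample point can be sketched numerically. This assumes a known standard deviation of 7 cm (an invented figure) and an idealized z-test in which the sample mean lands exactly on the true difference:

```python
import math

def two_sided_p(effect, sigma, n):
    """p-value of a z-test for a mean difference `effect`, known sigma, n samples.
    Illustrative: assumes the observed mean equals the true difference exactly."""
    z = effect * math.sqrt(n) / sigma
    return math.erfc(z / math.sqrt(2))  # two-sided tail probability

# A height difference of 0.001 cm with sigma = 7 cm (assumed numbers):
for n in (10_000, 10**9, 10**12):
    print(n, two_sided_p(0.001, 7.0, n))
# With 10,000 samples the difference is invisible (p near 1);
# with a billion or more it becomes highly significant.
```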
When is something significant?
• Since significance is a continuous quantity, there is no theoretical justification for using a cutoff rather than merely stating the significance level.
• However, in practice there are two widely used cutoffs: the .05 and the .01 levels.
• Saying something is significant at .05 means that if there were no real difference, a difference this large would arise by random chance less than 5% of the time.
Dangers
• The commonly used significance levels are for a single test – if you use multiple tests (or comparisons) you need to be careful.
• If you can't show that something is significant, it does not mean that there is no difference. It just means you didn't find a difference.
Some simple things to look for
• Suppose we have n data points a1, …, an ordered from smallest to largest:
• Mean – a fancy name for the average.
• Median – the middle value of the ordered list (the value at position ⌈n/2⌉ when n is odd).
• Mode – the value that appears most often.
Numerical example
• Suppose our data set is {1,1,1,4,8,9,11}
• The mean is:(1+1+1+4+8+9+11)/7=35/7=5
• The median is 4 since there are 3 values larger than it and 3 values smaller than it.
• The mode is 1 since that value appears 3 times, which is more than any other value.
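The same numbers fall out of Python's standard library:

```python
import statistics

# The data set from the slide:
data = [1, 1, 1, 4, 8, 9, 11]
print(statistics.mean(data))    # -> 5
print(statistics.median(data))  # -> 4
print(statistics.mode(data))    # -> 1
```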
Hypothesis testing
• Hypothesis testing is the test of whether what we are looking for is actually supported by the data.
• In order to do hypothesis testing we must do the following:
• The hypothesis must be stated in mathematical/statistical terms. This was taught last class.
A test statistic must be chosen that will summarize the information in the sample that is relevant to the hypothesis. Such a statistic is known as a sufficient statistic. Some examples are the mean, median and mode.
• The distribution of the test statistic under the hypothesis is used to approximate the probability of the observed outcome.
Among all the sets of possible values of the test statistic, we must choose one (the critical region) that we think represents the most extreme evidence against the hypothesis. The probability of the test statistic falling in the critical region when the hypothesis is correct is called the alpha value (or size) of the test.
• Think of this stage as the devil's advocate stage: we are trying to find alternative explanations.
We calculate the test statistic on the data. Note that you should have chosen the test statistic before you actually start measuring.
• If the test statistic is significant, then one of the following holds:
• The hypothesis is false (and we reject it).
• An event of probability less than or equal to alpha has occurred.
• If the alpha is small enough we can go publish our paper.
If the test statistic is not significant, the only conclusion is that

There is not enough evidence to reject the hypothesis. (But the effect we were looking for could still be real – maybe we should redesign the experiment.)
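The steps above can be sketched end to end on a made-up example: testing whether a coin is fair, with the number of heads as the (pre-chosen) test statistic and a normal approximation to the binomial. All numbers are invented for illustration:

```python
import math

# Hypothesis (stated statistically): the coin is fair, p(heads) = 0.5.
# Test statistic (chosen before collecting data): number of heads in n flips.
n, heads = 1000, 540  # made-up experimental outcome

# Under the hypothesis, heads ~ Binomial(n, 0.5), approximately
# Normal(n/2, sqrt(n)/2) for large n.
z = (heads - n / 2) / (math.sqrt(n) / 2)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

alpha = 0.05
print(round(z, 2), round(p_value, 4))
if p_value < alpha:
    print("significant: either the coin is biased, "
          "or an event of probability <= alpha occurred")
else:
    print("not enough evidence to reject the hypothesis")
```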

Which is most important?
• Depends on what you are looking for.
• Mean – e.g., describing the average shopper.
• Median – important when we want to discount the extremely wealthy in a long-tailed distribution, e.g., median salary.
Types of distributions
• The most commonly used distribution is the normal or Gaussian distribution. Any variable that can be modeled as a sum of many small independent variables is approximately normal (by the central limit theorem).
• Uniform distribution – every point has equal probability.
• Bernoulli distribution – takes value 1 with probability p and value 0 with probability q = 1 − p.
• Many more distributions exist.
• Joint distributions.
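A sketch of drawing from these distributions, plus the central-limit intuition: a sum of many independent Bernoulli draws looks approximately normal. The parameters below are arbitrary:

```python
import random
import statistics

rng = random.Random(42)

uniform_draw = rng.uniform(0, 1)                 # uniform on [0, 1]
bernoulli_draw = 1 if rng.random() < 0.3 else 0  # Bernoulli with p = 0.3

# Central limit intuition: a sum of 1000 Bernoulli(0.3) draws is
# approximately Normal with mean n*p = 300 and stdev
# sqrt(n*p*(1-p)) ~ 14.5.
sums = [sum(1 if rng.random() < 0.3 else 0 for _ in range(1000))
        for _ in range(500)]
print(round(statistics.mean(sums)))   # close to 300
print(round(statistics.stdev(sums)))  # close to 14.5
```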
How can we tell which distribution we are looking at?
• If we know what we are looking at and we are (very) lucky, then we can show that our distribution is of a particular kind.
• In general – we can't.
• EM (expectation-maximization) – out of scope.
How can we describe our data?
• If the distribution is nice (and we know that) we are in luck.
• In general we need to give all of the data points.
• It is easy to construct two different distributions with the same mean.
Basically we can always construct two distributions which appear the same for any finite number of observations.
• In practice – people generally state expectation and variance and assume that the distribution is normal.
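For instance, two small made-up data sets can share both mean and (essentially) variance while being clearly different in shape:

```python
import statistics

# Two hypothetical data sets with the same mean and nearly the same
# variance, but obviously different shapes:
a = [-1, -1, 1, 1]         # all mass at +/-1
b = [-1.414, 0, 0, 1.414]  # most mass at 0, occasional large values

print(statistics.mean(a), statistics.mean(b))            # both 0
print(statistics.pvariance(a), statistics.pvariance(b))  # both ~1.0
```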
Bayes rule
• Bayes' rule is a law that relates conditional probabilities to marginal probabilities.
• From the definition of conditional probability, Pr(A|B) = Pr(A and B)/Pr(B), it follows that Pr(A|B) = Pr(B|A)Pr(A)/Pr(B).
• There are many situations where you know some of these quantities and can therefore compute the rest.
Numerical example
• Suppose we go back to our medical test. Now suppose the test is correct 99% of the time both on people with the disease and on people without.
• Now suppose someone tests positive. What is the probability that they actually have the disease?
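The answer follows from Bayes' rule, plugging in the slide's numbers (a sketch; the result is roughly a 9% chance, still small despite the accurate test):

```python
# Bayes' rule applied to the second test:
#   Pr(sick | positive) = Pr(positive | sick) * Pr(sick) / Pr(positive)
prevalence = 0.001
accuracy = 0.99  # correct on both sick and healthy people

p_positive = prevalence * accuracy + (1 - prevalence) * (1 - accuracy)
p_sick_given_positive = prevalence * accuracy / p_positive
print(round(p_sick_given_positive, 3))  # -> 0.09
```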
More stuff you should know
• We will not cover sampling techniques due to lack of time. You should read up on them.
• Random sampling.
• Cluster sampling.
• Gibbs sampling.
• Multistage sampling.
• Quota sampling.
• Survey sampling.