Exploring Statistical Concepts: Hypothesis Testing and Signal Detection Theory

Hypothesis Testing (working from incomplete information): • Jury deliberations • Binomial distribution • Poisson distribution and quantal release • Normal distribution: standard deviation • Stdev of samples of size N • Estimating population statistics from small samples • Student’s t-test • Predicting the future • Non-parametric statistics: Difference of Proportions • The Black Swan… • Correlation • Fuzzy Logic and fuzzy controllers…

handouts: • Selected pages from chapters 7 and 10 of Loftus & Loftus, Essence of Statistics, 2nd Ed (Knopf, 1988) • You may have seen some of this material from AM0650, or AM1650…

http://www.stat.brown.edu/ • 12 biostatisticians (ScM level) on call • “Our mission is to foster research and statistical education at Brown Medical School and the University at large. Center faculty and staff conduct methodologic research in Biostatistics and interdisciplinary research in a broad range of areas of Medicine and Public Health. The Center is home to the graduate program in Biostatistics and the undergraduate statistics concentration at Brown, and organizes the Brown Statistics Seminar Series.” • my guy: Brad Snyder…

Hypothesis matrix for a jury

Null hypothesis is true (innocent verdict for innocent person)

Alternative hypothesis is true (guilty verdict for guilty criminal)

Alternative hypothesis is true (innocent verdict for guilty criminal)

Null hypothesis is true (guilty verdict for innocent citizen)

Signal detection theory: http://wise.cgu.edu/sdtmod/index.asp http://teachline.ls.huji.ac.il/72633/SDT_intro.pdf

SDT: All about a “detector” making decisions • Is the “detector” (who can be a human making decisions/judgments) prone to false positives or misses? (mistakes) • Hits and rejections are correct answers… • Misses → a “conservative” detector • False Positives → aggressive, optimistic, paranoid, indefatigable… • Not just about finding significant differences between samples A and B… • A Grand Jury can be classified as prone to “guilty” or prone to “innocent”…

from WISE cgu.edu SDT website: • “SDT is a method of modeling the decision making process for someone who decides between different classes of items (e.g., friend or [foe]) and their bias to favor a particular type of response.” • Jury selection (voir dire); jury consultants; hung juries, mistrials; Louisiana v. Morgan—”jury of peers;” civil rights…

from WISE SDT website , p.3: • “Note that Misses and Correct Rejections are redundant with Hits and False Alarms. • The miss rate is 10/50 which is .20 or simply (1 - "hit rate") and the Correct Rejection rate is 45/50 or .90 or (1 - "false alarm rate"). • Therefore, you can perfectly describe all four measures of a person's performance in a signal detection experiment through their Hit and False Alarm rates.”

Detector sensitivity: d’ • “The most commonly used SDT measure of sensitivity is d' (d prime), which is the standardized difference between the means of the Signal Present and Signal Absent distributions. To calculate d', we need only know a person's hit and false alarm rates. • The formula for d' is as follows: d' = z(FA) - z(H) • where FA and H are the False Alarm and Hit rates, respectively, that correspond to right-tail probabilities on the normal distribution.”
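As a rough sketch of the d' formula quoted above, the Python below uses the counts implied by the WISE example (10 misses out of 50 signal trials, 45 correct rejections out of 50 noise trials). The helper names `z_right` and `d_prime` are made up for this illustration; only `statistics.NormalDist` is a real library call.

```python
from statistics import NormalDist

def z_right(p):
    """z value whose RIGHT-tail area under the standard normal equals p."""
    return NormalDist().inv_cdf(1.0 - p)

def d_prime(hit_rate, fa_rate):
    """d' = z(FA) - z(H), using right-tail z values as in the quoted formula."""
    return z_right(fa_rate) - z_right(hit_rate)

# Counts implied by the quoted rates: miss rate 10/50, correct rejection rate 45/50.
hits, misses = 40, 10
false_alarms, correct_rejections = 5, 45

hit_rate = hits / (hits + misses)                            # 0.80
fa_rate = false_alarms / (false_alarms + correct_rejections)  # 0.10

print(round(d_prime(hit_rate, fa_rate), 2))  # about 2.12
```

Because the z values are right-tail, a higher hit rate gives a smaller z(H), so d' comes out positive for a detector that performs better than chance.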

Criterion • Criterion is a measure of the willingness of a respondent to say 'Signal Present' in an ambiguous situation. • The choice of a criterion may depend on perceived consequences of outcomes. • For example, if the consequences are costly for saying 'Signal Present' when the signal actually is absent, then a respondent may generally be less willing to say 'Signal Present.' • On the other hand, if the consequences are more costly for failing to detect a signal when it is present, then a respondent may be more willing to say 'Signal Present.' • Positive Criterion » more willing to say “yes, I saw it” • The ROC is the locus of Criterion points…

SDT Summary • “Signal Detection Theory (SDT) allows an analyst to separate sensitivity from response bias. Observers are assumed to make decisions based upon information derived from two distributions. The first (Signal Absent) is assumed to represent a background level of "noise." The second distribution (Signal Present) represents an increase to a background level of noise caused by the introduction of a stimulus. That is why the second distribution is sometimes referred to as the 'Signal + Noise' distribution. • An observer's sensitivity, as indexed by d', is how well the observer can differentiate items coming from the Signal Absent and Signal Present distributions. Criterion (i.e., response bias) represents the minimum level of internal certainty needed for the observer to decide that a signal was present. • ROCs represent the relationship between hits and false alarms, and can be used to describe performance in terms of d'. SDT has applications in fields such as medical diagnosis, bioinformatics, psychology, and engineering.”

Receiver Operating Characteristics (ROCs) • “The receiver-operating characteristic (ROC) is a fundamental plot in signal detection theory. A ROC is essentially a scatterplot that shows the relationship between false alarm rates on the x-axis, and hit rates on the y-axis. ROCs describe the relationship between the underlying Signal Absent and Signal Present distributions.”

Null hypothesis, the scientific method, and troubleshooting • Some independent variable (input) has been changed in the experiment. • The output is the dependent variable. • The null hypothesis: That the independent variable has no effect on the dependent variable. • You want to design an experiment to test whether the null or alternative hypothesis is true. • Something goes wrong with a circuit: Test your hypothesis as to why. (Lab ADA example…) • Horace Barlow: direction selectivity in rabbit retinal ganglion cells: 2 alt hypotheses, test between them, so as not to favor one…

ordered combinations and Pascal’s triangle • An ordered combination deals with items that have individual labels, such as their place in a row… • The number of ordered combinations of N things taken r at a time is $C(N, r) = \frac{N!}{r!\,(N-r)!}$ • Pascal’s triangle shows C(N, r) with N as the row and r as the “column” http://ptri1.tripod.com/

Binomial formula • Consider a random variable that can be in one of two states: “success” or “failure” • The probability of exactly r successes out of N attempts is $P(r) = C(N, r)\, p^r q^{N-r}$, where p is the probability of success and q = 1 - p is the probability of failure • another use of the formula:

Binomial distribution in EXCEL • The probability that 3 or fewer coin flips come up heads out of 10 tosses of a fair coin: =BINOMDIST(3, 10, 0.5, 1) = 0.172, where the final argument 1 is the cumulative flag • Also see =COMBIN(10, 2) for EXCEL version of number of combinations of 10 things taken 2 at a time… • or try MATLAB function nchoosek(N, k), say nchoosek(10, 2)...

Or solve using Pascal’s triangle: • find the row with “10” as the second number • where 1, 10, 45, 120 are the number of combinations of 10 things taken “0”, 1, 2, 3 at a time • The number 0 represents that none of the coins came up heads; there’s only 1 way that can happen… • Sum the four counts and divide by 2^10 = 1024: (1 + 10 + 45 + 120)/1024 = 176/1024 = 0.172
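The coin-flip calculation can be checked in a few lines of Python. This is only a sketch using the standard-library `math.comb`; the variable names are illustrative and not taken from the course files.

```python
from math import comb

n_flips = 10     # tosses of a fair coin
max_heads = 3    # "3 or fewer heads"

# Entries 0..3 of the Pascal's-triangle row for N = 10.
ways = [comb(n_flips, r) for r in range(max_heads + 1)]

# Every sequence of 10 fair flips is equally likely, with 2**10 sequences in all.
p_at_most_3 = sum(ways) / 2 ** n_flips

print(ways)                   # [1, 10, 45, 120]
print(round(p_at_most_3, 3))  # 0.172
```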

Why roulette is my favorite form of gambling • C:\MatlabR12\work\JDD\roulette13.m • Pascal’s catastrophe… • Thomas Bass, Newtonian Casino, Penguin (1991)

Poisson distribution • $P(k) = \frac{(np)^k e^{-np}}{k!}$, where n*p is the average for N trials… • Or try EXCEL =poisson(0, 4.7, 1) = 0.009095, looking at the probability that there would be no release for one stimulation where n*p = 4.7, and say n = 470 and p = 0.01 • compare to =binomdist(0, 470, 0.01, 1) = 0.008883 …close
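A minimal Python sketch of the same comparison, assuming the standard Poisson and binomial probability formulas. The function names `poisson_pmf` and `binomial_pmf` are illustrative, not EXCEL or course code.

```python
from math import comb, exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k events when the average count is `mean`."""
    return mean ** k * exp(-mean) / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 470, 0.01          # the slide's example values, n*p = 4.7
p0_poisson = poisson_pmf(0, n * p)
p0_binomial = binomial_pmf(0, n, p)

print(round(p0_poisson, 6), round(p0_binomial, 6))
```

For k = 0 the cumulative and non-cumulative forms coincide, which is why the EXCEL calls with the cumulative flag set give the same numbers.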

Vesicle release from synapse • work of Bernard Katz (Nobel Prize, 1970) • Example: epsp = "excitatory post-synaptic potential". Of 198 stimulus impulses 18 resulted in no epsp (failure of release). • (image from Tepper, at Rutgers...) presynaptic bouton on left. • 78 spontaneous epsp’s were observed (next slide), average height 0.4 mV.

for the event count of m*0.4mV add up 3 neighboring bins, except for 0. Spontaneous vs evoked epsp’s …

Fitting the data • Katz’ data are very well fit by a Poisson distribution with n*p = 2.33, the only free parameter in the equation. • What is n in n*p? Not 198, the number of shocks to the presynaptic axon. • n is the number of vesicles in the presynaptic synapse. est: n = 800, so p = 2.3/800 = 0.002875
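One quick consistency check on the fit, sketched in Python: if release is Poisson with mean 2.33, the predicted number of failures in 198 shocks should be close to the 18 failures reported. This checks only the zero-release bin and is not a reconstruction of the full fit.

```python
from math import exp

n_shocks = 198            # stimulus impulses in the example
observed_failures = 18    # impulses that produced no epsp
mean_quanta = 2.33        # fitted n*p from the slide

# Poisson probability of zero quanta released is exp(-n*p).
p_failure = exp(-mean_quanta)
expected_failures = n_shocks * p_failure

print(round(expected_failures, 1), "expected vs", observed_failures, "observed")
```

The prediction comes out at roughly 19 failures, which is in line with the observed 18.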

How many vesicles are there per pre-synaptic bouton? • Anywhere from hundreds to thousands. • One estimate says 987 vesicles per cubic micron. • There are docking vesicles ready to be released, and reserve vesicles, recently reconstituted and away from the membrane that is facing the synaptic cleft. • At any rate, p is the probability that one vesicle will be released (due to one pre-synaptic shock...)

Quiz example of quantal release question: • Suppose there are 700 vesicles at a synapse and each has 0.002 probability of being released by one pre-synaptic shock. What is the expected number of shocks out of 200 that will result in no vesicles being released? • n*p = 700*0.002 = 1.4, the mean released… • =POISSON(0, 1.4, 1) ≈ 0.25 • 0.25*200 = 50, so about 50 shocks will result in no vesicle being released • Check against the exact binomial: =BINOMDIST(0, 700, 0.002, 1) = 0.246252
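The quiz arithmetic can be reproduced in Python as a sanity check. The sketch below assumes independent release of each vesicle, as the binomial model does; the variable names are illustrative.

```python
from math import exp

n_vesicles = 700
p_release = 0.002
n_shocks = 200

mean_released = n_vesicles * p_release          # 1.4 vesicles per shock on average

p_none_poisson = exp(-mean_released)            # Poisson P(0)
p_none_binomial = (1 - p_release) ** n_vesicles  # exact binomial P(0)

expected_failures = n_shocks * p_none_poisson

print(round(p_none_poisson, 4), round(p_none_binomial, 6))
print(round(expected_failures, 1))  # a little under 50
```

Without rounding P(0) up to 0.25, the expected count is slightly below 50 shocks.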

A giant has swallowed 6 dwarfs numbered Di; you hit him on the back and he coughs up N D’s; how many he coughs up fits a binom dist: avg 3 • Binomial example of 2 giants coughing… • % Binom_samp_sze2 11.4.14 • % compare std of sample size 2 from binomdist of 6 • % assume 50% probability of success • % possible to cough up 0… • pasc_7 = [ 1 6 15 20 15 6 1] % total of 64… • to_6 = [ 0 1 2 3 4 5 6 ] % avg = 21/7 = 3 • dot_prod = sum(pasc_7 .* to_6) • avg1 = dot_prod/sum(pasc_7) • OR Prob of two times of 5 or 6 = (7/64)^2 ≈ 1%

Normal (Gaussian, Bell-shaped) Distribution Say the mean of the data is μ and the standard deviation is σ

cumulative normal probability density function • 0 mean, 1 stdev • from z = -1.96 to +1.96 is 95% of the area under the curve
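The 95% figure can be confirmed with Python's standard-library `statistics.NormalDist`. This is a small sketch; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the standard normal curve between z = -1.96 and z = +1.96.
area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the z that leaves 2.5% in the upper tail.
z_975 = std_normal.inv_cdf(0.975)

print(round(area, 3), round(z_975, 2))  # 0.95 1.96
```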

The Black Swan*: How can you tell if your data are NOT normally distributed? • mean ≠ median, or • CPDF not sigmoid-shaped, or • PDF has “barbell” distribution or • Fat-tailed asymmetric distribution or • Data “fails” Chi-squared test… *The Black Swan: The Impact of the Highly Improbable, Nassim Nicholas Taleb, Random House (2007)

Binomial becomes Normal (SAT?) • Consider a binomial distribution with p = 0.5 • p(x) vs x will be a symmetric up-down staircase curve • As the number of “coin flips” N in the binomial data set increases, the curve will look smooth and “normal” • standard deviation of binomial dist = $\sqrt{Npq}$ • connecting the dots of a 40-point binomial plot…
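To see how close the binomial "staircase" gets to a normal curve, the Python sketch below compares the two at every point for N = 40 and p = 0.5. The choice of N = 40 follows the 40-point plot mentioned above; treating that as the number of flips is an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.5
q = 1 - p

mu = n * p                 # mean of the binomial, 20
sigma = sqrt(n * p * q)    # standard deviation, sqrt(N*p*q)

normal_approx = NormalDist(mu, sigma)

# Largest pointwise gap between the binomial probabilities and the normal density.
max_gap = max(
    abs(comb(n, k) * p ** k * q ** (n - k) - normal_approx.pdf(k))
    for k in range(n + 1)
)

print(round(sigma, 3), round(max_gap, 4))
```

The peak probability is about 0.125, so a gap below 0.001 means the two curves are nearly indistinguishable on a plot.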

Are you smarter than a 10th grader? • Sample of one: • Suppose you score 600 on the SAT math test • The average 10th grader scores 500 • Standard deviation of the SAT = 100 • What's the probability that you're smarter than a 10th grader? • You did receive a higher score, but (in EXCEL) • =NORMDIST(600, 500, 100, 1) = 0.84 • 1-0.84 = 16% • 16% = one-tailed probability that someone will score 600 or more. • You're in the 84th percentile. • You’re not significantly smarter than a 10th grader…

Are you and your 9 left-handed friends smarter than a 10th grader? • Say the mean of sample size 10 is 600… • Best est. of mean of the means of many samples of size 10 is the mean of the one sample, 600… • What is the best estimate of the variance of the means of many samples of 10 scores drawn from a SAT distribution? • Note: it doesn’t matter what the particular standard deviation of the one sample is…if pop. σ known • Sample of 10, with (average of the 10) = 600 on SAT math test • The standard deviation of the means of samples of size 10 is sqrt(10000/10) = sqrt(1000) = 31.6 • =NORMDIST(600, 500, 31.6, 1) = 0.9992, wildly significant • Example of testing sample against a known population
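Both SAT examples (one score of 600, and a sample of 10 averaging 600) can be reproduced in Python with `statistics.NormalDist` standing in for EXCEL's NORMDIST with the cumulative flag set. This is a sketch using the slide's population values; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pop_mean, pop_sd = 500.0, 100.0   # known population values from the slides
score = 600.0

# Sample of one: compare the score with the population itself.
p_single = NormalDist(pop_mean, pop_sd).cdf(score)

# Sample of 10: means of samples of size N have stdev sigma / sqrt(N).
n = 10
sd_of_means = pop_sd / sqrt(n)
p_sample_mean = NormalDist(pop_mean, sd_of_means).cdf(score)

print(round(p_single, 2))        # 0.84
print(round(sd_of_means, 1))     # 31.6
print(round(p_sample_mean, 4))   # 0.9992
```

The same 100-point difference goes from unremarkable to highly significant because averaging 10 scores shrinks the spread by a factor of sqrt(10).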

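Comparing two variants of a population • What about comparing two experimental groups from a known population? • Form a normalized z term: $z = \frac{M_{s2} - M_{s1}}{\sigma\sqrt{1/n_1 + 1/n_2}}$, where Ms1 and Ms2 are the means of the two groups, n1 and n2 are the group sizes, and σ is the known population standard deviation. • We are interested in the difference of the means here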

Comparing two variants of a population (cont) • Suppose it's known that the average area of maple leaves on the ground in October is 28 cm-sq, with a standard deviation of 5 cm-sq. • A sample of 12 Japanese maple leaves has an average area of 34 cm-sq, std 4 cm-sq • Someone else comes in and says that a sample of 18 “big leaf” maple leaves had an average area of 38 cm-sq, unknown standard deviation. • Is it significant at the 5% level that big leaf maple leaves are larger than Japanese maple leaves? (When was the hypothesis conceived?) • Using the known population standard deviation, the estimated std_dev of the difference of the means is 5*sqrt(1/12 + 1/18) = 1.86, • →z = 4/1.86 = 2.15 and without having to actually calculate z, =NORMDIST(4, 0, 1.86, 1) = 0.984 • The significance is 1.6% < 5% Answer: yes the difference is significant.
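A short Python sketch of the maple-leaf comparison, assuming (as the 1.86 figure implies) that both samples are judged against the known population standard deviation of 5 cm-sq rather than the sample standard deviation of 4.

```python
from math import sqrt
from statistics import NormalDist

pop_sd = 5.0                           # known population std dev, cm-sq
n_japanese, mean_japanese = 12, 34.0   # sample of Japanese maple leaves
n_bigleaf, mean_bigleaf = 18, 38.0     # sample of big leaf maple leaves

# Standard deviation of the difference between the two sample means.
sd_diff = pop_sd * sqrt(1 / n_japanese + 1 / n_bigleaf)

z = (mean_bigleaf - mean_japanese) / sd_diff

# One-tailed probability of a difference this large under the null hypothesis.
p_one_tailed = 1 - NormalDist().cdf(z)

print(round(sd_diff, 2), round(z, 2), round(p_one_tailed, 3))
```

The one-tailed probability comes out at about 1.6%, matching the slide; a two-tailed reading would double it to roughly 3.2%, still under 5%.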

More Maple Leafs (evs?) • work\fold23\MapleLeafSizeScript12 • MapleTST.xls • Tools\Data Analysis\z-test for 2 means

The paradox of two tails • The area in yellow must be less than 5% of the total for the two-tailed test to be significant. • A two-tailed test is 2x more difficult to pass than a one-tailed test.

Digression for November elections • A qualifier seen in news articles about political polling: 3.1% margin of error… • Suppose X voters out of N sampled will vote for your candidate. What is the number of voters N needed in a sample to ensure that 95% of the time the actual percentage of voters underlying your candidate's percentage of X/N will be within ±3.1 percent of X/N? • This question, whose answer is N=1000 and whose derivation is here, is different from the question: • What should be N such that you're confident at the 95% level that the range of poll percentages is ±3 percent of X/N if you repeated the poll many times?

Number of voters needed in a poll
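One common textbook route to N = 1000 is sketched below in Python. It assumes the worst case p = 0.5 and a normal approximation to the binomial; this may not be the exact derivation the slide refers to, and the function name `poll_sample_size` is made up for this illustration.

```python
from math import ceil
from statistics import NormalDist

def poll_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest N with z * sqrt(p*(1-p)/N) <= margin (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.96 for 95%
    return ceil(z * z * p * (1 - p) / margin ** 2)

print(poll_sample_size(0.031))  # 1000
```

Setting p = 0.5 maximizes p*(1-p), so the resulting N is large enough for any true level of support.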

Men’s height (age 20-40) and %-age of 7-footers in NBA • “guys who are just tall…” • http://www.truthaboutit.net/2012/05/true-or-false-half-of-all-7-footers-are-in-the-nba.html • CDC data: for age 20, mean = 69.8”, std_dev = 2.8” • (84-69.8)/2.8 = 5.07 = z (num of std_dev out) • 1-NORMDIST(84, 69.8, 2.8, 1) = 2 × 10^-7 • 320M/2 = 160M; ¼*160 = 40M men age 20-40 • →40 × 10^6 * 2 × 10^-7 = 8, too small of a number: • fat tail distribution…
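The 7-footer estimate is easy to reproduce in Python. The sketch uses the slide's CDC mean and standard deviation and its rough count of 40 million men aged 20-40; it computes the upper tail from the lower tail at -z, which avoids subtracting a number very close to 1.

```python
from statistics import NormalDist

mean_height, sd_height = 69.8, 2.8   # inches, from the slide
seven_feet = 84.0                    # inches

z = (seven_feet - mean_height) / sd_height

# Upper-tail area beyond z, computed as the lower tail at -z for numerical accuracy.
tail = NormalDist().cdf(-z)

men_20_to_40 = 40e6                  # the slide's rough population estimate
expected_seven_footers = men_20_to_40 * tail

print(round(z, 2))                   # 5.07
print(round(expected_seven_footers))  # about 8
```

A normal model predicts only about 8 such men in the whole country, which is the slide's evidence that real height data have fatter tails than the normal curve.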

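Estimating unknown population variance • Suppose the statistics of the underlying population are unknown… • What is the best estimate of population variance? The sample variance with N-1 in the denominator: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$ • Remember from AM65?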

t-distributions • Once we enter a world of unknown population statistics, where we rely on the small sample data alone, we end up dealing with t-distributions--examples below for 3 and 6 deg of freedom… • Contained within EXCEL are the “t-tables” for each degree of freedom N-1.

Two tails example: Comparing to a standard with TDIST • Suppose I do an experiment to see how close people can come to guessing my weight. I ask ten people. • I know my exact weight, but don't know the standard deviation of all guesses, only the stdev of a sample of 10 guesses. • Next, I estimate the variance of the population from the sample, using N-1 = 9 in the denominator. • Then I divide the est. variance by N = 10 to find the est. variance of the means of samples of size 10, σ². • Now I calculate t = (diff_mean – wt)/σ and have EXCEL compute • =TDIST(ABS(t), 9, 1), where 9 is the degrees of freedom and the last argument is the number of tails • But what if I don’t care if they’re high or low, just wrong in either direction? Time for 2-tails? see weight sheet… • WeightEst12.xls on the screen

Two tailed test example (cont) • Suppose all the guesses are too high. • Can I do a one-tailed test concerning the hypothesis that people overestimate my weight? • NO! • The hypothesis was conceived after collecting the data. • The two-tailed criterion must be used. • Whatever the one-tailed (normal) significance, I must multiply it by 2. (considering significance to be a small number…) • The hypothesis: It is significant that people are wrong about my weight, guessing either too high or too low. • May lead to a Difference of Proportions test with threshold. • Or what about using the absolute value of the “error” as the data?

Example: femurs in lemurs • Suppose a sample of 9 femurs from ring-tailed lemurs shows their mean length to be 20 cm, and that the variance of the length is 9 cm². • What is the probability that the ringtail lemur femurs are NOT from a population of mean length 24 cm? • A one-tailed test?
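A minimal Python sketch of the lemur calculation. It assumes the stated variance of 9 is the sample-based estimate of the population variance (N-1 in the denominator). The standard library has no t-distribution, so the sketch compares t against the tabulated two-tailed 5% critical value for 8 degrees of freedom (2.306) instead of computing a p-value.

```python
from math import sqrt

n = 9                      # femurs in the sample
sample_mean = 20.0         # cm
est_variance = 9.0         # cm^2, assumed to be the N-1 estimate
hypothesized_mean = 24.0   # cm

# Standard deviation of the means of samples of size n.
sd_of_means = sqrt(est_variance / n)

t = (sample_mean - hypothesized_mean) / sd_of_means
deg_freedom = n - 1

T_CRIT_TWO_TAILED_05 = 2.306   # tabulated t value, 8 deg of freedom, 5% two-tailed
significant = abs(t) > T_CRIT_TWO_TAILED_05

print(t, deg_freedom, significant)  # -4.0 8 True
```

In EXCEL the corresponding two-tailed probability would come from =TDIST(4, 8, 2).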

Example 10-5 from Loftus & Loftus, then use =TTEST(A1, A2, tails, type)

How can you use the EXCEL tools if all you have is sample size, mean and stdev? • Create your own sample with the same mean and standard deviation: • The cpdf--cumulative probability density function--is the integral of the probability distribution. • Sample at equal intervals of the cumulative probability y-axis; pick off the associated z values, then un-normalize the z’s: x = σ*z+μ • See test code in folder fold23 function [pdfx, pdfy, samp, samp3, std1, std3] = pdf_tst12(52, 100, 30, 1); • The result will be a sample with a slightly smaller variance than the underlying population. • Tweak that data to get the exact stdev, and use EXCEL on the resulting synthetic sample. • rev12 has rand(div, 1) generate from a UNIFORM distribution…
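The recipe above can be sketched in Python as follows. This is not the course's `pdf_tst12` code; the function name `synthetic_sample`, the choice of midpoint probabilities (i + 0.5)/n, and the example values n = 30, mean 100, stdev 15 are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def synthetic_sample(n, mu, sigma):
    """Build n values whose sample mean and stdev equal mu and sigma exactly."""
    # Sample the cumulative-probability axis at equal intervals (cell midpoints).
    probs = [(i + 0.5) / n for i in range(n)]
    # Pick off the associated z values from the inverse cumulative normal.
    z = [NormalDist().inv_cdf(p) for p in probs]
    raw_sd = stdev(z)   # comes out slightly below 1 because the tails are truncated
    z_mean = mean(z)
    # "Tweak" step: rescale so the sample stdev is exact, then un-normalize.
    sample = [mu + sigma * (zi - z_mean) / raw_sd for zi in z]
    return sample, raw_sd

# Hypothetical inputs: only the sample size, mean, and stdev are known.
sample, raw_sd = synthetic_sample(30, 100.0, 15.0)

print(round(raw_sd, 3))
print(round(mean(sample), 6), round(stdev(sample), 6))
```

The resulting list can be pasted into a spreadsheet column and fed to tools such as TTEST that expect raw data rather than summary statistics.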

How long to find the T?
