
Lecture 6


Presentation Transcript


  1. Lecture 6: Bootstraps and Maximum Likelihood Methods

  2. Bootstrapping: a way to generate empirical probability distributions. Very handy for making estimates of uncertainty.

  3. 100 realizations of a normal distribution p(y) with ȳ = 50, σ_y = 100.

  4. What is the distribution of y_est = (1/N) Σ_i y_i ?

  5. We know this should be a normal distribution with expectation ȳ = 50 and variance σ_y²/N, i.e. a standard deviation of σ_y/√N = 10. [Figure: p(y) versus y, and the much narrower p(y_est) versus y_est.]
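
For reference, the standard result being appealed to here, written out (not on the original slide):

  \hat{y} = \frac{1}{N}\sum_{i=1}^{N} y_i \;\sim\; \mathcal{N}\!\left(\bar{y},\ \frac{\sigma_y^{2}}{N}\right),
  \qquad \text{standard error } \frac{\sigma_y}{\sqrt{N}} = \frac{100}{\sqrt{100}} = 10 .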

  6. Here’s an empirical way of determining the distribution, called bootstrapping.

  7. Draw N random integers in the range 1 to N, for example 4, 3, 7, 11, 4, 1, 9, …, 6. Use them to resample the N original data y1, y2, y3, …, yN into N new data y'1 = y4, y'2 = y3, y'3 = y7, y'4 = y11, y'5 = y4, y'6 = y1, y'7 = y9, …, y'N = y6. Compute the estimate (1/N) Σ_i y'_i. Now repeat a gazillion times and examine the resulting distribution of estimates.

  8. Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y'. Note: the same datum, y_i, may appear several times in the new dataset, y'.

  9. Imagine a pot containing an infinite number of y's with distribution p(y), and a cup of N y's drawn from the pot. Does a cup drawn from the pot capture the statistical behavior of what's in the pot?

  10. Take one cup from the original pot p(y), duplicate the cup an infinite number of times, and pour the duplicates into a new pot. Is what is in the two pots more or less the same?

  11. Random sampling is easy to code in MATLAB: yprime = y(unidrnd(N,N,1)); Here unidrnd(N,N,1) is a vector of N random integers between 1 and N, y is the original data, and yprime is the resampled data.
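
Putting the pieces together, a minimal MATLAB sketch of the whole bootstrap of the mean (the synthetic data, the variable names, and the choice of 10^5 realizations are illustrative; unidrnd needs the Statistics Toolbox, and the built-in randi is equivalent):

    N = 100;                          % number of data
    y = 50 + 100*randn(N,1);         % synthetic data: normal with mean 50, sigma 100
    Nboot = 1e5;                      % number of bootstrap realizations
    yest = zeros(Nboot,1);
    for k = 1:Nboot
        yprime  = y(unidrnd(N,N,1)); % resample with replacement (randi(N,N,1) works the same)
        yest(k) = mean(yprime);      % the estimate (1/N)*sum(yprime) for this realization
    end
    histogram(yest)                   % empirical distribution of the estimate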

  12. The theoretical and bootstrap results match pretty well! [Figure: theoretical distribution overlaid on the bootstrap with 10^5 realizations.]

  13. Obviously, bootstrapping is of limited utility when we know the theoretical distribution (as in the previous example).

  14. But it can be very useful when we don't. For example, what's the distribution of s_yest, where (s_yest)² = (1/(N−1)) Σ_i (y_i − y_est)² and y_est = (1/N) Σ_i y_i? (Yes, I know a statistician could work this distribution out analytically, via the chi-squared distribution …)

  15. To do the bootstrap we calculate y'_est = (1/N) Σ_i y'_i, (s_y'est)² = (1/(N−1)) Σ_i (y'_i − y'_est)², and s_y'est = √[(s_y'est)²], many times, say 10^5 times.
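
A minimal sketch of this bootstrap, reusing y, N, Nboot, and unidrnd from the earlier sketch:

    sest = zeros(Nboot,1);
    for k = 1:Nboot
        yprime  = y(unidrnd(N,N,1));                       % resample with replacement
        ybarp   = mean(yprime);                            % y'_est
        sest(k) = sqrt(sum((yprime - ybarp).^2)/(N - 1));  % s_y'est (same as std(yprime))
    end
    histogram(sest)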

  16. Here’s the bootstrap result (10^5 realizations): the distribution p(s_yest) of the estimate s_yest. I numerically calculate an expected value of 92.8 and a standard deviation of 6.2. Note that the distribution is not quite centered on the true value of 100. This is random variation: the original N = 100 data are not quite representative of an infinite ensemble of normally distributed values.

  17. So we would be justified in saying σ_y ≈ 92.8 ± 12.4, where 12.4 = 2 × 6.2 (two standard deviations), an approximate 95% confidence interval.
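
Two common ways to turn the bootstrap sample sest from the sketch above into an approximate 95% confidence interval (prctile is a Statistics Toolbox function; the percentile form is an alternative to the two-standard-deviation rule used on the slide):

    ci_normal     = mean(sest) + 2*std(sest)*[-1 1];   % expected value +/- two standard deviations
    ci_percentile = prctile(sest, [2.5 97.5]);         % percentile interval (no normality assumption)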

  18. The Maximum Likelihood Method: a way to fit parameterized probability distributions to data. Very handy when you have good reason to believe the data follow a particular distribution.

  19. Likelihood Function, L: the logarithm of the probable-ness of a given dataset.

  20. N data y are all drawn from the same distribution p(y). The probable-ness of a single measurement y_i is p(y_i), so the probable-ness of the whole dataset is p(y1) × p(y2) × … × p(yN) = Π_i p(y_i), and L = ln Π_i p(y_i) = Σ_i ln p(y_i).

  21. Now imagine that the distribution p(y) is known up to a vector m of unknown parameters. Write p(y; m), with the semicolon as a reminder that it's not a joint probability. Then L is a function of m: L(m) = Σ_i ln p(y_i; m).

  22. The Principle of Maximum Likelihood: choose m so that it maximizes L(m), that is, ∂L/∂m_i = 0. The dataset that was in fact observed is then the most probable one that could have been observed.
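
In practice the maximization can also be done numerically. A minimal sketch for the normal-distribution example of the following slides, using the data vector y from the earlier bootstrap sketch (normpdf is a Statistics Toolbox function; the explicit formula on slide 23 could be written out instead):

    negL = @(m) -sum(log(normpdf(y, m(1), m(2))));   % negative log-likelihood, m = [ybar, sigma]
    mML  = fminsearch(negL, [mean(y), std(y)]);      % minimize -L(m), i.e. maximize L(m)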

  23. Example: a normal distribution of unknown mean ȳ and variance σ². p(y_i) = (2π)^(−1/2) σ⁻¹ exp{ −½ σ⁻² (y_i − ȳ)² }. L = Σ_i ln p(y_i) = −½ N ln(2π) − N ln(σ) − ½ σ⁻² Σ_i (y_i − ȳ)². ∂L/∂ȳ = 0 = σ⁻² Σ_i (y_i − ȳ). ∂L/∂σ = 0 = −N σ⁻¹ + σ⁻³ Σ_i (y_i − ȳ)². (The N's arise because the sum runs from 1 to N.)

  24. Solving for ȳ and σ: from 0 = σ⁻² Σ_i (y_i − ȳ) we get ȳ = N⁻¹ Σ_i y_i; from 0 = −N σ⁻¹ + σ⁻³ Σ_i (y_i − ȳ)² we get σ² = N⁻¹ Σ_i (y_i − ȳ)².

  25. Interpreting the results, ȳ = N⁻¹ Σ_i y_i and σ² = N⁻¹ Σ_i (y_i − ȳ)²: the sample mean is the maximum likelihood estimate of the expected value of the normal distribution, and the sample variance (more or less*) is the maximum likelihood estimate of its variance. (*The issue is N versus N−1 in the formula.)

  26. Example: 100 data drawn from a normal distribution with true ȳ = 50, σ = 100.

  27. [Figure: the likelihood surface L(ȳ, σ), with its maximum at ȳ = 62, σ = 107.]
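
A sketch of how such a likelihood surface can be evaluated on a grid for plotting (the grid limits are illustrative, not taken from the slide; y and N are as before):

    ybars  = linspace(0, 100, 201);                  % trial values of the mean
    sigmas = linspace(50, 200, 201);                 % trial values of sigma
    Lgrid  = zeros(numel(sigmas), numel(ybars));
    for i = 1:numel(sigmas)
        for j = 1:numel(ybars)
            Lgrid(i,j) = -0.5*N*log(2*pi) - N*log(sigmas(i)) ...
                         - 0.5*sum((y - ybars(j)).^2)/sigmas(i)^2;
        end
    end
    imagesc(ybars, sigmas, Lgrid); axis xy; xlabel('ybar'); ylabel('sigma')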

  28. Another example: the two-sided exponential distribution, p(y_i) = ½ σ⁻¹ exp{ −σ⁻¹ |y_i − ȳ| }. Is the parameter ȳ really the expectation? Is the parameter σ really the variance? First check the normalization, using z = y_i − ȳ: ∫ p(y_i) dy_i = ½ σ⁻¹ ∫_{−∞}^{+∞} exp{ −σ⁻¹ |y_i − ȳ| } dy_i = ½ σ⁻¹ · 2 ∫_0^{+∞} exp{ −σ⁻¹ z } dz = σ⁻¹ (−σ) exp{ −σ⁻¹ z } |_0^{+∞} = 1.

  29. Is ȳ the expectation? E(y_i) = ∫_{−∞}^{+∞} y_i ½ σ⁻¹ exp{ −σ⁻¹ |y_i − ȳ| } dy_i. Using z = y_i − ȳ: E(y_i) = ½ σ⁻¹ ∫_{−∞}^{+∞} (z + ȳ) exp{ −σ⁻¹ |z| } dz. The term z exp(−σ⁻¹|z|) is an odd function times an even function, so its integral is zero, leaving E(y_i) = ½ σ⁻¹ · 2 ȳ ∫_0^{+∞} exp{ −σ⁻¹ z } dz = −ȳ exp{ −σ⁻¹ z } |_0^{+∞} = ȳ. Yes!

  30. Is s the variance ? var(yi) = -+(yi-y)2½ s-1 exp{ - s-1 |yi-y| } dyi use z= s-1(yi-y) E(yi) = ½ s-1-+ s2 z2 exp{ -|z| } s dz = s20+ z2 exp{ -z } dz = 2 s2  s2 CRC Math Handbook gives this integral as equal to 2 Not Quite …

  31. Maximum likelihood estimates: L = N ln(½) − N ln(σ) − σ⁻¹ Σ_i |y_i − ȳ|. ∂L/∂ȳ = 0 = σ⁻¹ Σ_i sgn(y_i − ȳ), since the derivative of |x| is sgn(x), equal to +1 for x > 0 and −1 for x < 0. ∂L/∂σ = 0 = −N σ⁻¹ + σ⁻² Σ_i |y_i − ȳ|. The first condition requires a ȳ such that Σ_i sgn(y_i − ȳ) = 0, which is zero when half the y_i are bigger than ȳ and half are smaller, so ȳ is the median of the y_i.

  32. Once ȳ is known, then ∂L/∂σ = 0 = −N σ⁻¹ + σ⁻² Σ_i |y_i − ȳ| gives σ = N⁻¹ Σ_i |y_i − ȳ|, with ȳ = median(y). Note that when N is even, ȳ is not unique, but can be anything between the two middle values in a sorted list of the y_i.
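
In MATLAB, the two maximum likelihood estimates on this slide are one line each (a sketch, using the same data vector y as before):

    ybarML = median(y);                % location: the sample median
    sML    = mean(abs(y - ybarML));    % scale: mean absolute deviation about the median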

  33. Comparison. Normal distribution: the best estimate of the expected value is the sample mean. Exponential distribution: the best estimate of the expected value is the sample median.

  34. Comparison. Normal distribution: short-tailed, so an outlier is extremely uncommon, and the expected value should be chosen to make outliers have as small a deviation as possible. Exponential distribution: relatively long-tailed, so an outlier is relatively common, and the expected value should ignore the actual value of outliers. [Figure: for a dataset with an outlier, the median is pulled far less than the mean.]

  35. Another important distribution: the Gutenberg-Richter distribution (e.g., earthquake magnitudes). For earthquakes greater than some threshold magnitude m0, the probability that an earthquake will have a magnitude greater than m is P(m) = 10^(−b(m−m0)), or P(m) = exp{ −b′(m−m0) }, with b′ = ln(10) · b.

  36. This is a cumulative distribution (the probability that the magnitude exceeds m), so the probability that the magnitude is greater than m0 is unity: P(m0) = exp{ −b′(m0−m0) } = exp{0} = 1. The probability density is (minus) its derivative with respect to m: p(m) = b′ exp{ −b′(m−m0) }.

  37. The maximum likelihood estimate of b′: L(m) = N ln(b′) − b′ Σ_i (m_i − m0). ∂L/∂b′ = 0 = N/b′ − Σ_i (m_i − m0), so b′ = N / Σ_i (m_i − m0).
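
A sketch of this estimator applied to a synthetic catalog (the threshold m0, the catalog size, and the use of the Statistics Toolbox function exprnd are illustrative assumptions, not from the slides):

    m0 = 4.0;                              % threshold magnitude (illustrative)
    m  = m0 + exprnd(1/log(10), 1000, 1);  % synthetic catalog with true b = 1, i.e. b' = ln(10)
    bprime = numel(m) / sum(m - m0);       % maximum likelihood estimate of b'
    b      = bprime / log(10);             % corresponding Gutenberg-Richter b value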

  38. Originally, Gutenberg & Richter made a mistake: they estimated the slope −b of log10 P(m) versus magnitude m with a least-squares fit, and not with the maximum likelihood formula.

  39. Yet another important distribution: the Fisher distribution on a sphere (e.g., paleomagnetic directions). Given unit vectors x_i that scatter around some mean direction x, the probability distribution for the angle θ between x_i and x (that is, cos(θ) = x_i · x) is p(θ) = [κ / (2 sinh κ)] sin(θ) exp{ κ cos(θ) }. κ is called the "precision parameter".

  40. Rationale for the functional form p(θ) ∝ exp{ κ cos(θ) }: for θ close to zero, cos(θ) ≈ 1 − ½θ², so p(θ) ∝ exp{ κ cos(θ) } ≈ exp{κ} exp{ −½ κ θ² }, which is a Gaussian in θ.
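
Written out, the small-angle argument on this slide is:

  p(\theta) \;\propto\; e^{\kappa\cos\theta} \;\approx\; e^{\kappa}\, e^{-\tfrac{1}{2}\kappa\theta^{2}}, \qquad \theta \ll 1,

i.e., near the mean direction the distribution is approximately Gaussian in θ, with variance 1/κ.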

  41. I’ll let you figure out the maximum likelihood estimates of the central direction, x, and the precision parameter, κ.
