Maximum Likelihood Estimates and the EM Algorithms II

Maximum Likelihood Estimates and the EM Algorithms II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm

Part 1Computation Tools

Include Functions in R • source("file path") • Example • In MME.R: • In R: MME=function(y1, y2, y3, y4) { n=y1+y2+y3+y4; phi1=4.0*(y1/n-0.5); phi2=1-4*y2/n; phi3=1-4*y3/n; phi4=4.0*y4/n; phi=(phi1+phi2+phi3+phi4)/4.0; print("By MME method") return(phi); # print(phi); } > source(“C:/MME.R”) > MME(125, 18, 20, 24) [1] "By MME method" [1] 0.5935829

Part 2Motivation Examples

A a b B A b a A a A a B b B B b Example 1 in Genetics (1) • Two linked loci with alleles A and a, and B and b • A, B: dominant • a, b: recessive • A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab 1/2 1/2 1/2 1/2

A a b B A b a A a A a B b B B b Example 1 in Genetics (2) • Probabilities for genotypes in gametes 1/2 1/2 1/2 1/2

Example 1 in Genetics (3) • Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92. • More: http://en.wikipedia.org/wiki/Geneticshttp://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

Example 1 in Genetics (4)

Example 1 in Genetics (5) • Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*. • A*: the dominant phenotype from (Aa, AA, aA). • a*: the recessive phenotype from aa. • B*: the dominant phenotype from (Bb, BB, bB). • b*: the recessive phenotype from bb. • A*B*: 9 gametic combinations. • A*b*: 3 gametic combinations. • a*B*: 3 gametic combinations. • a*b*: 1 gametic combination. • Total: 16 combinations.

Example 1 in Genetics (6) • Let , then

Example 1 in Genetics (7) • Hence, the random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution: We know that and So

Example 1 in Genetics (8) • Suppose that we observe the data of which is a random sample from Then the probability mass function is

Maximum Likelihood Estimate (MLE) • Likelihood: • Maximize likelihood: Solve the score equations, which are setting the first derivates of likelihood to be zeros. • Under regular conditions, the MLE is consistent, asymptotic efficient and normal! • More: http://en.wikipedia.org/wiki/Maximum_likelihood

MLE for Example 1 (1) • Likelihood • MLE:

MLE for Example 1 (2) A B C

MLE for Example 1 (3) • Checking: • 1. • 2. • 3. Compare ?

Part 3Numerical Solutions for the Score Equations of MLEs

A Banach Space • A Banach space is a vector space over the field such that • Every Cauchy sequence of converges in (i.e., is complete). • More: http://en.wikipedia.org/wiki/Banach_space

Lipschitz Continuous • A closed subset and mapping • is Lipschitz continuous on with if • is a contraction mapping on if is Lipschitz continuous and • More: http://en.wikipedia.org/wiki/Lipschitz_continuous

Fixed Point Theorem (1) • If is a contraction mapping on if is Lipschitz continuous and • has an unique fixed point such that • initial

Fixed Point Theorem (2) • More: http://en.wikipedia.org/wiki/Fixed-point_theorem http://www.math-linux.com/spip.php?article60

Applications for MLE (1) • Numerical solution is a contraction mapping : initial value that Then

Applications for MLE (2) • How to choose s.t. is a contraction mapping? • Optimal ?

Parallel Chord Method (1) • Parallel chord method is also called simple iteration.

Parallel Chord Method (2)

Plot Parallel Chord Method by R (1) ### Simple iteration ### y1 = 125; y2 = 18; y3 = 20; y4 = 24 # First and second derivatives of log likelihood # f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi} f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)} x = c(10:80)*0.01 y = f1(x) plot(x, y, type = 'l', main = "Parallel chord method", xlab = expression(varphi), ylab = "First derivative of log likelihood function") abline(h = 0)

Plot Parallel Chord Method by R (2) phi0 = 0.25 # Given the initial value 0.25 # segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2) segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2) # Use the tangent line to find the intercept b0 # b0 = f1(phi0)-f2(phi0)*phi0 curve(f2(phi0)*x+b0, add = T, col = "red") phi1 = -b0/f2(phi0) # Find the closer phi # segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2) segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2) # Use the parallel line to find the intercept b1 # b1 = f1(phi1)-f2(phi0)*phi1 curve(f2(phi0)*x+b1, add = T, col = "red")

Define Functions for Example 1 in R • We will define some functions and variables for finding the MLE in Example 1 by R # Fist, second and third derivatives of log likelihood # f1 = function(y1, y2, y3, y4, phi){y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi} f2 = function(y1, y2, y3, y4, phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)} f3 = function(y1, y2, y3, y4, phi) {2*y1*(2+phi)^(-3)-2*(y2+y3)*(1-phi)^(-3)+2*y4*(phi)^(-3)} # Fisher Information # I = function(y1, y2, y3, y4, phi) {(-1)*(y1+y2+y3+y4)*(1/(4*(2+phi))+1/(2*(1-phi))+1/(4*phi))} y1 = 125; y2 = 18; y3 = 20; y4 = 24; initial = 0.9

Parallel Chord Method by R (1) > fix(SimpleIteration) function(y1, y2, y3, y4, initial){ phi = NULL; i = 0; alpha = -1.0/f2(y1, y2, y3, y4, initial); phi2 = initial; phi1 = initial+1; while(abs(phi1-phi2) >= 1.0E-5){ i = i+1; phi1 = phi2; phi2 = alpha*f1(y1, y2, y3, y4, phi1)+phi1; phi[i] = phi2; } print("By parallel chord method(simple iteration)"); return(list(phi = phi2, iteration = phi)); }

Parallel Chord Method by R (2) > SimpleIteration(y1, y2, y3, y4, initial)

Parallel Chord Method by C/C++

Newton-Raphson Method (1) • http://math.fullerton.edu/mathews/n2003/Newton'sMethodMod.html • http://en.wikipedia.org/wiki/Newton%27s_method

Newton-Raphson Method (2)

Plot Newton-Raphson Method by R (1) ### Newton-Raphson Method ### y1 = 125; y2 = 18; y3 = 20; y4 = 24 # First and second derivatives of log likelihood # f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi} f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)} x = c(10:80)*0.01 y = f1(x) plot(x, y, type = 'l', main = "Newton-Raphson method", xlab = expression(varphi), ylab = "First derivative of log likelihood function") abline(h = 0)

Plot Newton-Raphson Method by R (2) # Given the initial value 0.25 # phi0 = 0.25 segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2) segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2) # Use the tangent line to find the intercept b0 # b0 = f1(phi0)-f2(phi0)*phi0 curve(f2(phi0)*x+b0, add = T, col = "purple", lwd = 2) # Find the closer phi # phi1 = -b0/f2(phi0) segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2) segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)

Plot Newton-Raphson Method by R (3) # Use the parallel line to find the intercept b1 # b1 = f1(phi1)-f2(phi0)*phi1 curve(f2(phi0)*x+b1, add = T, col = "red") curve(f2(phi1)*x-f2(phi1)*phi1+f1(phi1), add = T, col = "blue", lwd = 2) legend(0.45, 250, c("Newton-Raphson", "Parallel chord method"), col = c("blue", "red"), lty = c(1, 1))

Newton-Raphson Method by R (1) > fix(Newton) function(y1, y2, y3, y4, initial){ i = 0; phi = NULL; phi2 = initial; phi1 = initial+1; while(abs(phi1-phi2) >= 1.0E-6){ i = i+1; phi1 = phi2; alpha = 1.0/(f2(y1, y2, y3, y4, phi1)); phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1); phi[i] = phi2; } print("By Newton-Raphson method"); return (list(phi = phi2, iteration = phi)); }

Newton-Raphson Method by R (2) > Newton(125, 18, 20, 24, 0.9) [1] "By Newton-Raphson method" $phi [1] 0.577876 $iteration [1] 0.8193054 0.7068499 0.6092827 0.5792747 0.5778784 0.5778760 0.5778760

Newton-Raphson Method by C/C++

Halley’s Method • The Newton-Raphson iteration function is • It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the object function is very smooth, like the method by Edmond Halley (1656-1742): (http://math.fullerton.edu/mathews/n2003/Halley'sMethodMod.html)

Halley’s Method by R (1) > fix(Halley) function( y1, y2, y3, y4, initial){ i = 0; phi = NULL; phi2 = initial; phi1 = initial+1; while(abs(phi1-phi2) >= 1.0E-6){ i = i+1; phi1 = phi2; alpha = 1.0/(f2(y1, y2, y3, y4, phi1)); phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1)*1.0/(1.0-f1(y1, y2, y3, y4, phi1)*f3(y1, y2, y3, y4, phi1)/(f2(y1, y2, y3, y4, phi1)*f2(y1, y2, y3, y4, phi1)*2.0)); phi[i] = phi2; } print("By Halley method"); return (list(phi = phi2, iteration = phi)); }

Halley’s Method by R (2) > Halley(125, 18, 20, 24, 0.9) [1] "By Halley method" $phi [1] 0.577876 $iteration [1] 0.5028639 0.5793692 0.5778760 0.5778760

Halley’s Method by C/C++

Bisection Method (1) • Assume that and that there exists a number such that . If and have opposite signs, and represents the sequence of midpoints generated by the bisection process, thenand the sequence converges to . • That is, . • http://en.wikipedia.org/wiki/Bisection_method

Bisection Method (2) 1

Plot the Bisection Method by R ### Bisection method ### y1 = 125; y2 = 18; y3 = 20; y4 = 24 f <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi} x = c(1:100)*0.01 y = f(x) plot(x, y, type = 'l', main = "Bisection method", xlab = expression(varphi), ylab = "First derivative of log likelihood function") abline(h = 0) abline(v = 0.5, col = "red") abline(v = 0.75, col = "red") text(0.49, 2200, labels = "1") text(0.74, 2200, labels = "2")

Bisection Method by R (1) > fix(Bisection) function(y1, y2, y3, y4, A, B) # A, B is the boundary of parameter # { Delta = 1.0E-6; # Tolerance for width of interval # Satisfied = 0; # Condition for loop termination # phi = NULL; YA = f1(y1, y2, y3, y4, A); # Compute function values # YB = f1(y1, y2, y3, y4, B); # Calculation of the maximum number of iterations # Max = as.integer(1+floor((log(B-A)-log(Delta))/log(2))); # Check to see if the bisection method applies # if(((YA >= 0) & (YB >=0)) || ((YA < 0) & (YB < 0))){ print("The values of function in boundary point do not differ in sign.");

Bisection Method by R (2) print("Therefore, this method is not appropriate here."); quit(); # Exit program # } for(K in 1:Max){ if(Satisfied == 1) break; C = (A+B)/2; # Midpoint of interval YC = f1(y1, y2, y3, y4, C); # Function value at midpoint # if((K-1) < 100) phi[K-1] = C; if(YC == 0){ A = C; # Exact root is found # B = C; } else{ if((YB*YC) >= 0 ){

Bisection Method by R (3) B = C; # Squeeze from the right # YB = YC; } else{ A = C; # Squeeze from the left # YA = YC; } } if((B-A) < Delta) Satisfied = 1; # Check for early convergence # } # End of 'for'-loop # print("By Bisection Method"); return(list(phi = C, iteration = phi)); }

Bisection Method by R (4) > Bisection(125, 18, 20, 24, 0.25, 1) [1] "By Bisection Method" $phi [1] 0.5778754 $iteration [1] 0.4375000 0.5312500 0.5781250 0.5546875 0.5664062 0.5722656 0.5751953 [8] 0.5766602 0.5773926 0.5777588 0.5779419 0.5778503 0.5778961 0.5778732 [15] 0.5778847 0.5778790 0.5778761 0.5778747 0.5778754

Maximum Likelihood Estimates and the EM Algorithms II

Maximum Likelihood Estimates and the EM Algorithms II

Presentation Transcript

Maximum likelihood (ML)

Maximum Likelihood

Maximum Likelihood

4. Maximum Likelihood

Maximum Likelihood Estimates and the EM Algorithms II

Maximum Likelihood

Maximum Likelihood Estimation

Maximum Likelihood

The maximum likelihood method

Maximum likelihood

Maximum likelihood decoding

Maximum likelihood (cont.)

Maximum Likelihood Estimation

Maximum Likelihood and the Information Bottleneck

Estimation Maximum Likelihood Estimates Industrial Engineering

Maximum likelihood (cont.)

Maximum Likelihood

Maximum Likelihood

Maximum Likelihood

Maximum likelihood estimates

Maximum Likelihood Estimates and the EM Algorithms I

Maximum Likelihood Estimation