Maximum Likelihood Estimates and the EM Algorithms II

1 / 78

# Maximum Likelihood Estimates and the EM Algorithms II - PowerPoint PPT Presentation

Maximum Likelihood Estimates and the EM Algorithms II. Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm. Part 1 Computation Tools. Include Functions in R. source("file path") Example In MME.R:.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Maximum Likelihood Estimates and the EM Algorithms II

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Maximum Likelihood Estimates and the EM Algorithms II

Henry Horng-Shing Lu

Institute of Statistics

National Chiao Tung University

hslu@stat.nctu.edu.tw

http://tigpbp.iis.sinica.edu.tw/courses.htm

### Part 1Computation Tools

Include Functions in R
• source("file path")
• Example
• In MME.R:
• In R:

MME=function(y1, y2, y3, y4)

{

n=y1+y2+y3+y4;

phi1=4.0*(y1/n-0.5);

phi2=1-4*y2/n;

phi3=1-4*y3/n;

phi4=4.0*y4/n;

phi=(phi1+phi2+phi3+phi4)/4.0;

print("By MME method")

return(phi); # print(phi);

}

> source(“C:/MME.R”)

> MME(125, 18, 20, 24)

[1] "By MME method"

[1] 0.5935829

### Part 2Motivation Examples

A

a

b

B

A

b

a

A

a

A

a

B

b

B

B

b

Example 1 in Genetics (1)
• Two linked loci with alleles A and a, and B and b
• A, B: dominant
• a, b: recessive
• A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab

1/2

1/2

1/2

1/2

A

a

b

B

A

b

a

A

a

A

a

B

b

B

B

b

Example 1 in Genetics (2)
• Probabilities for genotypes in gametes

1/2

1/2

1/2

1/2

Example 1 in Genetics (3)
• Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
• More:

http://en.wikipedia.org/wiki/Geneticshttp://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

Example 1 in Genetics (5)
• Four distinct phenotypes:

A*B*, A*b*, a*B* and a*b*.

• A*: the dominant phenotype from (Aa, AA, aA).
• a*: the recessive phenotype from aa.
• B*: the dominant phenotype from (Bb, BB, bB).
• b*: the recessive phenotype from bb.
• A*B*: 9 gametic combinations.
• A*b*: 3 gametic combinations.
• a*B*: 3 gametic combinations.
• a*b*: 1 gametic combination.
• Total: 16 combinations.
Example 1 in Genetics (7)
• Hence, the random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:

We know that

and

So

Example 1 in Genetics (8)
• Suppose that we observe the data of

which is a random sample from

Then the probability mass function is

Maximum Likelihood Estimate (MLE)
• Likelihood:
• Maximize likelihood: Solve the score equations, which are setting the first derivates of likelihood to be zeros.
• Under regular conditions, the MLE is consistent, asymptotic efficient and normal!
• More:

http://en.wikipedia.org/wiki/Maximum_likelihood

MLE for Example 1 (1)
• Likelihood
• MLE:
MLE for Example 1 (3)
• Checking:
• 1.
• 2.
• 3. Compare ?

### Part 3Numerical Solutions for the Score Equations of MLEs

A Banach Space
• A Banach space is a vector space over the field such that
• Every Cauchy sequence of converges in (i.e., is complete).
• More:

http://en.wikipedia.org/wiki/Banach_space

Lipschitz Continuous
• A closed subset and mapping
• is Lipschitz continuous on with

if

• is a contraction mapping on if is Lipschitz continuous and
• More:

http://en.wikipedia.org/wiki/Lipschitz_continuous

Fixed Point Theorem (1)
• If is a contraction mapping on if is Lipschitz continuous and
• has an unique fixed point

such that

• initial
Fixed Point Theorem (2)
• More:

http://en.wikipedia.org/wiki/Fixed-point_theorem

http://www.math-linux.com/spip.php?article60

Applications for MLE (1)
• Numerical solution

is a contraction mapping

: initial value that

Then

Applications for MLE (2)
• How to choose s.t. is a contraction mapping?
• Optimal ?
Parallel Chord Method (1)
• Parallel chord method is also called simple iteration.
Plot Parallel Chord Method by R (1)

### Simple iteration ###

y1 = 125; y2 = 18; y3 = 20; y4 = 24

# First and second derivatives of log likelihood #

f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}

f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}

x = c(10:80)*0.01

y = f1(x)

plot(x, y, type = 'l', main = "Parallel chord method", xlab = expression(varphi), ylab = "First derivative of log likelihood function")

abline(h = 0)

Plot Parallel Chord Method by R (2)

phi0 = 0.25 # Given the initial value 0.25 #

segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)

segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)

# Use the tangent line to find the intercept b0 #

b0 = f1(phi0)-f2(phi0)*phi0

curve(f2(phi0)*x+b0, add = T, col = "red")

phi1 = -b0/f2(phi0) # Find the closer phi #

segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)

segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)

# Use the parallel line to find the intercept b1 #

b1 = f1(phi1)-f2(phi0)*phi1

curve(f2(phi0)*x+b1, add = T, col = "red")

Define Functions for Example 1 in R
• We will define some functions and variables for finding the MLE in Example 1 by R

# Fist, second and third derivatives of log likelihood #

f1 = function(y1, y2, y3, y4, phi){y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}

f2 = function(y1, y2, y3, y4, phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}

f3 = function(y1, y2, y3, y4, phi) {2*y1*(2+phi)^(-3)-2*(y2+y3)*(1-phi)^(-3)+2*y4*(phi)^(-3)}

# Fisher Information #

I = function(y1, y2, y3, y4, phi) {(-1)*(y1+y2+y3+y4)*(1/(4*(2+phi))+1/(2*(1-phi))+1/(4*phi))}

y1 = 125; y2 = 18; y3 = 20; y4 = 24; initial = 0.9

Parallel Chord Method by R (1)

> fix(SimpleIteration)

function(y1, y2, y3, y4, initial){

phi = NULL;

i = 0;

alpha = -1.0/f2(y1, y2, y3, y4, initial);

phi2 = initial;

phi1 = initial+1;

while(abs(phi1-phi2) >= 1.0E-5){

i = i+1;

phi1 = phi2;

phi2 = alpha*f1(y1, y2, y3, y4, phi1)+phi1;

phi[i] = phi2;

}

print("By parallel chord method(simple iteration)");

return(list(phi = phi2, iteration = phi));

}

Parallel Chord Method by R (2)

> SimpleIteration(y1, y2, y3, y4, initial)

Newton-Raphson Method (1)
• http://math.fullerton.edu/mathews/n2003/Newton'sMethodMod.html
• http://en.wikipedia.org/wiki/Newton%27s_method
Plot Newton-Raphson Method by R (1)

### Newton-Raphson Method ###

y1 = 125; y2 = 18; y3 = 20; y4 = 24

# First and second derivatives of log likelihood #

f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}

f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}

x = c(10:80)*0.01

y = f1(x)

plot(x, y, type = 'l', main = "Newton-Raphson method", xlab = expression(varphi), ylab = "First derivative of log likelihood function")

abline(h = 0)

Plot Newton-Raphson Method by R (2)

# Given the initial value 0.25 #

phi0 = 0.25

segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)

segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)

# Use the tangent line to find the intercept b0 #

b0 = f1(phi0)-f2(phi0)*phi0

curve(f2(phi0)*x+b0, add = T, col = "purple", lwd = 2)

# Find the closer phi #

phi1 = -b0/f2(phi0)

segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)

segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)

Plot Newton-Raphson Method by R (3)

# Use the parallel line to find the intercept b1 #

b1 = f1(phi1)-f2(phi0)*phi1

curve(f2(phi0)*x+b1, add = T, col = "red")

curve(f2(phi1)*x-f2(phi1)*phi1+f1(phi1), add = T, col = "blue", lwd = 2)

legend(0.45, 250, c("Newton-Raphson", "Parallel chord method"), col = c("blue", "red"), lty = c(1, 1))

Newton-Raphson Method by R (1)

> fix(Newton)

function(y1, y2, y3, y4, initial){

i = 0;

phi = NULL;

phi2 = initial;

phi1 = initial+1;

while(abs(phi1-phi2) >= 1.0E-6){

i = i+1;

phi1 = phi2;

alpha = 1.0/(f2(y1, y2, y3, y4, phi1));

phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);

phi[i] = phi2;

}

print("By Newton-Raphson method");

return (list(phi = phi2, iteration = phi));

}

Newton-Raphson Method by R (2)

> Newton(125, 18, 20, 24, 0.9)

[1] "By Newton-Raphson method"

\$phi

[1] 0.577876

\$iteration

[1] 0.8193054 0.7068499 0.6092827 0.5792747 0.5778784 0.5778760 0.5778760

Halley’s Method
• The Newton-Raphson iteration function is
• It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the object function is very smooth, like the method by Edmond Halley (1656-1742):

(http://math.fullerton.edu/mathews/n2003/Halley'sMethodMod.html)

Halley’s Method by R (1)

> fix(Halley)

function( y1, y2, y3, y4, initial){

i = 0;

phi = NULL;

phi2 = initial;

phi1 = initial+1;

while(abs(phi1-phi2) >= 1.0E-6){

i = i+1;

phi1 = phi2;

alpha = 1.0/(f2(y1, y2, y3, y4, phi1));

phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1)*1.0/(1.0-f1(y1, y2, y3, y4, phi1)*f3(y1, y2, y3, y4, phi1)/(f2(y1, y2, y3, y4, phi1)*f2(y1, y2, y3, y4, phi1)*2.0));

phi[i] = phi2;

}

print("By Halley method");

return (list(phi = phi2, iteration = phi));

}

Halley’s Method by R (2)

> Halley(125, 18, 20, 24, 0.9)

[1] "By Halley method"

\$phi

[1] 0.577876

\$iteration

[1] 0.5028639 0.5793692 0.5778760 0.5778760

Bisection Method (1)
• Assume that and that there exists a number such that . If and have opposite signs, and represents the sequence of midpoints generated by the bisection process, thenand the sequence converges to .
• That is, .
• http://en.wikipedia.org/wiki/Bisection_method
Plot the Bisection Method by R

### Bisection method ###

y1 = 125; y2 = 18; y3 = 20; y4 = 24

f <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}

x = c(1:100)*0.01

y = f(x)

plot(x, y, type = 'l', main = "Bisection method", xlab = expression(varphi), ylab = "First derivative of log likelihood function")

abline(h = 0)

abline(v = 0.5, col = "red")

abline(v = 0.75, col = "red")

text(0.49, 2200, labels = "1")

text(0.74, 2200, labels = "2")

Bisection Method by R (1)

> fix(Bisection)

function(y1, y2, y3, y4, A, B) # A, B is the boundary of parameter #

{

Delta = 1.0E-6; # Tolerance for width of interval #

Satisfied = 0; # Condition for loop termination #

phi = NULL;

YA = f1(y1, y2, y3, y4, A); # Compute function values #

YB = f1(y1, y2, y3, y4, B);

# Calculation of the maximum number of iterations #

Max = as.integer(1+floor((log(B-A)-log(Delta))/log(2)));

# Check to see if the bisection method applies #

if(((YA >= 0) & (YB >=0)) || ((YA < 0) & (YB < 0))){

print("The values of function in boundary point do not differ in sign.");

Bisection Method by R (2)

print("Therefore, this method is not appropriate here.");

quit(); # Exit program #

}

for(K in 1:Max){

if(Satisfied == 1)

break;

C = (A+B)/2; # Midpoint of interval

YC = f1(y1, y2, y3, y4, C); # Function value at midpoint #

if((K-1) < 100)

phi[K-1] = C;

if(YC == 0){

A = C; # Exact root is found #

B = C;

}

else{

if((YB*YC) >= 0 ){

Bisection Method by R (3)

B = C; # Squeeze from the right #

YB = YC;

}

else{

A = C; # Squeeze from the left #

YA = YC;

}

}

if((B-A) < Delta)

Satisfied = 1; # Check for early convergence #

} # End of 'for'-loop #

print("By Bisection Method");

return(list(phi = C, iteration = phi));

}

Bisection Method by R (4)

> Bisection(125, 18, 20, 24, 0.25, 1)

[1] "By Bisection Method"

\$phi

[1] 0.5778754

\$iteration

[1] 0.4375000 0.5312500 0.5781250 0.5546875 0.5664062 0.5722656 0.5751953

[8] 0.5766602 0.5773926 0.5777588 0.5779419 0.5778503 0.5778961 0.5778732

[15] 0.5778847 0.5778790 0.5778761 0.5778747 0.5778754

Secant Method
• More: http://en.wikipedia.org/wiki/Secant_method http://math.fullerton.edu/mathews/n2003/SecantMethodMod.html
Secant Method by R (1)

> fix(Secant)

function(y1, y2, y3, y4, initial1, initial2){

phi = NULL;

phi2 = initial1;

phi1 = initial2;

alpha = (phi2-phi1)/(f1(y1, y2, y3, y4, phi2)-f1(y1, y2, y3, y4, phi1));

phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);

i = 0;

while(abs(phi1-phi2) >= 1.0E-6){

i = i+1;

phi1 = phi2;

Secant Method by R (2)

alpha = (phi2-phi1)/(f1(y1, y2, y3, y4, phi2)-f1(y1, y2, y3, y4, phi1));

phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);

phi[i] = phi2;

}

print("By Secant method");

return (list(phi=phi2,iteration=phi));

}

Secant Method by R (3)

> Secant(125, 18, 20, 24, 0.9, 0.05)

[1] "By Secant method"

\$phi

[1] 0.577876

\$iteration

[1] 0.2075634 0.4008018 0.5760396 0.5778801 0.5778760 0.5778760

S

C

B

A

Secant-Bracket Method
• The secant-bracket method is also called the regular falsi method.
Secant-Bracket Method by R (1)

> fix(RegularFalsi)

function(y1, y2, y3, y4, A, B){

phi = NULL;

i = -1;

Delta = 1.0E-6; # Tolerance for width of interval #

Satisfied = 1; # Condition for loop termination #

# Endpoints of the interval [A,B] #

YA = f1(y1, y2, y3, y4, A); # compute function values #

YB = f1(y1, y2, y3, y4, B);

# Check to see if the bisection method applies #

if(((YA >= 0) & (YB >=0)) || ((YA < 0) & (YB < 0))){

print("The values of function in boundary point do not differ in sign");

print("Therefore, this method is not appropriate here");

q(); # Exit program #

}

Secant-Bracket Method by R (2)

while(Satisfied){

i = i+1;

C = (B*f1(y1, y2, y3, y4, A)-A*f1(y1, y2, y3, y4, B))/(f1(y1, y2, y3, y4, A)-f1(y1, y2, y3, y4, B)); # Midpoint of interval #

YC = f1(y1, y2, y3, y4, C); # Function value at midpoint #

phi[i] = C;

if(YC == 0){ # First 'if' #

A = C; # Exact root is found #

B = C;

}else{

if((YB*YC) >= 0 ){

B = C; # Squeeze from the right #

YB = YC;

Secant-Bracket Method by R (3)

}else{

A = C; # Squeeze from the left #

YA = YC;

}

}

if(f1(y1, y2, y3, y4, C) < Delta)

Satisfied = 0; # Check for early convergence #

}

print("By Regular Falsi Method")

return(list(phi = C, iteration = phi));

}

Secant-Bracket Method by R (4)

> RegularFalsi(y1, y2, y3, y4, 0.9, 0.05)

[1] "By Regular Falsi Method"

\$phi

[1] 0.577876

\$iteration

[1] 0.5758648 0.5765007 0.5769352 0.5772324 0.5774356 0.5775746 0.5776698

[8] 0.5777348 0.5777794 0.5778099 0.5778307 0.5778450 0.5778548 0.5778615

[15] 0.5778660 0.5778692 0.5778713 0.5778728 0.5778738 0.5778745 0.5778750

[22] 0.5778753 0.5778755 0.5778756 0.5778757 0.5778758 0.5778759 0.5778759

[29] 0.5778759 0.5778759 0.5778759 0.5778760 0.5778760 0.5778760 0.5778760

[36] 0.5778760 0.5778760

Fisher Scoring Method
• Fisher scoring method replaces by where is the Fisher information matrix when the parameter may be multivariate.
Fisher Scoring Method by R (1)

> fix(Fisher)

function(y1, y2, y3, y4, initial){

i = 0;

phi = NULL;

phi2 = initial;

phi1 = initial+1;

while(abs(phi1-phi2) >= 1.0E-6){

i = i+1;

phi1 = phi2;

alpha = 1.0/I(y1, y2, y3, y4, phi1);

phi2 = phi1-f1(y1, y2, y3, y4, phi1)/I(y1, y2, y3, y4, phi1);

phi[i] = phi2;

}

print("By Fisher method");

return(list(phi = phi2, iteration = phi));

}

Fisher Scoring Method by R (2)

> Fisher(125, 18, 20, 24, 0.9)

[1] "By Fisher method"

\$phi

[1] 0.577876

\$iteration

[1] 0.5907181 0.5785331 0.5779100 0.5778777 0.5778761 0.5778760

Order of Convergence
• Order of convergence is if

and for .

• More: http://en.wikipedia.org/wiki/Order_of_convergence
• Note: as

Hence, we can use regression to estimate

Theorem for Newton-Raphson Method
• If , is a contraction mapping then and
• If exists, has a simple zero, then such that of the Newton-Raphson method is a contraction mapping and .
Find Convergence Order by R (1)

> # Coverage order #

> # Newton method can be substitute for different method #

> R = Newton(y1, y2, y3, y4, initial)

[1] "By Newton-Raphson method"

> temp = log(abs(R\$iteration-R\$phi))

> y = temp[2:(length(temp)-1)]

> x = temp[1:(length(temp)-2)]

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

0.6727 2.0441

Find Convergence Order by R (2)

>R = SimpleIteration(y1, y2, y3, y4, initial)

[1] "By parallel chord method(simple iteration)"

> temp = log(abs(R\$iteration-R\$phi))

> y = temp[2:(length(temp)-1)]

> x = temp[1:(length(temp)-2)]

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-0.01651 1.01809

> R = Secant(y1, y2, y3, y4, initial, 0.05)

[1] "By Secant method"

> temp = log(abs(R\$iteration-R\$phi))

> y = temp[2:(length(temp)-1)]

> x = temp[1:(length(temp)-2)]

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-1.245 1.869

> R = Fisher(y1, y2, y3, y4, initial)

[1] "By Fisher method"

> temp=log(abs(R\$iteration-R\$phi))

> y=temp[2:(length(temp)-1)]

> x=temp[1:(length(temp)-2)]

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-2.942 1.004

> R = Bisection(y1, y2, y3, y4, 0.25, 1)

[1] "By Bisection Method"

> temp=log(abs(R\$iteration-R\$phi))

> y=temp[2:(length(temp)-1)]

> x=temp[1:(length(temp)-2)]

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-1.9803 0.8448

Find Convergence Order by R (3)

> R = RegularFalsi(y1, y2, y3, y4, initial, 0.05)

[1] "By Regular Falsi Method"

> temp = log(abs(R\$iteration-R\$phi))

> y = temp[2:(length(temp)-1)]

> x = temp[1:(length(temp)-2)]

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-0.2415 1.0134

Exercises
• Write your own programs for those examples presented in this talk.
• Write programs for those examples mentioned at the following web page:

http://en.wikipedia.org/wiki/Maximum_likelihood

• Write programs for the other examples that you know.
More Exercises (1)
• Example 3 in genetics:

The observed data are

where , , and fall in

such that

Find the MLEs for , , and .

More Exercises (2)
• Example 4 in the positron emission tomography (PET):

The observed data are

and

• The values of are known and the unknown parameters are .
• Find the MLEs for .
More Exercises (3)
• Example 5 in the normal mixture:

The observed data are random samples from the following probability density function:

• Find the MLEs for the following parameters: