Download
1 / 23

Presentation taken from Nir Friedman’s HU course, available at cs.huji.ac.il/~pmai . - PowerPoint PPT Presentation


  • 65 Views
  • Uploaded on

Maximum Likelihood ( ML ) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 6a. Presentation taken from Nir Friedman’s HU course, available at www.cs.huji.ac.il/~pmai . Changes made by Dan Geiger, Ydo Wexler, and finally by Benny Chor.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Presentation taken from Nir Friedman’s HU course, available at cs.huji.ac.il/~pmai . ' - dayo


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

.


Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic treesComput. Genomics, lecture 6a

Presentation taken from Nir Friedman’s HU course, available at www.cs.huji.ac.il/~pmai.

Changes made by Dan Geiger, Ydo Wexler, and finally by Benny Chor.

.


The setting
The Setting

  • We have a probabilistic model,M, of some phenomena. We know exactly the structure of M, but not the values of its probabilistic parameters, .

  • Each “execution” of M produces an observation, x[i] , according to the (unknown) distribution induced by M.

  • Goal: After observing x[1] ,…, x[n] , estimate the model parameters, , that generated the observed data.


Maximum likelihood estimation mle
Maximum Likelihood Estimation (MLE)

  • The likelihood of the observed data, given the

    model parameters, as the conditional

    probabilitythat the model, M, with parameters

    , produces x[1] ,…, x[n] .

    L()=Pr(x[1] ,…, x[n] | , M),

  • In MLE we seek the model parameters, , that

    maximize the likelihood.


Maximum likelihood estimation mle1
Maximum Likelihood Estimation (MLE)

  • In MLE we seek the model parameters, , that

    maximize the likelihood.

  • The MLE principle is applicable in a wide

    variety of applications, from speech recognition,

    through natural language processing, to

    computational biology.

  • We will start with the simplest example:

    Estimating the bias of a coin. Then apply MLE

    to inferring phylogenetic trees.

  • (will later talk about MAP - Bayesian inference).


Example binomial experiment
Example: Binomial Experiment

  • When tossed, it can land in one of two positions: Head(H) or Tail (T)

Head

Tail

  • We denote by  the (unknown) probability P(H).

Estimation task:

  • Given a sequence of toss samples x[1], x[2], …, x[M] we want to estimate the probabilities P(H)= and P(T) = 1 - 


Statistical parameter fitting restement

i.i.d.

Samples

(why??)

Statistical Parameter Fitting (restement)

  • Consider instances x[1], x[2], …, x[M]

    such that

    • The set of values that x can take is known

    • Each is sampled from the same distribution

    • Each sampled independently of the rest

  • The task is to find a vector of parameters

     that have generated the given data. This

    vector parameter  can be used to predict

    future data.


The likelihood function

L()

0

0.2

0.4

0.6

0.8

1

The Likelihood Function

  • How good is a particular ?It depends on how likely it is to generate the observed data

  • The likelihood for the sequence H,T, T, H, H is


Sufficient statistics
Sufficient Statistics

  • To compute the likelihood in the thumbtack example we only require NH and NT (the number of heads and the number of tails)

  • NH and NT are sufficient statistics for the binomial distribution


Sufficient statistics1

  • Formally, s(D) is a sufficient statistics if for any two datasets D and D’

    • s(D) = s(D’ ) LD() = LD’ ()

Datasets

Statistics

Sufficient Statistics

  • A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood


Maximum likelihood estimation
Maximum Likelihood Estimation

MLE Principle:

Choose parameters that maximize the likelihood function

  • This is one of the most commonly used estimators in statistics

  • Intuitively appealing

  • One usually maximizes the log-likelihood function, defined as lD() = ln LD()


Example mle in binomial data

L()

0

0.2

0.4

0.6

0.8

1

Example:

(NH,NT ) = (3,2)

MLE estimate is 3/5 = 0.6

Example: MLE in Binomial Data

Taking derivative and equating it to 0,

we get

(which coincides with what one would expect)


From binomial to multinomial

Sufficient statistics:

  • N1, N2, …, NK - the number of times each outcome is observed

    Likelihood function:

    MLE: (proof @ assignment 3)

From Binomial to Multinomial

  • Now suppose X can have the values 1,2,…,K(For example a die has K=6 sides)

  • We want to learn the parameters 1, 2. …, K


Example multinomial

MLE:

Example: Multinomial

  • Let be a protein sequence

  • We want to learn the parameters q1, q2,…,q20

    corresponding to the frequencies of the 20 amino acids

  • N1, N2, …, N20 - the number of times each amino acid is observed in the sequence

    Likelihood function:


Inferring phylogenetic trees
Inferring Phylogenetic Trees

  • Let be n sequence(DNA or AA).

    Assume for simplicity they are all same length, l.

  • We want to learn the parameters of a phylogenetic tree that maximizes the likelihood.

  • But wait: Should first specify a model.


A probabilistic model
A Probabilistic Model

  • Our models will consist of a “regular” tree, where

    in addition, edges are assigned substituion probabilities.

  • For simplicity, assume our “DNA” has only two

    states, say X and Y.

  • If edge eis assigned probability pe, this means

    that the probability of substitution (X Y)

    across e is pe.


A probabilistic model 2
A Probabilistic Model (2)

  • Our models will consist of a “regular” tree, where

    in addition, edges are assigned substituion probabilities.

  • For simplicity, assume our “DNA” has only two

    states, say X and Y.

  • If edge eis assigned probability pe, this means

    that the probability of substitution (X Y)

    across e is pe.


A probabilistic model 3
A Probabilistic Model (3)

  • If edge eis assigned probability pe, this means

    that the probability of more involved patterns of

    substitution across e(e.g.XXYXYYXYXX)

    is determined, and easily computed: pe2(1- pe)3

    for this pattern.

  • Q.: What if pattern on both sides is known, but pe is

    not known?

  • A.: Makes sense to seek pe that maximizes

    probability of observation.

  • So far, this is identical to coin toss example.


A probabilistic model 4
A Probabilistic Model (4)

But a single edge is a fairly boring tree…

Now we don’t know the states at internal node(s), nor

the edge parameters pe1, pe2, pe3

YXYXX

XXYXY

pe2

pe1

pe3

?????

YYYYX


T wo ways to go
Two Ways to Go

1. Maximize over states of internal node(s)

2. Average over states of internal node(s)

In both cases, we maximize over edge parameters

YXYXX

XXYXY

pe2

pe1

pe3

?????

YYYYX


T wo ways to go1
Two Ways to Go

In the first version (average, or sum over states of internal

nodes) we are looking for the “most likely” setting of tree edges.

This is called maximum likelihood (ML) inference of

phylogenetic trees.

ML is probably the inference method most widely (wildly )

used.

YXYXX

XXYXY

pe2

pe1

pe3

?????

YYYYX


T wo ways to go2
Two Ways to Go

In the second version (maximize over states of internal nodes)

we are looking for the “most likely” ancestral states. This is

called ancestral maximum likelihood (AML).

In some sense AML is “between” MP (having ancestral states)

and ML (because the goal is still to maximize likelihood).

YXYXX

XXYXY

pe2

pe1

pe3

?????

YYYYX


bust

or a break

.


ad