# V10: Bayesian Parameter Estimation

##### Presentation Transcript

1. V10: Bayesian Parameter Estimation. Although the MLE approach seems plausible, it can be overly simplistic in many cases. Assume again that we perform the thumbtack experiment and get 3 heads out of 10 → assuming $\theta = 0.3$ is then quite reasonable. But what if we do the same experiment with a standard coin and also get 3 heads? Intuitively, we would probably not conclude that the parameter of the coin is 0.3. Why not? Because we have far more experience with tossing coins, we have far more prior knowledge about their behavior. Mathematics of Biological Networks

2. Joint probabilistic model. In the Bayesian approach, we encode our prior knowledge about $\theta$ with a probability distribution. This distribution represents how likely we are, a priori, to believe the different choices of parameters. We can then create a joint distribution over the parameter $\theta$ and the data cases $X[1], \dots, X[M]$ that we are about to observe. This joint distribution captures our assumptions about the experiment. As long as we don't know $\theta$, the tosses are not marginally independent, because each toss tells us something about $\theta$. Once $\theta$ is known, we assume that the tosses are conditionally independent given $\theta$.

3. Joint probabilistic model. We can describe these assumptions with a probabilistic network in which $\theta$ is the common parent of the tosses $X[1], \dots, X[M]$. [Figure: network over $\theta$ and $X[1], \dots, X[M]$]

4. Joint probabilistic model. Having determined the model structure, it remains to specify the local probability models in this network. We begin by considering the probability $P(X[m] \mid \theta)$:
   $$P(x[m] \mid \theta) = \begin{cases} \theta & \text{if } x[m] = x^1 \text{ (heads)} \\ 1 - \theta & \text{if } x[m] = x^0 \text{ (tails)} \end{cases}$$
   We also need to describe the prior distribution over $\theta$, $P(\theta)$. This is a continuous density over the interval $[0,1]$. There are several possible choices for this prior. Let us first consider how to use it.

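5. Joint probabilistic model. The network structure implies that the joint distribution of a particular data set and $\theta$ factorizes as
   $$P(x[1], \dots, x[M], \theta) = P(x[1], \dots, x[M] \mid \theta)\, P(\theta) = P(\theta) \prod_{m=1}^{M} P(x[m] \mid \theta) = P(\theta)\, \theta^{M[1]} (1-\theta)^{M[0]},$$
   where $M[1]$ is the number of heads in the data, $M[0]$ is the number of tails, and $P(x[1], \dots, x[M] \mid \theta)$ is simply the likelihood function $L(\theta : D)$. This network specifies a joint probability model over parameters and data.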

6. Posterior distribution. There are several ways in which we can use this network. For example, we can take an observed data set $D$ of $M$ outcomes and use it to instantiate the values of $x[1], \dots, x[M]$. We can then compute the posterior distribution over $\theta$:
   $$P(\theta \mid x[1], \dots, x[M]) = \frac{P(x[1], \dots, x[M] \mid \theta)\, P(\theta)}{P(x[1], \dots, x[M])}$$
   The first term in the numerator is the likelihood, the second term is the prior over the parameters. The denominator is a normalizing factor that makes the product a proper density function over $[0,1]$.

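7. Prediction. Let us consider the value of the next coin toss $x[M+1]$ given the observations of the first $M$ tosses. Since $\theta$ is unknown, we consider all its possible values and integrate over them:
   $$\begin{aligned} P(x[M+1] \mid x[1], \dots, x[M]) &= \int_0^1 P(x[M+1], \theta \mid x[1], \dots, x[M])\, d\theta \\ &= \int_0^1 P(x[M+1] \mid \theta, x[1], \dots, x[M])\, P(\theta \mid x[1], \dots, x[M])\, d\theta \\ &= \int_0^1 P(x[M+1] \mid \theta)\, P(\theta \mid x[1], \dots, x[M])\, d\theta \end{aligned}$$
   When going from the second to the third line, we used the conditional independencies implied by the meta-network. → We are integrating the posterior over $\theta$ to predict the probability of heads for the next toss.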

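8. Prediction: revisit thumbtack example. Assume that our prior is uniform (constant) over $\theta$ in the interval $[0,1]$. Then $P(\theta \mid x[1], \dots, x[M])$ is proportional to the likelihood $P(x[1], \dots, x[M] \mid \theta) = \theta^{M[1]} (1-\theta)^{M[0]}$. Plugging this into the integral, we need to compute
   $$P(X[M+1] = x^1 \mid x[1], \dots, x[M]) = \frac{1}{P(x[1], \dots, x[M])} \int_0^1 \theta \cdot \theta^{M[1]} (1-\theta)^{M[0]}\, d\theta = \frac{M[1] + 1}{M[1] + M[0] + 2}$$
   This so-called Bayesian estimator is quite similar to the MLE prediction, except that it adds one "imaginary" sample to each count.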
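The closed form on slide 8 can be checked numerically. The sketch below is only an illustration: the function names are made up for this example, and the integral is approximated with a simple midpoint rule rather than solved analytically.

```python
def laplace_estimate(heads, tails):
    """Bayesian estimate of P(heads) under a uniform prior on theta."""
    return (heads + 1) / (heads + tails + 2)


def predictive_by_integration(heads, tails, n=20000):
    """Approximate the predictive integral from slide 7 with a midpoint rule.

    Under a uniform prior the posterior is proportional to the likelihood
    theta^heads * (1 - theta)^tails, so we integrate theta * likelihood
    and divide by the integral of the likelihood (the normalizing factor).
    """
    numerator = 0.0
    normalizer = 0.0
    for i in range(n):
        theta = (i + 0.5) / n
        likelihood = theta ** heads * (1 - theta) ** tails
        numerator += theta * likelihood
        normalizer += likelihood
    return numerator / normalizer


# Thumbtack example: 3 heads out of 10 tosses.
print(laplace_estimate(3, 7))            # (3 + 1) / (10 + 2) = 1/3
print(predictive_by_integration(3, 7))   # numerically close to 1/3
```

For 3 heads in 10 tosses the MLE is 0.3, while the Bayesian estimate under the uniform prior is 4/12, slightly pulled towards 0.5.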

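9. Priors: Beta distribution. When using nonuniform priors, the challenge is to pick a continuous distribution that can be written in a compact form (e.g. using an analytical formula) and that can be updated efficiently as we get new data. An appropriate prior is the Beta distribution. Definition: a Beta distribution is parametrized by two real and positive hyperparameters $\alpha_1, \alpha_0$ and defined as:
   $$\theta \sim \mathrm{Beta}(\alpha_1, \alpha_0) \quad \text{if} \quad p(\theta) = \gamma\, \theta^{\alpha_1 - 1} (1-\theta)^{\alpha_0 - 1}$$
   The normalization constant $\gamma$ is defined as:
   $$\gamma = \frac{\Gamma(\alpha_1 + \alpha_0)}{\Gamma(\alpha_1)\, \Gamma(\alpha_0)}$$
   where $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$ is the Gamma function.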
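As a minimal sketch of the definition on slide 9, the Beta density can be evaluated with the standard-library Gamma function. The function name `beta_pdf` and its argument names are chosen for this illustration only.

```python
import math


def beta_pdf(theta, alpha1, alpha0):
    """Density of Beta(alpha1, alpha0) at theta, for 0 < theta < 1."""
    # Normalization constant gamma = Gamma(a1 + a0) / (Gamma(a1) * Gamma(a0)).
    norm = math.gamma(alpha1 + alpha0) / (math.gamma(alpha1) * math.gamma(alpha0))
    return norm * theta ** (alpha1 - 1) * (1 - theta) ** (alpha0 - 1)


# Beta(1, 1) is the uniform prior; Beta(2, 2) is peaked at 0.5.
print(beta_pdf(0.5, 1, 1))
print(beta_pdf(0.5, 2, 2))
```

Larger hyperparameters with the same ratio, such as Beta(10, 10), keep the peak at 0.5 but make it narrower.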

10. Beta distribution. The parameters $\alpha_1$ and $\alpha_0$ correspond intuitively to the number of imaginary heads and tails that we have "seen" before starting the experiment. [Figure: examples of Beta distributions]

11. Gamma function. The Gamma function is simply a continuous generalization of factorials. It satisfies $\Gamma(1) = 1$ and $\Gamma(x + 1) = x\, \Gamma(x)$. Hence $\Gamma(n + 1) = n!$. Beta distributions have properties that make them particularly useful for parameter estimation. Assume our distribution $P(\theta)$ is $\mathrm{Beta}(\alpha_1, \alpha_0)$ and consider a single coin toss $X$. Let us compute the marginal probability over $X$, based on $P(\theta)$. We need to integrate out $\theta$.

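12. Properties of Beta distributions.
   $$P(X[1] = x^1) = \int_0^1 P(X[1] = x^1 \mid \theta)\, P(\theta)\, d\theta = \int_0^1 \theta\, P(\theta)\, d\theta = \frac{\alpha_1}{\alpha_1 + \alpha_0}$$
   This finding matches our intuition that the Beta prior indicates that we have seen $\alpha_1$ (imaginary) heads and $\alpha_0$ (imaginary) tails.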
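The marginal probability of heads under a Beta prior can be worked out in a few steps. The derivation below is a sketch that writes $\alpha = \alpha_1 + \alpha_0$ and uses two facts: the Beta density integrates to 1, so $\int_0^1 \theta^{a-1}(1-\theta)^{b-1}\, d\theta = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, and the Gamma recursion $\Gamma(x+1) = x\,\Gamma(x)$.

```latex
\begin{aligned}
P(X[1] = x^1)
  &= \int_0^1 \theta \cdot
     \frac{\Gamma(\alpha)}{\Gamma(\alpha_1)\,\Gamma(\alpha_0)}\,
     \theta^{\alpha_1 - 1} (1-\theta)^{\alpha_0 - 1}\, d\theta \\
  &= \frac{\Gamma(\alpha)}{\Gamma(\alpha_1)\,\Gamma(\alpha_0)}
     \int_0^1 \theta^{(\alpha_1 + 1) - 1} (1-\theta)^{\alpha_0 - 1}\, d\theta \\
  &= \frac{\Gamma(\alpha)}{\Gamma(\alpha_1)\,\Gamma(\alpha_0)} \cdot
     \frac{\Gamma(\alpha_1 + 1)\,\Gamma(\alpha_0)}{\Gamma(\alpha + 1)} \\
  &= \frac{\Gamma(\alpha)}{\Gamma(\alpha + 1)} \cdot
     \frac{\Gamma(\alpha_1 + 1)}{\Gamma(\alpha_1)}
   = \frac{\alpha_1}{\alpha}
\end{aligned}
```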

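13. Properties of Beta distributions. As we get more observations, i.e. $M[1]$ heads and $M[0]$ tails, it follows that
   $$P(\theta \mid x[1], \dots, x[M]) \propto P(x[1], \dots, x[M] \mid \theta)\, P(\theta) \propto \theta^{M[1]} (1-\theta)^{M[0]} \cdot \theta^{\alpha_1 - 1} (1-\theta)^{\alpha_0 - 1} = \theta^{\alpha_1 + M[1] - 1} (1-\theta)^{\alpha_0 + M[0] - 1},$$
   which is precisely $\mathrm{Beta}(\alpha_1 + M[1], \alpha_0 + M[0])$. This result illustrates a key property of the Beta distribution: if the prior is a Beta distribution, then the posterior distribution, that is, the prior conditioned on the evidence, is also a Beta distribution.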
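Because the posterior stays in the Beta family, updating amounts to adding counts to the hyperparameters. The following sketch assumes tosses are encoded as 1 (heads) and 0 (tails); the function name and the example sequence are illustrative.

```python
def beta_update(alpha1, alpha0, tosses):
    """Return the hyperparameters of the Beta posterior after observing tosses.

    tosses is a sequence of 1 (heads) and 0 (tails).
    """
    heads = sum(tosses)
    tails = len(tosses) - heads
    return alpha1 + heads, alpha0 + tails


tosses = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 heads, 7 tails

# Updating with all data at once ...
batch = beta_update(2, 2, tosses)

# ... gives the same posterior as updating one toss at a time.
sequential = (2, 2)
for toss in tosses:
    sequential = beta_update(*sequential, [toss])

print(batch, sequential)
```

The agreement between batch and sequential updates is what makes the Beta prior efficient to update as new data arrive.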

14. Priors. An immediate consequence is that we can compute the probabilities over the next toss:
   $$P(X[M+1] = x^1 \mid x[1], \dots, x[M]) = \frac{\alpha_1 + M[1]}{\alpha + M}$$
   where $\alpha = \alpha_1 + \alpha_0$ and $M = M[1] + M[0]$. In this case, our posterior Beta distribution tells us that we have seen $\alpha_1 + M[1]$ (imaginary) heads and $\alpha_0 + M[0]$ tails.

15. Effect of Priors. Let us compare the effect of Beta(2,2) vs. Beta(10,10) on the probability over the next coin toss. Both priors predict that the probability of heads in the first toss is 0.5. How do different priors (Beta(10,10) is narrower) affect further convergence? Suppose we observe 3 heads in 10 tosses. Using the first prior, our estimate is $\frac{3+2}{10+4} = \frac{5}{14} \approx 0.36$. Using the second prior gives $\frac{3+10}{10+20} = \frac{13}{30} \approx 0.43$. But when we obtain much more data, the effect of the prior almost disappears. If we obtain 1000 tosses of which 300 are heads, both $\frac{300+2}{1000+4}$ and $\frac{300+10}{1000+20}$ give values close to 0.3.
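The comparison on slide 15 can be reproduced with exact fractions. This is a small sketch; `predict_heads` is a name chosen here for the predictive probability $(\alpha_1 + M[1]) / (\alpha + M)$.

```python
from fractions import Fraction


def predict_heads(alpha1, alpha0, heads, tails):
    """Probability of heads on the next toss under a Beta(alpha1, alpha0) prior."""
    return Fraction(alpha1 + heads, alpha1 + alpha0 + heads + tails)


# 3 heads in 10 tosses: the two priors give clearly different estimates.
print(predict_heads(2, 2, 3, 7))     # 5/14
print(predict_heads(10, 10, 3, 7))   # 13/30

# 300 heads in 1000 tosses: both estimates are close to 0.3.
print(float(predict_heads(2, 2, 300, 700)))
print(float(predict_heads(10, 10, 300, 700)))
```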

16. Priors and Posteriors. Let us assume a general learning problem where we observe a training set $D$ that contains $M$ IID samples of a random variable $X$ from an unknown distribution $P^*(X)$. We also assume that we have a parametric model $P(X \mid \theta)$ where we can choose parameters from a parameter space $\Theta$. The MLE approach attempts to find the parameters $\hat{\theta}$ in $\Theta$ that are "best" given the data. The Bayesian approach, on the other hand, does not attempt to find a single best estimate. Instead, one quantifies the subjective probability for different values of $\theta$ after seeing the evidence.

17. Priors and Posteriors. We need to describe a joint distribution $P(D, \theta)$ over the data and the parameters. We can easily write
   $$P(D, \theta) = P(D \mid \theta)\, P(\theta)$$
   The first term on the right is the likelihood function (see V8, the example on predicting PP complexes). The second term is the prior distribution over the possible values in $\Theta$. It captures our initial uncertainty about the parameters. It can also capture our previous experience before we start the experiment.

18. Priors and Posteriors. Once we have specified the likelihood function and the prior, we can use the data to derive the posterior distribution over the parameters using Bayes' rule:
   $$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$
   The term $P(D)$ is the marginal likelihood of the data, which is the integral of the likelihood over all possible parameter assignments, weighted by the prior:
   $$P(D) = \int_\Theta P(D \mid \theta)\, P(\theta)\, d\theta$$
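For a one-dimensional parameter such as the coin's $\theta$, Bayes' rule can be approximated on a grid, which works for any prior, conjugate or not. The sketch below is an illustration under stated assumptions: the function name, the grid size and the trapezoidal approximation of $P(D)$ are choices made for this example, not part of the lecture.

```python
def grid_posterior(heads, tails, prior=lambda theta: 1.0, n_grid=1001):
    """Approximate P(theta | D) for coin-toss data on an evenly spaced grid.

    Returns the grid points, the posterior density at those points and the
    approximate marginal likelihood P(D).
    """
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    # Numerator of Bayes' rule: likelihood times prior.
    joint = [t ** heads * (1 - t) ** tails * prior(t) for t in thetas]
    # Denominator: trapezoidal approximation of the integral over theta.
    step = 1.0 / (n_grid - 1)
    evidence = step * (sum(joint) - 0.5 * (joint[0] + joint[-1]))
    posterior = [j / evidence for j in joint]
    return thetas, posterior, evidence


thetas, posterior, evidence = grid_posterior(heads=3, tails=7)

# Posterior mean, again by the trapezoidal rule (the endpoint values are zero here).
step = thetas[1] - thetas[0]
posterior_mean = step * sum(t * p for t, p in zip(thetas, posterior))
print(evidence, posterior_mean)
```

With the default uniform prior, the posterior mean agrees with the closed-form prediction $(M[1] + 1) / (M + 2)$ for the thumbtack data.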

19. Priors and Posteriors. Let us reconsider the example of a multinomial distribution (MD). We need to describe our uncertainty about the parameters of the MD. The parameter space $\Theta$ contains all nonnegative vectors $\theta = \langle \theta_1, \dots, \theta_K \rangle$ such that $\sum_k \theta_k = 1$. As we saw previously, the likelihood function is
   $$L(\theta : D) = \prod_k \theta_k^{M[k]}$$
   Since the posterior is a product of the prior and the likelihood, it is natural to require that the prior also have a form similar to the likelihood. One such prior is the Dirichlet distribution, which generalizes the Beta distribution.

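20. Dirichlet distribution. A Dirichlet distribution is specified by a set of hyperparameters $\alpha_1, \dots, \alpha_K$ so that
   $$\theta \sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_K) \quad \text{if} \quad P(\theta) \propto \prod_k \theta_k^{\alpha_k - 1}$$
   We use $\alpha$ to denote $\sum_k \alpha_k$. If we use a Dirichlet prior, then the posterior is also Dirichlet. Proposition: if $P(\theta)$ is $\mathrm{Dirichlet}(\alpha_1, \dots, \alpha_K)$, then $P(\theta \mid D)$ is $\mathrm{Dirichlet}(\alpha_1 + M[1], \dots, \alpha_K + M[K])$, where $M[k]$ is the number of occurrences of $x^k$. Priors such as the Dirichlet are useful since they ensure that the posterior has a nice compact description and uses the same representation as the prior. We will see the effects of priors on posterior estimates in 2 examples.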
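The proposition on slide 20 translates directly into code: the posterior hyperparameters are the prior hyperparameters plus the observed counts. The sketch below also computes the probability of each outcome for the next sample as $(\alpha_k + M[k]) / (\alpha + M)$, which is the standard Dirichlet analogue of the Beta prediction on slide 14 and is not stated explicitly on the slides. Function names and example counts are illustrative.

```python
def dirichlet_update(alphas, counts):
    """Hyperparameters of the Dirichlet posterior given per-outcome counts."""
    if len(alphas) != len(counts):
        raise ValueError("alphas and counts must have the same length")
    return [a + m for a, m in zip(alphas, counts)]


def dirichlet_predictive(alphas):
    """Probability of each outcome for the next sample under Dirichlet(alphas)."""
    total = sum(alphas)
    return [a / total for a in alphas]


prior = [1, 1, 1]    # uniform prior over K = 3 outcomes
counts = [2, 5, 3]   # hypothetical counts M[1], M[2], M[3]

posterior = dirichlet_update(prior, counts)
print(posterior)                        # [3, 6, 4]
print(dirichlet_predictive(posterior))  # 3/13, 6/13 and 4/13 as floats
```

For K = 2 this reduces to the Beta update, with the two hyperparameters playing the roles of $\alpha_1$ and $\alpha_0$.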

21. Effect of Beta prior on convergence of posterior estimates. For a given data set size $M$, we assume that $D$ contains $0.2M$ heads and $0.8M$ tails. As the amount of real data grows, our estimate converges to the true underlying distribution, regardless of the starting point. (Left): effect of varying prior means $\theta'_1, \theta'_0$ for a fixed prior strength $\alpha$. (Right): effect of varying prior strength for a fixed prior mean $\theta'_1 = \theta'_0 = 0.5$.
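The convergence described on slide 21 can be sketched numerically. This example assumes the usual parametrization of a Beta prior by its mean $\theta'_1$ and strength $\alpha$, i.e. $\alpha_1 = \alpha\,\theta'_1$ and $\alpha_0 = \alpha\,(1 - \theta'_1)$; the function name and the chosen values of $M$ are illustrative.

```python
def bayes_estimate(M, prior_mean, strength, frac_heads=0.2):
    """Estimate of P(heads) after M tosses of which frac_heads * M are heads.

    The prior is Beta(alpha1, alpha0) with alpha1 = strength * prior_mean
    and alpha1 + alpha0 = strength.
    """
    alpha1 = strength * prior_mean
    heads = frac_heads * M
    return (alpha1 + heads) / (strength + M)


# Fixed prior mean 0.5, varying prior strength: stronger priors converge more slowly.
for strength in (1, 10, 50):
    estimates = [round(bayes_estimate(M, 0.5, strength), 3) for M in (0, 10, 100, 1000)]
    print(strength, estimates)
```

With no data the estimate equals the prior mean, and for large $M$ all estimates approach the empirical fraction 0.2.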

22. Convergence of parameter estimate. Effect of different priors on smoothing the parameter estimates. The particular sequence of tosses is shown below the graph. Solid line: MLE estimate. Dashed lines: Bayesian estimates with different strengths and uniform prior means. Dotted line: Beta(10,10). Small-dash line: Beta(5,5). Large-dash line: Beta(1,1). → Beta(10,10) has a longer "memory" of the initial conditions.
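The smoothing effect on slide 22 can be reproduced by tracking the estimate after every toss. The sketch below uses a short made-up toss sequence (the sequence from the slide's figure is not available in this transcript); setting both hyperparameters to 0 gives the MLE.

```python
def running_estimates(tosses, alpha1=0.0, alpha0=0.0):
    """Estimate of P(heads) after each toss; alpha1 = alpha0 = 0 yields the MLE."""
    estimates = []
    heads = 0
    for m, toss in enumerate(tosses, start=1):
        heads += toss
        estimates.append((alpha1 + heads) / (alpha1 + alpha0 + m))
    return estimates


tosses = [1, 0, 0, 1, 0]  # hypothetical sequence, 1 = heads

mle = running_estimates(tosses)
weak = running_estimates(tosses, 1, 1)       # Beta(1,1)
strong = running_estimates(tosses, 10, 10)   # Beta(10,10)

print(mle)
print(weak)
print(strong)
```

The MLE jumps with every toss, while the Beta(10,10) estimate stays close to 0.5 over these five tosses, which is the longer "memory" noted on the slide.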

23. Imprinting effects during hematopoietic differentiation? • One of the most well-studied developmental systems • Mature cell line models. Rathinam and Flavell 2008. Mohamed Hamed (unpublished)

24. Blood lineages. Mohamed Hamed (unpublished)

25. Motivation I. • Identify cellular events that drive cell differentiation and reprogramming • Construct a gene-regulatory network (GRN) that governs (a) transitions between the different states along the developmental cell lines and (b) pausing at specific states • Do imprinted genes play a role in regulating differentiation? Mohamed Hamed (unpublished)

26. Motivation II. Berg, Lin et al. (2011): real-time PCR analysis of imprinted gene expression in hematopoietic cells. Imprinted genes are drastically down-regulated in differentiated cells. During the earliest phases of hematopoietic development, imprinted genes may have distinct roles. Mohamed Hamed (unpublished)

27. Imprinted genes • violate the usual rule of inheritance • bi-allelic genes: one gene copy (allele) encoding hemoglobin comes from dad, one gene copy (allele) encoding hemoglobin comes from mom; the child expresses equal amounts of the 2 types of hemoglobin • mono-allelic (imprinted) genes: one allele is silenced by DNA methylation

28. Imprinted genes cluster in the genome

29. Parental conflict hypothesis = "battle of the sexes". [Figure labels: paternally expressed genes, maternally expressed genes; embryonic growth in placenta]

30. Mouse Pluripotency network (Plurinet). Pluripotency network in mouse, G. Fuellen et al. (2010), based on 177 publications: 274 genes; 574 stimulations, inhibitions and interactions

31. Gene regulatory network around Oct4 controls pluripotency. A tightly interwoven network of 9 transcription factors keeps ES cells in the pluripotent state. 6632 human genes have a binding site in their promoter region for at least one of these 9 TFs. Many genes have multiple motifs. 800 genes bind ≥ 4 TFs.

32. Gene expression profiles of three gene sets (imprinted, pluripotency, hematopoiesis; figure panels a to c) across three stages: (1) long- and short-term hematopoietic stem cells; (2) intermediate progenitor populations such as lymphoid-primed multipotent progenitor (LMPP), common lymphoid progenitor (CLP), and granulocyte-monocyte progenitor (GMP); (3) terminally differentiated blood progeny such as NK cells and granulocyte-monocyte (GM). All 3 gene sets contain genes that are upregulated in either stage (1), (2) or (3). Mohamed Hamed (unpublished)

33. Lineage-specific marker genes from all 3 gene sets cluster together. Red: maternally expressed imprinted genes. Blue: paternally expressed imprinted genes. Cyan: pluripotency genes. Orange: hematopoietic genes. Mohamed Hamed (unpublished)

34. Imprinted gene network (IGN). Aim: explain the surprisingly similar expression profiles of the 3 gene sets • only 5 imprinted genes (Gab1, Ins1, Phf17, Tsix, and Xist) are present in the pluripotency list and • only 3 imprinted genes (Axl, Calcr, and Gnas) belong to the hematopoietic list. Who regulates the imprinted genes? • Identify regulators (TFs) of imprinted genes and target genes regulated by imprinted genes. Mohamed Hamed (unpublished)

35. Mohamed Hamed (unpublished); Johannes Trumm, MSc thesis, CBI, 2011.

36. Mebitoo GRN Plugin. Johannes Trumm, MSc thesis, CBI, 2011.

37. Gene sets are (largely) co-expressed and enriched with developmental GO terms. Mohamed Hamed (unpublished)

38. Summary. Parameter learning from data is an important research field. We covered some basics of MLE and Bayesian parameter estimation. Powerful and efficient priors need to be chosen (see the Beta distribution). V11: introduction to structure learning. Application example: construct a GRN to derive genes that drive hematopoiesis. The intersection with pluripotency and imprinted genes reveals an interesting module of co-expressed genes with homogeneous involvement in development.