V10: Bayesian Parameter Estimation

V10: Bayesian Parameter Estimation Althoughthe MLE approachseems plausible, itcanbeoverlysimplistic in manycases. Assumeagainthatweperformthethumbtackexperiment andget 3 heads out of 10 → assuming  = 0.3 isthenquitereasonable. But whatifwe do the same experimentwith a standardcoin, and also get 3 heads? Intuitively, wewouldprobably not concludethattheparameterofthecoinis 0.3. Why not? Becausewehave a lotmoreexperiencewithtossingcoins, wehave a lotmorepriorknowledgeabouttheirbehavior. Mathematics of Biological Networks

Joint probabilisticmodel In theBayesianapproach, weencodeourpriorknowledge about with a probabilitydistribution. This distributionrepresentshowlikelywearea priori tobelievethe different choicesofparameters Thenwecancreatea jointdistributionovertheparameter and thedatacasesX[1], …, X[M] thatweareabouttoobserve. This jointdistributioncapturesourassumptionsabouttheexperiment. As longaswedon‘tknow , thetossesare not marginallyindependent becauseeachtosstellsussomethingabout. One  isknown, weassumethatthetossesareconditionallyindependentgiven . Mathematics of Biological Networks

Joint probabilisticmodel Wecandescribetheseassumptionsusingtheprobabilisticmodelbelow. Mathematics of Biological Networks

Joint probabilisticmodel Having determinedthemodelstructure, itremainstospecifythelocalprobabilitymodels in thisnetwork. Webeginbyconsideringtheprobability P(X[m] | ) : We also needtodescribethepriordistributionover  , P(). This is a continuousdensityovertheinterval [0,1]. Thereareseveralpossiblechoicesforthis. Letusfirstconsiderhowtouse it. Mathematics of Biological Networks

Joint probabilisticmodel The networkstructureimpliesthatthejointdistribution of a particulardatasetandfactorizesas where M[1] isthenumberofheads in thedata, M[0] isthenumberoftails, and P( x[1], …, x[M] |) issimplythelikelihoodfunctionL( : D). This networkspecifies a jointprobabilitymodeloverparametersanddata. Mathematics of Biological Networks

Posteriordistribution There areseveralways in whichwecanusethisnetwork. Forexample, wecantake an observeddataset D of M outcomes, anduseittoinstantiatethevaluesof x[1], …, x[M]. Wecanthencomputetheposteriordistributionover: The firstterm in thenumeratoristhelikelihood, thesecondtermisthepriorovertheparameters. The denominatoris a normalizingfactor so thattheproductis a proper densityfunction[0,1]. Mathematics of Biological Networks

Prediction Let usconsiderthevalueofthenextcointoss x[M+1] giventheobservationsofthefirst M tosses. Since  isunknown, we will consider all itspossiblevaluesandintegrateoverthem Whengoingfromthesecondtothethirdline, weusedtheconditionalindepenciesimpliedbythe meta-network. → weareintegratingtheposteriorover topredicttheprobabilityofheadsforthenexttoss. Mathematics of Biological Networks

Prediction: revisitthumbtackexample Assume thatourprioris uniform (constant) over  in theinterval [0,1]. Thenis proportional tothelikelihood . Pluggingthisintothe integral, weneedtocompute This so-calledBayesianestimatorisquitesimilartothe MLE prediction exceptthatitaddsone „imaginary“ sample toeachcount. Mathematics of Biological Networks

Priors: Beta distribution When usingnonuniformpriors, thechallengeisto pick a continuousdistributionthatcanbewritten in a compact form (e.g. using an analyticalformula), andthatcanbeupdatedefficientlyaswegetnewdata. An appropriateprioristheBeta distribution. Definition: a Beta distributionisparametrizedbytwo real and positive hyperparameters1, 0anddefinedas: The normalizationconstantisdefinedas: whereistheGamma function. Mathematics of Biological Networks

Beta distribution The parameters 1and 0correspondintuitivelytothenumberofimaginary headsandtailsthatwehave „seen“ beforestartingtheexperiment. These areexamplesofbetafunctions Mathematics of Biological Networks

Gamma function The Gamma functionissimply a continuousgeneralizationoffactorials. Itsatisfies (1) = 1 and (x + 1) = x (x). Hence(n + 1) = n! Beta distributionshaveproperties thatmakethemparticularlyusefulforparameterestimation. Assumeourdistribution P() isBeta(1,0) andconsider a singlecointoss X. Letuscomputethe marginal probabilityover X, based on P(). Weneedtointegrateout . Mathematics of Biological Networks

Properties of Beta functions This findingmatchesourinituitionthatthe Beta priorindicates thatwehaveseen1 (imaginary) headsand0(imaginary) tails. Mathematics of Biological Networks

Properties of Beta distributions As wegetmoreobservations, i.e. M[1] headsand M[0] tailsitfollowsthat whichispreciselyBeta(1+ M[1], 0+ M[0]). This resultillustrates a keypropertyofthe Beta distribution: Iftheprioris a Beta distribution, thentheposteriordistribution, thatis, thepriorconditioned on theevidence, is also a Beta distribution. Mathematics of Biological Networks

Priors An immediate consequenceisthatwecancompute theprobabilitiesoverthenexttoss: where  = 1 + 0 andM = M1+ M0 In thiscase, ourposterior Beta distributiontellsus thatwehaveseen1 + M[1] (imaginary) headsand 0+ M[0] tails. Mathematics of Biological Networks

Effectof Priors Let uscomparetheeffectofBeta(2,2) vs. Beta(10,10) on theprobabilityoverthenextcointoss. Bothpriorspredictthattheprobabilityofheads in thefirsttossis. How do different priors (Beta(10,10) ismorenarrow) affectfurtherconvergence? Supposeweobserve 3 heads in 10 tosses. Usingthefirstprior, ourestimateis Usingthesecondpriorgives But whenweobtainmuchmoredata, theeffectoftheprioralmostdisappears. Ifweobtain 1000 tossesofwhich 300 areheads, both and givevaluescloseto 0.3 Mathematics of Biological Networks

Priors andPosteriors Letusassume a generallearningproblem whereweobserve a trainingset D thatcontains M IID samples of a random variable X from an unknowndistribution P*(X). We also assumethatwehave a parametricmodel P( | ) wherewecanchooseparametersfrom a parameterspace . The MLE approachattemptedto find theparameters in  thatare „best“ giventhedata. The Bayesianapproach, on theotherhand, does not attempt to find a singlebestestimate. Instead, onequantifiesthesubjectiveprobabilityfor different valuesof  after seeingtheevidence. Mathematics of Biological Networks

Priors andPosteriors We needtodescribe a jointdistribution P(D, ) overthedataandtheparameters. Wecaneasilywrite The firstterm on therightisthelikelihoodfunction (see V8 – example on predicting PP complexes). The secondtermisthepriordistributionoverthepossiblevalues in . Itcapturesour initial uncertaintyabouttheparameters. Itcan also captureourpreviousexperiencebeforewestarttheexperiment. Mathematics of Biological Networks

Priors andPosteriors Oncewehavespecifiedthelikelihoodfunctionandtheprior, wecanusethedatatoderivetheposteriordistribution overtheparametersusingBayesrule: The term P(D) isthemarginal likelihoodofthedata whatistheintegrationofthelikelihood over all possibleparameterassignments. Mathematics of Biological Networks

Priors andPosteriors Let usreconsidertheexampleof a multinomialdistribution(MD). Weneedtodescribeouruncertaintyabouttheparametersof MD. The parameterspacecontains all nonnegativevectors such that. As wesawpreviously, thelikelihoodfunctionis Sincetheposterioris a productofthepriorandthelikelihood, itisnaturaltorequirethattheprior also have a form similar tothelikelihood. One such prioristheDirichletdistributionwhichgeneralizes the Beta distribution. Mathematics of Biological Networks

Dirichletdistribution A Dirichletdistributionisspecifiedby a setofhyperparameters1, … K so that Weuse  todenote. If weuse a Dirichletprior, thentheposterioris also Dirichlet: Proposition: If P() isthen P( | D) is , where M[K] isthenumberofoccurrencesofxk. Priors such astheDirichletareusefulsincetheyensurethattheposteriorhas a nicecompactdescriptionandusesthe same representationastheprior. We will see on 2 examplestheeffectsofpriors on posteriorestimates. Mathematics of Biological Networks

Effectof Beta prior on convergenceofposteriorestimates For a givendatasetsize M, weassumethat D contains 0.2 M headsand 0.8 M tails. As theamountof real datagrows, ourestimateconvergestothetrueunderlyingdistribution, regardlessofthestartingpoint. (Left): effectofvaryingpriormeans1´, 0´ for a fixedpriorstrength . (Right): effectofvaryingpriorstrengthfor a fixedpriormean 1´ = 0´= 0.5 Mathematics of Biological Networks

Convergenceofparameterestimate Dottedline: Beta(10,10) Small-dash line: Beta(5,5) Large-dash line: Beta (1,1) → Beta(10,10) haslonger „memory“ about initial conditions Effectof different priors on smoothingtheparameterestimates. Below thegraphisshowntheparticularsequenceoftosses. Solid line: MLE estimate Dashedlines: Bayesianestimateswith different strengthsand uniform priormeans. Mathematics of Biological Networks

Imprinting effects during hematopoietic differentiation? • One of the most well studied developmental systems • Mature cell line models Rathinam and Flavell 2008 Mohamed Hamed (unpublished) Mathematics of Biological Networks

Blood lineages Mohamed Hamed (unpublished) Mathematics of Biological Networks

Motivation I • Identify cellular events that drive cell differentiation and reprogramming • Construct gene-regulatory network (GRN) that governs - transitions between the different states along the developmental cell lines and - pausing at specific states. • Do imprinted genes play a role in regulating differentiation?. Mohamed Hamed (unpublished) Mathematics of Biological Networks

Motivation II Berg, Lin et al. (2011) Real-time PCR analysis of imprinted gene expression in hematopoietic cells Imprinted genes drastically down-regulated in differentiated cells. during the earliest phases of hematopoietic development, imprinted genes may have distinct roles Mohamed Hamed (unpublished) Mathematics of Biological Networks

Imprinted genes • violate the usual rule of inheritance • bi-allelic genes : gene copy (allele) encoding hemoglobin from dad gene copy (allele) encoding hemoglobin from mom Child: expresses equal amounts of the 2 types of hemoglobin • mono-allelic (imprinted) genes : one allele silenced by DNA methylation Mathematics of Biological Networks

Imprinted genes cluster in the genome Mathematics of Biological Networks

Parental conflict hypothesis = “battle of the sexes” Paternally expressed genes Maternally expressed genes embryonicggrowth in placenta embryonic growth in placenta Mathematics of Biological Networks

Mouse Pluripotency network (Plurinet) Pluripotency network in mouse G. Fuellen et al. (2010) based on 177 publications 274 genes 574 stimulations / inhibitions/ and interactions Mathematics of Biological Networks

Gene regulatory network around Oc4 controls pluripotency Tightly interwoven network of 9 transcription factors keeps ES cells in pluripotent state. 6632 human genes have binding site in their promoter region for at least one of these 9 TFs. Many genes have multiple motifs. 800 genes bind ≥ 4 TFs. Mathematics of Biological Networks

Gene expression profiles imprinted pluri hematopoiesis c a b • long and short-term hematopoietic stem cells • Intermediate progenitor populations such as Lymphoid primed multipotent progenitor (LMPP), common lymphoid progenitor (CLP), and granulocyte–monocyte progenitor (GMP), and • Terminally differentiated blood progeny such as NK cells and granulocyte- monocyte (GM). • All 3 gene sets contain genes that are • upregulated either in (1), (2) or (3) stages Mohamed Hamed (unpublished) Mathematics of Biological Networks

Lineage-specific marker genes from all 3 gene sets cluster together red : maternally expressed imprinted genes blue : paternally expressed imprinted genes cyan : pluripotency genes orange: hematopoietic genes Mohamed Hamed (unpublished) Mathematics of Biological Networks

Imprinted gene network (IGN) Aim: explain surprisingly similar expression profiles of 3 gene sets • only 5imprinted genes (Gab1, Ins1, Phf17, Tsix, and Xist) are present in the pluripotency list and • only 3 imprinted genes (Axl, Calcr, and Gnas) belong to the hematopoietic list. Who regulates the imprinted genes? • Identify regulators (TFs) of imprinted genes and target genes regulated by imprinted genes Mohamed Hamed (unpublished) Mathematics of Biological Networks

Mohamed Hamed (unpublished) Johannes Trumm, MSc thesis ,CBI, 2011. Mathematics of Biological Networks

Mebitoo GRN Plugin Johannes Trumm, MSc thesis ,CBI, 2011. Mathematics of Biological Networks

gene sets are (largely) co-expressed andenriched with developmental GO terms Mohamed Hamed (unpublished) Mathematics of Biological Networks

Summary Parameter learning from data is an important research field. We entered into some basics about MLE and Bayesian parameter estimation. Powerful and efficient priors need to be estimated, see Beta function. V11: enter into structure learning. Application example: construct GRN to derive genes that drive hematopoiesis. Intersection with pluripotency and imprinted genes reveals interesting module of co-expressed genes with homogenous involvement in development. Mathematics of Biological Networks

V10: Bayesian Parameter Estimation