V11: Structure Learning in Bayesian Networks

V11: Structure Learning in Bayesian Networks In lectures V9 and V10 wemadethe strong assumption thatweknow in advancethenetworkstructure. Today, weconsiderthetaskoflearning in situations wherewe do not knowthestructureofthe BN in advance. Instead, we will applythe strong assumptionthatourdatasetisfullyobserved. As before, weassumethatthedata D aregenerated IID from an underlyingdistribution P*(X). Weassumethat P* isinducedbysomeBayesiannetwork G* over X. Mathematics of Biological Networks

Example: cointoss Consider an experimentwherewetoss 2 standardcoins X and Y independently. Wearegiven a datasetwith 100 instancesofthisexperiment andwould like tolearn a modelforthisscenario. A „typical“ datasetmayhavetheentriesfor X and Y: 27 head/head 22 head/tail 25 tail/head 26 tail/tail In thisempiricaldistribution, the 2 coinsareapparently not fullyindependent. (If X = tail, thechancefor Y = tailappearshigherthanif X = head). This isquiteexpectedbecausetheprobabilityoftossing 100 pairsof fair coins andgettingexactly 25 outcomes in eachcategoryisonlyabout 1/1000. Thus, evenifthecoinsareindeedindepdendent, we do not expectthat theobservedempiricaldistribution will satisfyindependence. Mathematics of Biological Networks

Example: cointoss Supposewegetthe same results in a very different situation. Say wescanthesportssectionofourlocalnewspaperfor 100 days andchoose an article at randomeachday. Wemark X = x1iftheword „rain“ appears in thearticleandX = x0otherwise. Similarly, Y denoteswhethertheword „football“ appears in thearticle. Ourintuitionwhetherthetworandom variables areindependentisunclear. Ifwegetthe same empiricalcountsas in thecoinsdescribedbefore, wemightsuspectthatthereissomeweakconnection. In otherwords, itishardtobesurewhetherthetrueunderlying modelhas an edgebetween X and Y or not. Mathematics of Biological Networks

Goals ofthelearningprocess The goaloflearning G* fromthedataishardtoachievebecause • thedatasampledfrom P* aretypicallynoisyand • do not reconstruct P* perfectly. Wecannotdetectwithcompletereliabilitywhich independenciesarepresent in theunderlyingdistribution. Therefore, we must generallymake a decisionaboutourwillingness toinclude in ourlearnedmodeledgesaboutwhichwearelesssure. Ifweincludemoreoftheseedges, we will oftenlearn a modelthatcontainsspuriousedges. Ifweincludefewedges, wemaymiss independencies. The decisionwhichcompromiseisbetterdepends on theapplication. Mathematics of Biological Networks

Independence in cointossexample? Itseemsthatifwe do makemistakes in thestructure, itisbettertohave toomanyratherthantoofewedges. But thesituationismorecomplex. Let‘sgo back tothecoinexampleandassumethatwehadonly 20 casesfor X/Y: 3 head/head 6 head/tail 5 tail/head 6 tail/tail Whenweassume X and Y ascorrelated, we find by MLE (wherewesimplycounttheobservednumbers): P(X = H) = 9/20 = 0.45 P(Y = H | X = H) = 3/9 = 1/3 P(Y = H | X = T) = 5/11 In theindependentstructure (noedgebetween X and Y), P(Y = H) = 8/20 = 0.4 Mathematics of Biological Networks

Noisydata→ prefersparsemodels All oftheseparameterestimatesareimperfect, ofcourse. The ones in themorecomplexmodelaresignificantlymorelikelytobeskewed (dt: verzogen)becauseeachoneisestimatedfrom a muchsmallerdataset. E.g. P(Y = H | X = H) isestimatedfromonly 9 instances comparedto20 instancesfortheestimationof P(Y = H). Note thatthestandardestimationerrorof MLE is. Whendoingdensityestimationfrom limited data, itisoftenbettertoprefer a sparserstructure. Mathematics of Biological Networks

Overviewofstructurelearningmethods Roughlyspeaking, thereare 3 approachestolearning without a prespecifiedstructure: • constraint-basedstructurelearning Finds a modelthatbestexplainsthedependencies/independencies in thedata. (2) Score-basedstucturelearning(today) Wedefine a hypothesisspaceof potential modelsand a scoringfunction thatmeasureshowwellthemodelfitstheobserveddata. Ourcomputationaltaskisthento find thehighest-scoringnetwork. (3) Bayesianmodelaveragingmethods Generates an ensembleofpossiblestructures. Mathematics of Biological Networks

StructureScores We will discuss 2 obviouschoicesofscoringfunctions: • Maximum likelihoodparameters (today) • Bayesianscores (V12) Maximum likelihoodparameters This functionmeasurestheprobabilityofthedatagiven a model. → tryto find a modelthatwouldmakethedataas probable aspossible. A modelis a pair G, G. Ourgoalisto find both a graph G andparametersG thatmaximizethelikelihood. Mathematics of Biological Networks

Maximum likelihoodparameters In V9 wedeterminedhowtomaximizethelikelihoodfor a givenstructure G. We will simplyusethesemaximumlikelihoodparametersforeach graph. To find themaximumlikelihood(G, G) pair, weshould find thegraphstructure G thatachievesthehighestlikelihoodwhenweusethe MLE parametersfor G. Wedefine whereisthelogarithmofthelikelihoodfunction andarethemaximumlikelihoodparametersfor G. Mathematics of Biological Networks

Maximum likelihoodparameters Let usconsideragainthescenarioofthe 2 coins. In model G0, X and Y areassumedtobeindependent. In thiscase, weget In model G1, weassume a dependencymodelledbythearc X → Y. Weget whereisthemaximumlikelihoodestimatefor P(x) andisthemaximumlikelihoodestimatefor P(y | x). The scoresofthetwomodelsshare a commoncomponent, thefirstterm. Mathematics of Biological Networks

Maximum likelihoodparameters Thus, wecanwritethedifferencebetweenthetwoscoresas: Bycountinghowmanytimeseachconditionalprobabilityparameterappears in thisterm, wecanchangethesummationindexandwritethisas: Letbetheempiricaldistributionobserved in thedata; thatisistheempiricalfrequencyofx,yin D. Thenwecanwriteand Moreover, and Mathematics of Biological Networks

Mutual information We get whereisthemutual informationbetween X and Y in thedistribution. The likelihoodofmodel G1thusdepends on the mutual informationbetween X and Y. Note thathigher mutual informationimplies strongerdependency. Thus, strongerdependencyimpliesstrongerpreference forthemodelwhere X and Y depend on eachother. Can thisbegeneralizedtogeneralnetworkstructures? Mathematics of Biological Networks

Decompositionof Maximum likelihoodscores Proposition: thelikelihood score decomposesasfollows: Proof: omitted The likelihoodof a networkmeasuresthestrength ofthedependenciesbetween variables andtheirparents. Weprefernetworkswheretheparentsofeach variable are informative about it. Mathematics of Biological Networks

Maximum likelihoodparameters Wecan also express thisresult in a complementarymanner. Corollorary: Let X1, …, Xnbe an orderingofthe variables thatisconsistentwithedges in G. Then The firstterm on theright-hand-sidedoes not depend on thestructure, but thesecondtermdoes. The secondterminvolvesconditional mutual-information of variable Xiandthepreceding variables giventheparentsofXi. Mathematics of Biological Networks

Limitationsofthemaximumlikelihood score The likelihood score is a goodmeasure ofthe fit oftheestimated BN andthetrainingdata. Important, however, istheperformanceofthelearnednetwork on newinstances. Itturns out thatthe MLE score neverprefers simpler networksovermorecomplexnetworks. The optimal ML network will alwaysbe a fullyconnectednetwork. Thus, the ML overfitsthetrainingdata. The modeloftendoes not generalizewelltonewdatacases. Mathematics of Biological Networks

Structuresearch The inputtotheoptimizationproblemis: - trainingset D - scoringfunction (includingpriors, ifneeded) - a set G ofpossiblenetworkstructures Ourdesiredoutput: - a networkstructure (fromthesetofpossiblestructures) thatmaximizesthe score. Weassumethatwecandecomposethe score ofthenetworkstructure G: asthesumoffamilyscores. FamScore(X | U : D) is a score measuringhowwell a setof variables U servesasparentsof X in thedataset D. Mathematics of Biological Networks

Tree-structurenetworks Structurelearningoftree-structurenetworksisthesimplestcase. Definition: in tree-structurednetworkstructuresG, each variable X has at mostoneparent in G. The advantageoftreesisthattheycanbelearnedefficiently in polynomial time. Learning a treemodelisoftenusedas a startingpoint forlearningmorecomplexstructures. Key propertiestobeused: - decomposabilityofthe score - restriction on thenumberofparents Mathematics of Biological Networks

Score oftreestructure Instead ofmaximizingthe score of a treestructure G, we will trytomaximizethedifferencebetweenitsscore andthe score oftheemptystructure G0. issimply a sumofterrmsforeachXi. Thatisthe score ofXiifitdoes not haveanyparents. The score consistsofterms. Nowthereare 2 cases: If, thenthetermsforXiin bothscorescancel out. Mathematics of Biological Networks

Structuresearch If weareleftwiththedifferencebetweenthe 2 terms. Ifwedefinetheweight thenweseethat (G) isthesumofweights on pairs, such that in G Wehavethustransformedourproblemtooneoffinding a maximumweightspanningforestin a directedweighted graph. Mathematics of Biological Networks

General case For a generalmodeltopologythatmay also involvecycles, westartbyconsideringthesearchspace. Wecanthinkofthesearchspaceas a graph overcandidatesolutions(different networktopologies). Nodes areconnectedbyoperatorsthattransformonenetworkintotheotherone. Ifeachstatehasfewneighbors, thesearchprocedureonly hastoconsiderfewsolutions at eachpointofthesearch. However, thesearchfor an optimal (or high-scoring) solution maybelongandcomplex. On theotherhand, ifeachstatehasmanyneighbors, thesearchmayinvolveonly a fewsteps, but itmaybedifficulttodecide at eachstepwhichpointtotake. Mathematics of Biological Networks

General case A good trade-off forthisproblemchoosesreasonablyfewneighborsforeachstate but ensuresthatthe „diameter“ ofthesearchspaceremainssmall. A naturalchoicefortheneighborsof a stateis a setofstructures thatareidenticaltoitexceptforsmall „local“ modifications. We will considerthefollowingsetofmodificationsofthissort: • Edge addition • Edge deletion • Edge reversal The diameterofthesearchspaceisthen at mostn2. (Wecouldfirstdelete all edges in G1that do not appear in G2andthenaddtheedgesof G2thatare not in G1. This isboundedbythe total numberofpossibleedgesn2). Mathematics of Biological Networks

Examplerequiringedgedeletion • Original modelthatgeneratedthedata • and (c) Intermediate networksencounteredduringthesearch. Let‘sassumethat A ishighly informative about B and C. Whenstartingfrom an emptynetwork, edges A → B and A → C wouldbeaddedfirst. Sometimes, wemay also addtheedge A → D. Since A is informative about B and C (whicharetheparentsof D), A is also informative about D -> (b) Later, wemay also addtheedges B → D and C → D-> (c) Node A cannotprovide additional information on top ofwhat B and C convery. Deleting A → D thereforemakesthe score optimal. Mathematics of Biological Networks

Examplerequiringedgereversal • Original network, (b) and (c) intermediate networksduringsearch. (d) Undesirableoutcome. Whenaddingedges, we do not knowtheirdirectionbecausebothdirectionsgivethe same score. This iswhereedgereversalhelps. In situation (c) werealizethat A and B togethershouldmakethebestpredictionof C. Therefore, weneedtoreversethedirectionofthearcpointing at A. Mathematics of Biological Networks

Who controls thedevelopmental status of a cell? We will continuewithstructurelearning in V12 Mathematics of Biological Networks

Transcription http://www.berkeley.edu/news/features/1999/12/09_nogales.html a Mathematics of Biological Networks

Preferred transcription factor binding motifs DNA-binding domain of a glucocorticoid receptor from Rattus norvegicus with matching DNA fragment. www.wikipedia.de Chen et al., Cell 133, 1106-1117 (2008) Mathematics of Biological Networks

Gene regulatory network around Oc4 controls pluripotency Tightly interwoven network of 9 transcription factors keeps ES cells in pluripotent state. 6632 human genes have binding site in their promoter region for at least one of these 9 TFs. Many genes have multiple motifs. 800 genes bind ≥ 4 TFs. Kim et al. Cell 132, 1049 (2008) Mathematics of Biological Networks

Complex of transcription factors Oct4 and Sox2 Idea: Check forconserved transcriptionfactorbindingsites in mouseand human www.rcsb.org Mathematics of Biological Networks

Combined binding of Oct4, Sox2 and Nanog The combination of OCT4, SOX2 and NANOG influences conservation of binding events. (A) Bars indicate the fraction of loci where binding of Nanog, Sox2, Oct4 or CTCF can be observed at the orthologous locus in mouse ES cells for all combinations of OCT4, SOX2 and NANOG in human ES cells as indicated by the boxes below. Dark boxes indicate binding, white boxes indicate no binding (‘‘AND’’ relation). Combinatorial binding of OCT4, SOX2 and NANOG shows the largest fraction of conserved binding for Oct4, Sox2 and Nanog in mouse. Göke et al., PLoSComputBiol 7, e1002304 (2011) SS 2014 - lecture 11 Mathematics of Biological Networks

Increased Binding conservation in ES cells at developmental enhancers Fraction of loci where binding of Nanog, Sox2, Oct4 and CTCF can be observed at the orthologous locus in mouse ESC. Combinations of OCT4, SOX2 and NANOG in human ES cells are discriminated by developmental activity as indicated by the boxes below. Dark boxes : ‘‘AND’’ relation, light grey boxes with ‘‘v’’ : ‘‘OR’’ relation, ‘ ‘?’’ : no restriction. Combinatorial binding events at develop-mentally active enhancers show the highest levels of binding conservation between mouse and human ES cells. Göke et al., PLoS Comput Biol 7, e1002304 (2011) SS 2014 - lecture 11 Mathematics of Biological Networks

Transcriptional activation looping factors Mediator DNA-looping enables interactions for the distal promotor regions, Mediator cofactor-complex serves as a huge linker Mathematics of Biological Networks

coactivators corepressor TFs cis-regulatory modules TFs are not dedicated activators or respressors! It‘s the assembly that is crucial. IFN-enhanceosome from RCSB Protein Data Bank, 2010 Mathematics of Biological Networks

Aim: identify Protein complexes involving transcription factors Borrow idea from ClusterOne method: Identify candidates of TF complexes in protein-protein interaction graph by optimizing the cohesiveness Thorsten Will, Master thesis (accepted for ECCB 2014) Mathematics of Biological Networks

DDI model “domain-level“ network transition to domain-domain interactions domain interactions can reveal which proteins can bind simultaneously if interactions per domain are restricted “protein-level“ network Ozawa et al., BMC Bioinformatics, 2010 Ma et al., BBAPAP, 2012 SS 2014 - lecture 11 Mathematics of Biological Networks

underlying domain-domain representation of PPIs Assumption: every domain can only participate in one interaction. Green proteins A, C, E form actual complex. Their red domains are connected by the two green edges. B and D are incident proteins. They could form new interactions (red edges) with unused domains (blue) of A, C, E Mathematics of Biological Networks

data source used: Yeast Promoter Atlas, PPI and DDI Mathematics of Biological Networks

Daco identifies far more TF complexes than other methods Mathematics of Biological Networks

Examples of TF complexes – comparison with ClusterONE Green nodes: proteins in the reference that were matched by the prediction red nodes: proteins that are in the predicted complex, but not part of the reference. Mathematics of Biological Networks

Performance evaluation Mathematics of Biological Networks

Are target genes of TF complexes co-expressed? Mathematics of Biological Networks

Functional role of TF complexes Mathematics of Biological Networks

Complicated regulation of Oct4 Kellner, Kikyo, Histol Histopathol 25, 405 (2010) Mathematics of Biological Networks

V11: Structure Learning in Bayesian Networks

V11: Structure Learning in Bayesian Networks

Presentation Transcript

Bayesian models of inductive learning

Bayesian models of inductive learning

Computer Science CPSC 502 Uncertainty Probability and Bayesian Networks (Ch. 6)

Naïve Bayes

Learning linguistic structure

Chapter 11 Supervised Learning: STATISTICAL METHODS

Vehicular Ad hoc Networks (VANET)

Learning linguistic structure

Inferring gene regulatory networks with non-stationary dynamic Bayesian networks

Building Bayesian Networks

Feed-Forward Neural Networks

Interconnection Networks

Introduction to Bayesian Learning

Chapter 05 Ad Hoc Networks

Introduction to Radial Basis Function Networks

Sociolinguistics Chapter 8 Ethnicity and Social Networks

Inference in Bayesian Networks

Fundamentals of Bayesian Inference

Support Vector Machines and Radial Basis Function Networks

Mini-course on Artificial Neural Networks and Bayesian Networks

No Escaping Mobile Learning

Connecting in a Globalized World through Social Media