
ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING


byronp

Presentation Transcript


  1. ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING

  2. BACKGROUND • Completion of sequencing projects • Need for functional discovery • Emerging area of study: Large scale genomic analysis • Similarity of living systems

  3. GENETIC NETWORKS • Modelling genetic networks • Interaction of genes and proteins • Relationship between topology and function

  4. MOTIVATION • Common biological processes • Comparison of networks • Discovering missing interactions • Discovering missing genes

  5. GRAPH MATCHING • G1 and G2 • Search-based Algorithm • Pruning Techniques • [Figure: example subnetworks with nodes mge234, mge235, mge236, mge310, mge312, mge313, mge314, mge336, mge337 and mpn124, mpn132, mpn133, mpn134, mpn141, mpn145]

  6. ROADMAP • Scale-Free Networks • Modelling Genetic Networks • Graph Matching • Algorithm • Results

  7. SCALE-FREE NETWORKS

  8. COMPLEX NETWORKS • Small-world model • WWW • Human acquaintances network • Citation networks • Biological networks

  9. SMALL-WORLD • Features: • Characteristic path length • Clustering coefficient • Sparseness

  10. SMALL-WORLD • Somewhere in between regular & random graphs

  11. SMALL-WORLD • Highly clustered • Short diameter
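The two small-world features named above, characteristic path length and clustering coefficient, can be computed directly from an adjacency map. The following is a minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; the function names and the toy graph are illustrative and do not come from the presentation.

```python
from collections import deque


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # fewer than two neighbors: no pair to check
            continue
        nbr_list = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0


# Toy graph (hypothetical): a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(characteristic_path_length(toy), clustering_coefficient(toy))
```

A small-world network would show a path length close to that of a random graph of the same size together with a much higher clustering coefficient.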

  12. SCALE-FREE NETWORKS • Complex networks: biological, social, WWW, power grid, citation, etc. • Power-law connectivity: P(k) ~ k^(-a) • Hubs - authorities

  13. SCALE-FREE NETWORKS • Application for testing scale-free behavior • Yeast • Helicobacter pylori • Mycoplasma pneumoniae • Mycoplasma genitalium • Linear log-log graph • Slope = -a

  14. SCALE-FREE NETWORKS • Slope is calculated by the least-squares method
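Taking logarithms of P(k) = c * k^(-a) gives log P(k) = -a * log k + log c, so the exponent can be read off the slope of a straight line fitted to the log-log degree distribution. The sketch below assumes an ordinary least-squares fit on the raw (log k, log P(k)) points; the presentation does not say how the data were binned or weighted, and the function names are illustrative.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)} for all degrees k > 0 in an adjacency map."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in counts.items() if k > 0}


def fit_power_law(pk):
    """Least-squares line through (log k, log P(k)).

    Returns (a, intercept), where the fitted slope is -a.
    Needs at least two distinct degrees.
    """
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


# Synthetic check (not data from the presentation): exact power law with a = 2.5.
synthetic = {k: k ** -2.5 for k in range(1, 20)}
a, intercept = fit_power_law(synthetic)
print(round(a, 6))
```

On real interaction data the points scatter around the line, so the fitted exponent is only an estimate and depends on how the high-degree tail is treated.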

  15. TOPOLOGY & FUNCTIONALITY • Small diameter • ease of dissemination of information • ease of restoring after disturbance • Cliquishness • Alternate paths are found • Heterogeneity • Random removal does not affect the network • Hubs are vulnerable to attack

  16. BIOLOGICAL ASPECTS • Multifunctionality • Grouped into functional units • Stability • Reason: Most of the interactions are between hubs and authorities

  17. MODELLING GENETIC NETWORKS

  18. TYPES OF GENETIC NETWORKS • Categorized by data sources • Metabolic pathways • Gene expression arrays • Protein interactions • Gene interactions

  19. INTERACTION MAPS • High level perspective • Nodes: Genes or proteins • Edges: Presence of an interaction • Data sources • Two-hybrid analysis • Fusion analysis • Chromosomal proximity • Phylogenetic analysis

  20. GRAPH MATCHING

  21. PROBLEM DEFINITION • Attributed Relational Graph (ARG): G = {V, E, X} • Nodes: V = {v1, v2, …, vn} • Edges: E = {e1, e2, …, em} • Attributes: X = {x1, x2, …, xn}
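One way to hold an ARG in memory is a pair of dictionaries: one mapping each node to its attribute vector and one mapping each node to its neighbor set. This is a minimal sketch under that assumption; the class name, method names, node labels, and attribute values are hypothetical placeholders, not the tool's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Undirected attributed relational graph G = {V, E, X}."""

    attributes: dict = field(default_factory=dict)  # X: node -> attribute vector
    adjacency: dict = field(default_factory=dict)   # E: node -> set of neighbors

    def add_node(self, node, attribute):
        self.attributes[node] = attribute
        self.adjacency.setdefault(node, set())

    def add_edge(self, u, v):
        if u not in self.attributes or v not in self.attributes:
            raise KeyError("both endpoints must be added as nodes first")
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)

    def degree(self, node):
        return len(self.adjacency[node])

    def edges(self):
        """Set of undirected edges, each as a frozenset of two nodes."""
        return {frozenset((u, v)) for u in self.adjacency for v in self.adjacency[u]}


# Placeholder nodes and attribute vectors, for illustration only.
g = AttributedGraph()
g.add_node("gene1", [0.2, 0.8])
g.add_node("gene2", [0.5, 0.5])
g.add_node("gene3", [0.9, 0.1])
g.add_edge("gene1", "gene2")
g.add_edge("gene2", "gene3")
print(len(g.attributes), len(g.edges()), g.degree("gene2"))
```

Keeping one attribute vector per node matches the definition above, where X has the same number of entries as V.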

  22. INEXACT SUBGRAPH MATCHING • Allow for: • Mismatching attribute values • Missing nodes • Missing links • Also called error-correcting subgraph isomorphism • NP-complete

  23. SEARCH TECHNIQUES • Cost function • Pruning (Structure Constraints) • Backtracking
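The three ingredients listed above fit together as a branch-and-bound search: a cost function scores each partial mapping, any branch whose cost already reaches the best complete mapping is pruned, and the search backtracks to try the next candidate. The sketch below is a generic illustration, not the presenters' algorithm. The penalties, the attribute distance, and the function name are assumptions, and a g1 edge is only charged when both endpoints are mapped.

```python
def inexact_match(g1_adj, g1_attr, g2_adj, g2_attr,
                  node_penalty=1.0, edge_penalty=0.5):
    """Map nodes of g1 onto nodes of g2 (or None) at minimum total cost."""
    # Visit the most connected g1 nodes first.
    nodes = sorted(g1_adj, key=lambda n: -len(g1_adj[n]))
    best = {"cost": float("inf"), "mapping": {}}

    def attr_cost(u, v):
        # Assumed attribute distance: sum of absolute differences.
        return sum(abs(a - b) for a, b in zip(g1_attr[u], g2_attr[v]))

    def extend(i, mapping, used, cost):
        if cost >= best["cost"]:
            return  # prune: this branch cannot beat the best mapping so far
        if i == len(nodes):
            best["cost"], best["mapping"] = cost, dict(mapping)
            return
        u = nodes[i]
        for v in g2_adj:
            if v in used:
                continue
            step = attr_cost(u, v)
            # Charge every g1 edge to a mapped neighbor with no g2 counterpart.
            for w in g1_adj[u]:
                if mapping.get(w) is not None and mapping[w] not in g2_adj[v]:
                    step += edge_penalty
            mapping[u] = v
            used.add(v)
            extend(i + 1, mapping, used, cost + step)
            del mapping[u]  # backtrack
            used.remove(v)
        # Alternative: treat u as a missing node in g2.
        mapping[u] = None
        extend(i + 1, mapping, used, cost + node_penalty)
        del mapping[u]

    extend(0, {}, set(), 0.0)
    return best["mapping"], best["cost"]


# Toy example (hypothetical data): two three-node paths with identical attributes.
g1_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g1_attr = {"a": (1, 0), "b": (0, 1), "c": (1, 1)}
g2_adj = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
g2_attr = {"x": (1, 0), "y": (0, 1), "z": (1, 1)}
mapping, cost = inexact_match(g1_adj, g1_attr, g2_adj, g2_attr)
print(mapping, cost)
```

Because the underlying problem is NP-complete, the worst case remains exponential; pruning only helps when a good mapping is found early, which is why the visiting order matters.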

  24. ATTRIBUTED GRAPH MATCHING TOOL

  25. ATTRIBUTE MATCHING • Amino acid sequence composition • Array of 20: percentage of each amino acid • Amino acids grouped into classes: array of 6 • Amino acid triples grouped into classes: array of 216 (6 x 6 x 6) • Example sequence: MKVLNKNEL
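The three attribute vectors can be derived from a protein sequence as sketched below. The slide does not state which six amino acid classes were used or whether triples overlap, so the grouping in `ASSUMED_CLASSES`, the overlapping-window choice, and the absolute-difference score are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical six-class grouping; the presentation does not specify its classes.
ASSUMED_CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
CLASS_OF = {aa: idx for idx, group in enumerate(ASSUMED_CLASSES) for aa in group}


def _percentages(counts):
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


def composition_20(seq):
    """Percentage of each of the 20 standard amino acids."""
    counts = [0] * 20
    for aa in seq:
        if aa in CLASS_OF:  # skip non-standard residue codes
            counts[AMINO_ACIDS.index(aa)] += 1
    return _percentages(counts)


def composition_6(seq):
    """Percentage of residues falling in each of the six assumed classes."""
    counts = [0] * 6
    for aa in seq:
        if aa in CLASS_OF:
            counts[CLASS_OF[aa]] += 1
    return _percentages(counts)


def composition_216(seq):
    """Percentage of each class triple (6 x 6 x 6), using overlapping windows."""
    classes = [CLASS_OF[aa] for aa in seq if aa in CLASS_OF]
    counts = [0] * 216
    for c1, c2, c3 in zip(classes, classes[1:], classes[2:]):
        counts[36 * c1 + 6 * c2 + c3] += 1
    return _percentages(counts)


def composition_difference(vec_a, vec_b):
    """Assumed score: sum of absolute differences between two composition vectors."""
    return sum(abs(a - b) for a, b in zip(vec_a, vec_b))


seq = "MKVLNKNEL"  # example sequence shown on the slide
print(len(composition_20(seq)), len(composition_6(seq)), len(composition_216(seq)))
```

The coarser vectors trade resolution for tolerance: two orthologs with conservative substitutions differ in the 20-element array but can still agree closely in the 6-element one.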

  26. ATTRIBUTE MATCHING • Difference in amino acid composition values of gene pairs for M. genitalium and M. pneumoniae • Score observations

  27. STRUCTURAL CONSTRAINTS • Effect of scale-free behaviour • Connectivity information: Highly heterogeneous, thus start with the most connected node and work around it • Pruning strategy: comparability is determined by the power law

  28. STRUCTURAL CONSTRAINTS • Neighborhood connectivity • Choose the neighbor at the next stage • Backtracking • Component by component • Go back to the neighbor with the most connectivity within the component
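The visiting order described on these two slides can be sketched as follows: begin at the most connected node, keep expanding to the most connected unvisited neighbor of the nodes already visited, and move to the next component only when the current one is exhausted. This is one reading of the bullets, assuming an adjacency map of neighbor sets; the function name, the tie-breaking by node name, and the toy graph are illustrative assumptions.

```python
def matching_order(adj):
    """Order nodes hub-first, growing through neighbors, one component at a time."""
    order, visited = [], set()
    # Candidate seeds, from most to least connected (ties broken by name).
    for seed in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if seed in visited:
            continue
        visited.add(seed)
        order.append(seed)
        frontier = set(adj[seed]) - visited
        while frontier:
            # Next stage: the most connected neighbor of the visited region.
            nxt = min(frontier, key=lambda n: (-len(adj[n]), n))
            frontier.remove(nxt)
            visited.add(nxt)
            order.append(nxt)
            frontier |= adj[nxt] - visited
    return order


# Toy graph (hypothetical): a hub component {h, a, b, c} and a separate pair {x, y}.
toy_net = {
    "h": {"a", "b", "c"},
    "a": {"h", "b"},
    "b": {"h", "a"},
    "c": {"h"},
    "x": {"y"},
    "y": {"x"},
}
print(matching_order(toy_net))
```

Placing hubs first means the nodes with the fewest plausible counterparts in the other network are fixed early, so mismatches surface before much search effort has been spent.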

  29. TEST CASE • Mycoplasma genitalium: • smallest genome (470 ORFs) • Mycoplasma pneumoniae: • Very similar, superset (688 ORFs)

  30. TEST CASE... • Mycoplasma genitalium: • 232 nodes • 211 links • Mycoplasma pneumoniae: • 267 nodes • 257 links • Inputs: • MGE links • MPN links • MGE synonyms • MPN synonyms • MGE amino acid sequence • MPN amino acid sequence

  31. RESULTS MGE MPN

  32. DISCOVERY OF MISSING DATA • Missing link • The link between MPN632 and MPN637 is missing in our data but exists in the literature

  33. DISCOVERY OF MISSING DATA • Missing node with known COG • MPN236---MPN237---MPN238---MPN678 • MG098---MG099---MG100---MG459 • MG459 is the ortholog of MPN678

  34. DISCOVERY OF MISSING DATA • Missing node without known ortholog

  35. CONCLUSION • Large-scale genomics • Interaction data captures system structure and dynamics • Graph matching exploits the scale-free characteristics • Novel interactions and genes can be identified

  36. ACKNOWLEDGEMENT • YASEMİN TÜRKELİ
