Unweaving regulatory networks: Automated extraction from literature and statistical analysis

Presentation Transcript


  1. Unweaving regulatory networks: Automated extraction from literature and statistical analysis

  2. Overview of the talk • Introduction: project participants & jigsaw puzzle analogy • Project motivation. • Duality of signal transduction language. • How the whole system works. • Good and ugly graphs. • -------------------------------------------- • Scale-free networks in biology and outside; mechanism of stochastic birth of scale-free networks

  3. Our project is set up as a collaboration of three departments of Columbia University

  4. Interdisciplinary Collaboration: Department of Medical Informatics, Columbia University (Carol Friedman, Pauline Kra, Michael Krauthammer, Yu Hong, Andrey Rzhetsky) Department of Computer Science, Columbia University (Vasileios Hatzivassiloglou, Pablo Ariel Duboue, Wubin Weng) Columbia Genome Center, Columbia University (Pavel Morozov, Tomohiro Koike, Shawn Gomez, Sabina Kaplan, Sergey Kalachikov, Jim Russo, Andrey Rzhetsky)

  5. Studying living organisms is not unlike playing with a jigsaw puzzle…

  6. Starting point: before sequence data were available

  7. “Stamp collecting”: some regularities start to emerge...

  8. Defining families of sequences

  9. Beginning assembly of pieces: where we are now

  10. Future

  11. Overview of the talk • Introduction: project participants & jigsaw puzzle analogy • Project motivation. • Duality of signal transduction language. • How the whole system works. • Good and ugly graphs. • -------------------------------------------- • Scale-free networks in biology and outside; mechanism of stochastic birth of scale-free networks

  12. Our long-term objective: develop computational tools for automated compilation and analysis of complex cell regulation cascades in vertebrates

  13. Problem/Motivation: A current PubMed search with the keywords “cell cycle” and “apoptosis” returns 169,293 and 29,961 articles, respectively. Clearly, it is not feasible to scan all these papers “manually”...

  14. We decided (i) to develop tools for automatic retrieval of binary regulatory relationships between molecules from the research literature using natural language processing techniques, and (ii) to use the extracted knowledge for editing, visualizing, and superimposing/comparing homologous networks.

  15. We call the system GENIES (GENomics Information Extraction System)

  16. An overview of our system.

  17. Application of techniques of Artificial Intelligence: Natural Language Processing. Goal: to identify binary relationships of the form “protein A activates protein B” or “protein B inactivates gene C”

  18. Overview of the talk • Introduction: project participants & jigsaw puzzle analogy • Project motivation. • Duality of signal transduction language. • How the whole system works. • Good and ugly graphs. • --------------------------------------------- • Scale-free networks in biology and outside; mechanism of stochastic birth of scale-free networks

  19. The language of regulatory pathways differs significantly from the language of metabolic pathways

  20. Representation: We represent a pathway as a series of overlapping “links” – substance/action/substance triplets: Substance A → Substance B → Substance C → Substance D
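
The link representation on this slide can be sketched as a small data structure. This is an illustration only, not the GENIES implementation: the class name `Link`, the helper `chain`, and the action labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """One substance/action/substance triplet (hypothetical class name)."""
    upstream: str    # substance that acts
    action: str      # placeholder action label, e.g. "activates"
    downstream: str  # substance that is acted upon


def chain(substances: List[str], actions: List[str]) -> List[Link]:
    """Build overlapping links: the downstream substance of one link
    is the upstream substance of the next."""
    if len(actions) != len(substances) - 1:
        raise ValueError("need exactly one action per adjacent pair of substances")
    return [Link(a, act, b)
            for a, act, b in zip(substances, actions, substances[1:])]


# The action labels below are placeholders; the slide does not name them.
pathway = chain(
    ["Substance A", "Substance B", "Substance C", "Substance D"],
    ["activates", "inactivates", "activates"],
)
```

Because adjacent links share a substance, a pathway stored this way can be traced by following each link's `downstream` field to the next link's `upstream` field.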

  21. Duality of actions in signal transduction literature

  22. We realized that the current research literature in molecular biology describes pathways on two different levels: Logical and Biochemical

  23. Logical: A activates B; A inactivates B. Biochemical: A phosphorylates B; A methylates B; ...

  24. Dualism: in the biochemical representation, substance A is not a participant in the action, while in the logical representation it is. (Figure panels: Logical, Biochemical)

  25. Both logical and biochemical descriptions can be combined in the same sentence: “Activated raf-1 phosphorylates [biochemical] and activates [logical] mek-1.”

  26. The paper describing a “knowledge model” (=ontology) will appear in Bioinformatics

  27. Ontology paper

  28. We represent a pathway as a series of overlapping “links” – substance/action/substance triplets: Substance A → Substance B → Substance C → Substance D

  29. “Actions” are relatively few: one can provide an exhaustive list of them

  30. Each action comes with a mechanism (biochemical representation) and result (logical representation)
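
One way to picture an action that carries both a mechanism and a result is a small lookup table. The sketch below is an assumption-laden illustration, not the published ontology: the names `Action`, `VOCABULARY`, and `combine` are hypothetical, and the four entries are taken only from the verbs mentioned earlier in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    verb: str
    mechanism: Optional[str]  # biochemical representation, if the verb states one
    result: Optional[str]     # logical representation, if the verb states one


# Illustrative vocabulary only; the real list of actions is not given in the slides.
VOCABULARY = {
    "activate": Action("activate", None, "activation"),
    "inactivate": Action("inactivate", None, "inactivation"),
    "phosphorylate": Action("phosphorylate", "phosphorylation", None),
    "methylate": Action("methylate", "methylation", None),
}


def combine(*verbs: str) -> Action:
    """Merge verbs that describe one event, keeping the first known
    mechanism and the first known result."""
    mechanism = None
    result = None
    for verb in verbs:
        entry = VOCABULARY[verb]
        mechanism = mechanism or entry.mechanism
        result = result or entry.result
    return Action("+".join(verbs), mechanism, result)


# "Activated raf-1 phosphorylates and activates mek-1."
event = combine("phosphorylate", "activate")
```

Here `event` holds the mechanism from the biochemical verb and the result from the logical verb, which mirrors how a single sentence can mix both levels.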

  31. Gene and protein names are numerous (currently >80,000) and the number is growing

  32. MedLEE (by Carol Friedman and colleagues) contains implementations of various grammatical patterns associated with the same verb: A activates B… A is an activator of B… A appeared to activate B… A is activating B…
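
To illustrate the idea of mapping several surface patterns onto one verb, here is a toy sketch using regular expressions. It is not how MedLEE works (MedLEE performs grammar-based semantic analysis); the names `NAME`, `PATTERNS`, and `normalize` are assumptions introduced for the example, and only the four phrasings listed on the slide are covered.

```python
import re
from typing import Optional, Tuple

# Simplified notion of a molecule name: letters, digits, and hyphens.
NAME = r"[A-Za-z0-9-]+"

# One regular expression per surface pattern from the slide.
PATTERNS = [
    re.compile(rf"^(?P<a>{NAME}) activates (?P<b>{NAME})$"),
    re.compile(rf"^(?P<a>{NAME}) is an activator of (?P<b>{NAME})$"),
    re.compile(rf"^(?P<a>{NAME}) appeared to activate (?P<b>{NAME})$"),
    re.compile(rf"^(?P<a>{NAME}) is activating (?P<b>{NAME})$"),
]


def normalize(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (agent, 'activate', target) if the sentence matches one of
    the known patterns, otherwise None."""
    text = sentence.strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return (match.group("a"), "activate", match.group("b"))
    return None
```

All four phrasings reduce to the same triplet, so downstream code only has to deal with one canonical verb.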

  33. MedLEE = Medical Language Extraction and Encoding System. It is an integral part of the Clinical Information Service at Columbia-Presbyterian Medical Center, where it routinely processes thousands of patient records a day. MedLEE performs semantic analysis of the complete sentence. If a complete sentence cannot be parsed successfully, MedLEE re-analyzes it, trying to extract parts.

  34. For details see, e.g., Friedman, C., G. Hripcsak, W. DuMouchel, S.B. Johnson, and P.D. Clayton. 1995. Natural language processing in an operational clinical system. Natural Language Engineering. 1 (1): 83-108.

  35. Overview of the talk • Introduction: project participants & jigsaw puzzle analogy • Project motivation. • Duality of signal transduction language. • How the whole system works. • Good and ugly graphs. • --------------------------------------------- • Scale-free networks in biology and outside; mechanism of stochastic birth of scale-free networks

  36. Term identification:

  37. To give you a feeling for how the complete conveyor line works…

  38. Consider a sentence from an actual Science article

  39. NLP module (term markup + MedLEE) produces: [action, inactivate, [protein, rap1], [action, activate, [complex, T-cell receptor], [action, transcribe, [gene, gene encoding interleukin-2]]], [parsemode, mode1]]
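
The nested output above can be explored with a short recursive walk. The sketch below transcribes the structure by hand into Python lists and renders it in a compact functional form; the function names `render` and `list_actions` and the output notation are assumptions for illustration and are not the shorthand used by the authors.

```python
# Hand transcription of the slide's output as nested Python lists.
parsed = ["action", "inactivate",
          ["protein", "rap1"],
          ["action", "activate",
           ["complex", "T-cell receptor"],
           ["action", "transcribe",
            ["gene", "gene encoding interleukin-2"]]],
          ["parsemode", "mode1"]]


def render(node: list) -> str:
    """Render an action as verb(arg, ...) and an entity as its name.
    Parser bookkeeping entries such as parsemode are skipped."""
    if node[0] == "action":
        verb = node[1]
        args = [render(child) for child in node[2:] if child[0] != "parsemode"]
        return f"{verb}({', '.join(args)})"
    return str(node[1])


def list_actions(node: list) -> list:
    """Collect action verbs from the outermost to the innermost."""
    if node[0] != "action":
        return []
    verbs = [node[1]]
    for child in node[2:]:
        verbs.extend(list_actions(child))
    return verbs
```

Calling `render(parsed)` makes the nesting explicit: the outer action takes another action as its target, which is the "action on action" case mentioned on the next slide.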

  40. Which is then converted into “shorthand” notation (figure labels: Substance (gene), Action, Action on action)

  41. Which is then further converted to a format readable by our pathway visualization program:
      LogicalAction{ { UpstreamActionAgent { Protein{ Name{ "IL-3", } } }, DownstreamActionAgent { Complex{ Name{ "IL-3R" } } }, Result{ activation } } }
      Complex{ Name{ "IL-3R" } Composition{ Protein{ Name{ "IL-3R alpha" } } Protein{ Name{ "IL-3R beta" } } } }
      Protein{ Name{ "IL-3", } }
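
A rough sketch of how such records might be emitted from code follows. The exact grammar of the visualization format (whitespace, the trailing comma after some names) is not specified in the slides, so the helper names `protein`, `complex_`, and `logical_action` and the spacing they produce are guesses modeled on the example above.

```python
from typing import Sequence


def protein(name: str) -> str:
    """Emit a Protein record (spacing is a guess based on the slide)."""
    return f'Protein{{ Name{{ "{name}" }} }}'


def complex_(name: str, parts: Sequence[str] = ()) -> str:
    """Emit a Complex record, optionally listing its protein subunits."""
    body = f'Name{{ "{name}" }}'
    if parts:
        body += " Composition{ " + " ".join(protein(p) for p in parts) + " }"
    return "Complex{ " + body + " }"


def logical_action(upstream: str, downstream: str, result: str) -> str:
    """Emit a LogicalAction record linking two already rendered agents."""
    return ("LogicalAction{ { UpstreamActionAgent { " + upstream + " }, "
            "DownstreamActionAgent { " + downstream + " }, "
            "Result{ " + result + " } } }")


record = logical_action(protein("IL-3"), complex_("IL-3R"), "activation")
receptor = complex_("IL-3R", ["IL-3R alpha", "IL-3R beta"])
```

Building the text from small composable helpers keeps the braces balanced by construction, which matters for a deeply nested format like this one.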

  42. Which is then visualized...

  43. Example of an actual human regulatory network visualized

  44. Corresponding article

  45. Overview of the talk • Introduction: project participants & jigsaw puzzle analogy • Project motivation. • Duality of signal transduction language. • How the whole system works. • Good and ugly graphs. • --------------------------------------------- • Scale-free networks in biology and outside; mechanism of stochastic birth of scale-free networks

  46. Drawing a complex graph is a separate problem in computer science. We use simulated annealing to find an optimal graph layout
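
As a minimal sketch of simulated annealing applied to graph layout: the energy function below (squared edge length plus pairwise repulsion), the linear cooling schedule, and all parameter values are assumptions chosen for the example. The slides do not describe the cost function or schedule actually used.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def energy(pos: Dict[str, Point], edges: List[Tuple[str, str]]) -> float:
    """Lower is better: connected nodes close together, all nodes spread out."""
    total = 0.0
    for u, v in edges:
        total += math.dist(pos[u], pos[v]) ** 2             # pull edges short
    names = list(pos)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            total += 1.0 / (math.dist(pos[u], pos[v]) + 1e-6)  # push nodes apart
    return total


def anneal(nodes: List[str], edges: List[Tuple[str, str]],
           steps: int = 5000, t0: float = 1.0, seed: int = 0):
    """Return the best layout found and its energy."""
    rng = random.Random(seed)
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    current = energy(pos, edges)
    best_pos, best = dict(pos), current
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-9      # linear cooling
        node = rng.choice(nodes)
        old = pos[node]
        pos[node] = (old[0] + rng.gauss(0, 0.1), old[1] + rng.gauss(0, 0.1))
        candidate = energy(pos, edges)
        delta = candidate - current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if current < best:
                best_pos, best = dict(pos), current
        else:
            pos[node] = old                                  # reject the move
    return best_pos, best


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
layout, final_energy = anneal(nodes, edges)
```

Accepting occasional uphill moves while the temperature is high lets the layout escape poor local arrangements; a real pathway layout would need further energy terms, for example for label overlap or edge crossings.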

  47. What is a good pathway graph? • Every gene/protein name is easy to read • Easy to trace connections between pairs of molecules • Easy to read mechanism and result for each action • Compact • Shows tissue/stage/species/cell line specificity • Beautiful

  48. Human Cell Cycle / Apoptosis Machinery

  49. ~400 nodes

  50. Layered graph layout
