BeeSpace Informatics: Interactive System for Functional Analysis

BeeSpace Informatics: Interactive System for Functional Analysis. Bruce Schatz, Institute for Genomic Biology, University of Illinois at Urbana-Champaign, www.beespace.uiuc.edu. Fifth Annual Project Workshop, IGB, Urbana IL, May 22, 2009.


Presentation Transcript


  1. BeeSpace Informatics: Interactive System for Functional Analysis. Bruce Schatz, Institute for Genomic Biology, University of Illinois at Urbana-Champaign, www.beespace.uiuc.edu. Fifth Annual Project Workshop, IGB, Urbana IL, May 22, 2009

  2. Concept Navigation in BeeSpace [diagram; recoverable labels: Behavioral Biologist, Molecular Biologist, Neuroscientist; Molecular Biology Literature, Bee Literature, Neuroscience Literature; Bee Genome; Brain Gene Expression Profiles; Brain Region Localization; FlyBase, WormBase]

  3. Informatics: From Bases to Spaces
     • Data Bases support genome data
       • e.g. FlyBase has sequences and maps
       • Genes annotated by Gene Ontology and linked to biological literature
     • Information Spaces support biological literature
       • e.g. BeeSpace uses automatically generated conceptual relationships to navigate functions

  4. System Architecture [diagram; recoverable labels: BeeSpace; Concepts, Expressions, Documents, SEQ, Community (each appears twice); Databases; Bees; Flies]

  5. System Versions
     • V1 Filter Concept Graph
       • Search, Expand, Merge, Switch, Visualize
     • V2 Cluster Conceptual Groupings
       • Small Worlds (Natural), Language Model (Steerable), Concepts/Documents
     • V3 Summarize Gene Descriptions
       • Gene Extraction, Sentence Classification
     • V4 Analyze Functional Concepts
       • Concept Identification, Category Grouping
     • V5 Answer Entity Relationships
       • Entities, Relations, Templates

  6. Informatics Researchers (Faculty)
     Investigators:
     • Bruce Schatz, systems (Medical Information Science)
     • ChengXiang Zhai, algorithms (Computer Science)
     Collaborators (students):
     • Saurabh Sinha, Computer Science
     • Jiawei Han, Computer Science
     • Sheng Zhong, Bioengineering
     • Nathan Price, Chemical & Biomolecular Engineering
     Collaborators (advice):
     • John MacMullen, Library & Information Science
     • Dan Roth, Computer Science
     • Roxana Girju, Linguistics
     • Karrie Karahalios, Computer Science

  7. Informatics Researchers (Staff)
     • V1-V3
       • Todd Littell, research programmer
       • Jim Buell, research coordinator
       • Nyla Ismail, biology postdoc
       • Moushumi Sen Sarma, biology postdoc
     • V4-V5
       • David Arcoleo, research programmer
       • Barry Sanders, research programmer
       • Moushumi Sen Sarma, biology postdoc
       • Radhika Khetani, biology postdoc

  8. Informatics Researchers (Students)
     • V1 Filter (parse): Jing Jiang, Azadeh Shakery, Yuanhua Lv
     • V2 Cluster (group): Brant Chee, Qiaozhu Mei, Peixiang Zhao
     • V3 Summarize (classify): Xu Ling, Jing Jiang, Qiaozhu Mei, Xin He
     • V4 Analyze (annotate): Xin He, Brant Chee, Moushumi Sarma, Xu Ling
     • V5 Answer (extract): Xu Ling, Xin He, Yanen Li, Yue Lu

  9. Analysis Environment: Features
     SPACE is a Paradigm not a Metaphor!
     Point of View for YOUR Problem
     Externally:
     - Dynamically describe custom Region of Space
     - Merge Regions to form Hypothesis Space
     - Differentially express genes against Space

  10. Analysis Environment: System
      Concepts and Genes are Universal Entities!
      Uniformly Represented, Uniformly Manipulated
      Internally:
      - Extract and Index Concepts within Collections
      - Navigate Concepts within Documents
      - Follow Genes from Documents into Databases

  11. Automatic Categorization v2
      • Sorting of Spaces based on Metadata
      • Sorting of Spaces based on Ontology
        • MeSH for Medline Abstracts
        • Gene Ontology computed for documents
      • Sorting of Spaces based on Clustering
        • Natural Maps from Small Worlds
        • Steerable Maps from Language Models
      • Semantic Indexing of Dynamic Spaces
      • Fast System enables Interactive Sorting!

  12. Small World Graph

  13. Semantics Deeper and Faster
      • Semantic Indexing across all of Medline
        • Previous Attempts used Word Co-Occurrence
        • Now Phrase Parser works general-purpose
        • Now Mutual Information full differential
      • Parallel Optimization of MI Graph
        • Real-time Computation, Shared Memory Cluster
        • Interactive on our 16PC 256GB RAM workerbee
      • Dynamic Spaces then Dynamic Semantic Indexing
        • Interactive Clustering, Natural Map
        • Heuristic Approximation, Small Worlds Graphs
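
As a rough illustration of the mutual-information step named on this slide, the sketch below builds a small phrase graph whose edges are weighted by pointwise mutual information (PMI) over document-level co-occurrence. It assumes phrases have already been extracted for each document. The function name `pmi_graph`, the toy documents, and the choice of PMI itself are assumptions made for illustration, not the BeeSpace implementation.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_graph(docs, min_count=1):
    """Return {(phrase_a, phrase_b): PMI} from per-document phrase lists.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated as document frequencies over len(docs) documents.
    """
    n = len(docs)
    phrase_df = Counter()  # number of documents containing each phrase
    pair_df = Counter()    # number of documents containing both phrases
    for phrases in docs:
        unique = sorted(set(phrases))
        phrase_df.update(unique)
        pair_df.update(combinations(unique, 2))

    graph = {}
    for (a, b), joint in pair_df.items():
        if joint < min_count:
            continue
        graph[(a, b)] = math.log2(joint * n / (phrase_df[a] * phrase_df[b]))
    return graph


# Toy phrase lists (hypothetical, not taken from Medline).
docs = [
    ["foraging", "gene expression", "honey bee"],
    ["foraging", "honey bee"],
    ["gene expression", "drosophila"],
    ["drosophila", "courtship"],
]
graph = pmi_graph(docs)
for pair, score in sorted(graph.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

Pairs that co-occur more often than their individual frequencies predict get positive scores, which is the kind of edge weight a concept graph could then be clustered on.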

  14. Dynamic Clustering Community Structure enables Dynamic Clustering with Large Vectors

  15. Automatic Curation v3
      • Automatic Summarization of Genes
        • Retrieve relevant sentences about gene
        • Classify sentences into important aspects: protein domain, homolog/ortholog, expression pattern, phenotype function, regulatory element, genetic interaction
      • Generalizing to Biology Entities
        • Genes, anatomical, behavior, chemical
        • Question answering from biology factoids
      • Computed Curation from Literature

  16. Gene Summary (FlyBase) GP EL SI GI MP WFPI

  17. Gene Summary (BeeSpace)
      Structured summary consists of relevant sentences covering 6 aspects of a gene:
      • Gene Products (GP)
      • Expression Location (EL)
      • Sequence Information (SI)
      • Wild-type Function & Phenotypic Information (WFPI)
      • Mutant Phenotype (MP)
      • Genetic Interaction (GI)
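
The retrieve-then-classify idea behind the structured summary can be sketched as follows. This is a deliberately simple keyword-cue stand-in: the cue words in `ASPECT_CUES`, the function names, and the example sentences are all hypothetical, and the real system's sentence classifier is not described in these slides.

```python
import re

# Hypothetical cue words per aspect; a real classifier would be trained.
ASPECT_CUES = {
    "GP": {"protein", "kinase", "encodes"},
    "EL": {"expressed", "expression", "localized"},
    "SI": {"sequence", "exon", "intron"},
    "WFPI": {"required", "function", "role"},
    "MP": {"mutant", "mutants", "mutation"},
    "GI": {"interacts", "suppressor", "enhancer"},
}


def tokens(sentence):
    """Lowercased alphanumeric tokens of a sentence, as a set."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def classify(sentence):
    """Return the aspect with the most cue-word hits, or None if no hits."""
    words = tokens(sentence)
    scores = {aspect: len(words & cues) for aspect, cues in ASPECT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def summarize(gene, sentences):
    """Group the sentences that mention `gene` by their predicted aspect."""
    summary = {aspect: [] for aspect in ASPECT_CUES}
    for sentence in sentences:
        if gene.lower() not in tokens(sentence):
            continue  # retrieval step: keep only sentences naming the gene
        aspect = classify(sentence)
        if aspect is not None:
            summary[aspect].append(sentence)
    return summary


# Toy sentences written for this example.
sentences = [
    "Abl encodes a tyrosine kinase.",
    "Abl is expressed in the embryonic nervous system.",
    "Abl mutants show axon guidance defects.",
    "Scr specifies segment identity.",
]
summary = summarize("Abl", sentences)
for aspect, hits in summary.items():
    print(aspect, hits)
```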

  18. Drosophila gene Abelson (Abl) tyrosine kinase

  19. Tribolium gene Scr

  20. Gene Summarizer New Aspects
      New categories (proposed by FlyBase curators):
      • GP + SI => PS (protein domain or structure)
      • SI => HO (homologs or orthologs)
      • EL => EP (spatial/temporal expression patterns)
      • SI => RE (regulatory element information)
      • WFPI + MP => PF (wild-type or mutant phenotype and function)
      • GI => IT (genetic or physical interaction)
      • New (beyond FlyBase) => PG (population genetics)
      Utilize cross-domain information for improving the GS on other organisms.

  21. BeeSpace System v3
      SPACES and REGIONS: Dynamic and Relative
      Space is collection of documents
      Region is collection of terms
      • Extract creates new Region from old Space
      • Map creates new Space from old Region
      • New from Old Spaces and Regions via merges
      • Summarize classifies Gene within Space
      • Annotate finds differential functional expression

  22. BeeSpace Semantic Operations
      • Merge (S1,S2) into S3
      • Summarize (S) into Gene classify
      • Extract: S → R
      • Map: R → S
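
The Space and Region operations can be pictured with plain Python collections. In this minimal sketch a Space is a dict of document id to text, and a Region is a set of terms. The selection rule inside `extract` (terms shared by at least `min_docs` documents), the function names, and the toy corpus are assumptions for illustration only.

```python
from collections import Counter


def extract(space, min_docs=2):
    """Extract: Space -> Region. Keep terms found in at least min_docs documents."""
    doc_freq = Counter()
    for text in space.values():
        doc_freq.update(set(text.lower().split()))
    return {term for term, count in doc_freq.items() if count >= min_docs}


def map_region(region, corpus):
    """Map: Region -> Space. Keep corpus documents mentioning any Region term."""
    return {
        doc_id: text
        for doc_id, text in corpus.items()
        if region & set(text.lower().split())
    }


def merge_spaces(s1, s2):
    """Merge (S1, S2) into S3: the union of two document collections."""
    return {**s1, **s2}


# Toy corpus (hypothetical documents).
corpus = {
    "d1": "foraging behavior in honey bee workers",
    "d2": "honey bee brain gene expression",
    "d3": "drosophila courtship behavior",
    "d4": "worm genome sequence",
}

bee_space = {doc_id: corpus[doc_id] for doc_id in ("d1", "d2")}
bee_region = extract(bee_space)                       # Space -> Region
behavior_space = map_region({"behavior"}, corpus)     # Region -> Space
merged_space = merge_spaces(bee_space, behavior_space)

print(sorted(bee_region))
print(sorted(behavior_space))
print(sorted(merged_space))
```

Because every operation returns another Space or Region, the results can be chained, which is what makes the spaces "dynamic and relative" in the sense of the previous slide.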

  23. New Interface v4
      Single Window, Multiple Panes
      Space Panel, Service Tabs
      • SPACES: custom, system
      • FILTER: searching, sorting
      • CLUSTER: map natural and steerable
      • SUMMARIZE: categorize using space
      • ANALYZE: annotate using space

  24. Functional Analysis v4
      The software system goes beyond a searchable database, using statistical literature analyses to discover functional relationships between genes and behavior. This research will enable all scientists who study bee genes to live on the frontier of integrative biology, where biotechnology enables routine expression analysis and bioinformatics enables functional analysis unconstrained by pre-existing categories.
      Genelist Analyzer v4
      - Differential Expression of Gene Names against Space
      - Background is custom made Literature Space
      - Produces Concept List from Gene List
      - Analyze using Concept Navigation and Gene Summarization
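
One way to picture "Concept List from Gene List" is an over-representation test: find the documents in the background Space that mention any listed gene, then rank concepts by how surprisingly often they appear in that subset. The sketch below uses a hypergeometric tail probability for the ranking. That statistic, the function names, and the toy Space are assumptions for illustration; the slides do not specify which measure the Genelist Analyzer uses.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n documents are drawn from N, of which K contain the concept."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def concept_list(gene_list, space, concepts):
    """Rank concepts by enrichment in gene-mentioning documents of the Space."""
    docs = [set(text.lower().split()) for text in space]
    genes = {gene.lower() for gene in gene_list}
    gene_docs = [doc for doc in docs if doc & genes]
    N, n = len(docs), len(gene_docs)

    scored = []
    for concept in concepts:
        K = sum(concept in doc for doc in docs)       # background count
        k = sum(concept in doc for doc in gene_docs)  # count in gene subset
        if k:
            scored.append((hypergeom_tail(k, K, n, N), concept))
    return [concept for _, concept in sorted(scored)]


# Toy background Space (hypothetical documents); "for" is the gene name.
space = [
    "for foraging behavior",
    "for foraging locomotion",
    "abl axon guidance",
    "per circadian rhythm",
    "vg wing development",
    "foraging nectar",
]
ranked = concept_list(["for"], space, ["behavior", "foraging", "axon"])
print(ranked)
```

Changing the background Space changes the ranking, which reflects the slide's point that the background is a custom-made literature Space.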
