
Diseases




Presentation Transcript


  1. [Title figure with repeated labels: Genes, Diseases, Physiology, Anatomy] Novel relationships & Deeper insights • Medical Informatics • Bioinformatics

  2. Integrative Genomics For Understanding Disease Process Anil Jegga Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center (CCHMC) Department of Pediatrics, University of Cincinnati Cincinnati, Ohio - 45229 Anil.Jegga@cchmc.org

  3. Acknowledgement • Jing Chen • Mrunal Deshmukh • Sivakumar Gowrisankar • Chandra Gudivada • Arvind Muthukrishnan • Bruce J Aronow

  4. Two Separate Worlds….. With Some Data Exchange… Disease World (Medical Informatics): • Name • Synonyms • Related/Similar Diseases • Subtypes • Etiology • Predisposing Causes • Pathogenesis • Molecular Basis • Population Genetics • Clinical findings • System(s) involved • Lesions • Diagnosis • Prognosis • Treatment • Clinical Trials…… Resources: PubMed, Disease Database, Patient Records, OMIM, Clinical Synopsis, Clinical Trials. Bioinformatics: Genome, Variome, Transcriptome, Regulome, Proteome, Interactome, Pharmacogenome, Metabolome, Physiome, Pathome. 354 “omes” so far……… and there is the “UNKNOME” too: genes with no known function. http://omics.org/index.php/Alphabetically_ordered_list_of_omics (as of October 15, 2006)

  5. Motivation To correlate diseases with the anatomical parts affected, the genes/proteins involved, and the underlying physiological processes (interactions, pathways, processes). In other words, to bring the disciplines of Medical Informatics (MI) and BioInformatics (BI) together (Biomedical Informatics - BMI) to support personalized or “tailor-made” medicine. The question: how to integrate multiple types of genome-scale data across experiments and phenotypes in order to find genes associated with diseases?

  6. Model Organism Databases: Common Issues • Heterogeneous Data Sets - Data Integration • From Genotype to Phenotype • Experimental and Consensus Views • Incorporation of Large Datasets • Whole genome annotation pipelines • Large scale mutagenesis/variation projects (dbSNP) • Computational vs. Literature-based Data Collection and Evaluation (MedLine) • Data Mining • extraction of new knowledge • testable hypotheses (Hypothesis Generation)

  7. Support Complex Queries • Get me all genes involved in brain development that are expressed in the Central Nervous System. • Get me all genes involved in brain development in human and mouse that also show iron ion binding activity. • For this set of genes, what aspects of function and/or cellular localization do they share? • For this set of genes, what mutations are reported to cause pathological conditions?
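
A minimal sketch of how the first two queries above could be expressed once annotations sit in one place. The record layout, the `find_genes` helper and every gene symbol are hypothetical placeholders, not data from the presentation or from any real database.

```python
from dataclasses import dataclass

# All records below are invented placeholders, not real annotations.
@dataclass(frozen=True)
class GeneRecord:
    symbol: str
    species: str
    processes: frozenset
    functions: frozenset
    expressed_in: frozenset

GENES = [
    GeneRecord("geneA", "human", frozenset({"brain development"}),
               frozenset({"iron ion binding"}), frozenset({"central nervous system"})),
    GeneRecord("geneB", "mouse", frozenset({"brain development"}),
               frozenset({"iron ion binding"}), frozenset({"liver"})),
    GeneRecord("geneC", "human", frozenset({"brain development"}),
               frozenset({"DNA binding"}), frozenset({"central nervous system"})),
    GeneRecord("geneD", "human", frozenset({"heart development"}),
               frozenset({"iron ion binding"}), frozenset({"heart"})),
]

def find_genes(genes, process=None, function=None, tissue=None, species=None):
    """Return symbols of genes that satisfy every criterion that was supplied."""
    hits = []
    for gene in genes:
        if process is not None and process not in gene.processes:
            continue
        if function is not None and function not in gene.functions:
            continue
        if tissue is not None and tissue not in gene.expressed_in:
            continue
        if species is not None and gene.species not in species:
            continue
        hits.append(gene.symbol)
    return hits

# Query 1: brain development genes expressed in the central nervous system.
cns_brain_genes = find_genes(GENES, process="brain development",
                             tissue="central nervous system")

# Query 2: brain development genes in human and mouse with iron ion binding.
iron_brain_genes = find_genes(GENES, process="brain development",
                              function="iron ion binding",
                              species={"human", "mouse"})
```

The point of the sketch is only that such queries become simple conjunctions once genes, processes, functions and expression sites share one vocabulary; the hard part is getting the data into that form.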

  8. Bioinformatic Data-1978 to present • DNA sequence • Gene expression • Protein expression • Protein Structure • Genome mapping • SNPs & Mutations • Metabolic networks • Regulatory networks • Trait mapping • Gene function analysis • Scientific literature • and others………..

  9. Human Genome Project – Data Deluge No. of Human Gene Records currently in NCBI: 31507 (excluding pseudogenes, mitochondrial genes and obsolete records); includes ~460 microRNAs. Source: NCBI Human Genome Statistics, as of October 18, 2006

  10. The Gene Expression Data Deluge Till 2000: 413 papers on microarray! Problems Deluge! Allison DB, Cui X, Page GP, Sabripour M. 2006. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 7(1): 55-65.

  11. Information Deluge….. A researcher would have to scan 130 different journals and read 27 papers per day to follow a single disease, such as breast cancer (Baasiri et al., 1999 Oncogene 18: 7958-7965). • 3 scientific journals in 1750 • Now - >120,000 scientific journals! • >500,000 medical articles/year • >4,000,000 scientific articles/year • >16 million abstracts in PubMed derived from >32,500 journals • >4.5 billion distinct web pages indexed by Google! • Google Search for integrative genomics: ~930,000 hits • “integrative genomics”: ~112,000 hits

  12. Data-driven Problems….. • How to name or describe proteins, genes, drugs, diseases and conditions consistently and coherently? • How to ascribe and name a function, process or location consistently? • How to describe interactions, partners, reactions and complexes? Some Solutions • Develop/Use controlled or restricted vocabularies (IUPAC-like naming conventions, HGNC, MGI, UMLS, etc.) • Create/Use thesauruses, central repositories or synonym lists (MeSH, UMLS, etc.) • Work towards synoptic reporting and structured abstracting What’s in a name! Rose is a rose is a rose is a rose! Gene Nomenclature • Generally, gene names refer to some feature of the mutant phenotype • Dickie’s small eye (Theiler et al., 1978, Anat Embryol (Berl), 155: 81-86) is now Pax6 • Gleeful: "This gene encodes a C2H2 zinc finger transcription factor with high sequence similarity to vertebrate Gli proteins, so we have named the gene gleeful (Gfl)." (Furlong et al., 2001, Science 293: 1632) • Examples: Accelerin, Antiquitin, Bang Senseless, Bride of Sevenless, Christmas Factor, Cockeye, Crack, Dickie’s small eye, Draculin, Fidgetin, Gleeful, Knobhead, Lunatic Fringe, Mortalin, Orphanin, Profilactin, Sonic Hedgehog Disease Names • Mobius Syndrome with Poland’s Anomaly • Werner’s syndrome • Down’s syndrome • Angelman’s syndrome • Creutzfeldt-Jakob disease

  13. and there are some weird ones too…….. • AR*E: aryl sulfatase E in all species • f**K: fuculokinase gene in bacteria Some more ambiguous examples…….. • The yeast homologue of the human gene PMS1, which codes for a DNA repair protein, is called PMS2; whereas yeast PMS1 corresponds to human PMS2! • Even more confusing, 4,257 abbreviated names were used to refer to more than one gene. Top of the list was MT1, used to describe at least 11 members of a cluster of genes encoding small proteins that bind to metal ions (Nature: 411: 631-632).
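
To make the ambiguity problem concrete, here is a small sketch of an alias index that reports which names map to more than one gene. The alias table is a hand-made illustrative subset (three metallothionein symbols sharing the abbreviation MT1, plus the Pax6 example from the previous slide); it is an assumption for demonstration, not a curated nomenclature resource.

```python
from collections import defaultdict

# Official symbol -> aliases. Hand-made illustrative subset, not curated data.
ALIASES = {
    "MT1A": ["MT1"],
    "MT1B": ["MT1"],
    "MT1E": ["MT1"],
    "PAX6": ["Dickie's small eye"],
}

def build_alias_index(aliases):
    """Map every lower-cased name (official or alias) to the symbols it may denote."""
    index = defaultdict(set)
    for symbol, names in aliases.items():
        index[symbol.lower()].add(symbol)
        for name in names:
            index[name.lower()].add(symbol)
    return index

def resolve(name, index):
    """Return all candidate official symbols for a name, sorted; empty if unknown."""
    return sorted(index.get(name.lower(), set()))

index = build_alias_index(ALIASES)

# Names that cannot be resolved to a single gene without extra context.
ambiguous_names = sorted(name for name, symbols in index.items() if len(symbols) > 1)
```

A lookup that returns several candidates rather than silently picking one is the safer design here, since the choice usually needs context such as species or chromosome location.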

  14. Rose is a rose is a rose is a rose….. Not Really! What is a cell? • any small compartment • (biology) the basic structural and functional unit of all organisms; they may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals • a device that delivers an electric current as a result of chemical reaction • a small unit serving as part of or as the nucleus of a larger political movement • cellular telephone: a hand-held mobile radiotelephone for use in an area divided into small sections, each with its own short-range transmitter/receiver • small room in which a monk or nun lives • a room where a prisoner is kept Image Sources: Somewhere from the internet…

  15. Semantic Groups, Types and Concepts: • Semantic Group Biology – Semantic Type Cell • Semantic Groups Objects or Devices – Semantic Types Manufactured Device or Electrical Device or Communication Device • Semantic Group Organization – Semantic Type Political Group Foundation Model Explorer

  16. The REAL Problems Many disease states are complex, because of many genes (alleles & ethnicity, gene families, etc.), environmental effects (life style, exposure, etc.) and their interactions. Example: Hepatocellular Carcinoma, with genes CTNNB1, TP53* and MET. TP53: HEPATOCELLULAR CARCINOMA, SOMATIC [ARG249SER]. Environmental Effects: aflatoxin B1, a mycotoxin, induces a very specific G-to-T mutation at codon 249 in the tumor suppressor gene p53. Other allelic variants listed: • COLORECTAL CANCER [3-BP DEL, SER45DEL] • COLORECTAL CANCER [SER33TYR] • PILOMATRICOMA, SOMATIC [SER33TYR] • HEPATOBLASTOMA, SOMATIC [THR41ALA] • DESMOID TUMOR, SOMATIC [THR41ALA] • PILOMATRICOMA, SOMATIC [ASP32GLY] • OVARIAN CARCINOMA, ENDOMETRIOID TYPE, SOMATIC [SER37CYS] • HEPATOCELLULAR CARCINOMA, SOMATIC [SER45PHE] • HEPATOCELLULAR CARCINOMA, SOMATIC [SER45PRO] • MEDULLOBLASTOMA, SOMATIC [SER33PHE]

  17. ALK in cardiac myocytes • Cell to Cell Adhesion Signaling • Inactivation of Gsk3 by AKT causes accumulation of b-catenin in Alveolar Macrophages • Multi-step Regulation of Transcription by Pitx2 • Presenilin action in Notch and Wnt signaling • Trefoil Factors Initiate Mucosal Healing • WNT Signaling Pathway • HEPATOCELLULAR CARCINOMA • LIVER: • Hepatocellular carcinoma; • Micronodular cirrhosis; • Subacute progressive viral hepatitis • NEOPLASIA: • Primary liver cancer • CBL mediated ligand-induced downregulation of EGF receptors • Signaling of Hepatocyte Growth Factor Receptor CTNNB1 MET • Estrogen-responsive protein Efp controls cell cycle and breast tumors growth • ATM Signaling Pathway • BTG family proteins and cell cycle regulation • Cell Cycle • RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage • Regulation of transcriptional activity by PML • Regulation of cell cycle progression by Plk3 • Hypoxia and p53 in the Cardiovascular system • p53 Signaling Pathway • Apoptotic Signaling in Response to DNA Damage • Role of BRCA1, BRCA2 and ATR in Cancer Susceptibility….Many More….. TP53 The REAL Problems

  18. Hypothesis DATA INFORMATION KNOWLEDGE Information is not knowledge - Albert Einstein Integrative Genomics - what is it? Another buzzword or a meaningful concept useful for biomedical research? Acquisition, Integration, Curation, and Analysis of biological data. Integrative Genomics: the study of complex interactions between genes, organism and environment, the triple helix of biology. Gene <-> Organism <-> Environment. It is definitely beyond the buzzword stage - universities now have programs named 'Integrated Genomics.'

  19. Methods for Integration • Link driven federations • Explicit links between databanks. • Warehousing • Data is downloaded, filtered, integrated and stored in a warehouse. Answers to queries are taken from the warehouse. • Others….. Semantic Web, etc………

  20. Link-driven Federations • Creates explicit links between databanks • query: get interesting results and use web links to reach related data in other databanks • Examples: NCBI-Entrez, SRS
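
The idea of explicit links between databanks can be sketched as a small graph walk. The three toy databanks, their record IDs and the `follow_links` helper are hypothetical; real federations such as Entrez or SRS expose their own link mechanisms, which this sketch does not attempt to reproduce.

```python
from collections import deque

# Toy databanks: each record lists explicit links as (databank, record_id) pairs.
# All IDs and names are invented placeholders.
DATABANKS = {
    "gene": {
        "G1": {"name": "example gene", "links": [("disease", "D1"), ("literature", "P1")]},
    },
    "disease": {
        "D1": {"name": "example disease", "links": [("literature", "P2")]},
    },
    "literature": {
        "P1": {"name": "example paper 1", "links": []},
        "P2": {"name": "example paper 2", "links": []},
    },
}

def follow_links(databanks, start, max_hops):
    """Breadth-first walk over explicit links, returning records within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        (bank, record_id), hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for target in databanks[bank][record_id]["links"]:
            if target not in seen:
                seen.add(target)
                reached.append(target)
                queue.append((target, hops + 1))
    return reached

one_hop = follow_links(DATABANKS, ("gene", "G1"), max_hops=1)
two_hops = follow_links(DATABANKS, ("gene", "G1"), max_hops=2)
```

Each databank stays independent; only the links are shared, which is why the user still has to know where to start and which links are worth following.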

  21. http://www.ncbi.nlm.nih.gov/Database/datamodel/

  22. http://www.ncbi.nlm.nih.gov/Database/datamodel/

  23. http://www.ncbi.nlm.nih.gov/Database/datamodel/

  24. http://www.ncbi.nlm.nih.gov/Database/datamodel/

  25. http://www.ncbi.nlm.nih.gov/Database/datamodel/

  26. Querying Entrez-Gene

  27. Link-driven Federations • Advantages • complex queries • Fast • Disadvantages • require good knowledge • syntax based • terminology problem not solved

  28. Data Warehousing Data is downloaded, filtered, integrated and stored in a warehouse. Answers to queries are taken from the warehouse. • Advantages • Good for very specific, task-based queries and studies. • Since it is custom-built and usually expert-curated, relatively less error-prone. • Disadvantages • Can quickly become outdated – needs constant updates. • Limited functionality – e.g., restricted to one disease or one system.
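
The download, filter, integrate, store and query steps can be sketched with Python's built-in `sqlite3` module and an in-memory database. The two source lists stand in for downloaded files; the table names, column names and records are assumptions made for this illustration.

```python
import sqlite3

# Stand-ins for two downloaded sources. All values are invented placeholders.
SOURCE_GENES = [("g1", "GENE_A"), ("g2", "GENE_B"), ("g3", None)]  # None: incomplete record
SOURCE_DISEASE_LINKS = [
    ("g1", "disease X"),
    ("g2", "disease X"),
    ("g2", "disease Y"),
    ("g9", "disease Z"),  # refers to a gene missing from the gene source
]

def build_warehouse(genes, links):
    """Filter both sources, integrate them on gene_id and store them locally."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT NOT NULL)")
    conn.execute("CREATE TABLE gene_disease (gene_id TEXT NOT NULL, disease TEXT NOT NULL)")
    # Filter: drop gene records without a symbol.
    clean_genes = [(gene_id, symbol) for gene_id, symbol in genes if symbol]
    conn.executemany("INSERT INTO gene VALUES (?, ?)", clean_genes)
    # Integrate: keep only links whose gene survived filtering.
    known_ids = {gene_id for gene_id, _ in clean_genes}
    clean_links = [(gene_id, disease) for gene_id, disease in links if gene_id in known_ids]
    conn.executemany("INSERT INTO gene_disease VALUES (?, ?)", clean_links)
    conn.commit()
    return conn

def genes_for_disease(conn, disease):
    """Answer a query from the warehouse only, without touching the sources."""
    rows = conn.execute(
        "SELECT g.symbol FROM gene g "
        "JOIN gene_disease gd ON gd.gene_id = g.gene_id "
        "WHERE gd.disease = ? ORDER BY g.symbol",
        (disease,),
    )
    return [row[0] for row in rows]

warehouse = build_warehouse(SOURCE_GENES, SOURCE_DISEASE_LINKS)
disease_x_genes = genes_for_disease(warehouse, "disease X")
```

Because queries never go back to the sources, the warehouse is only as current as its last rebuild, which is the update burden noted in the disadvantages.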

  29. No Integrative Genomics is Complete without Ontologies • Gene World: Gene Ontology (GO) • Biomedical World: Unified Medical Language System (UMLS)

  30. The 3 Gene Ontologies • Molecular Function = elemental activity/task • the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity • What a product ‘does’, precise activity • Biological Process = biological goal or objective • broad biological goals, such as DNA repair or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Biological objective, accomplished via one or more ordered assemblies of functions • Cellular Component = location or complex • subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme • ‘is located in’ (‘is a subcomponent of’) http://www.geneontology.org

  31. Example: Gene Product = hammer • Function (what): Drive a nail into wood; Process (why): Carpentry • Function: Drive a stake into soil; Process: Gardening • Function: Smash a bug; Process: Pest Control • Function: A performer’s juggling object; Process: Entertainment http://www.geneontology.org

  32. GO term associations: Evidence Codes • ISS: Inferred from sequence or structural similarity • IDA: Inferred from direct assay • IPI: Inferred from physical interaction • TAS: Traceable author statement • IMP: Inferred from mutant phenotype • IGI: Inferred from genetic interaction • IEP: Inferred from expression pattern • ND: no data available http://www.geneontology.org
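
Evidence codes matter most when filtering annotations. The sketch below parses tab-separated association rows and keeps only those backed by experimental codes from the list above (IDA, IPI, IMP, IGI, IEP). It assumes the gene symbol, GO ID, evidence code and aspect sit in columns 3, 5, 7 and 9, which is my understanding of the GO association file layout and should be checked against the current specification. All rows, including the GO IDs, are placeholders.

```python
# Evidence codes from the slide that rest on direct experimental observation.
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}

def make_row(symbol, go_id, evidence, aspect):
    """Build one tab-separated placeholder row with the assumed column order."""
    return "\t".join(
        ["DEMO_DB", "ID_" + symbol, symbol, "", go_id, "REF:0", evidence, "", aspect]
    )

# Placeholder rows; the GO IDs are not real terms.
SAMPLE_LINES = [
    "!comment lines start with an exclamation mark",
    make_row("geneA", "GO:9999991", "IDA", "F"),
    make_row("geneA", "GO:9999992", "ISS", "P"),
    make_row("geneB", "GO:9999992", "IMP", "P"),
    make_row("geneC", "GO:9999993", "ND", "C"),
]

def parse_annotations(lines):
    """Yield one dict per annotation row, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield {
            "symbol": cols[2],
            "go_id": cols[4],
            "evidence": cols[6],
            "aspect": cols[8],  # F, P or C for the three ontologies
        }

def with_evidence(annotations, allowed_codes):
    """Keep annotations whose evidence code is in allowed_codes."""
    return [a for a in annotations if a["evidence"] in allowed_codes]

annotations = list(parse_annotations(SAMPLE_LINES))
experimental = with_evidence(annotations, EXPERIMENTAL_CODES)
```

Filtering this way trades coverage for confidence: similarity-based (ISS) and "no data" (ND) rows are dropped, so fewer genes remain annotated.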

  33. What can researchers do with GO? • Access gene product functional information • Find how much of a proteome is involved in a process/function/component in the cell • Map GO terms and incorporate manual annotations into own databases • Provide a link between biological knowledge and • gene expression profiles • proteomics data And how? • Getting the GO and GO_Association Files • Data Mining • My Favorite Gene • By GO • By Sequence • Analysis of Data • Clustering by function/process • Other Tools
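
Two of the uses listed above, grouping genes by function/process and finding how much of a proteome is involved in a process, reduce to simple set operations once annotations are loaded. The gene names and gene-to-term pairs here are invented; only the term names are borrowed from the earlier GO slide.

```python
from collections import defaultdict

# (gene, GO term) pairs and the proteome are invented placeholders.
ANNOTATIONS = [
    ("geneA", "DNA repair"),
    ("geneB", "DNA repair"),
    ("geneB", "purine metabolism"),
    ("geneC", "purine metabolism"),
    ("geneD", "DNA repair"),
]
PROTEOME = {"geneA", "geneB", "geneC", "geneD", "geneE"}

def cluster_by_term(annotations):
    """Group genes by the term they are annotated with."""
    clusters = defaultdict(set)
    for gene, term in annotations:
        clusters[term].add(gene)
    return dict(clusters)

def fraction_involved(term, clusters, proteome):
    """Fraction of the proteome annotated with the given term."""
    annotated = clusters.get(term, set()) & proteome
    return len(annotated) / len(proteome)

clusters = cluster_by_term(ANNOTATIONS)
dna_repair_fraction = fraction_involved("DNA repair", clusters, PROTEOME)
```

A gene can fall into several clusters (geneB here), which is expected: one product may take part in more than one process.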

  34. http://www.geneontology.org/

  35. Open biomedical ontologies http://obo.sourceforge.net/

  36. Unified Medical Language System Knowledge Server – UMLSKS http://umlsks.nlm.nih.gov/kss/ • The UMLS Metathesaurus contains information about biomedical concepts and terms from many controlled vocabularies and classifications used in patient records, administrative health data, bibliographic and full-text databases, and expert systems. • The Semantic Network, through its semantic types, provides a consistent categorization of all concepts represented in the UMLS Metathesaurus. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain. • The SPECIALIST Lexicon is an English language lexicon with many biomedical terms, containing syntactic, morphological, and orthographic information for each term or word.

  37. Unified Medical Language System Metathesaurus • Over 1 million biomedical concepts • About 5 million concept names from more than 100 controlled vocabularies and classifications (some in multiple languages) used in patient records, administrative health data, bibliographic and full-text databases and expert systems. • The Metathesaurus is organized by concept or meaning. Alternate names for the same concept (synonyms, lexical variants, and translations) are linked together. • Each Metathesaurus concept has attributes that help to define its meaning, e.g., the semantic type(s) or categories to which it belongs, its position in the hierarchical contexts from various source vocabularies, and, for many concepts, a definition. • Customizable: Users can exclude vocabularies that are not relevant for specific purposes or not licensed for use in their institutions. MetamorphoSys, the multi-platform Java install and customization program distributed with the UMLS resources, helps users to generate pre-defined or custom subsets of the Metathesaurus. • Uses: • linking between different clinical or biomedical vocabularies • information retrieval from databases with human-assigned subject index terms and from free-text information sources • linking patient records to related information in bibliographic, full-text, or factual databases • natural language processing and automated indexing research
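
"Organized by concept" means that many names point at one concept record. A minimal sketch of that structure follows; the concept IDs (`DEMO0001`, `DEMO0002`) are invented and are not UMLS identifiers, and the `normalize` rule is a deliberately crude assumption, far simpler than the lexical tools that ship with the UMLS.

```python
import re

# Concept-centred store: one record per meaning, many names per record.
# IDs are invented placeholders, not real UMLS concept identifiers.
CONCEPTS = {
    "DEMO0001": {
        "preferred": "Alzheimer's Disease",
        "names": ["Alzheimer Disease", "Alzheimers disease"],
        "semantic_types": ["Disease or Syndrome"],
    },
    "DEMO0002": {
        "preferred": "Down Syndrome",
        "names": ["Down's syndrome", "Trisomy 21"],
        "semantic_types": ["Disease or Syndrome"],
    },
}

def normalize(name):
    """Lower-case, drop punctuation and collapse whitespace (a crude assumption)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())

def build_name_index(concepts):
    """Map every normalized name variant to its concept ID."""
    index = {}
    for concept_id, record in concepts.items():
        for name in [record["preferred"], *record["names"]]:
            index[normalize(name)] = concept_id
    return index

def lookup(name, index):
    """Return the concept ID for a name, or None if the name is unknown."""
    return index.get(normalize(name))

name_index = build_name_index(CONCEPTS)
```

Linking vocabularies then amounts to resolving each vocabulary's own term to the shared concept ID and comparing IDs rather than strings.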

  38. UMLSKS – Semantic Network: Semantic Groups (15), Semantic Types (135), Concepts (millions) • Complexity is reduced by grouping concepts according to the semantic types that have been assigned to them. • There are currently 15 semantic groups that provide a partition of the UMLS Metathesaurus for 99.5% of the concepts. One example semantic type per group (group abbreviation|group name|type ID|type name):
     ACTI|Activities & Behaviors|T053|Behavior
     ANAT|Anatomy|T024|Tissue
     CHEM|Chemicals & Drugs|T195|Antibiotic
     CONC|Concepts & Ideas|T170|Intellectual Product
     DEVI|Devices|T074|Medical Device
     DISO|Disorders|T047|Disease or Syndrome
     GENE|Genes & Molecular Sequences|T085|Molecular Sequence
     GEOG|Geographic Areas|T083|Geographic Area
     LIVB|Living Beings|T005|Virus
     OBJC|Objects|T073|Manufactured Object
     OCCU|Occupations|T091|Biomedical Occupation or Discipline
     ORGA|Organizations|T093|Health Care Related Organization
     PHEN|Phenomena|T038|Biologic Function
     PHYS|Physiology|T040|Organism Function
     PROC|Procedures|T061|Therapeutic or Preventive Procedure
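
The pipe-delimited records on this slide are easy to load into a lookup table. The sketch below parses exactly the fifteen records shown above; the field names in `SemanticType` are my own labels for the four columns, not names taken from the UMLS documentation.

```python
from collections import namedtuple

# Field names are descriptive labels chosen for this sketch.
SemanticType = namedtuple("SemanticType", "group_abbrev group_name type_id type_name")

RAW_RECORDS = """\
ACTI|Activities & Behaviors|T053|Behavior
ANAT|Anatomy|T024|Tissue
CHEM|Chemicals & Drugs|T195|Antibiotic
CONC|Concepts & Ideas|T170|Intellectual Product
DEVI|Devices|T074|Medical Device
DISO|Disorders|T047|Disease or Syndrome
GENE|Genes & Molecular Sequences|T085|Molecular Sequence
GEOG|Geographic Areas|T083|Geographic Area
LIVB|Living Beings|T005|Virus
OBJC|Objects|T073|Manufactured Object
OCCU|Occupations|T091|Biomedical Occupation or Discipline
ORGA|Organizations|T093|Health Care Related Organization
PHEN|Phenomena|T038|Biologic Function
PHYS|Physiology|T040|Organism Function
PROC|Procedures|T061|Therapeutic or Preventive Procedure
"""

def parse_records(text):
    """Split each non-empty line on '|' into a SemanticType record."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        records.append(SemanticType(*line.split("|")))
    return records

records = parse_records(RAW_RECORDS)

# Lookup from semantic type ID to the name of the group it belongs to.
group_of_type = {record.type_id: record.group_name for record in records}
```

With such a table, any concept tagged with type T047 can be rolled up to the Disorders group, which is how the semantic groups reduce complexity.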

  39. UMLSKS – Semantic Navigator

  40. Alzheimer’s Disease – Alarming Statistics • The number of patients with AD in any community depends on the proportion of older people in the group. Traditionally, the developed countries had large proportions of elderly people, and so they had many cases of Alzheimer’s disease in the community at one time. • 4.5 million AD patients in the United States today. • Expected to increase to 11 to 16 million by 2050. • In 2000, health care costs for AD patients in the United States totaled approximately $31.9 billion, which is expected to reach $49.3 billion by 2010 (http://www.alz.org) • World-wide: ~18 million (projected to nearly double by 2025 to 34 million). • Demographic transition - Developing countries: • Increased life expectancy (current life expectancy in India is >60 years). • 1991 India Census: 70 million people were over 60 years. • 2001 India Census: 77 million, or 7.6% of the population. • By 2025, India is projected to have 177 million elderly people. • Currently, more than 50% of people with Alzheimer’s disease live in developing countries and by 2025, this will be over 70%. Source: WHO & NIA

  41. Alzheimer’s Disease – Why Computational Approaches? • The goal of applying computational data-mining approaches is to extract useful information from large amounts of data by employing mathematical methods that should be as automated as possible. • Computational data-mining approaches are particularly appropriate in areas with much data but few explanations, such as gerontology. If researchers can find or derive patterns in the data, that information may enhance our knowledge of aging. • The complexity and broad range of cellular and biochemical events make researchers believe that there must be a sophisticated network of AD signal transduction, gene regulation, and protein-protein interaction events. • Therefore, deciphering AD-related molecular network “circuitry” can help researchers understand AD better, model details, and propose treatment ideas.

  42. A simplistic picture [diagram labels] Disease: Alzheimer Disease • System: Brain and Nervous System • Anatomy: Frontal Lobe, Hippocampus, Temporal Lobe, Astrocytes, Cerebral Cortex, Cerebrum, Basal Nucleus of Meynert, Brain, Microglia, Neurons • Genes: APP, NEF3, A2M, APOE, ALOX12, ABCA1, ABCA2, NME1, PARK2, STH

  43. [Same diagram as slide 42] Disease: Alzheimer Disease • System: Brain and Nervous System • Anatomy: Frontal Lobe, Hippocampus, Temporal Lobe, Astrocytes, Cerebral Cortex, Cerebrum, Basal Nucleus of Meynert, Brain, Microglia, Neurons • Genes: APP, NEF3, A2M, APOE, ALOX12, ABCA1, ABCA2, NME1, PARK2, STH

  44. Many Diseases – Many Genes [diagram labels] Diseases: Alzheimer Disease, Parkinson Disease, Schizophrenia • System: Brain and Nervous System • Anatomy: Frontal Lobe, Hippocampus, Temporal Lobe, Astrocytes, Cerebral Cortex, Cerebrum, Basal Nucleus of Meynert, Brain, Microglia, Neurons • Genes: PARK3, PARP, PARK7, SCZD2, SCZD8, SCZD3, ABCA2, A2M, STH, APOE, APP, ALOX12, NEF3, ABCA1, PARK2, NME1
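
The many-to-many picture can be explored by inverting a disease-to-gene map and asking which genes are shared. In this sketch the Alzheimer list is the gene set shown on the earlier single-disease slides; the Parkinson and Schizophrenia lists are an assumption based only on the PARK and SCZD symbol prefixes, since the transcript does not preserve which gene was drawn next to which disease.

```python
from collections import defaultdict

DISEASE_GENES = {
    # Gene list as shown on the single-disease Alzheimer slides.
    "Alzheimer Disease": ["APP", "NEF3", "A2M", "APOE", "ALOX12",
                          "ABCA1", "ABCA2", "NME1", "PARK2", "STH"],
    # Assumption: grouping inferred from symbol prefixes, not stated in the transcript.
    "Parkinson Disease": ["PARK2", "PARK3", "PARK7"],
    "Schizophrenia": ["SCZD2", "SCZD3", "SCZD8"],
}

def invert(disease_genes):
    """Turn disease -> genes into gene -> set of diseases."""
    gene_diseases = defaultdict(set)
    for disease, genes in disease_genes.items():
        for gene in genes:
            gene_diseases[gene].add(disease)
    return gene_diseases

def shared_genes(disease_genes):
    """Genes associated with more than one disease, with their diseases sorted."""
    return {
        gene: sorted(diseases)
        for gene, diseases in invert(disease_genes).items()
        if len(diseases) > 1
    }

shared = shared_genes(DISEASE_GENES)
```

Under these assumed groupings only PARK2 is shared, which is the kind of cross-disease link an integrated view is meant to surface.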

  45. Genes: Functions & Pathways [diagram labels] Disease: Alzheimer Disease • System: Brain and Nervous System • Anatomy: Frontal Lobe, Hippocampus, Temporal Lobe, Astrocytes, Cerebral Cortex, Cerebrum, Basal Nucleus of Meynert, Brain, Microglia, Neurons • Genes: APP, NEF3, A2M, APOE, ALOX12, ABCA1, ABCA2, NME1, PARK2, STH Functions/Processes • enzyme binding • extracellular space • interleukin-1 binding • interleukin-8 binding • intracellular protein transport • protein carrier activity • protein homooligomerization • serine-type endopeptidase inhibitor activity • tumor necrosis factor binding • wide-spectrum protease inhibitor activity Pathways • Alzheimer's disease (KEGG) • Neurodegenerative Disorders (KEGG) • Deregulation of CDK5 in Alzheimers Disease (BioCarta) • Generation of amyloid b-peptide by PS1 (BioCarta) • Platelet Amyloid Precursor Protein Pathway (BioCarta) • Hemostasis (Reactome)

  46. Protein Interactions [diagram labels] Disease: Alzheimer Disease • System: Brain and Nervous System • Anatomy: Frontal Lobe, Hippocampus, Temporal Lobe, Astrocytes, Cerebral Cortex, Cerebrum, Basal Nucleus of Meynert, Brain, Microglia, Neurons • Genes: APP, NEF3, A2M, APOE, ALOX12, ABCA1, ABCA2, NME1, PARK2, STH • Interacting proteins: C1QBP, KLKB1, APPBP1, NS5A, KNG1, TGFB2, CNTF

  47. Understanding the genetic network of human Alzheimer’s disease - Two general phases • Identifying the genetic players involved • Systematically perturbing individual players and/or pathways suspected of being involved in neurodegenerative diseases in model organisms (e.g. knock-outs) • Computational Approaches • Data-mining (Data marts): Comparative Genomics, Interactome, Comparative Phenomics, Regulomics (TFBSs, motif/pattern search) • Text-mining: Literature mining (hypothesis-generator) • Mathematical Modeling: Disease process modeling • Experimental Approaches • Genetic Manipulations • Gene Expression Studies • Animal Models • Cellular Studies (to investigate specific cellular processes)
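
One simple way literature mining can act as a hypothesis generator is to count how often a gene symbol and a disease name appear in the same abstract. This is a minimal sketch of that idea, not the method used in the presentation; the three "abstracts" are placeholder sentences written for this example, and matching is plain string matching with no synonym handling.

```python
import re
from collections import Counter
from itertools import product

GENES = {"APP", "APOE", "PARK2"}
DISEASES = {"alzheimer disease", "parkinson disease"}

# Placeholder texts written for this sketch; they are not real abstracts.
ABSTRACTS = [
    "Placeholder abstract mentioning APP and APOE in Alzheimer disease.",
    "Placeholder abstract mentioning APOE and Alzheimer disease again.",
    "Placeholder abstract mentioning PARK2 in Parkinson disease.",
]

def cooccurrences(abstracts, genes, diseases):
    """Count (gene, disease) pairs that occur together in the same abstract."""
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))  # case-sensitive gene match
        lowered = text.lower()
        found_genes = sorted(gene for gene in genes if gene in tokens)
        found_diseases = sorted(d for d in diseases if d in lowered)
        counts.update(product(found_genes, found_diseases))
    return counts

pair_counts = cooccurrences(ABSTRACTS, GENES, DISEASES)
```

Co-occurrence only suggests a link worth testing; the naming problems raised earlier (synonyms, shared abbreviations) apply in full, so a real pipeline would resolve names to identifiers first.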

  48. Transcriptional Regulation Post-Transcriptional Regulation - MicroRNAs Text-mining: Knowledge Discovery Cellular Studies Gene Expression Models of human neurodegenerative diseases Model Organisms & Genetic Manipulations Clustering Algorithms Comparative Genomics Differentially expressed genes Alzheimer Disease Related Genes Transcriptome Proteomics Genomics
