TM Web services: Whatizit, CiteXplore

EBI TM services: mapping targets to diseases
September 3rd, 2009
Dietrich Rebholz-Schuhmann, MD, PhD
Group Leader, Rebholz Group
EBI, WT Genome Campus
Hinxton, Cambridge, U.K.

TM at the EBI: current developments
• TM Web services: Whatizit, CiteXplore
• One of EBI’s major services: 11,000 hits per day, 400 MB data transfer
• Ongoing integration into public services (UKPMC)
• Research around new developments and quality assurance
• Working towards a knowledge infrastructure from literature
• Named entity recognition: most progress
• Relation / event identification
• Repository of inferred knowledge: functional annotation of genes, diseases, gene-disease associations, relation identification
• Exploitation of semantic resources (ontologies)

The magic transformation from text to semantics
[Figure labels: Concepts, Ideas, Facts, Relationships, Events, “Knowledge” ?]

How far can we go?
• Automatic + full integration with database resources? => Mainly entities + concepts
• Automatic generation of paper summaries? => Extraction of facts + events
• Extraction of new knowledge => Generate hypotheses first
• Let the authors do it all => Do not use papers anymore

Idealized R&D stages (overview)
[Figure: timeline 2006, 2008, 2009; axes: Time, Semantics support. Stages: Named entity recognition / grounding; Identification of relations; Interoperability of literature and text mining. Labels: Genes/Proteins, Chemical entities, Diseases, GO/MeSH terms, BioLexicon, Gene regulation ontology, Ternary relations, Functions of proteins, Gene-disease associations, Whatizit, IeXML, Integration of literature into bioinformatics IT services]

Document Entities Concepts Tokens Facts

“The function of OmpR appears to be the enhancement of a basal level of ompC expression”
basal level of ompC expression
OmpR, ompC
… the of appear …
OmpR increases ompC expression

Gene normalisation
[Figure: SwissProt; Biolexicon, human. Best performance => 100% Precision, => 100% Recall]
Performance is state of the art
Results are not tuned to the BioCreAtIve II corpus
Pezik et al., Proc. LREC Workshop, 2008

Entities + concepts
[Figure: SwissProt; Biolexicon, human; Chemical entities; Disease NER; MeSH terms; GO terms (@ Rank 1)]
All solutions are state of the art
Jimeno et al., BMC Bioinformatics, 2008

Protein-protein interaction identification
[Figure: GREs with inference; GREs w/o inference; “Associate”; MI-PPI; All; NMI-PPI]
Performance not adequate, improvements required
Rebholz-Schuhmann et al., SMBM 2008

How do we find knowledge?

Gene-disease associations: Motivation
• Some diseases have a mono-genetic cause:
• For example cystic fibrosis, sickle cell anemia, F8/F9 defects, deafness
• Other diseases have a pluri-genetic cause:
• Schizophrenia, stomach cancer, hypertension
• Question:
• Can we find molecular functions that are shared between genes and diseases?

Gene-disease association pairs from the literature

Candidate genes: Approach
• Complete Medline analysis
• Identify all genes/proteins (80% F-measure)
• Identify all Gene Ontology (GO) terms (35% F-measure)
• Identify all diseases (70% F-measure)
• Generation of concept profiles for genes and diseases
• Each profile is a vector containing the TF-IDF values of all relevant GO concepts
• A GO concept is relevant if found in the context of a gene or disease
• Pivoted cosine similarity
• Selection of gene profiles that are most similar to disease profiles
• Prioritization of gene-disease associations
• Evaluation
• Alternative methods: MeSH annotations and tokens
• “Gold standard” data resources: OMIM, GAD, GOA
• Assessment by curators
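The profile comparison in this approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the exact TF-IDF variant, so raw term frequency times log(N/df) is assumed here; plain cosine similarity is used in place of the pivoted normalisation named on the slide, whose parameters are not given; and all entity names and concept IDs are made-up placeholders.

```python
import math
from collections import Counter


def tfidf_profiles(contexts):
    """Build one TF-IDF profile per entity.

    contexts maps an entity (gene or disease) to the list of GO concept IDs
    found in its textual context. Weighting (tf * log(N / df)) is an assumption.
    """
    n = len(contexts)
    df = Counter()
    for concepts in contexts.values():
        df.update(set(concepts))  # document frequency: one count per entity
    profiles = {}
    for entity, concepts in contexts.items():
        tf = Counter(concepts)
        profiles[entity] = {c: tf[c] * math.log(n / df[c]) for c in tf}
    return profiles


def cosine(a, b):
    """Plain cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_genes(disease, genes, profiles):
    """Order candidate genes by similarity of their profile to the disease profile."""
    return sorted(genes, key=lambda g: cosine(profiles[disease], profiles[g]), reverse=True)


# Hypothetical toy data: placeholder concept IDs, not real GO identifiers.
contexts = {
    "disease:D1": ["GO:term_a", "GO:term_a", "GO:term_b"],
    "gene:G1": ["GO:term_a", "GO:term_b"],
    "gene:G2": ["GO:term_c"],
}
profiles = tfidf_profiles(contexts)
ranking = rank_genes("disease:D1", ["gene:G2", "gene:G1"], profiles)
print(ranking)
```

In this toy data, gene:G1 shares two concepts with the disease and gene:G2 shares none, so gene:G1 is ranked first. A real run would build the profiles from the whole of Medline and add the pivoted length normalisation.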

Candidate genes: Evaluation, OMIM/GAD
Limited performance due to:
- term variability
- not all G2D associations are relevant to OMIM

Candidate genes: Validation by curators
• Neither OMIM nor GAD is complete
• Curators are better able to verify putative novel knowledge
• Evaluation:
• Random sample of 30 novel gene-disease association pairs
• At least 2 out of 3 curators have to agree, use of literature resources
• Verify the direct mention of the gene-disease association
• Identify indirect evidence for the gene-disease pair
• Verify the assignment of GO concepts
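The "at least 2 out of 3 curators" rule amounts to a simple majority vote per pair. A minimal sketch, assuming each curator's judgement is recorded as a boolean; the function names and the toy assessments are hypothetical and not taken from the study:

```python
def confirmed(votes, min_agree=2):
    """True if at least min_agree curators confirmed the association."""
    return sum(1 for v in votes if v) >= min_agree


def confirmation_rate(assessments):
    """Fraction of gene-disease pairs confirmed under the agreement rule."""
    if not assessments:
        return 0.0
    return sum(confirmed(v) for v in assessments.values()) / len(assessments)


# Hypothetical votes from three curators per (gene, disease) pair.
assessments = {
    ("G1", "D1"): (True, True, False),
    ("G2", "D2"): (True, False, False),
    ("G3", "D3"): (True, True, True),
    ("G4", "D4"): (False, False, False),
}
print(confirmation_rate(assessments))
```

With these toy votes, two of the four pairs reach the two-curator threshold.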

Candidate genes: curator assessment
63% of gene-disease associations can be confirmed by at least 2 curators
57% of GO assignments describe the disease and the gene

P-values of GDAPs (based on cosine scores)
[Figure labels: No clear confirmation of gene-disease associations; Clear confirmation of most GDAPs]

Candidate genes: Outcome
• Identification of 1,154 putative novel gene-disease associations from the literature
• 63% (in total 727) should be reliable => to be confirmed
• 672 distinct candidate genes linked to the associations
• 340 genes are also covered in GOA linked to 545 gene-disease associations
• 57% of the assigned GO concepts are reliable
• Interpretation of the gene-disease association
• 10% of the GO concept annotations are shared with GOA

Gene-disease association pairs from the literature

Where do we move in the future?

How far can we go?
• Automatic generation of paper summaries? => Extraction of facts + events
• Let the authors do it all => Do not use papers anymore

Research to drive standards
Standardization of Document Formats:
• IeXML
• SciXML
Standardization of Content:
• Genes
• Chemical Entities
• Medical terms
• MeSH, GO terms
PaperMaker: Support to authors
Performance assessment on a very large corpus (FP07, support action)
Bioinformatics user: Analytical pipelines

UKPMC: Prospect

The process
• Collaborative annotation of a large-scale biomedical corpus
• Five project partners annotated the first corpus (150,000 documents, different semantic types)
• Reconciliation, syntax + semantics => generate the pilot corpus
• Make part of the pilot corpus available => challenge: reproduce the annotations
• Close the challenge, harmonise the annotations again => next corpus
• Reopen the challenge with the second harmonised corpus

The challenge
150,000 documents or more ...
Test set for all systems
Assessment, benchmarking

Support to authors / readers
• FEBS Letters experiment
• Authors contribute to the curation work
• They identify the correct entity in the DBs (gene/protein)
• Curators add the protein-protein interaction to the DB (MINT)
• BioCreative Meta-Server => BioCreative II.5
• BioLit (P. Bourne et al.)
• Adding semantic data to the literature => keep it in a DB
• Word plug-in to annotate ontological terms
• PaperMaker (Rebholz group)
• Consistency analysis of manuscripts
• Reflect, OnTheFly (Schneider group)
• Annotation of documents + interlinking with DBs
• Royal Society of Chemistry
• Markup of text (Oscar + editors) => interlinked chemistry

PaperMaker
• PaperMaker: a tool to support authors writing biomedical papers
• Interactive feedback on the contents of papers (related work and concept annotations)
• Formal consistency criteria checking (spelling, terminology, acronyms, references)

Consistency parameters
Domain-independent:
• General spelling and grammar
• General readability
• Appropriate use of references
• Finding and acknowledging related work
Domain-specific use of terminology:
• Should be consistent with domain-specific naming guidelines
• Should not be ambiguous
• Should conform to conventional usage (possible clashes between naming guidelines and common-sense convention)
• Useful to resolve terminology to reference databases (e.g. UniProt for protein names, ChEBI for chemical entities, etc.)
• The special case of acronyms

Content feedback
• Resolving the contents to literature repositories
• Finding related work (document retrieval)
• Finding related ideas (passage retrieval)
• Resolving the contents to ontological reference databases
• MeSH descriptors have been demonstrated to improve biomedical information retrieval. Can we suggest MeSH terms directly to the authors?
• Gene Ontology (GO) terms are increasingly used in information extraction systems.

PaperMaker workflow
Original manuscript text
Module 1: Spell Checker
Module 2: Acronym Resolution
Module 3: NER
Module 4: GO Recognition
Module 5: MeSH Annotation
Module 6: Reference Check
Module 7: Related Work
Module 8: Summary
Modified manuscript text
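A workflow of this shape can be pictured as a chain of modules that each read the manuscript and append feedback. The sketch below is not PaperMaker's implementation: the sequential order, the `Manuscript` class, and both module functions are assumptions made for illustration, and the acronym check is a deliberately naive stand-in for a real acronym resolution module.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Manuscript:
    text: str
    feedback: list = field(default_factory=list)


def acronym_check(ms):
    """Toy stand-in for an acronym module: flag acronyms never introduced as '(ABC)'."""
    used = set(re.findall(r"\b[A-Z]{2,}\b", ms.text))
    defined = set(re.findall(r"\(([A-Z]{2,})\)", ms.text))
    for acro in sorted(used - defined):
        ms.feedback.append(f"acronym '{acro}' is used but never defined")
    return ms


def summary(ms):
    """Toy stand-in for a summary module: report how many issues earlier modules raised."""
    ms.feedback.append(f"{len(ms.feedback)} issue(s) found")
    return ms


def run_pipeline(ms, modules):
    """Apply each module in turn; every module returns the (annotated) manuscript."""
    for module in modules:
        ms = module(ms)
    return ms


ms = run_pipeline(
    Manuscript("Named entity recognition (NER) and GO terms were used."),
    [acronym_check, summary],
)
for line in ms.feedback:
    print(line)
```

Keeping every module behind the same signature (manuscript in, manuscript out) is one way to let modules such as spell checking or MeSH annotation be added or reordered without touching the others.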

PaperMaker, Conclusions
• PaperMaker can help the author conform to the formal requirements of paper writing with special emphasis on the domain
• It also provides feedback on the contents by relating it to reference resources and literature repositories
• It may improve the indexing of a paper in literature repositories (less ambiguous terminology)
• http://www.ebi.ac.uk/Rebholz-srv/PaperMaker
Work in progress

TM services at the EBI: Conclusions
• Standardised TM solutions available, free use
• Quality assurance is ongoing work, integration with EBI’s data resources
• About 500 to 2,500 users, 50 GB annual data transfer
• Knowledge infrastructure is work in progress
• Annotations of genes, diseases
• Extraction of different types of relations
Collaborations between publishers and pharmaceutical industry (SESL project)

Editorial Board: Christopher Baker, Olivier Bodenreider, Philip Bourne, Anita Burgun-Parenthoine, Carol Friedman, Carole Goble, Udo Hahn, Lynette Hirschman, Jung-Jae Kim, Patrick Lambrix, Ulf Leser, Suzanna Lewis, Jong C. Park
Editorial Board (cont): Alan Ruttenberg, Tapio Salakoski, Susanna-Assunta Sansone, Michael Schroeder, Stefan Schulz, Amnon Shabo, Barry Smith, Robert Stevens, Toshihisa Takagi, Alfonso Valencia, Mark Wilkinson, Limsoon Wong

Acknowledgements …
IeXML: G. Nenadic, Uo.Manchester
CALBC: J. v.d.Lei, Rotterdam; E. v.Mulligen, Rotterdam; O. Bodenreider, NLM
Other: M. Ashburner, Uo.Cambridge; U. Leser, HUo.Berlin; D. Trieschnigg, Uo.Twente; F. Couto, Uo.Lisbon; A. Waagmeester, Uo.Maastricht; S. Jaeger, HUo.Berlin; T. Grego, Uo.Lisbon; A. Baillif, Uo.Clermont-Ferrand
BootStrep: Udo Hahn, Uo.Jena; E. Beisswanger, Uo.Jena; K. Tomanek, Uo.Jena; K. Buyko, Uo.Jena; S. Ananiadou, Uo.Manchester; N. Calzolari, CNR Pisa; A. Burgun, Uo.Rennes
EBI: P. Stoehr, E. Dimmer, E. Camon, M. Kapushesky, H. Hermjakob, N. Luscombe, D. Clark, P. Flicek,
