NLP pipeline for protein mutation knowledgebase construction

Jonas B. Laurila, Nona Naderi, René Witte, Christopher J.O. Baker

Background
  • Knowledge about mutations is crucial for many applications, e.g. protein engineering and biomedicine.
  • Protein mutations are described in the scientific literature.
  • The amount of information grows faster than manual database curation can handle.
  • Automatic reuse of mutation impact information from documents is needed.
Example excerpts

"The W125F mutant showed only a slight reduction of activity (Vmax) and a larger increase of Km with 1,2-dibromoethane."

  • Mutation
  • Directionality of impact
  • Protein property

"Haloalkane dehalogenase (DhlA) from Xanthobacter autotrophicus GJ10 hydrolyses terminally chlorinated and brominated n-alkanes to the corresponding alcohols."

  • Protein name
  • Gene name
  • Organism name
Named entity recognition
  • Protein-, gene- and organism names
    • Gazetteer lists based on SwissProt
    • Mappings encoded in the MGDB
  • Mutation mentions
    • MutationFinder ~700 regular expressions
    • Normalized into wNm format (wild-type residue, position, mutant residue), e.g. W125F
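
As a rough illustration of mutation mention detection and normalization, the sketch below uses two regular expressions (one-letter and three-letter amino acid codes) and rewrites every hit as a wNm string. The patterns and the function name find_mutations are assumptions made for this example; they are not taken from MutationFinder, which uses a much larger pattern set.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Two illustrative patterns only (hypothetical, not MutationFinder's own).
SINGLE_LETTER = re.compile(rf"\b([{ONE}])([1-9]\d*)([{ONE}])\b")
THREE_LETTER = re.compile(rf"\b({THREE})-?([1-9]\d*)-?({THREE})\b")


def find_mutations(text):
    """Return the sorted, de-duplicated wNm strings mentioned in text."""
    found = set()
    for wild, pos, mutant in SINGLE_LETTER.findall(text):
        found.add(f"{wild}{pos}{mutant}")
    for wild, pos, mutant in THREE_LETTER.findall(text):
        found.add(f"{THREE_TO_ONE[wild]}{pos}{THREE_TO_ONE[mutant]}")
    return sorted(found)
```

Because both spellings are reduced to the same wNm string, "Trp125Phe" and "W125F" collapse into a single normalized mention.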
Named entity recognition

Protein Properties

  • Protein functions
    • Noun phrases extracted with MuNPEx
    • Activity, binding, affinity, specificity as head nouns
  • Kinetic variables
    • JAPE rules extract Km, kcat and Km/kcat in the current implementation
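
The two kinds of protein property can be approximated in a few lines. This is a minimal sketch that stands in for the MuNPEx and JAPE components with plain Python: a regular expression for the three kinetic variables named above, and a last-word check against the four head nouns. Treating the last word of a noun phrase as its head is a simplifying assumption, and both function names are hypothetical.

```python
import re

# The ratio is listed first so it is matched before its components.
KINETIC = re.compile(r"\b(Km/kcat|kcat|Km)\b")
HEAD_NOUNS = {"activity", "binding", "affinity", "specificity"}


def kinetic_mentions(text):
    """Return kinetic variable mentions in the order they appear."""
    return KINETIC.findall(text)


def is_function_phrase(noun_phrase):
    """True if the last word (assumed to be the head noun) is a known head."""
    words = noun_phrase.lower().split()
    return bool(words) and words[-1] in HEAD_NOUNS
```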
Mutation grounding: linking mutations to their correct positions in the target sequence
  • Important for reuse of mutation mentions
  • Levels of grounding:
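
The core positional check behind grounding can be pictured as follows: a mention such as W125F is consistent with a candidate sequence only if the residue at position 125 is W. The sketch below is an illustration under stated assumptions, not the authors' algorithm. It also tries small numbering shifts, since numbering in a paper may differ from the database sequence; the max_shift parameter and the function names are hypothetical.

```python
import re

WNM = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse(mutation):
    """Split a wNm string into (wild-type residue, position, mutant residue)."""
    match = WNM.match(mutation)
    if match is None:
        raise ValueError(f"not in wNm format: {mutation!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def matches(mutation, sequence, offset=0):
    """True if the wild-type residue sits at position + offset (1-based)."""
    wild, pos, _ = parse(mutation)
    index = pos + offset - 1  # convert 1-based position to 0-based index
    return 0 <= index < len(sequence) and sequence[index] == wild


def best_offset(mutations, sequence, max_shift=5):
    """Return (offset, matched_count), preferring the smallest shift on ties."""
    best = (0, 0)
    for offset in sorted(range(-max_shift, max_shift + 1), key=abs):
        count = sum(matches(m, sequence, offset) for m in mutations)
        if count > best[1]:
            best = (offset, count)
    return best
```

For the made-up toy sequence "MAWKFD", the mentions W4F and K5A only fit when every position is shifted by -1, so best_offset reports that shift together with the number of mentions it explains.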
mSTRAPviz

Structure annotation visualization

Mutations extracted from text visualized on the protein structure for which mutation grounding is a prerequisite.

Protein function grounding

Mentions of protein functions are linked to correct Gene Ontology concepts.

Previously grounded proteins and mutations provide us with hints.

Grounding scored based on string similarity (later used during impact extraction)
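
Scoring a grounding by string similarity might look like the sketch below, which compares a function mention with each candidate concept label and keeps the best match. The slides do not say which similarity measure is used, so the ratio from Python's difflib.SequenceMatcher is an assumption here, and the concept identifiers (EX:0001 and so on) are placeholders rather than real Gene Ontology IDs.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ground_function(mention, concepts):
    """Return (concept_id, label, score) for the most similar concept label.

    concepts maps a concept identifier to its label.
    """
    best_id = max(concepts, key=lambda cid: similarity(mention, concepts[cid]))
    label = concepts[best_id]
    return best_id, label, similarity(mention, label)


# Placeholder identifiers, for illustration only.
CONCEPTS = {
    "EX:0001": "haloalkane dehalogenase activity",
    "EX:0002": "hydrolase activity",
    "EX:0003": "protein binding",
}
```

Calling ground_function("dehalogenase activity", CONCEPTS) selects the EX:0001 entry, and the returned score can be kept alongside the grounding for later use.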

Relation detection
  • Impacts
    • Words describing directionality + protein properties
  • Mutants
    • Set of mutations giving rise to altered proteins
  • Mutant – Impacts
    • The causal relation between mutants and their impacts
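
The three relation types above can be illustrated on a single sentence. This sketch assumes a small hand-made lexicon of directionality words, pairs each one with the first protein property within a few following tokens, and links every mutation in the sentence to every impact found there. The lexicon, the window size and the same-sentence heuristic are assumptions for this example and are much cruder than the pipeline's actual rules.

```python
import re

# Hypothetical directionality lexicon: word -> direction of change.
DIRECTION = {
    "reduction": "negative", "decrease": "negative", "loss": "negative",
    "increase": "positive", "gain": "positive", "enhancement": "positive",
}
PROPERTIES = {"activity", "binding", "affinity", "specificity", "Km", "kcat"}
MUTATION = re.compile(r"[A-Z]\d+[A-Z]")


def extract_relations(sentence, window=3):
    """Return (mutation, direction, property) triples found in one sentence."""
    tokens = re.findall(r"[A-Za-z0-9/]+", sentence)
    mutations = [t for t in tokens if MUTATION.fullmatch(t)]
    impacts = []
    for i, token in enumerate(tokens):
        direction = DIRECTION.get(token.lower())
        if direction is None:
            continue
        # Pair the directionality word with the first property that follows.
        for candidate in tokens[i + 1 : i + 1 + window]:
            if candidate in PROPERTIES or candidate.lower() in PROPERTIES:
                impacts.append((direction, candidate))
                break
    return [(m, d, p) for m in mutations for d, p in impacts]
```

Applied to the W125F excerpt shown earlier, this yields one negative impact on activity and one positive impact on Km, both attributed to W125F.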
OwlExporter
  • Translates GATE Annotations to OWL instances
  • Application independent
  • Literature Specifications added automatically
  • Used here to populate our Mutation impact ontology to create a mutation knowledgebase
Example query

Retrieve mutations that do not have an impact on haloalkane dehalogenase activity (also retrieve the SwissProt identifier of the protein being mutated).

Example query

Retrieve mutations on haloalkane dehalogenase that do not have a negative impact on the Michaelis constant.
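
To make the intent of such queries concrete, the sketch below runs both kinds of query over a tiny in-memory list of records instead of the populated ontology. Everything in it is a stand-in: the record layout, the function names, the identifier EX00001 and the data values are invented for illustration (only the W125F rows echo the excerpt quoted earlier), and "negative", "positive" and "neutral" denote the direction of change.

```python
# Toy records standing in for the knowledgebase; values are invented.
RECORDS = [
    {"mutation": "W125F", "protein": "haloalkane dehalogenase",
     "swissprot": "EX00001", "property": "activity", "direction": "negative"},
    {"mutation": "W125F", "protein": "haloalkane dehalogenase",
     "swissprot": "EX00001", "property": "Km", "direction": "positive"},
    {"mutation": "A10G", "protein": "haloalkane dehalogenase",
     "swissprot": "EX00001", "property": "activity", "direction": "neutral"},
    {"mutation": "A10G", "protein": "haloalkane dehalogenase",
     "swissprot": "EX00001", "property": "Km", "direction": "neutral"},
    {"mutation": "L20V", "protein": "haloalkane dehalogenase",
     "swissprot": "EX00001", "property": "Km", "direction": "negative"},
]


def no_impact_on(records, protein, prop):
    """Mutations with a neutral impact on prop, with the protein identifier."""
    return sorted({
        (r["mutation"], r["swissprot"]) for r in records
        if r["protein"] == protein and r["property"] == prop
        and r["direction"] == "neutral"
    })


def not_negative_on(records, protein, prop):
    """Mutations with a recorded impact on prop, none of which is negative."""
    relevant = [r for r in records
                if r["protein"] == protein and r["property"] == prop]
    negative = {r["mutation"] for r in relevant if r["direction"] == "negative"}
    return sorted({r["mutation"] for r in relevant} - negative)
```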

Evaluation

Mutation grounding performance

What’s next?

Modularize into a set of web services

Database (re-)creation

Reuse in phenotype prediction algorithms (SNAP)*

*Bromberg and Rost, 2007

Jonas B. Laurila

CSAS, UNB, Saint John


Nona Naderi

CSE, Concordia University, Montréal


René Witte

CSE, Concordia University, Montréal


Christopher J.O. Baker

CSAS, UNB, Saint John



Acknowledgement

This research was funded in part by:

  • New Brunswick Innovation Foundation, New Brunswick, Canada
  • NSERC, Discovery Grant, Canada
  • Quebec-New Brunswick University Co-operation in Advanced Education - Research Program, Government of New Brunswick, Canada