Cis-Regulatory/ Text Mining Interface

PowerPoint Slideshow about 'Cis-Regulatory/ Text Mining Interface' - darryl-aguirre

(1) What does ORegAnno want from text mining?

  • Curation queue
  • Document mark-up
  • Mapping to database IDs

(2) What does text mining need from ORegAnno?

(3) What can text mining provide?

  • What level of performance is needed?

(4) What is the right way to proceed?

  • Data sets for BioCreAtIvE?
  • Custom tools for individual “early adopters”?
Answers: (1) What Does ORegAnno Want from Text Mining?
  • Management of curation queue
    • Ideally user-customized, so that each user annotates the documents of immediate interest to her/him
  • Document mark-up to highlight relevant passages
    • A workflow pipeline that makes either the HTML or PDF version of the document available, with the (potentially) relevant terms highlighted
    • Support for “cut and paste” transfer of relevant regions to the database comments fields
  • Mapping to IDs, ontology codes
    • Gene, transcription factor (protein), organism, cell and tissue type, evidence types
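
The mark-up request above can be illustrated with a minimal sketch that wraps candidate terms in HTML `<mark>` tags. The function name `highlight_terms`, the term list, and the example sentence are hypothetical placeholders, not part of ORegAnno or any existing pipeline; a real system would take its terms from a gene or transcription factor tagger.

```python
import html
import re


def highlight_terms(text, terms):
    """Return HTML-escaped text with each whole-word term wrapped in <mark>."""
    if not terms:
        return html.escape(text)
    # Longest terms first so multi-word names win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches, then the match itself.
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


# Made-up sentence and terms, for illustration only.
sentence = "TFX binds the geneY promoter in liver cells."
marked = highlight_terms(sentence, ["TFX", "geneY", "liver"])
print(marked)
```

Because the highlighted spans keep the original wording, an annotator could copy a marked passage straight into a database comment field.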
Answers: (2) What Does Text Mining Need from ORegAnno?
  • Significant quantity of reliably annotated data to train text mining systems
    • Annotated at a level useful for natural language processing (e.g., marked for evidence at the phrase, sentence or passage level, depending on task)
  • This requires that ORegAnno have:
    • A clear statement of the scope of the ORegAnno database and a stable set of annotation guidelines
    • Annotations with high inter-annotator agreement
    • Tracking of entries by annotator, including depth of annotation (different annotators will annotate to different levels of detail, depending on interests)
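
One common way to quantify inter-annotator agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not say which measure was used, so the sketch below is an assumption, and the labels are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up relevance judgments on six documents.
annotator_1 = ["rel", "rel", "irr", "irr", "rel", "irr"]
annotator_2 = ["rel", "irr", "irr", "irr", "rel", "rel"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on four of six documents (0.667 observed) against 0.5 expected by chance, giving a kappa of about 0.333.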
Answers: (3) What Can Text Mining Provide?
  • Curation queue management:
    • Document classification approaches (e.g., from TREC Genomics or BioCreAtIvE) can be applied and evaluated, using new training data from pre-jamboree and jamboree annotation
    • We can experiment with “user-defined” criteria, based on restrictions for gene, transcription factor, organism, tissue, etc.
  • Document mark-up
    • Users could be provided with a list of genes/transcription factors in a paper, with hot links into the paper to find relevant passages
    • This would allow the annotator to drive the annotation process, selecting only those annotations that are correct and relevant. This in turn provides feedback using ORegAnno annotations to validate & train the text mining
    • Such a tool should make it easy for the annotator to provide the underlying text passages as evidence for the annotation, to provide more training data
  • Mapping to unique identifiers/controlled vocabulary/ontology
    • For each entity type (gene, transcription factor, organism, tissue type...), a tool can provide a mapping to the correct identifier; where there is possible ambiguity, the tool could provide a ranked list for the annotator to choose from
    • A tool can also flag different evidence types, with suggested code(s)
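
The ranked-list idea for ambiguous mentions can be sketched with plain string similarity. This is only an illustration: the lexicon, the identifiers, and the function `rank_candidates` are invented here, and a real tool would draw on curated gene and ontology resources and use context, not just spelling.

```python
from difflib import SequenceMatcher

# Hypothetical lexicon: identifier -> known names and synonyms (all made up).
LEXICON = {
    "GENE:0001": ["alpha factor 1", "AF1"],
    "GENE:0002": ["alpha factor 2", "AF2"],
    "GENE:0003": ["beta receptor", "BR"],
}


def rank_candidates(mention, lexicon, top_n=3, min_score=0.5):
    """Return up to top_n (identifier, score) pairs, best match first."""
    scored = []
    for identifier, names in lexicon.items():
        # Score an identifier by its best-matching name or synonym.
        best = max(
            SequenceMatcher(None, mention.lower(), name.lower()).ratio()
            for name in names
        )
        if best >= min_score:
            scored.append((identifier, round(best, 3)))
    # Highest score first; ties are broken by identifier for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# An ambiguous mention: two entries match equally well, so both are offered.
candidates = rank_candidates("alpha factor", LEXICON)
for identifier, score in candidates:
    print(identifier, score)
```

The annotator's choice from such a list could be stored alongside the mention, which is the kind of feedback the slides propose for validating and training the text mining.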
Answers: (4) How to Proceed?
  • Stabilize guidelines and redo the inter-annotator agreement experiment (and write it up)
  • Prepare a Gold Standard data set of expert annotated data for training new annotators
  • Collect a sufficient amount of training data for the various tasks (queue management, document mark-up, automated mapping)
  • Develop end-to-end pipeline (in the style of the FlySlip project) to capture whole documents in machine-readable form for mark-up
Recommendations: Training Materials & Tools
  • Case studies and gold-standard annotated articles
  • On-line training
    • Perhaps with a way for new annotators to test themselves against a set of gold standard annotations
    • This will require automated comparison of annotations for certain fields
  • Links to the best tools
  • Tools:
    • Copy mechanism for largely duplicated records
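
The automated comparison mentioned under on-line training could look roughly like the sketch below, which scores a trainee's record against a gold-standard record field by field. The field names, identifiers, and the functions `score_field` and `compare_records` are assumptions for illustration; they do not reflect the actual ORegAnno schema.

```python
def score_field(gold_values, trainee_values):
    """Compare one field's values as sets; report precision, recall, and diffs."""
    gold, trainee = set(gold_values), set(trainee_values)
    matched = gold & trainee
    return {
        "precision": len(matched) / len(trainee) if trainee else 0.0,
        "recall": len(matched) / len(gold) if gold else 0.0,
        "missed": sorted(gold - trainee),  # in gold, not found by trainee
        "extra": sorted(trainee - gold),   # added by trainee, not in gold
    }


def compare_records(gold_record, trainee_record, fields):
    """Score every listed field; a missing field counts as empty."""
    return {
        field: score_field(gold_record.get(field, []), trainee_record.get(field, []))
        for field in fields
    }


# Made-up records with placeholder identifiers.
gold = {"gene": ["GENE:0001"], "organism": ["ORG:01"], "evidence": ["EV:01", "EV:02"]}
trainee = {"gene": ["GENE:0001"], "organism": ["ORG:02"], "evidence": ["EV:01"]}

report = compare_records(gold, trainee, ["gene", "organism", "evidence"])
for field, scores in report.items():
    print(field, scores)
```

Exact set matching only suits fields with controlled identifiers; free-text fields such as comments would need a looser comparison or manual review.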