A Probabilistic Term Variant Generator for Biomedical Terms

Yoshimasa Tsuruoka and Jun’ichi Tsujii

CREST, JST

The University of Tokyo


Outline

  • Probabilistic Term Variant Generator

    • Generation Algorithm

    • Application: Dictionary expansion


Background

  • Information extraction from biomedical documents

  • Recognizing technical terms (e.g. DNA, protein names)

    We measured glucocorticoid receptors (GR)

    in mononuclear leukocytes (MNL) isolated…


Technical Term Recognition

  • Machine learning based

    • Identifying the regions of terms

      ⇒ No ID information

  • Dictionary-based

    • Comparing the strings with each entry in the dictionary

      ⇒ ID information


Problems of Dictionary-based Approaches

  • Spelling variation degrades recall

     ⇒ Approximate string searching

  • False positives degrade precision

     ⇒ Filtering by machine learning


Exact String Searching

  • Example

    • Text

      Phorbol myristate acetate induced Egr-1 mRNA…

    • Dictionary

      EGP

      EGR-1

      EGR-1 binding protein

      :

      ⇒ None of the entries matches exactly (the text has Egr-1, the dictionary has EGR-1)
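
The failure on this slide can be reproduced with a plain substring test. This is only a minimal sketch: the function name `exact_matches` is made up for illustration, and the text and dictionary entries are the ones from the slide.

```python
# Text and dictionary entries as shown on the slide.
text = "Phorbol myristate acetate induced Egr-1 mRNA"
dictionary = ["EGP", "EGR-1", "EGR-1 binding protein"]


def exact_matches(text, entries):
    """Return the entries that occur verbatim (case-sensitively) in the text."""
    return [entry for entry in entries if entry in text]


# No entry is found, because "Egr-1" and "EGR-1" differ in case.
print(exact_matches(text, dictionary))
```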


Edit Distance

  • Defines the distance between two strings as the cost of the cheapest sequence of three kinds of operations that turns one into the other.

    • Substitution

    • Insertion

    • Deletion

  • Example: board → abord

    • Cost = 2 (delete ‘a’, then insert ‘a’ at the front)
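
For reference, the standard dynamic-programming formulation of this distance (Levenshtein distance) can be sketched as follows. Unit cost for all three operations is an assumption here; the slides do not say whether the authors weight the operations differently.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit cost for substitution, insertion, deletion."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (s_char != t_char)
            insertion = curr[j - 1] + 1
            deletion = prev[j] + 1
            curr.append(min(substitution, insertion, deletion))
        prev = curr
    return prev[-1]


print(edit_distance("board", "abord"))  # 2, as on the slide
print(edit_distance("Egr-1", "EGR-1"))  # 2 (two case substitutions)
```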


Automatic Generation of Spelling Variants

  • Variant Generator

    • Input: NF-Kappa B

    • Output (generation probability in parentheses):

      NF-Kappa B (1.0)

      NF Kappa B (0.9)

      NF kappa B (0.6)

      NF kappaB (0.5)

      NFkappaB (0.3)

      :

  • Each generated variant is associated with its generation probability


Generation Algorithm

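  • Recursive generation: each variant is produced by applying one operation to an already generated variant

    P = P′ × P_op

    (P′: generation probability of the parent variant; P_op: probability of the applied operation)

  • Example

    T cell (1.0)

      → T-cell (0.5), operation probability 0.5

          → T-cells (0.1), operation probability 0.2

      → T cells (0.2), operation probability 0.2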
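
The recursion P = P′ × P_op can be sketched as a best-first expansion. Everything below is illustrative rather than the authors' implementation: the two regex rules in `RULES` are hand-written stand-ins that reproduce the T cell example, whereas the real rules are learned and context-dependent, and the names `generate_variants`, `min_prob` and `max_variants` are assumptions.

```python
import heapq
import re

# Hypothetical toy rules: (regex pattern, replacement, operation probability).
RULES = [
    (r" ", "-", 0.5),    # replace the first space with a hyphen
    (r"l$", "ls", 0.2),  # append a plural "s" after a final "l"
]


def generate_variants(term, rules, min_prob=0.1, max_variants=20):
    """Expand variants best-first; each child gets P = P' * P_op."""
    best = {}               # variant -> highest generation probability found
    queue = [(-1.0, term)]  # max-heap emulated with negated probabilities
    while queue and len(best) < max_variants:
        neg_p, variant = heapq.heappop(queue)
        if variant in best:
            continue        # already reached with an equal or higher probability
        parent_p = -neg_p
        best[variant] = parent_p
        for pattern, replacement, p_op in rules:
            child, n_subs = re.subn(pattern, replacement, variant, count=1)
            child_p = parent_p * p_op
            if n_subs and child not in best and child_p >= min_prob:
                heapq.heappush(queue, (-child_p, child))
    return best


for variant, p in generate_variants("T cell", RULES).items():
    print(f"{variant} ({p:.1f})")
```

Because operation probabilities never exceed 1, a variant's probability can only shrink along a derivation, so the first time a variant is popped from the queue its probability is already the highest one reachable.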


Collecting Examples of Spelling Variation

  • Abbreviation Extraction (Schwartz 2003)

    • Extracts short and long form pairs
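
To give a feel for how short and long forms can be paired, here is a simplified sketch in the spirit of the right-to-left character matching used in that line of work. It is written from a general understanding of the approach, omits candidate selection and validity filters, and should not be read as the cited tool's exact behaviour; the function name `find_best_long_form` is an assumption.

```python
def find_best_long_form(short_form, candidate):
    """Match short-form characters right to left inside the candidate text.

    Returns the shortest suffix of `candidate` (starting at a word boundary)
    that contains the short form's characters in order, or None.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


# Sentence fragments from the Background slide.
print(find_best_long_form("GR", "We measured glucocorticoid receptors"))
print(find_best_long_form("MNL", "in mononuclear leukocytes"))
```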


Learning Operation Rules

  • Operations for generating variants

    • Substitution

    • Deletion

    • Insertion

  • Context

    • Character-level context: the two preceding and the two following characters

  • Operation Probability
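
The formula for the operation probability did not survive in this transcript. One plausible estimate, offered purely as an assumption, is the relative frequency with which an operation is applied when its context occurs. The sketch below covers substitution and deletion only (deletion as replacement by the empty string); the function names and the toy observations are hypothetical.

```python
from collections import Counter


def context(term, i, width=2):
    """Characters to the left and right of position i (up to `width` each)."""
    return term[max(0, i - width):i], term[i + 1:i + 1 + width]


def estimate_rule_probabilities(observations):
    """Relative-frequency estimate of P(replacement | left, target, right).

    observations: iterable of (left, target, right, replacement) tuples;
    replacement == target means the character was left unchanged.
    """
    context_counts = Counter()
    rule_counts = Counter()
    for left, target, right, replacement in observations:
        context_counts[(left, target, right)] += 1
        if replacement != target:
            rule_counts[(left, target, right, replacement)] += 1
    return {
        rule: count / context_counts[rule[:3]]
        for rule, count in rule_counts.items()
    }


# Hypothetical toy data: the space in "T cell" seen four times,
# replaced by a hyphen in two of them.
left, right = context("T cell", 1)
observations = [
    (left, " ", right, "-"),
    (left, " ", right, "-"),
    (left, " ", right, " "),
    (left, " ", right, " "),
]
print(estimate_rule_probabilities(observations))
```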


Probabilistic Rules


Example (1)


Example (2)


Example (3)


Application: Dictionary Expansion

  • Expanding each entry in the dictionary

    • Threshold of Generation Probability: 0.1

    • Max number of variants for each entry: 20
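
The two settings above can be applied as a simple filter over the generator's output. In this sketch the generator is replaced by a hard-coded stand-in, `toy_generator`, and the entry IDs are invented; only the threshold (0.1) and the cap (20) come from the slide.

```python
def toy_generator(term):
    """Hard-coded stand-in for a real variant generator.

    The first five pairs are the NF-Kappa B variants shown earlier in the
    talk; the last pair is invented to show the effect of the threshold.
    """
    table = {
        "NF-Kappa B": [
            ("NF-Kappa B", 1.0), ("NF Kappa B", 0.9), ("NF kappa B", 0.6),
            ("NF kappaB", 0.5), ("NFkappaB", 0.3), ("NFKappaB", 0.05),
        ],
    }
    return table.get(term, [(term, 1.0)])


def expand_dictionary(entries, generate, min_prob=0.1, max_variants=20):
    """Map each sufficiently probable variant back to its entry ID."""
    expanded = {}
    for entry_id, term in entries.items():
        candidates = [(v, p) for v, p in generate(term) if p >= min_prob]
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        for variant, _ in candidates[:max_variants]:
            # Keep the first ID if two entries produce the same variant.
            expanded.setdefault(variant, entry_id)
    return expanded


entries = {"ID-1": "NF-Kappa B", "ID-2": "EGR-1"}  # hypothetical IDs
for variant, entry_id in expand_dictionary(entries, toy_generator).items():
    print(entry_id, variant)
```

Keeping the entry ID alongside each variant preserves the main advantage of dictionary-based recognition noted earlier: a match still yields ID information.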


Protein Name Recognition

  • Information Extraction

  • Longest match

  • GENIA corpus
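
Longest-match lookup can be sketched as a greedy left-to-right scan over tokens. Whitespace tokenisation, the `max_len` window and the example sentence are assumptions made for illustration; the slides do not describe the actual matching procedure in detail.

```python
def longest_match(tokens, dictionary, max_len=5):
    """Greedy left-to-right lookup that prefers the longest dictionary entry."""
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                matches.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return matches


dictionary = {"EGR-1", "EGR-1 binding protein"}
tokens = "the EGR-1 binding protein was induced".split()  # hypothetical sentence
print(longest_match(tokens, dictionary))
```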


Results of Dictionary Expansion



Conclusion

  • Probabilistic Variant Generator

    • Learning from actual examples

    • Dictionary expansion by the generator improves recall without loss of precision.

