### Identifying Abbreviation Definitions in Biomedical Text

Ariel Schwartz, Marti Hearst

The Problem

- The volume of biomedical text is growing at a fast rate. New abbreviations are introduced frequently.
- Manual abbreviation dictionaries are out of date.
- The goal is to have a simple, fast and accurate algorithm to identify abbreviations and their definitions in biomedical text.
- We use this algorithm as one of many preprocessing steps applied to biomedical texts before extracting meaningful information from them.

Abbreviation Examples

- “Heat-shock protein 40 (Hsp40) enables Hsp70 to play critical roles in a number of cellular processes, such as protein folding, assembly, degradation and translocation in vivo.”
- “Glutathione S-transferase pull-down experiments showed the direct interaction of in vitro translated p110, p64, and p58 of the essential CBF3 kinetochore protein complex with Cbf1p, a basic region helix-loop-helix zipper protein (bHLHzip) that specifically binds to the CDEI region on the centromere DNA.”
- “Hpa2 is a member of the Gcn5-related N-acetyltransferase (GNAT) superfamily, a family of enzymes with diverse substrates including histones, other proteins, arylalkylamines and aminoglycosides.”

Related Work

- Pustejovsky et al. present a solution based on hand-built regular expressions and syntactic information. Achieved 72% recall at 98% precision.
- Chang et al. use linear regression on a pre-selected set of features. Achieved 83% recall at 80%* precision, and 75% recall at 95% precision.
- Park and Byrd present a rule-based algorithm for extraction of abbreviation definitions in general text.
- Yoshida et al. present an approach close to ours, trying to first match characters on word and syllable boundaries.

* Counting partial matches and abbreviations missing from the “gold-standard”, their algorithm achieved 83% recall at 98% precision.

The Algorithm

- Much simpler than other approaches.
- Extracts abbreviation-definition candidates adjacent to parentheses.
- Finds correct definitions by matching characters in the abbreviation to characters in the definition, starting from the right.
- The first character in the abbreviation must match a character at the beginning of a word in the definition.
- To increase precision, a few simple heuristics are applied to eliminate incorrect pairs.
- Example: Heat shock transcription factor (HSF).
- The algorithm finds the correct definition, but not the correct alignment: matching from the right, the S in HSF is aligned with the “s” in “transcription” rather than with the “s” in “shock”.
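
The matching step described above can be sketched in Python as follows. This is a minimal illustration built only from the bullets on this slide, not the authors' implementation. The function names `find_best_long_form` and `extract_pairs`, the single-token filter on the short form, and the word window before the parenthesis are assumptions made for the example. The precision heuristics mentioned above are not specified on the slide and are omitted.

```python
import re


def find_best_long_form(short_form, candidate):
    """Match short-form characters to the candidate text, right to left.

    Returns the matched definition, or None if some character cannot be matched.
    """
    s_idx = len(short_form) - 1
    l_idx = len(candidate) - 1
    while s_idx >= 0:
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            # Punctuation in the short form is not matched against the text.
            s_idx -= 1
            continue
        # Move left until the character matches. The first short-form
        # character must also sit at the beginning of a word.
        while l_idx >= 0 and (
            candidate[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Start the definition at the beginning of the word that was matched last.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


def extract_pairs(sentence):
    """Collect (short form, definition) pairs of the shape 'definition (SF)'."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        # Assumed filter: keep only single-token short forms.
        if not short_form or " " in short_form:
            continue
        words = sentence[: match.start()].split()
        # Assumed window: limit how many words before "(" are searched.
        window = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-window:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form:
            pairs.append((short_form, long_form))
    return pairs


print(extract_pairs("Heat shock transcription factor (HSF) binds DNA."))
```

For the HSF example the sketch returns the full definition, although the S is matched inside “transcription”, which illustrates why the definition can be correct even when the alignment is not.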

Results

- On the “gold-standard” the algorithm achieved 83% recall at 96% precision.*
- On a larger test collection the results were 90% recall at 95% precision.
- An alternative algorithm, based on a modification of the Park and Byrd algorithm using decision lists, achieved only slightly better results: 83% recall at 97% precision, and 90% recall at 96% precision.
- These results show that a very simple algorithm produces results comparable to those of the existing, more complex algorithms.

* Counting partial matches and abbreviations missing from the “gold-standard”, our algorithm achieved 83% recall at 99% precision.
