Information Extraction From Medical Records
by Alexander Barsky
Current Methodology:
A broad assessment of the patient is contained at the beginning of the chart, with references to more specific areas. Specific divisions follow the broad assessment. Records are listed in chronological order of activity.
A patient's medical chart is highly detailed and complex. Any attempt to quickly locate specific information in it is likely to be frustrating.
Create a system that extracts the desired information based on a predefined set of parameters.
Example: "Hormonal imbalance during puberty". Retrieve all references to hormonal imbalances, but only those that fall between two specific points in time in the medical chart.
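The example query above can be sketched as a keyword filter restricted to a date window. This is only an illustration of the idea, not the system described here: the `ChartEntry` record, the `find_references` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChartEntry:
    """Hypothetical chart entry: one dated note from a medical record."""
    recorded: date
    text: str


def find_references(entries, phrase, start, end):
    """Return entries that mention `phrase` (case-insensitive) dated within [start, end]."""
    needle = phrase.lower()
    return [
        e for e in entries
        if start <= e.recorded <= end and needle in e.text.lower()
    ]


# Made-up sample data, for illustration only.
chart = [
    ChartEntry(date(2001, 3, 14), "Routine check-up, no concerns."),
    ChartEntry(date(2005, 6, 2), "Hormonal imbalance noted; referred to endocrinology."),
    ChartEntry(date(2007, 9, 20), "Follow-up: hormonal imbalance improving."),
    ChartEntry(date(2015, 1, 8), "Hormonal imbalance reported after medication change."),
]

hits = find_references(chart, "hormonal imbalance", date(2004, 1, 1), date(2010, 12, 31))
for h in hits:
    print(h.recorded.isoformat(), h.text)
```

A plain substring match like this misses paraphrases and abbreviations, which is part of why rule-based annotation is used instead.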
JAPE: Java Annotation Patterns Engine.
Use: Pattern matching and semantic extraction.
GATE: General Architecture for Text Engineering.
Use: Information Extraction, document annotation, and
C#: Visual C# WinForms.
Use: Converting between the XML and .csv file formats.
1. Create corpus of documents in GATE.
2. Introduce rules for information extraction.
3. Annotate documents in corpus.
4. Output annotated documents in XML.
5. Strip file of unnecessary elements and convert to .csv.
-Tokeniser - splits text into simple tokens.
-Gazetteer - identifies entity names contained in lists.
-Sentence Splitter - splits text into sentences based on lists.
-Part-of-Speech Tagger - tags each token with its part of speech (POS).
-Coreference Matcher - identifies relationships between previously defined entities.
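To make the first two components concrete, here is a toy sketch of tokenising and gazetteer lookup in Python. It does not use GATE or reproduce its behavior; the `GAZETTEER` lists and the `tokenize` and `lookup` helpers are hypothetical stand-ins.

```python
import re

# Hypothetical gazetteer: list name -> entries (invented for this sketch).
GAZETTEER = {
    "condition": ["hormonal imbalance", "hypertension"],
    "medication": ["insulin", "metformin"],
}


def tokenize(text):
    """Split text into simple word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lookup(text):
    """Return (start, end, list_name, matched_text) for each gazetteer entry found."""
    found = []
    lowered = text.lower()
    for list_name, entries in GAZETTEER.items():
        for entry in entries:
            # Whole-word, case-insensitive match of the list entry.
            for m in re.finditer(r"\b" + re.escape(entry) + r"\b", lowered):
                found.append((m.start(), m.end(), list_name, text[m.start():m.end()]))
    return sorted(found)


sentence = "Patient shows hormonal imbalance; started Metformin."
print(tokenize(sentence))
print(lookup(sentence))
```

Each match carries character offsets and the name of the list it came from, which is roughly the information an annotation needs in order to be written out later.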
XSLT - Extensible Stylesheet Language Transformations
- Add specific rules to separate needed from unnecessary information.
- Find all the <Lookup> nodes and output the string between the tags.
CSV File Type: Comma-Separated Values. Used to present information in tabular form. Useful for analyzing large amounts of data in an easy-to-understand format. The most common program used to open it is Excel.
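The XML-to-CSV step can be sketched in a few lines. The project itself uses XSLT and a C# program for this; the Python version below is a substitute for illustration only. The sample XML layout and the `majorType` attribute are assumptions about what the annotated output looks like, and `lookups_to_csv` is a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotated output; element layout and attribute names are assumed.
ANNOTATED = """<doc>
<p>Patient shows <Lookup majorType="condition">hormonal imbalance</Lookup>;
started <Lookup majorType="medication">metformin</Lookup>.</p>
</doc>"""


def lookups_to_csv(xml_text):
    """Collect the text inside every <Lookup> element and return it as CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["majorType", "text"])  # header row
    for node in root.iter("Lookup"):
        # itertext() also picks up text from any nested elements.
        writer.writerow([node.get("majorType", ""), "".join(node.itertext()).strip()])
    return out.getvalue()


csv_text = lookups_to_csv(ANNOTATED)
print(csv_text)
```

Everything outside the `<Lookup>` tags is dropped, which is the "strip unnecessary elements" step; the resulting text can be saved with a .csv extension and opened in Excel.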
Regardless of how well the ANNIE tools are utilized and how well the JAPE rules are defined, recall will never be perfect.
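For reference, recall and precision can be computed by comparing the extracted annotations against a hand-labelled set. The sketch below uses the standard definitions; the annotation IDs are made up and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(extracted, relevant):
    """Compute (precision, recall) for a set of extracted items against the relevant set."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # precision: share of extracted items that are correct
    precision = true_positives / len(extracted) if extracted else 0.0
    # recall: share of relevant items that were actually extracted
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up annotation IDs: four hand-labelled references, three extracted by the rules.
gold = {"a1", "a2", "a3", "a4"}
found = {"a1", "a2", "a5"}

p, r = precision_recall(found, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case the rules miss two of the four labelled references and also extract one spurious item, so both measures fall below 1.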
Machine learning is our best chance to increase the precision of the output. Training a computer to recognize commonly used reporting phraseology should organize extraction better and yield more precise, concise output. Fortunately, GATE includes plugins for machine learning.