Text mining in animal health surveillance

Presentation Transcript


Text Mining in Animal Health Surveillance

John Berezowski

Clarissa Snyder

Lindsay McLarty

Food Safety Division

Alberta Agriculture, Food and Rural Development


Text Mining In Public Health

  • Knowledge management

    • Classification of journal articles to manage and search databases

    • Classification of hospital records to allow data mining of hospital databases to discover knowledge

  • Classification of medical records for real time surveillance

    • Free-text emergency room chief complaints classified into syndromes, e.g. GI or influenza-like


Purpose

  • Canada-Alberta BSE Surveillance Program:

    • CABSESP

    • Alberta Veterinarians participate in BSE surveillance

    • Submit cattle samples for BSE testing

      • Dead or euthanized

    • Examine cattle prior to sampling

    • Provide data about farmers and animals tested

  • Purpose: maximize information about cattle tested

    • Especially why cattle were: sick/dead/sampled

    • Assist CFIA to identify “Clinical Suspects”


Purpose

  • Large sample (July 2004 to July 2006)

    • 35,720 Alberta cattle tested by AAFRD

      • Approximately 25,000 more tested by the CFIA

    • 9,117 farms

    • 141 veterinary clinics (293 veterinarians)

  • Purpose: evaluate utility of BSE submission form data for other surveillance purposes


Submission Form Data

  • Farmer ID, date, location, number on farm

  • Purebred (y/n), breed, age, sex, BCS, PM (y/n)

  • Diseased, Distressed, Down, Dead, Neuro

  • Clinical signs in free text format

  • Presumptive diagnosis in free text format


Example Submission

  • Clinical Signs: Cow was in dry lot. Went off feed, coughing and labored breathing

  • Presumptive Diagnosis: PM findings- traumatic pericarditis and abscess from hardware between reticulum and diaphragm

  • Need tools (Text Mining) to extract information from free text fields


Text Mining: Definition

  • Based on data mining definitions

  • Knowledge discovery in text

  • Semi-automated or automated discovery of trends and patterns across large volumes of text

  • Computer applications that aim to aid in making sense of large volumes of text


Text Mining: Our Context

  • Classify cattle with respect to certain concepts:

    • Etiologies: Johne’s, AIP, hepatic lipidosis, LDA, IBR, unknown, etc.

    • Descriptors: acute, chronic, emaciated, lame, autolyzed, blind, ataxic, etc.

    • Clinical presentation/syndromes: respiratory, GI, repro, etc.

  • Use classifications to better describe the cattle sampled and look for associations or trends within the samples


Named Entity Recognition

  • Identify terms in text

    • Term = textual representation of a concept

  • Classify terms

    • Noun vs verb vs adjective, preposition, etc.

    • Etiology vs descriptors: animal (pregnant) vs clinical sign (chronic)

  • Map terms to concepts in an ontology

    • Associate each term with one or more concepts

    • Example: the terms “Bleeding”, “Bled”, and “Hemorrhage” all map to the concept of hemorrhage
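The mapping step can be illustrated with a minimal Python sketch. It assumes a small hand-built lookup table; the names `TERM_TO_CONCEPT` and `map_terms` are hypothetical and are not taken from the presentation.

```python
import re

# Hypothetical lookup table: each lowercase term maps to one or more concepts.
TERM_TO_CONCEPT = {
    "bleeding": {"hemorrhage"},
    "bled": {"hemorrhage"},
    "hemorrhage": {"hemorrhage"},
}


def map_terms(text):
    """Return the set of concepts whose terms appear in the free text."""
    concepts = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        concepts |= TERM_TO_CONCEPT.get(token, set())
    return concepts


print(map_terms("Cow bled from the nose"))
```

A real ontology would also hold relationships between concepts; this table only covers the term-to-concept link.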


Problems With Our Data

  • No suitable ontology

  • What’s an ontology?

    • A model that links concept labels to their textual representations and defines or describes the relationships between concepts

    • Machine readable descriptions of concepts and their relationships

    • Examples: Dictionaries, SNOMED-SNOVET


Problems With Our Data

  • Terms are formal (vet/med) + unusual

    “Nephritis”, “peritonitis”, “cancer eye”, “lump jaw”, “corkscrew claw”, “downer”, “fatty liver”, “hardware”, “found dead”

  • Specific to food animal practitioners.


Problems With Our Data

  • Term Variation

    • A single concept is expressed in a number of different ways (synonyms)

    • Probability of two experts using the same term to refer to the same concept is less than 20% [1]

    • Arthritis: arthritis, arthritic, osteoarthritis, polyarthritis, septic-arthritis

  • [1] Grefenstette G. 1994


Problems With Our Data

  • Term Ambiguity

    • The same term is used to refer to multiple concepts

    • Multiple meanings for the same term

    • Bloated = nutritional (feedlot, pasture), or bloated abdomen (perforated ulcer)

    • Prolapse = vagina, uterus, rectum, vaginal fat, intestinal


Problems With Our Data

  • No sentence structure

    • “Old age, arthritis, no teeth”

    • “Stifle, bilateral, degenerative, arthritis”

    • “Pelvic injury, post calving, crippled”

    • “Down, tumor on R shoulder, losing condition”


Build Our Ontology

  • From the text fields on the submission forms

  • Designed to meet our classification needs

    • Identify Potential “Clinical Suspects”

    • Classify BSE submissions into clinical syndromes


Clinical Suspect

[Flowchart: Alive; Refractory To Treatment; Progressive Behavior Change OR Progressive Neuro Signs; Rule Outs; Over 30 Months]

Clinical Suspect = [Alive] AND [(Refractory to tx) AND (Progressive Behavior Change OR Progressive Neuro Change) AND (No Rule Outs) AND (Over 30 months of age)]
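The Boolean rule above could be expressed in code roughly as follows. This is a sketch only: the `Submission` class and its field names are hypothetical, and in practice each flag would have to be derived from the form fields or the text classifier.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Illustrative field names, one per term in the rule on the slide.
    alive: bool
    refractory_to_treatment: bool
    progressive_behavior_change: bool
    progressive_neuro_change: bool
    has_rule_out: bool
    over_30_months: bool


def is_clinical_suspect(s: Submission) -> bool:
    """Apply the Clinical Suspect rule as written on the slide."""
    return (
        s.alive
        and s.refractory_to_treatment
        and (s.progressive_behavior_change or s.progressive_neuro_change)
        and not s.has_rule_out
        and s.over_30_months
    )


example = Submission(
    alive=True,
    refractory_to_treatment=True,
    progressive_behavior_change=False,
    progressive_neuro_change=True,
    has_rule_out=False,
    over_30_months=True,
)
print(is_clinical_suspect(example))
```

Every condition is required, so a single rule out or a missing flag is enough to exclude an animal.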


Ontology

  • Chronic (refractory to Tx)

  • Neurologic

  • Behavioral

  • Rule outs

    • Lame

    • Cardiovascular

    • GI

    • Repro

    • Respiratory

    • Urologic

    • Skin/Ocular/Mammary

    • Sudden Death

    • Infectious Dz

    • Edema/Swelling/Neoplasia

    • Trauma

    • Anorexia/Wt loss


Method

  • Text Mining Software

    • “WordStat” and “SimStat” (Provalis Research, Quebec City, PQ)

  • Spell checked text fields

  • Identified all words in the text fields

    • 292,537 words in total, 7,266 unique

  • Manually sorted words into ontology categories
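The word inventory step (total and unique word counts) was done in WordStat. As a rough illustration of the same idea, here is a sketch in plain Python; the tokenization rule (runs of letters, case-insensitive) is an assumption and may differ from the software's.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count every word across the free-text fields (case-insensitive)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


# The two free-text fields from the example submission shown earlier.
fields = [
    "Cow was in dry lot. Went off feed, coughing and labored breathing",
    "PM findings- traumatic pericarditis and abscess from hardware "
    "between reticulum and diaphragm",
]
counts = word_counts(fields)
total_words = sum(counts.values())
unique_words = len(counts)
print(total_words, unique_words)
```

The unique words are the ones that then need to be sorted manually into ontology categories.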


Chronic

  • ADVANCED, DOWNHIL*

  • CHONIC, DURATION

  • CHRINIC, AWHILE

  • CHRONCI, POOR_DOER

  • CRONIC, DECLIN*

  • D*BILIT*, EMACIAT*

  • DAYS_AGO


Neurological

  • Ataxia

  • Neurological

  • Paresis/Paralysis

  • Hyperesthesia

  • Hypermetria

  • Locomotor deficits


Neurological

  • Ataxia

    • *ATAX*, AT*XIA, AT*XIC, ATACHIA, ATAXIA, TAXIA, etc

  • CNS

    • CN*, MENINGITIS, MENINGOMA, etc

  • Neurological

    • CONVULS*, HEAD_PRESS*, HEPATOENCEPHALOPATHY, N*URO*, NEUR*, etc

  • Paresis/Paralysis

    • PARLAYSIS, PARLYSIS, PARYALYZED, PARAPARESIS, PAREISIS, PARES*, PARETIC, etc
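To show how wildcard entries like these can assign a word to a category, here is a small sketch using Python's standard `fnmatch` module. It assumes that `*` matches any run of characters, which may not be exactly how WordStat interprets it; the `NEURO_PATTERNS` table and `categories_for` helper are hypothetical names, and only the patterns themselves come from the slide.

```python
from fnmatch import fnmatchcase

# Patterns copied from the slide; "*" is assumed to match any run of characters.
NEURO_PATTERNS = {
    "Ataxia": ["*ATAX*", "AT*XIA", "AT*XIC", "ATACHIA", "ATAXIA", "TAXIA"],
    "CNS": ["CN*", "MENINGITIS", "MENINGOMA"],
    "Neurological": [
        "CONVULS*", "HEAD_PRESS*", "HEPATOENCEPHALOPATHY", "N*URO*", "NEUR*",
    ],
    "Paresis/Paralysis": [
        "PARLAYSIS", "PARLYSIS", "PARYALYZED", "PARAPARESIS",
        "PAREISIS", "PARES*", "PARETIC",
    ],
}


def categories_for(word):
    """Return every category with at least one pattern matching the word."""
    word = word.upper()
    return [
        category
        for category, patterns in NEURO_PATTERNS.items()
        if any(fnmatchcase(word, pattern) for pattern in patterns)
    ]


print(categories_for("ataxic"))
print(categories_for("paresis"))
```

Entries with underscores, such as HEAD_PRESS*, would only match if multi-word phrases were joined with underscores before matching. The misspelled entries are kept as they appear on the slide.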


Behavioral

  • Behavioral

  • Hyperexcitable


Behavioral

  • Behavioral

    • *EHAV*, APPREHENS*, AVOID*, BALKING, BAWLING, BELIGER*, BELLIGER, BELLOW*, BIZARRE, COMPULSIVELY, CRAZY, DELIROUS, etc.

  • Hyperexcitable

    • ANXIETY, ANXIOUS, CHARG*, CHASE*, EXCITEABLE, HYPERALERT, HYPEREXC*, HYPEREXCITABLE, HYPERSENSITIV*, IRRITA*, etc.


Example Submission

  • Clinical Signs: Cow was in dry lot. Went off feed, coughing and labored breathing

  • Presumptive Diagnosis: PM findings- traumatic pericarditis and abscess from hardware between reticulum and diaphragm


Classifying Submissions

  • Cow was in dry lot. Went off feed, coughing and labored breathing

  • Classified as: Anorexia, Respiratory


Classifying Submissions

  • PM findings- traumatic pericarditis and abscess from hardware between reticulum and diaphragm

  • Classified as: GI, Cardiovascular, Trauma
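A dictionary-based classifier of this kind might look like the sketch below. The term lists in `SYNDROME_TERMS` are hypothetical, chosen only so the two example fields reproduce the categories shown on the slides; the actual dictionary entries for these categories are not given in the presentation.

```python
import re

# Hypothetical term lists (regular expressions), not the authors' dictionary.
SYNDROME_TERMS = {
    "Anorexia": [r"off feed", r"anorex\w*"],
    "Respiratory": [r"cough\w*", r"labou?red breathing", r"pneumon\w*"],
    "GI": [r"reticulum", r"rumen", r"diarrhea"],
    "Cardiovascular": [r"pericarditis", r"heart"],
    "Trauma": [r"trauma\w*", r"injur\w*"],
}


def classify(text):
    """Return the sorted list of syndromes with at least one matching term."""
    text = text.lower()
    return sorted(
        syndrome
        for syndrome, terms in SYNDROME_TERMS.items()
        if any(re.search(rf"\b(?:{term})\b", text) for term in terms)
    )


print(classify("Cow was in dry lot. Went off feed, coughing and labored breathing"))
print(classify("PM findings- traumatic pericarditis and abscess from hardware "
               "between reticulum and diaphragm"))
```

A submission can fall into several categories at once, which is why the function returns a list rather than a single label.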


Classified Submissions

N = 35,721


Clinical Suspects


Clinical Suspect Examples


Veterinary Practice Surveillance

  • Veterinary Practice Surveillance (VPS)

    • Cattle practitioners submit data about cattle to AAFRD daily via a restricted-access website

    • Practitioners classify sick cattle by commodity (cow-calf, dairy, etc.), age, and syndrome (12)

  • Large sample

    • 26,016 submissions (Aug 2005 to Dec 2006)

    • 5,081 farms

    • 31 veterinary clinics


Submissions per day

Sept 2005 to July 2006


Respiratory Syndrome

VPS = Cattle greater than 30 months of age


Clostridium hemolyticum

VPS = 75 cases, BSE = 157 cases


Utility?

  • Classifying/identifying “High Risk”

  • Generalize with caution (no prevalence)

    • Sampling bias

    • Misclassification

      • For each classification, estimate:

        • Sensitivity (Se) and specificity (Sp) of veterinarians

        • Se and Sp of text classifier
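Sensitivity and specificity for one classification could be estimated along these lines, using Se = TP / (TP + FN) and Sp = TN / (TN + FP). The sketch assumes a reference standard is available for a sample of submissions (for example, manual review); the labels below are made up purely for illustration and are not results from the presentation.

```python
def sensitivity_specificity(predicted, truth):
    """Compute (Se, Sp) from parallel lists of booleans."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    # Return NaN when a denominator is zero (no positives or no negatives).
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return se, sp


# Made-up labels: True means "classified as respiratory".
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
se, sp = sensitivity_specificity(predicted, truth)
print(se, sp)
```

The same function applies to either source of misclassification: the veterinarian's judgment or the text classifier's output, each compared against the reference standard.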


Utility?

  • But:

    • Large sample

      • Disease importance or trends over time and space

      • Clostridium hemolyticum

    • Events: syndromic, unknown, emerging

      • Establish normal patterns to identify unusual events

      • Respond/investigate

      • Access for targeted surveillance
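One simple way to "establish normal patterns" is to compare new daily counts against a baseline mean plus a multiple of the standard deviation. The sketch below uses that rule as an illustrative stand-in; the presentation does not state which detection method was used, and the counts are made up.

```python
from statistics import mean, stdev


def unusual_days(baseline_counts, new_counts, z_threshold=3.0):
    """Return indices of new daily counts far above the baseline mean."""
    limit = mean(baseline_counts) + z_threshold * stdev(baseline_counts)
    return [i for i, count in enumerate(new_counts) if count > limit]


# Made-up daily submission counts for one syndrome.
baseline = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]
recent = [5, 6, 15, 4]
print(unusual_days(baseline, recent))
```

Flagged days would be candidates for follow-up investigation rather than confirmed events, given the sampling bias and misclassification noted earlier.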


Questions?

  • Our Team:

    • Clarissa Snyder

    • Lindsay McLarty

    • John Berezowski

  • Contact us:

    [email protected]

