Networked Knowledge Organization

Brian A. Carlsen, Apelon, Inc.



Networked Knowledge Organization Systems/Services Workshop, June 28, 2001. Tools For Classification Integration. Brian A. Carlsen, Apelon, Inc. Presentation outline: State of the UMLS Metathesaurus; Life-cycle of a Source; Tools and Processes; Challenges; Further Approaches.


Presentation Transcript


Networked Knowledge Organization Systems/Services Workshop, June 28, 2001

Tools For Classification Integration

Brian A. Carlsen

Apelon, Inc.


Presentation Outline

  • State of the UMLS Metathesaurus

  • Life-cycle of a Source

  • Tools and Processes

  • Challenges

  • Further Approaches


State of the UMLS Metathesaurus

  • Concept orientation, concept persistence

  • Growth to over 800,000 concepts and over 60 vocabulary families

  • Over 1000 users worldwide

  • Uses of the Metathesaurus

    • Natural Language Processing

    • Knowledge Representation

    • Patient Record Systems

    • Linking Patient Data to Knowledge Sources

    • Automated Indexing/ Retrieval


Concept and Name Counts By Release Year


English Word, String Counts by Release Year


Outline

  • State of the UMLS Metathesaurus

  • Life-cycle of a Source

  • Tools and Processes

  • Challenges

  • Further Approaches


Life-cycle of a Source: Inversion

  • Source arrives in “machine readable” format*

    • Many formats are used, including PDF, Clipper dump files, WordPerfect files, unit-record formats, and relational flat files.

  • Source undergoes “inversion”

    • Requires a human

    • Input is this machine readable file

    • Process is source-specific

    • Output is a common relational flat-file format used internally.


Life-cycle of a Source: Insertion

  • A “Recipe” is created

  • Test insertion to validate recipe

  • Insertion and matching.

    • Load common format into database

    • Match to existing content algorithmically

      • Use string normalization

      • Determine SAFE vs. UNSAFE matches

    • Prepare data for editing

    • Process is fully undoable
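
The matching step above can be sketched in Python. This is a minimal illustration, not the Metathesaurus algorithm: `normalize` is a crude stand-in for string normalization (lowercase, drop punctuation, sort words), and the SAFE/UNSAFE rule used here (SAFE only when the normalized string hits exactly one existing concept) is an assumption made for the example. The concept ids and function names are hypothetical.

```python
import re
from collections import defaultdict

def normalize(term: str) -> str:
    """Crude stand-in for string normalization: lowercase, drop punctuation, sort words."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))

def match_terms(new_terms, existing):
    """Classify each new term as SAFE, UNSAFE, or NEW.

    existing: dict mapping concept id -> list of term strings.
    Assumed rule (hypothetical): SAFE if the normalized string hits exactly
    one concept, UNSAFE if it hits several, NEW if it hits none.
    """
    index = defaultdict(set)
    for concept_id, terms in existing.items():
        for term in terms:
            index[normalize(term)].add(concept_id)

    results = {}
    for term in new_terms:
        hits = index.get(normalize(term), set())
        if len(hits) == 1:
            results[term] = ("SAFE", sorted(hits))
        elif hits:
            results[term] = ("UNSAFE", sorted(hits))
        else:
            results[term] = ("NEW", [])
    return results

# Hypothetical existing content; "Cold" is deliberately ambiguous.
existing = {
    "C1": ["Common Cold", "Cold"],
    "C2": ["Cold Temperature", "Cold"],
    "C3": ["Neoplasm, Malignant"],
}
results = match_terms(["malignant neoplasm", "cold", "Asthma"], existing)
```

In this sketch only the UNSAFE matches would be routed to human review, while SAFE matches could be merged automatically.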


Life-cycle of a Source: Editing

  • Predicate-based partitioning

  • Workflow management

    • Review ALL content for new sources

    • Review UNSAFE content for updates

  • Human Review

  • QA Driven Editing

    • Source-specific QA

    • Feedback QA

    • Conservation of Mass QA


Life-cycle of a Source: Release

  • Synchronize editing changes

    • State-based model

  • Release data in desired format

    • Full release/partial release

  • Transform base release

    • “MetamorphoSys”

    • Remove unlicensed data

    • Create “Content Views”


Outline

  • State of the UMLS Metathesaurus

  • Life-cycle of a Source

  • Tools and Processes

  • Challenges

  • Further Approaches


Tools and Processes: Overview

  • Humans vs. Computers

    • Humans are good at making content decisions

    • Computers are good at automating tasks

  • Tools vs. Processes

    • Tools enable computers to automate tasks

    • Processes keep humans productive.


Tools and Processes: Pre-Editing

  • No common data representation

  • Source-by-source conversion to common format

    • Perl, Unix tools

  • What would a common format need?

    • Represent terms and attributes

    • Represent within-source relationships

    • Represent hierarchies

    • Represent external-source relationships

    • Represent classifications (e.g. Concept)
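
The requirements listed above could be captured by a small set of record types. The following is only a sketch of what such a common format might hold; every class and field name is hypothetical and is not taken from the format used internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Term:
    term_id: str
    string: str
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class Relationship:
    from_id: str
    to_id: str
    rel_type: str                      # e.g. "isa" for hierarchy links
    to_source: Optional[str] = None    # set only for external-source links

@dataclass
class InvertedSource:
    name: str
    terms: List[Term] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)
    # classification id (e.g. a concept) -> ids of the terms grouped under it
    classifications: Dict[str, List[str]] = field(default_factory=dict)

    def hierarchy(self) -> List[Relationship]:
        """Within-source 'isa' links only."""
        return [r for r in self.relationships
                if r.rel_type == "isa" and r.to_source is None]

# Hypothetical example source with one hierarchy link and one external link.
src = InvertedSource(name="EXAMPLE_SRC")
src.terms += [
    Term("T1", "Neoplasm"),
    Term("T2", "Malignant Neoplasm of Pancreas", {"code": "X.1"}),
]
src.relationships += [
    Relationship("T2", "T1", "isa"),
    Relationship("T1", "EXT9", "mapped_to", to_source="OTHER_SRC"),
]
src.classifications["K1"] = ["T1"]
```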


Tools and Processes: Editing

  • Workflow Management

  • Report Generation

  • State Model vs. Action Model

    • Actions represented as new states vs.

    • Single state + actions as data

  • Human Editing

    • Interface enabling “high level cognitive editing”

  • LVG: String Normalization

  • Automated Editing

    • Safe vs. Unsafe, Integrities
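
The contrast between a state model and an action model can be illustrated with a concept merge. This is a simplified sketch under stated assumptions: concepts are plain lists of atom ids, the only action is a merge, and `merge_state` and `ActionLog` are hypothetical names, not the editing system's API.

```python
import copy

# State model: every action yields a new, complete state.
def merge_state(state, keep, drop):
    new_state = copy.deepcopy(state)
    new_state[keep] = new_state[keep] + new_state.pop(drop)
    return new_state

# Action model: one current state plus a log of actions stored as data.
class ActionLog:
    def __init__(self, state):
        self.state = state
        self.log = []

    def merge(self, keep, drop):
        moved = self.state.pop(drop)
        self.state[keep].extend(moved)
        # Record enough data to reverse the action later.
        self.log.append(("merge", keep, drop, list(moved)))

    def undo(self):
        """Reverse the most recent action (last in, first out)."""
        _action, keep, drop, moved = self.log.pop()
        kept = self.state[keep]
        self.state[keep] = kept[:len(kept) - len(moved)]
        self.state[drop] = moved

s0 = {"C1": ["A1"], "C2": ["A2", "A3"]}

# State model: s0 is untouched and s1 is a separate snapshot.
s1 = merge_state(s0, "C1", "C2")

# Action model: the single state changes in place and can be rolled back.
editor = ActionLog(copy.deepcopy(s0))
editor.merge("C1", "C2")
after_merge = copy.deepcopy(editor.state)
editor.undo()
```

Storing actions as data keeps the storage cost proportional to the change rather than to the whole state, and gives a natural audit trail and undo path.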


Tools and Processes: Release

  • License Agreements

  • Content Views

    • e.g. Indexing View

    • Filter by Semantic Type

    • Filter by Language

  • Alternative Release Formats

  • Updates

  • MetamorphoSys
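
A content view of the kind listed above amounts to filtering a base release. The sketch below assumes each released record is a dictionary with `language`, `semantic_type`, and `source` keys; those field names, the source names, and the `content_view` function are illustrative assumptions and do not describe how MetamorphoSys works.

```python
def content_view(records, semantic_types=None, languages=None, licensed_sources=None):
    """Keep records that pass every supplied filter; a filter left as None is skipped."""
    def keep(rec):
        return ((semantic_types is None or rec["semantic_type"] in semantic_types)
                and (languages is None or rec["language"] in languages)
                and (licensed_sources is None or rec["source"] in licensed_sources))
    return [rec for rec in records if keep(rec)]

# Hypothetical base release with three records from three sources.
records = [
    {"string": "Asthma", "language": "ENG",
     "semantic_type": "Disease or Syndrome", "source": "SRC_A"},
    {"string": "Asma", "language": "SPA",
     "semantic_type": "Disease or Syndrome", "source": "SRC_B"},
    {"string": "Lung", "language": "ENG",
     "semantic_type": "Body Part, Organ, or Organ Component", "source": "SRC_C"},
]

english_diseases = content_view(
    records, semantic_types={"Disease or Syndrome"}, languages={"ENG"})
licensed_only = content_view(records, licensed_sources={"SRC_A", "SRC_B"})
```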


Outline

  • State of the UMLS Metathesaurus

  • Life-cycle of a Source

  • Tools and Processes

  • Challenges

  • Further Approaches


Challenges: Ambiguity

  • Ambiguous Strings

    • e.g. “Cold”

    • Solution: Disambiguating strings, Preferred Names with “face validity”, Integrity checks when merging.

  • Not fully specified Strings

    • e.g. “Head of Pancreas” within “Malignant Neoplasm of Pancreas”

    • Solution: Fully specified preferred name.


Challenges: What is a Classification?

  • A classification is any grouping of terms with a consistent semantics.

  • Thesauri typically group terms by meaning into concepts (synonymy).

  • Alternatives

    • Neighborhoods (e.g. Descriptors in MeSH).

    • Near-synonymy

    • No classification (identity or term classification).

    • Lexical

  • Connecting relationships/attributes to classifiers


Challenges: Precedence

  • Concepts (or other classifications) generally have a preferred name

  • A thesaurus will have terms from different sources competing for precedence

  • Source precedence should be a user-level choice

  • Preferred name should not be used as a proxy for concept-ness

  • Every level of classification should have a preferred term

  • Preferred name exists primarily for “face validity”
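
Treating source precedence as a user-level choice could look like the following sketch, in which the preferred name is computed from a ranking the user supplies rather than stored with the concept. The source names and the `preferred_name` function are hypothetical.

```python
def preferred_name(terms, source_precedence):
    """Pick the term whose source ranks highest in the user's precedence list.

    terms: list of (source, string) pairs belonging to one concept.
    source_precedence: list of source names, highest precedence first.
    Sources missing from the list rank last; ties keep the original order.
    """
    rank = {source: i for i, source in enumerate(source_precedence)}
    return min(terms, key=lambda t: rank.get(t[0], len(rank)))[1]

terms = [("SRC_A", "Cold"), ("SRC_B", "Common Cold")]

# Two users with different precedence lists see different preferred names
# for the same concept.
name_for_user_1 = preferred_name(terms, ["SRC_B", "SRC_A"])
name_for_user_2 = preferred_name(terms, ["SRC_A", "SRC_B"])
```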


Challenges: Update Model

  • Constituent sources of a thesaurus will be updated

  • Editing cycle

    • Updated sources will require editing

    • Overlap between the old and new versions is typically > 90%

    • Overlap can safely replace the old version’s content

    • Safe replacements should not be edited

    • Ideally, source providers would indicate replacements; otherwise they must be computed

  • Release

    • Release changes
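
When a source provider does not indicate replacements, the overlap has to be computed. A minimal sketch, assuming each version is a mapping from source code to string and that an unchanged (code, string) pair counts as a safe replacement; both the data layout and that rule are assumptions made for illustration.

```python
def compare_versions(old, new):
    """Split the new version into safe replacements and content needing review.

    old, new: dicts mapping source code -> term string.
    Returns (safe, review, overlap) where overlap is the fraction of the
    new version that is an unchanged carry-over from the old one.
    """
    safe = sorted(code for code in new if code in old and old[code] == new[code])
    review = sorted(code for code in new if code not in safe)
    overlap = len(safe) / len(new) if new else 0.0
    return safe, review, overlap

# Hypothetical versions: one changed string ("002") and one new code ("004").
old_version = {"001": "Asthma", "002": "Cold", "003": "Fever"}
new_version = {"001": "Asthma", "002": "Common cold", "003": "Fever", "004": "Cough"}

safe, review, overlap = compare_versions(old_version, new_version)
```

Only the codes in `review` would go to editors; the safe replacements bypass editing, as the slide recommends.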


Outline

  • State of the UMLS Metathesaurus

  • Life-cycle of a Source

  • Tools and Processes

  • Challenges

  • Further Approaches


Further Approaches: Description Logic

  • What is it?

    • Concepts (or other classifications) are axioms

    • Relationships (roles) are theorems

    • The transitive closure of the roles across the concepts is computed to ensure no violations.

    • e.g. A isa B, B isa C, C isa A (violation: the isa hierarchy contains a cycle)

  • When is it useful?

    • In formalized, static domains like Anatomy

  • When is it not useful?

    • When performance matters more than formalism

    • In dynamic, loosely coupled domains like Genomics
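
The violation in the example above (A isa B, B isa C, C isa A) is a cycle in the isa hierarchy. The sketch below detects such a cycle with an ordinary depth-first search; it is not a description logic classifier, only an illustration of the kind of check that computing the closure makes possible. The function name is hypothetical.

```python
def find_isa_cycle(isa_pairs):
    """Return one isa cycle as a list of concepts, or None if there is none.

    isa_pairs: iterable of (child, parent) pairs.
    """
    parents = {}
    for child, parent in isa_pairs:
        parents.setdefault(child, []).append(parent)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for parent in parents.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None

cycle = find_isa_cycle([("A", "B"), ("B", "C"), ("C", "A")])
no_cycle = find_isa_cycle([("A", "B"), ("B", "C")])
```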


Further Approaches: Standards XML

  • Standardized Terminology/Ontology Representation

    • XML is the most likely candidate

    • Ideally would support

      • Links to external sources

      • Relationships between different levels of classification

      • Update model

      • Description Logic Metadata

  • Standardized Thesaurus Representation

  • XML Repository

  • Standard Object Representations
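
As an illustration of what an XML representation of a concept might carry, the sketch below serializes one concept with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`concept`, `preferredName`, `term`, `relationship`) are invented for this example and do not correspond to any published standard.

```python
import xml.etree.ElementTree as ET

def concept_to_xml(concept_id, preferred, terms, relationships):
    """Serialize one concept to an XML string.

    terms: list of (source, string) pairs.
    relationships: list of (rel_type, target_id) pairs.
    """
    concept = ET.Element("concept", id=concept_id)
    ET.SubElement(concept, "preferredName").text = preferred
    for source, string in terms:
        ET.SubElement(concept, "term", source=source).text = string
    for rel_type, target in relationships:
        ET.SubElement(concept, "relationship", type=rel_type, target=target)
    return ET.tostring(concept, encoding="unicode")

xml_text = concept_to_xml(
    "C1",
    "Common Cold",
    [("SRC_A", "Cold"), ("SRC_B", "Common Cold")],
    [("isa", "C9")],
)

# Round-trip: parse the string back to confirm it is well-formed.
parsed = ET.fromstring(xml_text)
```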


Conclusion: Lessons Learned

  • Use the Web

  • Use current technology

  • Use Description Logic where appropriate

  • Make editing intuitive

  • Automate tasks

    • “A well-understood, reproducible, automated process that succeeds 95% of the time is a vast improvement over a poorly-understood, labor-intensive process that is believed to succeed 100% of the time.”

    • Review UNSAFE automated tasks.

    • Stop automating when marginal utility falls below a threshold.

