The UMLS* Metathesaurus*: Lessons for Metadata Registries
Betsy L. Humphreys
blh@nlm.nih.gov
http://www.nlm.nih.gov
* UMLS and Metathesaurus are registered trademarks of the National Library of Medicine
Outline of Presentation
• Brief overview -- NLM’s Unified Medical Language System (UMLS) Project and its products
• Description of the UMLS Metathesaurus
  • content, construction methods, characteristics
• Interspersed Metadata Questions/Issues
UMLS Purpose
• Make it easy for health professionals and researchers to retrieve and integrate relevant information from disparate automated sources, e.g.,
  • computer-based patient records
  • factual databanks
  • bibliographic databases and full-text
  • expert systems
UMLS Focus -- Conceptual Connections
• Build knowledge sources that can be used by intelligent programs to overcome:
  • disparities in language used by different users and in different information sources;
  • difficulties in identifying which of many information sources is relevant
UMLS Knowledge Sources
Multi-purpose tools or “intellectual middleware” for System Developers
• Metathesaurus
• SPECIALIST lexicon and lexical programs
• Semantic Network
UMLS Knowledge Sources Distribution
• Annual updates since 1990
• Free under license agreement with NLM
• Separate license agreements with vocabulary producers are needed for some uses of some vocabularies in the Metathesaurus
• Available to licensed users (~900) via Internet server and on CDs
• Relational format (ASN.1 retired due to lack of use, XML being developed)
1999 UMLS Metathesaurus
• 626,313 concepts
• 1,134,413 “terms” (Eye, Eyes, eye = 1)
• 1,358,891 “strings”/concept names (Eye, Eyes, eye = 3)
• ~50 source vocabularies
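The concept / term / string distinction in the counts above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names and the identifier `C-EXAMPLE-1` are hypothetical and do not reflect the actual Metathesaurus file layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    # A term groups strings that differ only by lexical variation (case, plural).
    strings: List[str] = field(default_factory=list)


@dataclass
class Concept:
    # Hypothetical identifier, not a real Metathesaurus ID.
    concept_id: str
    terms: List[Term] = field(default_factory=list)


def count_levels(concepts):
    """Return (number of concepts, number of terms, number of strings)."""
    n_terms = sum(len(c.terms) for c in concepts)
    n_strings = sum(len(t.strings) for c in concepts for t in c.terms)
    return len(concepts), n_terms, n_strings


# "Eye", "Eyes", "eye" count as three strings but one term, as on the slide.
eye = Concept("C-EXAMPLE-1", [Term(["Eye", "Eyes", "eye"])])
print(count_levels([eye]))  # (1, 1, 3)
```

A synonym that is not a lexical variant would be added as a second `Term` under the same `Concept`, which is why the term count sits between the concept count and the string count.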
UMLS Metathesaurus
• Concepts, terms, and attributes from many controlled “vocabularies”
• New inter-source relationships, definitional information, use information
• Scope determined by combined scope of source vocabularies
UMLS Source “Vocabularies”
• Widely varying purposes, structures, properties, but all are in essence “sets of valid values” for data elements:
  • Thesauri, e.g., MeSH
  • Statistical Classifications, e.g., ICD
  • Billing Codes, e.g., CPT
  • Clinical coding systems, e.g., SNOMED
  • Lists of controlled terms, e.g., COSTAR, HL7 value sets
Metathesaurus Construction
• Convert machine-readable vocabulary sources to UMLS “normal” form, making source semantics explicit
• Merge, using source semantics and lexical processing techniques
• Edit results, adding additional relationships and semantic information
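The first two construction steps above can be sketched on toy data. Everything here is an assumption made for illustration: the two sources, their record shapes, the `SRC_A` / `SRC_B` tags, and the `X0001`-style identifiers are invented, and case folding stands in for the real lexical processing. The third step, human editing, is not modelled.

```python
from collections import defaultdict

# Two toy sources with different native record shapes (invented for illustration).
SOURCE_A = [{"id": "A1", "heading": "Eye"}, {"id": "A2", "heading": "Heart"}]
SOURCE_B = [("B7", "EYE"), ("B9", "Kidney")]


def to_common_form():
    """Step 1: convert each source to (source, code, name) rows, keeping the source tag."""
    rows = [("SRC_A", rec["id"], rec["heading"]) for rec in SOURCE_A]
    rows += [("SRC_B", code, name) for code, name in SOURCE_B]
    return rows


def merge(rows):
    """Step 2: group rows whose names match after case folding."""
    groups = defaultdict(list)
    for source, code, name in rows:
        groups[name.casefold()].append((source, code, name))
    # Assign hypothetical concept identifiers in sorted key order.
    return {
        f"X{i:04d}": members
        for i, (_key, members) in enumerate(sorted(groups.items()), start=1)
    }


concepts = merge(to_common_form())
for concept_id, members in concepts.items():
    print(concept_id, members)
```

Each merged row keeps its source tag and original code, so the contribution of every vocabulary stays visible after merging.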
$100,000 Metadata Questions
• What constitutes “explicit semantics” for Metadata?
  • At a minimum interpretable by humans
  • Preferably interpretable by machines
• How will the significant human effort required to create useful Metadata registries be organized and funded?
Metathesaurus Characteristics (1)
• Concept organization
• Many sources in a common database format
• Representation of the meaning in each source vocabulary
• Explicit tagging of each source vocabulary’s information
Metadata Question
• What is the operational definition of synonymy in the realm of Metadata element names?
• OR, When does a distinction make a difference in Metadata?
Metadata Question
• Will the Metathesaurus approach to “multiple meanings” work for data element names?
• E.g., Country
  • Country of Birth
  • Country of Residence
  • Country of Publication
• REMINDER: different data elements can have the SAME set of valid values
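The reminder above, that distinct data elements can share one set of valid values, can be made concrete with a small sketch. The class names, the `DE-1` style identifiers, and the three-country value set are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    name: str
    values: frozenset


@dataclass(frozen=True)
class DataElement:
    element_id: str  # hypothetical registry identifier
    name: str
    domain: ValueDomain


# Illustrative subset of country codes, not a complete value set.
COUNTRY = ValueDomain("Country", frozenset({"CA", "FR", "US"}))

ELEMENTS = [
    DataElement("DE-1", "Country of Birth", COUNTRY),
    DataElement("DE-2", "Country of Residence", COUNTRY),
    DataElement("DE-3", "Country of Publication", COUNTRY),
]


def is_valid(element, value):
    """Check a value against the element's value domain."""
    return value in element.domain.values


# Three different meanings, one shared set of valid values.
print(len({e.element_id for e in ELEMENTS}), len({e.domain.name for e in ELEMENTS}))
```

Keeping the value domain separate from the data element lets a registry give each meaning its own identifier without duplicating the list of valid values.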
Metadata Question • What level of explicit tagging is needed in Metadata Registries?
Metathesaurus Characteristics (2)
• Added relationships between concepts and terms from different vocabularies
• Added definitional and use information
• “Context-free” unique identifiers
  • the concept “names” that never change
• Normalized word and string indexes produced using UMLS lexical tools
Metadata Question • In the realm of Metadata, what requires unique, permanent, context-free identifiers?
Normalization -- example
• disorder esophageal motility = normalized form of:
  • Esophageal Motility Disorders
  • Esophageal Motility Disorder
  • Motility Disorder, Esophageal
  • Disorder, Esophageal Motility
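The example above can be reproduced with a rough sketch of string normalization: lowercase, strip punctuation, reduce plurals, and sort the words. This is not the UMLS lexical tools' implementation. In particular, `naive_singular` is a hypothetical and deliberately crude stand-in for real uninflection, which would rely on a lexicon.

```python
import re


def naive_singular(word):
    # Crude stand-in for uninflection: drop a final "s" unless the word ends in "ss".
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(name):
    """Lowercase, drop punctuation, singularize naively, and sort the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(naive_singular(w) for w in words))


variants = [
    "Esophageal Motility Disorders",
    "Esophageal Motility Disorder",
    "Motility Disorder, Esophageal",
    "Disorder, Esophageal Motility",
]

for v in variants:
    print(f"{v!r} -> {normalize(v)!r}")
```

All four variants map to `disorder esophageal motility`, so a normalized string index can match any of them with a single lookup key.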
Metadata Questions
• Are similar lexical resources needed as adjuncts to Metadata Registries?
• Are the UMLS lexical tools directly useful for Metadata efforts?