The Refined Semantic Network



  1. The Refined Semantic Network • James Geller, Yehoshua Perl • New Jersey Institute of Technology

  2. Sources • This presentation is based on: • a pending proposal on Auditing and Extending the UMLS; • [Gu et al., JAMIA 2000] and [MEDINFO YEARBOOK 2001]

  3. Fundamental Observation • The UMLS requires that each concept be assigned one or more Semantic Types. • This assignment provides the semantics of the concepts. • We call the set of all concepts to which a Semantic Type S has been assigned the Extent of S.
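The Extent definition can be sketched in Python by inverting the concept-to-type assignment. The three concepts below are a toy sample taken from the EEH example later in the talk. They are not a UMLS extract, and the function name `extents` is hypothetical.

```python
from collections import defaultdict

# Toy sample (illustrative only, not a UMLS extract): concept -> assigned Semantic Types.
assignments = {
    "Acid rain": {"EEH", "Hazardous or Poisonous Substance"},
    "Classroom environment": {"EEH"},
    "Poor Sanitation": {"EEH", "Finding"},
}


def extents(assignments):
    """Return the Extent of every Semantic Type: the set of concepts assigned to it."""
    # The UMLS requires at least one Semantic Type per concept.
    if any(not types for types in assignments.values()):
        raise ValueError("every concept needs at least one Semantic Type")
    result = defaultdict(set)
    for concept, types in assignments.items():
        for semantic_type in types:
            result[semantic_type].add(concept)
    return dict(result)


print(sorted(extents(assignments)["EEH"]))
# ['Acid rain', 'Classroom environment', 'Poor Sanitation']
```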

  4. Problems (1) • Assigning Semantic Types to new concepts is a difficult manual task because medical concepts are complex, ambiguous, and often homonymous. • Categorization depends heavily on an editor’s specialty, background, and priorities, and is thus not fully predictable.

  5. Problems (2) • The extents of most Semantic Types are not uniform: the extent of a Semantic Type may contain concepts whose additional (“second”) Semantic Type assignments differ from each other. • A desire was expressed to make the Semantic Network (SN) deeper [McCray and Nelson 1995].

  6. Example (concrete) • We look at all 61 concepts to which the Semantic Type Environmental Effect of Humans (henceforth EEH) has been assigned (i.e., the extent of EEH).

  7. Are these concepts similar? • Are Classroom environment, Sanitation problem, Acid rain, and Industrial waste really similar to each other? • EEH is the ONLY Semantic Type assigned to 54 of these 61 concepts. • The other 7 concepts are assigned EEH together with combinations of 3 additional Semantic Types. That is why we say that the extent of EEH is not uniform.

  8. Concepts of non-uniform semantics • EEH & Finding: 2 concepts: Poor Sanitation, Sanitation Problem • EEH & Hazardous or Poisonous Substance: 4 concepts: Acid rain, Radioactive fallout, Radioactive waste, Smoke • EEH & Manufactured Object & Hazardous or Poisonous Substance: 1 concept: Industrial waste
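To make the non-uniformity concrete, the sketch below groups EEH concepts by their exact combination of Semantic Types. It uses only the 7 concepts named on this slide plus Classroom environment, which is one of the 54 EEH-only concepts. The other EEH-only concepts are omitted, so the counts are partial. Under this view, an extent is uniform only if a single combination occurs.

```python
from collections import Counter

EEH = "EEH"
FIND = "Finding"
HPS = "Hazardous or Poisonous Substance"
MO = "Manufactured Object"

# Partial data: the 7 mixed concepts from slide 8 plus one EEH-only concept.
assignments = {
    "Poor Sanitation": {EEH, FIND},
    "Sanitation Problem": {EEH, FIND},
    "Acid rain": {EEH, HPS},
    "Radioactive fallout": {EEH, HPS},
    "Radioactive waste": {EEH, HPS},
    "Smoke": {EEH, HPS},
    "Industrial waste": {EEH, MO, HPS},
    "Classroom environment": {EEH},
}

# Count how many concepts in the extent of EEH share each exact type combination.
combos = Counter(frozenset(types) for types in assignments.values() if EEH in types)
is_uniform = len(combos) == 1

for combo, count in sorted(combos.items(), key=lambda kv: -kv[1]):
    print(" & ".join(sorted(combo)), count)
print("uniform extent:", is_uniform)
```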

  9. Abstract Example with 4 Semantic Types (boxes show extents) • W: g, h • X: a, b, d, e, f, g • Y: b, g • Z: c, d, e, g

  10. Problem even in the simple case • One has to look into all boxes to see whether a concept occurs in them. • Definition: The intersection of two extents contains all and only the concepts that occur in BOTH extents. We use the symbol & for it. • Example: [a, b, c] & [b, c, d] = [b, c]
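The & operation corresponds directly to set intersection. This illustrative sketch checks the slide's example. It then applies & to every pair of the four extents from slide 9.

```python
from itertools import combinations

# Extents of the abstract example on slide 9.
extents = {
    "W": {"g", "h"},
    "X": {"a", "b", "d", "e", "f", "g"},
    "Y": {"b", "g"},
    "Z": {"c", "d", "e", "g"},
}

# The example from the definition: concepts occurring in BOTH extents.
assert {"a", "b", "c"} & {"b", "c", "d"} == {"b", "c"}

# Every non-empty pairwise intersection of the four extents.
pairwise = {}
for s, t in combinations(sorted(extents), 2):
    common = extents[s] & extents[t]
    if common:
        pairwise[f"{s} & {t}"] = common
        print(f"{s} & {t}: {sorted(common)}")
```

Concept g appears in every pairwise intersection. Pairwise intersections alone therefore do not place each concept in exactly one box, and that is the step the refinement on slide 13 adds.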

  11. Example of Venn Diagram for X, Y, Z, W [Figure: Venn diagram of the extents of W, X, Y, and Z, with concepts a through h placed in their overlapping regions]

  12. Our Solution to all 3 Problems • Identify all existing intersections. • Display every concept exactly once, in its pure “original box” or in a new “box” that corresponds to a unique intersection.

  13. Original Semantic Types (non-uniform semantics) • W: g, h • X: a, b, d, e, f, g • Y: b, g • Z: c, d, e, g
      Pure Semantic Types (simple semantics) • W: h • X: a, f • Y: (none) • Z: c
      Intersection Types (compound semantics) • X & Y: b • X & Z: d, e • W & X & Y & Z: g
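The table above can be computed as follows, assuming that a concept's refined type is determined exactly by the full set of Semantic Types assigned to it. The sketch groups concepts by that set and names each group by chaining its component names with &. Each original type is kept as a Pure Semantic Type even when its pure extent is empty, as Y's is. The function `refine` and the alphabetical ordering of component names are our own choices and are not taken from the slides.

```python
from collections import defaultdict

# Extents of the abstract example on slide 9.
extents = {
    "W": {"g", "h"},
    "X": {"a", "b", "d", "e", "f", "g"},
    "Y": {"b", "g"},
    "Z": {"c", "d", "e", "g"},
}


def refine(extents):
    """Sketch of Semantic Refinement: put each concept in exactly one refined type.

    A key with one component is a Pure Semantic Type; a key with several
    components (joined by ' & ') is an Intersection Type.
    """
    types_of = defaultdict(set)  # concept -> all Semantic Types assigned to it
    for semantic_type, concepts in extents.items():
        for concept in concepts:
            types_of[concept].add(semantic_type)
    refined = {name: set() for name in extents}  # pure types, possibly empty
    for concept, types in types_of.items():
        name = " & ".join(sorted(types))  # chain component names with &
        refined.setdefault(name, set()).add(concept)
    return refined


# Print pure types first, then intersections by number of components.
for name, concepts in sorted(refine(extents).items(), key=lambda kv: (kv[0].count("&"), kv[0])):
    print(f"{name}: {sorted(concepts)}")
```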

  14. Intersection Types • Intersection Types are “new” Semantic Types that are constructed by intersection of the extents of their component Semantic Types. • The “names” of Intersection Types are constructed by chaining the names of their component Semantic Types together with &-signs.

  15. Semantic Refinement • We call the process of constructing all necessary Pure Semantic Types and Intersection Types Semantic Refinement. • Concepts are reassigned so that every concept occurs in exactly one extent. • After Semantic Refinement, every Semantic Type has a uniform extent.

  16. Pure Semantic Types have extents of simple concepts. (They are uniform.) • Intersection Types have extents of compound concepts. (They are also uniform!)

  17. Advantages • The extents of Pure and Intersection Types now have uniform semantics; that is, every extent contains highly similar concepts. • Small sets of concepts are easy to review and are also more suspicious. • It is easier to see that a concept “does not belong” to a small set, or even that a concept is “missing.”

  18. What does this have to do with the Semantic Network? • Every Intersection Type S of types X, Y, Z, … should be added to the Semantic Network as follows: • S is made a child of several appropriate Semantic Types; multiple parents are allowed. • The result is the Refined Semantic Network (RSN).
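The slides do not say how the "appropriate" parents are chosen. The minimal sketch below assumes that the parents of an Intersection Type are the existing types whose component sets are maximal proper subsets of its own components. This rule, the function `rsn_parents`, and the assumption that no type name contains " & " are all hypothetical.

```python
def rsn_parents(type_names):
    """Hypothetical parent rule for Intersection Types in the RSN.

    A type's components are read from its name (split on ' & '). The parents of
    an Intersection Type are the types whose components form a maximal proper
    subset of its components. Pure types keep their original SN parents,
    which are not modeled here.
    """
    components = {name: frozenset(name.split(" & ")) for name in type_names}
    parents = {}
    for name, comps in components.items():
        if len(comps) < 2:
            continue  # pure type
        below = [n for n, c in components.items() if c < comps]
        # Keep only candidates not contained in another candidate.
        parents[name] = sorted(
            n for n in below if not any(components[n] < components[m] for m in below)
        )
    return parents


refined_types = ["W", "X", "Y", "Z", "X & Y", "X & Z", "W & X & Y & Z"]
for child, ps in rsn_parents(refined_types).items():
    print(child, "->", ps)
```

Under this rule, W & X & Y & Z gets the parents W, X & Y, and X & Z, not all four pure types. This avoids edges that are already implied through X & Y and X & Z.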

  19. The Refined Semantic Network of the Semantic Types W, X, Y, Z [Figure: RSN with the nodes W, X, Y, Z, X & Y, X & Z, and W & X & Y & Z]

  20. Subnetwork of SN with EEH Intersections and all their ancestors [Figure: nodes Thing, Event, Entity, Phenomenon or Process, Conceptual Entity, Physical Object, Finding, Manufactured Object, Substance, Human-Caused Phenomenon or Process, Chemical, EEH, Chemical Viewed Functionally, Hazardous or Poisonous Substance, EEH & Finding, EEH & Hazardous or Poisonous Substance, Manufactured Object & Hazardous or Poisonous Substance, EEH & Manufactured Object & Hazardous or Poisonous Substance]

  21. The RSN supports auditing. • Auditing has helped us find mistakes in the UMLS. • Removal of mistakes typically leads to simplifications of the UMLS and of the RSN itself, by removing wrong intersections.

  22. EEH Auditing Example • The intersection of the extents of the three Semantic Types EEH, Manufactured Object, and Hazardous or Poisonous Substance contained only one concept: Industrial waste. • Industrial smog and Factory smoke are not considered Manufactured Objects, and our audit suggested that Industrial waste should not be one either.

  23. More strange intersections • We found concepts assigned to both Human-caused phenomenon or process and Manufactured object. • Nothing can be a process and an object at the same time. • Creating the RSN exposed this. • These assignments were caused by homonyms, e.g., Video recording as both the process and its result.

  24. Wrong Categorizations • By reviewing the pure semantic types and intersection types we found various errors. • Drinking water problem and PBC Airborne level are missing a Finding assignment. • Smoke is assigned Hazardous or Poisonous Substance, but its subconcepts Factory smoke and Second hand smoke are missing such an assignment.

  25. Classroom Environment and College Environment should not be assigned EEH at all. • These and other errors were exposed by reviewing the extents, which should be semantically uniform. • After these errors are corrected, the extent of EEH looks very different.

  26. Venn Diagrams before/after audit [Figure: Venn diagrams of the extent of EEH and its overlaps with Finding, Substance, Hazardous or Poisonous Substance, and Manufactured Object, before and after the audit, with the number of concepts in each region]

  27. Revised Subnetwork of SN with EEH Intersections and all their ancestors [Figure: nodes Thing, Event, Entity, Phenomenon or Process, Conceptual Entity, Physical Object, Finding, Manufactured Object, Substance, Human-Caused Phenomenon or Process, Chemical, EEH, Chemical Viewed Functionally, Hazardous or Poisonous Substance, EEH & Finding, EEH & Substance, EEH & Hazardous or Poisonous Substance, Manufactured Object & Hazardous or Poisonous Substance]

  28. Exclusive Semantic Types • Organic Chemical and Inorganic Chemical should exclude each other, yet we found 143 concepts classified as both! • Of those, 82 are also assigned additional Semantic Types.
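Once the exclusive pairs are written down, checking for such conflicts is easy to automate. In the sketch below, `EXCLUSIVE_PAIRS` contains only the two conflicts mentioned in this talk (slides 23 and 28). The entry "hypothetical compound" stands in for the 143 concepts, which the slides do not name. Both are illustrative assumptions.

```python
# Hypothetical list of Semantic Type pairs that should never co-occur,
# based only on the examples on slides 23 and 28.
EXCLUSIVE_PAIRS = [
    frozenset({"Organic Chemical", "Inorganic Chemical"}),
    frozenset({"Human-caused Phenomenon or Process", "Manufactured Object"}),
]


def exclusive_violations(assignments):
    """Return (concept, pair) for every concept assigned both types of an exclusive pair."""
    violations = []
    for concept, types in assignments.items():
        for pair in EXCLUSIVE_PAIRS:
            if pair <= types:  # both types of the pair are assigned
                violations.append((concept, sorted(pair)))
    return violations


assignments = {
    "Video recording": {"Human-caused Phenomenon or Process", "Manufactured Object"},
    "hypothetical compound": {"Organic Chemical", "Inorganic Chemical"},
    "Smoke": {"Environmental Effect of Humans", "Hazardous or Poisonous Substance"},
}
for concept, pair in exclusive_violations(assignments):
    print(concept, "->", " & ".join(pair))
```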

  29. Redundant Categorizations • Many concepts are assigned both a Semantic Type S and a parent or ancestor T of S. Such a redundant categorization should be avoided. [McCray and Nelson, 1995][Peng et al., AMIA 2002] • Example from 1998: Desertification was assigned both EEH and Phenomenon or Process, a redundant categorization. It was removed after our report.
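Redundant categorizations can be detected by walking up the hierarchy from each assigned type. The sketch uses a single-parent fragment of the SN that follows the subnetwork on slide 20: EEH under Human-caused Phenomenon or Process, under Phenomenon or Process, under Event. In the RSN, where types can have multiple parents, `PARENT` would have to map each type to a set of parents. The helper names are hypothetical.

```python
# Fragment of the SN hierarchy (child -> parent), following slide 20.
PARENT = {
    "Environmental Effect of Humans": "Human-caused Phenomenon or Process",
    "Human-caused Phenomenon or Process": "Phenomenon or Process",
    "Phenomenon or Process": "Event",
}


def ancestors(semantic_type):
    """All proper ancestors of a Semantic Type within the fragment above."""
    result = set()
    while semantic_type in PARENT:
        semantic_type = PARENT[semantic_type]
        result.add(semantic_type)
    return result


def redundant_pairs(types):
    """Pairs (S, T) within one concept's assignment where T is an ancestor of S."""
    return sorted((s, t) for s in types for t in types if t in ancestors(s))


# Desertification in 1998: EEH plus its ancestor Phenomenon or Process.
print(redundant_pairs({"Environmental Effect of Humans", "Phenomenon or Process"}))
# [('Environmental Effect of Humans', 'Phenomenon or Process')]
```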

  30. Auditing simplifies the RSN • After correcting the assignments of those 143 (“organic”) concepts, 13 invalid Intersection Types disappeared. • The RSN becomes simpler because it has fewer Intersection Types. • In a sample of 100 intersections with only one concept each, only 15 were deemed legal. [Gu, JAMIA 2000]
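Small Intersection Types are the most suspicious, so an auditor might review them first. The sketch below introduces a hypothetical function `review_queue` with a threshold `max_size`. It lists the Intersection Types whose extents have at most `max_size` concepts, using the refined extents of the abstract example from slide 13.

```python
def review_queue(refined, max_size=1):
    """Intersection Types with at most max_size concepts, smallest first."""
    small = [
        (len(concepts), name)
        for name, concepts in refined.items()
        if " & " in name and len(concepts) <= max_size  # Intersection Types only
    ]
    return [name for _, name in sorted(small)]


# Refined extents of the abstract example (slide 13).
refined = {
    "W": {"h"}, "X": {"a", "f"}, "Y": set(), "Z": {"c"},
    "X & Y": {"b"}, "X & Z": {"d", "e"}, "W & X & Y & Z": {"g"},
}
print(review_queue(refined))
# ['W & X & Y & Z', 'X & Y']
```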

  31. Renaming Intersection Types • Instead of Environmental Effect of Humans & Hazardous or Poisonous Substance, we would rather have a designer rename it to Environmentally Hazardous or Poisonous Substance. • Likewise, an intersection of Body Part & Manufactured Object is a Prosthesis.

  32. Subnetwork with simplified names [Figure: nodes Thing, Event, Entity, Phenomenon or Process, Conceptual Entity, Physical Object, Finding, Manufactured Object, Substance, Human-Caused Phenomenon or Process, Chemical, EEH, Chemical Viewed Functionally, Hazardous or Poisonous Substance, Environmental Finding, Human-produced Environmental Substance, Environmentally Hazardous or Poisonous Substance, Manufactured Hazardous or Poisonous Substance]

  33. Overall Results for UMLS 1998
      Level   Pure Semantic Types at level   Intersection Types at level
      1        1                               0
      2        2                               0
      3        4                               0
      4       20                               0
      5       41                              56
      6       23                             203
      7       23                             163
      8       17                             187
      9        2                             234
      10       0                             212
      11       0                              89
      12       0                              16
      13       0                               3
      14       0                               1

  34. Concept Distribution, UMLS 1998
      Concepts per intersection type   Intersection types with that many concepts
      1                                421
      2                                147
      3                                102
      4                                 65
      5                                 35
      6                                 41
      7                                 32
      8                                 15
      9                                 13
      ….
      3947                               1
      4582                               1
      6705                               1
      19349                              1
      41564                              1
      Many of the types with 1 to 3 concepts will disappear [Gu, AIM 2004].

  35. Streamlining Categorizations • Currently, UMLS editors may assign any combination of Semantic Types to new concepts, even combinations that make no sense. • We propose that concepts may be assigned only existing Pure and Intersection Types. • If a new Intersection Type is desired, it has to be “approved” by the NLM.
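The proposed restriction amounts to a membership test. In this sketch, `check_assignment` (a hypothetical name) accepts a proposed combination only if its chained name is already an approved Pure or Intersection Type. A return value of `None` means the combination would first need approval. The approved set is taken from the abstract example.

```python
def check_assignment(proposed_types, approved_types):
    """Return the refined type name if the combination is approved, else None."""
    name = " & ".join(sorted(proposed_types))  # same &-chaining as the type names
    return name if name in approved_types else None


approved = {"W", "X", "Y", "Z", "X & Y", "X & Z", "W & X & Y & Z"}
print(check_assignment({"Y", "X"}, approved))  # X & Y
print(check_assignment({"W", "Z"}, approved))  # None: needs approval first
```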

  36. Summary (1) • We propose to change the SN as follows: • Allow a DAG (directed acyclic graph) structure to support Intersection Types with multiple parents. • Create the “lower half” of the RSN by our method of Semantic Refinement; this also makes the SN deeper. • Use various auditing techniques to eliminate all wrong intersections.

  37. Summary (2) • Rename legitimate intersection types. • The RSN limits the choices of a UMLS editor to reasonable intersections. • This will prevent future UMLS mistakes. • The RSN streamlines categorization, making it more accurate and easier.
