1 / 10

Omega Ontology: Supporting Annotation

Omega Ontology: Supporting Annotation. Eduard Hovy with Andrew Philpot, Jerry Hobbs, Michael Fleischman, and Patrick Pantel USC/ISI. http://omega.isi.edu. Omega content and framework. Goal : one environment for various ontologies and resources.

urbana
Download Presentation

Omega Ontology: Supporting Annotation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Omega Ontology:Supporting Annotation Eduard Hovy with Andrew Philpot, Jerry Hobbs, Michael Fleischman, and Patrick Pantel USC/ISI

  2. http://omega.isi.edu Omega content and framework Goal: one environment for various ontologies and resources • Concepts: 120,604 Concept/term entries [76 MB]: • WordNet (Princeton; Miller & Fellbaum) • Mikrokosmos (NMSU; Nirenburg et al.) • Penman Upper Model (ISI; Bateman et al.) • 25,000+ noun-noun compounds (ISI; Pantel) • Upper model: ResearchCYC, DOLCE, SUMO (not aligned) • Lexicon / sense space: • 156,142 English words; 33,822 Spanish words • 271,243 word senses • 13,000 frames of verb arg structure with case roles: • LCS case roles (Dorr) [6.3MB] • PropBank roleframes (Palmer et al.) [5.3MB] • Framenet roleframes (Fillmore et al.) [2.8MB] • WordNet verb frames (Fellbaum) [1.8MB] • Associated information (not all complete): • WordNet subj domains (Magnini & Cavaglia) [1.2 MB] • Various relations: learned from text (ISI; Pantel) • TAP domain groupings (Stanford; Guha) • SemCor term frequencies [7.5MB] • Topic signatures (Basque U; Agirre et al.) [2.7GB] • Instances [10.1 GB]: • 1.1 million persons harvested from text • 765,000 facts harvested from text • 5.7 million locations from USGS and NGA • Framework (over 28 million statements of concepts, relations, & instances): • Available in PowerLoom • Instances in RDF • With database/MYSQL • Online browser • Clustering software • Term and ontology alignment software

  3. Omega in annotation • Used as sense inventory for IAMTC and OntoBank projects • IAMTC: annotators choose directly from Omega; choose both Mikrokosmos and WordNet source tokens • OntoBank: PropBank procedure: expert creates set of senses; annotators choose sense(s) from set; we anchor senses into Omega • Lessons: • WordNet is too fine-grained, Mikro too coarse  need middle group (approx. 60K concepts in Middle Model) • PropBank sense groups based on frame roles, so may group together ontologically distinct meanings (same word, same frameset, diff meanings)  need procedure to recluster sense groups

  4. From words to concepts Lexical space • Words • Monolingual • “drive” • “steer” • “fahren” • “steuern” • “besturen” • “rijden” • “drijven” • … Sense space • Word senses • Multilingual • Drive1 • Drive2 • Drive3 • … • Manage Concept space • Concepts • Interlingual (?) ? • How many concepts? • How related to senses?

  5. From senses to concepts: Graduated refinement • Initialization: Given a term (word), collect several dozen sentences containing it. Also collect definitions from various dictionaries • Cluster the word’s senses into preliminary, loosely similar groups • Differentiation process: Begin a tree structure with all the groups at the root • Considering all the groups, identify the group most different from the others • If you can find one clearly most different group, write down its most important distinction explicitly — this will later become the differentium and be formalized axiomatically • If you cannot find any distinctions by which to further subdivide the group, stop elaborating this branch and continue with some other branch • If you can find several distinctions that subdivide the group in different, but equally valid, ways, also stop elaborating this branch and continue with some other branch • Create two new branches in the evolving tree structure, putting the new group under one, and leaving the other groups under the other • Repeat from step 4, considering separately the group(s) under each branch • Concept formation: When all branches have stopped, the ultimate result is a tree of increasingly fine-grained distinctions, which are explicitly listed at each branch point. Each leaf becomes a single concept, not further differentiable in the current task/application/domain. Each distinction must be formalized as an axiom that holds for the branch it is associated with • Insertion into ontology: Starting from the top, visit each branch point. Do the two branches have approximately the same meaning? • If so, insert them into the ontology at the appropriate point and stop traversing this branch • If not, split the tree and repeat step 8 separately for each branch. Repeat until done

  6. <move in desired direction> (1,2,3,4,5,6,8,9,10) <other: various> (7,11,12) <physical> (2,3,4,8,9,10) <psych> <profession> 7: “drive mad” 11: taxi <negotiate> <direct/steer> <propel> <motivate> 12: “drive a hard bargain” 2,3: car 10: bulldozer 4: legs 9: hammer 8: cattle <non-physical> (1,5,6) <financial> (5,6) <spiritual> 1: demons 5: business 6: dollar An exercise: “drive” “drive” (1,2…12)

  7. <move in desired direction> (1,2,3,4,5,6,8,9,10) <other: various> (7,11,12) <physical> (2,3,4,8,9,10) <‘unheal’> 7: “drive mad” <direct/steer> <propel> <motivate> 2,3: car 10: bulldozer 4: legs 9: hammer 8: cattle <non-physical> (1,5,6) <financial> (5,6) <heal> 1: demons 5: business 6: dollar Deeper semantic “drive” “drive” (1,2…12) <psych> <profession> 11: taxi <negotiate> 12: “drive a hard bargain”

  8. <profession> <psych> <physical> (2,3,4,8,9,10) 11: taxi <‘unheal’> <heal> 7: “drive mad” 1: demons <direct/steer> <propel> <motivate> 2,3: car 10: bulldozer 4: legs 9: hammer 8: cattle <non-physical> (1,5,6) <negotiate> 12: “drive a hard bargain” <financial-move> (5,6) 5: business 6: dollar Ontologizing “drive” <move in desired direction> (1,2,3,4,5,6,8,9,10)

  9. <move in desired direction> (1,2,3,4,5,6,8,9) <profession> <psych> <physical> (2,3,4,8,9,10) 11: taxi <‘unheal’> <heal> 1: demons 7: “drive mad” <direct/steer> <propel> <motivate> 2,3: car 10: bulldozer 4: legs 9: hammer 8: cattle <non-physical> (1,5,6) <negotiate> • Granularity: choose level • Generally fewer concepts than senses • Complex sense-concept mappings 12: “drive a hard bargain” <financial-move> (5,6) 5: business 6: dollar From words to concepts Sense space • Word senses • Multilingual • Drive1 • Drive2 • Drive3 • … Concept space • Concepts • Interlingual (?) Lexical space • Words • Monolingual • “drive” • “steer” • “fahren” • “rijden” • …

  10. Statistics • Not yet done on large scale • Tests show (when set of senses is given): • Approx. 20 mins needed to sense cluster and create major families (incl. difference axiom specification) • Approx. 20 mins to insert into ontology and verify axioms • Comparable to speed of sense creation in PropBank • Inter-human agreement increases with training (for simple adj clusters, over 80% agreement with little training) • Can apply CBC (high-quality mutual-information based clustering) to speed up process

More Related