
Building Ontologies Automatically: Theory and Demonstration





  1. Building Ontologies Automatically: Theory and Demonstration
    Dan Moldovan, Human Language Technology Research Institute, University of Texas at Dallas

  2. Outline
    • Introduction to Ontologies
    • Automatic Ontology Building
    • Applications
    • OWL/RDF Representation
    • Jaguar-Jager Demo
    • CHiPS Demo
  ABBYY - 2012

  3. Ontology
    • An ontology is an organization of concepts and semantic relations within a given domain
    • Ontologies explicitly represent knowledge about domains of interest, i.e. which concepts are important and how they relate to each other
    • Ontologies serve as the backbone of semantic technologies and applications
    • Ontologies can help users achieve a unified understanding of concepts
    • Ontologies facilitate dealing with acronyms
    • Ontologies can be used as interchange formats to enable common access to data

  4. Ontology
    • Ontologies facilitate the exchange of knowledge between machines, and between people and machines
    • Ontologies make documents easier to visualize, i.e. which concepts are important and how semantically distant they are from each other
    • Once an ontology is created, it can be used to tag new texts to enable better retrieval and further processing [this is the idea of the semantic web]
    • Ontologies help browsing, searching and question answering; they make it possible to understand questions and to provide semantic connections between question concepts and text words

  5. Ontologies for Question Answering
    • QP (question processing): determine the expected answer type and select the keywords used to retrieve relevant passages
        • Question classification
        • Answer type detection
    • PR (passage retrieval): retrieve and rank passages that are relevant to the input question
        • Query formulation
        • Keyword expansion
    • AP (answer processing): extract an exact answer by evaluating all answer candidates
        • Answer surface form
        • Answer redundancy

  6. How to Create an Ontology?
    • Manual ontology creation
        • Time consuming
        • Error prone
        • Requires subject matter experts
        • The end product is difficult to maintain
        • Hard to cope with the vast, rapidly changing amount of information available for a domain
    • Automatic/semi-automatic ontology generation
        • Leverage existing domain models to seed the process of extracting semantically rich ontologies from unstructured text
        • Automatically update the ontology when new documents become available or the domain model changes
        • Communicate ontology content across multiple applications using OWL/RDF as the common interchange format
        • Allow the user to easily review, update, and maintain the ontology
        • Customize ontology relations using semantic calculus and/or user-defined rules

  7. Ontologies for Question Answering
    • QA system integrated with an automatic ontology building system

  8. Outline
    • Introduction to Ontologies
    • Ontologies for Question Answering
    • Automatic Ontology Building
    • Applications
    • OWL/RDF Representation
    • Jaguar-Jager Demo
    • CHiPS Demo

  9. Knowledge Acquisition from Text
    • KAT: automatically builds ontologies and knowledge bases (KBs) from concepts and semantic relationships found in text
    • Constituents of an ontology/KB
        • Concepts/vocabulary
            • Key domain concepts (often missing from general-purpose machine-readable dictionaries, e.g., WordNet)
            • “weapon”, “WMD”, “launcher”
        • Relations between ontological concepts
            • “anthrax” ISA “biological weapon”, “anthrax” CAUSE “death”
        • Organization of relations
            • Hierarchical (universally true transitive relations, e.g. ISA, PART-WHOLE)
            • Contextual (text-conveying relations identified by a semantic parser)

  10. Types of Knowledge
    • Universal (or ontological)
        • Represented in hierarchies
        • Simple binary relations between concepts
        • “Chemical weapons such as nerve gas, …”
    • Contextual
        • Represented in individual (semantic) contexts
        • Groups of relations centered on a common concept
        • “The forces launched a full-scale attack on Monday”

  11. Knowledge Base Constituents

  12. Knowledge Acquisition from Text
    • Functionality
        • Produce ontologies
        • Link concepts and relations to text
        • Visualize ontology
        • Edit ontology
        • Enhance an existing ontology
        • Merge two ontologies into a consistent ontology

  13. Automatically Building Ontologies
    • Ontology/KB creation
        • Knowledge extraction from text
            • Pattern recognition; semantic parsing
        • Knowledge representation and storage
            • Contextual vs. universal
            • XML; relational database
    • Knowledge base maintenance
        • Conflict resolution
        • Ontology mapping; ontology merging
        • User interaction; ontology modification

  14. KAT Modules – Text Processing
    • Input: documents, seeds
    • Extract “concepts” of interest
    • Extract binary relations (universal)
    • Use the semantic parser to obtain contextual knowledge
    • Output: concepts, contexts, binary relations
    • Example: “The rebels had access to chemical weapons, such as nerve gas and other poisonous gases.”
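As a rough illustration of the universal (binary) relation step, the sketch below applies one lexico-syntactic pattern, "<NP>, such as <NP> … <NP>", to the slide's example sentence. It is not KAT's extractor: it assumes a hypothetical upstream chunker has already wrapped noun phrases in [brackets], and `such_as_isa` is an illustrative name.

```python
import re

# Assumption: a (hypothetical) upstream chunker has bracketed every noun phrase.
NP = re.compile(r"\[([^\]]+)\]")

def such_as_isa(sentence: str) -> list:
    """Extract ISA triples from '<NP>, such as <NP> ... <NP>' in one sentence."""
    head, sep, tail = sentence.partition("such as")
    if not sep:
        return []                      # pattern not present
    before = NP.findall(head)
    if not before:
        return []
    hypernym = before[-1]              # the last NP before "such as"
    # Every bracketed NP after "such as" is taken as a hyponym of that NP.
    return [(hyponym, "ISA", hypernym) for hyponym in NP.findall(tail)]

sentence = ("[The rebels] had access to [chemical weapons], "
            "such as [nerve gas] and other [poisonous gases].")
print(such_as_isa(sentence))
# → [('nerve gas', 'ISA', 'chemical weapons'), ('poisonous gases', 'ISA', 'chemical weapons')]
```

A production extractor would also need clause-boundary detection and a reading of "and other", which here additionally suggests nerve gas ISA poisonous gas.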

  15. Text Processing
    • Step 1. Candidate concepts: NPs that contain seed concepts (e.g., <modifier> <seed_word>) and NPs semantically linked to seed concepts
    • Step 2. Concept selection: discard candidates that match certain criteria (e.g., <modifier_descriptive_adjective> <seed_word>)
    • Step 3. Seed enrichment: enhance the current set of seeds with Step 2’s domain concepts and return to Step 1
    • Step 4. Relation selection: collect all semantic relations that link domain concepts with other concepts (in or out of the domain). The relations between domain concepts become part of the ontology.
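Steps 1 to 3 can be sketched as a loop that runs until no new domain concepts appear. This is a minimal sketch, not KAT's implementation: the descriptive-adjective stoplist, the `links` pairs standing in for "NPs semantically linked to seed concepts", and all function names are hypothetical. Step 4 (relation selection) is omitted.

```python
# Hypothetical stoplist for the '<descriptive adjective> <seed>' selection rule.
DESCRIPTIVE_ADJECTIVES = {"big", "small", "new", "old", "dangerous"}

def contains_concept(phrase: str, concept: str) -> bool:
    """True if `concept` occurs in `phrase` as a whole-word sequence."""
    return f" {concept} " in f" {phrase} "

def selected(phrase: str) -> bool:
    """Concept selection: drop candidates led by a descriptive adjective."""
    return phrase.split()[0] not in DESCRIPTIVE_ADJECTIVES

def grow_concepts(noun_phrases, links, seeds, max_rounds=10):
    """Iterate candidate generation, selection and seed enrichment to a fixpoint.

    links: (phrase, concept) pairs standing in for 'NP semantically linked to concept'.
    """
    concepts = set(seeds)
    for _ in range(max_rounds):
        # Step 1: NPs that contain a known concept or are linked to one
        candidates = {p for p in noun_phrases
                      if p not in concepts
                      and any(contains_concept(p, c) for c in concepts)}
        candidates |= {p for p, c in links if c in concepts and p not in concepts}
        # Step 2: concept selection
        new = {p for p in candidates if selected(p)}
        if not new:
            break                      # no new domain concepts: stop
        # Step 3: seed enrichment, then back to Step 1
        concepts |= new
    return concepts

nps = ["chemical weapon", "big weapon", "nerve gas", "sarin", "rebel leader"]
links = [("nerve gas", "chemical weapon"), ("sarin", "nerve gas")]
print(sorted(grow_concepts(nps, links, {"weapon"})))
# → ['chemical weapon', 'nerve gas', 'sarin', 'weapon']
```

"sarin" is only reached in the third round, after "nerve gas" has itself been promoted to a seed, which is the point of returning to Step 1.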

  16. Semantic Relations Stored in KB

  17. Semantic Relations Stored in KB

  18. Examples of Semantic Relations in Text
    • Semantic relations are the interconnections between words or concepts that define the meaning of text. They are used as the elements of knowledge bases.
    • Example: John went to the park yesterday because he saw hot air balloons taking off from there
    [Figure: the example sentence annotated with the relations Agent, At-Time, At-Location, Cause, Part-Whole, ISA, Value, Stimulus, Experiencer and Experience]

  19. Semantic Parser
    • Various syntactic patterns: verb-argument, complex nominals, genitives, adjectival phrases/clauses, etc.
    • Semantic restrictions on relation arguments R(x,y)
        • Domain and range restrictions defined using an ontology of sorts
        • KINSHIP: [AnimateConcreteObject] → [AnimateConcreteObject]
        • Filter relations that cannot exist between certain arguments
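A minimal sketch of domain and range filtering. Only the KINSHIP signature comes from the slide; the ontology of sorts (stored here as child-to-parent links), the other sort names and the function names are hypothetical.

```python
# Hypothetical ontology of sorts: each sort points to its parent sort.
SORT_PARENT = {
    "Human": "AnimateConcreteObject",
    "Animal": "AnimateConcreteObject",
    "AnimateConcreteObject": "ConcreteObject",
    "Artifact": "InanimateConcreteObject",
    "InanimateConcreteObject": "ConcreteObject",
}

# Relation signatures: relation -> (domain sort, range sort).
SIGNATURES = {"KINSHIP": ("AnimateConcreteObject", "AnimateConcreteObject")}

def is_subsort(sort, target):
    """Walk up the sort hierarchy from `sort` looking for `target`."""
    while sort is not None:
        if sort == target:
            return True
        sort = SORT_PARENT.get(sort)
    return False

def admissible(relation, x_sort, y_sort):
    """Keep R(x, y) only if x fits R's domain and y fits R's range."""
    domain, range_ = SIGNATURES[relation]
    return is_subsort(x_sort, domain) and is_subsort(y_sort, range_)

# (relation, x, sort of x, y, sort of y)
candidates = [("KINSHIP", "John", "Human", "Mary", "Human"),
              ("KINSHIP", "John", "Human", "launcher", "Artifact")]
kept = [c for c in candidates if admissible(c[0], c[2], c[4])]
print(kept)   # only the John/Mary candidate survives
```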

  20. Semantic Parser
    • Bracketer – determine the semantic dependencies within noun compounds of three or more nouns
        • [sugar industry] analyst vs. female [industry analyst]
    • Argument detection – identify argument pairs likely to encode a semantic relation, based on lexico-syntactic patterns
    • Domain and range filtering – filter candidate arguments based on their semantic classes and relation definitions
    • Feature extraction – extract features corresponding to each pattern
        • Semantic class of the modifier noun, syntactic path, voice, etc.
    • Machine learning classifiers – per-relation and per-pattern approaches
        • Support vector machines, decision trees, Naïve Bayes, Semantic Scattering
    • Conflict resolution – resolve relation conflicts between classifiers

  21. KAT Modules – Classification/Hierarchy Creation
    • Input: concepts, binary relations
    • Classify each concept against every other one using defined procedures, obtaining a set of ISA relations
    • Add all ISA and other binary relations to the hierarchy using conflict resolution
    • Output: hierarchy of relations
        • “Scud missile” ISA “missile”
        • “Iraqi standing_army” ISA “Asian army”
        • “weapons inspection team” ISA “inspection team”

  22. Subsumption Used for Knowledge Classification
    • Proposition: Let $C = A_1 \sqcap \cdots \sqcap A_m \sqcap \forall R_1.C_1 \sqcap \cdots \sqcap \forall R_n.C_n$ be the normal form of the concept description $C$, and $D = B_1 \sqcap \cdots \sqcap B_k \sqcap \forall S_1.D_1 \sqcap \cdots \sqcap \forall S_l.D_l$ the normal form of the concept description $D$. Then $C \sqsubseteq D$ iff both of the following conditions hold:
        • For all $i$, $1 \le i \le k$, there exists $j$, $1 \le j \le m$, such that $B_i = A_j$
        • For all $i$, $1 \le i \le l$, there exists $j$, $1 \le j \le n$, such that $S_i = R_j$ and $C_j \sqsubseteq D_i$
    • This formulation of subsumption is
        • Sound (the “if” part holds)
        • Complete (the “only if” part holds)
    • The algorithm has polynomial complexity.
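The proposition translates directly into a recursive check. The sketch assumes both descriptions are already in normal form (each role appears in at most one value restriction); the `NormalForm` class, the function name and the example concepts are hypothetical. Each value restriction of D is visited once, so the check runs in time polynomial in the size of the two descriptions.

```python
from dataclasses import dataclass, field

@dataclass
class NormalForm:
    """A1 ⊓ … ⊓ Am ⊓ ∀R1.C1 ⊓ … ⊓ ∀Rn.Cn, each role used at most once."""
    atoms: set = field(default_factory=set)       # the Ai
    foralls: dict = field(default_factory=dict)   # role Ri -> NormalForm Ci

def subsumed_by(c: NormalForm, d: NormalForm) -> bool:
    """Structural subsumption test: does C ⊑ D hold?"""
    # Condition 1: every atom Bi of D equals some atom Aj of C.
    if not d.atoms <= c.atoms:
        return False
    # Condition 2: every ∀Si.Di of D has a ∀Rj.Cj in C with Si = Rj and Cj ⊑ Di.
    return all(role in c.foralls and subsumed_by(c.foralls[role], filler)
               for role, filler in d.foralls.items())

# Hypothetical concepts: a missile with a chemical warhead vs. a weapon with a warhead.
c = NormalForm({"Weapon", "Missile"},
               {"hasWarhead": NormalForm({"Warhead", "Chemical"})})
d = NormalForm({"Weapon"}, {"hasWarhead": NormalForm({"Warhead"})})
print(subsumed_by(c, d), subsumed_by(d, c))   # True False
```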

  23. Classification/Hierarchy Creation
    • Classification procedures
        • For domain concepts modifier1 head1 and modifier2 head2, create:
            • If ISA(modifier1, modifier2) and ISA(head1, head2), then ISA(modifier1 head1, modifier2 head2)
                • Japan discount rate ISA Asian country interest rate
            • If ISA(modifier1, modifier2) and SYNONYMY(head1, head2), then ISA(modifier1 head1, modifier2 head2)
                • Japan discount rate ISA Asian country discount rate
            • If SYNONYMY(modifier1, modifier2) and ISA(head1, head2), then ISA(modifier1 head1, modifier2 head2)
                • Japan discount rate ISA Japan interest rate
            • If SYNONYMY(modifier1, modifier2) and SYNONYMY(head1, head2), then SYNONYMY(modifier1 head1, modifier2 head2)
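A sketch of the four procedures for (modifier, head) concepts, assuming, as the slide's examples do, that a word is synonymous with itself. The `isa` and `syn` inputs are hypothetical lookup sets of word pairs, not KAT's data structures.

```python
def relation(a, b, isa, syn):
    """Relation between two words/phrases: SYNONYMY (incl. identity), ISA, or None."""
    if a == b or (a, b) in syn or (b, a) in syn:
        return "SYNONYMY"
    if (a, b) in isa:
        return "ISA"
    return None

def classify_pair(c1, c2, isa, syn):
    """Apply the four procedures to (modifier1, head1) and (modifier2, head2)."""
    (m1, h1), (m2, h2) = c1, c2
    rm, rh = relation(m1, m2, isa, syn), relation(h1, h2, isa, syn)
    if rm is None or rh is None:
        return None
    # Both SYNONYMY -> SYNONYMY; any other ISA/SYNONYMY combination -> ISA.
    return "SYNONYMY" if rm == rh == "SYNONYMY" else "ISA"

def classify(concepts, isa, syn):
    """Classify every concept against every other one."""
    found = set()
    for c1 in concepts:
        for c2 in concepts:
            if c1 == c2:
                continue
            r = classify_pair(c1, c2, isa, syn)
            if r:
                found.add((" ".join(c1), r, " ".join(c2)))
    return found

isa = {("Japan", "Asian country"), ("discount rate", "interest rate")}
concepts = [("Japan", "discount rate"), ("Asian country", "interest rate"),
            ("Asian country", "discount rate"), ("Japan", "interest rate")]
for triple in sorted(classify(concepts, isa, set())):
    print(triple)
```

The run yields five ISA triples. One of them, Japan discount rate ISA Asian country interest rate, is also implied by the chain through Asian country discount rate; removing such redundant links is left to the conflict resolution step.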

  24. Classification/Hierarchy Creation
    • Classification procedures
        • For domain concepts modifier head and head, create the relation ISA(modifier head, head)
            • nontaxable dividends ISA dividends
        • For domain concepts modifier1 modifier2 head, create:
            • If modifier1 head exists, then ISA(modifier1 modifier2 head, modifier1 head)
                • nuclear weapon testing ISA nuclear testing
            • If modifier2 head exists, then ISA(modifier1 modifier2 head, modifier2 head)
                • nuclear weapon testing ISA weapon testing
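The compound-nominal procedures can be sketched as below, under the simplifying (hypothetical) assumption that every word is a separate modifier or head; multiword heads such as "discount rate" would first need the bracketer from slide 20. The function name is illustrative.

```python
def compound_isa(concepts):
    """ISA links from modifier-head structure; words are split on spaces (simplification)."""
    links = set()
    for concept in concepts:
        words = concept.split()
        if len(words) == 2:                      # modifier head -> head
            if words[1] in concepts:
                links.add((concept, "ISA", words[1]))
        elif len(words) == 3:                    # modifier1 modifier2 head
            m1, m2, head = words
            for shorter in (f"{m1} {head}", f"{m2} {head}"):
                if shorter in concepts:
                    links.add((concept, "ISA", shorter))
    return links

domain = {"nontaxable dividends", "dividends", "nuclear weapon testing",
          "nuclear testing", "weapon testing"}
for triple in sorted(compound_isa(domain)):
    print(triple)
```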

  25. Classification/Hierarchy Creation
    • Textual entailment for concept subsumption
        • monetary policy ? fiscal policy ISA economic policy ISA policy (WordNet hierarchy)

  26. Domain Ontology/KB Creation - Example

  27. Domain Ontology/KB Creation - Example

  28. “Our Balancing Act”
    • Quantity
        • Making sure that the available information is actually extracted
    • Beauty
        • Making sure that the ontology concepts are real concepts, not just sentence fragments
    • Relevance
        • Not including every concept mentioned in a sentence

  29. “Striking the Balance”
    • Tuning text exploration aggressiveness
    • Pruning sentence phrases down to the “real concept”
    • Filtering out “ugly” sentence fragments
    • Handling conjunctions
        • “Tom and Bill” went to “Dallas and Fort Worth”
        • “Hank or Susan” went to “Chicago or New York”

  30. Ontology - Example
    • International Economics Ontology
        • Document collection: International Economics book
            • 2.8 MB of plain text
        • Seed ontology: economics reference taxonomy
            • 558 seed concepts, e.g. aggregate demand, ATC curve, budget deficit, commodity money, etc.
            • 791 semantic relations
        • 5,678 ontological concepts
        • 13,878 semantic relations
            • AGENT, CAUSE, INFLUENCE, INSTRUMENT, ISA, AT-LOCATION, MAKE-PRODUCE, MANNER, PROPERTY, PURPOSE, PART-WHOLE, QUANTITY, SYNONYMY, THEME, AT-TIME, VALUE

  31. KAT Modules – Knowledge Base Maintenance
    • Knowledge base merging
    • Visualization
    • Knowledge base editing
    • User interaction
    • Modifications

  32. Knowledge Base Maintenance
    • New concept integration: concepts and relations extracted from incoming documents are added to the existing ontology
        • Establish a mapping between the new set of concepts/relations and the existing ontology
        • Add non-mapped concepts and relations to the ontology
    • Ontology mapping: identify a set of rules that link concepts from one ontology to analogous concepts in another ontology
        • Calculate the semantic similarity of concepts
            • Similarity between the semantic models of concepts
            • Degree of textual entailment between the concepts’ glosses
            • Concept label-based similarity
        • Calculate the semantic similarity of relations
            • A function of the similarity of their arguments
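Concept label-based similarity and relation similarity can be sketched as follows. The token-overlap (Jaccard) measure, the 0.5 threshold and taking the minimum of the two argument similarities are all assumptions made for illustration; the slide does not specify them, and the function names are hypothetical.

```python
def label_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased label tokens (an assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def relation_similarity(r1, r2) -> float:
    """Relations (x, REL, y) are compared via their arguments; min() is an assumption."""
    (x1, rel1, y1), (x2, rel2, y2) = r1, r2
    if rel1 != rel2:
        return 0.0
    return min(label_similarity(x1, x2), label_similarity(y1, y2))

def map_concepts(new, existing, threshold=0.5):
    """Map each new concept to its most similar existing concept, if close enough."""
    mapping = {}
    for concept in new:
        best = max(existing, key=lambda e: label_similarity(concept, e), default=None)
        if best is not None and label_similarity(concept, best) >= threshold:
            mapping[concept] = best
    return mapping

new = ["federal budget deficit", "commodity money", "sarin"]
existing = ["budget deficit", "money supply"]
mapping = map_concepts(new, existing)
unmapped = [c for c in new if c not in mapping]   # these get added to the ontology
print(mapping, unmapped)
```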

  33. Knowledge Base Maintenance
    • Ontology merging: create a new ontology by combining information from two or more ontologies
        • Map the ontologies (two at a time)
        • Combine domain concepts (use a single copy for mapped concepts)
        • Merge the relation sets of mapped concepts
            • Conflict resolution algorithm
        • Re-classify the new set of ontological concepts
            • Classification/hierarchy creation procedures

  34. Conflict Resolution
    • Approach used: prevention
        • Start from an empty hierarchy and an input relation set
        • Add a relation from the input set to the hierarchy if
            • it does not form a cycle
            • it is not redundant (does not duplicate a path)
        • Remove jump links
    • Properties of hierarchical relations
        • Transitive
            • If R(A,B) and R(B,C), then R(A,C)
            • ISA(cat,mammal) and ISA(mammal,animal) ⇒ ISA(cat,animal)
        • Strictly non-symmetric
            • If R(A,B), then NOT R(B,A)
            • ISA(cat,mammal) ⇒ ¬ISA(mammal,cat)

  35. Types of Conflict
    • Inconsistencies
        • Simple loops
        • Cycles
    • Redundancies
        • Duplicate relations
        • Jump links

  36. Jump Links
    • Multiple paths from one node to another are acceptable
        • as long as no single link duplicates a path
    • Jump link removal
        • When it is safe to add R(A,B), remove the links from the direct descendants of B to B if they have a path to A
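Slides 34 to 36 can be combined into one insertion routine. This is a sketch under stated assumptions: the hierarchy is stored as child-to-parents sets, `Hierarchy` is a hypothetical name, and the jump-link step is written in a slightly more general form than slide 36: once R(A,B) is added, any direct link X → Y with a path from X to A and a path from B to Y duplicates a path. The slide's rule is the case Y = B.

```python
class Hierarchy:
    """Transitive, strictly non-symmetric relation (e.g. ISA): child -> parents."""

    def __init__(self):
        self.parents = {}

    def has_path(self, a, b):
        """True if b can be reached from a through one or more links."""
        stack, seen = list(self.parents.get(a, ())), set()
        while stack:
            node = stack.pop()
            if node == b:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return False

    def add(self, a, b):
        """Add R(a, b) unless it creates a conflict; return True if added."""
        if a == b or self.has_path(b, a):
            return False        # simple loop or cycle (violates non-symmetry)
        if self.has_path(a, b):
            return False        # duplicate relation or duplicated path
        # Jump-link removal: after a -> b is added, a direct link x -> y with
        # x reaching a (or x == a) and b reaching y (or y == b) is redundant.
        for x, ps in self.parents.items():
            if x == a or self.has_path(x, a):
                ps -= {y for y in ps if y == b or self.has_path(b, y)}
        self.parents.setdefault(a, set()).add(b)
        return True

h = Hierarchy()
h.add("mammal", "animal")
h.add("cat", "animal")
h.add("cat", "mammal")           # cat -> animal becomes a jump link and is dropped
print(h.parents["cat"])          # {'mammal'}
print(h.add("animal", "cat"))    # False: would form a cycle
```

Removing a jump link never loses knowledge here: the cycle check guarantees the alternative path does not itself run through the removed link, so every assertion stays derivable by transitivity.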

  37. Do Fewer Links Mean Less Knowledge?
    • Number of links: 4
        • Assertions: a → b, a → c, b → d, c → d, a → d
    • Number of links: 3
        • Assertions: a → b, b → c, c → d, a → c, b → d, a → d
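Counting the transitive closure makes the slide's point concrete: the 4-link graph yields 5 assertions, while the 3-link chain yields 6. A minimal sketch (the function name is illustrative):

```python
def closure(links):
    """All assertions implied by a set of transitive links (x, y)."""
    pairs = set(links)
    while True:
        implied = {(x, z) for x, y in pairs for y2, z in pairs if y == y2} - pairs
        if not implied:
            return pairs
        pairs |= implied

diamond = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}   # 4 links
chain = {("a", "b"), ("b", "c"), ("c", "d")}                 # 3 links
print(len(closure(diamond)), len(closure(chain)))            # 5 6
```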

  38. Ontology Merging - Example

  39. Domain Ontology/KB Evaluation
    • Compare KAT’s automatically generated ontologies against gold annotations
    • Evaluation focuses on the
        • Lexical level
        • Vocabulary/data layer level
        • Other semantic relations level
    • Viewing an ontology as a set of semantic relations between two concepts, the human annotators:
        • labeled an entry correct if the concepts and the semantic relation were correctly detected by the system, and incorrect otherwise
        • labeled a correct entry as irrelevant if any of the concepts or the semantic relation was irrelevant to the domain
        • added new entries for concepts and semantic relations omitted by KAT (from the input documents)

  40. Ontology/KB Evaluation - Metrics
    • NK(*) gives the counts from KAT’s output
    • NG(*) gives the counts from the gold annotations
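The metric definitions themselves are not in the transcript. Purely as an assumption, the sketch below computes standard precision, recall and F-measure from such counts, taking precision as correct KAT entries over all KAT entries (NK) and recall as correct KAT entries over all gold entries (NG). The counts in the usage line are made up for illustration and are not the slide's results.

```python
def precision_recall_f1(nk_correct: int, nk_total: int, ng_total: int):
    """Assumed definitions: P = correct / KAT entries, R = correct / gold entries."""
    precision = nk_correct / nk_total if nk_total else 0.0
    recall = nk_correct / ng_total if ng_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative counts only (not results from the slides).
p, r, f = precision_recall_f1(nk_correct=80, nk_total=100, ng_total=160)
print(round(p, 3), round(r, 3), round(f, 3))
```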

  41. Domain Ontology/KB Evaluation - Results

  42. Jager™: Ontology Visualization and Editing
    • Web application: scalable, multi-user visualization and editing of KAT’s ontologies/KBs
    • Based on the Django framework and written in a mix of Python, HTML and JavaScript
    • Jager (pronounced “yeager”) is a corruption of the German word Jäger (hunter)
    • Capabilities
        • Jager admin tool
            • Import/export/delete/trim ontology
            • Compare two ontologies
            • Edit ontology name
        • For a given ontology
            • Edit/delete/insert concept/semantic relation

  43. Jager™: Ontology Visualization and Editing

  44. Outline
    • Introduction to Ontologies
    • Ontologies for Question Answering
    • Automatic Ontology Building
    • Applications
    • OWL/RDF Representation
    • Jaguar-Jager Demo
    • CHiPS Demo

  45. Collaborative High Precision Search
    • CHiPS™: ontology-guided search
        • More powerful than keyword search
        • Search from the perspective of a given ontology
    • Document matching
        • Semantic profiles are generated for documents based on a given ontology
        • Ontology concepts are identified in the text
        • Each identified concept is assigned a weight
    • Semantic profile matching
        • Semantic profiles for each document in a repository are generated in advance
        • The semantic profile for the input search text is generated on the fly
        • The search algorithm finds the repository documents whose profiles most closely match the profile of the input search text
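A minimal sketch of profile generation and matching. The transcript does not say how weights are assigned or how profiles are compared. Here, as assumptions, a concept's weight is its number of case-insensitive, whole-word occurrences, and profiles are compared with cosine similarity. The function names and toy concepts are hypothetical.

```python
import math
import re

def profile(text, concepts):
    """Semantic profile: ontology concept -> weight (assumed: occurrence count)."""
    text = text.lower()
    prof = {}
    for concept in concepts:
        pattern = r"\b" + re.escape(concept.lower()) + r"\b"
        count = len(re.findall(pattern, text))
        if count:
            prof[concept] = float(count)
    return prof

def cosine(p, q):
    """Cosine similarity between two sparse profiles."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm = (math.sqrt(sum(w * w for w in p.values()))
            * math.sqrt(sum(w * w for w in q.values())))
    return dot / norm if norm else 0.0

def search(query, repo_profiles, concepts, k=3):
    """Rank precomputed document profiles against a profile built on the fly."""
    qp = profile(query, concepts)
    scored = sorted(((cosine(qp, p), doc) for doc, p in repo_profiles.items()),
                    reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

concepts = ["eye pain", "steroid", "implant", "neuralgia"]   # toy ontology concepts
docs = {"d1": "Patient reported eye pain near the implant.",
        "d2": "Trigeminal neuralgia treated with gabapentin.",
        "d3": "No adverse event reported."}
repo = {doc: profile(text, concepts) for doc, text in docs.items()}   # done in advance
print(search("eye pain associated with the implant", repo, concepts))   # ['d1']
```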

  46. CHiPS™ Architecture

  47. Document Similarity
    • Possible applications in the medical domain
        • For diagnosis: patient data vs. medical knowledge
        • For research: text snippet vs. MEDLINE
        • Match decision rules to KB
        • Others
    • Approaches
        • Statistical approaches: Latent Dirichlet Allocation, Pachinko Allocation, others
        • Semantic approaches
            • Event based
            • Ontology based (outlined here)
            • Others

  48. Sample Search
    • Search: The patient’s eye pain was associated with the surgical procedure and poly-L-lactic acid
    • Result: She describes this area as looking like a "bug bite" & was located "on top of" (above) gortex implant, near the lateral canthus. Its shape is round about one-fourth inch in diameter w/a rise w/a peak "maybe" one-eighth of an inch in height total. She said her phys has treated the "bug bite" area w/an unknown type of steroid injection, w/o effect. He now wants to remove this surgically, however, she is not certain if she wants this done. She noted that she did not massage for first week, as had no instruction to do so; she also had lid lift surgery at the time (of the face lift,) & surgeon did not want any pressure on surgical site. She reported her concomitant medications as estradiol, gabapentin (neurontin), for trigeminal neuralgia & facial non-specific neuralgia; also a multivitamin. Add'l medical history included trigeminal neuralgia & facial non-specific neuralgia both following the accident. No further medical info reported. Add'l info for sculptra from ptc report case (b)(4) dated (b)(6)2008, received by (b)(6) on 25mar08: b/c no lot # is available, an investigation has been performed on the documentation of all potentially involved manufactured batches. The review of the device history reports & of the analytical results of these batches did not show any anomaly that could be related to the event which occurred.
    • Repository: Manufacturer and User Facility Device Experience (MAUDE)

  49. Sample Search – Supporting Ontologies
    • Medical Subject Headings (MeSH) controlled vocabulary
    • Encyclopedic knowledge

  50. CHiPS™ Demo
    • Hybrid MeSH-MedDRA ontology
        • NIH Medical Subject Headings (MeSH) taxonomy
            • http://www.nlm.nih.gov/mesh/
        • Medical Dictionary for Regulatory Activities (MedDRA)
            • http://www.meddramsso.com/
        • 29,302 concepts
        • 38,828 semantic relations (ISA)
    • Document repositories
        • FDA MAUDE document repository
            • Manufacturer and User Facility Device Experience
            • Database of adverse medical events
            • http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
        • NIH MEDLINE document repository
            • Journal citations and abstracts for biomedical literature from around the world
            • http://www.nlm.nih.gov/bsd/pmresources.html
