
Exploiting a Thesaurus-Based Semantic Net for Knowledge-Based Search

Exploiting a Thesaurus-Based Semantic Net for Knowledge-Based Search. Peter Clark John Thompson Lisbeth Duncan Heather Holmback Knowledge Systems Boeing, Mathematics and Computing Technology. Overview. Problem: searching for information in particular, for human experts Approach:


Presentation Transcript


  1. Exploiting a Thesaurus-Based Semantic Net for Knowledge-Based Search Peter Clark John Thompson Lisbeth Duncan Heather Holmback Knowledge Systems Boeing, Mathematics and Computing Technology

  2. Overview • Problem: searching for information • in particular, for human experts • Approach: • Search using concepts, not words • Use a thesaurus as the initial ontology • Enhance it using simple AI techniques • The Application: • Two deployed “Expert Locator” applications

  3. Overall Picture • Query words (e.g. “tube placement”) → Search Engine → Human Experts, Document repositories, Web pages, Databases, ...

  4. Problems with word searches... • Words have many senses (polysemy) • e.g. “plane” finds both airplanes and geometry • Many words mean the same thing (synonymy) • e.g. “tail fin” misses “vertical stabilizer” • Lack of world knowledge • e.g. “jet engine” misses “propulsion systems” • Goal: organize search around concepts, not words → Need a conceptual vocabulary (“ontology”)

  5. The Ontology Bottleneck • Massive up-front cost to build an ontology • The Approach: use a technical thesaurus, enhanced with AI techniques • Boeing’s Thesaurus: • Highly customized to aerospace and Boeing • Massive knowledge repository • 37,000 concepts, 18,000 synonyms • 100,000 relationships (3 types) • Many person-years of invested effort

  6. A (tiny) fragment of the ontology... • [Diagram nodes: ignition, lift, Turbojet engines, thrust, Pneumatic equipment, starting, Propulsion systems, engines, Engine starters, Jet engines, Flame propagation, Ramjet engines, Hydrogen fuels, flameout, Flame stability, combustion, afterburning, Combustion stability, Burning rate, spray, Jet spray]

  7. Converting Words to Concepts • Search word: “jet” • [Diagram: the ontology fragment from slide 6, with “?” marks on the candidate concepts for the search word]
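
A minimal sketch of the word-to-concept step, assuming whole-word matching against concept names and synonyms. The slides do not describe the actual matching procedure, so `THESAURUS`, `build_word_index`, and `candidate_concepts` are hypothetical names, and the tiny thesaurus below only reuses concept names and the “tail fin” example from the slides.

```python
from collections import defaultdict

# Hypothetical mini-thesaurus: preferred concept name -> list of synonyms.
# Concept names are taken from the slides; treating "tail fin" as a synonym of
# "Vertical stabilizer" follows the synonymy example on slide 4.
THESAURUS = {
    "Jet engines": [],
    "Turbojet engines": [],
    "Ramjet engines": [],
    "Jet spray": [],
    "Propulsion systems": [],
    "Vertical stabilizer": ["tail fin"],
}


def build_word_index(thesaurus):
    """Map each lowercase word to the concepts whose name or synonyms contain it."""
    index = defaultdict(set)
    for concept, synonyms in thesaurus.items():
        for label in [concept, *synonyms]:
            for word in label.lower().split():
                index[word].add(concept)
    return index


def candidate_concepts(query, index):
    """Return the concepts matched by every word of the query (whole-word match)."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


index = build_word_index(THESAURUS)
print(sorted(candidate_concepts("jet", index)))       # ['Jet engines', 'Jet spray']
print(sorted(candidate_concepts("tail fin", index)))  # ['Vertical stabilizer']
```

The search word “jet” maps to more than one concept, which is the ambiguity the “?” marks on the slide point at; a deployed system would need some way to let the user or the context choose among the candidates.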

  8. Matching Query and Target Concepts • Semantic distance between “ignition” and “jet engines”? • [Diagram: the ontology fragment from slide 6]
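
One simple way to read “semantic distance” is the number of links on the shortest path between two concepts. The sketch below assumes that reading and uses breadth-first search; the slides do not state the actual distance measure, and the transcript does not preserve which nodes the diagram connects, so every entry in `LINKS` is a hypothetical edge chosen only for illustration.

```python
from collections import deque

# Hypothetical links between concepts named on the slide. The real edges of
# the diagram are not recoverable from the transcript.
LINKS = [
    ("ignition", "starting"),
    ("starting", "Engine starters"),
    ("Engine starters", "engines"),
    ("engines", "Jet engines"),
    ("Jet engines", "Turbojet engines"),
    ("Jet engines", "Propulsion systems"),
    ("ignition", "combustion"),
    ("combustion", "afterburning"),
]


def build_graph(links):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, source, target):
    """Return the number of links on the shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


GRAPH = build_graph(LINKS)
print(semantic_distance(GRAPH, "ignition", "Jet engines"))  # 4 with the hypothetical links above
```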

  9. Expert Locator Demo (see end of this presentation for the demo in powerpoint form)

  10. Enhancing the Thesaurus: 1. Increase connectivity using subsumption • 100,000 links are not enough! • 40% of concepts are “orphans” • But: Many concept names are phrases • Can add links by analyzing these phrases • e.g. Space Shuttle Main Engine: generalization → Engine, related-to → Space Shuttle

  11. Subsumption Computation Algorithm 1. Compute all possible generalizations by “word chopping” and “word generalization”... • [Diagram: lattice of candidate generalizations of “Space Shuttle Main Engine”, with nodes such as Engine, Vehicle Engine, Vehicle Main Engine, Space Shuttle Engine, Space Vehicle Main Engine, Space Shuttle]

  12. Subsumption Computation Algorithm 2. Identify existing Thesaurus concepts and links within these • [Diagram: the lattice of generalizations of “Space Shuttle Main Engine” from slide 11]

  13. Subsumption Computation Algorithm 3. Add missing connections to nearest existing concepts • [Diagram: the lattice of generalizations of “Space Shuttle Main Engine” from slide 11]
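
The three steps can be sketched as follows. This is a simplified reading, not the authors' implementation: “word chopping” is taken to mean dropping modifiers while keeping the head (last) word, “word generalization” is taken to mean replacing an embedded concept name with its broader term, and only generalization links are inferred (the related-to link to “Space Shuttle” is out of scope here). The function names and the `BROADER` map are assumptions.

```python
from itertools import combinations

# Hypothetical broader-term links already present in the thesaurus.
BROADER = {"Space Shuttle": "Space Vehicle", "Space Vehicle": "Vehicle"}


def chop(phrase):
    """Word chopping: drop one or more modifiers, keeping the head (last) word."""
    *modifiers, head = phrase.split()
    results = set()
    for keep in range(len(modifiers)):
        # combinations() preserves the original word order of the modifiers.
        for kept in combinations(modifiers, keep):
            results.add(" ".join([*kept, head]))
    return results


def generalize_words(phrase, broader):
    """Word generalization: replace one embedded concept name with its broader term."""
    words = phrase.split()
    results = set()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            sub = " ".join(words[i:j])
            if sub in broader:
                results.add(" ".join(words[:i] + [broader[sub]] + words[j:]))
    return results


def all_generalizations(phrase, broader):
    """Step 1: closure of the phrase under chopping and word generalization."""
    seen, todo = set(), [phrase]
    while todo:
        current = todo.pop()
        for candidate in chop(current) | generalize_words(current, broader):
            if candidate != phrase and candidate not in seen:
                seen.add(candidate)
                todo.append(candidate)
    return seen


def nearest_existing(phrase, concepts, broader):
    """Steps 2 and 3: keep candidates that are existing concepts, then drop any
    that generalize another existing candidate, leaving the nearest ones."""
    existing = {g for g in all_generalizations(phrase, broader) if g in concepts}
    return {
        g for g in existing
        if not any(g in all_generalizations(other, broader)
                   for other in existing if other != g)
    }


concepts = {"Engine", "Space Shuttle", "Space Vehicle", "Vehicle"}
print(nearest_existing("Space Shuttle Main Engine", concepts, BROADER))  # {'Engine'}
```

If the thesaurus also contained a more specific concept such as a hypothetical “Main Engine”, the new link would attach there instead of to “Engine”, which is what “nearest existing concepts” suggests.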

  14. Some Example Inferred Links • [Diagram nodes: Halogen Compounds, Fluorine Compounds, Nitrogen Fluorine Compounds, Fluorides, Nitrogen Fluorides; Equipment, Distance Measuring Equipment, Range Finders, Optical Range Finders; Measuring Instruments, Optical Measuring Instruments] • 21,000 generalization/specialization and 37,000 related-to links added • Number of “orphans” down from 40% to 13%

  15. Enhancing the Thesaurus: 2. Use NLP to refine the “related-to” links • Current: Metal Tube related-to Metal • New: Metal Tube made-of Metal • 27 relationship types chosen (causes, location, …) • Heuristic noun-noun rules select the relationship, e.g. for compound “X Y” (e.g. “metal tube”): IF X is a Material AND Y is a Physical-Object THEN Y made-of X • Can use relation type to help compute semantic distance
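
The noun-noun rule on this slide can be written as a small lookup. This is a sketch under stated assumptions: the slides do not say how a word's type (Material, Physical-Object) is determined, so `CONCEPT_TYPES` is a hypothetical hand-made table, and `RULES` holds only the one rule the slide gives rather than the full set of 27 relationship types.

```python
# Hypothetical type assignments for individual words.
CONCEPT_TYPES = {
    "metal": "Material",
    "tube": "Physical-Object",
}

# Each rule is (type of X, type of Y, relation) for a compound "X Y".
# Only the slide's example rule is included.
RULES = [
    ("Material", "Physical-Object", "made-of"),
]


def refine_link(compound, types=CONCEPT_TYPES, rules=RULES):
    """Return (subject, relation, object) for a two-word compound "X Y".

    Falls back to the untyped "related-to" link when no rule fires.
    """
    words = compound.lower().split()
    if len(words) != 2:
        raise ValueError("expected a two-word compound")
    x, y = words
    for x_type, y_type, relation in rules:
        if types.get(x) == x_type and types.get(y) == y_type:
            return (compound, relation, x)
    return (compound, "related-to", x)


print(refine_link("Metal Tube"))  # ('Metal Tube', 'made-of', 'metal')
print(refine_link("Jet Spray"))   # ('Jet Spray', 'related-to', 'jet')
```

Keeping “related-to” as the fallback means a compound with unknown word types loses nothing relative to the original thesaurus link.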

  16. Enhancing the Thesaurus: 3. Knowledge from Text • Definition: “Flap: A movable airfoil attached to an airplane’s wing, and used to increase lift or drag.” • NLP output for Flap: isa: Airfoil; attribute: Movable; attached-to: Wing; part-of: Airplane; purpose: Increase (object: Lift, Drag) • [Diagram: Flap's thesaurus links (bt, rt) shown alongside the typed links extracted from the definition]
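
To make the extracted structure concrete, the slots listed for “Flap” can be stored as a frame and flattened into typed links. The sketch covers only the data representation, not the language processing that produces it; the dictionary layout and the `frame_to_triples` helper are assumptions, while the slot names and values are the ones on the slide.

```python
# Slots for "Flap" as listed on the slide.
FLAP_FRAME = {
    "isa": ["Airfoil"],
    "attribute": ["Movable"],
    "attached-to": ["Wing"],
    "part-of": ["Airplane"],
    "purpose": ["Increase"],
}

# The slide nests "object: Lift, Drag" under the purpose "Increase".
INCREASE_FRAME = {"object": ["Lift", "Drag"]}


def frame_to_triples(concept, frame):
    """Flatten a frame into (concept, relation, value) links."""
    return [
        (concept, relation, value)
        for relation, values in frame.items()
        for value in values
    ]


for triple in frame_to_triples("Flap", FLAP_FRAME) + frame_to_triples("Increase", INCREASE_FRAME):
    print(triple)
```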

  17. Status and Evaluation • The Applications • Two “Expert Locators” deployed and in use • Sustained usage (~20 searches / day) • Plans to quickly expand them further • more experts • also cover projects and work groups • add in attribute filters (years at Boeing, location, …) • How do the Thesaurus Enhancements Affect Search? • Study: Expert assessed relevance of “hit” concepts • Recall increased (44% → 75%) with only minimal effect on precision (58% → 57%)
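
For reference, recall and precision in their standard information-retrieval sense can be computed as below. The slides do not describe how the study tallied its figures, so this only illustrates the definitions; the concept labels in the example are made up and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / all hits; recall = relevant hits / all relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative labels only: four "hit" concepts, three judged relevant overall.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(precision)  # 0.5
print(round(recall, 3))  # 0.667
```

Adding links to the thesaurus lets a query reach more concepts, which tends to raise recall; the figures on the slide indicate that in this study the extra hits were relevant nearly as often as the original ones.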

  18. Discussion • Does “number N of links” track “relevance”? • only for very small N! • The useful bias of a domain-specific Thesaurus: • only contains relevant concepts • massively reduces errors in Thesaurus enhancement • only contains relevant links • provides very domain-specific search • Limitations: • ignored “quality” of expert, social issues, etc. • what if the concept you want isn’t there? • Generality: Applies to any resource, not just experts

  19. Summary • Search using concepts, not words • Use of a thesaurus as an initial ontology: • Can leverage many years of work by librarians • Made viable using simple AI techniques of • search • subsumption computation • language processing • Domain-specific thesauri provide valuable bias

  20. End - demo in PPT follows
