
HyKSS: Hybrid Keyword and Semantic Search

Andrew Zitzelberger


Presentation Transcript


  1. HyKSS: Hybrid Keyword and Semantic Search. Andrew Zitzelberger

  2. Keyword Search

  3. Form Based Search

  4. What about?
     • over 8,000 meters in elevation
     • less than 100K miles
     • faster than 100 mph

  5. (no text on slide)

  6. HyKSS
     • Hybrid Keyword and Semantic Search
     • Semantics – extracted annotations
     • Multiple ontologies
     • Keywords – text

  7. Thesis Statement
     • HyKSS (hybrid search):
       • Outperforms keyword and semantic search
       • Dynamic query weighting outperforms various other hybrid search approaches
       • Allows queries over multiple ontologies
       • Allows pay-as-you-go improvement

  8. Extraction Ontologies

  9. Data Frames

  10. Indexing Architecture
      [Diagram. Labels: Document Collection; Keyword Indexer; Semantic Indexer; Keyword Index; Semantic Index]

  11. Indexing Architecture Implementation
      [Diagram. Labels: Document Collection; Keyword Indexer; Semantic Indexer; Keyword Index; Semantic Index. Implementation components: Ontology Library; Lucene; OntoES; Sesame]

  12. Query Processing
      [Diagram. Free Form Query feeds two parallel pipelines:
       Keyword Processing: Pre-Process Query, Execute Query, Post-Process Query;
       Semantic Processing: Pre-Process Query, Execute Query, Post-Process Query;
       followed by Combine Results]

  13. Keyword Query Pre-Processing
      • Remove Lucene special characters (except quotes)
      • Remove (inequality) comparison constraints
      • Remove non-phrase stopwords
      Example:
        before: hondas in "excellent condition" in orem for under 12 grand
        after:  hondas "excellent condition" orem
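The three pre-processing steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HyKSS implementation: the stopword list and the `CONSTRAINT` regular expression are hypothetical stand-ins (the slides do not say how comparison constraints are recognized), and the function name `preprocess_keyword_query` is made up for this example.

```python
import re

# Lucene query-syntax characters, minus the double quote (kept for phrases).
LUCENE_SPECIALS = r'+-&|!(){}[]^~*?:\/'

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"in", "for", "the", "a", "an", "of", "to", "and", "or", "with"}

# Hypothetical pattern for inequality constraints such as "under 12 grand".
CONSTRAINT = re.compile(
    r'\b(?:under|over|less than|more than|faster than|at least|at most)\s+'
    r'\$?\d[\d,.]*k?(?:\s+(?:grand|miles|mph|meters|dollars))?',
    re.IGNORECASE,
)


def preprocess_keyword_query(query: str) -> str:
    # 1. Drop comparison constraints; the semantic side handles them.
    query = CONSTRAINT.sub(" ", query)
    # 2. Replace Lucene special characters (except quotes) with spaces.
    query = "".join(" " if ch in LUCENE_SPECIALS else ch for ch in query)
    # 3. Tokenize into quoted phrases and bare words, then drop stopwords
    #    that are not inside a quoted phrase.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'
))
```

With these assumed lists, the slide's example query reduces to `hondas "excellent condition" orem`, which can then be handed to Lucene.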

  14. Keyword Query Execution and Post-Processing
      • Executed by Lucene
      • Empty Post-Processing step

  15. Semantic Query Pre-Processing: Individual Ontology Scoring
      Example query: hondas in "excellent condition" in orem for under 12 grand

  16. Semantic Query Pre-Processing: Ontology Set Creation
      • For each ontology sorted by score:
        • For each remaining ontology:
          • Add point for each new or subsuming match
          • If added points > 0 add ontology
      • Completely subsumed ontologies are removed during query generation

  17. Semantic Query Pre-Processing: Ontology Set Creation
      [Diagram. Surviving labels: Vehicle; Location; ContractualServices; Price < 12000; US_City=“orem”; Vehicle_Score + 1; ContractualServices_Score + 1; Vehicle_Score]
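One possible reading of the ontology set creation step is a greedy pass over the ontologies in score order. The sketch below is a simplification under several assumptions that are not in the slides: matches are represented as token spans in the query, a match is "new" if it overlaps nothing already covered, a match is "subsuming" if it strictly contains what it overlaps, and a single pass stands in for the nested loop. The scores and spans in the usage example are invented for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets in the query (assumed representation)


def is_new_or_subsuming(span: Span, covered: List[Span]) -> bool:
    """True if the span covers fresh query text or strictly contains prior matches."""
    overlapping = [c for c in covered if span[0] < c[1] and c[0] < span[1]]
    if not overlapping:
        return True  # "new" match: nothing selected so far touches this text
    # "subsuming" match: contains every span it overlaps and is not identical to it
    return all(span[0] <= c[0] and c[1] <= span[1] and span != c for c in overlapping)


def build_ontology_set(scores: Dict[str, float],
                       matches: Dict[str, List[Span]]) -> List[str]:
    selected: List[str] = []
    covered: List[Span] = []
    # Visit ontologies from highest to lowest individual score.
    for name in sorted(scores, key=scores.get, reverse=True):
        points = sum(is_new_or_subsuming(s, covered) for s in matches[name])
        if points > 0:  # the ontology contributes something not yet covered
            selected.append(name)
            covered.extend(matches[name])
    return selected


# Invented data for the query tokens:
# hondas(0) in(1) "excellent condition"(2-3) in(4) orem(5) for(6) under 12 grand(7-9)
scores = {"Vehicle": 2.0, "Location": 1.5, "ContractualServices": 1.0}
matches = {
    "Vehicle": [(0, 1), (7, 10)],              # make, price constraint
    "Location": [(5, 6)],                      # city
    "ContractualServices": [(7, 10), (5, 6)],  # price constraint, city
}
print(build_ontology_set(scores, matches))
```

With this invented data, ContractualServices adds no new or subsuming match once Vehicle and Location are selected, so it is left out of the set.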

  18. Semantic Query Pre-Processing: Structured Query Generation
      • Open world assumption
      • SPARQL query
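The slide does not show the generated query. One common way to reflect an open world assumption in SPARQL is to make each requested property OPTIONAL and to let a constraint pass when the value is simply absent; the sketch below assumes that reading. The namespace, the `ex:Document` class, the property names, and the function `build_sparql` are all hypothetical and chosen only to match the running example.

```python
from typing import Dict, List

PREFIX = "http://example.org/vehicle#"  # hypothetical namespace


def build_sparql(projections: List[str], filters: Dict[str, str]) -> str:
    """Build a SPARQL query string with one OPTIONAL pattern per projection."""
    lines = [
        f"PREFIX ex: <{PREFIX}>",
        "SELECT ?doc " + " ".join(f"?{p}" for p in projections),
        "WHERE {",
        "  ?doc a ex:Document .",  # hypothetical class for annotated documents
    ]
    for p in projections:
        # OPTIONAL: a document missing this annotation is still returned.
        lines.append(f"  OPTIONAL {{ ?doc ex:{p} ?{p} . }}")
    for var, cond in filters.items():
        # Reject only when the value is present AND violates the constraint.
        lines.append(f"  FILTER (!bound(?{var}) || {cond})")
    lines.append("}")
    return "\n".join(lines)


query = build_sparql(
    ["Make", "Price", "US_City"],
    {"Price": "?Price < 12000", "US_City": '?US_City = "orem"'},
)
print(query)
```

Under this reading, an advertisement that never states a price is not excluded by the price constraint; it just earns fewer points in the later ranking step.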

  19. Semantic Query Execution and Post-Processing
      • Sesame query execution
      • Semantic ranking:
        • 1 point for each requested projection satisfied
        • Normalized by # of projections requested
      Example query: hondas in "excellent condition" in orem for under 12 grand
      • Projections on Make, Price and US_City
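The semantic ranking rule above amounts to the fraction of requested projections that a result actually fills. A minimal sketch, assuming each result is a mapping from projection name to an extracted value (or `None` when nothing was extracted); the function name and the sample values are illustrative only.

```python
from typing import Dict, List, Optional


def semantic_score(requested: List[str], result: Dict[str, Optional[str]]) -> float:
    """One point per requested projection satisfied, normalized by the number requested."""
    if not requested:
        return 0.0  # assumption: no projections requested means no semantic evidence
    satisfied = sum(1 for p in requested if result.get(p) is not None)
    return satisfied / len(requested)


requested = ["Make", "Price", "US_City"]
full_match = {"Make": "Honda", "Price": "9500", "US_City": "Orem"}
partial_match = {"Make": "Honda", "Price": "9500"}

print(semantic_score(requested, full_match))     # all three projections satisfied
print(semantic_score(requested, partial_match))  # two of three satisfied
```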

  20. Hybrid Query Processing
      • Linear interpolation:
        • (kw_weight * kw_score) + (sm_weight * sm_score)
      • Dynamic solution:
        • # keywords remaining (#kw)
        • concept match score (cms) = ½ * (selections + projections)
        • kw_weight = #kw / (#kw + cms)
        • sm_weight = cms / (#kw + cms)
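The dynamic weighting formulas above translate directly into code. The sketch below follows the slide's definitions; the function names, the equal-weight fallback when both counts are zero, and the numbers in the usage example are assumptions made for illustration.

```python
from typing import Tuple


def dynamic_weights(num_keywords: int, selections: int, projections: int) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight) from the slide's dynamic weighting formulas."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: fall back to equal weights for an empty query
    return num_keywords / total, cms / total


def hybrid_score(kw_score: float, sm_score: float,
                 kw_weight: float, sm_weight: float) -> float:
    """Linear interpolation of the keyword and semantic scores."""
    return kw_weight * kw_score + sm_weight * sm_score


# Illustrative numbers: 3 keywords remain, 1 selection and 1 projection matched.
kw_w, sm_w = dynamic_weights(num_keywords=3, selections=1, projections=1)
print(kw_w, sm_w)
print(hybrid_score(kw_score=0.8, sm_score=0.4, kw_weight=kw_w, sm_weight=sm_w))
```

The two weights always sum to 1, so a query dominated by plain keywords leans on the Lucene score, while a query dominated by recognized concepts leans on the semantic score.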

  21. Basic Search

  22. Results Display

  23. Form Based Search

  24. Results Display

  25. Experimental Setup – Ontology Libraries
      • 5 Ontology Levels
        • Number
        • Generic Units
        • Vehicle Units
        • Vehicle
        • Vehicle+

  26. Experimental Setup – Query Sets
      • 113 syntactically unique queries from database students
      • 60 syntactically unique queries from linguistics students

  27. Experimental Setup – Document Collection
      • 250 vehicle advertisements (Craigslist)
        • 100 training, 50 validation, 100 test
      • 318 mountain pages (Wikipedia)
      • 66 roller coaster pages (Wikipedia)
      • 88 video game advertisements (Craigslist)

  28. Experiments
      • Training queries over test vehicle documents
      • Test queries over test vehicle documents
      • Training queries over test vehicle documents + additional noise
      • Test queries over test vehicle documents + additional noise
      • 5 queries over noisy data (Generic Units only)

  29. Experiments – Metric
      • Mean Average Precision
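For reference, Mean Average Precision can be computed as below. This is the standard definition of the metric, not code from the HyKSS evaluation; the document identifiers are invented, and each ranked list is assumed to contain no duplicates.

```python
from typing import List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by all relevant documents penalizes relevant documents never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
print(mean_average_precision(runs))
```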

  30. Experimental Results

  31. Experimental Results

  32. Experimental Results

  33. Conclusions
      • Hybrid search outperforms keyword and semantic search
      • HyKSS’s dynamic query weighting approach outperforms various other weighting techniques
      • Using multiple ontologies does not outperform selecting and using a single ontology

  34. External Image Citations
      • Slide 2 Google search screenshot: http://www.google.com (07/30/11)
      • Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
      • Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
      • Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
      • Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
      • Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
      • Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)
