
Indexing and Retrieval Semantic Search

Fatemeh Lashkari, UNB University, May 7th 2014.


Presentation Transcript


  1. Indexing and Retrieval Semantic Search • Fatemeh Lashkari • UNB University • May 7th 2014

  2. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance

  3. Indexing • Inverted index • Sort-based inversion • Single-pass in-memory inversion • HYB index • Prefix search • Autocompletion search • Query expansion and faceted search • Fast error-tolerant search • Support for database-style “select” and “join”
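
The slide names several indexing techniques without explaining them. The following minimal Python sketch (the slides contain no code) builds a plain inverted index by sort-based inversion. It emits one (term, docId, position) triple per token, sorts the triples, and groups them by term. Because the resulting vocabulary is sorted, prefix search can then run as a binary search. This is not the HYB index listed on the slide. The function names and the whitespace tokenizer are assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import groupby


def sort_based_inversion(docs):
    """Build term -> [(doc_id, position), ...] by sorting all token triples."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    triples = [
        (term, doc_id, pos)
        for doc_id, text in enumerate(docs)
        for pos, term in enumerate(text.lower().split())
    ]
    triples.sort()  # order by term, then doc_id, then position
    index = {}
    for term, group in groupby(triples, key=lambda t: t[0]):
        index[term] = [(doc_id, pos) for _, doc_id, pos in group]
    return index


def prefix_search(index, prefix):
    """Union of the postings of all terms that start with `prefix`."""
    vocab = sorted(index)
    i = bisect_left(vocab, prefix)  # first term >= prefix
    result = []
    while i < len(vocab) and vocab[i].startswith(prefix):
        result.extend(index[vocab[i]])
        i += 1
    return sorted(result)


docs = ["astronauts walk on moon", "the moon landing", "astronomy and moons"]
index = sort_based_inversion(docs)
print(prefix_search(index, "moo"))
```

Sorting every triple is simple, but it needs all triples in memory (or an external sort). Single-pass inversion is designed to avoid that cost.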

  4. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance

  5. Semantic Search • Demo: http://broccoli.cs.uni-freiburg.de/demos/BroccoliFreebase/ • Query: “astronauts walk on moon”

  6. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance

  7. Semantic Search Architecture • [Diagram] Components: Ontology, Text Collection, Indexing, Query Process, Answers to the question

  8. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Parsing • Index Maintenance

  9. Parsing • Preprocessing • Stemming • Lowercasing, e.g., General Motors → general motors • Removing some stop words, e.g., is, do, a, of, … • Text annotation • Annotators • Machine-learning approaches
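
Here is a minimal sketch of the preprocessing steps on the slide: lowercase, tokenize, drop stop words, and stem. The stop-word set contains the slide's examples plus "the" and "on", which are assumptions. `crude_stem` is a hypothetical suffix stripper that stands in for a real stemmer such as Porter's. Annotation is not covered.

```python
import re

# Slide's examples (is, do, a, of) plus two assumed extras (the, on).
STOP_WORDS = {"is", "do", "a", "of", "the", "on"}


def crude_stem(token):
    """Hypothetical suffix stripper; a real system would use e.g. Porter's stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least 3 characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("General Motors is a maker of cars"))
```

Stop words are removed before stemming here so that the stop list only has to hold surface forms.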

  10. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Parsing • Index Structure • Index Maintenance

  11. Index Structure • A fast and efficient index should • not need the whole vocabulary of the indexed collection in main memory • not need to sort postings • not need to merge postings • use the cache efficiently
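
Single-pass in-memory inversion shows why postings need not be sorted. Documents are read in increasing docId order, so appending each posting to its term's list keeps every list sorted automatically. This is a minimal sketch, assuming whitespace tokenization and a collection that fits in memory. Unlike the ideal on the slide, it keeps the full vocabulary in a dictionary. A real implementation would also flush partial indexes to disk and merge them when memory runs out.

```python
from collections import defaultdict


def single_pass_inversion(docs):
    """Append postings per term while scanning documents once, in docId order."""
    index = defaultdict(list)  # term -> growable postings list
    for doc_id, text in enumerate(docs):  # doc ids arrive in increasing order
        for pos, term in enumerate(text.lower().split()):
            # Appending keeps each list ordered by (doc_id, position): no sort step.
            index[term].append((doc_id, pos))
    return dict(index)


index = single_pass_inversion(["astronauts walk on moon", "the moon landing", "walk the walk"])
print(index["walk"])
```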

  12. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Parsing • Index Structure • Building Index • Index Maintenance

  13. Building Index (Tasks to Decide) • How many indexes do we need? • An index for relations • An index for text • What is the structure of the vocabulary? • What is the structure of a posting? • What statistics should a posting contain? e.g., <docId, position, score, entity>: apple: <6, 10, 0.3, class: fruit> <4, 2, 0.9, class: company>
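
One way to represent the slide's posting layout, <docId, position, score, entity>, is a small record type. This is a sketch under stated assumptions: the field names, the `lookup` helper, and ranking by descending score are illustrative choices, not part of the original design. It reuses the slide's two `apple` postings to show how the entity class separates the fruit from the company.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Posting:
    doc_id: int
    position: int
    score: float
    entity_class: Optional[str] = None  # hypothetical field, e.g. "fruit" or "company"


# The two example postings for "apple" from the slide.
index = {
    "apple": [
        Posting(doc_id=6, position=10, score=0.3, entity_class="fruit"),
        Posting(doc_id=4, position=2, score=0.9, entity_class="company"),
    ]
}


def lookup(index, term, entity_class=None):
    """Postings for `term`, optionally filtered by entity class, best score first."""
    postings = index.get(term, [])
    if entity_class is not None:
        postings = [p for p in postings if p.entity_class == entity_class]
    return sorted(postings, key=lambda p: p.score, reverse=True)


print([p.doc_id for p in lookup(index, "apple", entity_class="fruit")])
```

Storing the entity class in each posting lets a query such as "apple as a company" be answered from the text index alone, at the cost of larger postings.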

  14. Building Index (Tasks to Decide) • How should scores be computed to improve the final results? • How should the index be stored? • Distributing the index • Processing queries in parallel • Which compression methods can be used?
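
The slide leaves the compression method open. One common choice for docId lists has two steps. First, store the gaps between consecutive sorted docIds. Second, encode each gap with variable-byte encoding, which uses 7 data bits per byte and sets the high bit on a number's last byte. The sketch below assumes sorted, non-negative docIds, and its function names are hypothetical.

```python
def vbyte_encode(numbers):
    """Variable-byte encoding: 7 data bits per byte, high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)  # low 7 bits
            n >>= 7
            if n == 0:
                break
        chunk.reverse()      # most significant group first
        chunk[-1] |= 0x80    # flag the terminating byte
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    numbers, n = [], 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:      # last byte of this number
            numbers.append(n)
            n = 0
    return numbers


def compress_doc_ids(doc_ids):
    """Turn sorted doc ids into gaps, then variable-byte encode the gaps."""
    gaps = [b - a for a, b in zip([0] + doc_ids[:-1], doc_ids)]
    return vbyte_encode(gaps)


def decompress_doc_ids(data):
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap         # prefix sum restores the original ids
        doc_ids.append(total)
    return doc_ids


print(decompress_doc_ids(compress_doc_ids([824, 829, 215406])))
```

Small gaps fit in a single byte, so frequent terms, whose docIds are dense, compress best. For example, the docIds 824, 829, 215406 become the gaps 824, 5, 214577. These are stored in 6 bytes instead of 12 bytes as three 32-bit integers.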

  15. Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance

  16. Index Maintenance • Strategies for maintaining index: • Merge-based (remerge) • In-place • Hybrid index update operation • Geometric partitioning
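
Geometric partitioning avoids re-merging the whole index on every flush. It does this by keeping a few on-disk partitions of geometrically increasing size. The sketch below shows a ratio-2 variant that is closely related to logarithmic merging. Each flushed in-memory buffer becomes a level-0 partition, and partitions on the same level are merged like carries in a binary counter. Disk is simulated with dictionaries, and the class name, `buffer_capacity`, and the `merges` counter are assumptions for illustration.

```python
class GeometricIndex:
    """Ratio-2 geometric partitioning: at most one partition per level."""

    def __init__(self, buffer_capacity=2):
        self.buffer_capacity = buffer_capacity  # postings held in memory before a flush
        self.buffer = {}    # in-memory index: term -> list of doc ids
        self.buffered = 0
        self.levels = {}    # level -> partition (term -> sorted doc ids), simulating disk
        self.merges = 0     # number of partition merges performed so far

    def add(self, doc_id, terms):
        for term in terms:
            self.buffer.setdefault(term, []).append(doc_id)
            self.buffered += 1
        if self.buffered >= self.buffer_capacity:
            self._flush()

    @staticmethod
    def _merge(older, newer):
        merged = {term: list(ids) for term, ids in older.items()}
        for term, ids in newer.items():
            merged[term] = sorted(merged.get(term, []) + ids)
        return merged

    def _flush(self):
        partition, level = self.buffer, 0
        self.buffer, self.buffered = {}, 0
        # Like a binary-counter carry: merge while this level is already occupied.
        while level in self.levels:
            partition = self._merge(self.levels.pop(level), partition)
            self.merges += 1
            level += 1
        self.levels[level] = partition

    def postings(self, term):
        """A query must consult the buffer and every live partition."""
        ids = list(self.buffer.get(term, []))
        for partition in self.levels.values():
            ids.extend(partition.get(term, []))
        return sorted(ids)


idx = GeometricIndex(buffer_capacity=1)
for doc_id in range(8):
    idx.add(doc_id, ["moon"])
print(sorted(idx.levels), idx.merges)
```

With eight single-posting flushes, the index ends up as one level-3 partition after 7 merges. Queries pay for the cheaper updates by reading several partitions, whereas a merge-based (remerge) strategy keeps a single partition but rewrites it on every flush.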

  17. Thank You

