
Conceptual structures in modern information retrieval

Claudio Carpineto, Fondazione Ugo Bordoni, Roma (carpinet@fub.it). Overview: keyword-based IR and early conceptual approaches; context and concepts in modern topical IR; emerging IR tasks requiring knowledge structures; research at FUB.



  1. Conceptual structures in modern information retrieval Claudio Carpineto Fondazione Ugo Bordoni Roma carpinet@fub.it

  2. Overview • Keyword-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions

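  3. Vector-based IR (diagram) • Documents are indexed as vectors of weighted keywords • The query is indexed as a vector of weighted keywords • Matching the query vector against the document vectors yields the retrieved documents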
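The matching step in the diagram above can be sketched as follows. This is a minimal illustration, not the presenter's system: it assumes raw term counts as keyword weights and cosine similarity as the matching function, and the names `to_vector`, `cosine` and `retrieve` are hypothetical.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent a text as a vector of keyword weights (assumption: raw term counts)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two sparse keyword vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve(query, documents):
    """Return indices of documents with non-zero similarity, best match first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(d)), i) for i, d in enumerate(documents)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0]

docs = ["bank finance credit", "river bank waters", "neural networks"]
ranking = retrieve("bank credit", docs)
```

A document that shares no keyword with the query gets a score of zero and is never retrieved, which is the vocabulary limitation the later slides address.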

  4. Term weighting • tf.idf and vector space model (Salton) very popular in the 70's and 80's • BM25 (Robertson) has been the state of the art in the 90's • Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty) • A new weighting framework based on deviation from randomness + information gain (FUB + UG)
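As an illustration of one of the weighting functions listed above, here is a minimal sketch of BM25 scoring. It assumes whitespace tokenization, the conventional defaults `k1=1.2` and `b=0.75`, and a commonly used non-negative variant of the idf component; the function name `bm25_scores` is hypothetical and the slides do not specify which variant was used.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with a common BM25 variant."""
    docs = [d.lower().split() for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher idf; the +1 keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["bank finance credit", "river bank waters", "neural networks"]
scores = bm25_scores("bank credit", docs)
```

Compared with plain tf.idf, the saturation and length-normalization terms keep long or repetitive documents from dominating the ranking.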

  5. Inherent limitations of keyword-based IR • Vocabulary problem • Relations are ignored

  6. Early approaches to conceptual IR • n-grams (Salton 1975, Maarek 1989) • parse trees (Dillon 1983, Metzler 1989) • case relations (Fillmore 1968, Somers 1987) • conceptual graphs (Dick 1991)

  7. Why early conceptual IR was not successful • No best representation scheme • Manual coding too costly • Automated coding too hard • Training required both for the indexer and the user • Effectiveness not clearly demonstrated • Retrieval task often not appropriate

  8. Overview • Vector-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions

  9. Evolution of topical IR • Very short queries • Heterogeneous collections • Unreliable sources • Interactive sessions

  10. Model of modern topical IR (diagram; components: Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, Use)

  11. Performance of retrieval feedback versus query difficulty

  12. Ranking based on interdocument similarity • Cluster hypothesis (van Rijsbergen 1978) • Approaches: - Matching the query against document clusters (Willett 1988) - Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990) - Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
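The second approach above, matching against transformed document representations, can be sketched in a GVSM-like spirit. This is a simplified reading rather than the published method: it assumes that every text is re-expressed by its term overlap with each document of the collection, and that query and documents are then compared by cosine in that document space. All function names are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def document_space(text, doc_vectors):
    """Map a text to one coordinate per collection document: its term overlap with it."""
    terms = Counter(text.lower().split())
    return [sum(count * d[t] for t, count in terms.items()) for d in doc_vectors]

def rank_by_interdocument_similarity(query, documents):
    """Score documents by similarity to the query in the transformed space."""
    doc_vectors = [Counter(d.lower().split()) for d in documents]
    q = document_space(query, doc_vectors)
    return [cosine(q, document_space(d, doc_vectors)) for d in documents]

docs = ["bank finance credit", "finance credit account", "river waters"]
scores = rank_by_interdocument_similarity("bank", docs)
```

In this toy collection the second document does not contain the query term, yet it receives a non-zero score because it resembles a document that does, which is the effect the cluster hypothesis motivates.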

  13. Order-theoretical ranking (figure: concept lattice whose nodes carry term sets drawn from KBS, NNS, BANK, FINANCE, CREDIT, ACCOUNT, RIVER and WATERS and numeric labels from 0 to 4; the query and documents D1 to D7 are attached to nodes)

  14. Performance of order-theoretical ranking • Better than hierarchic clustering and comparable to best matching on the whole collection • Markedly better than both hierarchic clustering and best matching on non-matching relevant documents • Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking

  15. Overview • Vector-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions

  16. Question Answering • Task: closed-class questions in unrestricted domains, with no guarantee that an answer exists and with the answer possibly scattered over multiple documents

  17. Question Answering • Approach: • Recognize the type of query • Retrieve relevant documents • Find sought entities near question words • Fall back to best-matching passage retrieval in case of failure
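The four steps above can be sketched as a toy pipeline. This is an illustration under strong assumptions, not the system described in the talk: question types are guessed from the leading question word, candidate entities are found with regular expressions, and retrieval is plain keyword overlap. The names `question_type`, `keywords`, `answer`, `PATTERNS` and `STOPWORDS` are hypothetical.

```python
import re

STOPWORDS = {"who", "when", "where", "what", "which", "how", "many",
             "did", "was", "is", "the", "a", "of", "in"}
# Assumed surface patterns for each expected answer type.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}\b"),
    "NUMBER": re.compile(r"\b\d+\b"),
    "NAME": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"),
}

def question_type(question):
    """Step 1: recognize the type of query from its leading words."""
    q = question.lower()
    if q.startswith("when"):
        return "DATE"
    if q.startswith("how many"):
        return "NUMBER"
    if q.startswith(("who", "where")):
        return "NAME"
    return None

def keywords(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def answer(question, passages):
    q_words = keywords(question)
    # Step 2: retrieve, ranking passages by keyword overlap with the question.
    ranked = sorted(passages, key=lambda p: len(q_words & keywords(p)), reverse=True)
    if not ranked or not (q_words & keywords(ranked[0])):
        return None  # there is no guarantee that an answer exists
    pattern = PATTERNS.get(question_type(question))
    for passage in ranked:
        anchors = [m.start() for m in re.finditer(r"\w+", passage)
                   if m.group().lower() in q_words]
        if pattern is None or not anchors:
            continue
        candidates = [m for m in pattern.finditer(passage)
                      if m.group().lower() not in q_words | STOPWORDS]
        if candidates:
            # Step 3: pick the entity of the sought type closest to a question word.
            best = min(candidates, key=lambda m: min(abs(m.start() - a) for a in anchors))
            return best.group()
    return ranked[0]  # Step 4: fall back to the best-matching passage

passages = ["The river is long.",
            "The bridge was completed in 1932 after 8 years of work."]
result = answer("When was the bridge completed?", passages)
```

The example passages are invented for illustration. A real system would replace the regular expressions with a named-entity recognizer and the overlap count with a proper ranking function.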

  18. Web Information Retrieval

  19. Web Information Retrieval • Current tasks: named-entity finding task, topic distillation task • Approach: • Use of multiple methods • Combination of results via interpolation and normalization schemes
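Combining the results of multiple methods can be sketched as score normalization followed by linear interpolation. The slides do not say which schemes were used, so min-max normalization and a weighted sum are assumptions here, and the function names and the two example runs are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one method's scores to [0, 1] so that methods become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def interpolate(runs, weights):
    """Linearly interpolate normalized scores and return documents, best first."""
    combined = {}
    for run, weight in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    return sorted(combined, key=lambda doc: (-combined[doc], doc))

# Two invented runs on different scales, e.g. a content score and a link-based score.
content_run = {"a": 12.0, "b": 7.0, "c": 2.0}
link_run = {"a": 0.1, "b": 0.9, "c": 0.5}
fused = interpolate([content_run, link_run], [0.5, 0.5])
```

Normalizing first matters because the raw scores of different methods live on different scales; without it the method with the largest numbers would dominate the interpolation.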

  20. XML document retrieval • Goal: use document structure to improve precision and recall of unstructured queries, e.g. “concerts this weekend at Sofia under 20 euros” • Approaches: • Automatic inference of query structure • Semi-automatic query annotation • Hybrid query languages
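Automatic inference of query structure can be illustrated on the example query from the slide. This is only a sketch: the field names (`max_price_eur`, `location`, `date`, `keywords`) and the hand-written patterns are hypothetical, and a real system would derive them from the structure of the XML collection rather than hard-code them.

```python
import re

def infer_structure(query):
    """Map an unstructured query to hypothetical structured fields plus leftover keywords."""
    structured = {}
    rest = query
    price = re.search(r"\bunder (\d+) euros?\b", rest)
    if price:
        structured["max_price_eur"] = int(price.group(1))
        rest = rest.replace(price.group(0), " ")
    place = re.search(r"\bat ([A-Z]\w+)", rest)
    if place:
        structured["location"] = place.group(1)
        rest = rest.replace(place.group(0), " ")
    date = re.search(r"\b(this weekend|today|tomorrow)\b", rest)
    if date:
        structured["date"] = date.group(1)
        rest = rest.replace(date.group(0), " ")
    # Whatever is left is treated as ordinary content keywords.
    structured["keywords"] = rest.split()
    return structured

parsed = infer_structure("concerts this weekend at Sofia under 20 euros")
```

Each inferred field could then be matched against the corresponding element of the documents, while the remaining keywords are matched against their text.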

  21. Overview • Vector-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions

  22. Recommender systems • “Related keyword” feature versus context-dependent query reformulation

  23. Combining text retrieval and text mining with concept lattices • Goal: integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
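For readers unfamiliar with concept lattices, the following sketch builds the formal concepts of a tiny document-term collection. It is a brute-force construction for illustration only, not the system presented in the talk: each concept pairs a set of documents (extent) with the set of terms they all share (intent). The function name `concepts` and the three example documents are hypothetical.

```python
def concepts(doc_terms):
    """Return all formal concepts (extent, intent) of a document-term collection."""
    all_terms = frozenset().union(*doc_terms.values())
    # Intents are exactly the intersections of document term sets,
    # plus the full term set (the intent of the empty extent).
    intents = {all_terms}
    for terms in doc_terms.values():
        intents |= {frozenset(terms) & intent for intent in intents}
    lattice = []
    for intent in intents:
        # The extent is every document that contains all terms of the intent.
        extent = frozenset(d for d, terms in doc_terms.items() if intent <= terms)
        lattice.append((extent, intent))
    # Most general concepts (largest extents) first.
    return sorted(lattice, key=lambda c: (-len(c[0]), sorted(c[1])))

docs = {
    "d1": {"bank", "finance"},
    "d2": {"bank", "river"},
    "d3": {"bank", "finance", "credit"},
}
lattice = concepts(docs)
```

Querying corresponds to locating the concept whose intent covers the query terms, and browsing corresponds to moving to more general or more specific concepts, which is how the two strategies can share one structure.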

  24. Conclusions • The use of conceptual structures surfaces in traditional topic-relevance retrieval and is at the heart of many non-topical retrieval tasks • Towards conceptual search: • Understands term meaning • Adapts to the user • Can translate between applications • Explainable • Capable of filtering and summarization
