
A Discourse-based Information Retrieval Approach

A Discourse-based Information Retrieval Approach. Guided Study Presentation. WANG Da Yu 22 Dec 2005. Motivation. IR’s Task: to fill the two gaps: Between query and documents Between information need and query. Documents in collection. information need in mind. Query.


Presentation Transcript


  1. A Discourse-based Information Retrieval Approach
     Guided Study Presentation
     WANG Da Yu, 22 Dec 2005

  2. Motivation
     • IR’s task: to fill two gaps:
       • Between query and documents
       • Between information need and query
     (Diagram: documents in collection, query, information need in mind)

  3. Query and Documents
     • Assume no gap; use the query directly.
       • Different IR models: Boolean model, VSM, probabilistic model, 2-Poisson model, language model
     • The query is not adequate
       • Query expansion: PRF (pseudo-relevance feedback), many methods for term selection…
     • The query uses different words from the collection
       • Interactive retrieval
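The query expansion step on slide 3 can be illustrated with a minimal sketch of pseudo-relevance feedback. The slide does not say which term-selection method was used, so the raw frequency count below is an assumption chosen for brevity, and `expand_query`, `k`, and `n_terms` are hypothetical names. The example documents are made up.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=2, n_terms=2):
    """Append the n_terms most frequent new terms found in the top-k documents.

    Assumption: ranked_docs is already ordered by an initial retrieval run,
    and raw term frequency stands in for a real term-selection method.
    """
    query = [t.lower() for t in query_terms]
    counts = Counter()
    for doc in ranked_docs[:k]:
        for tok in doc.lower().split():
            if tok not in query:
                counts[tok] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query + [term for term, _ in ranked[:n_terms]]


# Made-up ranking; only the first k=2 documents are treated as relevant.
docs = ["fusion reactor energy", "fusion energy waste", "steel frame"]
expanded = expand_query(["fusion"], docs, k=2, n_terms=1)
```

A real system would also filter stop words and weight candidate terms, which this sketch leaves out.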

  4. Query and Information Need
     • Shannon: information depends not only on the message but also on the receiver
     • Language model: words are generated from the mind
     • System-oriented and user-oriented relevance (K. L. Maglaughlin and D. H. Sonnenwald, 2002)
     • Between information space and cognitive space (G. B. Newby, 2001)

  5. TREC Queries
     • Long, structured queries can represent an information need better than short, unstructured ones.
     • TREC ad hoc queries have title, description, and narrative parts (T, D, N)
     • We assume that the information need can be obtained from TREC ad hoc queries
     • Study of information need based on 250 TREC queries

  6. Concept of Discourse
     Discourse of information need: properties and relations that cannot exist independently in the description of the information need.

  7. Discourse Performance
     Average = 0.2768

  8. 8 queries in the Advantage/Disadvantage category
     (Table with columns: Category, Example)

  9. Problems
     • The query needs to contain “advantage”
     • Containing “advantage” as a query term is not enough because:
       • Not all text containing “advantage” talks about an advantage, e.g. “take advantage of”
       • Text that talks about an advantage does not necessarily contain the term “advantage”
       • E.g. “it contains no chemicals capable of triggering an adverse reaction from the body's immune system.”
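The two failure modes on slide 9 can be reproduced with a small sketch of plain term matching. The helper `contains_term` is a hypothetical name, and the "take advantage of" sentence is a made-up example; the other two sentences come from the slides.

```python
def contains_term(text, term="advantage"):
    """Return True if `term` appears as a whole word in `text` (case-insensitive)."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return term.lower() in words


# Made-up sentence: contains the term but does not discuss an advantage.
false_positive = "Investors take advantage of low interest rates."

# From slide 9: discusses an advantage without using the term.
false_negative = (
    "it contains no chemicals capable of triggering an adverse reaction "
    "from the body's immune system."
)

# From slide 10: contains the term and discusses an advantage.
true_positive = "The major advantage is that it can be manufactured domestically."

results = [contains_term(s) for s in (false_positive, false_negative, true_positive)]
```

Term matching accepts the first sentence and rejects the second, which is the opposite of what the information need requires in both cases.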

  10. Observation I
     • the space-frame structure has the advantage of being able to maintain rigidity despite the adoption of materials less rigid than steel
     • The major advantage is that it can be manufactured domestically. The disadvantage is that its capacity is too low and the total installed capacity is also too low.
     • They held that the disadvantage of the plan is that it cannot solve all the problems at one fell swoop; nor can it demonstrate the superiority of socialism.

  11. Observation II • Example text: The theoretical advantages of fusion are that it uses virtually inexhaustible raw materials (deuterium extracted from sea-water and tritium made inside the fusion reactor from the light metal lithium), it produces far less radioactive waste than fission and it is inherently safe because the reaction stops as soon as anything goes wrong.

  12. Between query and documents

  13. Concept of IU
     • Assumption: terms around topic terms are more important to the relevance of a document than terms elsewhere.
     • Definition of IU: given a set of topic terms and a document, information units (IU) are the sliding windows that have 2w+1 words and whose (w+1)-th word is one of the topic terms.
     (Diagram: w words, topic term t, w words; sequence numbers 1, 2, …, w, w+1, w+2, …, 2w+1)
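The IU definition on slide 13 can be sketched as a window extraction over a tokenized document. This is a minimal illustration rather than the presenter's implementation: `extract_information_units` is a hypothetical name, whitespace tokenization is assumed, and truncating windows at the document boundaries is an assumption, since the slide does not say how edges are handled.

```python
def extract_information_units(tokens, topic_terms, w):
    """Return (center_index, window) pairs, one per topic-term occurrence.

    Each window spans up to w words on either side of the topic term, so a
    full window has 2w+1 words with the topic term in position w+1.
    Assumption: windows are truncated at the start and end of the document.
    """
    topics = {t.lower() for t in topic_terms}
    units = []
    for i, tok in enumerate(tokens):
        if tok.lower() in topics:
            start = max(0, i - w)
            end = min(len(tokens), i + w + 1)
            units.append((i, tokens[start:end]))
    return units


# Sentence from slide 10, lowercased and stripped of punctuation.
tokens = "the major advantage is that it can be manufactured domestically".split()
units = extract_information_units(tokens, {"advantage"}, w=2)
```

With `w=2` the single unit is the five-word window centered on "advantage"; overlapping windows are kept separately here, which is one possible design choice among several.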

  14. Graph-based Model
