
An Information Retrieval Approach based on Discourse Type

NLDB 2006. An Information Retrieval Approach based on Discourse Type. Department of Computing, The Hong Kong Polytechnic University; (1) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong; (2) Department of Computer Science, City University of New York.


Presentation Transcript


  1. NLDB 2006 • An Information Retrieval Approach based on Discourse Type • D. Y. Wang, R. W. P. Luk, K. F. Wong (1) and K. L. Kwok (2) • Department of Computing, The Hong Kong Polytechnic University • (1) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong • (2) Department of Computer Science, City University of New York • DY Wang @ 2006

  2. Content • Introduction • Motivation • Discourse Type • Information Unit • Problem Formulation • Score of topic terms • Score of discourse type • Document Re-ranking • Experimental Results • Conclusion

  3. Motivation • The effectiveness of information retrieval (IR) systems varies substantially from one topic to another. • One reason: users’ information needs are very diverse. • Our approach: find the discourse type of the topic and adopt an appropriate strategy.

  4. Discourse Type • Definition of discourse type: The functions (including properties and relations that cannot exist independently) of the independent entities

  5. Performance Difference [chart not preserved in transcript] • Average = 0.2768

  6. Why Choose “Advantage / Disadvantage” as our example? • Its performance is worse than the average (0.204 vs. 0.277). • It is relatively abstract compared with concrete things (e.g., people, country), and is therefore unlikely to have been investigated before. • It is related to some cue phrases (e.g., “more than”) that are composed of stop words, which conventional IR ignores.

  7. Why Choose “Advantage / Disadvantage” as our example? (cont.) • It is a popular discourse type of information need: we found at least 40 questions asking about the advantages and disadvantages of something on one website (http://www.answerbag.com). • It has a reasonable number (eight) of TREC topics for investigation (see next slide).

  8. Eight Queries with Discourse Type Advantage / Disadvantage [table not preserved in transcript]

  9. Information Unit (IU) [figure: a document containing occurrences of topic terms (term1, term2); each IU spans w words on either side of a topic term t]

  10. Why IU? • Assumption: terms inside an IU (around topic terms) are more important to the relevance of a document than terms outside the IU. • It simplifies the processing of documents: compute a score for each IU, then aggregate the scores of all IUs as the score of the document.
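
The IU idea on slides 9 and 10 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenisation, the choice not to merge overlapping windows, and the use of a plain sum for aggregation are all assumptions, since the slides only say that an IU spans w words around a topic term and that IU scores are aggregated.

```python
def extract_information_units(tokens, topic_terms, w):
    """Return one (start, end) token span per topic-term occurrence.

    Each span covers the matched term plus up to w tokens on either side,
    clipped to the document boundaries. `end` is exclusive.
    Assumption: overlapping spans are kept separate, not merged.
    """
    wanted = {t.lower() for t in topic_terms}
    spans = []
    for i, tok in enumerate(tokens):
        if tok.lower() in wanted:
            spans.append((max(0, i - w), min(len(tokens), i + w + 1)))
    return spans


def aggregate_document_score(iu_scores):
    # Assumption: the slides do not name the aggregation function; a sum is used here.
    return sum(iu_scores)


tokens = "a b term1 c d e f term2 g".split()
spans = extract_information_units(tokens, ["term1", "term2"], w=2)
```

With w = 2, the first span covers the five tokens centred on term1, and the second is clipped at the end of the document.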

  11. Score of Topic Terms • sumtf = 4 • dtf = 3 (d: distinct) • Graph-based model: atS3 = 1/1 + 1/5 + 1/3 • atS4 = 1/5 + 1/3 [figure not preserved in transcript; its labels were 1, 5, 3]
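
The numbers on slide 11 can be reproduced with a short sketch. It rests on two assumptions that the transcript does not confirm: that sumtf is the total number of topic-term occurrences in an IU and dtf the number of distinct topic terms (inferred from the names and the note "d: distinct"), and that the graph-based scores are sums of reciprocals of the figure's labels 1, 5 and 3. The example IU and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def topic_term_counts(iu_tokens, topic_terms):
    """Return (sumtf, dtf) for one IU under the reading described above."""
    wanted = {t.lower() for t in topic_terms}
    counts = Counter(tok.lower() for tok in iu_tokens if tok.lower() in wanted)
    sumtf = sum(counts.values())  # all topic-term occurrences
    dtf = len(counts)             # distinct topic terms present
    return sumtf, dtf


def reciprocal_sum(labels):
    # Exact arithmetic for expressions such as 1/1 + 1/5 + 1/3.
    return sum(Fraction(1, v) for v in labels)


# Hypothetical IU: three distinct topic terms, one of which occurs twice.
iu = ["a", "x", "b", "a", "y", "c"]
sumtf, dtf = topic_term_counts(iu, {"a", "b", "c"})

atS3 = reciprocal_sum([1, 5, 3])  # 1/1 + 1/5 + 1/3 = 23/15
atS4 = reciprocal_sum([5, 3])     # 1/5 + 1/3 = 8/15
```

What the labels 1, 5 and 3 measure is not recoverable from the transcript, so the sketch only checks the arithmetic.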

  12. Example: Score of Discourse Type • more (comparative words) = 3 • support = ['back', 'confirm', 'contest', 'contrari', 'defend', 'encourag', 'endors', 'object', 'oppon', 'oppos', 'opposit', 'prove', 'quibbl', 'refer', 'sponsor', 'support'] (from www.answers.com) • support = 2
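
The cue-word counts in this example can be sketched as below. The support list is copied from the slide; the comparative-word list is a hypothetical stand-in, because the slide names only "more". The sketch assumes the IU has already been tokenised and stemmed, and it stops at the raw counts, since the slides do not state how the counts are combined into a single discourse-type score.

```python
# Stems listed on the slide (from www.answers.com).
SUPPORT_STEMS = {
    "back", "confirm", "contest", "contrari", "defend", "encourag",
    "endors", "object", "oppon", "oppos", "opposit", "prove",
    "quibbl", "refer", "sponsor", "support",
}

# Hypothetical list: only "more" appears on the slide.
COMPARATIVE_WORDS = {"more", "less", "better", "worse"}


def cue_counts(iu_stems):
    """Count comparative words and support-related stems in one IU."""
    return {
        "comparative": sum(1 for s in iu_stems if s in COMPARATIVE_WORDS),
        "support": sum(1 for s in iu_stems if s in SUPPORT_STEMS),
    }


# Hypothetical, already-stemmed IU.
iu = "more peopl support nuclear power more than coal but oppon want less".split()
counts = cue_counts(iu)
```

For this made-up IU the counts are 3 comparative words and 2 support stems, the same values as in the slide's example.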

  13. Document Re-ranking • IU score before re-ranking: S0, the similarity score of the document that contains the IU • IU re-ranking score S’ (three variants) • S’ = S0 * score of topic terms • S’ = S0 * score of discourse type • S’ = S0 * score of topic terms * score of discourse type • Aggregate the re-ranking scores of all IUs in a document as the final score of the document. • Re-rank the documents by the final score.
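
The three re-ranking variants can be sketched in a few lines. The data layout, the function name, the flags and the use of a sum for aggregation are assumptions made for illustration; the slide specifies only the products S’ = S0 * (...) and that IU scores are aggregated per document.

```python
def rerank(documents, use_topic=True, use_discourse=True):
    """Return document ids ordered by aggregated IU re-ranking score.

    documents maps doc_id -> (s0, [(topic_score, discourse_score), ...]),
    where s0 is the document's original similarity score and the list
    holds one pair of scores per IU.
    """
    final = {}
    for doc_id, (s0, ius) in documents.items():
        total = 0.0
        for topic_score, discourse_score in ius:
            s = s0
            if use_topic:
                s *= topic_score
            if use_discourse:
                s *= discourse_score
            total += s  # assumption: aggregation by summation
        final[doc_id] = total
    return sorted(final, key=final.get, reverse=True)


# Made-up scores for two documents.
docs = {
    "d1": (0.5, [(2.0, 1.0), (1.0, 1.0)]),
    "d2": (0.4, [(1.0, 3.0)]),
}
both = rerank(docs)
topic_only = rerank(docs, use_discourse=False)
discourse_only = rerank(docs, use_topic=False)
```

With these made-up scores, d1 ranks first when topic-term scores are used, while d2 overtakes it when only the discourse-type score is applied.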

  14. Re-ranking Results in MAP [results table not preserved in transcript]

  15. Conclusion • Re-ranking based on topic terms and re-ranking based on discourse type can each improve retrieval performance. • Combining the two yields the largest improvement, which is statistically significant (at the 95% confidence level, taking the sample size into account). • This approach is promising and is worth further investigation. Acknowledgement: We thank the Center for Intelligent Information Retrieval, University of Massachusetts, for helping Robert Luk develop the basic IR system while he was on leave there. This work is supported by the CERG Project # PolyU 5226/05E.
