
20th Century Esfinge (Sphinx) solving the riddles at CLEF 2005



Presentation Transcript


  1. Esfinge (Sphinx): a general-domain question answering system for Portuguese. Available on the Web: http://www.linguateca.pt/Esfinge/
  • The starting point was the architecture described in Brill, Eric, "Processing Natural Language without Natural Language Processing", in A. Gelbukh (ed.), CICLing 2003, LNCS 2588, Springer-Verlag, Berlin/Heidelberg, 2003, pp. 360-369:
  • Exploiting the redundancy existing on the Web.
  • Exploiting the fact that Portuguese is one of the most used languages on the Web.
  • Participation at CLEF 2004 and 2005. Two strategies were tested:
  • Strategy 1: searching for the answers on the Web and using the CLEF document collection to confirm them.
  • Strategy 2: searching for the answers only in the CLEF document collection.
  • Additional experiments using Strategy 1 were performed after error analysis and system debugging (post-CLEF).
  Esfinge overview (flowchart): the question reformulation module produces answer patterns (Answer Pattern 1 … Answer Pattern n); under Strategy 1 these patterns are submitted to Google, under Strategy 2 passages are extracted from the CLEF document collection; in either case the system checks whether documents were found.
  What was new at CLEF 2005?
  • Use of the named entity recognizer SIEMES (detection of humans, countries, settlements, geographical locations, dates and quantities).
  • A list of uninteresting websites (jokes, blogs, etc.).
  • Use of the available Brazilian Portuguese document collection.
  • Use of the stemmer Lingua::PT::Stemmer for the generalization of search patterns.
  • Filtering of "undesired answers". A list of these answers was built from the logs of last year's participation and from tests performed afterwards.
  • Searching for longer answers: the system does not stop when it finds an acceptable answer; instead, it keeps searching for longer acceptable answers that contain it.
  • Participation in the EN-PT multilingual task.
  • Correction of problems detected last year.
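The Web-redundancy idea borrowed from Brill's architecture can be sketched roughly as follows: harvest word n-grams from the retrieved passages and score them so that candidates recurring across many passages (and longer, more specific candidates) rank higher. This is a minimal, hypothetical illustration in Python, not Esfinge's actual code; the passages and the length-based scoring are made up for the example.

```python
from collections import Counter

def harvest_ngrams(passages, max_n=3):
    """Collect word n-grams (1..max_n) from retrieved passages.

    Hypothetical sketch of redundancy-based answer extraction:
    candidates that recur across passages accumulate a higher score,
    and longer n-grams get a small length bonus so that more specific
    answers can outrank their own sub-strings.
    """
    scores = Counter()
    for passage in passages:
        words = passage.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                ngram = " ".join(words[i:i + n])
                scores[ngram] += n  # frequency weighted by n-gram length
    return scores

# Made-up Portuguese snippets standing in for Google/CLEF passages.
passages = [
    "o rio Amazonas fica na América do Sul",
    "o Amazonas é o maior rio do mundo",
    "o rio Amazonas nasce no Peru",
]
best, score = harvest_ngrams(passages).most_common(1)[0]
# best == "o rio amazonas" (appears in two passages, length bonus 3)
```

In a real system the raw counts would then pass through the filters described below (part-of-speech, undesired answers, supporting documents) before the best-scored n-gram is returned.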
  Esfinge overview (flowchart, continued): when no documents are found, the search patterns are stemmed (Stemmed Pattern 1 … Stemmed Pattern n) and passage extraction from the CLEF document collection is repeated; if still no documents are found, the answer is NIL. Otherwise the extracted passages go through N-gram harvesting. If the question pattern enables the use of NER, the N-grams are processed by the SIEMES NER and pass through filters A+B+C+D; otherwise they pass through filters B+C+D. If any N-grams survive, the answer is the best-scored N-gram; if none do, the answer is NIL.
  Filters:
  A: interesting PoS
  B: answer contained in the question
  C: undesired answer
  D: supporting document
  Esfinge's performance:
  • The results of the runs using the Web (Strategy 1) were slightly better than those of the runs using only the CLEF document collection in both participations.
  • The results using Strategy 2 for questions of type People and Date are better, both compared to the other types of questions and to the same types of questions using Strategy 1. This suggests that both strategies are still worth experimenting with and studying further.
  • The analysis of the individual modules shows that the NER system helps the system mainly with questions of type "People", "Quantity" and "Date", while the morphological analyser is more influential for questions of type "Which X", "Who was <HUMAN>" and "What is".
  • The results show that Esfinge improved compared to last year: the results are better with both this year's and last year's questions.
  * Two further right answers were found after the official results were released.
  Luís Costa
  Luis.costa@sintef.no
  Linguateca / SINTEF ICT, PB 124, Blindern, NO-0314 Oslo, Norway
  http://www.linguateca.pt
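The filter cascade and the CLEF 2005 "longer answers" change can be sketched as below. This is a hypothetical Python illustration, not Esfinge's implementation: the candidate scores, the undesired-answer list and the `has_support` document check are stand-ins for the real components, and filter A (interesting PoS) is omitted because it would require a morphological analyser.

```python
def filter_candidates(candidates, question, undesired, has_support):
    """Apply filters B, C and D from the poster to scored candidates.

    (Filter A, "interesting PoS", is skipped in this sketch.)
    """
    kept = {}
    for answer, score in candidates.items():
        if answer in question.lower():   # B: answer contained in the question
            continue
        if answer in undesired:          # C: answer on the undesired-answers list
            continue
        if not has_support(answer):      # D: no supporting document found
            continue
        kept[answer] = score
    return kept

def best_answer(candidates):
    """Return the best-scored candidate, but prefer the longest surviving
    candidate that contains it (the "longer answers" behaviour)."""
    if not candidates:
        return "NIL"
    best = max(candidates, key=candidates.get)
    longer = [c for c in candidates if best in c]
    return max(longer, key=len)

# Hypothetical candidates as produced by n-gram harvesting and scoring.
candidates = {"amazonas": 3, "rio amazonas": 4, "o rio amazonas": 6}
kept = filter_candidates(
    candidates,
    question="Qual é o maior rio do mundo?",
    undesired={"clique aqui"},          # made-up undesired-answer list
    has_support=lambda a: True,         # stand-in for the document lookup
)
answer = best_answer(kept)
```

When every candidate is filtered out, `best_answer` returns "NIL", matching the NIL branch of the flowchart.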
