LING 573: Deliverable 3

Group 7

Ryan Cross

Justin Kauhl

Megan Schneider


The Basics

  • Implemented in Python with Indri

    • For document retrieval, used the standard #combine( “query” ) operator

      • #combine(x1 x2 … xn) = (score for x1)^(1/n) * (score for x2)^(1/n) * … * (score for xn)^(1/n)

    • Used passage#:# (window size:increment) to get windows for passage retrieval (100:50, 150:50, 150:75, as well as 150:10, 150:15, and longer windows)

    • Used regexes to clean up the Indri printPassages output
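
The #combine formula above is a geometric mean of the per-term scores. A minimal Python sketch of that arithmetic follows; the function name `combine_score` and the example scores are hypothetical, and this only illustrates the formula on the slide, not Indri's internal implementation.

```python
import math


def combine_score(term_scores):
    """Return prod(score_i ** (1/n)) for n positive per-term scores."""
    if not term_scores:
        raise ValueError("need at least one term score")
    if any(s <= 0 for s in term_scores):
        raise ValueError("scores must be positive")
    n = len(term_scores)
    # Averaging logs is equivalent to multiplying the n-th roots,
    # and avoids underflow when many small scores are combined.
    return math.exp(sum(math.log(s) for s in term_scores) / n)


if __name__ == "__main__":
    # Hypothetical per-term scores for a three-term query.
    print(combine_score([0.02, 0.5, 0.1]))
```

Because each term contributes an n-th root, a single very low-scoring term pulls the combined score down more than it would under an arithmetic mean.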


Approaches

  • Stemming

  • Stop word removal

  • Question word removal

  • Query expansion


Approaches (cont.)

  • Stemming

    • Tried stemming in the index and stemming the query

    • Porter and Krovetz stemmers

    • Krovetz performed better (less aggressive)


Approaches (cont.)

  • Stop word removal

    • Made runtime faster when removed from index

    • Offered improvement in all circumstances if removed from queries

  • Question word removal

    • Performed on the query in almost all cases; gave some improvement.

    • Largely intuitive. However, some questions had slightly better results when the question words were left in, because of Q&A files in the corpus.
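
The query-side filtering described on this slide can be sketched as below. The word lists are deliberately tiny and hypothetical (the slides do not say which stop word list was used), and `clean_query` is an illustrative name, not the group's code.

```python
import re

# Hypothetical, abbreviated word lists for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "was", "did", "to", "for"}
QUESTION_WORDS = {"who", "what", "when", "where", "why", "which", "how"}


def clean_query(question, drop_question_words=True):
    """Lowercase, tokenize, and drop stop words (and optionally question words)."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    if drop_question_words:
        banned = STOP_WORDS | QUESTION_WORDS
    else:
        banned = STOP_WORDS
    return [t for t in tokens if t not in banned]


if __name__ == "__main__":
    print(clean_query("Who was the first president of France?"))
    print(clean_query("Who was the first president of France?",
                      drop_question_words=False))
```

Keeping `drop_question_words` as a switch mirrors the observation above that a few questions did slightly better with the question words left in.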


Approaches (cont.)

  • Query expansion

    • Tried adding synonyms from WordNet

    • Only added synonyms for nouns, verbs, adjectives, and adverbs

    • Restricted the synonyms added based on a word’s POS (as predicted by nltk.pos_tag)

    • Also tried not restricting synonyms by POS
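
The difference between POS-restricted and unrestricted expansion can be illustrated with a small sketch. To stay self-contained it uses a toy synonym table in place of WordNet and takes already-tagged input in place of nltk.pos_tag; the table entries, the single-letter POS codes, and the name `expand` are all assumptions made for illustration.

```python
# Toy stand-in for WordNet: (word, POS) -> synonyms. Entries are illustrative only.
SYNONYMS = {
    ("run", "v"): ["sprint", "jog"],
    ("run", "n"): ["streak"],
    ("car", "n"): ["automobile"],
}


def expand(tagged_query, restrict_by_pos=True):
    """Expand a list of (word, pos) pairs with synonyms from the toy table.

    With restrict_by_pos=True, only synonyms listed under the word's own
    POS are added; otherwise synonyms for every POS of the word are added.
    """
    expanded = []
    for word, pos in tagged_query:
        expanded.append(word)
        for (syn_word, syn_pos), syns in SYNONYMS.items():
            if syn_word != word:
                continue
            if restrict_by_pos and syn_pos != pos:
                continue
            expanded.extend(s for s in syns if s not in expanded)
    return expanded


if __name__ == "__main__":
    query = [("run", "v"), ("car", "n")]
    print(expand(query))                         # POS-restricted
    print(expand(query, restrict_by_pos=False))  # unrestricted
```

Even in this toy case the unrestricted variant pulls in a term from an unrelated sense ("streak"), which is the kind of extra term that can dilute a query.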


Approaches (cont.)

  • Query expansion

    • In both cases, retrieval results were worse with query expansion


Approaches (cont.)

  • Passage retrieval

    • Used Indri #combine[passage size:increment]( “query” ) operator

    • Originally intended to only use documents returned from document retrieval phase

    • Decided instead to run passage retrieval as a standalone system.
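
A minimal sketch of how such query strings might be assembled in Python before being handed to Indri. The helper names are hypothetical; only the `#combine( … )` and `#combine[passageSIZE:INCREMENT]( … )` forms come from the slides.

```python
def document_query(terms):
    """Build a document-level Indri query string from a list of terms."""
    return "#combine( {} )".format(" ".join(terms))


def passage_query(terms, size, increment):
    """Build a passage-level Indri query string with the given window settings."""
    if size <= 0 or increment <= 0:
        raise ValueError("size and increment must be positive")
    return "#combine[passage{}:{}]( {} )".format(size, increment, " ".join(terms))


if __name__ == "__main__":
    terms = ["first", "president", "france"]
    print(document_query(terms))
    # One passage query per window setting tried in the slides.
    for size, increment in [(100, 50), (150, 50), (150, 75)]:
        print(passage_query(terms, size, increment))
```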


Approaches (cont.)

  • Passage retrieval results

    • Tried a few different settings.

    • Krovetz stemming, stopwords + question words removed.

    • Aimed for a window size that did not return too many characters, with meaningful increments.
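
To make the size/increment trade-off concrete, the sketch below slides a fixed-size window over a token list. It is an illustration of the general idea under simple assumptions (windows counted in tokens, a new window starting every `increment` tokens); how Indri handles the final partial window is not modeled, and `passage_windows` is a hypothetical name.

```python
def passage_windows(tokens, size, increment):
    """Return overlapping windows of up to `size` tokens, stepping by `increment`."""
    if size <= 0 or increment <= 0:
        raise ValueError("size and increment must be positive")
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
        start += increment
    return windows


if __name__ == "__main__":
    tokens = ["t{}".format(i) for i in range(10)]
    # A smaller increment yields more, heavily overlapping windows.
    print(len(passage_windows(tokens, 4, 2)))
    print(len(passage_windows(tokens, 4, 1)))
```

A larger window returns more characters per passage, while a smaller increment increases overlap and the number of candidate passages.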


Overall

  • Krovetz stemmer

  • Stopwords removed from query (kept in index)


Critical Analysis

  • Our query expansion attempts did not help

    • Too many misleading terms were introduced

  • Stopword-based results were unusual

    • We had assumed that removing them from the index would help, but the final configuration kept them in the index and removed them only from the query.

  • Passage retrieval yielded better results than document retrieval

    • Seeing a query term in a passage is more meaningful than seeing it somewhere in a whole document


References

  • Hitesh Sabnani, Prasenjit Majumder. Question Answering System: Retrieving Relevant Passages. In Proceedings of Cross-Language Evaluation Forum - CLEF.

  • Stefanie Tellex, Boris Katz, Jimmy Lin, Aaron Fernandes, and Gregory Marton. Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering. 2003. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.


