Quality-aware Collaborative Question Answering: Methods and Evaluation
Maggy Anastasia Suryanto, Ee-Peng Lim (Singapore Management University)
Aixin Sun (Nanyang Technological University)
Roger H.L. Chiang (University of Cincinnati)
WSDM 2009, Barcelona, Spain
Outline
• Motivation and objectives
• Quality-aware QA framework
• Expertise-based methods
• Experimental setup
• Results
• Conclusions
Collaborative Question Answering
• Finding answers to questions using community QA portals
[Diagram: a community QA portal consisting of a question interface, an answer interface, a search engine, and a question-and-answer database]
Collaborative QA
• Simple idea: use the search engine provided by community QA portals.
• Limitations:
  • Assumes that related questions are already available in the portal.
  • Search engines do not guarantee answer relevance or quality.
  • Users can vote for best answers, but votes are unreliable.
  • Users may not be experts.
• Collaborative QA therefore needs to address quality issues (the answer quality problem).
Research Objectives
• Develop methods to find good answers for a given question using the QA database of a community QA portal
• Benefits:
  • Better answers compared with traditional QA methods
  • Fewer duplicate questions
Quality-Aware Framework
[Diagram: a good answer must be both a relevant answer and a quality answer; answer quality is determined by content quality and user expertise]
Quality-Aware Framework
• Overall score: score(q, a) = rscore(q, a) · qscore_<model>([q,] a), where rscore is the answer relevance score and qscore is the answer quality score (question-dependent for some models)
[Diagram: a user question q is submitted to the QA portal's search engine, which returns candidate answers and their questions from the QA database; the system computes the answer relevance score (rscore) and the answer quality score (qscore), then selects answers by the overall score]
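The final ranking step of the framework can be shown with a minimal sketch. The helper names rscore and qscore are placeholders for whichever relevance and quality models are plugged in; they are not identifiers from the paper.

```python
# Minimal sketch of the quality-aware ranking step:
# score(q, a) = rscore(q, a) * qscore(q, a).

def rank_answers(question, candidates, rscore, qscore):
    """Rank candidate answers by overall score.

    question   -- the user question q
    candidates -- candidate answers retrieved from the QA database
    rscore     -- callable (q, a) -> answer relevance score
    qscore     -- callable (q, a) -> answer quality score
    """
    scored = [(rscore(question, a) * qscore(question, a), a) for a in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored]
```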
Expertise-based Methods
[Diagram: a quality answer depends on answer content quality (the NT method [Jeon et al., 2006]) and user expertise, which splits into asking expertise and answering expertise; the expertise methods (EXHITS, EXHITS_QD, EX_QD, EX_QD') differ in whether expertise is question dependent and whether it assumes peer expertise dependency]
Question Independent Expertise
• EXHITS [Jurczyk and Agichtein, 2007a, 2007b]:
  • Expert askers have their questions answered by expert answerers.
  • Expert answerers answer questions posted by expert askers.
  • Asking and answering expertise reinforce each other; content quality is not considered.
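The mutual reinforcement can be illustrated with a HITS-style iteration over the asker-answerer graph. This is a minimal sketch of the idea, assuming one (asker, answerer) edge per answered question; it is not the exact update rule of Jurczyk and Agichtein.

```python
import math

def exhits(qa_pairs, iterations=50):
    """HITS-style mutual reinforcement over the asker-answerer graph.

    qa_pairs -- list of (asker, answerer) pairs, one per answered question.
    Returns (asking_expertise, answering_expertise) dicts.
    """
    askers = {u for u, _ in qa_pairs}
    answerers = {v for _, v in qa_pairs}
    ask = {u: 1.0 for u in askers}
    ans = {v: 1.0 for v in answerers}

    for _ in range(iterations):
        # Answering expertise grows by answering questions of expert askers.
        ans = {v: 0.0 for v in answerers}
        for u, v in qa_pairs:
            ans[v] += ask[u]
        # Asking expertise grows by having questions answered by expert answerers.
        ask = {u: 0.0 for u in askers}
        for u, v in qa_pairs:
            ask[u] += ans[v]
        # L2-normalize to keep scores bounded, as in standard HITS.
        norm_ans = math.sqrt(sum(s * s for s in ans.values())) or 1.0
        ans = {v: s / norm_ans for v, s in ans.items()}
        norm_ask = math.sqrt(sum(s * s for s in ask.values())) or 1.0
        ask = {u: s / norm_ask for u, s in ask.items()}
    return ask, ans
```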
Question Dependent Expertise
• EXHITS_QD:
  • Expert askers have q-related questions with good answers posted by expert answerers.
  • Expert answerers post good answers to q-related questions from expert askers.
  • Combines answer content quality and answer relevance in the expertise computation.
Question Dependent Expertise
• EX_QD:
  • Non-peer-expertise-dependent counterpart of EXHITS_QD.
  • Expert askers ask many q-related questions that attract many good answers.
Question Dependent Expertise
• EX_QD':
  • EX_QD without using answer quality to measure asker expertise.
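The non-peer, question-dependent variants can be read as a direct aggregation over the QA archive. The sketch below is one plausible reading of the EX_QD idea, under stated assumptions: relevance and quality are placeholder callables, and the paper's exact aggregation may differ.

```python
def ex_qd_expertise(q, archive, relevance, quality):
    """Illustrative question-dependent expertise scores (EX_QD flavour).

    archive   -- list of (asker, answerer, question, answer) records
    relevance -- callable (q, question) -> relatedness of a past question to q
    quality   -- callable (answer) -> content-quality score of an answer
    Returns (asking_expertise, answering_expertise) dicts for question q.
    """
    ask, ans = {}, {}
    for asker, answerer, question, answer in archive:
        contribution = relevance(q, question) * quality(answer)
        # Askers earn expertise when their q-related questions attract
        # good answers. (EX_QD' would use relevance(q, question) alone
        # here, dropping the quality factor from asker expertise.)
        ask[asker] = ask.get(asker, 0.0) + contribution
        # Answerers earn expertise by posting good answers to q-related questions.
        ans[answerer] = ans.get(answerer, 0.0) + contribution
    return ask, ans
```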
Experimental Setup
• Answer relevance computed in two ways:
  • Yahoo! Answers search engine
  • Query likelihood retrieval model with Jelinek-Mercer background smoothing (λ = 0.2)
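For concreteness, a minimal sketch of query-likelihood scoring with Jelinek-Mercer smoothing follows; the function and parameter names are illustrative, with λ = 0.2 matching the setup above.

```python
import math
from collections import Counter

def ql_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.2):
    """Query-likelihood score with Jelinek-Mercer background smoothing.

    p(w|d) = (1 - lam) * tf(w, d) / |d|  +  lam * cf(w) / |C|
    Returns the log-likelihood of the query terms given the document
    (here, a candidate question/answer text).
    """
    tf = Counter(doc_terms)
    dlen = len(doc_terms) or 1
    score = 0.0
    for w in query_terms:
        p = (1 - lam) * tf[w] / dlen + lam * collection_tf.get(w, 0) / collection_len
        if p > 0:
            score += math.log(p)
    return score
```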
Baseline Methods
• BasicYA:
  • Uses the question relevance ranking from Yahoo! Answers.
  • Returns the best answers only.
  • Search options:
    • BasicYA(s+c): question subject and content
    • BasicYA(b+s+c): best answer + question subject + content
• BasicQL:
  • Query likelihood model
  • BasicQL(s)
  • BasicQL(s+c)
Baseline Method
• NT [Jeon et al., 2006]:
  • qscore_nt(a) = p(good | a)
  • Estimated from 9 non-text features:
    • Proportion of best answers given by the answerer
    • Answer length
    • # stars given by the asker to the answer if it was selected as the best answer; zero otherwise
    • # answers the answerer has provided so far
    • # categories in which the answerer is declared a top contributor (capped at 3)
    • # times the answer is recommended by other users
    • # times the answer is dis-recommended by other users
    • # answers for the question associated with the answer
    • # points the answerer has received from answering, giving best answers, voting, and signing in
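A logistic-regression (maximum-entropy) classifier is a natural way to obtain p(good | a) from such features; the sketch below uses scikit-learn with made-up feature values, and the exact model and feature preprocessing in Jeon et al. may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative only: each row holds the 9 non-text features for one answer,
# in the order listed on this slide; y holds good (1) / bad (0) labels.
X = np.array([
    [0.4, 120, 3, 250, 1, 5, 0, 8, 1500],
    [0.1,  15, 0,  30, 0, 0, 2, 8,  200],
])
y = np.array([1, 0])

clf = LogisticRegression().fit(X, y)

def qscore_nt(features):
    """qscore_nt(a) = p(good | a) from the fitted classifier."""
    return clf.predict_proba(np.array([features]))[:, 1][0]
```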
QA Dataset
• Randomly selected 50 popular test questions in the Computers & Internet domain.
• For each test question, retrieved the top 20 questions and their best answers from Yahoo! Answers → 1000 answers.
• Annotators labelled each of the 1000 answers as good or bad quality; these labels were used to train the NT method.
• The 50 test questions were divided into:
  • Cat A (23): questions with ≥ 4 bad quality answers
  • Cat B (27): questions with < 4 bad quality answers
Steps to Construct the QA Dataset
[Diagram: construction pipeline starting from the 50 popular test questions]
Relevance and Quality Judgement
• 9 annotators divided into 3 groups.
• Pooled the top 20 answers for each test question across all methods → 8617 question/answer pairs.
• Each question/answer pair labelled as:
  • {relevant, irrelevant} to the test question
  • {good, bad} quality answer
• A label is accepted when ≥ 2 annotator groups agree.
Summary of Methods Used
[Table: the methods compared, annotated by whether they give little weight or no weight to asking expertise]
Evaluation of Methods
• Best-answers option vs all-answers option (the latter marked with *).
• The top 20 answers of each method are judged.
• Metrics (see the sketch after this list):
  • P_q@k: precision of quality at top k
  • P_r@k: precision of relevance at top k
  • P@k: precision of both quality and relevance at top k
  • k = 5, 10, 20
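The three metrics differ only in which annotator labels count as a hit; a minimal sketch, assuming labels keyed by answer id:

```python
def precision_at_k(ranked, labels, k, need_relevant=True, need_quality=True):
    """Precision over the top-k ranked answers.

    ranked -- answer ids in ranked order
    labels -- dict: id -> (is_relevant, is_good_quality) from the annotators
    P_r@k uses relevance only, P_q@k uses quality only, P@k requires both.
    """
    hits = 0
    for a in ranked[:k]:
        rel, qual = labels[a]
        if (rel or not need_relevant) and (qual or not need_quality):
            hits += 1
    return hits / k

# p_r = precision_at_k(ranked, labels, 5, need_quality=False)   # P_r@5
# p_q = precision_at_k(ranked, labels, 5, need_relevant=False)  # P_q@5
# p   = precision_at_k(ranked, labels, 5)                       # P@5
```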
Compare Basic and NT Methods
• BasicYA and BasicQL perform more poorly in Cat A → poor precision in quality.
• BasicQL(s) is generally better than the other Basic methods.
• NT is better than BasicQL(s) in Cat A.
• NT* is better than NT → the all-answers option helps.
Performance of Expertise Methods
• The answerer's asking expertise is important: σ = 0.8 is better than σ = 1 (which gives no weight to asking expertise).
• Question-dependent expertise is better than question-independent expertise.
• Peer expertise dependency is not essential.
• EX_QD and EX_QD' are the best methods:
  • Much better than NT in Cat A
  • Better than BasicQL in Cat B
Performance of Expertise Methods
• The all-answers option is better than the best-answer option: non-best answers can be of good quality.
• Results are consistent when a stricter judgement is imposed.
Conclusions
• Collaborative QA is a viable alternative to traditional QA.
• Quality is an essential criterion for ranking answers.
• Question-dependent expertise improves answer quality measurement.
• Possible extensions:
  • Questions/answers from other domains
  • Personalized answers vs best answers
Related Work
• Jeon et al., 2006: measurement of answer content quality.
• Jurczyk and Agichtein, 2007a, 2007b: proposed answering and asking expertise.
• Bian et al., 2008: combine content quality and relevance; user expertise not considered.
• Expert finding (e.g., Liu and Croft, 2005): find experts on a given topic by constructing user profiles from the answers those users have posted.