Quality-aware Collaborative Question Answering: Methods and Evaluation
Maggy Anastasia Suryanto, Ee-Peng Lim (Singapore Management University)
Aixin Sun (Nanyang Technological University)
Roger H.L. Chiang (University of Cincinnati)
WSDM 2009, Barcelona, Spain
Outline
• Motivation and objectives
• Quality-aware QA framework
• Expertise-based methods
• Experimental setup
• Results
• Conclusions
Collaborative Question Answering
• Finding answers to questions using community QA portals
[Diagram: a community QA portal consisting of a question interface, an answer interface, a search engine, and a question-and-answer database]
Collaborative QA
• Simple idea: use the search engine provided by community QA portals.
• Limitations:
  • Assumes that related questions are already available in the portal.
  • Search engines do not guarantee answer relevance or quality.
  • Users can vote for best answers, but votes are unreliable.
  • Users may not be experts.
• Collaborative QA therefore needs to address quality issues (the answer quality problem).
Research Objectives
• Develop methods to find good answers for a given question using the QA database of a community QA portal
• Benefits:
  • Better answers compared with traditional QA methods
  • Fewer duplicate questions
Quality-Aware Framework
[Diagram: a good answer must be both a relevant answer and a quality answer; answer quality is determined by content quality and user expertise]
Quality-Aware Framework
• Overall score: score(q, a) = rscore(q, a) · qscore_<model>([q,] a), where rscore is the answer relevance score and qscore is the answer quality score (question-dependent for some models)
[Diagram: a user question q is submitted to the QA portal's search engine, which returns candidate answers and their questions from the QA database; the system computes the answer relevance score (rscore) and the answer quality score (qscore), then selects answers by the overall score]
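The final ranking step of the framework can be shown with a minimal sketch. The helper names rscore and qscore are placeholders for whichever relevance and quality models are plugged in; they are not identifiers from the paper.

```python
# Minimal sketch of the quality-aware ranking step:
# score(q, a) = rscore(q, a) * qscore(q, a).

def rank_answers(question, candidates, rscore, qscore):
    """Rank candidate answers by overall score.

    question   -- the user question q
    candidates -- candidate answers retrieved from the QA database
    rscore     -- callable (q, a) -> answer relevance score
    qscore     -- callable (q, a) -> answer quality score
    """
    scored = [(rscore(question, a) * qscore(question, a), a) for a in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored]
```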
Expertise-based Methods
[Diagram: a quality answer depends on answer content quality (the NT method [Jeon et al., 2006]) and user expertise, which splits into asking expertise and answering expertise; the expertise methods (EXHITS, EXHITS_QD, EX_QD, EX_QD') differ in whether expertise is question dependent and whether it assumes peer expertise dependency]
Question Independent Expertise
• EXHITS [Jurczyk and Agichtein, 2007a, 2007b]:
  • Expert askers have their questions answered by expert answerers.
  • Expert answerers answer questions posted by expert askers.
  • Asking and answering expertise reinforce each other; content quality is not considered.
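The mutual reinforcement can be illustrated with a HITS-style iteration over the asker-answerer graph. This is a minimal sketch of the idea, assuming one (asker, answerer) edge per answered question; it is not the exact update rule of Jurczyk and Agichtein.

```python
import math

def exhits(qa_pairs, iterations=50):
    """HITS-style mutual reinforcement over the asker-answerer graph.

    qa_pairs -- list of (asker, answerer) pairs, one per answered question.
    Returns (asking_expertise, answering_expertise) dicts.
    """
    askers = {u for u, _ in qa_pairs}
    answerers = {v for _, v in qa_pairs}
    ask = {u: 1.0 for u in askers}
    ans = {v: 1.0 for v in answerers}

    for _ in range(iterations):
        # Answering expertise grows by answering questions of expert askers.
        ans = {v: 0.0 for v in answerers}
        for u, v in qa_pairs:
            ans[v] += ask[u]
        # Asking expertise grows by having questions answered by expert answerers.
        ask = {u: 0.0 for u in askers}
        for u, v in qa_pairs:
            ask[u] += ans[v]
        # L2-normalize to keep scores bounded, as in standard HITS.
        norm_ans = math.sqrt(sum(s * s for s in ans.values())) or 1.0
        ans = {v: s / norm_ans for v, s in ans.items()}
        norm_ask = math.sqrt(sum(s * s for s in ask.values())) or 1.0
        ask = {u: s / norm_ask for u, s in ask.items()}
    return ask, ans
```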
Question Dependent Expertise
• EXHITS_QD:
  • Expert askers have q-related questions with good answers posted by expert answerers.
  • Expert answerers post good answers to q-related questions from expert askers.
  • Combines answer content quality and answer relevance in the expertise computation.
Question Dependent Expertise
• EX_QD:
  • Non-peer-expertise-dependent counterpart of EXHITS_QD.
  • Expert askers ask many q-related questions that attract many good answers.
Question Dependent Expertise
• EX_QD':
  • EX_QD without using answer quality to measure asker expertise.
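The non-peer, question-dependent variants can be read as a direct aggregation over the QA archive. The sketch below is one plausible reading of the EX_QD idea, under stated assumptions: relevance and quality are placeholder callables, and the paper's exact aggregation may differ.

```python
def ex_qd_expertise(q, archive, relevance, quality):
    """Illustrative question-dependent expertise scores (EX_QD flavour).

    archive   -- list of (asker, answerer, question, answer) records
    relevance -- callable (q, question) -> relatedness of a past question to q
    quality   -- callable (answer) -> content-quality score of an answer
    Returns (asking_expertise, answering_expertise) dicts for question q.
    """
    ask, ans = {}, {}
    for asker, answerer, question, answer in archive:
        contribution = relevance(q, question) * quality(answer)
        # Askers earn expertise when their q-related questions attract
        # good answers. (EX_QD' would use relevance(q, question) alone
        # here, dropping the quality factor from asker expertise.)
        ask[asker] = ask.get(asker, 0.0) + contribution
        # Answerers earn expertise by posting good answers to q-related questions.
        ans[answerer] = ans.get(answerer, 0.0) + contribution
    return ask, ans
```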
Experimental Setup
• Answer relevance computed in two ways:
  • Yahoo! Answers search engine
  • Query likelihood retrieval model with Jelinek-Mercer background smoothing (λ = 0.2)
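For concreteness, a minimal sketch of query-likelihood scoring with Jelinek-Mercer smoothing follows; the function and parameter names are illustrative, with λ = 0.2 matching the setup above.

```python
import math
from collections import Counter

def ql_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.2):
    """Query-likelihood score with Jelinek-Mercer background smoothing.

    p(w|d) = (1 - lam) * tf(w, d) / |d|  +  lam * cf(w) / |C|
    Returns the log-likelihood of the query terms given the document
    (here, a candidate question/answer text).
    """
    tf = Counter(doc_terms)
    dlen = len(doc_terms) or 1
    score = 0.0
    for w in query_terms:
        p = (1 - lam) * tf[w] / dlen + lam * collection_tf.get(w, 0) / collection_len
        if p > 0:
            score += math.log(p)
    return score
```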
Baseline Methods
• BasicYA:
  • Uses the question relevance ranking from Yahoo! Answers.
  • Returns the best answers only.
  • Search options:
    • BasicYA(s+c): question subject and content
    • BasicYA(b+s+c): best answer + question subject + content
• BasicQL:
  • Query likelihood model
  • BasicQL(s)
  • BasicQL(s+c)
Baseline Method
• NT [Jeon et al., 2006]:
  • qscore_nt(a) = p(good | a)
  • Estimated from 9 non-text features:
    • Proportion of best answers given by the answerer
    • Answer length
    • # stars given by the asker to the answer if it was selected as the best answer; zero otherwise
    • # answers the answerer has provided so far
    • # categories in which the answerer is declared a top contributor (capped at 3)
    • # times the answer is recommended by other users
    • # times the answer is dis-recommended by other users
    • # answers for the question associated with the answer
    • # points the answerer has received from answering, giving best answers, voting, and signing in
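A logistic-regression (maximum-entropy) classifier is a natural way to obtain p(good | a) from such features; the sketch below uses scikit-learn with made-up feature values, and the exact model and feature preprocessing in Jeon et al. may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative only: each row holds the 9 non-text features for one answer,
# in the order listed on this slide; y holds good (1) / bad (0) labels.
X = np.array([
    [0.4, 120, 3, 250, 1, 5, 0, 8, 1500],
    [0.1,  15, 0,  30, 0, 0, 2, 8,  200],
])
y = np.array([1, 0])

clf = LogisticRegression().fit(X, y)

def qscore_nt(features):
    """qscore_nt(a) = p(good | a) from the fitted classifier."""
    return clf.predict_proba(np.array([features]))[:, 1][0]
```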
QA Dataset
• Randomly selected 50 popular test questions in the Computers & Internet domain.
• For each test question, retrieved the top 20 questions and their best answers from Yahoo! Answers → 1000 answers.
• Annotators labelled each of the 1000 answers as good or bad quality; these labels were used to train the NT method.
• The 50 test questions were divided into:
  • Cat A (23): questions with ≥ 4 bad quality answers
  • Cat B (27): questions with < 4 bad quality answers
Steps to Construct the QA Dataset
[Diagram: construction pipeline starting from the 50 popular test questions]
Relevance and Quality Judgement
• 9 annotators divided into 3 groups.
• Pooled the top 20 answers for each test question across all methods → 8617 question/answer pairs.
• Each question/answer pair labelled as:
  • {relevant, irrelevant} to the test question
  • {good, bad} quality answer
• A label is accepted when ≥ 2 annotator groups agree.
Summary of Methods Used
[Table: the methods compared, annotated by whether they give little weight or no weight to asking expertise]
Evaluation of Methods
• Best-answers option vs all-answers option (the latter marked with *).
• The top 20 answers of each method are judged.
• Metrics (see the sketch after this list):
  • P_q@k: precision of quality at top k
  • P_r@k: precision of relevance at top k
  • P@k: precision of both quality and relevance at top k
  • k = 5, 10, 20
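The three metrics differ only in which annotator labels count as a hit; a minimal sketch, assuming labels keyed by answer id:

```python
def precision_at_k(ranked, labels, k, need_relevant=True, need_quality=True):
    """Precision over the top-k ranked answers.

    ranked -- answer ids in ranked order
    labels -- dict: id -> (is_relevant, is_good_quality) from the annotators
    P_r@k uses relevance only, P_q@k uses quality only, P@k requires both.
    """
    hits = 0
    for a in ranked[:k]:
        rel, qual = labels[a]
        if (rel or not need_relevant) and (qual or not need_quality):
            hits += 1
    return hits / k

# p_r = precision_at_k(ranked, labels, 5, need_quality=False)   # P_r@5
# p_q = precision_at_k(ranked, labels, 5, need_relevant=False)  # P_q@5
# p   = precision_at_k(ranked, labels, 5)                       # P@5
```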
Compare Basic and NT Methods
• BasicYA and BasicQL perform more poorly in Cat A → poor precision in quality.
• BasicQL(s) is generally better than the other Basic methods.
• NT is better than BasicQL(s) in Cat A.
• NT* is better than NT → the all-answers option helps.
Performance of Expertise Methods
• The answerer's asking expertise is important: σ = 0.8 is better than σ = 1 (which gives no weight to asking expertise).
• Question-dependent expertise is better than question-independent expertise.
• Peer expertise dependency is not essential.
• EX_QD and EX_QD' are the best methods:
  • Much better than NT in Cat A
  • Better than BasicQL in Cat B
Performance of Expertise Methods
• The all-answers option is better than the best-answer option: non-best answers can be of good quality.
• Results are consistent when a stricter judgement is imposed.
Conclusions
• Collaborative QA is a viable alternative to traditional QA.
• Quality is an essential criterion for ranking answers.
• Question-dependent expertise improves answer quality measurement.
• Possible extensions:
  • Questions/answers from other domains
  • Personalized answers vs best answers
Related Work
• Jeon et al., 2006: measurement of answer content quality.
• Jurczyk and Agichtein, 2007a, 2007b: proposed answering and asking expertise.
• Bian et al., 2008: combine content quality and relevance; user expertise not considered.
• Expert finding (e.g., Liu and Croft, 2005): find experts on a given topic by constructing user profiles from the answers those users have posted.