
Analyzing and Predicting Question Quality in Community Question Answering Services

Baichuan Li, Tan Jin, Michael R. Lyu, Irwin King, and Barley Mak. CQA2012, Lyon, France.


Presentation Transcript


  1. Analyzing and Predicting Question Quality in Community Question Answering Services • Baichuan Li, Tan Jin, Michael R. Lyu, Irwin King, and Barley Mak • CQA2012, Lyon, France

  2. Introduction

  3. Question Quality vs. • Number of tag-of-interests • Number of answers

  4. Motivation • Question quality affects answer quality • Low quality questions hinder the CQA services • High quality questions promote the development of the community • Identifying question quality facilitates question search and recommendation

  5. Outline • Problem Definition • Data • Two Studies • Factors Affecting Question Quality • Prediction of Question Quality • Discussion and Conclusion

  6. Problem Definition Figure 1. Construct of question quality in CQA

  7. Data Description Table 1. Summary of data in Entertainment & Music category and its subcategories

  8. Ground Truth • Table 2. Rule base for the ground truth setting • NTA: number of tag-of-interests + number of answers • RM: reciprocal of the minutes for getting the best answer • Table 3. Summary of questions in four levels
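
The two ground-truth signals defined on this slide can be computed as in the minimal sketch below. The function names and the timestamp-based input are assumptions made for illustration. The thresholds of the rule base in Table 2 are not preserved in this transcript, so the sketch stops at NTA and RM and does not assign quality levels.

```python
from datetime import datetime


def nta(num_tags_of_interest: int, num_answers: int) -> int:
    """NTA: number of tag-of-interests plus number of answers."""
    return num_tags_of_interest + num_answers


def rm(posted_at: datetime, best_answer_at: datetime) -> float:
    """RM: reciprocal of the minutes taken to get the best answer."""
    minutes = (best_answer_at - posted_at).total_seconds() / 60.0
    if minutes <= 0:
        # Assumption: a best answer must arrive strictly after the question.
        raise ValueError("best answer must come after the question")
    return 1.0 / minutes


# Hypothetical question: 3 tag-of-interests, 5 answers, best answer after 30 minutes.
example_nta = nta(3, 5)
example_rm = rm(datetime(2012, 1, 1, 12, 0), datetime(2012, 1, 1, 12, 30))
```

A faster best answer gives a larger RM, so both signals grow with the attention a question attracts.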

  9. Study One: Factors Affecting Question Quality • Possible Factors • Topics • Askers • Process • Select the two most popular subcategories (say, Music and Movies) and check their distributions of question quality • Track askers with at least five questions in both of these subcategories
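
The asker-tracking step can be sketched as follows. The record layout (dictionaries with `asker` and `subcategory` keys) and the function name are hypothetical, and the sketch reads "at least five questions in both" as at least five in each subcategory, which is an interpretation rather than something the slide states.

```python
from collections import Counter


def askers_in_both(questions, sub_a="Music", sub_b="Movies", min_questions=5):
    """Return askers with at least `min_questions` questions in each subcategory."""
    # Count questions per (asker, subcategory) pair; missing pairs count as 0.
    counts = Counter((q["asker"], q["subcategory"]) for q in questions)
    askers = {asker for asker, _ in counts}
    return sorted(
        a for a in askers
        if counts[(a, sub_a)] >= min_questions and counts[(a, sub_b)] >= min_questions
    )


# Hypothetical sample: u1 qualifies, u2 has too few Movies questions.
sample = (
    [{"asker": "u1", "subcategory": "Music"}] * 5
    + [{"asker": "u1", "subcategory": "Movies"}] * 6
    + [{"asker": "u2", "subcategory": "Music"}] * 7
    + [{"asker": "u2", "subcategory": "Movies"}] * 2
)
selected = askers_in_both(sample)
```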

  10. Observations Table 4. Summary of question quality for different askers

  11. Observations

  12. Study Two: Prediction of Question Quality • Modeling the relationships among questions, topics and askers as a bipartite graph • Figure labels: Question Quality, Asking Expertise

  13. Mutual Reinforcement Label Propagation for Predicting Question Quality

  14. MRLP • Figure labels: similar users’ asking expertise, question quality, asking expertise, similar questions’ quality
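
The slide names four interacting quantities but does not give MRLP's update rule. The sketch below is therefore a generic coupled label propagation, not the authors' algorithm. It assumes that a question's score mixes its neighbors' scores with its asker's expertise, and that an asker's expertise mixes similar askers' expertise with the mean score of that asker's own questions. The mixing weight `alpha`, the neutral start value 0.5, and all names are assumptions.

```python
def mat_vec(matrix, vec):
    """Multiply a row-stochastic matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def mutual_reinforcement(Tq, Ta, asker_of, labels, alpha=0.5, iters=50):
    """Propagate question scores and asker expertise until `iters` rounds pass.

    Tq, Ta   : row-stochastic transition matrices for questions and askers
    asker_of : asker_of[i] is the asker index of question i
    labels   : {question index: known quality in [0, 1]}, clamped every round
    """
    n, m = len(Tq), len(Ta)
    q = [labels.get(i, 0.5) for i in range(n)]  # unlabeled questions start neutral
    a = [0.5] * m
    for _ in range(iters):
        q_nbr = mat_vec(Tq, q)  # similar questions' quality
        a_nbr = mat_vec(Ta, a)  # similar askers' expertise
        sums, counts = [0.0] * m, [0] * m
        for i, u in enumerate(asker_of):
            sums[u] += q[i]
            counts[u] += 1
        a_own = [sums[u] / counts[u] if counts[u] else a[u] for u in range(m)]
        q = [alpha * q_nbr[i] + (1 - alpha) * a[asker_of[i]] for i in range(n)]
        a = [alpha * a_nbr[u] + (1 - alpha) * a_own[u] for u in range(m)]
        for i, y in labels.items():
            q[i] = y  # keep the ground-truth labels fixed
    return q, a


# Hypothetical toy graph: questions 0-1 share a topic, questions 2-3 share another.
Tq = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
Ta = [[0, 1], [1, 0]]
quality, expertise = mutual_reinforcement(Tq, Ta, [0, 0, 1, 1], {0: 1.0, 2: 0.0})
```

In this toy graph the unlabeled question next to the high-quality label ends above 0.5 and the one next to the low-quality label ends below it.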

  15. Data for Study Two

  16. Methods Comparison • Logistic Regression • LG_Q and LG_QA • Stochastic Gradient Boosted Tree (Friedman, J. H., 1999) • SGBT_Q and SGBT_QA • Harmonic Function (Zhou et al., 2007) • HF_Q and HF_QA

  17. Experimental Results: Accuracy

  18. Sensitivity & Specificity • Sensitivity measures the algorithm’s ability to identify high quality questions: Sensitivity = TP/(TP+FN) • Specificity measures the algorithm’s ability to identify low quality questions: Specificity = TN/(TN+FP)
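
A minimal sketch of the two measures, assuming binary labels in which 1 marks a high quality question and 0 a low quality one. That encoding, the function name, and the toy data are illustrative choices.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # high quality, correctly identified
            else:
                fn += 1  # high quality, missed
        else:
            if pred == positive:
                fp += 1  # low quality, wrongly flagged as high
            else:
                tn += 1  # low quality, correctly identified
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical predictions for eight questions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both values matters when the classes are imbalanced, since accuracy alone can hide a model that rarely predicts the minority class.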

  19. Experimental Results: Music

  20. Experimental Results: Movies

  21. Discussion • MRLP is more effective than state-of-the-art methods at distinguishing high quality questions from low quality ones • At present, neither MRLP nor the other methods achieve satisfactory performance, owing to the features used

  22. Discussion • Salient features? • User study via crowdsourcing systems

  23. Conclusion • Define Question Quality in CQA • Conduct two studies to investigate question quality in CQA services • Analyze the factors influencing question quality • Propose a mutual reinforcement-based label propagation algorithm to predict question quality • Future Work • Explore more salient features • Utilize question quality to improve question search and question recommendation

  24. Thank You! Q&A

  25. Data Description • 238,549 resolved questions under the Entertainment & Music category of Yahoo! Answers • Question Features • Text, post time, etc. • Asker Features • Total points, No. of questions asked, No. of questions resolved, etc.

  26. MRLP • n × n probabilistic transition matrix • For the question part of the bipartite graph, we create edges between any two questions within the same topic [equation not preserved in transcript] • For the asker part of the bipartite graph, we generate the probabilistic transition matrix M similarly.
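
One way to build such a matrix is sketched below. The edge-weight formula from the slide is not preserved, so the sketch assumes uniform weights on same-topic edges, row normalization so that each row sums to 1, and a self-loop for a question with no same-topic neighbor. All three choices, and the function name, are assumptions.

```python
def transition_matrix(topics):
    """Build an n x n row-stochastic matrix from per-question topic labels."""
    n = len(topics)
    rows = []
    for i in range(n):
        # Edge between questions i and j (i != j) when they share a topic.
        weights = [
            1.0 if i != j and topics[i] == topics[j] else 0.0
            for j in range(n)
        ]
        total = sum(weights)
        if total == 0:
            # Assumption: an isolated question keeps all mass on itself.
            weights[i] = 1.0
            total = 1.0
        rows.append([w / total for w in weights])
    return rows


# Hypothetical topics for four questions.
T = transition_matrix(["rock", "rock", "jazz", "rock"])
```

The same construction would apply to the asker side if askers were linked by a comparable similarity relation.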
