This paper presents a classification-based method for detecting question posts and answer posts in discussion boards. It uses N-grams together with several non-content features, compares against graph-based and language-model approaches, and explores efficient mechanisms for identifying question-related threads and detecting answers. The study analyzes data from two large discussion boards and proposes a ranking scheme that combines several features, improving performance over traditional relevance-based methods. The findings contribute to an understanding of how question answering works within online forums.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009
Outline • Introduction • Related Work • Problem Definition • Classification Methods • Experiments • Conclusion
Introduction • Online users share ideas, discuss issues and form communities within discussion boards (online forums) • Knowledge discovery and information extraction • Several potential applications of mining QA content: • Search engines • Online QA services • Experts in social media • Knowledge base for automatic chat-bots
Related Work • Cong et al., 2008 • They developed a classification-based method for question detection • Sequential pattern features extracted from both questions and non-questions in forums • Preprocessing applies a POS tagger while keeping 5W1H and modal words • Drawback: time-consuming • Focus on question sentences or question paragraphs
Related Work(cont’d) • Knowledge acquisition from discussion boards • Zhou and Hovy, 2005 • Feng et al., 2006 • Using non-textual features like click count to predict the quality of answers • Jeon et al., 2006 • In general, none of this related work needs to detect questions
Tasks • Tasks: • Identifying question-related first posts • Finding potential answers in subsequent responses within the corresponding threads • Some questions…
Tasks(cont’d) • Some questions: • Can we detect question-related threads in an efficient and effective manner? • What other features can be used to improve the performance? • How much can combinations of some simple heuristics improve performance? • Are traditional relevance-based approaches suitable for this QA content?
Problem Definition • Questions • Focus on finding whether the first post is a question post • Treat the whole post as a question post:
Problem Definition(cont’d) • Answers • If one of the reply posts contains answers to the questions raised in the first post, then regard that reply as an answer post • Also treat a reply that does not contain the actual answer content but provides links to other potential answers as an answer post • Result from the system: Question-answer post pairs
Classification Methods(1/3) • NTU CSIE LIBSVM 2.88 • Question detection: • Question mark • 5W1H words • Total number of posts within one thread • Authorship • N-gram
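The question-detection features listed above could be assembled roughly as in the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `question_features`, the 5W1H word list, the choice of word bigrams for the N-gram feature, and the reading of "authorship" as the number of replies written by the thread starter are all hypothetical. The resulting feature dictionaries would still need to be converted into the sparse vector format an SVM package such as LIBSVM expects.

```python
import re

# Assumed 5W1H word list; the slides do not enumerate the exact words used.
FIVE_W_ONE_H = {"what", "when", "where", "who", "why", "how"}


def question_features(first_post, num_posts, starter_replies):
    """Build a feature dict for the first post of a thread (illustrative only)."""
    tokens = re.findall(r"[a-z0-9']+", first_post.lower())
    features = {
        # Number of question marks in the post.
        "question_marks": first_post.count("?"),
        # Number of 5W1H words in the post.
        "5w1h_count": sum(1 for t in tokens if t in FIVE_W_ONE_H),
        # Total number of posts within the thread.
        "num_posts": num_posts,
        # Authorship, read here (an assumption) as replies by the thread starter.
        "starter_replies": starter_replies,
    }
    # Word bigrams as sparse binary features (the N in N-gram is assumed to be 2).
    for first, second in zip(tokens, tokens[1:]):
        features["2gram=" + first + " " + second] = 1
    return features


feats = question_features("How do I mount a USB drive? Why does it fail?", 5, 2)
print(feats["question_marks"], feats["5w1h_count"])
```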
Classification Methods(2/3) • Answer detection • The position of the answer post • Authorship • N-gram • Stop words • Query likelihood model score
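A comparable sketch for the answer-detection features is shown below. The names `answer_features` and `STOP_WORDS`, the tiny stop-word list, the use of a stop-word ratio, and the encoding of authorship as "was the reply written by the question author" are assumptions made for illustration; the slides only name the feature families. The query likelihood score would be added as one more numeric entry.

```python
import re

# Small assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "it", "and", "you"}


def answer_features(reply_text, position, reply_author, question_author):
    """Build a feature dict for one reply post (illustrative only)."""
    tokens = re.findall(r"[a-z0-9']+", reply_text.lower())
    stop_count = sum(1 for t in tokens if t in STOP_WORDS)
    features = {
        # Position of the reply within the thread (1 = first reply).
        "position": position,
        # Authorship: 1 if the question author wrote this reply (assumed encoding).
        "same_author_as_question": int(reply_author == question_author),
        # Share of tokens that are stop words (assumed form of the feature).
        "stop_word_ratio": stop_count / len(tokens) if tokens else 0.0,
    }
    # Word unigrams as sparse binary features.
    for token in tokens:
        features["1gram=" + token] = 1
    return features


print(answer_features("Run the command in a terminal", 1, "bob", "alice"))
```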
Classification Methods(3/3) • Cong et al., 2008 • Sequential pattern mining • Graph-based model • Query likelihood language model • KL-divergence language model
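For readers unfamiliar with the two language-model baselines, here is a minimal Python sketch of how a reply could be scored against a question. It assumes unigram models with Dirichlet smoothing and a default of `mu=2000`, neither of which is stated in the slides; the function names and the whitespace tokenizer are likewise hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def smoothed_prob(term, doc_tf, doc_len, coll_tf, coll_len, mu):
    """Dirichlet-smoothed P(term | document); the smoothing choice is an assumption."""
    p_coll = coll_tf[term] / coll_len if coll_len else 0.0
    return (doc_tf[term] + mu * p_coll) / (doc_len + mu)


def query_likelihood(question, reply, collection, mu=2000.0):
    """Log P(question | reply): the question plays the role of the query."""
    reply_tf, coll_tf = Counter(tokenize(reply)), Counter(tokenize(collection))
    reply_len, coll_len = sum(reply_tf.values()), sum(coll_tf.values())
    score = 0.0
    for term in tokenize(question):
        p = smoothed_prob(term, reply_tf, reply_len, coll_tf, coll_len, mu)
        if p > 0.0:  # skip terms unseen in the whole collection to avoid log(0)
            score += math.log(p)
    return score


def negative_kl(question, reply, collection, mu=2000.0):
    """-KL(question model || reply model), summed over the question's terms."""
    q_tf = Counter(tokenize(question))
    q_len = sum(q_tf.values())
    reply_tf, coll_tf = Counter(tokenize(reply)), Counter(tokenize(collection))
    reply_len, coll_len = sum(reply_tf.values()), sum(coll_tf.values())
    total = 0.0
    for term, count in q_tf.items():
        p_q = count / q_len  # maximum-likelihood question model
        p_d = smoothed_prob(term, reply_tf, reply_len, coll_tf, coll_len, mu)
        if p_d > 0.0:
            total += p_q * math.log(p_d / p_q)
    return total


replies = ["install the wifi driver", "nice photo thanks"]
collection = " ".join(replies)
for r in replies:
    print(r, query_likelihood("wifi driver", r, collection))
```

Both functions return higher (less negative) values for replies whose vocabulary overlaps more with the question, which is the relevance signal these baselines rely on.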
Experiments(1/9) • Data crawled • 555,954 threads from the Ubuntu dataset • 721,422 threads from Photography On The Net • Question detection task: • Randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC dataset • Answer detection task: • Randomly sampled 500 question-related threads from both datasets
Experiments(8/9) • Propose a ranking scheme • Ranking score: V1: position + authorship, V2: position, V3: authorship
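The exact ranking formula does not survive in this transcript, so the following Python sketch only illustrates the general idea behind the three variants (V1: position + authorship, V2: position only, V3: authorship only). The scoring rule is an assumption: earlier replies receive a higher position score (`1 / position`), and replies written by someone other than the question author receive an authorship bonus of 1. The function `rank_replies` and its parameters are hypothetical.

```python
def rank_replies(replies, question_author, use_position=True, use_authorship=True):
    """Rank (position, author) replies; positions start at 1 (illustrative only)."""

    def score(reply):
        position, author = reply
        total = 0.0
        if use_position:
            # Assumed: earlier replies are more likely to be answers.
            total += 1.0 / position
        if use_authorship:
            # Assumed: replies by the question author are unlikely to be answers.
            total += 0.0 if author == question_author else 1.0
        return total

    # sorted() is stable, so ties keep their original thread order.
    return sorted(replies, key=score, reverse=True)


thread = [(1, "alice"), (2, "bob"), (3, "carol")]
print(rank_replies(thread, "alice"))                        # V1-style
print(rank_replies(thread, "alice", use_authorship=False))  # V2-style
print(rank_replies(thread, "alice", use_position=False))    # V3-style
```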
Conclusion • Use of N-grams and the combination of several non-content features can improve the performance • Relevance-based retrieval methods alone would not be effective in tackling the problem, but the performance can be improved by combining them with non-content features • Designed a simple ranking scheme that outperforms previous approaches
• Combine several potential answers together to make a better answer? • A good understanding of the interaction of question answering in discussion boards