Automatic Question Generation from Queries
Natural Language Computing, Microsoft Research Asia
Chin-Yew LIN cyl@microsoft.com
Generating Questions from Queries
Query: "Hannah Montana concert" → Q2Q → Question: "Where is the next Hannah Montana concert?"
Q2Q as a question generation shared task
Remember Ask Jeeves? “How large is British Columbia?”
Naver Knowledge iN (Korea) Naver "Knowledge iN" service • Launched in October 2002 • 70 million Knowledge iN DB entries collected (June 2007) • # of users: 12 million • Upper-level users (higher than Kosu): 6,648 (0.05%) • Distribution of knowledge • Education, Learning: 17.78% • Computer, Communication: 12.89% • Entertainment, Arts: 11.42% • Business, Economy: 11.42% • Home, Life: 7.44%
Baidu Zhidao (China) 17,012,767 resolved questions in two years of operation. 8,921,610 are knowledge related. 96.7% of questions are resolved. 10,000,000 daily visitors. 71,308 new questions per day. 3.14 answers per question. http://www.searchlab.com.cn (Research on Chinese Users' Search Behavior / User Research Lab of Chinese Search)
Yahoo! Answers (Global; Marciniak) Launched in December 2005. 20 million users in the U.S. (> 90 million worldwide). 33,557,437 resolved questions (US; April 2008). ~70,000* new questions per day (US). 6.76* answers per question (US).
Question Taxonomy ISI’s question answer typology (Hovy et al. 2001 & 2002) • Results of analyzing over 20K online questions • 140 different question types with examples • http://www.isi.edu/natural-language/projects/webclopedia/Taxonomy/taxonomy_toplevel.html Liu et al. (COLING 2008)’s cQA question taxonomy • Derived from Broder’s (SIGIR Forum 2002) web search taxonomy • Results of analyzing 100 randomly sampled questions from top 4 Yahoo! Answers categories • Entertainment & Music, Society & Culture, Health, and Computer & Internet
Main Task: Q2Q Generate questions given a query • Query: “Hannah Montana concert” • Questions: • “How do I get Hannah Montana concert tickets for a really good price?” • “What should i wear to a hannah montana concert?” • “How long is the Hannah Montana concert?” • … Subtasks • Predict user goals • Learn question templates • Normalize questions
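The "learn question templates" subtask can be sketched as slot-filling: each template keeps a query slot that gets instantiated with the input query. A minimal sketch, assuming a pre-learned template list (the templates and the slot name here are illustrative, not from the talk):

```python
# Hypothetical templates, each with a {q} slot for the query string.
TEMPLATES = [
    "How do I get {q} tickets for a really good price?",
    "What should I wear to a {q}?",
    "How long is the {q}?",
]

def generate_questions(query):
    """Instantiate every template with the query (the Q2Q generation step)."""
    return [t.format(q=query) for t in TEMPLATES]

for question in generate_questions("Hannah Montana concert"):
    print(question)
```

A real system would also rank the generated questions by predicted user goals, as the subtask list suggests.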
Data Preparation • cQA archives • Live Search QnA • Yahoo! Answers • Ask.com • Other sources • Query logs • MSN/Live Search • Yahoo! • Ask.com • TREC and other sources • Possible process • Sample queries from search engine query logs • Ensure broad topic coverage • Find candidate questions from cQA archives given queries • Create mapped Q2Q corpus for training and testing
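The "find candidate questions from cQA archives given queries" step could be approximated with a simple term-containment filter; the matching criterion below is an assumption for illustration, not the talk's method:

```python
def candidate_questions(query, cqa_questions):
    """Return archive questions that contain every term of the query
    (a naive bag-of-words filter standing in for real matching)."""
    terms = set(query.lower().split())
    return [q for q in cqa_questions
            if terms <= set(q.lower().replace("?", "").split())]

archive = [
    "How long is the Hannah Montana concert?",
    "Where can I buy concert tickets?",
]
print(candidate_questions("hannah montana concert", archive))
```

Pairing each sampled query with its surviving candidates yields the mapped Q2Q corpus for training and testing.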
Intrinsic Evaluation Given a query term • Generate a rank list of questions related to the query term • Open set – use pooling approach • Pool all questions from participants • Rate each question as relevant or not • Compute recall/precision/F1 scores • Closed set – use test set data as gold standard • Metrics • Diversity, interestingness, utility, and so on.
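The closed-set scoring above reduces to set overlap between generated and gold-standard questions. A sketch using exact string match (a real evaluation would presumably normalize questions first, per the normalization subtask):

```python
def prf1(generated, gold):
    """Precision, recall, and F1 of generated questions vs. a gold set."""
    generated, gold = set(generated), set(gold)
    tp = len(generated & gold)                      # relevant questions found
    p = tp / len(generated) if generated else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = prf1(
    generated={"how long is the concert", "where is the concert"},
    gold={"how long is the concert", "what should i wear"},
)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.5 0.5
```

Under the open-set pooling approach, `gold` would instead be the pooled questions judged relevant by assessors.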
Extrinsic Evaluation A straw man scenario • Task – online information seeking • Setup • A user selects a topic (T) she is interested in. • Generate a set of N queries given T and a query log. • The user selects a query (q) from the set. • Generate a set of M questions given q. • The user selects the question (Q) that she has in mind. • If the user does not select any question, record the trial as unsuccessful. • Send q to a search engine (S); get results X. • Send q, Q, and anything inferred from Q to S; get results Y. • Compare results X and Y using standard IR relevance metrics.
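The final comparison of result lists X and Y can use any standard IR metric; precision at k is the simplest. A sketch with made-up document IDs and relevance judgments (all assumptions for illustration):

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranked_results[:k] if doc in relevant) / k

relevant = {"d1", "d3", "d5"}                # hypothetical judgments
X = ["d2", "d1", "d4", "d6", "d3"]           # results for query q alone
Y = ["d1", "d3", "d5", "d2", "d4"]           # results for q plus question Q
print(precision_at_k(X, relevant, 5))        # 0.4
print(precision_at_k(Y, relevant, 5))        # 0.6
```

If question-augmented retrieval (Y) consistently beats query-only retrieval (X) on such metrics, the extrinsic evaluation counts Q2Q as useful.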
Summary Task: Question generation from queries Data: • Search engine query logs • cQA question answer archives • Question taxonomies Evaluation: • Intrinsic – evaluate specific technology areas • Extrinsic – evaluate its effect on real world scenarios Real data, real task, and real impact