Automatic Question Generation from Queries
Natural Language Computing, Microsoft Research Asia
Chin-Yew LIN cyl@microsoft.com
Generating Questions from Queries
Query: "Hannah Montana concert" → Q2Q → Question: "Where is the next Hannah Montana concert?"
Q2Q as a question generation shared task
Remember Ask Jeeves? “How large is British Columbia?”
Naver Knowledge iN (Korea) Naver "Knowledge iN" service • Launched in October 2002 • 70 million Knowledge iN DB entries collected (June 2007) • # of users: 12 million • Upper-level users (higher than Kosu): 6,648 (0.05%) • Distribution of knowledge • Education, Learning: 17.78% • Computer, Communication: 12.89% • Entertainment, Arts: 11.42% • Business, Economy: 11.42% • Home, Life: 7.44%
Baidu Zhidao (China) 17,012,767 resolved questions in two years of operation. 8,921,610 are knowledge related. 96.7% of questions are resolved. 10,000,000 daily visitors. 71,308 new questions per day. 3.14 answers per question. http://www.searchlab.com.cn (Research on Chinese Users' Search Behavior / User Research Lab of Chinese Search)
Yahoo! Answers (Global; Marciniak) Launched in December 2005. 20 million users in the U.S. (> 90 million worldwide). 33,557,437 resolved questions (US; April 2008). ~70,000* new questions per day (US). 6.76* answers per question (US).
Question Taxonomy ISI’s question answer typology (Hovy et al. 2001 & 2002) • Results of analyzing over 20K online questions • 140 different question types with examples • http://www.isi.edu/natural-language/projects/webclopedia/Taxonomy/taxonomy_toplevel.html Liu et al. (COLING 2008)’s cQA question taxonomy • Derived from Broder’s (SIGIR Forum 2002) web search taxonomy • Results of analyzing 100 randomly sampled questions from top 4 Yahoo! Answers categories • Entertainment & Music, Society & Culture, Health, and Computer & Internet
Main Task: Q2Q Generate questions given a query • Query: “Hannah Montana concert” • Questions: • “How do I get Hannah Montana concert tickets for a really good price?” • “What should i wear to a hannah montana concert?” • “How long is the Hannah Montana concert?” • … Subtasks • Predict user goals • Learn question templates • Normalize questions
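The "learn question templates" subtask can be sketched as slot-filling: each template keeps a query slot that gets instantiated with the input query. A minimal sketch, assuming a pre-learned template list (the templates and the slot name here are illustrative, not from the talk):

```python
# Hypothetical templates, each with a {q} slot for the query string.
TEMPLATES = [
    "How do I get {q} tickets for a really good price?",
    "What should I wear to a {q}?",
    "How long is the {q}?",
]

def generate_questions(query):
    """Instantiate every template with the query (the Q2Q generation step)."""
    return [t.format(q=query) for t in TEMPLATES]

for question in generate_questions("Hannah Montana concert"):
    print(question)
```

A real system would also rank the generated questions by predicted user goals, as the subtask list suggests.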
Data Preparation • cQA archives • Live Search QnA • Yahoo! Answers • Ask.com • Other sources • Query logs • MSN/Live Search • Yahoo! • Ask.com • TREC and other sources • Possible process • Sample queries from search engine query logs • Ensure broad topic coverage • Find candidate questions from cQA archives given queries • Create mapped Q2Q corpus for training and testing
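The "find candidate questions from cQA archives given queries" step could be approximated with a simple term-containment filter; the matching criterion below is an assumption for illustration, not the talk's method:

```python
def candidate_questions(query, cqa_questions):
    """Return archive questions that contain every term of the query
    (a naive bag-of-words filter standing in for real matching)."""
    terms = set(query.lower().split())
    return [q for q in cqa_questions
            if terms <= set(q.lower().replace("?", "").split())]

archive = [
    "How long is the Hannah Montana concert?",
    "Where can I buy concert tickets?",
]
print(candidate_questions("hannah montana concert", archive))
```

Pairing each sampled query with its surviving candidates yields the mapped Q2Q corpus for training and testing.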
Intrinsic Evaluation Given a query term • Generate a rank list of questions related to the query term • Open set – use pooling approach • Pool all questions from participants • Rate each question as relevant or not • Compute recall/precision/F1 scores • Closed set – use test set data as gold standard • Metrics • Diversity, interestingness, utility, and so on.
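The closed-set scoring above reduces to set overlap between generated and gold-standard questions. A sketch using exact string match (a real evaluation would presumably normalize questions first, per the normalization subtask):

```python
def prf1(generated, gold):
    """Precision, recall, and F1 of generated questions vs. a gold set."""
    generated, gold = set(generated), set(gold)
    tp = len(generated & gold)                      # relevant questions found
    p = tp / len(generated) if generated else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = prf1(
    generated={"how long is the concert", "where is the concert"},
    gold={"how long is the concert", "what should i wear"},
)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.5 0.5
```

Under the open-set pooling approach, `gold` would instead be the pooled questions judged relevant by assessors.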
Extrinsic Evaluation A straw man scenario • Task – online information seeking • Setup • A user selects a topic (T) she is interested in. • Generate a set of N queries given T and a query log. • The user selects a query (q) from the set. • Generate a set of M questions given q. • The user selects the question (Q) that she has in mind. • If the user does not select any question, record the trial as unsuccessful. • Send q to a search engine (S); get results X. • Send q, Q, and anything inferred from Q to S; get results Y. • Compare results X and Y using standard IR relevance metrics.
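The final comparison of result lists X and Y can use any standard IR metric; precision at k is the simplest. A sketch with made-up document IDs and relevance judgments (all assumptions for illustration):

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranked_results[:k] if doc in relevant) / k

relevant = {"d1", "d3", "d5"}                # hypothetical judgments
X = ["d2", "d1", "d4", "d6", "d3"]           # results for query q alone
Y = ["d1", "d3", "d5", "d2", "d4"]           # results for q plus question Q
print(precision_at_k(X, relevant, 5))        # 0.4
print(precision_at_k(Y, relevant, 5))        # 0.6
```

If question-augmented retrieval (Y) consistently beats query-only retrieval (X) on such metrics, the extrinsic evaluation counts Q2Q as useful.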
Summary Task: Question generation from queries Data: • Search engine query logs • cQA question answer archives • Question taxonomies Evaluation: • Intrinsic – evaluate specific technology areas • Extrinsic – evaluate its effect on real world scenarios Real data, real task, and real impact