Deep Processing for Restricted Domain QA

Yi Zhang, Universität des Saarlandes, yzhang@coli.uni-sb.de

Why Deep? Is Shallow Processing Enough? • For TREC-like QA evaluation • (in most cases) YES • However, for restricted domain QA • More complicated questions • Less information redundancy for data-intensive approaches • Domain knowledge available

Deep Processing Provides • More fine-grained linguistic analysis • Long-distance dependencies • Agreement • … • Semantic representation • MRS/RMRS

General Problems with Deep Processing • Robustness • Lexicon • Compound NP • Specificity • “John saw Mary” • Efficiency (not discussed here)

Deep Processing • MRS/RMRS • (Robust) semantic representation with underspecification. • HPSG grammars • LinGO ERG • Other grammars (German, Japanese, Modern Greek, Norwegian, Chinese, …) • HoG • Hybrid shallow & deep processing architecture with a uniform semantic representation (RMRS).

QA in QUETAL (1) • Hybrid shallow & deep approach • Cross-lingual QA • QA on • Texts • Semi-structured documents • Databases

QA in QUETAL (2) [Architecture diagram. Recoverable labels: NLQ • Syntax Ana. (Dependency Parser, TAG for En/De Q.) • Seman Ana. / Seman Q. Ana. (Q-type, A-type, Q-focus) • Query Planner • IR Query • IR Schema • GetData • Info Source (Texts, IE, Fact DB) • Result Merge • Ans. Planning & Generation]

QA in QUETAL (3) Deep processing in QUETAL • HPSG grammar used for question analysis. • Documents are processed with relatively shallow methods. • Answer matching with RMRS.

Restricted Domain QA • More complicated questions • Fewer documents, but of better quality • Domain-specific ontology available

Restricted Domain QA – an Example • Question: Where is the City Hall of Shanghai? • Text: “Shanghai City Planning Exhibition Hall [LOC_1] is located to the east of the City Hall [LOC_2], …, setting off with the crystal-like Grand Theatre [LOC_3] to the west.” • Answer: Between Shanghai City Planning Exhibition Hall and the Grand Theatre. • Domain ontology
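The step from the text to the answer in this example can be illustrated with a small sketch. It assumes the passage has already been reduced to directional facts relative to the City Hall; the relation names `east_of` and `west_of`, the fact tuples, and the `locate` helper are all hypothetical and are not part of QUETAL or of any domain ontology named in the slides.

```python
# Toy spatial inference for the Shanghai example (all names are assumptions).
# Each fact is (relation, subject, reference_point).
FACTS = [
    ("east_of", "LOC_1", "LOC_2"),  # Exhibition Hall is east of the City Hall
    ("west_of", "LOC_3", "LOC_2"),  # Grand Theatre is west of the City Hall
]

NAMES = {
    "LOC_1": "Shanghai City Planning Exhibition Hall",
    "LOC_2": "the City Hall",
    "LOC_3": "the Grand Theatre",
}


def locate(target, facts, names):
    """Answer a 'where' question if the target has a neighbour on each side."""
    east = [subj for (rel, subj, ref) in facts if rel == "east_of" and ref == target]
    west = [subj for (rel, subj, ref) in facts if rel == "west_of" and ref == target]
    if east and west:
        # Something lies on both sides, so the target is between the two.
        return f"Between {names[east[0]]} and {names[west[0]]}."
    return None  # not enough spatial information to answer


answer = locate("LOC_2", FACTS, NAMES)
```

The point of the sketch is only that the answer string does not occur in the text; it has to be derived from two separate location statements.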

Open Topics • Grammar extension & automated lexicon acquisition • Robust deep processing • Semantic answer matching • Cross-lingual

Grammar Extension Tourism Domain • ERG extended for • “RONDANE”: tourism in a Norwegian mountain area • 1.4K sentences • 15 words/sentence • coverage > 74% • Shanghai tourist guide from http://www.shanghai.gov.cn • 1,600 sentences • 18 words/sentence

Test on RONDANE corpus

Grammar Extension • ERG lexicon • It is relatively easy to automate lexicon acquisition for nouns

Automated Lexicon Acquisition • POS tagging • Named entity recognition • Statistical models to find the best lexical type for an unknown noun.
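One way to picture the statistical step is a small Naive Bayes classifier over context features such as the POS tag and the named-entity label. This is a minimal sketch under stated assumptions: the slides do not say which model is used, and the feature strings and the lexical type labels (`count_noun`, `mass_noun`, `proper_noun`) are illustrative placeholders, not ERG lexical types.

```python
import math
from collections import Counter, defaultdict


class LexTypeGuesser:
    """Naive Bayes over symbolic features with add-one smoothing."""

    def __init__(self):
        self.type_counts = Counter()             # training examples per type
        self.feat_counts = defaultdict(Counter)  # feature counts per type
        self.vocab = set()                       # all features seen in training

    def train(self, examples):
        for feats, lex_type in examples:
            self.type_counts[lex_type] += 1
            for f in feats:
                self.feat_counts[lex_type][f] += 1
                self.vocab.add(f)

    def predict(self, feats):
        total = sum(self.type_counts.values())
        best_type, best_logp = None, -math.inf
        for t, n in self.type_counts.items():
            logp = math.log(n / total)  # log prior of the lexical type
            denom = sum(self.feat_counts[t].values()) + len(self.vocab)
            for f in feats:
                logp += math.log((self.feat_counts[t][f] + 1) / denom)
            if logp > best_logp:
                best_type, best_logp = t, logp
        return best_type


# Hypothetical training data: (features of the noun in context, lexical type).
TRAINING = [
    ({"pos=NN", "prev=a"}, "count_noun"),
    ({"pos=NN", "prev=the", "suffix=s"}, "count_noun"),
    ({"pos=NN", "prev=much"}, "mass_noun"),
    ({"pos=NN", "prev=some"}, "mass_noun"),
    ({"pos=NNP", "ne=LOC"}, "proper_noun"),
    ({"pos=NNP", "ne=PER"}, "proper_noun"),
]

guesser = LexTypeGuesser()
guesser.train(TRAINING)
guess = guesser.predict({"pos=NNP", "ne=LOC"})
```

In this sketch the POS tagger and the named entity recognizer only supply features; the classifier makes the final choice of lexical type.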

Robust Deep Processing • Back off to RMRS generated by intermediate or shallow parsers (HoG architecture). • Keep partial parse charts and the corresponding MRS fragments for semantic answer matching.
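The back-off idea can be sketched as a cascade that tries the deepest analyzer first and falls through to shallower ones when no analysis is found. The analyzers below are toy stand-ins invented for illustration; they are not the HoG API, and a real cascade would return RMRS structures rather than plain dictionaries.

```python
from typing import Callable, Optional, Sequence, Tuple

# An analyzer returns some semantic structure, or None if it has no analysis.
Analyzer = Callable[[str], Optional[dict]]


def analyze_with_backoff(sentence: str, cascade: Sequence[Tuple[str, Analyzer]]):
    """Return (analyzer_name, result) from the first analyzer that succeeds."""
    for name, analyzer in cascade:
        result = analyzer(sentence)
        if result is not None:
            return name, result
    return None, None


def toy_deep(sentence: str) -> Optional[dict]:
    # Pretend the deep grammar only covers words in a tiny lexicon.
    lexicon = {"john", "saw", "mary"}
    words = sentence.lower().split()
    if all(w in lexicon for w in words):
        return {"source": "deep", "preds": words}
    return None  # an unknown word makes the full parse fail


def toy_shallow(sentence: str) -> Optional[dict]:
    # A shallow analyzer always produces something, but with less detail.
    return {"source": "shallow", "preds": sentence.lower().split()}


CASCADE = [("deep", toy_deep), ("shallow", toy_shallow)]

covered = analyze_with_backoff("John saw Mary", CASCADE)
backed_off = analyze_with_backoff("John saw Shanghai", CASCADE)
```

Because every level emits the same kind of structure, downstream answer matching does not need to know which analyzer produced it.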

Parse Disambiguation • Select the best parse with statistical models (Toutanova et al. 2002)
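As an illustration of what selecting the best parse with a statistical model can look like, here is a minimal log-linear ranking sketch. It is not claimed to be the model of the cited work; the feature names and weights are made up, and in practice the weights would be estimated from a treebank.

```python
import math


def score(features: dict, weights: dict) -> float:
    """Weighted sum of feature values; unseen features get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_parses(parses: dict, weights: dict) -> list:
    """Return (parse_id, probability) pairs, most probable parse first."""
    scores = {pid: score(feats, weights) for pid, feats in parses.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    ranked = [(pid, math.exp(s - top) / z) for pid, s in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate parses, described by counts of derivation features.
PARSES = {
    "parse_1": {"rule:head-complement": 2, "rule:adjunct-head": 1},
    "parse_2": {"rule:head-complement": 1, "rule:adjunct-head": 2},
}
WEIGHTS = {"rule:head-complement": 0.8, "rule:adjunct-head": -0.3}

ranking = rank_parses(PARSES, WEIGHTS)
best_parse = ranking[0][0]
```

Only the top-ranked parse would then be passed on to build the semantic representation.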

Answer Matching with (R)MRS • Semantic answer matching • Create semantic patterns for each question type. • where -> locate_v(e, x1, x2) • Semantic distance measurement. • pred1(x)&pred2(x) <-> pred1(x)&pred2(y) • Query expansion • Synonym substitution • Semantic structure replacement • give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3)
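The three ideas on this slide (question patterns, semantic distance, and structure replacement) can be sketched on a toy representation in which a semantic structure is a list of predicates with argument variables. Everything here is an assumption for illustration: real (R)MRS has handles, scope constraints, and more, and the particular distance measure is invented, not taken from the slides.

```python
from typing import NamedTuple


class EP(NamedTuple):
    """Toy elementary predication: predicate name plus argument variables."""
    pred: str
    args: tuple


# Question type -> semantic pattern, as on the slide: where -> locate_v(e, x1, x2).
QUESTION_PATTERNS = {"where": EP("locate_v", ("e", "x1", "x2"))}

# Structure replacement: predicate -> (new predicate, argument order).
# give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3) swaps the first two roles.
# Simplification: the event variable name is kept instead of creating a fresh one.
REWRITES = {"give_v": ("receive_v", (0, 2, 1, 3))}


def expand(ep: EP) -> list:
    """Return the EP together with any variant produced by a rewrite rule."""
    variants = [ep]
    if ep.pred in REWRITES:
        target, order = REWRITES[ep.pred]
        variants.append(EP(target, tuple(ep.args[i] for i in order)))
    return variants


def links(eps) -> set:
    """Which argument slots share a variable, independent of variable names."""
    slots = [(ep.pred, i, var) for ep in eps for i, var in enumerate(ep.args)]
    return {
        (p1, i1, p2, i2)
        for (p1, i1, v1) in slots
        for (p2, i2, v2) in slots
        if v1 == v2 and (p1, i1) < (p2, i2)
    }


def distance(query, candidate) -> int:
    """Missing predicates plus missing variable links (0 means full match)."""
    missing_preds = {ep.pred for ep in query} - {ep.pred for ep in candidate}
    missing_links = links(query) - links(candidate)
    return len(missing_preds) + len(missing_links)


query = [EP("pred1", ("x",)), EP("pred2", ("x",))]
same_entity = [EP("pred1", ("a",)), EP("pred2", ("a",))]
two_entities = [EP("pred1", ("a",)), EP("pred2", ("b",))]
```

With these definitions `distance(query, same_entity)` is smaller than `distance(query, two_entities)`, which mirrors the slide's contrast between pred1(x)&pred2(x) and pred1(x)&pred2(y).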

Work Plan • Narrow my focus to one of the topics above. • Continue the Chinese HPSG grammar development.

References • Timothy Baldwin, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen. (to appear). Road-testing the English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal. • Ulrich Callmeier. 2002. PET – a platform for experimentation with efficient HPSG processing techniques. In Collaborative Language Engineering. CSLI Publications, Stanford, USA. • Hans Uszkoreit. 2002. New chances for deep linguistic processing. In Proc. of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan. • Ann Copestake, Dan Flickinger, Ivan A. Sag, and Carl Pollard. 2003. Minimal recursion semantics: An introduction. Under review. • Timothy Baldwin and Francis Bond. 2003. Learning the countability of English nouns from corpus data. In Proc. of the 41st Annual Meeting of the ACL, pages 463–70, Sapporo, Japan. • J. Carroll and A. Fang. Automatic Acquisition of Verb Subcategorisations and their Impact on the Performance of an HPSG Parser. IJCNLP 2004. • Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. 2002. LinGO Redwoods: A Rich and Dynamic Treebank for HPSG. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria. • Kristina Toutanova, Christopher D. Manning, and Stephan Oepen. 2002. Parse Ranking for a Rich HPSG Grammar. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria. • Stephan Oepen. [incr tsdb()] - Competence and Performance Laboratory. User Manual. Technical Report, Computational Linguistics, Saarland University (in preparation). • Robert Malouf and Gertjan van Noord. 2004. Wide coverage parsing with stochastic attribute value grammars. In IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses. • Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse Disambiguation for a Rich HPSG Grammar. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), pp. 253-263, Sozopol, Bulgaria.
