
Localizing Dependencies: Consequences in NLP




  1. Localizing Dependencies: Consequences in NLP Srinivas Bangalore AT&T Labs-Research srini@research.att.com

  2. Overview • Localization • Localization in SuperTags • Consequences • FERGUS – trainable surface realizer • Experiments and Evaluation • Evaluation Metrics • Experiments on quality and quantity of training data

  3. Localization • The kinds of information that are accessible in an elementary unit of the grammar or a probabilistic model.
  Context-free rules: R1: S → NP VP; R2: VP → like NP; R3: NP → you : p1; R4: NP → peanuts
  Parent-annotated rules: S → NP(S) VP(S); VP(S) → like NP(VP); NP(S) → you : p2; NP(VP) → you : p3; NP(VP) → peanuts; NP(S) → peanuts
  S1: you like peanuts   S2: peanuts like you
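To make the contrast concrete, here is a minimal sketch in Python, assuming illustrative rule probabilities (the numbers and the scoring code are not from the slides): the un-localized PCFG assigns S1 and S2 the same probability, while the parent-annotated rules can prefer S1.

```python
# Illustrative probabilities only; the point is the localization, not the numbers.
plain = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("like", "NP")): 1.0,
    ("NP", ("you",)): 0.5,        # same probability wherever NP occurs
    ("NP", ("peanuts",)): 0.5,
}
annotated = {
    ("S", ("NP(S)", "VP(S)")): 1.0,
    ("VP(S)", ("like", "NP(VP)")): 1.0,
    ("NP(S)", ("you",)): 0.8,     # subject position prefers "you"
    ("NP(S)", ("peanuts",)): 0.2,
    ("NP(VP)", ("you",)): 0.2,    # object position prefers "peanuts"
    ("NP(VP)", ("peanuts",)): 0.8,
}

def derivation_prob(rules, used):
    """Multiply the probabilities of the rules used in one derivation."""
    p = 1.0
    for rule in used:
        p *= rules[rule]
    return p

# S1: "you like peanuts"    S2: "peanuts like you"
s1_plain = [("S", ("NP", "VP")), ("NP", ("you",)), ("VP", ("like", "NP")), ("NP", ("peanuts",))]
s2_plain = [("S", ("NP", "VP")), ("NP", ("peanuts",)), ("VP", ("like", "NP")), ("NP", ("you",))]
print(derivation_prob(plain, s1_plain) == derivation_prob(plain, s2_plain))    # True: no preference

s1_ann = [("S", ("NP(S)", "VP(S)")), ("NP(S)", ("you",)), ("VP(S)", ("like", "NP(VP)")), ("NP(VP)", ("peanuts",))]
s2_ann = [("S", ("NP(S)", "VP(S)")), ("NP(S)", ("peanuts",)), ("VP(S)", ("like", "NP(VP)")), ("NP(VP)", ("you",))]
print(derivation_prob(annotated, s1_ann) > derivation_prob(annotated, s2_ann))  # True: S1 preferred
```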

  4. Localization in Statistical Parsing • Localization in PCFG models • Structural (Parent, Grandparent, Siblings) • Lexical (Head-Dependent) • Magerman 1995 • Charniak 1997 • Collins 1997 • Ratnaparkhi 1997 • Rens Bod 2000 • Localization driven by improvement in parsing accuracy.

  5. Predicate-Argument Localization (SuperTags) • Elementary objects of Tree-Adjoining Grammars • Grammatical function • Subcategorization frame (active valency) • Passive valency with direction of modification • Realization of arguments (passive, wh-movement, relative clause) [Figure: example elementary trees, including initial trees for “control” (declarative transitive and an extracted-argument variant with an empty element) and an auxiliary tree for the preposition “with” adjoining a PP to VP]

  6. Localization in SuperTags • Consequence of predicate-argument localization • Increased local ambiguity • Richer features available to exploit in disambiguation model • Linguistic relationships between SuperTags can be exploited • Helpful in partial parsing and understanding • SuperTags among different languages can be related

  7. Parsing

  8. SuperTagging for Analysis (1) Poachers now control the underground trade [Figure: each word of the sentence shown with its set of candidate supertags (elementary trees), several per word for the ambiguous words such as “control” and “trade”]

  9. SuperTagging for Analysis (2) Poachers now control the underground trade

  10. SuperTagging for Analysis (3) • Trigram Model for SuperTag disambiguation • Probabilities estimated from annotated corpora [Equation on slide: product of an emit probability and a contextual probability; see the sketch below]
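The disambiguation equation itself did not survive the transcript; a sketch of what the emit and contextual terms presumably combine into, assuming the standard trigram tagging decomposition:

```latex
\hat{T} \;=\; \operatorname*{argmax}_{t_1 \ldots t_n} \;
  \prod_{i=1}^{n}
  \underbrace{P(w_i \mid t_i)}_{\text{emit probability}} \,
  \underbrace{P(t_i \mid t_{i-1}, t_{i-2})}_{\text{contextual probability}}
```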

  11. Dependency Analysis • SuperTagging results in an “almost” parse. • Dependency requirements of each supertag can be used to form a dependency tree. [Figure: dependency tree for the example sentence: “control” (a6) with arg0 “poachers”, arg1 “trade”, and modifier “now”; “trade” modified by “underground” and “the”]

  12. Language Generation

  13. Generation • Process of converting a system-internal representation into a human-interpretable form. [Figure: an Interaction Controller exchanging propositions, communicative goals, and tables with speech, text, and multimodal output generation components]

  14. Natural Language Generation (NLG) • Definition • Input: Set of communicative goals • Output: Sequence of words • Example (in the context of a dialog system) • Input • Implicit-confirm(orig-city:NEWARK), Implicit-confirm(dest-city:SEATTLE), Request(depart-date) • Output • Flying from Newark to Seattle. What date would you like to leave? • Alternative: You are flying from Newark. What date would you like to leave? You are flying to Seattle.

  15. Template-based Generation • Language generation is achieved by manipulating text strings • Example: “FLYING TO dest-city, LEAVING FROM orig-city, AND WHAT DATE DID YOU WANT TO LEAVE?” • Number of templates grows quickly • No syntactic generalization; different templates for singular and plural versions • Not much variation in output • No planning of text; simple concatenation of strings • Maintainability • Complex to assemble and change with large number of templates
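A minimal sketch of what “manipulating text strings” amounts to, using the example template above (the template table and the generate helper are hypothetical, not from an actual system):

```python
# Template-based generation: fill slots in fixed strings. Every syntactic variant
# (singular vs. plural cities, different orderings, ...) needs its own entry.
templates = {
    "confirm_trip": "FLYING TO {dest_city}, LEAVING FROM {orig_city}, "
                    "AND WHAT DATE DID YOU WANT TO LEAVE?",
}

def generate(name, **slots):
    return templates[name].format(**slots)

print(generate("confirm_trip", dest_city="SEATTLE", orig_city="NEWARK"))
```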

  16. NL-based Generation • Language generation is achieved in three steps (Reiter 1994) • Text planning • Transform communicative goals into sequence of elementary communicative goals • Sentence planning • Choose linguistic resources to express the elementary communicative goals • Surface realization • Produce surface word order according to the grammar of the language.

  17. Corpus-based NLG • Most NLG systems are hand-crafted • Recent research on statistical models for NLG components • Surface Realizer: Langkilde and Knight 1998, Bangalore and Rambow 2000, Ratnaparkhi 2000 • Sentence Planner: Walker et al. 2001 • Statistical translation models can be seen as incorporating statistical NLG. • Our Approach: Combine linguistic models with statistical models derived from annotated corpora

  18. A Dialogue System with NLG [Figure: dialogue-system pipeline labelled with the representations passed between components: communicative goals, a semantic (?) representation, a text string/lattice, and a prosody-annotated text string]

  19. FERGUS Surface Realizer • FERGUS: Flexible Empirical/Rationalist Generation Using Syntax • Input: Underspecified dependency tree • Output: Sequence of words [Figure: input dependency tree rooted at “control” with dependents “poachers”, “now”, “trade”, and “trade” with dependents “the”, “underground”; output string “poachers now control the underground trade”]

  20. Treatment of Adjuncts • Adjunct supertags specify • category of node at which to adjoin • direction of modification [Figure: two auxiliary trees anchored by “with”: one adjoining a PP to the right of VP (VP → VP* PP), one adjoining a PP to the left of S (S → PP S*)]

  21. Treatment of Adjuncts • Collapse all adjunct (β) trees identical except for adjunction category and/or direction into a γ-tree • Like argument trees, γ-trees only specify active valency, not passive valency • Passive valency recorded in separate γ-table • Similar approach in TAG chosen by McDonald (1985) and D-Tree Substitution Grammar (Rambow et al. 1995). [Figure: a γ-tree for “with”: a PP over P and NP with the adjunction site left unspecified (??)]

  22. FERGUS System Architecture • Input: underspecified dependency tree • Tree Chooser (uses the Tree Model) → semi-specified TAG derivation tree • Unraveler (uses the TAG) → word lattice • LP Chooser (uses the Language Model) → Output: word string

  23. Corpus Description • Dependency tree annotated with • Part-of-speech • Morphological information • Syntactic information (“SuperTag”) • Grammatical functions • Sense tags (derived from WordNet)
  Example annotation:
  Dep. Word | Dep. POS | Dep. STAG | Dep. Morph  | Dep. Lexclass | Function words | Head Word
  Treasury  | NNP      | A_NXN     | Treasury    | LC453         | the            | said
  said      | VBD      | B_nx0Vs1  | say_PAST    | LC70          |                | -
  U.S.      | NNP      | A_NXN     | U.S.        | LC256         | the            | default
  default   | VB       | A_nx0V    | default_INF | LC604         | will           | said
  on        | IN       | B_vxPnx   | on          | LC150         |                | default
  Nov.      | NNP      | B_Nn      | Nov.        | LC15          |                | ninth
  ninth     | CD       | A_NXN     | ninth       | LC482         |                | on

  24. Input Representation • Dependency Tree • Semi-specified: any feature can be specified • Role information • Function words [Figure: example input dependency tree: “control” with dependents “poachers”, “now”, “trade”; “trade” with dependents “underground”, “the”]

  25. Tree Chooser Model 1: • Given a dependency tree assign the appropriate supertag to each node • Probability of daughter supertag (equation on slide) • Choose most probable supertag compatible with mother supertag [Figure: the example dependency tree with candidate supertags at each node (e.g. α2, α3, α4 for “control”; α1 or γ3 for “poachers” and “trade”; γ1 for “now”) and the chosen assignment]

  26. Tree Chooser (2) Model 2: • Given a dependency tree assign the appropriate supertag to each node • Probability of mother supertag (equation on slide) • Treelet is modeled as a set of independent mother-daughter links. • Choose most probable supertag compatible with grandmother supertag [Figure: the same example tree with candidate supertags and the chosen assignment]
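A minimal sketch of the Model 1 procedure from the previous slide (the candidate supertags, probabilities, and compatibility test below are hypothetical stand-ins for the FERGUS tree model and the TAG): walk the dependency tree top-down and, at each node, keep the most probable candidate supertag that is compatible with the supertag already chosen for its mother.

```python
# Toy dependency tree for "poachers now control the underground trade".
daughters = {"control": ["poachers", "now", "trade"], "trade": ["the", "underground"],
             "poachers": [], "now": [], "the": [], "underground": []}
# Candidate supertags per word (loosely following the slide's alpha/gamma labels).
candidates = {"control": ["a2", "a3", "a4"], "poachers": ["a1", "g3"], "now": ["g1"],
              "trade": ["a1", "g3"], "the": ["g3"], "underground": ["g2"]}
# P(daughter supertag | mother supertag); unlisted pairs get a tiny default.
p_dtr = {("a1", "a2"): 0.7, ("g3", "a2"): 0.3, ("g1", "a2"): 0.9,
         ("g3", "a1"): 0.8, ("g2", "a1"): 0.6}

def compatible(dtr_tag, mother_tag):
    # Placeholder: in FERGUS this comes from the TAG (substitution/adjunction sites).
    return True

def choose(node, mother_tag=None, assignment=None):
    """Greedy top-down supertag choice: most probable compatible candidate at each node."""
    assignment = {} if assignment is None else assignment
    best = max(candidates[node],
               key=lambda t: p_dtr.get((t, mother_tag), 1e-6) if compatible(t, mother_tag) else 0.0)
    assignment[node] = best
    for d in daughters[node]:
        choose(d, best, assignment)
    return assignment

print(choose("control"))  # e.g. {'control': 'a2', 'poachers': 'a1', 'now': 'g1', ...}
```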

  27. Representing Supertag as a String • A supertag can be uniquely represented as a string. [Figure: a transitive supertag (S over NP and VP, VP over V and NP) unfolded into a string of node labels with L/R (left/right) markers at positions 0–9]

  28. Unraveler • Dependency tree + supertags = semi-specified derivation tree • Argument positions need not be fixed in input • Adjunction sites of γ-trees not specified • Given a semi-specified derivation tree, produce a word lattice • XTAG grammar specifies possible positions of daughters in mother supertag. [Figure: the example derivation tree annotated with the supertag string positions at which daughters may attach (e.g. positions 0, 3, 8, 9 for “control”)]

  29. Unraveler (2) • Tree associated with supertag represented as string (=frontier) • Bottom-up word lattice construction • Lattice is determinized and minimized [Figure: the word lattice built for the example, encoding the alternative word orders permitted by the supertags]

  30. Linear Precedence Chooser • Given a word lattice produce the most likely string • A stochastic n-gram model is • used to rank the likelihood of the strings in the lattice • represented as a weighted finite-state automaton • composed with the word lattice • The best path through the composed lattice is retrieved: Bestpath(Lattice ∘ LanguageModel)
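A minimal sketch of the LP Chooser idea (toy bigram log-probabilities and an explicit list of lattice paths rather than a real weighted-automaton composition): score every string the lattice encodes with the n-gram model and keep the best one.

```python
# Toy bigram log-probabilities; unseen bigrams get a flat penalty.
bigram_logp = {("<s>", "poachers"): -1.0, ("poachers", "now"): -1.2, ("now", "control"): -0.8,
               ("control", "the"): -0.5, ("the", "underground"): -1.0,
               ("underground", "trade"): -0.7, ("trade", "</s>"): -0.3,
               ("<s>", "now"): -2.0, ("now", "poachers"): -2.5}
UNSEEN = -6.0

def logprob(words):
    """Bigram log-probability of a word sequence with sentence boundaries."""
    path = ["<s>"] + words + ["</s>"]
    return sum(bigram_logp.get(bg, UNSEEN) for bg in zip(path, path[1:]))

def best_path(lattice_paths):
    """Stand-in for Bestpath(Lattice o LanguageModel) over an enumerated lattice."""
    return max(lattice_paths, key=logprob)

paths = [["poachers", "now", "control", "the", "underground", "trade"],
         ["now", "poachers", "control", "the", "underground", "trade"],
         ["poachers", "control", "now", "the", "trade", "underground"]]
print(best_path(paths))  # the first ordering wins under these toy scores
```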

  31. Overview • Background • FERGUS – trainable surface realizer • SuperTags – linguistic model underlying FERGUS • Statistical components of FERGUS • Experiments and Evaluation • Evaluation Metrics • Experiments on quality and quantity of training data

  32. Experiment Setup (model grid: dependency structure × supertag use)
                                 No SuperTags     SuperTags (Beta Trees)   SuperTags (Gamma Trees)
  Random Dependency Structure    Baseline Model
  True Dependency Structure      TM-LM            TM-XTAG                  TM-XTAG-LM

  33. Experiment Setup • Baseline Model • Random dependency tree structure • Compute distribution of daughter to left (right) of mother • TM-LM Model • Annotated dependency tree structure for computing left (right) distribution • TM-XTAG Model • Annotated derivation tree structure (without gamma trees) • Compute distribution of a daughter supertag given mother information • TM-XTAG-LM Model • Annotated derivation tree structure with gamma trees

  34. Evaluation Metric • Evaluation is a complex issue • Metric should be: • Objective and automatic • Without human intervention • Quick turnaround • These metrics not designed to compare realizers (but …)

  35. Two String-based Evaluation Metrics • String edit distance between reference string (length in words: R) and result string • Substitutions (S) • Insertions (I) • Deletions (D) • Moves = pairs of deletions and insertions (M) • Remaining insertions (I’) and deletions (D’) • Example: reference “poachers now control the underground trade”; result “trade poachers the underground control now” • Simple String Accuracy = 1 – (I+D+S)/R • Generation String Accuracy = 1 – (M+I’+D’+S)/R
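A minimal sketch of the two string-based metrics (a direct reading of the definitions above, not the evaluation code used in the experiments): count substitutions, insertions and deletions with a standard edit-distance alignment, then pair up an insertion and a deletion of the same token as a single move.

```python
from collections import Counter

def edit_ops(ref, hyp):
    """Return the list of edit operations aligning ref to hyp (S/I/D tuples)."""
    n, m = len(ref), len(hyp)
    dp = [[None] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = (cost, ops)
    dp[0][0] = (0, [])
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                cost, ops = dp[i - 1][j - 1]
                if ref[i - 1] == hyp[j - 1]:
                    cands.append((cost, ops))
                else:
                    cands.append((cost + 1, ops + [("S", ref[i - 1], hyp[j - 1])]))
            if i > 0:
                cost, ops = dp[i - 1][j]
                cands.append((cost + 1, ops + [("D", ref[i - 1])]))
            if j > 0:
                cost, ops = dp[i][j - 1]
                cands.append((cost + 1, ops + [("I", hyp[j - 1])]))
            dp[i][j] = min(cands, key=lambda c: c[0])
    return dp[n][m][1]

def string_accuracies(ref, hyp):
    ops = edit_ops(ref, hyp)
    S = sum(1 for o in ops if o[0] == "S")
    ins = Counter(o[1] for o in ops if o[0] == "I")
    dels = Counter(o[1] for o in ops if o[0] == "D")
    M = sum(min(ins[w], dels[w]) for w in ins)   # paired insertion + deletion = one move
    I_rem = sum(ins.values()) - M
    D_rem = sum(dels.values()) - M
    R = len(ref)
    simple = 1 - (sum(ins.values()) + sum(dels.values()) + S) / R
    generation = 1 - (M + I_rem + D_rem + S) / R
    return simple, generation

ref = "poachers now control the underground trade".split()
hyp = "trade poachers the underground control now".split()
print(string_accuracies(ref, hyp))
```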

  36. Experiments and Evaluation • Training corpus: One million words of WSJ corpus • Test corpus: • 100 randomly chosen sentences • Average sentence length 16.7 words
  Model                  Simple String Accuracy   Generation String Accuracy
  Baseline               0.412                    0.562
  TM-LM                  0.529                    0.668
  TM-XTAG                0.550                    0.684
  TM-XTAG-LM (=FERGUS)   0.589                    0.724

  37. Two Tree-based Evaluation Metrics • Not all moves equally bad: moves which permute nodes within the tree are better than moves which “scramble” the tree (projectivity) • Simple Tree Accuracy: calculate S, D, I on each treelet • Generation Tree Accuracy: calculate S, M, I’, D’ on each treelet • Example result strings: “estimate there was no cost for phase the second” and “there was estimate for phase the second no cost” [Figure: the corresponding dependency trees on which the per-treelet counts are computed]

  38. Measuring Performance Using Evaluation Metrics • Baseline: randomly assigned dependency structure; learn position of dependent relative to head • Training corpus: One million words of WSJ corpus • Test corpus • 100 randomly chosen sentences • average sentence length 16.7 words

  39. Overview • Background • FERGUS – trainable surface realizer • SuperTags – linguistic model underlying FERGUS • Statistical components of FERGUS • Experiments and Evaluation • Evaluation Metrics • Experiments on quality and quantity of training data

  40. Experimental Validation • Problem: how are these metrics motivated? • Solution (following Walker et al. 1997) • Perform experiments to elicit human judgements on sentences • Relate human judgements to metrics

  41. Experimental Setup • Web-based • Human subjects read short paragraph from WSJ and three or five variants of last sentence constructed by hand • Humans judge: • Understandability: How easy is this sentence to understand? • Quality: How well-written is this sentence? • Values: 1-7; 3 values have qualitative labels • Ten subjects; each subject made a total of 24 judgements • Data normalized by subtracting mean for each subject and dividing by standard deviation; then each variant averaged over subjects

  42. Results of Experimental Validation • Strong correlations between normalized understanding and quality judgements • The two tree-based metrics correlate with both understandability and quality. • The string-based metrics do not correlate with either understandability or quality.

  43. Experimental Validation: Finding Linear Models • Other goal of the experiment: find better metrics • Series of linear regressions • Dependent measures: normalized understanding and quality • Independent measures: different combinations of: • the four metrics • sentence length • the “problem” variables (S, I, D, M, I’, D’) • Can improve on explanatory power of original four metrics

  44. Experimental Validation: Linear Models

  45. Experimental Validation: Model of Understanding Normalized Understanding = 1.4728*simple tree accuracy – 0.1015*substitutions – 0.0228*length – 0.2127

  46. Experimental Validation: Model of Quality Normalized Quality = 1.2134*simple tree accuracy – 0.0839*substitutions – 0.0280*length – 0.0689

  47. Two New Metrics • Don’t want length to be included in the metrics • Understandability Accuracy = (1.3147*simple tree accuracy – 0.1039*substitutions – 0.4458)/0.8689 • Quality Accuracy = (1.0192*simple tree accuracy – 0.0869*substitutions – 0.3553)/0.6639 • Scores using new metrics (table on slide)
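The two metrics translate directly into code; a trivial sketch with illustrative inputs (the per-sentence simple tree accuracy and substitution count are assumed to be computed elsewhere):

```python
# Constants come from the linear models on the preceding slides.
def understandability_accuracy(simple_tree_acc, substitutions):
    return (1.3147 * simple_tree_acc - 0.1039 * substitutions - 0.4458) / 0.8689

def quality_accuracy(simple_tree_acc, substitutions):
    return (1.0192 * simple_tree_acc - 0.0869 * substitutions - 0.3553) / 0.6639

print(understandability_accuracy(0.8, 2), quality_accuracy(0.8, 2))  # illustrative inputs
```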

  48. Summary • FERGUS combines a linguistic model and a statistical model for surface realization. • Tree model combined with linear model outperforms either of them alone. • Four evaluation metrics introduced and validated. • Created two new metrics which better correlate with human judgments.

  49. Overview • Background • FERGUS – trainable surface realizer • SuperTags – linguistic model underlying FERGUS • Statistical components of FERGUS • Experiments and Evaluation • Evaluation Metrics • Experiments on quality and quantity of training data

  50. Motivation: What are the effects of quality and quantity on the resulting model? • Corpus quality • Manually-checked treebank: (English) Penn Treebank (Marcus et al. 1993) • Parsed, not manually-checked treebank: BLLIP Treebank (Charniak 2000) • Grammar quality • Hand-crafted grammar (XTAG) (XTAG-Group 2001) • Auto-extracted grammar (Chen and Vijay-Shanker 2000) • Corpus size • Small (< 1M words) or Large (> 1M words)
