
Soft Syntactic Constraints for Hierarchical Phrase-Based Translation

Soft Syntactic Constraints for Hierarchical Phrase-Based Translation. Yuval Marton and Philip Resnik


Presentation Transcript


  1. Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Yuval Marton and Philip Resnik Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD 20742-7505, USA {ymarton, resnik}@umiacs.umd.edu ACL'08, Columbus, Ohio, June 2008

  2. Why Has Source-side Syntax Not Helped SMT as Much as Target-side Syntax? • Most previous work: Syntactic representations → data-driven patterns • Chiang 05, ours: Data-driven patterns → syntactic constraints • Why the failure in the latter direction? • Noisy / inaccurate parsing info? • Too coarse a usage of syntax info? • We argue the latter: rule granularity and constraint conditions are key • We show that adding (soft) syntactic constraints to data-driven patterns yields substantial improvements.

  3. Outline • Background • Hiero • Soft Syntactic Constraints • Adding Syntax • Rule Granularity • Constraint Conditions • Experiments • Conclusions + Future Work

  4. Knowledge and Constraints • Syntactic-tree-based vs. Data-driven • Formal vs. linguistic syntax (Chiang 2005) • Formal syntax (e.g., Synchronous CFG) • Linguistic syntax (parses) • Hard vs. Soft Constraints • Hard constraint: limits the possible space (only allow rules compatible with the constraint) • Soft constraint: skews the space towards the constraint (but clear patterns in the data 'win' even if incompatible with the constraint) • Soft syntactic constraint: boost the weight of data-driven rules that are compatible with parsing info.

  5. Hiero • Chiang 2005, 2007 • Weighted synchronous CFG • Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year> • Translation model features: e.g., log p(e|f) • Log-linear model: + rule penalty feature, "glue" rules • (Slide figure: Chinese-English derivation example, e.g., 投票 在初选 'voted in the primaries', 的竞选 'the election')
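The weighted log-linear model on this slide can be sketched as a feature-weight dot product per rule. This is a minimal illustration, not the Hiero implementation: the feature names and every numeric value below are invented for the example.

```python
import math

def rule_score(features, weights):
    # Log-linear score of one synchronous rule: sum of weight * feature value
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for a rule such as X -> <今年 X1, X1 this year>
features = {"log_p_e_given_f": math.log(0.4),   # log p(e|f)
            "log_p_f_given_e": math.log(0.3),   # log p(f|e)
            "rule_penalty": 1.0}                # fires once per rule
weights = {"log_p_e_given_f": 1.0,
           "log_p_f_given_e": 0.5,
           "rule_penalty": -0.6}

print(rule_score(features, weights))  # one real-valued model score
```

In decoding, the derivation with the highest total score (plus the LM contribution) wins; the weights are the quantities MERT tunes (slide 10).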

  6. Soft Syntactic Constraints • Chiang's 2005 constituency feature • Boost a rule's score if the rule's source side matches a constituent span • Constituency-incompatible emergent patterns can still 'win' (in spite of no boost) • Good idea -- negative result • But what if…
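The constituency feature described above amounts to a span test against the source-side parse. A sketch under assumptions: I represent the parse as (label, start, end) triples over word indices, which is an illustrative format of my choosing, not the paper's data structure.

```python
def constituent_spans(parse):
    # Set of (start, end) spans covered by any constituent in the parse.
    # `parse` is a list of (label, start, end) triples (assumed format).
    return {(s, e) for _, s, e in parse}

def constituency_feature(rule_span, parse):
    # Fires (value 1.0) iff the rule's source span exactly matches some
    # constituent span -- a soft boost added to the score, not a hard filter.
    return 1.0 if rule_span in constituent_spans(parse) else 0.0

# Toy parse of a 5-word sentence; spans are word indices [start, end)
parse = [("IP", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
print(constituency_feature((2, 5), parse))  # matches the VP span -> 1.0
print(constituency_feature((1, 3), parse))  # straddles NP/VP -> 0.0
```

Because the feature only adds to the log-linear score, a rule whose span gets no boost can still beat a boosted rival if its other features are strong enough -- that is what makes the constraint soft.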

  7. Rule Granularity • Chiang: a single weight for all constituents (parse tags) • … But what if we can assign a separate feature and weight for each constituent? • E.g., NP-only: (NP=) • Or VP-only: (VP=)

  8. Constraint Conditions • VP-only, revisited: • We saw VP-match (VP=): exact match of a VP sub-tree span • We can also incur a cost for crossing constituent boundaries: e.g., VP-cross (VP+)

  9. Feature Space • {NP, VP, IP, CP, …} x {match =, cross-boundary +} • Basic translation models: • For each feature, add (only it) to the default feature set, assigning it a separate weight. • Feature "combo" translation models: • NP2 (double feature): add both NP+ and NP=, each with a separate weight • NP_ (conflated feature): ties the weights of NP+ and NP= • XP=, XP+, XP2, XP_: conflate all labels that correspond to "standard" X-bar Theory XP constituents in each condition. • All-labels= (Chiang's), All-labels+, All-labels_, All-labels2
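The feature space above is a cross product of labels and conditions, plus combo models. A sketch assuming a small label subset; the naming scheme mirrors the slide's (NP2, NP_, …) but the data structures are illustrative only.

```python
from itertools import product

LABELS = ["NP", "VP", "IP", "CP", "PP"]   # subset, for illustration
CONDITIONS = ["=", "+"]                   # match / cross-boundary

# Basic models: one separately weighted feature per (label, condition) pair
basic = [f"{lab}{cond}" for lab, cond in product(LABELS, CONDITIONS)]

def double(label):
    # "Double" combo (e.g., NP2): both conditions, each with its own weight
    return [f"{label}=", f"{label}+"]

def conflated(label):
    # "Conflated" combo (e.g., NP_): one tied weight shared by both conditions
    return {f"{label}_": [f"{label}=", f"{label}+"]}

print(basic[:4])          # ['NP=', 'NP+', 'VP=', 'VP+']
print(double("NP"))       # ['NP=', 'NP+']
print(conflated("VP"))    # {'VP_': ['VP=', 'VP+']}
```

The XP and All-labels variants work the same way, conflating across labels instead of (or in addition to) across conditions.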

  10. Settings • Hiero Default feature set for baseline • Chinese baseline also included a specialized number translation feature (Chiang 2007) • LM: SRI Language Modeling Toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen & Goodman, 1998). • Word-level alignments: GIZA++ (Och & Ney, 2000). • Source-side parses: • Chinese: Huang et al. (2008) • Arabic: Stanford Parser v.2007-08-19 (Klein & Manning 2003) • Optimized using MERT (Och 2003) • with BLEU (Papineni et al. 2002) • and the NIST-implemented “shortest” effective ref. length. • Dev set: Chinese NIST MT03; Arabic NIST MT02.

  11. Chinese-English • Replicated Chiang 2005 constituency feature (negative result) • NP=, QP+, VP+: up to .74 BLEU points better. • XP+, IP2, all-labels_, VP2, NP_: up to 1.65 BLEU points better. • Validated on the NIST MT08 test set • Legend: *, **: significantly better than baseline; +, ++: better than Chiang-05

  12. Arabic-English • New result for Chiang's constituency feature (MT06, MT08) • PP+, AdvP=: up to 1.40 BLEU points better than Chiang's and the baseline. • AP2, AdvP2: up to 1.94 better. • Validated on the NIST MT08 test set • Legend: *, **: significantly better than baseline; +, ++: better than Chiang-05

  13. PP+ Example: Arabic MT06

  14. Discussion • Direct contribution • The feature better translates related phrases (not shown here) • Indirect contribution • Translation of other parts can be (and is) influenced (to appoint a representative of syria to the united nations vs. to appoint syria to the united nations representative) • Feature combinations do not always help • In fact, some combos do worse than each feature alone. • Within-language consistency across test sets • Chinese: NP, VP, IP, (XP, all-labels) • Arabic: PP, AP, AdvP, (IP, VP, XP) • Across-language variation, but IP & VP do well.

  15. Conclusion: Our Approach • Data-driven approach (Stat MT) • using formal syntax (SCFG) • while adding soft constraints (weights) of linguistic syntax (parses) • with fine-grained constituent features (NP, VP, …) • and constraint conditions (match =, cross +)

  16. Main Contributions • First improvement in Hiero from (soft) syntax info • The previous (Chiang 2005) negative result was not (or not only) due to noisy parses • Finer syntactic rule resolution helps (NP, VP, …) • Finer (soft) constraint conditions help (NP=, NP+, VP=, VP+, …) • Selective application: parse labels that are not "standard" XP constituent labels seem to be more noisy than helpful • Feature combos do not always help (might do worse) • Inter-language variation, but IP and VP generally do well cross-linguistically. • Within-language consistency (across test sets)

  17. Future Work • Why do feature combos’ contributions sometimes cancel each other out? • We found no simple correlation between finer-grained feature scores (and/or boundary condition) and combination or conflation scores. • Why did no NP variant yield much gain in Arabic? • Exploit other forms of soft constraints

  18. Thanks • This work was supported in part by DARPA prime agreement HR0011-06-2-0001. • Thanks to David Chiang and Adam Lopez for making their source code available; • Thanks to the Stanford Parser team and Mary Harper for making their parsers available; • Thanks to David Chiang, Amy Weinberg, and CLIP Laboratory colleagues, particularly Adam Lopez, Chris Dyer, and Smaranda Muresan, for discussion and invaluable assistance.

  19. Hiero Default Feature Set and the "Standard" XP Label Set • Hiero default feature set: • LM, p(e|f), p(f|e), plex(e|f), plex(f|e), rule (phrase) penalty, and glue rule feature weights. • Chinese-only: number translation feature • "Standard" linguistic labels: {CP, IP, NP, VP, PP, ADJP, ADVP, QP, LCP, DNP} • Excluding non-maximal projection labels such as VV, NNP, etc. • Excluding labels such as PRN (parentheses), FRAG (fragment), etc. • XP=: disjunction of {CP=, IP=, …, DNP=}

  20. Training sets+ full results

  21. AdvP2 example in Arabic MT06

  22. PP+.AdvP= Example: Arabic MT06

  23. PP+ Example: Arabic MT06 • Note: this example might be misleading, as it is not necessarily a representative example of the feature's contribution.
