Advanced Tree Transducers for Machine Translation

Training Tree Transducers • Authors: Jonathan Graehl, Kevin Knight • Presented by Zhengbo Zhou

Outline • Finite State Transducers (FSTs) and R • Trees and Regular Tree Grammars • xR and Derivation Trees • Inside-Outside algorithm and EM training • Turning trees into strings (xRS) • Example and Related Work • My thoughts/questions

Finite State Transducers (FSTs) • Finite-state transducer, from what we've learned: [Figure: FST with states q0 and q1 and arcs labeled a:x and b:y]
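As a reminder of how an FST maps an input string to an output string, here is a minimal sketch. Only the states q0, q1 and the arc labels a:x and b:y come from the slide; which state each arc leaves and enters, the choice of final states, and the names ARCS and transduce are assumptions made for illustration.

```python
# Hypothetical arc topology: the slide shows the labels a:x and b:y and the
# states q0, q1, but not which arc connects which states.
ARCS = {
    ("q0", "a"): ("x", "q1"),  # read a, write x, move q0 -> q1
    ("q1", "b"): ("y", "q0"),  # read b, write y, move q1 -> q0
}
START = "q0"
FINAL = {"q0", "q1"}  # assumed: both states accept


def transduce(symbols):
    """Return the output string, or None if the input is rejected."""
    state, out = START, []
    for s in symbols:
        if (state, s) not in ARCS:
            return None
        o, state = ARCS[(state, s)]
        out.append(o)
    return "".join(out) if state in FINAL else None
```

With this assumed topology the machine accepts alternating strings that start with a and rewrites each symbol one for one.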

R transducer • An R transducer compactly represents a potentially infinite set of input/output tree pairs. • An FST compactly represents such a set of input/output string pairs. • R is a generalization of FST from strings to trees.

Example of R • Sentence: He drinks water • Input tree: S(PRO(he), VP(V(drinks), NP(water)))

Example of R (cont.) • Rule 1: in state q at S, output S with three children: qleft.vp.v applied to VP, qpro applied to PRO, qright.vp.np applied to VP. • Rules 2, 3, 4: qleft.vp.v, qpro and qright.vp.np process VP, PRO and VP, yielding the children V, PRO, NP. • English order: S(PRO, VP(V, NP)) • Arabic order: S(V, PRO, NP)
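The reordering in this example can be sketched as a set of mutually recursive functions, one per state. The state names follow the slide; the tuple encoding of trees, the helper names tree, show and expect, and the exact bodies of rules 2 to 4 are assumptions made for illustration.

```python
# Trees are (label, [children]); a word is a node with no children.
def tree(label, *children):
    return (label, list(children))


def show(t):
    label, kids = t
    return label if not kids else f"{label}({', '.join(show(k) for k in kids)})"


def expect(t, label, arity):
    """Return the children of t if its root matches, else fail (no rule applies)."""
    if t[0] != label or len(t[1]) != arity:
        raise ValueError(f"no rule applies at {show(t)}")
    return t[1]


# Rule 1: q S(x0, x1) -> S(qleft.vp.v x1, qpro x0, qright.vp.np x1)
def q(t):
    x0, x1 = expect(t, "S", 2)
    return tree("S", q_left_vp_v(x1), q_pro(x0), q_right_vp_np(x1))


# Rules 2-4 (bodies assumed): pick the verb, pick the object, keep the pronoun.
def q_left_vp_v(t):
    x0, _ = expect(t, "VP", 2)
    return x0


def q_right_vp_np(t):
    _, x1 = expect(t, "VP", 2)
    return x1


def q_pro(t):
    expect(t, "PRO", 1)
    return t


english = tree("S", tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
arabic_order = q(english)
```

Note that rule 1 visits x1 twice, once to extract the verb and once to extract the object. An R rule cannot look two levels down in a single step, which is why the VP has to be copied and each copy partly discarded.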

Trees • Definitions:

Regular Tree Grammars (RTG) • A regular tree grammar (RTG) is a common way of compactly representing a potentially infinite set of trees. • A weighted RTG (wRTG) is the tree analog of a weighted finite-state acceptor (WFSA). • wRTG G = (∑, N, S, P) • ∑: alphabet • N: nonterminals • S: start nonterminal • P: weighted productions
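To make the tuple (∑, N, S, P) concrete, here is a minimal sketch of a wRTG as a Python dictionary, together with a function that computes the weight the grammar assigns to a given tree. The grammar itself, the nonterminal names qS, qNP, qVP and the function names are hypothetical, and the sketch assumes no production rewrites a nonterminal directly to another nonterminal.

```python
# P maps each nonterminal to its weighted productions (rhs, weight).
# An rhs is a tree (label, [children]) whose leaves may be nonterminal names (str).
P = {
    "qS": [(("S", ["qNP", "qVP"]), 1.0)],
    "qNP": [(("he", []), 0.6), (("water", []), 0.4)],
    "qVP": [(("VP", [("drinks", []), "qNP"]), 1.0)],
}
START = "qS"


def rhs_weight(rhs, t):
    """Weight with which the right-hand side rhs derives the tree t."""
    if isinstance(rhs, str):  # nonterminal leaf: expand it
        return tree_weight(rhs, t)
    label, kids = rhs
    t_label, t_kids = t
    if label != t_label or len(kids) != len(t_kids):
        return 0.0
    w = 1.0
    for r, c in zip(kids, t_kids):
        w *= rhs_weight(r, c)
    return w


def tree_weight(nonterminal, t):
    """Sum over all productions of the nonterminal that can derive t."""
    return sum(w * rhs_weight(rhs, t) for rhs, w in P[nonterminal])


sentence = ("S", [("he", []), ("VP", [("drinks", []), ("water", [])])])
```

Here tree_weight(START, sentence) multiplies the weights of the productions used, just as a WFSA multiplies arc weights along a path.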

Sample wRTG

Extended-LHS Tree Transducer (xR) • Different from R: explicitly represents lookahead and movement with a more specific LHS • Form of LHS: a state paired with a tree pattern. The pattern is matched against an input subtree. • The set of tree patterns is finite.
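A minimal sketch of what matching an extended LHS pattern could look like. The rule shown, which reorders the example sentence in one step by looking two levels deep, is a hypothetical illustration, as are the names match, substitute and apply_rule. For brevity the bound subtrees are copied unchanged rather than processed further by a state.

```python
# Trees are (label, [children]); pattern variables are plain strings like "x0".
def tree(label, *children):
    return (label, list(children))


def match(pattern, t, binding):
    """Match a tree pattern against t, recording variable bindings."""
    if isinstance(pattern, str):  # variable: binds any subtree
        binding[pattern] = t
        return True
    p_label, p_kids = pattern
    t_label, t_kids = t
    if p_label != t_label or len(p_kids) != len(t_kids):
        return False
    return all(match(p, c, binding) for p, c in zip(p_kids, t_kids))


def substitute(rhs, binding):
    """Build the output tree by replacing variables with their bound subtrees."""
    if isinstance(rhs, str):
        return binding[rhs]
    label, kids = rhs
    return (label, [substitute(k, binding) for k in kids])


# Hypothetical xR rule: S(x0, VP(x1, x2)) -> S(x1, x0, x2)
LHS = tree("S", "x0", tree("VP", "x1", "x2"))
RHS = tree("S", "x1", "x0", "x2")


def apply_rule(t):
    binding = {}
    return substitute(RHS, binding) if match(LHS, t, binding) else None


english = tree("S", tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
```

Because the pattern reaches inside the VP, one rule does the work that needs several copying and deleting rules in plain R.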

Binary Relation:

Derivation Tree • So many trees now: the derivation tree is neither the input tree nor the output tree. It records which transducer rules were applied, and where. • A derivation tree deterministically produces a single weighted output tree.

Derivation tree & derivation wRTG X X’

Inside-Outside algorithm • Basic idea of the inside-outside algorithm: use the current rule probabilities to estimate the expected frequencies of certain types of derivation steps, then compute new probabilities for those rules.[1] • Roughly, the inside probability of A sums over the ways A can derive its span, either directly (A->a) or through a binary rule (A->BC). • The outside probability of A sums over the contexts in which A appears as a child, i.e. rules C->AB or C->BA.

Inside-Outside for wRTG • Inside weights using G are given by βG: • Outside weights αG:
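A minimal sketch of how inside and outside weights can be computed on a small acyclic wRTG. The grammar, the names P, START, beta and alpha, and the recursions as written here are illustrative assumptions and not the paper's exact definitions: beta(n) sums, over the productions of n, the production weight times the inside weights of the nonterminals on its right-hand side; alpha(n) sums, over every place n is used, the production weight times the parent's outside weight times the inside weights of n's siblings.

```python
from functools import lru_cache
from math import prod

# Hypothetical acyclic wRTG: nonterminal -> [(rhs, weight)].
# An rhs is (label, [children]); nonterminal leaves are plain strings.
P = {
    "S": [(("S", ["A", "B"]), 1.0)],
    "A": [(("a", []), 0.5), (("f", ["B"]), 0.5)],
    "B": [(("b", []), 0.25), (("c", []), 0.25)],
}
START = "S"


def nonterminals(rhs):
    """All nonterminal occurrences in an rhs, left to right, with repeats."""
    if isinstance(rhs, str):
        return [rhs]
    _, kids = rhs
    return [n for k in kids for n in nonterminals(k)]


@lru_cache(maxsize=None)
def beta(n):
    """Inside weight: total weight of all trees derivable from n."""
    return sum(w * prod(beta(m) for m in nonterminals(rhs)) for rhs, w in P[n])


@lru_cache(maxsize=None)
def alpha(n):
    """Outside weight: total weight of all contexts in which n can occur."""
    total = 1.0 if n == START else 0.0
    for parent, rules in P.items():
        for rhs, w in rules:
            kids = nonterminals(rhs)
            for i, m in enumerate(kids):
                if m == n:
                    siblings = prod(beta(k) for j, k in enumerate(kids) if j != i)
                    total += w * alpha(parent) * siblings
    return total
```

The quantity alpha(n) * w * (product of the inside weights of the rhs nonterminals) / beta(START) is then the expected number of times a production of n with weight w is used, which is what EM needs as counts. The memoized recursion only terminates because the grammar is acyclic.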

EM training • EM training seeks to maximize the corpus likelihood by repeatedly estimating the expected counts of the hidden decisions (E-step), then assigning those counts to the parameters and renormalizing (M-step). • Algorithm 2 implements EM training for xR by repeatedly computing inside-outside weights.
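A minimal sketch of one EM iteration. To stay short it enumerates the derivations of each training example explicitly instead of computing inside-outside weights, so it illustrates the count-and-renormalize idea and is not Algorithm 2 itself. The function name em_step, the rule names and the toy corpus are hypothetical.

```python
from collections import defaultdict
from math import prod


def em_step(probs, group_of, corpus):
    """One EM iteration.

    probs:    rule -> current probability
    group_of: rule -> normalization group (rules competing with each other)
    corpus:   list of examples, each a list of derivations (lists of rules)
    """
    # E-step: expected rule counts under the posterior over derivations.
    counts = defaultdict(float)
    for derivations in corpus:
        weights = [prod(probs[r] for r in d) for d in derivations]
        z = sum(weights)
        if z == 0:
            continue  # example has no derivation with nonzero weight
        for d, w in zip(derivations, weights):
            for r in d:
                counts[r] += w / z

    # M-step: renormalize counts within each group.
    totals = defaultdict(float)
    for r, c in counts.items():
        totals[group_of[r]] += c
    return {
        r: counts[r] / totals[group_of[r]] if totals[group_of[r]] > 0 else probs[r]
        for r in probs
    }


# Toy corpus: the first example can only be derived with rule "a",
# the second with either "a" or "b".
probs = {"a": 0.5, "b": 0.5}
group_of = {"a": "q", "b": "q"}
corpus = [[["a"]], [["a"], ["b"]]]
after_one = em_step(probs, group_of, corpus)
after_two = em_step(after_one, group_of, corpus)
```

In this toy corpus the unambiguous first example pulls probability toward rule a, and each further iteration shifts more of the ambiguous second example toward a as well.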

From tree to string • An extended-LHS tree transducer (xR) maps an input tree (say, a parse tree) to an output tree, but the output is still a (parse) tree, not a sentence in another language, which is what machine translation needs. • Hence xRS, the tree-to-string transducer.

Tree-to-string transducer • Weighted extended-LHS root-to-frontier tree-to-string transducer: X = (∑, Δ, Q, Qi, R) • It is similar to xR, but the RHS of each rule is a string instead of a tree.
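A minimal sketch of the tree-to-string idea, reusing the example sentence. The two rules, the single state q, and the choice to emit the English words unchanged (rather than invent translations) are assumptions made for illustration.

```python
# Trees are (label, [children]); a word is a node with no children.
def tree(label, *children):
    return (label, list(children))


def q(t):
    """Apply hypothetical xRS rules in state q; returns a list of output words."""
    label, kids = t
    # Rule A: q S(x0, VP(x1, x2)) -> q x1  q x0  q x2   (RHS is a string)
    if label == "S" and len(kids) == 2:
        x0, vp = kids
        if vp[0] == "VP" and len(vp[1]) == 2:
            x1, x2 = vp[1]
            return q(x1) + q(x0) + q(x2)
    # Rule B: q X(word) -> word, for any preterminal X over a single word
    if len(kids) == 1 and not kids[0][1]:
        return [kids[0][0]]
    raise ValueError(f"no rule applies at {label}")


english = tree("S", tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
words = q(english)
```

The output is a flat word sequence in verb, subject, object order. No output tree is ever built, which is the difference from xR.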

Example • Implemented the translation model of (Yamada and Knight 2001) • There is a trainable xRS tree-to-string transducer that embodies:

Example

Related Work • TSG vs RTG (equivalent) • xR vs weighted synchronous TSG (similar) • EM training vs the forward-backward algorithm for finite-state (string) transducers and for HMMs

Questions • Is there any future work on this tree transducer, especially for machine translation? • Precision? Recall? • I am also a little confused by the descriptions of the two relations =>X and =>G. • I am not very sure about the inside-outside algorithm. • Questions?

Thank you!!

Reference • [1] Fernando Pereira and Yves Schabes. Inside-Outside Reestimation from Partially Bracketed Corpora. 1992.

What might be useful • An Overview of Probabilistic Tree Transducers for Natural Language Processing, Kevin Knight and Jonathan Graehl

– R: Top-down transducer, introduced before. • – F: Bottom-up transducer ("frontier-to-root"), with similar rules, but transforming the leaves of the input tree first and working its way up. • – L: Linear transducer, which prohibits copying subtrees. Rule 4 in Figure 4 is an example of a copying production, so this whole transducer is R but not RL. • – N: Non-deleting transducer, which requires that every left-hand-side variable also appear on the right-hand side. A deleting R-transducer can simply delete a subtree (without inspecting it). The transducer in Figure 4 is the deleting kind, because of rules 34-39. It would also be deleting if it included a rule for dropping English determiners, e.g., q NP(x0, x1) -> q x1. • – D: Deterministic transducer, with a maximum of one production per <state, symbol> pair. • – T: Total transducer, with a minimum of one production per <state, symbol> pair. • – PDTT: Push-down tree transducer, the transducer analog of CFTG [36]. • – subscript R: Regular-lookahead transducer, which can check whether an input subtree is tree-regular, i.e., whether it belongs to a specified RTL. Productions only fire when their lookahead conditions are met.
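The L and N properties depend only on which variables a rule uses, so they can be checked mechanically. A minimal sketch follows; the function names and the representation of a rule as two variable lists are assumptions. The first rule checked is rule 1 of the earlier "He drinks water" example, the second is the determiner-dropping rule mentioned above.

```python
from collections import Counter


def is_linear(rhs_vars):
    """Linear: no variable is used more than once on the right-hand side."""
    return all(count <= 1 for count in Counter(rhs_vars).values())


def is_nondeleting(lhs_vars, rhs_vars):
    """Non-deleting: every left-hand-side variable appears on the right-hand side."""
    return set(lhs_vars) <= set(rhs_vars)


# q S(x0, x1) -> S(qleft.vp.v x1, qpro x0, qright.vp.np x1)
rule1_lhs, rule1_rhs = ["x0", "x1"], ["x1", "x0", "x1"]

# q NP(x0, x1) -> q x1
drop_det_lhs, drop_det_rhs = ["x0", "x1"], ["x1"]
```

Rule 1 is copying (x1 appears twice) but non-deleting, while the determiner rule is linear but deleting. A transducer has a property only if all of its rules do.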
