

Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation

Linguistics 580 (Machine Translation), Scott Drellishak, 2/21/2006


Overview

  • Gildea presents an alignment model he describes as “loosely tree-based”

  • Builds on Yamada & Knight (2001), a tree-to-string model

  • Gildea extends it with a clone operation, and also into a tree-to-tree model

  • Wants to keep performance reasonable (polynomial in sentence length)

  • Background

  • Tree-to-String Model

  • Tree-to-Tree Model

  • Experiment


Background

  • Historically, two approaches to MT: transfer-based and statistical

  • More recently, though, hybrids

  • Probabilistic models of structured representations:

    • Wu (1997) Stochastic Inversion Transduction Grammars

    • Alshawi et al. (2000) Head Transducers

    • Yamada & Knight (2001) (see below)

Gildea’s Proposal

  • Need to handle drastic changes to trees (real bitexts aren’t isomorphic)

  • To do this, Gildea adds a new operation to Y&K’s model: subtree clone

  • This operation clones a subtree from the source tree to anywhere in the target tree.

  • Gildea also proposes a tree-to-tree model that uses parallel tree corpora.


Yamada and Knight (2001)

  • Y&K’s model is tree-to-string: the input is a tree and output is a string of words.

  • (Gildea compares it to an “Alexander Calder mobile”. Calder invented that kind of sculpture, and it resembles Y&K’s model because each node of the tree can turn either backwards or forwards. Visualize!)

Y&K Tree-to-String Model

  • Three steps to turn input into output:

    • Reorder the children of each node (a node with m children has m! possible orderings; conditioned only on the category of the node and its children)

    • Optionally insert words at each node either before or after all the children (conditioned only on foreign word)

    • Translate words at leaves (with probability P(f|e); words can translate to NULL)
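The three steps above can be illustrated with a small deterministic sketch. It applies one fixed choice of reordering, insertion, and translation to a toy tree, whereas the actual model assigns probabilities to every such choice. The function name `transform`, the tree encoding, the node labels, and the toy English-to-Japanese lexicon are all assumptions made for illustration, not taken from the slides or from Y&K.

```python
def transform(node, reorder, insert, translate):
    """Apply reorder / insert / translate to a tree and return the output words.

    node:      a source word (str) or a pair (label, [children])
    reorder:   label -> tuple of child indices giving the new child order
    insert:    label -> (word, "left" | "right"), inserted around all children
    translate: source word -> target word, or None for a NULL translation
    """
    if isinstance(node, str):
        target = translate.get(node, node)
        return [] if target is None else [target]

    label, children = node
    order = reorder.get(label, range(len(children)))  # step 1: reorder children
    words = []
    for i in order:
        words.extend(transform(children[i], reorder, insert, translate))

    if label in insert:  # step 2: optional insertion before or after all children
        word, side = insert[label]
        words = [word] + words if side == "left" else words + [word]
    return words


# Toy example (hypothetical labels and lexicon): "he reads books"
tree = ("S", [("NP-SBJ", ["he"]),
              ("VP", [("V", ["reads"]), ("NP-OBJ", ["books"])])])
reorder = {"VP": (1, 0)}                                # verb-final order
insert = {"NP-SBJ": ("wa", "right"), "NP-OBJ": ("o", "right")}
translate = {"he": "kare", "reads": "yomu", "books": "hon"}  # step 3

print(" ".join(transform(tree, reorder, insert, translate)))
# kare wa hon o yomu
```

Reordering the VP gives the verb-final order and the two insertions supply the postpositions, which is the division of labor the steps describe.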

Aside: Y&K Suitability

  • Recall that this model was used for translating English to Japanese.

  • Their model is well-suited to this language pair:

    • Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these.

    • Japanese marks subjects/topics and objects with postpositions. Insertion handles this.

Y&K EM Algorithm

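  • EM algorithm estimates inside probabilities $\beta$ bottom-up:

    for all nodes $\varepsilon_i$ in input tree $T$ do
      for all $k, l$ such that $1 < k < l < N$ do
        for all orderings $\rho$ of the children $\varepsilon_1 \ldots \varepsilon_m$ of $\varepsilon_i$ do
          for all partitions of span $k, l$ into $k_1, l_1 \ldots k_m, l_m$ do
            (update to $\beta(\varepsilon_i, k, l)$ missing from the transcript)
          end for
        end for
      end for
    end for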
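A minimal sketch of an inside computation with this loop structure follows. The update rule used here (sum over child orderings and span partitions of the ordering probability times the product of the children's inside probabilities) is an assumption consistent with the loops, not a formula taken from the slide. The sketch also assumes uniform ordering probabilities, no insertions, no NULL translations, and half-open spans; `inside_prob` and the toy lexicon are hypothetical names.

```python
from itertools import combinations, permutations
from math import factorial


def inside_prob(node, words, t, k, l):
    """Probability that `node` generates words[k:l] (half-open span).

    node:  a source word (str) or a tuple of child nodes
    words: list of target words
    t:     dict mapping (target_word, source_word) -> translation probability
    """
    if isinstance(node, str):
        # A leaf covers exactly one target word in this simplified sketch.
        return t.get((words[k], node), 0.0) if l - k == 1 else 0.0

    m = len(node)
    if l - k < m:          # every child needs at least one word
        return 0.0

    p_order = 1.0 / factorial(m)   # assumed uniform ordering probability
    total = 0.0
    for order in permutations(node):                    # orderings of the children
        for cuts in combinations(range(k + 1, l), m - 1):  # partitions of the span
            bounds = (k, *cuts, l)
            prod = 1.0
            for child, lo, hi in zip(order, bounds, bounds[1:]):
                prod *= inside_prob(child, words, t, lo, hi)
                if prod == 0.0:
                    break
            total += p_order * prod
    return total


tree = ("a", ("b", "c"))
lexicon = {("A", "a"): 1.0, ("B", "b"): 1.0, ("C", "c"): 1.0}

print(inside_prob(tree, ["C", "B", "A"], lexicon, 0, 3))  # 0.25
print(inside_prob(tree, ["B", "A", "C"], lexicon, 0, 3))  # 0.0
```

The second call returns zero because the target order would separate the two children of the inner node, which no sibling reordering can produce.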

Y&K Performance

  • Computational complexity $O(|T| N^{m+2})$, where T = tree, N = input length, m = fan-out of the grammar

  • “By storing partially complete arcs in the chart and interleaving the inner two loops”, improve to $O(|T| n^3 m! 2^m)$

  • Gildea says “exponential in m” (looks factorial to me) but polynomial in the sentence length (N above, n here)

  • If |T| is O(n) then the whole thing is $O(n^4)$ for fixed m
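To see how the fan-out term behaves, the following sketch tabulates the m!·2^m factor from the improved bound next to a plain exponential 2^m. The helper name `tts_fanout_factor` is an assumption for illustration; the numbers are just arithmetic on the stated bound, not measurements.

```python
from math import factorial


def tts_fanout_factor(m):
    """The m! * 2^m term of the improved tree-to-string bound."""
    return factorial(m) * 2 ** m


for m in range(1, 6):
    # The ratio to 2^m is exactly m!, so the term grows faster than any
    # fixed-base exponential in m.
    print(m, tts_fanout_factor(m), 2 ** m)
```

For a grammar with bounded fan-out this factor is a constant, which is why only the polynomial dependence on sentence length remains.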

Y&K Drawbacks

  • No alignments with crossing brackets:

        A
       / \
      B   Z
     / \
    X   Y

  • XZY and YZX are impossible: reordering only permutes siblings, so X and Y always stay adjacent

  • Recall that Y&K flatten trees to avoid some of this, but don’t catch all cases
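The restriction can be checked by brute force. This sketch enumerates every leaf order reachable by permuting siblings in the example tree; the function name `reachable_yields` and the nested-tuple tree encoding are assumptions for illustration.

```python
from itertools import permutations, product


def reachable_yields(node):
    """All leaf sequences obtainable by reordering the children of each node."""
    if isinstance(node, str):
        return {(node,)}
    results = set()
    for order in permutations(node):
        # Combine one yield from each child, in the chosen sibling order.
        for parts in product(*(reachable_yields(child) for child in order)):
            results.add(tuple(word for part in parts for word in part))
    return results


tree = (("X", "Y"), "Z")   # A -> B Z, B -> X Y
orders = sorted("".join(y) for y in reachable_yields(tree))
print(orders)  # ['XYZ', 'YXZ', 'ZXY', 'ZYX']
```

Only four of the six permutations appear; the two missing ones are exactly those that place Z between X and Y.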

Adding Clone

  • Gildea adds clone operation to Y&K’s model

  • For each node, allow the insertion of a clone of another node as its child.

  • Probability of cloning $\varepsilon_i$ under $\varepsilon_j$ in two steps:

    • Choice to insert:

    • Node to clone:

  • $P_{clone}$ is a single estimated parameter; $P_{makeclone}$ is constant (all nodes equally probable, reusable)
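A minimal sketch of the two-step probability, assuming the node choice is obtained by normalizing a per-node weight over all nodes in the tree. With constant weights this reduces to a uniform choice, matching the bullet above. The function name, argument names, and the normalization step are assumptions for illustration; the slide's own formulas are not reproduced here.

```python
def clone_probability(p_insert_clone, makeclone_weights, node):
    """Probability of inserting a clone of `node` at some position.

    p_insert_clone:    the single estimated probability of choosing to insert a clone
    makeclone_weights: dict node -> weight; constant weights give a uniform choice
    """
    total = sum(makeclone_weights.values())
    p_node = makeclone_weights[node] / total   # step 2: which node to clone
    return p_insert_clone * p_node             # step 1 times step 2


weights = {"NP": 1.0, "VP": 1.0, "PP": 1.0, "S": 1.0}  # constant weights
print(clone_probability(0.2, weights, "NP"))
```

With four equally weighted nodes, each one receives a quarter of the clone-insertion probability.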


Tree-to-Tree Model

  • Output is a tree, not a string, and it must match the tree in the target corpus

  • Add two new transformation operations:

    • one source node → two target nodes

    • two source nodes → one target node

  • “a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.”

Calculating Probability

  • From the root down. At each level:

    • At most one of the node’s children is grouped with it, forming an elementary tree (conditioned on current node and CFG rule children)

    • An alignment of the e-tree is chosen (conditioned as above). Like Y&K reordering except: (1) the alignment can include insertions and deletions; (2) two nodes grouped together are reordered together.

    • Lexical leaves translated as before.

Elementary Trees?

  • Elementary trees allow the alignment of trees with different depths. Treat A,B as an e-tree, reorder their children together:

        A                 A
       / \             /  |  \
      B   Z     →     X   Z   Y
     / \
    X   Y
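The effect of grouping can be sketched by merging a child into its parent and then permuting the combined child list. The helper `group_with_child` and the nested-tuple encoding are assumptions for illustration; the real model chooses among such groupings probabilistically.

```python
from itertools import permutations


def group_with_child(node, index):
    """Merge the child at `index` into its parent so their children reorder together."""
    grouped = node[index]
    if isinstance(grouped, str):
        raise ValueError("only an internal node can be grouped with its parent")
    return node[:index] + grouped + node[index + 1:]


tree = (("X", "Y"), "Z")               # A -> B Z, B -> X Y
etree = group_with_child(tree, 0)      # children of the A,B e-tree: X Y Z
orders = {"".join(p) for p in permutations(etree)}
print(sorted(orders))
```

Once A and B form one elementary tree, all six orders of X, Y, Z are available, including XZY.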

EM algorithm

  • Estimates inside probabilities $\beta$ bottom-up:

    for all nodes $\varepsilon_a$ in source tree $T_a$ in bottom-up order do
      for all elementary trees $t_a$ rooted in $\varepsilon_a$ do
        for all nodes $\varepsilon_b$ in target tree $T_b$ in bottom-up order do
          for all elementary trees $t_b$ rooted in $\varepsilon_b$ do
            for all alignments $\alpha$ of the children of $t_a$ and $t_b$ do
              (update to $\beta$ missing from the transcript)
            end for
          end for
        end for
      end for
    end for


  • Outer two loops are $O(|T|^2)$

  • Elementary trees include at most one child, so choosing e-trees is $O(m^2)$

  • Alignment is $O(2^{2m})$

  • Which nodes to insert or clone is $O(2^{2m})$

  • How to reorder is $O((2m)!)$

  • Overall: $O(|T|^2 m^2 4^{2m} (2m)!)$, quadratic (!) in size of the input sentence.
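As a quick arithmetic check, the per-node-pair factors listed above multiply out to the fan-out part of the overall bound. The function name `ttt_fanout_factor` is an assumption for illustration.

```python
from math import factorial


def ttt_fanout_factor(m):
    """Product of the per-node-pair terms for fan-out m."""
    choose_etrees = m ** 2              # which child, if any, joins each e-tree
    alignment = 2 ** (2 * m)            # which children are aligned
    insert_or_clone = 2 ** (2 * m)      # which nodes are inserted or cloned
    reorder = factorial(2 * m)          # how the grouped children are ordered
    return choose_etrees * alignment * insert_or_clone * reorder


for m in range(1, 4):
    # The two 2^(2m) terms combine into 4^(2m).
    assert ttt_fanout_factor(m) == m ** 2 * 4 ** (2 * m) * factorial(2 * m)
    print(m, ttt_fanout_factor(m))
```

The factor is large but depends only on m, so for bounded fan-out the cost is driven by the $|T|^2$ term.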

Tree-to-Tree Clone

  • Allowing m-to-n matching of up to two nodes (e-trees) allows only “limited non-isomorphism”

  • So, as before, add a clone operation

  • Algorithm unchanged, except alignments may now include cloned subtrees, same probability as in tree-to-string (uniform)


The Data

  • Parallel Korean-English corpus

  • Trees annotated by hand on both sides

  • “in this paper we will be using only the Korean trees, modeling their transformation into the English text.”

  • (That can’t be right—only true for TTS?)

  • 5083 sentences: 4982 training, 101 eval

Aside: Suitability

  • Recall that Y&K’s model was suited to the English-to-Japanese task.

  • Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair?

  • In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they’re related).


  • Alignment Error Rate (Och & Ney, 2000):
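For reference, the standard Och & Ney definition is AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the system alignment, S the sure gold links, and P the possible gold links (with S ⊆ P). A minimal sketch follows; the function and argument names are assumptions, and whether this evaluation distinguishes sure from possible links is not stated on the slide.

```python
def alignment_error_rate(predicted, sure, possible=frozenset()):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); lower is better.

    Each argument is a set of (source_index, target_index) links.
    Sure links are always counted as possible links too.
    Undefined (division by zero) if both `predicted` and `sure` are empty.
    """
    possible = set(possible) | set(sure)
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))


gold = {(0, 0), (1, 1)}
system = {(0, 0), (1, 2)}
print(alignment_error_rate(system, gold))  # 0.5
```

When no separate possible links are given, the measure reduces to one minus the F-measure of the predicted links against the gold links.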

Results Detailed

  • The lexical probabilities come from IBM Model 1; node reordering probabilities are initialized to uniform

  • Best results when $P_{ins}$ is set to 0.5 rather than estimated (!)

  • “While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall”

How’d TTS and TTT Do?

  • The best results were with tree-to-string, surprisingly

  • Y&K + clone was roughly equal to IBM; fixing $P_{ins}$ gave the best results overall

  • Tree-to-tree + clone was also roughly equal to IBM, but it was much more efficient to train (since it’s quadratic instead of quartic)

  • Still, disappointing results for TTT


  • Model allows syntactic info to be used for training without ordering constraints

  • Clone operations improve alignment results

  • Tree-to-tree + clone is better only in training efficiency (but Gildea is hopeful)

  • Future directions: bigger corpora, conditioning on lexicalized trees