Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation

**Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation**

Linguistics 580 (Machine Translation)
Scott Drellishak, 2/21/2006

**Overview**

• Gildea presents an alignment model he describes as “loosely tree-based”
• Builds on Yamada & Knight (2001), a tree-to-string model
• Gildea extends it with a clone operation, and also into a tree-to-tree model
• Wants to keep performance reasonable (polynomial in sentence length)

• **Background**
• Tree-to-String Model
• Tree-to-Tree Model
• Experiment

**Background**

• Historically, two approaches to MT: transfer-based and statistical
• More recently, though, hybrids
• Probabilistic models of structured representations:
  • Wu (1997): Stochastic Inversion Transduction Grammars
  • Alshawi et al. (2000): Head Transducers
  • Yamada & Knight (2001) (see below)

**Gildea’s Proposal**

• Need to handle drastic changes to trees (real bitexts aren’t isomorphic)
• To do this, Gildea adds a new operation to Y&K’s model: subtree clone
• This operation clones a subtree from the source tree to anywhere in the target tree
• Gildea also proposes a tree-to-tree model that uses parallel tree corpora

**Yamada and Knight (2001)**

• Y&K’s model is tree-to-string: the input is a tree and the output is a string of words
• (Gildea compares it to an “Alexander Calder mobile”. Calder invented that kind of sculpture, and it resembles Y&K’s model because each node of the tree can turn either backwards or forwards. Visualize!)

**Y&K Tree-to-String Model**

• Three steps to turn input into output:
  • Reorder the children of each node (a node with $m$ children has $m!$ orderings; conditioned only on the category of the node and its children)
  • Optionally insert words at each node, either before or after all the children (conditioned only on foreign word)
  • Translate words at the leaves (with probability $P(f \mid e)$; words can translate to NULL)

**Aside: Y&K Suitability**

• Recall that this model was used for translating English to Japanese
• Their model is well-suited to this language pair:
  • Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these.
  • Japanese marks subjects/topics and objects with postpositions. Insertion handles this.

**Y&K EM Algorithm**

• EM algorithm estimates inside probabilities β bottom-up:

    for all nodes ε_i in input tree T do
      for all k, l such that 1 < k < l < N do
        for all orderings ρ of the children ε_1 … ε_m of ε_i do
          for all partitions of span k, l into k_1, l_1 … k_m, l_m do
          end for
        end for
      end for
    end for

**Y&K Performance**

• Computational complexity is $O(|T| N^{m+2})$, where $T$ = tree, $N$ = input length, $m$ = fan-out of the grammar
• “By storing partially complete arcs in the chart and interleaving the inner two loops”, this improves to $O(|T| n^3 m! 2^m)$
• Gildea says “exponential in m” (looks factorial to me) but polynomial in N/n
• If $|T|$ is $O(n)$ then the whole thing is $O(n^4)$

**Y&K Drawbacks**

• No alignments with crossing brackets. For the tree [A [B X Y] Z], the orders XZY and YZX are impossible
• Recall that Y&K flatten trees to avoid some of this, but don’t catch all cases

**Adding Clone**

• Gildea adds a clone operation to Y&K’s model
• For each node, allow the insertion of a clone of another node as its child
• The probability of cloning $\varepsilon_i$ under $\varepsilon_j$ is computed in two steps: the choice to insert a clone, then the choice of which node to clone
• $P_{clone}$ is one estimated number; $P_{makeclone}$ is constant (all nodes equally probable, reusable)

**Tree-to-Tree Model**

• Output is a tree, not a string, and it must match the tree in the target corpus
• Add two new transformation operations:
  • one source node → two target nodes
  • two source nodes → one target node
• “a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.”

**Calculating Probability**

• From the root down. At each level:
  • At most one of the node’s children is grouped with it, forming an elementary tree (conditioned on current node and CFG rule children)
  • An alignment of the e-tree is chosen (conditioned as above). Like Y&K reordering, except that (1) the alignment can include insertions and deletions and (2) two nodes grouped together are reordered together
  • Lexical leaves are translated as before

**Elementary Trees?**

• Elementary trees allow the alignment of trees with different depths. Treat A, B as an e-tree and reorder their children together: [A [B X Y] Z] → [A X Z Y]

**EM algorithm**

• Estimates inside probabilities β bottom-up:

    for all nodes ε_a in source tree T_a in bottom-up order do
      for all elementary trees t_a rooted in ε_a do
        for all nodes ε_b in target tree T_b in bottom-up order do
          for all elementary trees t_b rooted in ε_b do
            for all alignments α of the children of t_a and t_b do
            end for
          end for
        end for
      end for
    end for

**Performance**

• Outer two loops are $O(|T|^2)$
• Elementary trees include at most one child, so choosing e-trees is $O(m^2)$
• Alignment is $O(2^{2m})$
• Which nodes to insert or clone is $O(2^{2m})$
• How to reorder is $O((2m)!)$
• Overall: $O(|T|^2 m^2 4^{2m} (2m)!)$, quadratic (!) in the size of the input sentence

**Tree-to-Tree Clone**

• Allowing m-to-n matching of up to two nodes (e-trees) allows only “limited non-isomorphism”
• So, as before, add a clone operation
• Algorithm unchanged, except alignments may now include cloned subtrees, with the same probability as in tree-to-string (uniform)

**The Data**

• Parallel Korean-English corpus
• Trees annotated by hand on both sides
• “in this paper we will be using only the Korean trees, modeling their transformation into the English text.”
• (That can’t be right; only true for TTS?)
• 5083 sentences: 4982 training, 101 eval

**Aside: Suitability**

• Recall that Y&K’s model was suited to the English-to-Japanese task
• Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair?
• In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they’re related)

**Results**

• Measured by Alignment Error Rate (Och & Ney, 2000)

**Results Detailed**

• The lexical probabilities come from Model 1, and node reordering probabilities are initialized to uniform
• Best results when $P_{ins}$ is set to 0.5 rather than estimated (!)
• “While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall”

**How’d TTS and TTT Do?**

• The best results were with tree-to-string, surprisingly
• Y&K + clone was roughly equal to IBM; fixing $P_{ins}$ was best overall
• Tree-to-tree + clone was roughly equal to IBM, but it was much more efficient to train (since it’s quadratic instead of quartic)
• Still, disappointing results for TTT

**Conclusions**

• Model allows syntactic info to be used for training without ordering constraints
• Clone operations improve alignment results
• Tree-to-tree + clone is better only in performance (but he’s hopeful)
• Future directions: bigger corpora, conditioning on lexicalized trees
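
The three Y&K operations from the tree-to-string slides (reorder, insert, translate) can be illustrated with a small sketch. This is not Y&K's or Gildea's code: it applies one fixed choice of operations deterministically and ignores all probabilities. The `Node` class, the `transform` function, the path-keyed `order` and `inserts` tables, and the toy English-to-Japanese lexicon are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None  # set on leaves only


def transform(node, order, inserts, lexicon, path=()):
    """Return the target words produced under `node` (hypothetical helper).

    `path` identifies a node by the child indices leading to it from the root.
    """
    if node.word is not None:
        # Step 3: translate the leaf; a missing entry stands in for NULL.
        target = lexicon.get(node.word)
        out = [] if target is None else [target]
    else:
        # Step 1: visit the children in the chosen order (default: unchanged).
        perm = order.get(path, tuple(range(len(node.children))))
        out = []
        for i in perm:
            out.extend(transform(node.children[i], order, inserts, lexicon, path + (i,)))
    # Step 2: optionally insert one word before or after everything under this node.
    if path in inserts:
        side, inserted = inserts[path]
        out = [inserted] + out if side == "before" else out + [inserted]
    return out


# Toy input: S -> NP(he) VP, VP -> V(hears) NP(music)
tree = Node("S", [
    Node("NP", word="he"),
    Node("VP", [Node("V", word="hears"), Node("NP", word="music")]),
])
order = {(1,): (1, 0)}                                   # VP: object before verb (SVO -> SOV)
inserts = {(0,): ("after", "wa"), (1, 1): ("after", "o")}  # postpositions after each NP
lexicon = {"he": "kare", "hears": "kiku", "music": "ongaku"}

print(" ".join(transform(tree, order, inserts, lexicon)))  # kare wa ongaku o kiku
```

In the actual model each of these choices carries a probability, and EM sums over all of them rather than picking one.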
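
For readers unfamiliar with the metric named in the Results slide, here is a minimal sketch of Alignment Error Rate as commonly defined following Och & Ney: $\mathrm{AER} = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$, where $A$ is the predicted alignment, $S$ the "sure" gold links and $P \supseteq S$ the "possible" gold links. The function name and the toy link sets are assumptions; the slides do not say how the Korean-English gold alignments were annotated.

```python
def alignment_error_rate(sure, possible, predicted):
    """AER for sets of (source_index, target_index) links; lower is better.

    Assumes `predicted` and `sure` are not both empty.
    """
    possible = possible | sure  # by convention every sure link is also possible
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))


# Toy data, invented for illustration only.
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
predicted = {(0, 0), (2, 1), (2, 2)}

aer = alignment_error_rate(sure, possible, predicted)
precision = len(predicted & (possible | sure)) / len(predicted)
recall = len(predicted & sure) / len(sure)
print(round(aer, 3), round(precision, 3), round(recall, 3))
```

Because precision is measured against possible links and recall against sure links, predicting fewer links can lower AER, which is the trade-off the quoted passage about fixing the insertion probability refers to.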
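
To get a feel for the grammar-dependent factors quoted in the two Performance slides, the sketch below simply evaluates $m!\,2^m$ (tree-to-string) and $m^2\,4^{2m}\,(2m)!$ (tree-to-tree) for small fan-outs. The function names are hypothetical, and the numbers are only the constant factors from the big-O expressions, not measured running times.

```python
from math import factorial


def tree_to_string_factor(m):
    """Fan-out factor m! * 2^m from the improved Y&K bound."""
    return factorial(m) * 2 ** m


def tree_to_tree_factor(m):
    """Fan-out factor m^2 * 4^(2m) * (2m)! from the tree-to-tree bound."""
    return m ** 2 * 4 ** (2 * m) * factorial(2 * m)


for m in range(1, 5):
    print(m, tree_to_string_factor(m), tree_to_tree_factor(m))
```

The tree-to-tree model pays a much larger per-node cost in $m$ in exchange for dropping from quartic to quadratic growth in sentence length.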