Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem
Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda
ACL 2009
Introduction
• Word-based & Phrase-based Machine Translation (MT)
• Statistical machine translation (SMT)
  • Successful in practice
  • Open-source Moses, Google Translate, etc.
• Example: cette traduction automatique est curieuse (this automatic translation is curious)
• [Table: biphrase table]
Decoding Complexity
• Decoding: perform MT given the models
  • Translation, language, distortion, etc.
• Word-based SMT decoding is NP-hard
  • Any NP problem can be reduced to the Traveling Salesman Problem (TSP)
  • Any TSP instance can be reduced to word-based SMT
• It is in NP
• So it is NP-complete
• Kevin Knight. 1999. Decoding Complexity in Word-Replacement Translation Models. Computational Linguistics.
Goal
• TSP is NP-complete
• Word-based SMT is in NP
• So SMT can, in theory, be reduced to TSP
• Goal
  • Reduce SMT to TSP
  • Directly apply existing TSP solvers to SMT
Traveling Salesman Problem
• STSP (Symmetric TSP)
  • The most standard and most studied variant
  • Undirected graph G on N nodes, where the edges carry real-valued costs
  • Goal: find a Hamiltonian circuit of minimal cost
• ATSP (Asymmetric TSP)
  • Graph G is directed
  • Edges (i,j) and (j,i) may carry different costs
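The definitions above can be made concrete with a brute-force search over all circuits. This is only a minimal sketch for tiny graphs: the cost matrix `ATSP_COST` and the function names are invented for illustration, and enumeration is feasible only for a handful of nodes.

```python
from itertools import permutations

def tour_cost(cost, tour):
    """Cost of the closed circuit that visits `tour` in order and returns to its start."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def brute_force_tsp(cost):
    """Return (best_cost, best_tour), trying every circuit that starts at node 0."""
    n = len(cost)
    best = None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        c = tour_cost(cost, tour)
        if best is None or c < best[0]:
            best = (c, tour)
    return best

# Hypothetical asymmetric instance: cost[i][j] differs from cost[j][i].
ATSP_COST = [
    [0, 1, 9, 9],
    [9, 0, 1, 9],
    [9, 9, 0, 1],
    [1, 9, 9, 0],
]

print(brute_force_tsp(ATSP_COST))
```

The same function handles the symmetric case; a symmetric instance is simply one where `cost[i][j] == cost[j][i]` for every pair.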
Traveling Salesman Problem (2)
• SGTSP (Symmetric Generalized TSP)
  • Undirected graph G with |G| nodes
  • The |G| nodes are partitioned into m non-empty, disjoint clusters
  • Find a circular sequence of m nodes of minimal total cost in which each cluster is visited exactly once
• [Figure: clusters C1, C2, C3, C4, ..., Cm]
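A generalized TSP instance can be explored the same way on toy data: choose a circular order of the clusters, then one node from each cluster. The sketch below assumes a small hand-made cost matrix `COST` and partition `CLUSTERS`; both are hypothetical.

```python
from itertools import permutations, product

def brute_force_gtsp(cost, clusters):
    """Pick one node per cluster and a circular cluster order with minimal total cost."""
    best = None
    first, others = clusters[0], clusters[1:]
    for order in permutations(others):            # circular order of the clusters
        for nodes in product(first, *order):      # one node from each cluster
            c = sum(cost[a][b] for a, b in zip(nodes, nodes[1:] + nodes[:1]))
            if best is None or c < best[0]:
                best = (c, nodes)
    return best

# Hypothetical symmetric instance: 5 nodes, every edge costs 5 except three cheap ones.
N = 5
COST = [[0 if i == j else 5 for j in range(N)] for i in range(N)]
for a, b in [(0, 2), (2, 3), (3, 0)]:
    COST[a][b] = COST[b][a] = 1

# Partition of the nodes into m = 3 non-empty, disjoint clusters.
CLUSTERS = [[0], [1, 2], [3, 4]]

print(brute_force_gtsp(COST, CLUSTERS))
```

Only three of the five nodes appear in the result, one per cluster, which is the difference from the ordinary TSP.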
Traveling Salesman Problem (3)
• AGTSP (Asymmetric Generalized TSP)
  • Directed version of SGTSP
  • Edges (i,j) and (j,i) may carry different costs
• Reductions
  • SMT --> AGTSP
    • This paper
  • AGTSP --> ATSP
    • C. Noon and J.C. Bean. 1993. An efficient transformation of the generalized traveling salesman problem. INFOR, pages 39–44.
  • ATSP --> STSP
    • David L. Applegate et al. 2007. The Traveling Salesman Problem: A Computational Study (Princeton Series in Applied Mathematics). Princeton University Press, January.
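To illustrate the last reduction, here is a sketch of one standard ATSP-to-STSP construction, the "two-node" transformation: each node i gets a ghost copy, the edge between i and its ghost carries a large negative cost so that it is always used, and the directed arc i -> j becomes an undirected edge between the ghost of i and j. This is not necessarily the variant described in the cited book; the names `atsp_to_stsp`, `big`, and the test matrix are assumptions made for the example.

```python
from itertools import permutations

INF = float("inf")

def brute_force(cost):
    """Optimal circuit cost by enumeration (tiny instances only)."""
    n = len(cost)
    return min(
        sum(cost[a][b] for a, b in zip((0,) + p, p + (0,)))
        for p in permutations(range(1, n))
    )

def atsp_to_stsp(cost, big):
    """Build a symmetric 2n x 2n matrix; node n + i is the ghost of node i."""
    n = len(cost)
    sym = [[INF] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        # Forced edge between node i and its ghost.
        sym[i][n + i] = sym[n + i][i] = -big
        for j in range(n):
            if i != j:
                # Leaving i through its ghost and arriving at j costs cost[i][j].
                sym[n + i][j] = sym[j][n + i] = cost[i][j]
    return sym

# Hypothetical asymmetric instance.
ATSP_COST = [
    [0, 1, 9, 9],
    [9, 0, 1, 9],
    [9, 9, 0, 1],
    [1, 9, 9, 0],
]
BIG = 100  # must exceed the cost of any circuit in the original graph

sym = atsp_to_stsp(ATSP_COST, BIG)
# The symmetric optimum equals the asymmetric optimum minus n * BIG.
print(brute_force(ATSP_COST), brute_force(sym) + len(ATSP_COST) * BIG)
```

The price of the transformation is a doubling of the node count, which matters when the resulting graph is handed to a symmetric solver.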
Phrase-based Decoding as AGTSP
• Translating the French sentence "cette traduction automatique est curieuse" into English
• [Table: biphrase table]
Clusters in AGTSP
• Graph nodes are all the possible pairs (w, b)
  • b = biphrase, w = source word contained in b
  • Biphrase ht contributes (cette, ht) and (traduction, ht)
• Clusters are the subsets of graph nodes that share a common source word w
  • # of clusters = # of words in the sentence
  • 5 words in this case
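The node and cluster construction can be sketched in a few lines. Only the biphrase `ht` is taken from the slide; every other entry of `BIPHRASES` is a hypothetical stand-in for the biphrase table, and the sketch assumes each source word occurs once in the sentence (a real decoder would index words by position).

```python
from collections import defaultdict

SOURCE = ["cette", "traduction", "automatique", "est", "curieuse"]

# Hypothetical biphrase table: name -> (source words, target words).
BIPHRASES = {
    "h":  (("cette",), ("this",)),
    "t":  (("traduction",), ("translation",)),
    "ht": (("cette", "traduction"), ("this", "translation")),
    "mt": (("traduction", "automatique"), ("machine", "translation")),
    "a":  (("automatique",), ("automatic",)),
    "i":  (("est",), ("is",)),
    "s":  (("curieuse",), ("strange",)),
}

def build_clusters(source, biphrases):
    """Map each source word w to the list of graph nodes (w, b) that contain it."""
    clusters = defaultdict(list)
    for name, (src_words, _tgt_words) in biphrases.items():
        for w in src_words:
            if w in source:
                clusters[w].append((w, name))
    return {w: clusters[w] for w in source}

clusters = build_clusters(SOURCE, BIPHRASES)
for word, nodes in clusters.items():
    print(word, nodes)
```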
Example Graph
• [Figure: graph with a Start cluster and one cluster per source word (cette, traduction, automatique, est, curieuse)]
Transition Cost
• Transition between nodes M and N
• M is (w1, b) and N is (w2, b), where w1 and w2 are consecutive words in b
  • Source side of b is "... w1 w2 ..."
  • Cost = 0, because both nodes belong to the same biphrase
Transition Cost
• M is (w1, b1), where w1 is the rightmost source word in b1, and N is (w2, b2), where w2 is the leftmost source word in b2
• Meaning: combine biphrases b1 and b2
  • Costs of b1 and b2
    • Language model, translation model, etc.
  • Costs of combining them
    • Language model
    • Distortion model
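The two transition cases can be sketched as a single cost function. Everything numeric here is invented: `STATIC_COST` and `combine_cost` are placeholders for the real model scores. Two further assumptions are made for the sketch: the cost of a biphrase is charged once, when the circuit enters it, and every transition not covered by the two cases is forbidden (infinite cost).

```python
import math

# Hypothetical biphrases: name -> ordered source words.
SOURCE_SIDE = {"ht": ("cette", "traduction"), "a": ("automatique",), "i": ("est",)}
# Hypothetical per-biphrase costs (translation model etc.).
STATIC_COST = {"ht": 1.5, "a": 0.7, "i": 0.2}

def combine_cost(b1, b2):
    """Placeholder for the language-model and distortion cost of putting b2 after b1."""
    return 0.1

def transition_cost(m, n):
    """Cost of moving from node m = (w1, b1) to node n = (w2, b2)."""
    (w1, b1), (w2, b2) = m, n
    if b1 == b2:
        words = SOURCE_SIDE[b1]
        i = words.index(w1)
        # Inside one biphrase: free only when w2 directly follows w1.
        return 0.0 if i + 1 < len(words) and words[i + 1] == w2 else math.inf
    if w1 == SOURCE_SIDE[b1][-1] and w2 == SOURCE_SIDE[b2][0]:
        # Leaving b1 at its rightmost word and entering b2 at its leftmost word.
        return STATIC_COST[b2] + combine_cost(b1, b2)
    return math.inf  # assumption: all other transitions are forbidden

print(transition_cost(("cette", "ht"), ("traduction", "ht")))
print(transition_cost(("traduction", "ht"), ("automatique", "a")))
```

With costs defined this way, the total cost of a circuit is the sum of the costs of the biphrases it uses plus the costs of combining them in that order.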
Example Circuit
• [Figure: circuit through the example graph]
• Output: This machine translation is strange
Experiment 1
• Input: an English (target) word sequence in French (source) order. The goal is to turn "bad English" into "good English" using only a language model.
• One node per cluster
• Example
  • this translation automatic is curious (cette traduction automatique est curieuse)
  • Reorder the sentence into: this automatic translation is curious
• Corpus
  • Training: 50,000 sentences from the NewsCommentary corpus
  • Testing: 170 sentences, average length 17 words
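With one node per cluster, the reordering task is an ordinary asymmetric TSP whose edge costs come from a bigram language model. The sketch below solves the slide's example by enumeration; the values in `BIGRAM_COST` are made up (think of them as negative log-probabilities), and the start node `<s>` and the `DEFAULT` cost for unseen pairs are assumptions of the sketch.

```python
from itertools import permutations

# Hypothetical bigram costs; unseen pairs receive DEFAULT.
BIGRAM_COST = {
    ("<s>", "this"): 0.5,
    ("this", "automatic"): 1.0,
    ("automatic", "translation"): 0.5,
    ("translation", "is"): 0.5,
    ("is", "curious"): 1.0,
    ("curious", "<s>"): 0.5,
    ("this", "translation"): 1.0,
    ("translation", "automatic"): 3.0,
    ("automatic", "is"): 2.0,
}
DEFAULT = 5.0

def reorder(words):
    """Return the word order whose circuit through the start node "<s>" is cheapest."""
    def cost(order):
        path = ("<s>",) + order + ("<s>",)
        return sum(BIGRAM_COST.get(pair, DEFAULT) for pair in zip(path, path[1:]))
    return min(permutations(words), key=cost)

bad_english = ["this", "translation", "automatic", "is", "curious"]
print(" ".join(reorder(bad_english)))
```

A trigram model does not fit this form directly, because the cost of an edge would then depend on the previous edge as well.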
Experiment 1
• Exact TSP solver (Concorde) vs. SMT (Moses)
• Better performance for both bigram & trigram language models
• A wrong sentence can score higher than the correct sentence
• [Figures: bigram and trigram results]
Experiment 2
• Machine translation task
• LK (Lin-Kernighan) TSP solver implemented in Concorde
  • Not an exact solver, since the number of nodes is too large
• Data: Europarl
  • Training: 2.81 million sentences
  • Testing: 500 sentences
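To give a feel for heuristic (non-exact) TSP solving, here is a sketch of 2-opt local search. 2-opt is a much simpler relative of Lin-Kernighan and is not what Concorde implements; it is swapped in here only because it fits in a few lines. The sketch assumes symmetric costs, and the four points in `POINTS` are invented.

```python
import math

def tour_length(cost, tour):
    """Length of the closed circuit described by `tour`."""
    return sum(cost[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(cost, tour):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(cost, candidate) < tour_length(cost, tour):
                    tour, improved = candidate, True
    return tour

# Hypothetical instance: the corners of a unit square, Euclidean distances.
POINTS = [(0, 0), (0, 1), (1, 1), (1, 0)]
DIST = [[math.dist(p, q) for q in POINTS] for p in POINTS]

start = [0, 2, 1, 3]            # a self-crossing starting tour
result = two_opt(DIST, start)
print(result, tour_length(DIST, result))
```

Local search of this kind returns a tour that cannot be improved by any single move, which is not guaranteed to be the optimal tour.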
Comment
• Main contribution
  • Transform SMT to TSP
  • Directly solve MT with a TSP solver
• Problem
  • Experiment 1
    • Word reordering is less practical
  • Experiment 2
    • No significance test, diff(BLEU) < 1
    • BLEU score is too low (30 in 2003)
  • Experiment
    • Test sentence length (17)
    • Number of test sentences (170, 500)