
Local Search for Optimal Permutations


Presentation Transcript


1. Local Search for Optimal Permutations in Machine Translation, with Very Large-Scale Neighborhoods. Jason Eisner and Roy Tromble.

2. Motivation • MT is really easy! • Just use a finite-state transducer! • Phrases, morphology, the works!

3. Permutation search in MT. [Figure: the French words Marie, ne, m', a, pas, vu (tagged NNP, NEG, PRP, AUX, NEG, VBN) are permuted from their initial order (French) into a best order (French'), from which an easy transduction yields "Mary hasn't seen me".]

4. Motivation • MT is really easy! • Just use a finite-state transducer! • Phrases, morphology, the works! • Have just to fix that pesky word order.

5. Often want to find an optimal permutation … • Machine translation: Reorder French to French-prime (Brown et al. 1992), so it's easier to align or translate • MT eval: How much do you need to rearrange MT output so it scores well under an LM derived from reference translations? • Discourse generation, e.g., multi-doc summarization: Order the output sentences (Lapata 2003), so they flow nicely • Reconstruct the temporal order of events after information extraction • Learn rule ordering or constraint ranking for phonology? • Multi-word anagrams that score well under an LM

6. Permutation search: The problem. How can we find this needle in the haystack of N! possible permutations? [Figure: an initial order is mapped to the best order according to some cost function.]

7. Cost models. Cost of an order: • Does my favorite WFSA like it as a string? • Is each non-local pair order ok (e.g., 4 before 3)? • Is each non-local triple order ok (e.g., 1…2…3)? • Add these all up. These costs are enough to encode Traveling Salesperson, many other NP-complete problems, IBM Model 4, and more.
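As a concrete illustration of the cost model, here is a minimal sketch (not the authors' code) that scores one permutation under a pairwise ordering cost matrix plus a bigram table standing in for the WFSA term; the names permutation_cost, B, and bigram are illustrative assumptions.

```python
import numpy as np

def permutation_cost(pi, B, bigram=None):
    """Cost of the ordering `pi` (a list of word indices).

    B[i][j]      : cost charged whenever word i ends up anywhere before word j
                   (the non-local pair-order term).
    bigram[i][j] : optional cost of word j immediately following word i,
                   a stand-in for the "does my favorite WFSA like it" term.
    """
    cost = 0.0
    for a in range(len(pi)):                 # non-local pairwise costs
        for b in range(a + 1, len(pi)):
            cost += B[pi[a]][pi[b]]
    if bigram is not None:                   # local string (bigram) costs
        for a in range(len(pi) - 1):
            cost += bigram[pi[a]][pi[a + 1]]
    return cost

# Toy usage with random costs over N = 4 words.
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
print(permutation_cost([0, 1, 2, 3], B), permutation_cost([2, 0, 3, 1], B))
```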

8. Traditional approach: Beam search. Approximate the best path through a really big FSA: N! paths, one for each permutation, but only 2^N states. A state remembers what we've generated so far (but not in what order); an arc weight is, e.g., the cost of picking 5 next if we've seen {1, 2, 4} so far.
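A rough sketch of that beam search, under the simplifying assumption that the WFSA is just a bigram cost table, so a state is "the set of words used so far, plus the last one"; beam_search_order and its beam width are hypothetical names, not the paper's.

```python
import heapq

def beam_search_order(N, bigram, beam=8):
    """Approximately find a low-cost permutation of words 0..N-1 under a
    bigram cost table (bigram[i][j] = cost of j immediately after i).

    A hypothesis is (cost, chosen_set, last_word, partial_order); the implicit
    FSA state is (chosen_set, last_word), so there are only O(2^N * N) states
    even though there are N! complete paths.  Keep `beam` hypotheses per length."""
    hyps = [(0.0, frozenset(), None, ())]
    for _ in range(N):
        expanded = []
        for cost, chosen, last, order in hyps:
            for w in range(N):
                if w in chosen:
                    continue
                step = 0.0 if last is None else bigram[last][w]
                expanded.append((cost + step, chosen | {w}, w, order + (w,)))
        hyps = heapq.nsmallest(beam, expanded, key=lambda h: h[0])  # prune
    best = min(hyps, key=lambda h: h[0])
    return best[3], best[0]          # (best order found, its cost)
```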

9. An alternative: Local search (Germann et al. 2001). The "swap" neighborhood. [Figure: the current permutation 1 2 3 4 5 6 (cost=22) and its adjacent-swap neighbors, e.g. 2 1 3 4 5 6 (cost=26), 1 3 2 4 5 6 (cost=20), 1 2 4 3 5 6 (cost=19), 1 2 3 5 4 6 (cost=25).]

10. An alternative: Local search (Germann et al. 2001). The "swap" neighborhood. [Figure: the search moves from 1 2 3 4 5 6 (cost=22) to its best swap neighbor, 1 2 4 3 5 6 (cost=19).]

11. An alternative: Local search. The "swap" neighborhood. [Figure: successive permutations with cost=22, 19, 17, 16, …] • Why are the costs always going down? We pick the best swap. • How long does it take to pick your swap? O(N) neighbors × O(1) each? • How many swaps might you need to reach the answer? O(N^2). • What if you get stuck in a local min? Random restarts.
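A minimal sketch of this swap-neighborhood hill climbing, usable with any permutation cost function such as the permutation_cost sketch above; hill_climb_swaps is an illustrative name.

```python
def hill_climb_swaps(pi, cost_fn):
    """Greedy local search over the "swap" neighborhood: repeatedly move to the
    best permutation obtainable by swapping two adjacent words, until no swap
    improves the cost (a local minimum).  cost_fn scores a full permutation."""
    pi = list(pi)
    current = cost_fn(pi)
    improved = True
    while improved:
        improved = False
        best_i, best_cost = None, current
        for i in range(len(pi) - 1):                 # O(N) neighbors
            pi[i], pi[i + 1] = pi[i + 1], pi[i]
            c = cost_fn(pi)
            pi[i], pi[i + 1] = pi[i + 1], pi[i]      # undo the trial swap
            if c < best_cost:
                best_i, best_cost = i, c
        if best_i is not None:
            pi[best_i], pi[best_i + 1] = pi[best_i + 1], pi[best_i]
            current = best_cost
            improved = True
    return pi, current
```

Random restarts would simply wrap this in an outer loop that starts from a fresh shuffled permutation and keeps the best local minimum found.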

12. Hill-climbing vs. random walks. [Figure: the same current permutation 1 2 3 4 5 6 (cost=22) and its swap neighbors with their costs, as on slide 9.]

13. Larger neighborhoods – fewer local mins? (Germann et al. 2001; Germann 2003). The "jump" neighborhood: • Now we can get to our destination in O(N) steps instead of O(N^2) • But each step has to consider O(N^2) neighbors instead of O(N) • Push the runtime down here, it pops up there … • Can we do better? Yes! Consider exponentially many neighbors by dynamic programming. [Figure: a jump move taking a permutation of cost 22 to one of cost 17.]
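For comparison with the swap sketch, here is an analogous (hypothetical) hill climber over the "jump" neighborhood: remove one word and reinsert it at any other position, giving O(N^2) neighbors per step.

```python
def hill_climb_jumps(pi, cost_fn):
    """Greedy local search over the "jump" neighborhood: move one word to any
    other position.  O(N^2) neighbors per step, but the target can be reached
    in O(N) steps rather than O(N^2)."""
    pi = list(pi)
    current = cost_fn(pi)
    improved = True
    while improved:
        improved = False
        best_move, best_cost = None, current
        for i in range(len(pi)):
            for j in range(len(pi)):
                if i == j:
                    continue
                neighbor = pi[:i] + pi[i + 1:]       # remove the word at i ...
                neighbor.insert(j, pi[i])            # ... and reinsert it at j
                c = cost_fn(neighbor)
                if c < best_cost:
                    best_move, best_cost = (i, j), c
        if best_move is not None:
            i, j = best_move
            w = pi.pop(i)
            pi.insert(j, w)
            current = best_cost
            improved = True
    return pi, current
```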

14. Let's define each neighbor by a tree. [Figure: a binary tree over the current permutation 1 4 5 6 2 3; a marked node means "swap children".]

15. Let's define each neighbor by a tree. [Figure: the same tree over 1 4 5 6 2 3, now with some nodes marked "swap children".]

16. Let's define each neighbor by a tree. [Figure: applying the marked swaps rearranges 1 4 5 6 2 3 into a neighboring permutation.]

17. If that was the optimal neighbor … now look for its optimal neighbor: a new tree! [Figure: a new tree over the permutation 1 5 6 4 2 3.]

18. If that was the optimal neighbor … now look for its optimal neighbor: a new tree! [Figure: the new tree over 1 5 6 4 2 3, with its own "swap children" marks.]

19. If that was the optimal neighbor … now look for its optimal neighbor … and repeat until we reach a local optimum. At each step, consider all possible trees by dynamic programming (CKY parsing). [Figure: a tree over the current permutation 1 4 5 6 2 3.]
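Here is a small sketch, assuming pairwise ordering costs only (no WFSA term), of the dynamic program that searches all such trees over the current permutation in one step. It recomputes each block-vs-block cost naively rather than with the O(1) trick shown later, and best_tree_neighbor is an illustrative name, not the paper's.

```python
def best_tree_neighbor(pi, B):
    """One step of the very-large-neighborhood search, for pairwise costs only.

    pi : the current permutation (list of word indices)
    B  : B[a][b] = cost charged whenever word a ends up anywhere before word b

    A CKY-style chart over the current string: every span may keep or swap the
    two halves of every split, so exponentially many reorderings (one per
    binary tree with swap marks) are compared in polynomial time."""
    n = len(pi)
    # best[i][j] = (minimum within-span pairwise cost, best ordering of the span)
    best = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        best[i][i + 1] = (0.0, (pi[i],))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            candidates = []
            for k in range(i + 1, j):
                (cl, ol), (cr, orr) = best[i][k], best[k][j]
                left, right = pi[i:k], pi[k:j]
                straight = sum(B[a][b] for a in left for b in right)
                inverted = sum(B[b][a] for a in left for b in right)
                candidates.append((cl + cr + straight, ol + orr))  # keep order
                candidates.append((cl + cr + inverted, orr + ol))  # swap halves
            best[i][j] = min(candidates, key=lambda c: c[0])
    cost, order = best[0][n]
    return list(order), cost
```

Iterating order, cost = best_tree_neighbor(order, B) until the cost stops improving gives the local search sketched on slides 14–19.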

20. The dynamic program must pick the tree that leads to the lowest-cost permutation. First cost term: • Does my favorite WFSA like it as a string? [Figure: the initial order and the reordered string being scored.]

21. A bigram model as a WFSA. After you read 1, you're in state 1; after you read 2, you're in state 2; after you read 3, you're in state 3; … and this state determines the cost of the next symbol you read.

  22. 2 4 2 61 42 23 14 I5 56 Including WFSA costs via nonterminals A possible preterminal for word 2is an arc in A that’s labeled with 2. The preterminal 42 rewrites as word 2 with a cost equal to the arc’s cost. 4 5 6 1 2 3 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006

  23. This constituent’s total cost is the total cost of the best 63 path . I3 I3 I3 cost of the new permutation . 4 6 5 1 2 3 4 1 2 3 6 I 5 1 4 2 3 6 1 4 2 3 63 63 63 13 43 I6 I6 I6 61 61 42 42 23 23 14 14 I5 I5 56 56 4 4 5 5 6 6 1 1 2 2 3 3 Including WFSA costs via nonterminals Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006

24. The dynamic program must pick the tree that leads to the lowest-cost permutation. Cost terms so far: • Does my favorite WFSA like it as a string? • Is each non-local pair order ok (e.g., 4 before 3)?

25. Incorporating the pairwise ordering costs. [Figure: a swapped constituent puts {5, 6, 7} before {1, 2, 3, 4}.] So this hypothesis must add the costs 5<1, 5<2, 5<3, 5<4, 6<1, 6<2, 6<3, 6<4, 7<1, 7<2, 7<3, 7<4. Uh-oh! So now it takes O(N^2) time to combine two subtrees, instead of O(1) time? Nope – dynamic programming to the rescue again!

26. Incorporating the pairwise ordering costs. This hypothesis puts {5, 6, 7} before {1, 2, 3, 4}, so it must add the sum of all pairwise costs between the two blocks. [Figure: that block-vs-block sum is assembled in O(1) from rectangular sums already computed at earlier steps of parsing, by adding and subtracting overlapping rectangles.]
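A sketch of that bookkeeping, under the assumption that cum is a 2-D cumulative-sum table over the pairwise costs reindexed into the current positions; then any block-before-block cost is four table lookups (inclusion–exclusion), so combining two subtrees takes O(1) time. Function names are illustrative.

```python
import numpy as np

def cross_cost_table(pi, B):
    """cum[a][b] = sum of B[pi[u]][pi[v]] over all positions u < a and v < b,
    i.e. the cost of putting the first a words of the *current* permutation
    before its first b words."""
    Bp = np.asarray(B)[np.ix_(pi, pi)]          # reindex costs by position
    n = len(pi)
    cum = np.zeros((n + 1, n + 1))
    cum[1:, 1:] = Bp.cumsum(axis=0).cumsum(axis=1)
    return cum

def block_before_block(cum, i, k, j):
    """Pairwise cost of every word in positions [i, k) preceding every word in
    positions [k, j): four corners of the cumulative table."""
    return cum[k, j] - cum[i, j] - cum[k, k] + cum[i, k]
```

In the tree DP sketched earlier, the straight cross-cost becomes block_before_block(cum, i, k, j); the swapped case can use the same lookup on a table built from the transposed cost matrix.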

27. Incorporating 3-way ordering costs • See the paper … • A little tricky, but • comes "for free" if you're willing to accept a certain restriction on these costs • more expensive without that restriction, but possible

28. How many steps to get from here to there? [Figure: an initial order 6 2 8 4 7 5 3 1 and a best order 1 2 4 5 7 3 6 8.] One twisted-tree step? Not always … (Dekai Wu).

29. Can you get to the answer in one step? (German-English, Giza++ alignments.) Not always (yay, local search), but often (yay, big neighborhood).

30. How many steps to the answer in the worst case? (What is the diameter of the search space?) Claim: only log2 N steps at worst (if you know where to step). Let's sketch the proof! [Figure: the same example orders, 6 2 8 4 7 5 3 1 and 1 2 4 5 7 3 6 8.]

31. Quicksort anything into, e.g., 1 2 3 4 5 6 7 8. [Figure: one right-branching tree over 6 2 8 4 7 5 3 1 partitions it around a pivot, separating the elements ≤ 4 from those ≥ 5.]

32. Quicksort anything into, e.g., 1 2 3 4 5 6 7 8. [Figure: a sequence of right-branching trees recursively partitions the blocks: ≤ 4 vs. ≥ 5, then ≤ 2 vs. ≥ 3 and ≤ 6 vs. ≥ 7, ….] Only log2 N steps to get to 1 2 3 4 5 6 7 8 … or to anywhere!

33. Speedups (read the paper!) • We're just parsing the current permutation as a string – and we know how to speed up parsers! • pruning • A* • best-first • coarse-to-fine • Can restrict to a subset of parse trees • Gives us smaller neighborhoods, quicker to search, but still exponentially large • Right-branching trees, asymmetric trees … • Note: Even without any of this, super-fast and effective on the LOP (no WFSA, hence no grammar constant).

34. More on modeling (read the paper!) • Encoding classical NP-complete problems • Encoding translation decoding in general • Encoding IBM Model 4 • Encoding soft phrasal constraints via hidden bracket symbols • Costs that depend on features of the source sentence • Training the feature weights

35. Summary • Local search is fun and easy • Popular elsewhere in AI • Closely related to MCMC sampling • Probably useful for translation • Can efficiently use huge local neighborhoods • Algorithms are closely related to parsing and FSMs • We know that stuff better than anyone!
