Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing

Presentation Transcript


  1. Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea Zhonghua Li Mentor: Jun Lang 2011-10-21 I2R SMT-Reading Group

  2. Paper info Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing ACL-08 Long Paper Cited: 37 Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea

  3. Core Ideas • Variational Bayes • Tic-tac-toe pruning • Word-to-phrase bootstrapping

  4. Outline • Paper info • Pipeline • Model • Training • Parsing (Pruning) • Results • Shortcomings • Discussion

  5. Summary of the Pipeline • Run IBM Model 1 on sentence-aligned data • Use tic-tac-toe pruning to prune the bitext space • Word-based ITG, Variational Bayes training, get the Viterbi alignment • Non-compositional constraints to constrain the space of phrase pairs • Phrasal ITG, VB training, Viterbi pass to get the phrasal alignment

  6. Phrasal Inversion Transduction Grammar
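The grammar on this slide was shown as an image. Its shape is the standard single-nonterminal phrasal ITG, which can be sketched as:

$$X \to [X\,X] \quad\mid\quad X \to \langle X\,X\rangle \quad\mid\quad X \to e/f$$

Here [X X] concatenates the two children in the same order in both languages, ⟨X X⟩ inverts the order on the target side, and X → e/f emits a phrase pair (in the word-based grammar, e and f are single words, either possibly empty).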

  7. Dirichlet Prior for Phrasal ITG
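The prior itself was also an image. In the usual formulation, each rule distribution of the ITG gets a Dirichlet prior:

$$\theta_X \sim \mathrm{Dirichlet}(\alpha), \qquad p(\theta_X) \;\propto\; \prod_{r \in \mathrm{RHS}(X)} \theta_{X \to r}^{\,\alpha_r - 1}$$

A very small hyperparameter (slide 20 quotes a value on the order of 10^-9) biases the posterior toward a sparse solution, discouraging the phrasal model from simply memorizing whole sentence pairs as phrases.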

  8. Review: Inside-Outside Algorithm (slide adapted from Shujie Liu) • The forward-backward algorithm is not only used for HMMs, but for any state-space model • The inside-outside algorithm is a special case of the forward-backward algorithm • [figures: a parse over the bitext cell 0/0–T/V with subcells s/u–t/v, and an HMM chain x_1 … x_N]
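For reference, the HMM case of forward-backward that the slide alludes to is usually written (standard textbook notation, not taken from the slide):

$$\alpha_n(z) = p(x_{1:n}, z_n{=}z) = \sum_{z'} \alpha_{n-1}(z')\, p(z \mid z')\, p(x_n \mid z), \qquad \beta_n(z) = p(x_{n+1:N} \mid z_n{=}z) = \sum_{z'} p(z' \mid z)\, p(x_{n+1} \mid z')\, \beta_{n+1}(z')$$

so that $p(z_n{=}z \mid x_{1:N}) \propto \alpha_n(z)\,\beta_n(z)$. The inside and outside scores on the following slides play the same two roles for synchronous parses.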

  9. VB Algorithm for Training SITGs - E-step (1) (adapted from Shujie Liu) • Inside probabilities: initialization and recursion, splitting the cell s/u–t/v into s/u–S/U and S/U–t/v
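The initialization and recursion formulas were images on the slide. Writing $\beta_{s,t}^{u,v}$ for the inside score of the cell the slides denote s/u–t/v (source words between positions s and t, target words between u and v), a sketch of the standard ITG inside recursion is:

$$\text{Init:}\quad \beta_{s,t}^{u,v} = P(X \to e_{s+1..t}/f_{u+1..v}) \quad\text{for cells emitted directly as a pair}$$

$$\text{Recursion:}\quad \beta_{s,t}^{u,v} = \sum_{S,U}\Big[ P(X \to [X\,X])\,\beta_{s,S}^{u,U}\,\beta_{S,t}^{U,v} \;+\; P(X \to \langle X\,X\rangle)\,\beta_{s,S}^{U,v}\,\beta_{S,t}^{u,U} \Big]$$

with split points S, U ranging inside the cell. In the phrasal grammar the emission term is also available for multi-word cells and is simply added to the recursion.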

  10. VB Algorithm for Training SITGs - E-step (2) (adapted from Shujie Liu) • Outside probabilities: initialization and recursion, combining the outside score of a parent cell (e.g. S/U–t/v) with the inside score of the sibling (S/U–s/u)
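Correspondingly, a sketch of the outside recursion in the same notation; the four terms cover the cell acting as the left or right child of a straight or an inverted parent, with S, U ranging over the positions allowed by each configuration:

$$\text{Init:}\quad \alpha_{0,T}^{0,V} = 1$$

$$\alpha_{s,t}^{u,v} = \sum_{S,U}\Big[ P([X\,X])\big(\alpha_{s,S}^{u,U}\,\beta_{t,S}^{v,U} + \alpha_{S,t}^{U,v}\,\beta_{S,s}^{U,u}\big) \;+\; P(\langle X\,X\rangle)\big(\alpha_{s,S}^{U,v}\,\beta_{t,S}^{U,u} + \alpha_{S,t}^{u,U}\,\beta_{S,s}^{v,U}\big)\Big]$$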


  16. VB Algorithm for Training SITGs - M-step • s = 3 is the number of right-hand sides for X • m is the number of observed phrase pairs • ψ is the digamma function
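The update formula was an image; the usual mean-field VB M-step for a Dirichlet-multinomial, which is presumably what the slide showed, replaces EM's normalized expected counts with digamma-exponentiated counts:

$$\tilde{P}(X \to r) \;=\; \frac{\exp\big(\psi(c(X \to r) + \alpha_r)\big)}{\exp\big(\psi\big(\sum_{r'} (c(X \to r') + \alpha_{r'})\big)\big)}$$

where the c(·) are expected counts from the E-step and the sum in the denominator runs over all right-hand sides of X, i.e. the s = 3 structural rule types and the m observed phrase pairs mentioned on the slide.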

  17. Pruning • Tic-tac-toe pruning (Hao Zhang, 2005) • Fast tic-tac-toe pruning (Hao Zhang, 2008) • High-precision alignment pruning (Haghighi et al., ACL 2009): prune all bitext cells that would invalidate more than 8 of the high-precision alignment links • 1-1 alignment posterior pruning (Haghighi et al., ACL 2009): prune all 1-1 bitext cells whose posterior is below 10^-4 in both HMM models

  18. Tic-tac-toe pruning (Hao Zhang, 2005)
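Since the slide content was a figure, here is a small, self-contained Python sketch of the idea behind tic-tac-toe pruning: score every bitext cell by an inside Model 1 score plus an outside Model 1 score (in log space) and keep only cells close to the best. The function names, the flat outside approximation and the beam value are all illustrative; the actual algorithm of Zhang & Gildea (2005) computes the outside ("tic-tac-toe") regions with a dedicated dynamic program.

```python
import math

def model1_logprob(src_words, tgt_words, t, p_null=1e-4):
    """Log of an IBM Model 1 style probability of generating tgt_words
    from src_words plus a NULL word: each target word is generated by a
    uniformly chosen source word (or NULL)."""
    score = 0.0
    for f in tgt_words:
        total = p_null + sum(t.get((e, f), 1e-9) for e in src_words)
        score += math.log(total / (len(src_words) + 1))
    return score

def tictactoe_prune(src, tgt, t, beam=8.0):
    """Keep bitext cells (i, j, l, m) -- source span src[i:j], target span
    tgt[l:m] -- whose inside + outside score is within `beam` (log domain)
    of the best cell sharing the same source span."""
    I, L = len(src), len(tgt)
    kept = set()
    for i in range(I):
        for j in range(i + 1, I + 1):
            scored = []
            for l in range(L):
                for m in range(l + 1, L + 1):
                    inside = model1_logprob(src[i:j], tgt[l:m], t)
                    # crude "outside": the remaining target words explained
                    # by the remaining source words
                    outside = model1_logprob(src[:i] + src[j:],
                                             tgt[:l] + tgt[m:], t)
                    scored.append((inside + outside, (i, j, l, m)))
            best = max(s for s, _ in scored)
            kept.update(cell for s, cell in scored if s >= best - beam)
    return kept

# Toy usage: t maps (source word, target word) pairs to translation probabilities.
t = {("je", "i"): 0.9, ("suis", "am"): 0.8, ("riche", "rich"): 0.9}
cells = tictactoe_prune(["je", "suis", "riche"], ["i", "am", "rich"], t)
```

The surviving cells are then used to restrict the ITG chart, which is what makes the O(n^6) synchronous parsing affordable in the pipeline above.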

  19. Non-compositional Phrases Constraint • e(i, j): number of alignment links emitted from the source substring e_i..e_j • f(l, m): number of alignment links emitted from the target substring f_l..f_m

  20. Word Alignment Evaluation • Both models trained for 10 iterations • EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 the AER rises to 0.42 • VB: with α_c set to 10^-9, the AER gets close to 0.35 at iteration 10

  21. End-to-end Evaluation • NIST Chinese-English training data • NIST 2002 evaluation datasets for tuning and evaluation • The 10-reference development set was used for MERT • The 4-reference test set was used for evaluation

  22. Shortcomings • Grammar is not perfect • ITG ordering is context independent • Phrase pairs are sparse

  23. Grammar is not perfect • Over-counting problem: alternative ITG parse trees can correspond to the same word alignment; this is called the over-counting problem • [figure: several points in ITG parse-tree space mapping to one point in word-alignment space, illustrated with the example "I am rich !"]

  24. A better-constrained grammar • A series of nested constituents with the same orientation will always have a left-heavy derivation • The second parse tree of the previous example will therefore not be generated • Example rules: B → ⟨A B⟩, B → ⟨C C⟩, A → [C C], C → 1/3, C → 2/4, C → 3/2, C → 4/1
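The rule list on the slide appears to be a fragment of a canonical-form ITG. One commonly cited normal form (Wu, 1997) is usually written as below; variants differ in which child position is restricted, so this is a reconstruction rather than a copy of the slide:

$$\begin{aligned} S &\to A \mid B \mid C\\ A &\to [A\,B] \mid [B\,B] \mid [C\,B] \mid [A\,C] \mid [B\,C] \mid [C\,C]\\ B &\to \langle A\,A\rangle \mid \langle B\,A\rangle \mid \langle C\,A\rangle \mid \langle A\,C\rangle \mid \langle B\,C\rangle \mid \langle C\,C\rangle\\ C &\to e/f \end{aligned}$$

Because a straight node A never appears as the right child of a straight rule (and likewise an inverted node B never appears as the right child of an inverted rule), a chain of same-orientation combinations can only be built left to right, which is exactly the left-heavy property claimed above.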

  25. Thanks Q&A
