
On the Consistency of Probabilistic Context-Free Grammars






Presentation Transcript


  1. On the Consistency of Probabilistic Context-Free Grammars
  Cătălin-Ionuţ Tîrnăucă, GRLMC, Rovira i Virgili University
  catalin@isi.edu, catalinionut.tirnauca@estudiants.urv.cat

  2. Probabilistic languages NOTE: These slides are just a survey of previous work on the subject. • A probabilistic language attaches a probability to each string in the language, and these probabilities must form a probability distribution: the probabilities of all strings must sum to 1. • When the strings are generated by a grammar, this condition is usually called consistency. ISI/USC, Marina del Rey, 08/22/2008: C. Tîrnăucă, On the Consistency of PCFGs

  3. Are all PCFGs consistent? • Consider the PCFG • S → SS | 0.6 • S → a | 0.4 • P(a^i) = C(i−1) · 0.6^(i−1) · 0.4^i for i ≥ 1, where C(n) is the n-th Catalan number, the number of binary derivation trees with n+1 leaves: every derivation of a^i uses S → SS exactly i−1 times and S → a exactly i times. • The total string probability Σ_{i≥1} P(a^i) is 2/3 … far away from 1. • It seems that we lose probability mass: some of the production probabilities are never distributed to terminal strings. How is this possible?
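As a sanity check on the 2/3 figure, the partial sums of Σ_{i≥1} P(a^i) can be computed directly. This is a small Python sketch, not part of the original slides; it uses the convention that the n-th Catalan number counts binary trees with n+1 leaves:

```python
from math import comb

# Rule probabilities of the example PCFG: S -> SS (0.6), S -> a (0.4)
P_SS, P_A = 0.6, 0.4

def catalan(n):
    """n-th Catalan number: number of binary derivation trees with n+1 leaves."""
    return comb(2 * n, n) // (n + 1)

def string_prob(i):
    """P(a^i): Catalan(i-1) trees, each using i-1 SS-rules and i a-rules."""
    return catalan(i - 1) * P_SS ** (i - 1) * P_A ** i

# The partial sums approach 2/3, not 1 -- the PCFG is inconsistent.
total = sum(string_prob(i) for i in range(1, 200))
print(total)  # ≈ 0.6667
```

The series converges because C(n) · 0.24^n shrinks roughly like 0.96^n, so 200 terms already give the limit to several decimal places.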

  4. Losing information: infinite derivations Imagine the following worst-case scenario: S → SS | 0.99999, S → a | 0.00001. Whenever an S shows up in a sentential form, S → SS is far more likely to be used to rewrite it than S → a, because of the way the probabilities are attached. So two S's appear instead of one; each of them is in turn likely to be replaced by two more S's, and so on: the S's multiply without bound. S ⇒ SS ⇒ SSS ⇒ SSSS ⇒ aSSS ⇒ aSSSS ⇒ aaSSS…S ⇒ … a never-ending story. Some derivations that are supposed to halt and produce strings fail to do so, and the total weight mass is too small because of the missing strings. Can the problem of assigning probabilities to the rules in a way that guarantees consistency be solved? Keep following the slides. ISI/USC, Marina del Rey, 08/22/2008
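The "never-ending story" can be quantified. For this grammar the probability q that a derivation starting from S ever terminates is the smallest solution of q = p·q² + (1 − p), where p is the probability of S → SS; iterating the equation from 0 converges to that smallest root. A hedged sketch (the function name is mine, not from the slides):

```python
def termination_prob(p_branch, iters=1000):
    """Smallest solution of q = p_branch * q**2 + (1 - p_branch):
    the probability that a derivation from S ever halts.
    Fixed-point iteration from 0 converges to the smallest root."""
    q = 0.0
    for _ in range(iters):
        q = p_branch * q * q + (1.0 - p_branch)
    return q

print(termination_prob(0.6))      # ≈ 0.6667: a third of the mass never halts
print(termination_prob(0.99999))  # ≈ 1e-5: derivations almost never halt
```

For p > 1/2 the smallest root is (1 − p)/p, which matches the 2/3 total string probability of the slide-3 grammar.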

  5. Some questions we attacked • How can we determine whether a PCFG is consistent? Is consistency decidable? • Is there a method to make a PCFG consistent while maintaining relative string probabilities? • Are automatically learned PCFGs consistent?

  6. Consistency test • The standard test uses the first-moment (expectation) matrix M, where M[A][B] is the expected number of occurrences of nonterminal B produced by one rewrite of A. • We compute the eigenvalues of M and take the spectral radius: the greatest value among them in modulus. • If this value is > 1, the PCFG is not consistent; • If it is < 1, the PCFG is consistent; • If it is exactly 1, the test is inconclusive, so the condition gives a sufficient but not necessary criterion. • This is hard to do for huge grammars and not always possible by exact linear algebra, so the eigenvalues must be approximated by numerical analysis. • Consistency can then be hard to settle in practice: what happens if the total string probability is 0.99999999…? Or the spectral radius is approximated as 1.00000…1?
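For the toy grammar the first-moment "matrix" is the single number 2 · 0.6 = 1.2: each use of S → SS yields two S's and S → a yields none. A minimal sketch of the spectral-radius test using a plain power iteration in pure Python (exact for this 1×1 case; for general grammars it assumes the nonnegative moment matrix is primitive, otherwise a proper eigenvalue routine is needed):

```python
def spectral_radius(M, iters=1000):
    """Approximate the largest-modulus eigenvalue of a nonnegative
    square matrix M (list of lists) by power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

# Moment matrix for S -> SS (0.6), S -> a (0.4):
# expected S's produced by one rewrite of S = 2*0.6 + 0*0.4 = 1.2
M = [[1.2]]
rho = spectral_radius(M)
print(rho)  # 1.2 > 1, so the PCFG is not consistent
```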

  7. Normalization ⇒ consistent PCFG I • Each rule probability is normalized to the portion it represents of the total weight mass of terminating derivations from the left-hand side nonterminal: p(A → α) is multiplied by the masses of the nonterminals in α and divided by the mass of A. • For example, the PCFG • S → SS | 0.6 • S → a | 0.4 after normalization becomes the consistent PCFG • S → SS | 0.4 • S → a | 0.6
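This normalization can be sketched for an arbitrary PCFG. The grammar encoding below is my own (a dict mapping each nonterminal to (right-hand side, probability) pairs; a symbol is a nonterminal iff it is a key of the dict): first compute each nonterminal's terminating weight mass Z(A) by fixed-point iteration, then rescale every rule.

```python
# Hypothetical encoding of the slide's grammar: S -> SS (0.6), S -> a (0.4)
grammar = {"S": [(["S", "S"], 0.6), (["a"], 0.4)]}

def partition(grammar, iters=1000):
    """Z[A] = total probability mass of terminating derivations from A:
    the smallest fixed point of Z[A] = sum_r p(r) * prod of Z over
    nonterminals on r's right-hand side, iterated from 0."""
    Z = {A: 0.0 for A in grammar}
    for _ in range(iters):
        for A, rules in grammar.items():
            total = 0.0
            for rhs, p in rules:
                prod = p
                for sym in rhs:
                    if sym in grammar:
                        prod *= Z[sym]
                total += prod
            Z[A] = total
    return Z

def normalize(grammar):
    """Rescale each rule by the Z-masses of its right-hand side,
    divided by the Z-mass of its left-hand side."""
    Z = partition(grammar)
    new = {}
    for A, rules in grammar.items():
        new[A] = []
        for rhs, p in rules:
            q = p
            for sym in rhs:
                if sym in grammar:
                    q *= Z[sym]
            new[A].append((rhs, q / Z[A]))
    return new

normalized = normalize(grammar)
print(normalized)  # S -> SS gets ≈ 0.4, S -> a gets ≈ 0.6, as on the slide
```

For this grammar Z(S) = 2/3, so p(S → SS) becomes 0.6 · Z² / Z = 0.4 and p(S → a) becomes 0.4 / Z = 0.6, matching the slide.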

  8. Normalization ⇒ consistent PCFG II [Figure: example derivation trees with internal S nodes and terminal a leaves, shown before and after normalization.] The probability of every derivation of a terminal string changes by the same factor: each is divided by the total weight mass (here 0.4/0.6 = 2/3), so relative string probabilities are preserved.

  9. Consistent trained PCFGs • There are methods for the empirical estimation of PCFGs, based on optimizing some function of the probability of the observed data, that guarantee consistency: • Supervised maximum-likelihood estimation: the data are fully observed (a sample treebank); the ML estimate of each rule probability is the number of occurrences of that production in the samples divided by the number of occurrences of its left-hand side nonterminal in the samples. • Unsupervised maximum-likelihood estimation (e.g., the inside–outside algorithm).
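A toy illustration of the supervised ML estimate (the treebank encoding below is hypothetical, not from the slides): count each production in the trees and divide by the count of its left-hand side. The estimated probabilities sum to 1 for each nonterminal by construction.

```python
from collections import Counter

# Tiny hypothetical treebank: a tree is (nonterminal, [children]),
# where each child is another tree or a terminal string.
treebank = [
    ("S", [("S", ["a"]), ("S", ["a"])]),  # S -> S S, then S -> a twice
    ("S", ["a"]),                          # S -> a
]

def count_rules(tree, counts):
    """Add every production occurring in `tree` to `counts`."""
    lhs, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(lhs, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

counts = Counter()
for t in treebank:
    count_rules(t, counts)

# Divide each rule count by the total count of its left-hand side.
lhs_totals = Counter()
for (lhs, _), n in counts.items():
    lhs_totals[lhs] += n
mle = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

print(mle)  # S -> S S estimated at 1/4 = 0.25, S -> a at 3/4 = 0.75
```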

  10. Conclusions and further ideas • Are all PCFGs consistent? NO • Is there a method to make a PCFG consistent? YES • Can we train PCFGs such that they will be consistent? YES • Can we bring something new to the tests? NOT YET • Can we implement the results in a unitary framework in Tiburon? I am ☺, but ask Jon • Will it be useful for applications? YES

  11. That’s all folks! Thank you! To the TREEWORLD people: Kevin, Jon, Steve. To ISI and all the people here. To the other interns. Thank you! ¡Gracias! Moltes gràcies! Grazie! 謝謝你。Arigato! Danke! Spasibo! Mila esker! Mulţumesc! Kiitos!
