
On the Consistency of Probabilistic Context-Free Grammars






Presentation Transcript


  1. On the Consistency of Probabilistic Context-Free Grammars
  Cătălin-Ionuţ Tîrnăucă, GRLMC, Rovira i Virgili University
  catalin@isi.edu, catalinionut.tirnauca@estudiants.urv.cat

  2. Probabilistic languages NOTE: These slides are just a survey of previous work on the subject. • A probabilistic language attaches a probability to each string in the language, and these probabilities must form a probability distribution: the probabilities of all strings must sum to 1. • When the strings are generated by a grammar, this condition is usually called consistency. ISI/USC, Marina del Rey, 08/22/2008: C. Tîrnăucă, On the Consistency of PCFGs

  3. Are all PCFGs consistent? • Consider the PCFG • S → SS | 0.6 • S → a | 0.4 • P(a^i) = C(i−1) · 0.6^(i−1) · 0.4^i for i ≥ 1, where C(n) is the n-th Catalan number, the number of binary derivation trees with n+1 leaves: every derivation of a^i uses S → SS exactly i−1 times and S → a exactly i times. • The total string probability Σ_{i≥1} P(a^i) is 2/3 … far away from 1. • It seems that we lose probability mass: some of the production probabilities are never distributed to terminal strings. How is this possible?
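As a sanity check on the 2/3 figure, the partial sums of Σ_{i≥1} P(a^i) can be computed directly. This is a small Python sketch, not part of the original slides; it uses the convention that the n-th Catalan number counts binary trees with n+1 leaves:

```python
from math import comb

# Rule probabilities of the example PCFG: S -> SS (0.6), S -> a (0.4)
P_SS, P_A = 0.6, 0.4

def catalan(n):
    """n-th Catalan number: number of binary derivation trees with n+1 leaves."""
    return comb(2 * n, n) // (n + 1)

def string_prob(i):
    """P(a^i): Catalan(i-1) trees, each using i-1 SS-rules and i a-rules."""
    return catalan(i - 1) * P_SS ** (i - 1) * P_A ** i

# The partial sums approach 2/3, not 1 -- the PCFG is inconsistent.
total = sum(string_prob(i) for i in range(1, 200))
print(total)  # ≈ 0.6667
```

The series converges because C(n) · 0.24^n shrinks roughly like 0.96^n, so 200 terms already give the limit to several decimal places.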

  4. Losing information: infinite derivations Imagine the following worst-case scenario: S → SS | 0.99999, S → a | 0.00001. Whenever an S shows up in a sentential form, S → SS is far more likely to be used to rewrite it than S → a, because of the way the probabilities are attached. So two S's appear instead of one; each of them is in turn likely to be replaced by two more S's, and so on: the S's multiply without bound. S ⇒ SS ⇒ SSS ⇒ SSSS ⇒ aSSS ⇒ aSSSS ⇒ aaSSS…S ⇒ … a never-ending story. Some derivations that are supposed to halt and produce strings fail to do so, and the total weight mass is too small because of the missing strings. Can the problem of assigning probabilities to the rules in a way that guarantees consistency be solved? Keep following the slides. ISI/USC, Marina del Rey, 08/22/2008
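The "never-ending story" can be quantified. For this grammar the probability q that a derivation starting from S ever terminates is the smallest solution of q = p·q² + (1 − p), where p is the probability of S → SS; iterating the equation from 0 converges to that smallest root. A hedged sketch (the function name is mine, not from the slides):

```python
def termination_prob(p_branch, iters=1000):
    """Smallest solution of q = p_branch * q**2 + (1 - p_branch):
    the probability that a derivation from S ever halts.
    Fixed-point iteration from 0 converges to the smallest root."""
    q = 0.0
    for _ in range(iters):
        q = p_branch * q * q + (1.0 - p_branch)
    return q

print(termination_prob(0.6))      # ≈ 0.6667: a third of the mass never halts
print(termination_prob(0.99999))  # ≈ 1e-5: derivations almost never halt
```

For p > 1/2 the smallest root is (1 − p)/p, which matches the 2/3 total string probability of the slide-3 grammar.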

  5. Some questions we attacked • How can we determine whether a PCFG is consistent? Is consistency decidable? • Is there a method to make a PCFG consistent while maintaining relative string probabilities? • Are automatically learned PCFGs consistent?

  6. Consistency test • The standard test uses the first-moment (expectation) matrix M, where M[A][B] is the expected number of occurrences of nonterminal B produced by one rewrite of A. • We compute the eigenvalues of M and take the spectral radius: the greatest value among them in modulus. • If this value is > 1, the PCFG is not consistent; • If it is < 1, the PCFG is consistent; • If it is exactly 1, the test is inconclusive, so the condition gives a sufficient but not necessary criterion. • This is hard to do for huge grammars and not always possible by exact linear algebra, so the eigenvalues must be approximated by numerical analysis. • Consistency can then be hard to settle in practice: what happens if the total string probability is 0.99999999…? Or the spectral radius is approximated as 1.00000…1?
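For the toy grammar the first-moment "matrix" is the single number 2 · 0.6 = 1.2: each use of S → SS yields two S's and S → a yields none. A minimal sketch of the spectral-radius test using a plain power iteration in pure Python (exact for this 1×1 case; for general grammars it assumes the nonnegative moment matrix is primitive, otherwise a proper eigenvalue routine is needed):

```python
def spectral_radius(M, iters=1000):
    """Approximate the largest-modulus eigenvalue of a nonnegative
    square matrix M (list of lists) by power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

# Moment matrix for S -> SS (0.6), S -> a (0.4):
# expected S's produced by one rewrite of S = 2*0.6 + 0*0.4 = 1.2
M = [[1.2]]
rho = spectral_radius(M)
print(rho)  # 1.2 > 1, so the PCFG is not consistent
```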

  7. Normalization ⇒ consistent PCFG I • Each rule probability is normalized to the portion it represents of the total weight mass of terminating derivations from the left-hand side nonterminal: p(A → α) is multiplied by the masses of the nonterminals in α and divided by the mass of A. • For example, the PCFG • S → SS | 0.6 • S → a | 0.4 after normalization becomes the consistent PCFG • S → SS | 0.4 • S → a | 0.6
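This normalization can be sketched for an arbitrary PCFG. The grammar encoding below is my own (a dict mapping each nonterminal to (right-hand side, probability) pairs; a symbol is a nonterminal iff it is a key of the dict): first compute each nonterminal's terminating weight mass Z(A) by fixed-point iteration, then rescale every rule.

```python
# Hypothetical encoding of the slide's grammar: S -> SS (0.6), S -> a (0.4)
grammar = {"S": [(["S", "S"], 0.6), (["a"], 0.4)]}

def partition(grammar, iters=1000):
    """Z[A] = total probability mass of terminating derivations from A:
    the smallest fixed point of Z[A] = sum_r p(r) * prod of Z over
    nonterminals on r's right-hand side, iterated from 0."""
    Z = {A: 0.0 for A in grammar}
    for _ in range(iters):
        for A, rules in grammar.items():
            total = 0.0
            for rhs, p in rules:
                prod = p
                for sym in rhs:
                    if sym in grammar:
                        prod *= Z[sym]
                total += prod
            Z[A] = total
    return Z

def normalize(grammar):
    """Rescale each rule by the Z-masses of its right-hand side,
    divided by the Z-mass of its left-hand side."""
    Z = partition(grammar)
    new = {}
    for A, rules in grammar.items():
        new[A] = []
        for rhs, p in rules:
            q = p
            for sym in rhs:
                if sym in grammar:
                    q *= Z[sym]
            new[A].append((rhs, q / Z[A]))
    return new

normalized = normalize(grammar)
print(normalized)  # S -> SS gets ≈ 0.4, S -> a gets ≈ 0.6, as on the slide
```

For this grammar Z(S) = 2/3, so p(S → SS) becomes 0.6 · Z² / Z = 0.4 and p(S → a) becomes 0.4 / Z = 0.6, matching the slide.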

  8. Normalization ⇒ consistent PCFG II [Figure: example derivation trees with internal S nodes and terminal a leaves, shown before and after normalization.] The probability of every derivation of a terminal string changes by the same factor: each is divided by the total weight mass (here 0.4/0.6 = 2/3), so relative string probabilities are preserved.

  9. Consistent trained PCFGs • There are methods for the empirical estimation of PCFGs, based on optimizing some function of the probability of the observed data, that guarantee consistency: • Supervised maximum-likelihood estimation: the data are fully observed (a sample treebank); the ML estimate of each rule probability is the number of occurrences of that production in the samples divided by the number of occurrences of its left-hand side nonterminal in the samples. • Unsupervised maximum-likelihood estimation (e.g., the inside–outside algorithm).
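A toy illustration of the supervised ML estimate (the treebank encoding below is hypothetical, not from the slides): count each production in the trees and divide by the count of its left-hand side. The estimated probabilities sum to 1 for each nonterminal by construction.

```python
from collections import Counter

# Tiny hypothetical treebank: a tree is (nonterminal, [children]),
# where each child is another tree or a terminal string.
treebank = [
    ("S", [("S", ["a"]), ("S", ["a"])]),  # S -> S S, then S -> a twice
    ("S", ["a"]),                          # S -> a
]

def count_rules(tree, counts):
    """Add every production occurring in `tree` to `counts`."""
    lhs, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(lhs, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

counts = Counter()
for t in treebank:
    count_rules(t, counts)

# Divide each rule count by the total count of its left-hand side.
lhs_totals = Counter()
for (lhs, _), n in counts.items():
    lhs_totals[lhs] += n
mle = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

print(mle)  # S -> S S estimated at 1/4 = 0.25, S -> a at 3/4 = 0.75
```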

  10. Conclusions and further ideas • Are all PCFGs consistent? NO • Is there a method to make a PCFG consistent? YES • Can we train PCFGs such that they will be consistent? YES • Can we bring something new to the tests? NOT YET • Can we implement the results in a unitary framework in Tiburon? I am ☺, but ask Jon • Will it be useful for applications? YES

  11. That’s all folks! Thank you! To the TREEWORLD people: Kevin, Jon, Steve. To ISI and all the people here. To the other interns. Thank you! ¡Gracias! Moltes gràcies! Grazie! 謝謝你。Arigato! Danke! Spasibo! Mila esker! Mulţumesc! Kiitos!
