CSA2050: Introduction to Computational Linguistics

yamin

Presentation Transcript


  1. CSA2050: Introduction to Computational Linguistics
     Evaluation Criteria for CFGs
     Limitations of CFGs
     Introduction to PATR2
     CSA2050: CFG Limitations

  2. Weak and Strong Equivalence
     • A grammar/lexicon G generates a characteristic language L(G).
     • Grammars G1 and G2 are said to be weakly equivalent if L(G1) = L(G2).
     • A grammar G also assigns one or more phrase structures to any s in L(G).
     • Weakly equivalent grammars G1 and G2 are said to be strongly equivalent if, in addition, they assign identical phrase structures to any s in L(G1).
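
The difference between the two notions can be illustrated with a small sketch. It assumes two hypothetical toy grammars that are not from the slides. G1 (S -> a S | a) and G2 (S -> S a | a) generate the same strings, but G1 builds right-branching trees and G2 builds left-branching ones. The enumeration is bounded by sentence length, so it can only give evidence of weak equivalence, not a proof, because L(G) is usually infinite.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy grammars (not from the slides): both generate a, aa, aaa, ...
# but G1 builds right-branching trees and G2 left-branching ones.
G1 = {"S": [["a", "S"], ["a"]]}
G2 = {"S": [["S", "a"], ["a"]]}


def compositions(n, k):
    """Ways to share n terminals among k daughters, each getting at least one."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def leaves(tree):
    """Terminal yield of a tree written as (label, child, ...) with str leaves."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(leaf for child in tree[1:] for leaf in leaves(child))


def language_and_trees(grammar, start, max_len):
    """Strings and trees rooted in `start` with at most max_len terminals.
    Assumes no empty rules and no unit-rule cycles, so every daughter covers >= 1 word."""

    @lru_cache(maxsize=None)
    def trees(symbol, n):
        if symbol not in grammar:  # terminal symbol
            return (symbol,) if n == 1 else ()
        found = []
        for rhs in grammar[symbol]:
            for split in compositions(n, len(rhs)):
                options = [trees(sym, m) for sym, m in zip(rhs, split)]
                for children in product(*options):
                    found.append((symbol,) + children)
        return tuple(found)

    all_trees = {t for n in range(1, max_len + 1) for t in trees(start, n)}
    return {leaves(t) for t in all_trees}, all_trees


L1, T1 = language_and_trees(G1, "S", 4)
L2, T2 = language_and_trees(G2, "S", 4)
print("same strings (weak):", L1 == L2)    # True
print("same trees (strong):", T1 == T2)    # False
```

Up to length 4 the two grammars are weakly equivalent. They are not strongly equivalent, because every string of two or more words receives a different phrase structure in each grammar.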

  3. Weak Equivalence
     • A grammar should generate all and only the sentences of the language under investigation.
     • Let H be the language under investigation and G the grammar we are developing.
     • The grammar should generate all sentences in the language, i.e. for any s in H, s is also in L(G).
     • The grammar should generate only sentences in the language, i.e. for any s in L(G), s is also in H.
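
In practice, the "all and only" requirement is checked against finite samples. One sample is a test suite standing in for H; the other is a length-bounded enumeration of L(G). The minimal sketch below makes that assumption. The two sample sets are hypothetical, and a sentence absent from the H sample is only presumed ungrammatical.

```python
def all_and_only(H, LG):
    """Compare a finite sample H of the target language with a finite
    (e.g. length-bounded) sample LG of L(G)."""
    missing = H - LG      # in H but not generated: the "all" requirement fails
    extra = LG - H        # generated but not in H: the "only" requirement fails
    return missing, extra


# Hypothetical samples, not taken from the slides.
H = {"the dog barks", "the dogs bark"}
LG = {"the dog barks", "the dog bark"}

missing, extra = all_and_only(H, LG)
print("undergenerates:", missing)   # {'the dogs bark'}
print("overgenerates:", extra)      # {'the dog bark'}
```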

  4. Overgeneration
     • Basic problem: L(G) is larger than H.
     • There are sentences generated by the grammar that are not in H.
     • The “only” constraint is violated.
     • The grammar is too weak.
     • Example: a grammar that ignores number and gender agreement.

  5. Undergeneration
     • Basic problem: H is larger than L(G).
     • There are sentences in H that are not generated by the grammar.
     • The “all” constraint is violated.
     • The grammar is too strong.
     • Example: a grammar that lacks recursion.
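
The recursion example can be made concrete by asking which sentence lengths a grammar can derive. The sketch assumes two small hypothetical grammars over the preterminals D, N, V and P; neither comes from the slides. Without a recursive rule, the grammar derives only three-word sentences, so it must miss every longer sentence of H. Adding np -> np pp makes the possible lengths unbounded.

```python
from functools import lru_cache

PRETERMINALS = {"D", "N", "V", "P"}
# Hypothetical grammars: FLAT has no recursion; RECURSIVE adds np -> np pp.
FLAT = {"s": [["np", "vp"]], "np": [["D", "N"]], "vp": [["V"]]}
RECURSIVE = {**FLAT, "np": [["D", "N"], ["np", "pp"]], "pp": [["P", "np"]]}


def compositions(n, k):
    """Ways to share n words among k daughters, each getting at least one."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def derivable_lengths(grammar, preterminals, start, max_len):
    """Sentence lengths (<= max_len) for which `start` has at least one derivation."""

    @lru_cache(maxsize=None)
    def derivable(symbol, n):
        if symbol in preterminals:
            return n == 1
        return any(all(derivable(s, m) for s, m in zip(rhs, split))
                   for rhs in grammar[symbol]
                   for split in compositions(n, len(rhs)))

    return [n for n in range(1, max_len + 1) if derivable(start, n)]


print(derivable_lengths(FLAT, PRETERMINALS, "s", 9))        # [3]
print(derivable_lengths(RECURSIVE, PRETERMINALS, "s", 9))   # [3, 6, 9]
```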

  6. Appropriate Structure
     • The structure assigned by the grammar should be appropriate.
     • The structure should:
        • be understandable
        • allow us to make generalisations
        • reflect the underlying meaning of the sentence

  7. Ambiguity
     • A grammar is ambiguous if it assigns two or more structures to the same sentence.
     • The grammar should not generate too many possible structures for the same sentence.
     • There is a trade-off between ambiguity and clarity: too much detail can obscure the design principles.
     • Too little detail means that the grammar is undercommitted.
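
The degree of ambiguity can be measured by counting the distinct structures a grammar assigns to a sentence. The sketch assumes a hypothetical grammar in Chomsky normal form, not from the slides, that allows a prepositional phrase to attach either to the verb phrase or to the object noun phrase. It uses a CKY chart in which each cell stores, for every category, the number of subtrees covering that span.

```python
from collections import defaultdict

# Hypothetical grammar in Chomsky normal form (not from the slides),
# chosen because it allows two attachments for a prepositional phrase.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """CKY chart in which each cell maps a category to its number of distinct subtrees."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I saw the man".split()))                      # 1
print(count_parses("I saw the man with the telescope".split()))   # 2
```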

  8. Limitations of CF Grammars
     • Simple CF grammars tend to overgenerate.
     • The only mechanism available to control overgeneration is to invent new categories.
     • The proliferation of categories soon becomes intractable. Problems include:
        • the size of the grammar
        • the understandability of the grammar

  9. Criteria for Evaluating Grammars
     • Does it undergenerate?
     • Does it overgenerate?
     • Does it assign appropriate structures to the sentences it generates?
     • Is it simple to understand? How many rules are there?
     • Does it contain generalisations or special cases?
     • How ambiguous is it? How many structures for a given sentence?

  10. CF Phrase Structure Rules
      s → np vp
      np → D N
      vp → V
      vp → V np
      (4 rules)
      • Nice grammar, but it overgenerates.
      • Solution: invent more categories, e.g. nps, nppl, vps, vppl.
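
The overgeneration can be seen by expanding the four rules exhaustively. The grammar has no recursion, so enumeration terminates. The small lexicon (this/these, dog/dogs, sees/see) is a hypothetical addition chosen to show number marking on D, N and V.

```python
from itertools import product

# The four rules from the slide, with a small hypothetical lexicon.
RULES = {
    "s": [["np", "vp"]],
    "np": [["D", "N"]],
    "vp": [["V"], ["V", "np"]],
}
LEXICON = {
    "D": ["this", "these"],
    "N": ["dog", "dogs"],
    "V": ["sees", "see"],
}


def expand(symbol):
    """All word sequences derivable from `symbol` (the grammar is non-recursive)."""
    if symbol in LEXICON:
        return [[w] for w in LEXICON[symbol]]
    results = []
    for rhs in RULES[symbol]:
        for parts in product(*(expand(s) for s in rhs)):
            results.append([w for part in parts for w in part])
    return results


sentences = [" ".join(s) for s in expand("s")]
print(len(sentences))                              # 40
print("these dog sees this dogs" in sentences)     # True: number is ignored
```

Only 6 of the 40 strings respect number agreement inside every noun phrase and between subject and verb. The remaining 34 are overgenerated.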

  11. CF Phrase Structure Rules with Number Agreement
      s -> nps vps
      s -> nppl vppl
      nps -> DS NS
      nppl -> DPL NPL
      vps -> VS
      vps -> VS nps
      vps -> VS nppl
      vppl -> VPPL
      vppl -> VPPL nps
      vppl -> VPPL nppl
      (10 rules)
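
The ten rules can be derived mechanically from the four rules of the previous slide by building the number value into the category names. The sketch below does this under one annotation assumption: every category in a rule shares one number value, except the object np of vp -> V np, which is free. The generated names follow the slide's pattern (nps, DPL). The slide's VPPL comes out as VPL.

```python
from itertools import product

# The four CF rules, each annotated (an assumption of this sketch) with which
# categories share the agreeing number value.
# Format: (lhs, lhs_agrees, [(rhs_symbol, agrees), ...])
AGREEMENT_RULES = [
    ("s", False, [("np", True), ("vp", True)]),
    ("np", True, [("D", True), ("N", True)]),
    ("vp", True, [("V", True)]),
    ("vp", True, [("V", True), ("np", False)]),   # the object np need not agree
]


def mark(symbol, value):
    """Attach a value: upper-case suffix for preterminals (DS), lower-case otherwise (nps)."""
    return symbol + (value.upper() if symbol.isupper() else value)


def specialise(rules, values):
    """Compile agreement into category names, one plain CF rule per consistent choice."""
    compiled = []
    for lhs, lhs_agrees, rhs in rules:
        free_count = sum(1 for _, agrees in rhs if not agrees)
        for shared in values:
            for free_values in product(values, repeat=free_count):
                free_iter = iter(free_values)
                new_rhs = tuple(mark(sym, shared if agrees else next(free_iter))
                                for sym, agrees in rhs)
                new_lhs = mark(lhs, shared) if lhs_agrees else lhs
                if (new_lhs, new_rhs) not in compiled:
                    compiled.append((new_lhs, new_rhs))
    return compiled


number_only = specialise(AGREEMENT_RULES, ["s", "pl"])
for lhs, rhs in number_only:
    print(lhs, "->", " ".join(rhs))
print(len(number_only), "rules")          # 10 rules

# Hypothetical extension: person and number combined (6 values).
person_number = ["1s", "2s", "3s", "1pl", "2pl", "3pl"]
print(len(specialise(AGREEMENT_RULES, person_number)), "rules")   # 54 rules
```

The second count shows the proliferation problem from the "Limitations of CF Grammars" slide. It assumes all six person/number combinations are kept distinct on every agreeing category, giving 6 + 6 + 6 + 6 × 6 = 54 rules.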

  12. Constraints and Information Structures
      • PATR2 handles this problem by augmenting CF rules with constraints between constituents.
      • The basic idea is that each constituent of a CF rule is associated with an information structure.
      • We then express constraints between information structures.

  13. Example of a PATR Rule with Number Constraints
      Rule s -> np vp
        <np num> = <vp num>
        <s num> = <np num>
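
The effect of the two path equations can be sketched with nested dictionaries standing in for information structures. This is a simplification and an assumption of the sketch. PATR-II proper unifies graph-structured feature structures with shared (re-entrant) substructure. The code below only copies atomic values between paths and fails when two paths carry different values, which is enough for the number constraints on this slide. The helper names are hypothetical.

```python
import copy


def get_path(fs, path):
    """Follow a path such as ("np", "num"); None if it is not (yet) defined."""
    for key in path:
        if not isinstance(fs, dict) or key not in fs:
            return None
        fs = fs[key]
    return fs


def set_path(fs, path, value):
    """Create the path if necessary and store an atomic value at its end."""
    for key in path[:-1]:
        fs = fs.setdefault(key, {})
    fs[path[-1]] = value


def apply_constraints(fs, equations):
    """Enforce path equations on atomic values, repeating until nothing changes.
    Returns a new structure, or None if two paths carry clashing values."""
    fs = copy.deepcopy(fs)
    changed = True
    while changed:
        changed = False
        for left, right in equations:
            a, b = get_path(fs, left), get_path(fs, right)
            if a is not None and b is not None:
                if a != b:
                    return None          # e.g. num = sg clashes with num = pl
                continue
            value = a if a is not None else b
            if value is not None:        # copy the known value to the unset path
                set_path(fs, left, value)
                set_path(fs, right, value)
                changed = True
    return fs


# The rule from the slide: s -> np vp  <np num> = <vp num>  <s num> = <np num>
S_RULE = [(("np", "num"), ("vp", "num")),
          (("s", "num"), ("np", "num"))]

print(apply_constraints({"np": {"num": "sg"}, "vp": {"num": "sg"}}, S_RULE))
# {'np': {'num': 'sg'}, 'vp': {'num': 'sg'}, 's': {'num': 'sg'}}
print(apply_constraints({"np": {"num": "pl"}, "vp": {"num": "sg"}}, S_RULE))
# None
```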

  14. Example of a Grammar with Number Constraints
      s -> np vp
        <np num> = <vp num>
        <s num> = <np num>
      np -> D N
        <np num> = <D num>
        <D num> = <N num>
      vp -> V
        <vp num> = <V num>
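
The whole three-rule grammar can be run as a small recogniser. This sketch combines the CF backbone with the path equations. It rests on several assumptions. The lexicon is hypothetical. Constituents are named by their category, which works only while no category repeats inside a rule. Only atomic values are copied between paths, so this is not full PATR-II unification. With those limits, it keeps the grammar at three rules while rejecting agreement errors.

```python
import copy
from itertools import product

# Hypothetical lexicon: word -> (category, information structure).
LEXICON = {
    "this": ("D", {"num": "sg"}), "these": ("D", {"num": "pl"}),
    "dog": ("N", {"num": "sg"}), "dogs": ("N", {"num": "pl"}),
    "barks": ("V", {"num": "sg"}), "bark": ("V", {"num": "pl"}),
}

# The grammar from the slide as (lhs, rhs, path equations).
GRAMMAR = [
    ("s", ["np", "vp"], [(("np", "num"), ("vp", "num")), (("s", "num"), ("np", "num"))]),
    ("np", ["D", "N"], [(("np", "num"), ("D", "num")), (("D", "num"), ("N", "num"))]),
    ("vp", ["V"], [(("vp", "num"), ("V", "num"))]),
]


def get_path(fs, path):
    for key in path:
        if not isinstance(fs, dict) or key not in fs:
            return None
        fs = fs[key]
    return fs


def set_path(fs, path, value):
    for key in path[:-1]:
        fs = fs.setdefault(key, {})
    fs[path[-1]] = value


def apply_constraints(fs, equations):
    """Copy atomic values along path equations until stable; None on a clash."""
    fs = copy.deepcopy(fs)
    changed = True
    while changed:
        changed = False
        for left, right in equations:
            a, b = get_path(fs, left), get_path(fs, right)
            if a is not None and b is not None:
                if a != b:
                    return None
                continue
            value = a if a is not None else b
            if value is not None:
                set_path(fs, left, value)
                set_path(fs, right, value)
                changed = True
    return fs


def splits(words, k):
    """Divide words into k non-empty consecutive pieces."""
    if k == 1:
        yield [words]
        return
    for i in range(1, len(words) - k + 2):
        for rest in splits(words[i:], k - 1):
            yield [words[:i]] + rest


def parse(cat, words):
    """Information structures for every analysis of `words` as category `cat`."""
    results = []
    if len(words) == 1 and words[0] in LEXICON:
        word_cat, features = LEXICON[words[0]]
        if word_cat == cat:
            results.append(copy.deepcopy(features))
    for lhs, rhs, equations in GRAMMAR:
        if lhs != cat:
            continue
        for pieces in splits(words, len(rhs)):
            options = [parse(c, p) for c, p in zip(rhs, pieces)]
            for children in product(*options):
                fs = apply_constraints({lhs: {}, **dict(zip(rhs, children))}, equations)
                if fs is not None:
                    results.append(fs[lhs])
    return results


for sentence in ["this dog barks", "these dogs bark", "these dog barks", "these dogs barks"]:
    print(sentence, "->", parse("s", sentence.split()))
```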

  15. Summary
      • Pure CFGs become unwieldy when we try to constrain them to incorporate, for example, agreement information.
      • PATR2 deals with this problem by associating information structures and constraints with each rule constituent.
      • Information structures are often referred to as feature structures (F-structures).
