
HPSG Alpino System



  1. HPSG Alpino System. Universität des Saarlandes, Seminar: Recent Advances in Parsing Technology, Winter Semester 2011-2012. Jesús Calvillo.

  2. Outline • Introduction • Overview • Part of Speech Tagging • Lexical Ambiguity • HMM Tagger • Tagger Training • Results • Disambiguation Component • Parsing • Recovery of Best Parse • Accuracy • References

  3. Introduction • What is Alpino? • A computational analyzer for Dutch. • Combines knowledge-based (HPSG grammar and lexicon) and corpus-based technologies. • Aims at accurate, full parsing of unrestricted text, with coverage and accuracy comparable to state-of-the-art parsers for English.

  4. Introduction • Grammar • Wide-coverage computational HPSG. • About 600 construction-specific rules, rather than general rule schemata and abstract linguistic principles. • Lexicon • About 100,000 entries and 200,000 named entities. • Lexical rules for dates, temporal expressions, etc. • Large variety of unknown-word heuristics. • Morphological constructor.

  5. Overview

  6. POS Tagging • Lexical ambiguity has an important negative effect on parsing efficiency. • In some cases an assigned category is obviously wrong: for example, "called" has both a plain transitive reading ("I called the man") and a particle-verb reading ("I called the man up"); in a sentence without "up", the particle-verb category is clearly wrong. • Applying hand-written disambiguation rules relies on human experts and is bound to introduce mistakes.

  7. POS Tagging • The training corpus used by the tagger is labeled by the parser itself (unsupervised learning). • The tagger is not forced to disambiguate all words: it only removes about half of the tags assigned by the dictionary. • The resulting system can be much faster, while parsing accuracy actually increases slightly.

  8. HMM Tagger • Variant of a standard trigram HMM tagger. • To discard tags, compute a probability for each tag individually:

P(t, i) ∝ α_i(t) · β_i(t)

where α and β are the forward and backward probabilities: α_i(t) is the total probability of all paths through the model that end at tag t at position i; β_i(t) is the total probability of all paths starting at tag t in position i and running to the end.

  9. HMM Tagger • After calculating the probabilities of all the potential tags, a tag t on position i is removed if there is another tag t′ such that

α_i(t) · β_i(t) < α_i(t′) · β_i(t′) / θ

where θ is a constant threshold value.
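To make the two slides above concrete, here is a minimal sketch of forward-backward tag pruning. It is an illustration only, not Alpino's actual code: it uses a bigram model for brevity (Alpino's tagger is a trigram HMM), and all names (prune_tags, trans_prob, emit_prob, theta) are hypothetical.

```python
def prune_tags(lattice, trans_prob, emit_prob, theta):
    """Forward-backward tag pruning, sketched with a bigram HMM.

    lattice[i] is the set of candidate tags the dictionary assigns to
    word i; trans_prob(prev, t) and emit_prob(i, t) are the HMM
    transition and emission probabilities; theta is the threshold.
    """
    n = len(lattice)
    # Forward pass: alpha[i][t] = total probability of all paths
    # through the model that end at tag t at position i.
    alpha = [{} for _ in range(n)]
    for t in lattice[0]:
        alpha[0][t] = emit_prob(0, t)
    for i in range(1, n):
        for t in lattice[i]:
            alpha[i][t] = emit_prob(i, t) * sum(
                alpha[i - 1][p] * trans_prob(p, t) for p in lattice[i - 1])
    # Backward pass: beta[i][t] = total probability of all paths
    # starting at tag t at position i and running to the end.
    beta = [{} for _ in range(n)]
    for t in lattice[n - 1]:
        beta[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in lattice[i]:
            beta[i][t] = sum(
                trans_prob(t, s) * emit_prob(i + 1, s) * beta[i + 1][s]
                for s in lattice[i + 1])
    # Pruning: keep tag t at position i only if no competitor t' is
    # more than a factor theta more likely.
    pruned = []
    for i in range(n):
        scores = {t: alpha[i][t] * beta[i][t] for t in lattice[i]}
        best = max(scores.values())
        pruned.append({t for t, s in scores.items() if s >= best / theta})
    return pruned
```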

  10. Training the Tagger • The training corpus is constructed by the parser: run the parser on a large set of example sentences and collect the sequences of lexical category classes used by what the parser believed to be the best parse. • The corpus contains errors; the tagger does not learn the "correct" lexical category sequences, but rather which sequences are favored by the parser. • Corpus: 4 years of Dutch daily newspaper text, using only "easy" sentences (sentences of <20 words, or sentences that take <20 secs of CPU time).

  11. Experimental Results • Applied to the first 220 sentences of the Alpino Treebank (4 sentences were removed). • Low threshold -> small number of tags -> fast parsing. • High threshold -> higher accuracy -> lower efficiency. • If all lexical categories for a given sentence are allowed, the parser can almost always find a single (but sometimes bad) parse. • If the parser is limited to the more plausible lexical categories, it will more often come up with a robust parse consisting of two or more partial parses. • A modest decrease in coverage results in a modest increase in accuracy. • Best threshold: 4.25.

  12. Disambiguation Component • Simple rule-frequency methods known from context-free parsing cannot be used directly for an HPSG-like formalism, since these methods rely crucially on the statistical independence of context-free rule applications. • Solution: maximum entropy models.

  13. Stochastic Attribute Value Grammars • A typically large set of parse features is identified that distinguishes "good" parses from "bad" parses. • Parses are represented as vectors; each cell contains the frequency of a particular feature (about 40,000 features in Alpino). • The features encode: rule names, local trees of rule names, pairs of words and their lexical category, lexical dependencies between words, etc. • Among them is also a variety of more global syntactic features: features that recognize whether coordinations are parallel in structure, features that recognize whether the dependency in a WH-question or a relative clause is local or not, etc.

  14. Stochastic Attribute Value Grammars • In training, a weight is established for each feature, indicating whether parses containing that feature should be preferred or not. • The parse evaluation function is the sum, over all features, of the feature's frequency in the parse times the feature's weight. • The parse with the largest sum is the best parse. • Drawback: to train the model, we need access to all parses of each corpus sentence.
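As an illustration of the evaluation function just described, a minimal sketch follows. The feature names and the tiny weight vector are invented for the example; they are not taken from Alpino's 40,000-feature model.

```python
def parse_score(feature_counts, weights):
    """Score of a parse under the maximum entropy model: the sum over
    features of (frequency of the feature in the parse) x (learned
    weight of that feature)."""
    return sum(count * weights.get(feat, 0.0)
               for feat, count in feature_counts.items())

def best_parse(parses, weights):
    """Pick the parse whose feature vector has the largest score."""
    return max(parses, key=lambda p: parse_score(p, weights))

# Example: two candidate parses as sparse feature-count vectors,
# with hypothetical feature names.
weights = {"rule:np_det_n": 0.8, "dep:see-man": 1.2, "nonlocal_wh": -0.5}
p1 = {"rule:np_det_n": 2, "dep:see-man": 1}
p2 = {"rule:np_det_n": 1, "nonlocal_wh": 1}
assert best_parse([p1, p2], weights) is p1  # score 2.8 vs 0.3
```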

  15. Stochastic Attribute Value Grammars • It suffices to train on the basis of representative samples of parses for each training sentence (Osborne, 2000). • Any sub-sample of the parses in the training data which yields unbiased estimates of feature expectations should result in as accurate a model as the complete set of parses.

  16. Dependency Problem • Problem: the Alpino treebank contains correct dependency structures, but dependency structures abstract away from syntactic details, whereas the training data should contain the full parse as produced by the grammar. • Possible solution: use the grammar to parse a given sentence and then select the parse with the correct dependency structure. However, the parser will not always be able to produce a parse with the correct dependency structure.

  17. Dependency Problem • Map the accuracy of a parse to the frequency of that parse in the training data: rather than distinguishing correct and incorrect parses, we determine the "quality" of each parse via Concept Accuracy (CA):

CA_i = 1 - f_i / max(g_i, p_i)

where p_i is the number of relations produced by the parser for sentence i, g_i is the number of relations in the treebank parse, and f_i is the number of incorrect and missing relations produced by the parser. • Thus, if a parse has a CA of 85%, we add the parse to the training data marked with a weight of 0.85.
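A small sketch of how such per-parse training weights could be computed, assuming dependency relations are represented as hashable tuples; the function name and the representation are assumptions for the example, not Alpino's code.

```python
def concept_accuracy(parser_rels, gold_rels):
    """CA for one sentence: 1 - f / max(g, p), where p is the number of
    relations produced by the parser, g the number of relations in the
    treebank parse, and f the number of incorrect plus missing relations."""
    p, g = len(parser_rels), len(gold_rels)
    incorrect = len(parser_rels - gold_rels)  # produced but wrong
    missing = len(gold_rels - parser_rels)    # in treebank but not produced
    return 1.0 - (incorrect + missing) / max(g, p)

# A parse with CA 0.85 would enter the training data with weight 0.85.
gold = {("see", "su", "I"), ("see", "obj1", "man"), ("man", "det", "a")}
produced = {("see", "su", "I"), ("see", "obj1", "man"), ("see", "mod", "a")}
weight = concept_accuracy(produced, gold)  # 1 - 2/3, i.e. about 0.33
```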

  18. Parse Forest • The left-corner parser constructs a parse forest representing all possible parses. • The parse forest is a tree substitution grammar which derives exactly all derivation trees of the input sentence. • Each tree in the tree substitution grammar is a left-corner spine. (A toy sketch of enumerating derivations from a packed forest follows.)
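The sketch below enumerates all derivation trees of a packed forest, using the PP-attachment ambiguity of the upcoming example "I see a man at home". The dict-of-alternatives representation and all node and rule names are entirely hypothetical, not Alpino's actual data structures.

```python
from itertools import product

def derivations(forest, node):
    """Enumerate all derivation trees for `node` in a packed forest.
    `forest` maps a node to its alternative analyses; each analysis is
    a (rule_name, child_nodes) pair. Nodes absent from `forest` are
    lexical leaves."""
    if node not in forest:
        yield node
        return
    for rule, children in forest[node]:   # packed alternatives
        for kids in product(*(derivations(forest, c) for c in children)):
            yield (rule, list(kids))

# "I see a man at home": the PP attaches either to the NP or to the VP.
forest = {
    "S":       [("s_np_vp", ["I", "VP"])],
    "VP":      [("vp_v_np", ["see", "NP_pp"]),     # PP inside the NP
                ("vp_vp_pp", ["VP_core", "PP"])],  # PP attached to the VP
    "VP_core": [("vp_v_np", ["see", "NP"])],
    "NP_pp":   [("np_np_pp", ["NP", "PP"])],
    "NP":      [("np_det_n", ["a", "man"])],
    "PP":      [("pp_p_np", ["at", "home"])],
}
print(len(list(derivations(forest, "S"))))  # 2 readings
```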

  19. Example: "I see a man at home"

  20.–23. Parse Forest [figures only]

  24. Best Parse Recovery • For each state in the search space, maintain only the b best candidates, where b is a small integer (the beam). • If the beam is decreased, we run a larger risk of missing the best parse (though the result will typically still be a "good" parse); if the beam is increased, the amount of computation increases.
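A generic beam-search sketch of this recovery procedure. The state representation and the helper callables (expand, score, is_complete) are placeholders, since the slide does not spell out Alpino's actual data structures.

```python
import heapq

def beam_search(initial_state, expand, score, is_complete, b):
    """Best-parse recovery with a beam of size b.

    expand(state)      -> successor states (one more derivation step)
    score(state)       -> model score of the (partial) derivation
    is_complete(state) -> True once the state is a full parse
    """
    beam = [initial_state]
    while beam and not all(is_complete(s) for s in beam):
        candidates = []
        for state in beam:
            if is_complete(state):
                candidates.append(state)  # finished parses stay in play
            else:
                candidates.extend(expand(state))
        # Keep only the b best candidates: a smaller b risks missing
        # the best parse, a larger b costs more computation.
        beam = heapq.nlargest(b, candidates, key=score)
    return max(beam, key=score) if beam else None
```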

  25. Beam Recovery [figure only]

  26. Effect of Beam Size

  27. Accuracy • Evaluation on three test sets: • Alpino: the development set, on which the system was optimized. • CLEF: Dutch questions from the CLEF Question Answering competitions (2003, 2004, and 2005). • Trouw: the first 1400 sentences of the Trouw 2001 newspaper, from the Twente News corpus.

  28. References • [Mal04] Robert Malouf and Gertjan van Noord. Wide coverage parsing with stochastic attribute value grammars. In Proceedings of the IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses, Hainan Island, China, 2004. • [van06] Gertjan van Noord. At Last Parsing Is Now Operational. In Actes de la 13e conférence sur le traitement automatique des langues naturelles (TALN 2006), pages 20–42, Leuven, Belgium, 2006.

  29. Questions??
