
# Statistical Decision-Tree Models for Parsing


### Statistical Decision-Tree Models for Parsing

NLP lab, POSTECH

김지협

Contents
• Abstract
• Introduction
• Decision-Tree Modeling
• SPATTER Parsing
• Statistical Parsing Models
• Decision-Tree Growing & Smoothing
• Decision-Tree Training
• Experiment Results
• Conclusion

Abstract
• Existing syntactic NL parsers: not adequate for highly ambiguous, large-vocabulary text (e.g., the Wall Street Journal)
• Premises for developing a new parser
• grammars are too complex to develop manually for most domains
• parsing models must rely heavily on contextual information
• existing n-gram models: inadequate for parsing
• SPATTER: a statistical parser based on decision-tree models
• outperforms a grammar-based parser

Introduction
• Parsing as making a sequence of disambiguation decisions
• The probability of a complete parse tree (T) of a sentence (S): decomposed over those decisions (see below)
• Automatically discovering the rules for disambiguation
• Producing a parser without a complicated grammar
• Long-distance lexical information is crucial for disambiguating interpretations accurately
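
The decomposition the slide refers to, reconstructed from the paper's formulation (notation assumed): a parse is built by a sequence of n disambiguation decisions d1 … dn, each conditioned on all previous decisions and the sentence.

```latex
P(T \mid S) \;=\; \prod_{i=1}^{n} P(d_i \mid d_1, d_2, \ldots, d_{i-1}, S)
```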

Decision-Tree Modeling
• Comparison with a grammarian
• Grammarian: two crucial tasks for parsing
• identifying the features relevant to each decision
• deciding which choice to select based on the values of those features
• Decision-tree model: performs the same two tasks automatically, and additionally assigns a probability distribution to the possible choices, providing a ranking system

Continued
• What is a Statistical Decision Tree?
• A decision-making device that assigns a probability to each of the possible choices based on the context of the decision:
P(f | h), where f is an element of the future vocabulary and h is a history (the context of the decision)
• The probability is determined by asking a sequence of questions
• the i-th question is determined by the answers to the i−1 previous questions
• Example: part-of-speech tagging problem (Figure 1)
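
A minimal sketch of such a statistical decision tree, assuming a toy history and illustrative questions (none of the names below come from the paper):

```python
class Node:
    def __init__(self, question=None, yes=None, no=None, dist=None):
        self.question = question  # function h -> bool; None at a leaf
        self.yes, self.no = yes, no
        self.dist = dist          # at a leaf: {f: P(f | h)}

def predict(node, history):
    """Follow the question sequence; the i-th question is chosen by the
    answers to the previous i-1 questions (i.e., by the path taken)."""
    while node.question is not None:
        node = node.yes if node.question(history) else node.no
    return node.dist

# Toy part-of-speech example: one question about the word's suffix.
tree = Node(
    question=lambda h: h["word"].endswith("ing"),
    yes=Node(dist={"VBG": 0.7, "NN": 0.3}),
    no=Node(dist={"NN": 0.5, "DT": 0.3, "VB": 0.2}),
)
print(predict(tree, {"word": "parsing"}))  # {'VBG': 0.7, 'NN': 0.3}
```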

Continued
• Decision Trees vs. n-grams
• Equivalent to an interpolated n-gram model in expressive power
• Model parameterization
• n-gram model: conditions each future on the n−1 previous futures (see the formula below)
• an n-gram model can be represented by a decision-tree model that asks n−1 questions
• Example: part-of-speech tagging
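
The n-gram parameterization in standard notation (the slide's original equation image is missing; this is a reconstruction). A decision tree asking the n−1 questions "What is f_{i−1}?", …, "What is f_{i−n+1}?" encodes exactly the same parameterization:

```latex
P(f_i \mid h_i) \;\approx\; P(f_i \mid f_{i-1}, f_{i-2}, \ldots, f_{i-n+1})
```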

Continued
• Model Estimation
• n-gram model: estimated by interpolating relative-frequency estimates of different orders (see the formula below)
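
The slide's equation is missing; the following is a standard deleted-interpolation estimate for the trigram (n = 3) case, consistent with the "interpolated n-gram" terminology above. Here P-tilde denotes relative-frequency estimates and the λ's are trained on held-out data:

```latex
\bar{P}(f_i \mid f_{i-1}, f_{i-2}) \;=\;
  \lambda_3 \,\tilde{P}(f_i \mid f_{i-1}, f_{i-2})
+ \lambda_2 \,\tilde{P}(f_i \mid f_{i-1})
+ \lambda_1 \,\tilde{P}(f_i),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```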

Continued
• decision-tree model: each node's distribution is smoothed with its ancestors' distributions (see the formula below)
• a decision-tree model can, in turn, be represented by an interpolated n-gram model
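
A hedged reconstruction of the node-wise smoothing that makes this equivalence work: each node n on the root-to-leaf path interpolates its empirical distribution with its parent's smoothed distribution, with the λ's estimated on held-out data,

```latex
\bar{P}(f \mid n) \;=\; \lambda_n \,\tilde{P}(f \mid n)
\;+\; (1 - \lambda_n)\, \bar{P}(f \mid \mathrm{parent}(n))
```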

Continued
• Why use decision trees?
• As n grows, the parameter space for an n-gram model grows exponentially (see below)
• On the other hand, the decision-tree learning algorithm increases the size of a model only as the training data allows
• So, it can consider much more contextual information
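
Concretely, an n-gram model over a vocabulary V must store a distribution over V for every possible history, so its parameter space is

```latex
|V|^{\,n-1} \cdot \bigl(|V| - 1\bigr) \;=\; O\!\left(|V|^{\,n}\right)
```

e.g. already about 10^12 parameters for a trigram model over a 10^4-word vocabulary, whereas a decision tree adds parameters (leaves) only where the training data supports a split.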

SPATTER Parsing
• SPATTER Representation
• Parse: represented as a geometric pattern
• 4 features per node: words, tags, labels, and extensions (Figure 3)
• The Parsing Algorithm (a simplified sketch follows this list)
• Starting with the sentence's words as leaves (Figure 3)
• Gradually tagging, labeling, and extending nodes
• Constraints
• Bottom-up, left-to-right
• No new node is constructed until all of its children are complete
• Derivational window constraints (DWC) restrict the number of active nodes
• A single-rooted, labeled tree is constructed
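
A deliberately simplified sketch of this bottom-up, left-to-right loop. The three models are passed in as plain functions and the extension decision is reduced to "attach to the right neighbour"; real SPATTER uses decision-tree models, richer extension values, and a search over derivations. Derivational window constraints are omitted for brevity.

```python
def parse(words, tag_model, ext_model, label_model):
    # Start with the sentence's words as leaves.
    nodes = [{"word": w, "children": []} for w in words]
    for node in nodes:                        # tag every leaf first
        node["tag"] = tag_model(node)
    while len(nodes) > 1:                     # build until a single root remains
        # Left-to-right: merge the first pair whose left node extends right.
        # No new node is constructed until its children are complete.
        for i in range(len(nodes) - 1):
            if ext_model(nodes[i], nodes) == "right":
                parent = {"children": nodes[i:i + 2]}
                parent["label"] = label_model(parent)
                nodes[i:i + 2] = [parent]
                break
        else:                                 # fallback: close the last pair
            parent = {"children": nodes[-2:]}
            parent["label"] = label_model(parent)
            nodes[-2:] = [parent]
    return nodes[0]

# Toy usage with constant stub models:
root = parse("the dog barks".split(),
             tag_model=lambda n: "NN",
             ext_model=lambda n, ns: "right",
             label_model=lambda p: "X")
```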

Statistical Parsing Models
• The Tagging Model
• The Extension Model
• The Label Model
• The Derivation Model
• The Parsing Model (see the reconstructed equations below)
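
A hedged reconstruction of how these models combine, following the paper's formulation (notation assumed). D(T) is the set of derivations that yield T; each decision d_i is scored by the tagging, extension, or label model as appropriate, while the derivation model governs the order in which decisions are made:

```latex
P(T \mid S) \;=\; \sum_{d \,\in\, \mathcal{D}(T)} P(d \mid S),
\qquad
P(d \mid S) \;=\; \prod_{i} P(d_i \mid d_1, \ldots, d_{i-1}, S)
```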

Decision-Tree Growing & Smoothing
• 3 main models (tagging, extension, and label)
• Dividing the training corpus into 2 sets: 90% for growing, 10% for smoothing
• Growing & Smoothing Algorithm (Figure 3.5; a sketch of the growing step follows)
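
A minimal sketch of the growing step, assuming entropy reduction as the splitting criterion (the exact criteria of the paper's Figure 3.5 are not reproduced here). Smoothing would then interpolate each leaf's distribution with its ancestors' using λ's estimated on the held-out 10%, as in the formula earlier:

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def grow(events, questions, min_gain=0.01):
    """events: list of (history, future) pairs; questions: list of h -> bool."""
    futures = Counter(f for _, f in events)
    best = None
    for q in questions:
        yes = [(h, f) for h, f in events if q(h)]
        no = [(h, f) for h, f in events if not q(h)]
        if not yes or not no:
            continue  # question does not split this node
        gain = entropy(futures) - (
            len(yes) / len(events) * entropy(Counter(f for _, f in yes))
            + len(no) / len(events) * entropy(Counter(f for _, f in no)))
        if best is None or gain > best[0]:
            best = (gain, q, yes, no)
    if best is None or best[0] < min_gain:
        return {"dist": futures}              # stop: make this node a leaf
    _, q, yes, no = best
    return {"q": q, "yes": grow(yes, questions), "no": grow(no, questions)}
```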

Decision-Tree Training
• The parsing model cannot be estimated by direct frequency counts because it contains a hidden component: the derivation model
• The corpus contains no information about the order of derivational steps
• So, the training process must discover which derivations assign higher probabilities to the correct parses
• Forward-backward reestimation (EM) is used

Continued
• Training Algorithm (the slide's figure is summarized by the toy sketch below)
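
A toy, self-contained EM loop illustrating the reestimation idea over hidden derivations (synthetic decision ids; not the paper's exact forward-backward algorithm):

```python
from collections import defaultdict

def em(parses, probs, n_iters=10):
    """parses: list of parses; each parse is a list of candidate derivations,
    and a derivation is a tuple of decision ids. probs: id -> probability."""
    for _ in range(n_iters):
        counts = defaultdict(float)
        for derivations in parses:
            # E-step: weight each derivation by its posterior probability.
            scores = []
            for d in derivations:
                p = 1.0
                for decision in d:
                    p *= probs[decision]
                scores.append(p)
            z = sum(scores)
            for d, s in zip(derivations, scores):
                for decision in d:
                    counts[decision] += s / z
        # M-step: renormalize within each model family (tag / ext / label).
        totals = defaultdict(float)
        for k, v in counts.items():
            totals[k.split(":")[0]] += v
        probs = {k: v / totals[k.split(":")[0]] for k, v in counts.items()}
    return probs

# Toy usage: one parse with two candidate derivations.
probs = {"tag:NN": 0.25, "ext:right": 0.25, "ext:up": 0.25, "label:NP": 0.25}
print(em([[("tag:NN", "ext:right", "label:NP"),
           ("tag:NN", "ext:up", "label:NP")]], probs))
```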

Experiment Results
• IBM Computer Manual domain
• annotated by the University of Lancaster
• 195 part-of-speech tags and 19 non-terminal labels
• trained on 30,800 sentences, tested on 1,473 new sentences
• 0-crossing-brackets score
• IBM's rule-based, unification-style PCFG parser: 69%
• SPATTER: 76%

Continued
• Wall Street Journal
• To test the ability to accurately parse a highly ambiguous, large-vocabulary domain
• Annotated in the Penn Treebank, version 2
• 46 part-of-speech tags and 27 non-terminal labels
• Trained on 40,000 sentences, tested on 1,920 new sentences
• Evaluated using the PARSEVAL measures

Conclusion
• Large amounts of contextual information can be incorporated into a statistical parsing model by applying decision-tree learning algorithms
• Automatically discovering disambiguation rules is possible, producing a parser without a complicated hand-written grammar
