Statistical Decision-Tree Models for Parsing
NLP lab, POSTECH, 김 지 협

Contents
  • Abstract
  • Introduction
  • Decision-Tree Modeling
  • SPATTER Parsing
  • Statistical Parsing Models
  • Decision-Tree Growing & Smoothing
  • Decision-Tree Training
  • Experiment Results
  • Conclusion

Abstract
  • Existing syntactic NL parsers are inadequate for highly ambiguous, large-vocabulary text (e.g., the Wall Street Journal)
  • Premises for developing a new parser:
    • grammars are too complex to develop manually for most domains
    • parsing models must rely heavily on contextual information
    • existing n-gram modeling techniques are inadequate for parsing
  • SPATTER: a statistical parser based on decision-tree models
    • performs better than a grammar-based parser

Introduction
  • Parsing is viewed as making a sequence of disambiguation decisions
  • The probability of a complete parse tree T of a sentence S decomposes over this decision sequence (see the equation after this list)
  • The rules for disambiguation are discovered automatically
  • A parser is produced without a complicated grammar
  • Long-distance lexical information is crucial for disambiguating interpretations accurately
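Following the paper's decision-sequence formulation, with d_1, ..., d_n the sequence of disambiguation decisions that builds T, the decomposition is:

    P(T \mid S) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1}, S)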

Decision-Tree Modeling
  • Comparison
    • A grammarian performs two crucial tasks for parsing:
      • identifying the features relevant to each decision
      • deciding which choice to select based on the values of those features
    • A decision-tree model performs the above 2 tasks, plus a 3rd:
      • assigning a probability distribution to the possible choices, thereby providing a ranking system

Continued
  • What is a Statistical Decision Tree?
    • A decision-making device assigning a probability to each of the possible choices based on the context of the decision
    • P(f | h), where f is an element of the future vocabulary (the set of possible choices) and h is a history (the context of the decision)
    • The probability is determined by asking a sequence of questions about the history (a small sketch follows this list)
    • The i-th question asked is determined by the answers to the i−1 previous questions
    • Example: the part-of-speech tagging problem (Figure 1)
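A minimal Python sketch of the idea (the question, tags, and probabilities below are invented for illustration, not taken from the paper): each internal node asks a yes/no question about the history, and each leaf holds a probability distribution over the future vocabulary.

    # A statistical decision tree: internal nodes ask yes/no questions
    # about the history h; leaves hold a distribution over futures f.
    class Node:
        def __init__(self, question=None, yes=None, no=None, dist=None):
            self.question = question   # function h -> bool; None at a leaf
            self.yes, self.no = yes, no
            self.dist = dist           # {f: P(f | h)} at a leaf

    def probability(node, f, h):
        """Return P(f | h) by walking from the root to a leaf."""
        while node.question is not None:
            node = node.yes if node.question(h) else node.no
        return node.dist.get(f, 0.0)

    # Toy part-of-speech tagging example (invented numbers):
    tree = Node(
        question=lambda h: h["prev_tag"] == "DET",
        yes=Node(dist={"NOUN": 0.7, "ADJ": 0.25, "VERB": 0.05}),
        no=Node(dist={"VERB": 0.5, "NOUN": 0.3, "ADJ": 0.2}),
    )
    print(probability(tree, "NOUN", {"prev_tag": "DET"}))   # 0.7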

Continued
  • Decision Trees vs. n-grams
    • Decision-tree models are equivalent in expressive power to interpolated n-gram models
    • Model Parameterization
      • n-gram model: conditions each future on the previous n−1 futures (see the equation after this list)
      • any n-gram model can be represented by a decision-tree model that asks n−1 questions
      • Example: part-of-speech tagging
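In standard notation, the n-gram parameterization conditions each future on the n−1 preceding futures:

    P(f_i \mid h_i) \approx P(f_i \mid f_{i-1}, f_{i-2}, \ldots, f_{i-n+1})

A decision tree reproduces this distribution exactly by asking, in turn, the n−1 questions "what is f_{i-1}?", ..., "what is f_{i-n+1}?".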

Continued
  • Model Estimation
    • n-gram model: estimated by deleted interpolation of relative-frequency estimates of increasing order (see the equation after this list)
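The standard deleted-interpolation estimate mixes relative-frequency estimates \bar{P} of increasing order:

    \tilde{P}(f_i \mid f_{i-n+1}, \ldots, f_{i-1}) = \sum_{j=1}^{n} \lambda_j \, \bar{P}(f_i \mid f_{i-j+1}, \ldots, f_{i-1}), \qquad \sum_{j} \lambda_j = 1

where the j = 1 term is the unigram estimate \bar{P}(f_i).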

Continued
  • decision-tree model: each node's distribution is smoothed by interpolating it with the distributions of its ancestors on the path to the root (see the equation after this list)
  • hence a smoothed decision-tree model can be represented as an interpolated n-gram model
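Per the paper's approach, a node's distribution is interpolated with its parent's smoothed distribution, recursively up to the root:

    \tilde{P}(f \mid n) = \lambda_n \, \bar{P}(f \mid n) + (1 - \lambda_n) \, \tilde{P}(f \mid \mathrm{parent}(n))

Unwinding this recursion from a leaf to the root gives a weighted sum of relative-frequency estimates, which is why a smoothed decision-tree model can be written as an interpolated n-gram model.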

Continued
  • Why use decision trees?
    • As n grows, the parameter space of an n-gram model grows exponentially
    • The decision-tree learning algorithm, by contrast, increases the size of a model only as the training data allows
    • It can therefore take much more contextual information into account

SPATTER Parsing
  • SPATTER Representation
    • A parse is encoded as a geometric pattern
    • Each node carries 4 features: words, tags, labels, and extensions (Figure 3)
  • The Parsing Algorithm (a simplified sketch follows this list)
    • Start with the sentence's words as leaves (Figure 3)
    • Gradually tag, label, and extend nodes
    • Constraints
      • bottom-up, left-to-right
      • no new node is constructed until all of its children are completed
      • the number of active nodes is restricted by derivational window constraints (DWC)
    • A single-rooted, labeled tree is constructed
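A heavily simplified Python sketch of this control loop (the model callables and the greedy choice are illustrative stand-ins; the real SPATTER scores many alternative derivations with its decision-tree models and searches for the highest-probability parse):

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        word: str = None            # None for internal (constructed) nodes
        tag: str = None
        label: str = None
        children: list = field(default_factory=list)

    def parse(words, tag_model, ext_model, label_model):
        """Greedy bottom-up, left-to-right derivation (illustrative only)."""
        active = [Node(word=w) for w in words]   # sentence words as leaves
        for leaf in active:                      # tag the leaves first
            leaf.tag = tag_model(leaf)
        while len(active) > 1:                   # until a single root remains
            # Scan left to right for the first node whose extension model
            # says it should join with its right neighbour.
            for i in range(len(active) - 1):
                if ext_model(active[i], active[i + 1]):
                    parent = Node(children=active[i:i + 2])
                    parent.label = label_model(parent)
                    active[i:i + 2] = [parent]
                    break
            else:                                # no pair joins: force one
                parent = Node(children=active[:2])
                parent.label = label_model(parent)
                active[:2] = [parent]
        return active[0]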

Statistical Parsing Models
  • The Tagging Model
  • The Extension Model
  • The Label Model
  • The Derivation Model
  • The Parsing Model (the top-level combination of these models is shown after this list)
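The top-level combination, following the paper, sums over all derivations of a tree:

    P(T \mid S) = \sum_{d \in \mathcal{D}(T)} P(d \mid S), \qquad P(d \mid S) = \prod_{i} P(d_i \mid d_1, \ldots, d_{i-1}, S)

where \mathcal{D}(T) is the set of derivations that produce T, each decision d_i is scored by the tagging, extension, or label model as appropriate, and the derivation model scores which node is processed next.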

Decision-Tree Growing & Smoothing
  • The 3 main models (tagging, extension, and label) are each grown as decision trees
  • The training corpus is divided into 2 sets: 90% for growing and 10% for smoothing
  • Growing & Smoothing Algorithm (a sketch follows this list)
    • Figure 3.5
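A short Python sketch of the split (grow_tree and smooth_tree are hypothetical stand-ins for the paper's growing algorithm and held-out smoothing step):

    import random

    def train_model(events, grow_tree, smooth_tree, seed=0):
        """Grow a decision tree on 90% of the events, then estimate its
        smoothing (interpolation) weights on the held-out 10%."""
        random.Random(seed).shuffle(events)
        cut = int(0.9 * len(events))
        growing, heldout = events[:cut], events[cut:]
        tree = grow_tree(growing)     # choose questions from the data
        smooth_tree(tree, heldout)    # fit node interpolation weights
        return tree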

Decision-Tree Training
  • The parsing model cannot be estimated by direct frequency counts because it contains a hidden component: the derivation model
  • The treebank contains no information about the order of derivation steps
  • So the training process must discover which derivations assign higher probabilities to the correct parses
  • Forward-backward reestimation is used (a sketch follows this list)
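A flattened Python sketch of the reestimation loop (here derivations is a hypothetical callable returning each derivation of a tree as a list of decision events, and each decision has a single context-free probability; in the real system the probabilities come from the context-dependent decision-tree models and the E-step uses the forward-backward algorithm):

    from collections import defaultdict

    def reestimate(parses, derivations, init_probs, n_iterations=5):
        """EM-style training: the treebank fixes the parse trees but hides
        the derivation order, so each derivation of a tree is weighted by
        its probability under the current model."""
        probs = dict(init_probs)                 # {decision: probability}
        for _ in range(n_iterations):
            counts = defaultdict(float)
            for tree in parses:
                ds = derivations(tree)           # all derivations of tree
                weights = []
                for d in ds:                     # E-step: score derivations
                    w = 1.0
                    for step in d:
                        w *= probs.get(step, 1e-6)
                    weights.append(w)
                total = sum(weights) or 1.0
                for d, w in zip(ds, weights):
                    for step in d:               # collect fractional counts
                        counts[step] += w / total
            z = sum(counts.values()) or 1.0      # M-step: renormalize
            probs = {s: c / z for s, c in counts.items()}
        return probs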

Continued
  • Training Algorithm

Experiment Results
  • IBM Computer Manual
    • annotated by the University of Lancaster
    • 195 part-of-speech tags and 19 non-terminal labels
    • trained on 30,800 sentences, and tested on 1,473 new sentences
    • 0-crossing-brackets score
      • IBM’s rule-based, unification-style PCFG parser: 69%
      • SPATTER: 76%

Continued
  • Wall Street Journal
    • Tests the parser's ability to parse a highly ambiguous, large-vocabulary domain accurately
    • Annotated in the Penn Treebank, version 2
    • 46 part-of-speech tags, and 27 non-terminal labels
    • Trained on 40,000 sentences, and tested on 1,920 new sentences
    • Evaluated using the PARSEVAL measures (precision, recall, and crossing brackets)

Conclusion
  • Large amounts of contextual information can be incorporated into a statistical parsing model by applying decision-tree learning algorithms
  • Disambiguation rules can be discovered automatically
