NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Roman Yangarber & Ralph Grishman

Presented by Jinying Chen, 10/04/2002

Outline
  • Introduction
  • Proteus IE System
  • PET User Interface
  • Performance on the Launch Scenario
Introduction
  • Problem: portability and customization of IE engines at the scenario level
  • To address this problem
    • NYU built a set of tools that allow the user to adapt the system to new scenarios rapidly through example-based learning
    • The present system operates on two tiers: Proteus and PET
Introduction (Cont.)
  • Proteus
    • Core extraction engine, an enhanced version of the one employed at MUC-6
  • PET
    • GUI front end, through which the user interacts with Proteus
    • The user provides the system with examples of events in text and examples of the associated database entries to be created
Proteus IE System
  • Modular design
    • Control is encapsulated in immutable, domain-independent core components
    • Domain-specific information resides in the knowledge bases
Proteus IE System (Cont.)
  • Lexical analysis module
    • Assigns each token a reading or a list of alternative readings by consulting a set of on-line dictionaries
  • Name Recognition
    • Identifies proper names in the text by using local contextual cues
Proteus IE System (Cont.)
  • Partial Syntax
    • Finds small syntactic units, such as basic NPs and VPs
    • Marks each phrase with semantic information, e.g. the semantic class of the head of the phrase
  • Scenario Patterns
    • Finds higher-level syntactic constructions using local semantic information: apposition, prepositional phrase attachment, limited conjunctions, and clausal constructions
Proteus IE System (Cont.)
  • Note:
    • The above three modules are the pattern-matching phases; they operate by deterministic, bottom-up partial parsing or pattern matching.
    • The output is a sequence of LFs corresponding to the entities, relationships, and events encountered in the analysis.
Proteus IE System (Cont.)
  • Reference Resolution (RefRes)
    • Links anaphoric pronouns to their antecedents and merges other co-referring expressions
  • Discourse Analysis
    • Uses higher-level inference rules to build more complex event structures
    • E.g. a rule that merges a Mission entity with a corresponding Launch event.
  • Output Generation
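The merge rule mentioned under Discourse Analysis can be sketched roughly as follows. The dictionary layout, the slot names (`payload`, `vehicle`, `purpose`, `mission`), and the matching criterion are all hypothetical; the slides do not show how Proteus represents or matches these structures.

```python
def merge_mission_into_launch(mission, launches):
    """Attach `mission` to the first launch event that shares its payload.

    Returns the updated launch event, or None if no launch corresponds.
    The shared-payload test is an assumed criterion, chosen for illustration.
    """
    for launch in launches:
        if launch.get("payload") == mission.get("payload"):
            launch["mission"] = mission
            return launch
    return None

# Hypothetical structures produced by the earlier pattern-matching phases.
launches = [{"type": "Launch", "vehicle": "rocket A", "payload": "satellite X"}]
mission = {"type": "Mission", "payload": "satellite X", "purpose": "communications"}

merged = merge_mission_into_launch(mission, launches)
print(merged["mission"]["purpose"])  # communications
```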
PET User Interface
  • A disciplined method of customization of knowledge bases, and the pattern base in particular
  • Organization of Patterns
    • The pattern base is organized in layers
    • Proteus treats the patterns at the different levels differently
    • Acquires the most specific patterns directly from the user, on a per-scenario basis

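  • Pattern layers, from most specific to most general (the slide's figure also carries the label "Pattern Lib"):
    • Clausal patterns that capture events (scenario-specific)
    • Patterns that find relationships among entities, such as between persons and organizations
    • Patterns that perform partial syntactic analysis
    • The most general patterns, which capture the most basic constructs, such as proper names, temporal expressions, etc.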
PET User Interface (Cont.)
  • Pattern Acquisition
    • Enter an example
    • Choose an event template
    • Apply existing patterns (step 3)
    • Tune pattern elements (step 4)
    • Fill event slots (step 5)
    • Build pattern
    • Syntactic generalization

(Screenshots of steps 3, 4, and 5)

Performance on the Launch Scenario
  • Scenario Patterns
    • Basically two types: launch events and mission events
    • In cases where there is no direct connection between these two events, post-processing inference rules attempt to tie the mission to a launch event
  • Inference Rules
    • Involve many-to-many relations (e.g. multiple payloads correspond to a single event)
    • The inference rule set is extended with heuristics, e.g. to find the date and site


  • Example-based pattern acquisition is appropriate for the ST-level task, especially when training data is quite limited
  • Pattern editing tools are useful and effective
NYU: Description of the MENE Named Entity System as Used in MUC-7

Andrew Borthwick, John Sterling, et al.

Presented by Jinying Chen

Outline
  • Maximum Entropy
  • MENE’s Feature Classes
  • Feature Selection
  • Decoding
  • Results
  • Conclusion
Maximum Entropy
  • Problem Definition

The problem of named entity recognition can be reduced to the problem of assigning one of 4*n+1 tags to each token

    • n: the number of name categories, such as company, product, etc. For MUC-7, n=7
    • 4 states: x_start, x_continue, x_end, x_unique
    • other : not part of a named entity
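As a concrete illustration of the count above, the tag set can be enumerated in a few lines of Python. The category names are an assumption: the slide only states that n = 7 for MUC-7, and the seven MUC-7 named entity types are used here for illustration.

```python
# Category names are illustrative; the slide only states that n = 7 for MUC-7.
CATEGORIES = ["organization", "person", "location", "date", "time", "money", "percent"]
STATES = ["start", "continue", "end", "unique"]

def build_tag_set(categories):
    """Return the 4*n + 1 tags: four states per category plus 'other'."""
    tags = [f"{cat}_{state}" for cat in categories for state in STATES]
    tags.append("other")
    return tags

TAGS = build_tag_set(CATEGORIES)
print(len(TAGS))  # 29
```

With n = 7 this yields 4*7 + 1 = 29 tags, for example `person_start`, `person_continue`, `person_end`, `person_unique`, and the single tag `other`.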
Maximum Entropy (cont.)
  • Maximum Entropy Solution
    • compute p(f | h), where f is the future (the prediction among the 4*n+1 tags) and h is the history
    • the computation of p(f | h) depends on a set of binary-valued features
Maximum Entropy (cont.)
  • Given a set of features and some training data, the maximum entropy estimation process produces a model:
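A minimal sketch of such a model, assuming the standard maximum entropy form p(f | h) = (1/Z(h)) * prod_i alpha_i^g_i(h, f), where each g_i is a binary feature over a (history, future) pair, each alpha_i > 0 is a learned weight, and Z(h) normalises over all futures. The two features, their weights, and the three-tag set below are invented for illustration and are not taken from MENE.

```python
from math import prod

TAGS = ["location_start", "person_start", "other"]  # toy tag set, not the full 4*n + 1

# Each feature g(h, f) returns 1 or 0 for a (history, future) pair.
def g_cap_location(h, f):
    return int(h["token"][:1].isupper() and f == "location_start")

def g_lower_other(h, f):
    return int(h["token"].islower() and f == "other")

# (feature, alpha) pairs; the alpha values are made up, not trained.
MODEL = [(g_cap_location, 3.0), (g_lower_other, 5.0)]

def p(f, h, model=MODEL, tags=TAGS):
    """p(f | h): product of alpha_i ** g_i(h, f), normalised over all futures."""
    def score(tag):
        return prod(alpha ** g(h, tag) for g, alpha in model)
    return score(f) / sum(score(t) for t in tags)

h = {"token": "Paris"}
print({t: round(p(t, h), 3) for t in TAGS})
# {'location_start': 0.6, 'person_start': 0.2, 'other': 0.2}
```

A feature that does not fire contributes a factor of alpha ** 0 = 1, so only the active features change the score of a given future.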
MENE’s Feature Classes
  • Binary Features
  • Lexical Features
  • Section Features
  • Dictionary Features
  • External Systems Features
Binary Features
  • Features whose “history” can be considered to be either on or off for a given token.
  • Example:
    • The token begins with a capital letter
    • The token is a four-digit number
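The two example features can be written as simple on/off predicates over a token. The function names are hypothetical, and the sketch covers only the history side of a feature, not its pairing with a predicted tag.

```python
import re

def starts_with_capital(token: str) -> bool:
    """True if the first character of the token is an uppercase letter."""
    return token[:1].isupper()

def is_four_digit_number(token: str) -> bool:
    """True if the token consists of exactly four digits, e.g. a year."""
    return re.fullmatch(r"[0-9]{4}", token) is not None

for tok in ["NYU", "launched", "1998", "19980"]:
    print(tok, starts_with_capital(tok), is_four_digit_number(tok))
```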
Section Features
  • Features that make predictions based on the current section of the article, such as “Date”, “Preamble”, and “Text”.
  • Play a key role by establishing the background probability of the occurrence of the different futures (predictions).
Dictionary Features
  • Make use of a broad array of dictionaries of useful single or multi-word terms such as first names, company names, and corporate suffixes.
  • Require no manual editing
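One way a dictionary of single and multi-word terms might be turned into per-token features is sketched below. It assumes that dictionary hits are marked on tokens with the same start/continue/end/unique/other scheme used for the tags; that encoding, the greedy longest-match strategy, and the tiny company dictionary are assumptions made for illustration.

```python
def dictionary_marks(tokens, entries):
    """Mark each token as start/continue/end/unique/other w.r.t. one dictionary.

    `entries` is a set of tuples of lowercased tokens. Matching is greedy,
    longest entry first, left to right.
    """
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]
    marks = ["other"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                if n == 1:
                    marks[i] = "unique"
                else:
                    marks[i] = "start"
                    marks[i + 1:i + n - 1] = ["continue"] * (n - 2)
                    marks[i + n - 1] = "end"
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return marks

COMPANIES = {("general", "motors"), ("intelsat",)}  # hypothetical dictionary
print(dictionary_marks("Intelsat and General Motors agreed".split(), COMPANIES))
# ['unique', 'other', 'start', 'end', 'other']
```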
External Systems Features
  • MENE incorporates the outputs of three NE taggers
    • a significantly enhanced version of the traditional, hand-coded “Proteus” named-entity tagger
    • Manitoba
    • IsoQuest
Feature Selection
  • Simple
  • Select all features which fire at least 3 times in the training corpus
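The count-based selection rule can be sketched as a simple frequency cutoff. The representation of a training event as a set of active feature names is an assumption, as are the feature names themselves.

```python
from collections import Counter

def select_features(fired_per_event, min_count=3):
    """Keep every feature that fires at least `min_count` times in training."""
    counts = Counter(name for fired in fired_per_event for name in fired)
    return {name for name, c in counts.items() if c >= min_count}

# Hypothetical training events: the set of feature names active on each token.
training = [
    {"cap", "dict_first_name"},
    {"cap"},
    {"cap", "four_digit"},
    {"four_digit"},
    {"dict_first_name"},
]
print(sorted(select_features(training)))  # ['cap']
```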
Decoding
  • Simple
    • For each token, check all the active features for this token and compute p(f | h)
    • Run a Viterbi search to find the highest-probability coherent path through the lattice of conditional probabilities
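The decoding step can be sketched as a small Viterbi search over per-token tag distributions. The coherence rules encoded in `can_follow` (for example, `x_continue` and `x_end` may only follow `x_start` or `x_continue` of the same category) are an assumed reading of "coherent path", and the probabilities are invented.

```python
def can_follow(prev, cur):
    """True if tag `cur` may directly follow tag `prev` in a coherent sequence."""
    inside = prev.endswith(("_start", "_continue"))  # prev leaves a name open
    if cur.endswith(("_continue", "_end")):
        return inside and prev.rsplit("_", 1)[0] == cur.rsplit("_", 1)[0]
    return not inside  # x_start, x_unique and other need a closed context

def viterbi(probs):
    """probs: one {tag: p(tag | h)} dict per token. Returns the best coherent path.

    Raises ValueError if no coherent path exists for the given distributions.
    """
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {t: (p, [t]) for t, p in probs[0].items() if can_follow("other", t)}
    for dist in probs[1:]:
        nxt = {}
        for cur, p in dist.items():
            cands = [(bp * p, path + [cur]) for prev, (bp, path) in best.items()
                     if can_follow(prev, cur)]
            if cands:
                nxt[cur] = max(cands, key=lambda c: c[0])
        best = nxt
    # A path may not end inside an unfinished name.
    final = [v for t, v in best.items() if can_follow(t, "other")]
    return max(final, key=lambda c: c[0])[1]

probs = [
    {"person_start": 0.5, "person_unique": 0.3, "other": 0.2},
    {"person_end": 0.3, "person_continue": 0.1, "other": 0.6},
]
print(viterbi(probs))  # ['person_unique', 'other']
```

Picking the most probable tag for each token independently would give `person_start` followed by `other`, which is not a coherent sequence; the search instead returns the best sequence that respects the transition constraints.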
Results
  • Training set: 350 aviation disaster articles (about 270,000 words)
  • Test set:
    • Dry run: within-domain corpus
    • Formal run: out-of-domain corpus
Conclusion
  • A new, still immature system. Performance can be improved by:
    • Adding long-range reference-resolution features
    • Exploring compound features
    • Using more sophisticated methods of feature selection
  • Highly portable
  • An efficient method to combine NE systems