Error Analysis for Learning-based Coreference Resolution

Olga Uryupina

27.05.08


Outline

  • CR: state-of-the-art and our system

  • Distribution of errors

  • Discussion: possible remedies


Coreference Resolution

„This deal means that Bernard Schwartz can focus most of his time on Globalstar and that is a key plus for Globalstar because Bernard Schwartz is brilliant,“ said Robert Kaimovitz, a satellite communications analyst at Unterberg Harris in New York.

..

Globalstar still needs to raise $ 600 million, and Schwartz said that the company would try..


Machine Learning Approaches

  • Soon et al. (2001)

  • Cardie & Wagstaff (1999)

  • Strube et al. (2002)

  • Ng & Cardie (2001-2004)

  • ACE competition


Features: Soon et al. (2001)

  • Anaphor is a pronoun

  • Anaphor is a definite NP

  • Anaphor is an NP with a demonstrative pronoun („this“,..)

  • Antecedent is a pronoun

  • Both markables are proper names

  • Number agreement

  • Gender agreement

  • Alias

  • Appositive

  • Same surface form

  • Semantic class agreement

  • Distance in sentences
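
A few of these pairwise features can be sketched in code. This is only an illustration, not the feature extractor of Soon et al. or of the system presented here: the `Mention` class, the word lists, and the feature names are all hypothetical, and the features that need external resources (gender, alias, apposition, semantic class) are left out.

```python
from dataclasses import dataclass

# Small illustrative word lists (assumption: a real system would use a tagger).
PRONOUNS = {"he", "she", "it", "they", "we", "him", "her", "them", "us",
            "his", "its", "their", "our"}
DEMONSTRATIVES = {"this", "that", "these", "those"}


@dataclass
class Mention:
    """Hypothetical markable with annotations a preprocessor would supply."""
    text: str
    sentence: int                 # index of the sentence containing the mention
    number: str = "unknown"       # "sg", "pl" or "unknown"
    is_proper_name: bool = False


def first_token(m: Mention) -> str:
    return m.text.lower().split()[0]


def is_pronoun(m: Mention) -> bool:
    tokens = m.text.lower().split()
    return len(tokens) == 1 and tokens[0] in PRONOUNS


def pair_features(antecedent: Mention, anaphor: Mention) -> dict:
    """Compute a subset of the features listed above for one mention pair."""
    return {
        "ana_pronoun": is_pronoun(anaphor),
        "ana_definite": first_token(anaphor) == "the",
        "ana_demonstrative": first_token(anaphor) in DEMONSTRATIVES,
        "ante_pronoun": is_pronoun(antecedent),
        "both_proper": antecedent.is_proper_name and anaphor.is_proper_name,
        # Simplification: unknown number never counts as agreement.
        "number_agree": (antecedent.number != "unknown"
                         and antecedent.number == anaphor.number),
        "same_surface": antecedent.text.lower() == anaphor.text.lower(),
        "sent_dist": anaphor.sentence - antecedent.sentence,
    }


ante = Mention("Globalstar", sentence=0, number="sg", is_proper_name=True)
ana = Mention("the company", sentence=1, number="sg")
features = pair_features(ante, ana)
```

Each candidate pair yields one such feature vector, which is then passed to the classifier.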


Features: other approaches

Cardie & Wagstaff: 11 Features

Strube et al.: 17 Features (the same standard features + approximate matching (MED))

Ng & Cardie: 53 Features (no improvement on the extended feature set, better results (F=63.4) with manual feature selection)
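
Approximate matching via minimum edit distance (MED) can be sketched as follows. The dynamic-programming recurrence is the standard Levenshtein distance; the normalization by the longer string is an assumption made here for illustration, since the slides do not say how Strube et al. turn the distance into a feature value.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalized_med(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed scaling)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

A value near 0 means the two surface forms are almost identical, which makes this feature a softer alternative to the exact "same surface form" test.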


Performance: Soon et al.

Soon et al.'s system:

Our reimplementation:


Performance: Soon et al.

Learning Curve for C5.0


Tricky and easy anaphors

Cristea et al. (2002): state-of-the-art coreference resolution systems have essentially the same performance level

Pronominal anaphora – 80%

Full-scale coreference – 60%

Hypothesis: tricky vs. easy anaphors


Our system

Goal:

Bridge the gap between the theory and the practice:

sophisticated linguistic knowledge + data-driven coreference resolution algorithm


New Features

Different aspects of CR:

  • Surface similarity (122 features)

  • Syntax (64)

  • Semantic Compatibility (29)

  • Salience (136)

  • (Anaphoricity)

    More or less sophisticated linguistic theories exist for all these phenomena


Evaluation

Methodology

  • Standard dataset (MUC-7)

  • Standard learning set-up

  • Compare to Soon et al. (2001)
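
The slides do not name the scorer, but MUC data is conventionally scored with the link-based metric of Vilain et al. (1995). A minimal sketch of that metric, assuming key and response chains are given as sets of hypothetical mention identifiers:

```python
def muc_recall(key, response):
    """Link-based recall: for each key chain K, count how many links survive
    when K is partitioned by the response chains.
    key, response: lists of sets of mention ids."""
    chain_of = {m: idx for idx, chain in enumerate(response) for m in chain}
    found_links = 0
    total_links = 0
    for chain in key:
        parts = {chain_of[m] for m in chain if m in chain_of}
        # Mentions missing from the response each form their own partition.
        unaligned = sum(1 for m in chain if m not in chain_of)
        found_links += len(chain) - (len(parts) + unaligned)
        total_links += len(chain) - 1
    return found_links / total_links if total_links else 0.0


def muc_scores(key, response):
    """Return (precision, recall, F); precision is recall with roles swapped."""
    recall = muc_recall(key, response)
    precision = muc_recall(response, key)
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# One key chain of four mentions, split into two chains by the system.
key = [{"m1", "m2", "m3", "m4"}]
response = [{"m1", "m2"}, {"m3", "m4"}]
precision, recall, f_score = muc_scores(key, response)
```

In this toy case every proposed link is correct, but one of the three key links is missing, so precision is 1 and recall is 2/3.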


Performance (F)


Performance

Learning Curve, SVM


Error analysis

Different approaches – same performance:

  • Same errors?

  • „Tricky anaphors“? (Cristea et al., 2002)

    Extensive error analysis needed!


Outline

  • CR: state-of-the-art and our system

  • Distribution of errors

  • Discussion: possible remedies


Recall errors


Recall errors - markables

  • Auxiliary document parts

  • Tokenization

  • Modifiers

  • Bracketing/labeling


Recall errors - markables

.. there was no requirement for tether to be manufactured in a contaminant-free environment.

A mesmerizing set.


Recall errors - pronouns

1st pl – reconstructing the group:

The retiring Republican chairman of the House Committee on Science wants U.S. businesses to <..> „We need to make it easier for the private sector..“ Walker said

3rd sg, 3rd pl – (non-)salience:

[The explanation] for the History Channel's success begins with its association with another channel owned by the same parent consortium.


Recall errors - nominal

Mostly common noun phrases with different heads; WordNet does not help much

.. a report on the satellites' findings <..> the abilities of U.S. reconnaissance technology <..> the use of advanced intelligence-gathering tools <..> Remote-sensing instruments..


Precision errors


Precision errors - pronouns

  • incorrect parsing/tagging

    Two key vice presidents, [Wei Yen] and Eric Carlson, are leaving to start their own Silicon Valley companies.

  • (non-)salience

  • matching (propagated R)


Precision errors - nominal

Mostly same-head descriptions. Possible solutions:

  • modifiers?

  • anaphoricity detectors?


P errors – nominal - modifiers

Idea: „red car“ cannot corefer with „blue car“

Problem: list of mutually incompatible properties?

MUC7 test data:

  • incompatible modifiers: 30

  • „new“ mod for anaphora: 15

  • compatible modifiers: 58

  • no modifiers: 62
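
The four categories above can be sketched as a simple classification of mention pairs. Everything here is a hypothetical illustration: the incompatibility sets are hand-made (which is exactly the open problem the slide names), and the reading of the „new“ category as "the anaphor carries a modifier absent from the antecedent" is an assumption.

```python
# Hypothetical, hand-made sets of mutually exclusive properties.
INCOMPATIBLE_SETS = [
    {"red", "blue", "green"},
    {"first", "second", "third"},
]


def modifiers_clash(a: str, b: str) -> bool:
    """Two different modifiers clash if some set lists both of them."""
    return a != b and any(a in s and b in s for s in INCOMPATIBLE_SETS)


def modifier_category(antecedent_mods, anaphor_mods) -> str:
    """Assign a same-head mention pair to one of the four categories."""
    ante, ana = set(antecedent_mods), set(anaphor_mods)
    if not ante and not ana:
        return "no modifiers"
    if any(modifiers_clash(x, y) for x in ante for y in ana):
        return "incompatible"
    if ana - ante:
        return "new modifier on anaphor"   # assumed reading of the category
    return "compatible"


category = modifier_category(["red"], ["blue"])   # the "red car" / "blue car" case
```

The four counts listed above sum to 165, so even a perfect incompatibility list would address only the 30 incompatible cases, about 18% of them.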


P errors – nominal - dnew

Idea: identify and discard unlikely anaphors

Problem: even a very good detector does not help


Outline

  • CR: state-of-the-art and our system

  • Distribution of errors

  • Discussion: Possible remedies


Discussion – Errors

Problematic areas:

  • Data

  • Preprocessing modules

  • Features

  • Resolution strategy


Discussion - Data

  • bigger corpus

  • more uniform doc selection, text only

  • better definition of COREF

  • better scoring


Discussion - Preprocessing

  • local improvements (e.g. appositions)

  • probabilistic architecture to neutralize errors


Discussion - Features

  • feature selection

  • ensemble learning

  • more targeted learning for under-represented phenomena (abbreviations)


Discussion - Resolution

  • less local: move to the chains level

  • less uniform: specific treatment for different types of anaphors
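
Working at the chain level presupposes that pairwise decisions are merged into entities. A minimal sketch of that first step, assuming the classifier outputs a list of accepted (antecedent, anaphor) links; it uses a plain union-find for the transitive closure and is not the resolution strategy of the system presented here. Mention strings stand in for mention identifiers.

```python
def build_chains(mentions, links):
    """Merge pairwise coreference links into chains (transitive closure).
    mentions: mention ids in document order; links: (antecedent, anaphor) pairs.
    Returns only chains with at least two mentions."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the representative, halving paths on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for antecedent, anaphor in links:
        root_a, root_b = find(antecedent), find(anaphor)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return [chain for chain in chains.values() if len(chain) > 1]


# Illustrative mentions and links (real ids would be unique per occurrence).
mentions = ["Bernard Schwartz", "his", "Schwartz",
            "Globalstar", "the company", "New York"]
links = [("Bernard Schwartz", "his"), ("his", "Schwartz"),
         ("Globalstar", "the company")]
chains = build_chains(mentions, links)
```

Once chains exist, a chain-level check could inspect all members together, for example rejecting a link whose anaphor disagrees with any earlier mention in the chain rather than only with its immediate antecedent.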


Discussion – Conclusion

  • ML approaches to coreference resolution yield similar performance values

  • Some anaphors are indeed tricky (esp. crucial for precision errors)

  • But some errors can be eliminated within an ML framework

    • improving the training material

    • more elaborate integration of preprocessing modules

    • more global resolution strategies


Thank You!


Recall errors


Recall errors - MUC

Mainly incorrect bracketing

..said <COREF .. MIN=„vice president“>Jim Johannesen, <COREF .. MIN=„vice president“>vice president of site development for McDonald's</COREF></COREF>..

Only clear typos etc. are considered MUC errors


Recall errors – propagated P

The company also said the Marine Corps has begun testing two of [its radars] as part of a short-range ballistic missile defense program. That testing could lead to an order for the radars.

Crucial for pronouns and indicators for intrasentential coreference


Recall errors - matching

Mostly ORGANIZATIONs.

Problems:

  • Abbreviations

    Federal Communications Commission

    FCC

  • Hyphenated names

    Ziff-Davis Publishing

    Ziff

  • Foreign names

    Taiwan President Lee Teng-hui

    President Lee
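
The three problem types above can be illustrated with simple string heuristics. This is a hedged sketch with hypothetical function names, not the matching component of the system presented here; real data would need more care (lowercase function words in names, punctuation, transliteration variants).

```python
def acronym(name: str) -> str:
    """Initial letters of the capitalized tokens, e.g. a three-word name -> 3 letters."""
    return "".join(tok[0] for tok in name.split() if tok[0].isupper())


def is_abbreviation(short: str, full: str) -> bool:
    """True if `short` (dots ignored) equals the acronym of `full`."""
    return short.replace(".", "").upper() == acronym(full)


def hyphen_part_match(short: str, full: str) -> bool:
    """True if `short` is one hyphen-separated part of a token in `full`."""
    return any("-" in tok and short in tok.split("-") for tok in full.split())


def token_subsequence(short: str, full: str) -> bool:
    """True if the tokens of `short` occur in `full` in the same order."""
    remaining = iter(full.split())
    # `tok in remaining` consumes the iterator, which enforces the ordering.
    return all(tok in remaining for tok in short.split())


def alias_match(short: str, full: str) -> bool:
    return (is_abbreviation(short, full)
            or hyphen_part_match(short, full)
            or token_subsequence(short, full))
```

Rules this loose raise recall at a cost in precision: `token_subsequence` would also link „Loral Space“ to „Loral Space and Communications Corp“, a pairing whose correctness is hard to judge even for a human reader.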


Recall errors - syntax

Apposition, copula

Problems:

  • Parsing mistakes

  • Missing constructions

    ..the venture will become synonymous with JSkyB

  • P/R trade-off

    ..Kevlar, a synthetic fiber, and Nomex..

  • Quantitative constructions

    .. more than quadruple the three-month daily average of 88,700 shares


Precision errors


Precision errors - matching

Finer NE analysis could help, but mostly too difficult even for humans:

Loral

Loral Space and Communications Corp

Loral Space

Space Systems Loral


Anaphoricity

Some markables are not anaphors. We can tell that by looking at them, without any sophisticated coreference resolution.

Poesio & Vieira, Ng & Cardie – try to identify Discourse New entities automatically

Not used for this talk

