
A Novel Discourse Parser Based on Support Vector Machine Classification

Source: ACL 2009

Authors: David A. duVerle and Helmut Prendinger

Reporter: Yong-Xiang Chen



Research problem

  • Automated annotation of a text with hierarchically organized RST relations

    • To parse discourse

    • Within the framework of Rhetorical Structure Theory (RST)

    • Produce a tree-like structure

    • Based on SVM



Rhetorical Structure Theory (RST)

  • Mann and Thompson (1988)

  • A set of structural relations for composing units (‘spans’) of text

    • 110 distinct rhetorical relations

    • Relations can be of intentional, semantic, or textual nature

  • Two-step process (this study focuses on step 2)

    • Segmentation of the input text into elementary discourse units (‘edus’)

    • Generation of the rhetorical structure tree

      • the edus constituting its terminal nodes


Edus: nucleus and satellite

  • Edus:

    • Nucleus

      • relatively more important part of the text

    • Satellite

      • subordinate to the nucleus; represents supporting information

[Figure: example RST spans, with an out-going arrow from each satellite to its nucleus]



Research restriction

  • Input: a sequence of edus that have been segmented beforehand

  • Use the reduced set of 18 rhetorical relations

    • e.g.: PROBLEM-SOLUTION, QUESTION-ANSWER, STATEMENT-RESPONSE, TOPIC-COMMENT and COMMENT-TOPIC are all grouped under one TOPIC-COMMENT relation

  • Turned all n-ary rhetorical relations into nested binary relations

    • e.g.: LIST relation

  • Only adjacent spans of text can be put in relation within an RST tree (‘Principle of sequentiality’, Marcu, 2000)



18 rhetorical relations

  • Attribution, Background, Cause, Comparison, Condition, Contrast, Elaboration, Enablement, Evaluation, Explanation, Joint, Manner-Means, Topic-Comment, Summary, Temporal, Topic-Change, Textual-Organization, Same-Unit



Classifier

  • Input: two consecutive spans (atomic edus or RST sub-trees) from the input text

  • Score the likelihood of a direct structural relation as well as probabilities for

    • a relation’s label

    • nuclearity

  • Gold standard: human agreement levels (measured through cross-validation)



Two separate classifiers

  • Two separate classifiers are trained:

  • S: A binary classifier, for structure

    • existence of a connecting node between the two input sub-trees

  • L: A multi-class classifier, for rhetorical relation and nuclearity labeling (see the sketch below)
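A minimal sketch of this two-classifier setup, using scikit-learn SVMs on toy stand-in data (the library choice, feature dimensions, and variable names are assumptions, not the authors' code):

    import numpy as np
    from sklearn.svm import SVC

    # Toy stand-ins for real feature vectors of adjacent span pairs.
    rng = np.random.default_rng(0)
    X_pairs = rng.normal(size=(2000, 20))        # pair features (hypothetical)
    y_structure = rng.integers(0, 2, size=2000)  # 1 = a node connects the spans
    X_related = X_pairs[y_structure == 1]        # only connected pairs get labels
    y_label = rng.integers(0, 41, size=len(X_related))  # 41 relation classes

    # S: binary structure classifier; probability=True lets its scores
    # rank candidate merges during bottom-up tree building.
    s_clf = SVC(kernel="rbf", probability=True).fit(X_pairs, y_structure)

    # L: multi-class classifier for rhetorical relation and nuclearity.
    l_clf = SVC(kernel="rbf", probability=True).fit(X_related, y_label)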



Produce a valid tree

  • Using these classifiers and a straightforward bottom-up tree-building algorithm



Classes

  • The 18 super-relations, combined with nuclearity, yield 41 classes

  • Considering only valid nuclearity options

    • e.g., (ATTRIBUTION, N, S) and (ATTRIBUTION, S, N) are two classes of ATTRIBUTION

    • but not (ATTRIBUTION, N, N)



Reduce the multi-classification

  • Reduce the multi-classification problem through a set of binary classifiers, each trained either on a single class (“one vs. all”) or by pair (“one vs. one”); see the sketch below
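A sketch of both reductions with scikit-learn's built-in wrappers (toy data; LinearSVC is an assumed stand-in for the paper's SVM configuration):

    import numpy as np
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))     # hypothetical pair-feature vectors
    y = rng.integers(0, 41, size=500)  # 41 relation+nuclearity classes

    # "one vs. all": one binary SVM per class (41 classifiers)
    ova = OneVsRestClassifier(LinearSVC()).fit(X, y)

    # "one vs. one": one binary SVM per class pair (41 * 40 / 2 = 820)
    ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)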



Input data

  • Annotated documents taken from the RST-DT corpus

    • paired with lexicalized syntax trees (LS Trees) for each sentence

    • a separate test set is used for performance evaluation



Lexicalized syntax trees (LS Trees)

  • Taken directly from the Penn Treebank corpus then “lexicalized” using a set of canonical head-projection rules

    • tagged with lexical “heads” on each internal node of the syntactic tree (see the sketch below)
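A toy head-projection pass, to show how lexical heads percolate up a parse tree; the tree, the small rule table, and all names here are illustrative assumptions (real lexicalization uses full head-rule tables such as Magerman's or Collins's):

    # (label, children...) for internal nodes; (POS, word) for preterminals.
    TREE = ("S",
            ("NP", ("NNP", "John")),
            ("VP", ("VBZ", "says"),
                   ("SBAR", ("S", ("NP", ("PRP", "it")),
                                  ("VP", ("VBZ", "works"))))))

    # label -> (search direction, child-label priority list); assumed rules.
    HEAD_RULES = {"S": ("right", ["VP"]),
                  "VP": ("left", ["VBZ", "VBD", "VP"]),
                  "NP": ("right", ["NNP", "NN", "PRP"]),
                  "SBAR": ("right", ["S"])}

    def lexicalize(node):
        """Return the (head word, head POS) projected onto this node."""
        label, children = node[0], node[1:]
        if len(children) == 1 and isinstance(children[0], str):
            return (children[0], label)        # preterminal: the word itself
        heads = [lexicalize(child) for child in children]
        direction, priority = HEAD_RULES.get(label, ("left", []))
        indices = range(len(children))
        if direction == "right":
            indices = reversed(indices)
        head = next((heads[i] for i in indices if children[i][0] in priority),
                    heads[0])                  # fallback: leftmost child's head
        print(label, "-> head", head)
        return head

    lexicalize(TREE)  # the root S inherits ("says", "VBZ") from its VP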



Algorithm

  • Repeatedly apply the two classifiers, following a naive bottom-up tree-construction method

    • obtain a globally satisfying RST tree for the entire text

  • Starts with a list of all atomic discourse sub-trees

    • made of single edus in their text order

  • Recursively selects the best match between adjacent sub-trees

    • using binary classifier S

  • Labels the newly created sub-tree (using multi-label classifier L) and updates scoring for S, until only one sub-tree is left (see the sketch below)
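A sketch of that loop; score_s, label_l, and features are hypothetical stand-ins for the trained S classifier, the L classifier, and the pair-feature extractor:

    import random

    def build_rst_tree(edus, score_s, label_l, features):
        """Greedily merge adjacent sub-trees until one tree spans the text."""
        subtrees = [("LEAF", edu) for edu in edus]  # atomic edus in text order
        while len(subtrees) > 1:
            # score every adjacent pair with the binary structure classifier S
            scores = [score_s(features(subtrees[i], subtrees[i + 1]))
                      for i in range(len(subtrees) - 1)]
            best = max(range(len(scores)), key=scores.__getitem__)
            left, right = subtrees[best], subtrees[best + 1]
            # label the new internal node (relation + nuclearity) with L
            relation = label_l(features(left, right))
            subtrees[best:best + 2] = [(relation, left, right)]
            # pairs adjacent to the new node are re-scored on the next pass
        return subtrees[0]

    # toy run with random scores and a fixed label
    print(build_rst_tree(["edu1", "edu2", "edu3"],
                         score_s=lambda f: random.random(),
                         label_l=lambda f: "ELABORATION-NS",
                         features=lambda a, b: (a, b)))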



Features

  • ‘S[pan]’ are sub-tree-specific features

    • Symmetrically extracted from both left and right candidate spans

  • ‘F[ull]’ are a function of the two sub-trees considered as a pair



Textual Organization

  • S features:

    • Number of paragraph boundaries

    • Number of sentence boundaries

  • F features:

    • Belong to same sentence

    • Belong to same paragraph

  • Hypothesize a correlation between span length and rhetorical relation

    • e.g. the satellite in a CONTRAST relation will tend to be shorter than the nucleus

    • span size and positioning

      • using either tokens or edus as a distance unit

      • using relative values for positioning and distance (see the sketch below)
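A sketch of these features as a flat dictionary, assuming span records that carry token offsets and boundary counts (all field names are hypothetical):

    def textual_features(left, right, text_len):
        """Span ('S') and pair ('F') features from textual organization."""
        feats = {}
        for side, span in (("left", left), ("right", right)):
            feats[side + "_len_tokens"] = span["end_tok"] - span["start_tok"]
            feats[side + "_rel_pos"] = span["start_tok"] / text_len  # relative position
            feats[side + "_para_bounds"] = span["n_para_bounds"]     # paragraph boundaries
            feats[side + "_sent_bounds"] = span["n_sent_bounds"]     # sentence boundaries
        feats["same_sentence"] = left["sent_id"] == right["sent_id"]
        feats["same_paragraph"] = left["para_id"] == right["para_id"]
        return feats

    left = {"start_tok": 0, "end_tok": 7, "n_para_bounds": 0,
            "n_sent_bounds": 0, "sent_id": 0, "para_id": 0}
    right = {"start_tok": 7, "end_tok": 12, "n_para_bounds": 0,
             "n_sent_bounds": 0, "sent_id": 0, "para_id": 0}
    print(textual_features(left, right, text_len=40))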



Lexical Clues and Punctuation

  • Discourse markers are good indicators

  • Use an empirical n-gram dictionary (for n ∈ {1, 2, 3}) built from the training corpus and culled by frequency

    • Reason: an empirical dictionary also captures non-lexical signals such as punctuation

  • Counted and encoded n-gram occurrences while considering only the first and last n tokens of each span

    • Classifier accuracy improved by more than 5% (see the sketch below)
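A sketch of building and applying such a dictionary; min_count is an assumed culling threshold:

    from collections import Counter

    def build_ngram_dict(spans, n_max=3, min_count=5):
        """Collect prefix/suffix n-grams (n = 1..3) from training spans,
        keeping punctuation tokens, and cull rare entries by frequency."""
        counts = Counter()
        for tokens in spans:
            for n in range(1, n_max + 1):
                counts[tuple(tokens[:n])] += 1   # first n tokens of the span
                counts[tuple(tokens[-n:])] += 1  # last n tokens of the span
        return {g for g, c in counts.items() if c >= min_count}

    def encode_span(tokens, ngram_dict, n_max=3):
        """Binary indicators: which dictionary n-grams open or close the span."""
        prefix = {tuple(tokens[:n]) for n in range(1, n_max + 1)}
        suffix = {tuple(tokens[-n:]) for n in range(1, n_max + 1)}
        return [(g in prefix, g in suffix) for g in sorted(ngram_dict)]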



Simple Syntactic Clues

  • For better generalization

    • less dependency on lexical content

  • Add shallow syntactic clues by encoding part-of-speech (POS) tags for both prefix and suffix in each span

    • affix lengths above n = 3 did not seem to improve results (see the sketch below)
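A sketch of the encoding, taking a span's POS tag sequence as input (the tags would come from the syntax trees; the padding token is an assumption):

    def pos_affix_features(pos_tags, n=3):
        """POS tags of a span's first and last n tokens."""
        pad = ["<PAD>"] * n  # pad spans shorter than n tokens
        return {"pos_prefix": (pos_tags + pad)[:n],
                "pos_suffix": (pad + pos_tags)[-n:]}

    # toy tag sequence for a five-token span
    print(pos_affix_features(["IN", "DT", "NN", "VBD", "RB"]))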



Dominance Sets

  • Extract from the syntax parse trees

  • Example: it can be difficult to identify the scope of an ATTRIBUTION relation (illustrated in the paper's figure)



One dominance: Logical nesting order

  • Logical nesting order of the example's clauses: 1A > 1B > 1C

  • This order allows us to favor the relation between 1B and 1C over a relation between 1A and 1B



Dominance Sets

  • S features:

    • Distance to root of the syntax tree

    • Distance to common ancestor in the syntax tree

    • Dominating node’s lexical head in span

    • Relative position of lexical head in sentence

  • F features:

    • Common ancestor’s POS tag

    • Common ancestor’s lexical head

    • Dominating node's POS tag (diamonds in the paper's figure)

    • Dominated node's POS tag (circles in the figure)

    • Dominated node's sibling's POS tag (rectangles in the figure; see the sketch below)
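A sketch of the tree-path side of these features using NLTK treepositions; the sentence, the chosen leaf indices, and the feature names are illustrative assumptions (the head-based features would read lexical heads off the lexicalized trees, as in the earlier head-projection sketch):

    from nltk import Tree

    sent = Tree.fromstring(
        "(S (NP (NNP John)) (VP (VBZ says)"
        " (SBAR (S (NP (PRP it)) (VP (VBZ works))))))")

    p1 = sent.leaf_treeposition(1)  # path to "says" (end of one edu, say)
    p2 = sent.leaf_treeposition(2)  # path to "it" (start of the next edu)

    # common ancestor = longest shared prefix of the two root-to-leaf paths
    common = []
    for a, b in zip(p1, p2):
        if a != b:
            break
        common.append(a)
    ancestor = sent[tuple(common)]

    print({"left_depth": len(p1),       # distance to the syntax-tree root
           "right_depth": len(p2),
           "left_dist_to_ancestor": len(p1) - len(common),
           "right_dist_to_ancestor": len(p2) - len(common),
           "ancestor_tag": ancestor.label()})  # common ancestor's tag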



Rhetorical Sub-structure

  • Structural features for large spans (higher-level relations)

  • Encoding each span’s rhetorical sub-tree into the feature vector



Evaluation

  • Raw performance of SVM classifiers

  • Entire tree-building task

  • Binary classifier S

    • trained on 52,683 instances

      • Positive: 1/3, Negative: 2/3

    • tested on 8,558 instances

  • Classifier L

    • trained on 17,742 instances

      • labeled across 41 classes

    • tested on 2,887 instances




Baseline: Reitter's (2003) results

  • A smaller set of training instances

    • 7,976 vs. 17,742 in this work

  • Fewer classes

    • 16 rhetorical relation labels with no nuclearity, vs. 41 nuclearized relation classes here



Full System Performance

  • Comparing the structure and labeling of the produced RST tree to manual annotation

    • using both perfectly segmented input and SPADE segmenter output

    • blank tree structure (‘S’)

    • with nuclearity (‘N’)

    • with rhetorical relations (‘R’)

    • fully labeled structure (‘F’)



Comparison with other Algorithms



The End



Background

  • Coherence relations reflect the author's intent

    • Hierarchically structured set of coherence relations

  • Discourse

    • Focuses on a higher-level view of text than the sentence level


Aligning rhetorical trees and LS Trees

  • Due to small differences in the way they were tokenized and pre-processed, the rhetorical trees and LS Trees are rarely a perfect match: an optimal alignment is found by minimizing edit distances between word sequences (see the sketch below)
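A sketch of such an alignment with difflib, one stand-in for edit-distance minimization over word sequences (the token lists are a toy tokenization mismatch):

    import difflib

    rst_tokens = ["the", "U.S.", "economy", "grew"]      # RST-DT tokenization
    ptb_tokens = ["the", "U.", "S.", "economy", "grew"]  # Penn Treebank tokenization

    matcher = difflib.SequenceMatcher(a=rst_tokens, b=ptb_tokens, autojunk=False)
    pairs = [(block.a + k, block.b + k)
             for block in matcher.get_matching_blocks()
             for k in range(block.size)]
    print(pairs)  # [(0, 0), (2, 3), (3, 4)]: token indices matched across corpora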



Features (cont.)

  • Use n-fold validation on S and L classifiers to assess the impact of some sets of features on general performance and eliminate redundant features




Strong Compositionality Criterion

