
January 7, 2008

Dependency Parsing: Machine Learning Approaches

Yuji Matsumoto

Graduate School of Information Science

Nara Institute of Science and Technology

(NAIST, Japan)


Basic Language Analyses (POS-tagging, phrase chunking, parsing)

Raw sentence

He reckons the current account deficit will narrow to only 1.8 billion in September.

Part-of-speech tagging

POS-tagged sentence

He reckons the current account deficit will narrow to only 1.8 billion in September .

PRP VBZ DT JJ NN NN MD VB TO RB CD CD IN NNP .

Base phrase chunking

Base phrase-chunked sentence

[He]NP [reckons]VP [the current account deficit]NP [will narrow]VP [to]PP [only 1.8 billion]NP [in]PP [September]NP .

Dependency parsing

Dependency parsed sentence


Word Dependency Parsing (unlabeled)

Raw sentence

He reckons the current account deficit will narrow to only 1.8 billion in September.

Part-of-speech tagging

POS-tagged sentence

He reckons the current account deficit will narrow to only 1.8 billion in September.

PRP VBZ DT JJ NN NN MD VB TO RB CD CD IN NNP .

Word dependency parsed sentence

He reckons the current account deficit will narrow to only 1.8 billion in September . (the dependency arcs are shown in the original figure)


Word Dependency Parsing (labeled)

(the figure labels the dependency arcs with relations such as SUBJ, COMP, S-COMP, SPEC, MOD, and ROOT)

Raw sentence

He reckons the current account deficit will narrow to only 1.8 billion in September.

Part-of-speech tagging

POS-tagged sentence

He reckons the current account deficit will narrow to only 1.8 billion in September.

PRP VBZ DT JJ NN NN MD VB TO RB CD CD IN NNP .

Word dependency parsed sentence

He reckons the current account deficit will narrow to only 1.8 billion in September . (the labeled dependency arcs are shown in the original figure)


A phrase structure tree and a dependency tree

(figure: the same sentence, containing the word "ounces", shown as a phrase structure tree and as a dependency tree)


Flattened representation of a dependency tree

(figure: the same dependency tree drawn as arcs over the flat word sequence)


Dependency structure: terminology

(figure: "This is ..." with an arc labeled SUBJ; the two ends of the arc illustrate the terms below)

  • Child / Dependent / Modifier

  • Parent / Governor / Head

  • Label (e.g., SUBJ)

  • The direction of arrows may be drawn from head to child

  • When there is an arrow from w to v, we write w → v

  • When there is a path (a series of arrows) from w to v, we write w →* v


Definition of Dependency Trees

  • Single head: Except for the root (EOS), every word has exactly one parent

  • Connected: The structure must be a connected tree

  • Acyclic: If wi → wj, then wj →* wi never holds

  • Projective: If wi → wj, then for all k between i and j, either wk →* wi or wk →* wj holds (dependencies do not cross)

(a small check of these conditions is sketched below)
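To make the four conditions concrete, here is a small sketch (not from the slides) that checks them for a parse encoded as a head map, where the head of each word is its parent index and 0 stands for the root/EOS node; the projectivity test uses the equivalent non-crossing formulation mentioned in the last bullet.

def is_well_formed(heads):
    """heads: dict {word index (1..n): parent index}; 0 denotes the root/EOS node."""
    n = len(heads)
    # Single head: the dict already gives each word one parent; check validity.
    if any(h == i or not (0 <= h <= n) for i, h in heads.items()):
        return False
    # Connected and acyclic: following parents from any word must reach the root.
    for i in heads:
        seen, j = set(), i
        while j != 0:
            if j in seen:               # came back to a visited word: a cycle w ->* w
                return False
            seen.add(j)
            j = heads[j]
    return True

def is_projective(heads):
    """Projective iff no two dependency arcs (drawn over positions 0..n) cross."""
    arcs = [(min(i, h), max(i, h)) for i, h in heads.items()]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)

# Toy example: a small projective tree with word 2 as the root (1->2, 2->0, 3->4, 4->2).
heads = {1: 2, 2: 0, 3: 4, 4: 2}
print(is_well_formed(heads), is_projective(heads))    # True True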


Projective dependency tree

(figure: a projective dependency tree over a sentence containing the words "ounces", "light", and "was")

Projectiveness: all the words in between depend on either "was" or "." (e.g., light →* was)


Non-projective dependency tree

John saw a man yesterday who walked along the river
NNP VBD DT NN NN WP VBD IN DT NN
(root: the dummy root node)

Direction of edges: from a child to the parent


Non-projective dependency tree

(root: the dummy root node)

Direction of edges: from a parent to the children

*taken from: R. McDonald and F. Pereira, "Online Learning of Approximate Dependency Parsing Algorithms," European Chapter of the Association for Computational Linguistics, 2006.


Two Different Strategies for Structured Language Analysis

  • Sentences have structures

    • Linear sequences: POS tagging, Phrase/Named Entity chunking

    • Tree structure: Phrase structure trees, dependency trees

  • Two statistical approaches to structure analysis

    • Global optimization

      • E.g., Hidden Markov Models, Conditional Random Fields for sequential tagging problems

      • Probabilistic Context-free parsing

      • Maximum Spanning Tree Parsing (graph-based)

    • Repetition of local optimization

      • Chunking with Support Vector Machines

      • Deterministic parsing (transition-based)


Statistical dependency parsers

  • Eisner (COLING 96, Penn Technical Report 96)

  • Kudo & Matsumoto (VLC 00, CoNLL 02):

  • Yamada & Matsumoto (IWPT 03)

  • Nivre (IWPT 03, COLING 04, ACL 05)

  • Cheng, Asahara, Matsumoto (IJCNLP 04)

  • McDonald-Crammer-Pereira (ACL 05a, EMNLP 05b, EACL 06)

The slide groups these parsers into two families: global optimization and repetition of local optimization.


Dependency Parsing Used as the CoNLL Shared Task

  • CoNLL (Conference on Natural Language Learning)

  • Multi-lingual Dependency Parsing Track

    • 10 languages: Arabic, Basque, Catalan, Chinese, Czech, English, Greek, Hungarian, Italian, Turkish

  • Domain Adaptation Track

    • Dependency-annotated data in one domain and large unannotated data in other domains (biomedical/chemical abstracts, parent-child dialogue) are available

    • Objective: use the large-scale unannotated target-domain data so that a dependency parser learned in the original domain also works well in the new domain

Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D., "The CoNLL 2007 Shared Task on Dependency Parsing," Proceedings of EMNLP-CoNLL 2007, pp.915-932, June 2007.


Statistical dependency parsers (to be introduced in this lecture)

  • Kudo & Matsumoto (VLC 00, CoNLL 02): Japanese

  • Yamada & Matsumoto (IWPT 03)

  • Nivre (IWPT 03, COLING 04, ACL 05)

  • McDonald-Crammer-Pereira (EMNLP 05a, ACL 05b, EACL 06)

Most of them (except for [Nivre 05] and [McDonald 05a]) assume projective dependency parsing.


Japanese Syntactic Dependency Analysis

  • Analysis of relationship between phrasal units (“bunsetsu” segments)

  • Two Constraints:

    • Each segment modifies one of the segments to its right (Japanese is a head-final language)

    • Dependencies do not cross one another (projectiveness)


An Example of Japanese Syntactic Dependency Analysis

Raw text:

私は彼女と京都に行きます
(I go to Kyoto with her.)

Morphological analysis and bunsetsu chunking:

私は / 彼女と / 京都に / 行きます
(I / with her / to Kyoto / go)

Dependency Analysis:

私は / 彼女と / 京都に / 行きます
(the dependency arcs are shown in the original figure)


Model 1: Probabilistic Model

[Kudo & Matsumoto 00]

Input:

私は1 / 彼女と2 / 京都に3 / 行きます4
(I-top / with her / to Kyoto-loc / go)

1. Build a dependency matrix using ME (Maximum Entropy), DT (Decision Trees), or SVMs (how probable it is that one segment modifies another)

2. Search for the optimal dependencies that maximize the sentence probability, using CYK or chart parsing

Dependency matrix (rows: modifier, columns: modifiee):

              2      3      4
     1       0.1    0.2    0.7
     2              0.2    0.8
     3                     1.0

Output:

私は1 / 彼女と2 / 京都に3 / 行きます4
(highest-probability tree: segments 1, 2, and 3 all modify segment 4)

A brute-force version of the search step is sketched below.
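As an illustration of the two steps above, here is a minimal sketch (not Kudo & Matsumoto's implementation) that takes the dependency matrix from this slide and searches for the most probable head assignment by brute force, keeping the constraints that heads lie to the right and that dependencies do not cross; the real system uses CYK/chart search instead of enumeration.

from itertools import product

# probs[(modifier, modifiee)] = estimated probability, copied from the slide's matrix
probs = {(1, 2): 0.1, (1, 3): 0.2, (1, 4): 0.7,
         (2, 3): 0.2, (2, 4): 0.8,
         (3, 4): 1.0}

def is_projective(heads):
    # heads: dict modifier -> head (all heads to the right); arcs must not cross
    arcs = list(heads.items())
    return not any(i < j < hi < hj for i, hi in arcs for j, hj in arcs)

def best_tree(n):
    best, best_p = None, -1.0
    # every segment except the last chooses some head to its right
    for choice in product(*[range(m + 1, n + 1) for m in range(1, n)]):
        heads = dict(zip(range(1, n), choice))
        p = 1.0
        for m, h in heads.items():
            p *= probs.get((m, h), 0.0)
        if p > best_p and is_projective(heads):
            best, best_p = heads, p
    return best, best_p

print(best_tree(4))   # -> ({1: 4, 2: 4, 3: 4}, ~0.56)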


Problems of the Probabilistic Model (1)

  • Selection of training examples:

    All pairs of segments in a sentence

    • Depending pairs → positive examples

    • Non-depending pairs → negative examples

  • This produces a total of n(n-1)/2 training examples per sentence (n is the number of segments in a sentence)

  • In Model 1:

    • All positive and negative examples are used to learn an SVM

    • A test example is given to the SVM, and its distance from the separating hyperplane is transformed into a pseudo-probability using the sigmoid function, as sketched below
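A minimal sketch of the two mechanisms just described: enumerating the n(n-1)/2 segment pairs of a sentence as positive/negative examples, and squashing an SVM margin into a pseudo-probability with a sigmoid. The function names and the scaling constant beta are illustrative, not taken from the original system.

import math

def make_pairs(n_segments, gold_heads):
    """All n(n-1)/2 pairs (i, j), i < j, labeled +1 if segment i depends on segment j."""
    pairs = []
    for i in range(1, n_segments):
        for j in range(i + 1, n_segments + 1):
            label = +1 if gold_heads[i] == j else -1
            pairs.append(((i, j), label))
    return pairs

def pseudo_probability(svm_margin, beta=1.0):
    """Sigmoid of the signed distance from the separating hyperplane."""
    return 1.0 / (1.0 + math.exp(-beta * svm_margin))

# 私は1 / 彼女と2 / 京都に3 / 行きます4, gold heads: 1->4, 2->4, 3->4
print(make_pairs(4, {1: 4, 2: 4, 3: 4}))   # six pairs, three of them labeled +1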


Problems of the Probabilistic Model

  • The number of training examples is large

  • O(n³) time is necessary for complete parsing

  • The classification cost of SVMs is much higher than that of other ML algorithms such as Maximum Entropy models and Decision Trees


Model 2: Cascaded Chunking Model

[Kudo & Matsumoto 02]

  • Parse a sentence deterministically, deciding only whether the current segment modifies the segment immediately to its right

  • Training examples are extracted using the same parsing algorithm


Example: Training Phase

Annotated sentence:

彼は1  彼女の2  温かい3  真心に4  感動した。5
(He / her / warm / heart / was moved: "He was moved by her warm heart.")

The annotated sentence is parsed by the same cascaded chunking procedure; at each pass, every segment still under consideration is tagged D (it modifies the segment immediately to its right) or O (it does not):

彼は1  彼女の2  温かい3  真心に4  感動した。5      tags: O  O  D  D
彼は1  彼女の2  真心に4  感動した。5                tags: O  D  D
彼は1  真心に4  感動した。5                         tags: O  D
彼は1  感動した。5                                  tag:  D

Pairs of tag (D or O) and context (features) are stored as training data for the SVMs; after accumulation, the SVMs are learned from these training data.


Example: Test Phase

Test sentence:

彼は1  彼女の2  温かい3  真心に4  感動した。5
(He / her / warm / heart / was moved: "He was moved by her warm heart.")

The same passes are applied to the test sentence; at each step the tag (D or O) is decided by the SVMs built in the training phase:

彼は1  彼女の2  温かい3  真心に4  感動した。5      tags: O  O  D  D
彼は1  彼女の2  真心に4  感動した。5                tags: O  D  D
彼は1  真心に4  感動した。5                         tags: O  D
彼は1  感動した。5                                  tag:  D

A rough sketch of this parsing loop is given below.
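Below is a rough sketch, not the authors' code, of the cascaded chunking loop in the test phase. classify stands in for the trained SVMs, and the rule used here for when a D-tagged segment is removed (only when the segment to its left is not also tagged D in the same pass) is an assumption chosen so that the passes reproduce the example above; it is not necessarily the exact control strategy of Kudo & Matsumoto (2002).

def cascaded_chunking_parse(segments, classify):
    """segments: list of bunsetsu; returns {modifier index: head index}."""
    alive = list(range(len(segments)))           # segments not yet attached
    heads = {}
    while len(alive) > 1:
        tags = [classify(segments, alive[p], alive[p + 1])
                for p in range(len(alive) - 1)]
        removable = [p for p, t in enumerate(tags)
                     if t == 'D' and (p == 0 or tags[p - 1] != 'D')]
        if not removable:                        # nothing decided in this pass: stop
            break
        for p in reversed(removable):            # delete right-to-left so indices stay valid
            heads[alive[p]] = alive[p + 1]
            del alive[p]
    return heads

# Toy classifier encoding the gold tree of the example above
# (彼は->感動した。, 彼女の->真心に, 温かい->真心に, 真心に->感動した。): purely illustrative.
def toy_classify(segments, i, j):
    gold = {0: 4, 1: 3, 2: 3, 3: 4}
    return 'D' if gold.get(i) == j else 'O'

segs = ['彼は', '彼女の', '温かい', '真心に', '感動した。']
print(cascaded_chunking_parse(segs, toy_classify))   # -> {2: 3, 1: 3, 3: 4, 0: 4}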


Advantages of the Cascaded Chunking Model

  • Efficiency

    • O(n³) (probabilistic model) vs. O(n²) (cascaded chunking model)

    • In practice lower than O(n²), since most segments modify the segment immediately to their right

    • The size of training examples is much smaller

  • Independence from ML methods

    • Can be combined with any ML algorithm which works as a binary classifier

    • Probabilities of dependency are not necessary


Features used in implementation

Modify or not?

彼の1 友人は2  この本を3  持っている4  女性を5  探している6
(his / friend-top / this book-acc / have / lady-acc / be looking for)
"His friend is looking for a lady who has this book."
(in the figure, the pair being tested is marked as modifier and head, and A, B, C label other segments referred to by the dynamic features below)

  • Static Features

    • modifier/modifiee: head/functional word: surface form, POS, POS subcategory, inflection type, inflection form, brackets, quotations, punctuation, ...

    • Between segments: distance, case particles, brackets, quotations, punctuation

  • Dynamic Features

    • A, B: static features of the functional word

    • C: static features of the head word


Settings of Experiments

  • Kyoto University Corpus 2.0/3.0

    • Standard Data Set

      • Training: 7,958 sentences / Test: 1,246 sentences

      • Same data as [Uchimoto et al. 98], [Kudo, Matsumoto 00]

    • Large Data Set

      • 2-fold Cross-Validation using all 38,383 sentences

  • Kernel function: 3rd-degree polynomial

  • Evaluation method

    • Dependency accuracy

    • Sentence accuracy


Results

                            | Standard (8,000 sentences)        | Large (20,000 sentences)
                            | Cascaded Chunking | Probabilistic | Cascaded Chunking | Probabilistic
Dependency Acc. (%)         | 89.29             | 89.09         | 90.45             | N/A
Sentence Acc. (%)           | 47.53             | 46.17         | 53.16             | N/A
# of training sentences     | 7,956             | 7,956         | 19,191            | 19,191
# of training examples      | 110,355           | 459,105       | 251,254           | 1,074,316
Training time (hours)       | 8                 | 336           | 48                | N/A
Parsing time (sec./sent.)   | 0.5               | 2.1           | 0.7               | N/A



Smoothing Effect (in the cascaded chunking model)

  • No need to cut off low-frequency words


Combination of features

  • Polynomial kernels are used for taking combinations of features into account (tested with a small corpus of 2,000 sentences)


Deterministic Dependency Parser Based on SVM

[Yamada & Matsumoto 03]

  • Three possible actions:

    • Right: for the two adjacent words under consideration, modification goes from the left word to the right word (the left word becomes a child of the right word)

    • Left: modification goes from the right word to the left word

    • Shift: no action is taken for the pair, and the focus moves to the right

      • There are two possibilities in this situation:

        • there is really no modification relation between the pair, or

        • there is a modification relation between them, but it has to wait until the surrounding analysis has been finished

      • The second situation can be treated as a separate class (called Wait)

  • This process is applied to the input sentence from beginning to end and repeated until a single word remains (a sketch of the loop is given below)
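A rough sketch of this deterministic loop, with decide_action standing in for the SVM action classifier (the function name and its interface are placeholders). It scans adjacent pairs left to right, applies Right/Left/Shift as described above, and repeats passes until one word remains or no more attachments are made.

def deterministic_parse(words, decide_action):
    """Returns a list of (child, head) dependency arcs."""
    nodes = list(words)                  # words that do not yet have a head
    arcs = []
    while len(nodes) > 1:
        i = 0
        changed = False
        while i < len(nodes) - 1:
            action = decide_action(nodes, i)      # looks at nodes[i], nodes[i+1] and context
            if action == 'Right':                 # left word modifies the right word
                arcs.append((nodes[i], nodes[i + 1]))
                del nodes[i]
                changed = True
            elif action == 'Left':                # right word modifies the left word
                arcs.append((nodes[i + 1], nodes[i]))
                del nodes[i + 1]
                changed = True
            else:                                 # Shift: move the focus to the right
                i += 1
        if not changed:                           # all-Shift pass: stop to avoid looping forever
            break
    return arcs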





The features used in learning

An SVM is used for classification, either as a 3-class model (Right, Left, Shift) or as a 4-class model (Right, Left, Shift, Wait).


SVM Learning of Actions

  • The best action for each configuration is learned by SVMs

  • Since the problem is a 3-class or 4-class classification problem, either the pairwise or the one-vs-rest method is employed (see the sketch below):

    • Pairwise method: for each pair of classes, learn an SVM; the best class is decided by voting over all the SVMs

    • One-vs-rest method: for each class, an SVM is learned to discriminate that class from the others; the best class is decided by the SVM that gives the highest value
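A minimal sketch of the two multi-class strategies, assuming pre-trained binary scorers are available: pair_svms[(a, b)](x) returns a positive score when class a is preferred over b, and ovr_svms[c](x) returns the margin of the "c vs. rest" SVM. Both names are placeholders, not a real library API.

from collections import Counter
from itertools import combinations

ACTIONS = ['Right', 'Left', 'Shift', 'Wait']     # the 4-class setting

def predict_pairwise(x, pair_svms):
    votes = Counter()
    for a, b in combinations(ACTIONS, 2):        # one binary SVM per pair of classes
        winner = a if pair_svms[(a, b)](x) > 0 else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]            # class with the most votes

def predict_one_vs_rest(x, ovr_svms):
    return max(ACTIONS, key=lambda c: ovr_svms[c](x))   # largest margin wins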


An Example of Deterministic Dependency Parsing (Yamada & Matsumoto Algorithm)

(slides 37-48 step through the sentence "the boy hits the dog with a rod", highlighting the word pair being considered and the referred context at each configuration; only the sequence of chosen actions is recoverable from this transcript)

Actions in order: right, right, shift, right, shift, shift, right, left, shift, left, left; end of parsing, with "hits" left as the only remaining (root) word.


The Accuracy of Parsing

Accuracies for:

Dependency relation

Root identification

Complete analysis

  • Learned with 30000 English sentences

  • no children: no child info is considered

  • word, POS: only word/POS info is used

  • all: all information is used


Deterministic linear-time dependency parser based on shift-reduce parsing

[Nivre 03,04]

  • There is a stack S and an input queue Q

    • Initialization: S[w1]  [w2, w3, …, wn]Q

    • Termination: S[…]  []Q

    • Parsing actions:

      • Shift: S[…]  [wi, …]Q → S[…, wi]  […]Q

      • Left-Arc: S[…, wi]  [wj, …]Q → S[…]  [wj, …]Q

      • Right-Arc: S[…, wi]  [wj, …]Q → S[…, wi, wj]  […]Q

      • Reduce: S[…, wi, wj]  […]Q → S[…, wi]  […]Q

    • (Left-Arc and Right-Arc add a dependency arc between wi and wj)

Though the original parser uses memory-based learning, recent implementations use SVMs to select actions.


An Example of Deterministic Dependency Parsing (Nivre's Algorithm)

(slides 51-63 step through the same sentence, "the boy hits the dog with a rod", showing the stack S and the queue Q at each configuration; only the sequence of chosen actions is recoverable from this transcript)

Actions in order: left, shift, left, shift, shift, left, right, reduce, right, shift, left, right; terminate.

A small simulation of these transitions is sketched below.
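The following is a rough sketch, not Nivre's implementation: it replays the four transitions defined above on the action sequence just listed, with the hard-coded sequence standing in for the SVM/memory-based action selector. The arc directions (Left-Arc: stack top depends on the queue front; Right-Arc: queue front depends on the stack top) follow the usual arc-eager reading of the transition definitions and are an assumption here; arcs are returned as (child, head) pairs.

def nivre_parse(words, actions):
    stack, queue, arcs = [words[0]], list(words[1:]), []
    for act in actions:
        if act == 'shift':                       # push the next queue word onto the stack
            stack.append(queue.pop(0))
        elif act == 'left':                      # Left-Arc: stack top depends on queue front
            arcs.append((stack.pop(), queue[0]))
        elif act == 'right':                     # Right-Arc: queue front depends on stack top
            arcs.append((queue[0], stack[-1]))
            stack.append(queue.pop(0))
        elif act == 'reduce':                    # pop a word that already has its head
            stack.pop()
    return arcs

words = ['the', 'boy', 'hits', 'the', 'dog', 'with', 'a', 'rod']
actions = ['left', 'shift', 'left', 'shift', 'shift', 'left', 'right',
           'reduce', 'right', 'shift', 'left', 'right']   # sequence from the example slides
print(nivre_parse(words, actions))
# [('the', 'boy'), ('boy', 'hits'), ('the', 'dog'), ('dog', 'hits'),
#  ('with', 'hits'), ('a', 'rod'), ('rod', 'with')]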


Computational Costs

  • Cascaded Chunking (Kudo & Matsumoto Model 2, Yamada & Matsumoto)

    • O(n²)

  • Deterministic Shift-Reduce Parsing (Nivre)

    • O(n)

  • CKY-based parsing (Eisner, Kudo & Matsumoto Model 1)

    • O(n³)

  • Maximum Spanning Tree Parsing (McDonald)

    • O(n²)


McDonald's MST Parsing

  • R. McDonald, F. Pereira, K. Ribarov, and J. Hajič, "Non-projective dependency parsing using spanning tree algorithms," in Proceedings of the Joint Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005.

MST: Maximum Spanning Tree


Definition of Non-projective Dependency Trees

  • Single head: Except for the root, all words have a single parent

  • Connected: It should be a connected tree

  • Acyclic: If wi → wj, then wj →* wi never holds

*taken from: R. McDonald and F. Pereira, "Online Learning of Approximate Dependency Parsing Algorithms," European Chapter of the Association for Computational Linguistics, 2006.


Notation and definition

  • T = {(xt, yt)}: training data

  • xt: the t-th sentence in T

  • yt: the dependency structure of the t-th sentence (defined by a set of edges)

  • f(i, j): the vector of feature functions for an edge (i, j)

  • dt(x): the set of all dependency trees producible from x


The Basic Idea

The score of a dependency tree is defined as the sum of the scores of its edges, each of which is a weighted sum of feature functions:

  s(x, y) = Σ_{(i,j) ∈ y} s(i, j) = Σ_{(i,j) ∈ y} w · f(i, j)

(figure: the dependency tree of "I saw a girl", used to illustrate the edge scores)

The target optimization problem: find w so that, for every training sentence, the correct tree scores higher than any other tree by at least the loss,

  s(xt, yt) − s(xt, y′) ≥ L(yt, y′)  for all y′ ∈ dt(xt),

where L(y, y′) is defined as the number of words in y′ that have a different parent than in y.


Explanation of formulas

  • s(i, j) = w · f(i, j): the score of the edge connecting nodes i and j

  • f(i, j): feature vector representing information about words i and j and the relation between them

  • w: weight vector over the features, learned from the training examples so as to maximize the objective above


Online learning algorithm (MIRA)

  w(0) = 0;  v = 0;  i = 0
  for n = 1 .. N
      for t = 1 .. T
          w(i+1) = update w(i) with (xt, yt)
          v = v + w(i+1)
          i = i + 1
  w = v / (N·T)

N: predefined number of iterations
T: number of training examples

(a schematic version of this averaged online loop is sketched below)
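A schematic version of the averaged online loop above; the inner update is left abstract (update_weights), since MIRA solves a small quadratic program there, but any perceptron-style update has the same interface. All names are illustrative, not the authors' code.

import numpy as np

def averaged_online_training(examples, n_features, update_weights, n_iters):
    """examples: list of (x, y) pairs; returns the averaged weight vector."""
    w = np.zeros(n_features)
    v = np.zeros(n_features)              # running sum of weight vectors
    for _ in range(n_iters):               # n = 1 .. N
        for x, y in examples:              # t = 1 .. T
            w = update_weights(w, x, y)    # MIRA update (or any online update)
            v += w                         # accumulate for averaging
    return v / (n_iters * len(examples))   # w = v / (N*T)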


Update of the weight vector

  w(i+1) = argmin_w ||w − w(i)||
           s.t. s(xt, yt) − s(xt, y′) ≥ L(yt, y′)  for all y′ ∈ dt(xt)

Since this formulation requires all the possible trees, it is modified so that only the k-best trees are taken into consideration (k = 5 to 20).


Features used in training

  • Unigram features: word/POS tag of the parent, word/POS tag of the child (when a word is longer than 5 letters, only its 5-letter prefix is used; this applies to all the other features as well)

  • Bigram features: pair of word/POS tags of the parent and the child

  • Trigram features: POS tags of the parent and the child, plus the POS tag of a word appearing in between

  • Context features: POS tags of the parent and the child, plus the two POS tags before and after them (backed off to trigrams)

(an illustrative feature-extraction sketch is given below)
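An illustrative sketch, not McDonald et al.'s exact feature templates, of extracting edge features for a parent/child pair (i, j), including the 5-letter prefix truncation mentioned above. words and tags are parallel lists.

def edge_features(words, tags, i, j):
    def pref(w):                            # 5-letter prefix for long words
        return w if len(w) <= 5 else w[:5]
    feats = []
    # unigram features: parent word/POS and child word/POS
    feats += [f"p_word={pref(words[i])}", f"p_pos={tags[i]}",
              f"c_word={pref(words[j])}", f"c_pos={tags[j]}"]
    # bigram feature: parent and child together
    feats.append(f"pair={pref(words[i])}/{tags[i]}+{pref(words[j])}/{tags[j]}")
    # trigram features: one feature per POS tag strictly in between
    lo, hi = sorted((i, j))
    for k in range(lo + 1, hi):
        feats.append(f"between={tags[i]}+{tags[k]}+{tags[j]}")
    # context features: POS tags surrounding the parent and the child
    get = lambda k: tags[k] if 0 <= k < len(tags) else "<PAD>"
    feats.append(f"ctx={get(i-1)}_{tags[i]}_{get(i+1)}+{get(j-1)}_{tags[j]}_{get(j+1)}")
    return feats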


Settings of Experiments

  • Convert the Penn Treebank into dependency trees based on Yamada's head rules; training data (chapters 02-21), development data (22), test data (23)

  • POS tagging is done by Ratnaparkhi's MaxEnt tagger

  • K-best trees are constructed based on Eisner's parsing algorithm


Dependency Parsing by Maximum Spanning Tree Algorithm

  • Learn the optimal w from the training data

  • For a given sentence:

    • Calculate the scores of all word pairs in the sentence

  • Get the spanning tree with the largest score

    • There is an algorithm (the Chu-Liu-Edmonds algorithm) that requires just O(n²) time to obtain the maximum spanning tree (n: the number of nodes, i.e., words)


The Image of Getting the Maximum Spanning Tree

"John saw Mary"

Scores are assigned to all pairs of words (root is the dummy root node), and the maximum spanning tree over these scores is extracted (a small sketch follows).
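One way to carry out the two steps above, assuming the networkx library is available: build a directed graph whose edge weights are the scores s(i, j) and extract the maximum spanning arborescence (Chu-Liu-Edmonds). The scores below are made up for the "John saw Mary" example; the slide's actual numbers are not in the transcript.

import networkx as nx

words = ['John', 'saw', 'Mary']
scores = {('root', 'saw'): 10, ('root', 'John'): 9, ('root', 'Mary'): 9,
          ('saw', 'John'): 30, ('saw', 'Mary'): 30,
          ('John', 'saw'): 20, ('John', 'Mary'): 3,
          ('Mary', 'saw'): 0,  ('Mary', 'John'): 11}

G = nx.DiGraph()
for (head, child), s in scores.items():
    G.add_edge(head, child, weight=s)            # edge direction: head -> child

tree = nx.maximum_spanning_arborescence(G, attr='weight')
print(sorted(tree.edges()))                      # [('root', 'saw'), ('saw', 'John'), ('saw', 'Mary')]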


Results of experiments

accuracy: proportion of correct edges

root: accuracy of root words

comp.: proportion of sentences completely analyzed


Available Dependency Parsers

  • Maximum Spanning Tree Parser (McDonald)

    • http://ryanmcd.googlepages.com/MSTParser.html

  • Data-driven Dependency Parser, MaltParser (Nivre)

    • http://w3.msi.vxu.se/~nivre/research/MaltParser.html

  • CaboCha: SVM-based Japanese Dependency Parser (Kudo)

    • http://chasen.org/~taku/software/cabocha/


References

Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto, "Deterministic dependency analyzer for Chinese," IJCNLP-04: The First International Joint Conference on Natural Language Processing, pp.135-140, 2004.

Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto, "Chinese Deterministic Dependency Analyzer: Examining Effects of Global Features and Root Node Finder," Proc. Fourth SIGHAN Workshop on Chinese Language Processing, pp.17-24, October 2005.

Jason M. Eisner, "Three New Probabilistic Models for Dependency Parsing: An Exploration," COLING-96: The 16th International Conference on Computational Linguistics, pp.340-345, 1996.

Taku Kudo and Yuji Matsumoto, "Japanese Dependency Structure Analysis Based on Support Vector Machines," Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp.18-25, October 2000.

Taku Kudo, Yuji Matsumoto, "Japanese dependency analysis using cascaded chunking," Proc. 6th Conference on Natural Language Learning (CoNLL-02), pp.63-69, 2002.

Ryan McDonald, Koby Crammer, Fernando Pereira, "Online Large-Margin Training of Dependency Parsers," Proc. Annual Meeting of the Association for Computational Linguistics, pp.91-98, 2005.

Ryan McDonald, Fernando Pereira, Jan Hajic, "Non-Projective Dependency Parsing using Spanning Tree Algorithms," HLT-EMNLP, 2005.

Ryan McDonald, Fernando Pereira, "Online Learning of Approximate Dependency Parsing Algorithms," Proc. European Chapter of Association for Computational Linguistics, 2006.

Joakim Nivre, "An Efficient Algorithm for Projective Dependency Parsing," Proc. 8th International Workshop on Parsing Technologies (IWPT), pp.149-160, 2003.

Joakim Nivre, Mario Scholz, "Deterministic Dependency Parsing of English Text," COLING 2004: 20th International Conference on Computational Linguistics, pp.64-70, 2004.

Joakim Nivre, Jens Nilsson, "Pseudo-Projective Dependency Parsing," ACL-05: 43rd Annual Meeting of the Association for Computational Linguistics, pp.99-106, 2005.

Joakim Nivre: Inductive Dependency Parsing, Text, Speech & Language Technology Vol.34, Springer, 2006.

Hiroyasu Yamada and Yuji Matsumoto, "Statistical dependency analysis with Support Vector Machines," Proc. 8th International Workshop on Parsing Technologies (IWPT), pp.195-206, 2003.