
Data-Driven Dependency Parsing

Kenji Sagae, CSCI-544



Background: Natural Language Parsing

  • Syntactic analysis

    • String to (tree) structure

[Figure: Input string "He likes fish" → PARSER → Output tree: S dominates NP (Prn "He") and VP; the VP dominates V ("likes") and NP (N "fish").]





  • Useful in Natural Language Understanding

    • NL interfaces, conversational agents

  • Language technology applications

    • Machine translation, question answering, information extraction

  • Scientific study of language

    • Syntax

    • Language processing models



GRAMMAR:

S → NP VP
NP → N
NP → NP PP
VP → V NP
VP → V NP PP
VP → VP PP

Not enough coverage, too much ambiguity.

[Figure: the parser applies the hand-written GRAMMAR to "He likes fish" to produce a tree.]


Charniak (1996); Collins (1996); Charniak (1997): learn from a TREEBANK instead of relying only on a hand-written grammar.

GRAMMAR:

S → NP VP
NP → N
NP → NP PP
VP → V NP
VP → V NP PP
VP → VP PP

[Figure: the parser now draws on a TREEBANK of parsed sentences (e.g. "He likes fish", "The boy …", "Dogs run fast") alongside the grammar.]




Phrase Structure Tree (Constituent Structure)

[Figure: constituent tree for "The boy ate the cheese sandwich": S dominates NP ("The boy": Det N) and VP; the VP dominates V ("ate") and NP ("the cheese sandwich": Det N N).]

Dependency Structure

[Figure: the same sentence, "The boy ate the cheese sandwich", drawn as a dependency structure linking the words directly.]


[Figure: converting the constituent tree to dependencies: "ate" is the lexical head of both S and the VP, "boy" heads the subject NP, "sandwich" heads the object NP; replacing each phrase node with its head word yields the dependency structure.]


Each dependency link connects a HEAD to a DEPENDENT and carries a LABEL.

[Figure: labeled dependency tree for "The boy ate the cheese sandwich": "ate" is the root, with "boy" as its SUBJ and "sandwich" as its OBJ; "The" is the DET of "boy", "the" is the DET of "sandwich", and "cheese" is a MOD of "sandwich".]



Background: Linear Classification with the Perceptron

  • Classification: given an input x predict output y

    • Example: x is a document, y ∈ {Sports, Politics, Science}

  • x is represented as a feature vector f(x)

    • Example:

      x: "Wednesday night, when the Lakers play the Mavericks at American Airlines Center, they get to see first hand …"
      f(x): # games: 5, # Lakers: 4, # said: 3, # rebounds: 3, # democrat: 0, # republican: 0, # science: 0
      y: Sports

  • To score a class, just add up the feature weights given in a vector w



Multiclass Perceptron

  • Learn one vector of feature weights wc for each class c

    wc = 0 for each class c
    For N iterations:
        For each training example (xi, yi):
            zi = argmaxz wz · f(xi)
            if zi ≠ yi:
                wzi = wzi − f(xi)
                wyi = wyi + f(xi)

  • Try to classify each example. If a mistake is made, update the weights.
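As a sketch, the pseudocode above translates directly into Python; the toy documents, feature counts, and class list below are made up for illustration:

```python
from collections import defaultdict

def dot(w, f_x):
    """Score = sum of the weights of the features present in f_x."""
    return sum(w[k] * v for k, v in f_x.items())

def train_perceptron(examples, classes, n_iter=10):
    """Multiclass perceptron: one weight vector per class; on a mistake,
    subtract the features from the wrong class and add them to the right one."""
    w = {c: defaultdict(float) for c in classes}
    for _ in range(n_iter):
        for f_x, y in examples:
            z = max(classes, key=lambda c: dot(w[c], f_x))   # argmax_z wz . f(x)
            if z != y:
                for k, v in f_x.items():
                    w[z][k] -= v
                    w[y][k] += v
    return w

# Toy bag-of-words documents (made up for illustration).
examples = [
    ({"Lakers": 4, "games": 5}, "Sports"),
    ({"democrat": 2, "republican": 3}, "Politics"),
    ({"science": 3}, "Science"),
]
w = train_perceptron(examples, ["Politics", "Science", "Sports"])

def predict(f_x):
    return max(w, key=lambda c: dot(w[c], f_x))

print(predict({"Lakers": 2}))   # → Sports
```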


Shift reduce dependency parsing

Shift-Reduce Dependency Parsing

  • Two main data structures

    • Stack S (initially empty)

    • Queue Q (initialized to contain each word in the input sentence)

  • Two types of actions

    • Shift: removes a word from Q, pushes onto S

    • Reduce: pops two items from S, pushes a new item onto S

      • New item is a tree that contains the two popped items

  • This can be applied to either dependencies (Nivre, 2004) or constituents (Sagae & Lavie, 2005)
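A minimal sketch of the two actions on a stack and queue, assuming a simple Node class for partial trees and (following the later example in these slides) the convention that REDUCE-RIGHT/LEFT names which of the two popped items becomes the head:

```python
from collections import deque

class Node:
    """A word plus its collected dependents as (label, Node) pairs."""
    def __init__(self, word):
        self.word, self.deps = word, []

def shift(stack, queue):
    """SHIFT: remove the next word from Q, push it onto S."""
    stack.append(queue.popleft())

def reduce_(stack, direction, label):
    """REDUCE: pop two items, attach one to the other, push the head."""
    right, left = stack.pop(), stack.pop()
    head, dep = (right, left) if direction == "RIGHT" else (left, right)
    head.deps.append((label, dep))
    stack.append(head)

# Parse "He likes fish" with the action sequence from the example:
stack, queue = [], deque(Node(w) for w in ["He", "likes", "fish"])
shift(stack, queue)                 # push He
shift(stack, queue)                 # push likes
reduce_(stack, "RIGHT", "SUBJ")     # He becomes SUBJ of likes
shift(stack, queue)                 # push fish
reduce_(stack, "LEFT", "OBJ")       # fish becomes OBJ of likes
root = stack[0]
print(root.word, [(l, d.word) for l, d in root.deps])
# → likes [('SUBJ', 'He'), ('OBJ', 'fish')]
```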


Shift

A SHIFT action removes the next token from the input list and pushes this new item onto the stack.

[Figure: before SHIFT, the stack holds a partial tree for "Under a proposal…" (with a PMOD arc) and the next input token is "to" (followed by "expand", "IRAs", …); after SHIFT, "to" sits on top of the stack.]


Reduce

A REDUCE action pops two items from the stack and pushes a new item that combines them.

[Figure: REDUCE-RIGHT-VMOD pops "to" and "expand" and pushes the combined item "to expand" with a VMOD arc; lower on the stack is "Under a proposal…" (PMOD), and the input continues with "$2000 …".]


[Figure: parsing "He likes fish" step by step. Parser actions: SHIFT ("He"), SHIFT ("likes"), REDUCE-RIGHT-SUBJ, SHIFT ("fish"), REDUCE-LEFT-OBJ. Words move from the QUEUE to the STACK, and the final STACK holds a single tree: "likes" with SUBJ "He" and OBJ "fish".]



Choosing Parser Actions

  • No grammar, no action table

  • Learn to associate stack/queue configurations with appropriate parser actions

  • Classifier

    • Treated as a black-box

    • Perceptron, SVM, maximum entropy, memory-based learning, etc.

    • Features: top two items on the stack, next input token, context, lookahead, …

    • Classes: parser actions




Features:

stack(0) = likes        stack(0).POS = VBZ
stack(1) = He           stack(1).POS = PRP
stack(2) = 0            stack(2).POS = 0
queue(0) = fish         queue(0).POS = NN
queue(1) = 0            queue(1).POS = 0
queue(2) = 0            queue(2).POS = 0

Class: Reduce-Right-SUBJ

[Figure: the STACK holds "He" and "likes" and the QUEUE holds "fish"; the classifier maps these features to the action Reduce-Right-SUBJ, which joins "He" and "likes" with a SUBJ arc.]
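The feature table above can be computed in a few lines; the (word, POS) tuple representation and the "0" padding value follow the slides, while the function name is our own:

```python
def extract_features(stack, queue, n=3):
    """Look at the top n stack items and first n queue items; tokens are
    (word, POS) pairs and "0" marks positions that don't exist."""
    feats = {}
    for i in range(n):
        w, p = stack[-1 - i] if i < len(stack) else ("0", "0")
        feats[f"stack({i})"] = w
        feats[f"stack({i}).POS"] = p
        w, p = queue[i] if i < len(queue) else ("0", "0")
        feats[f"queue({i})"] = w
        feats[f"queue({i}).POS"] = p
    return feats

stack = [("He", "PRP"), ("likes", "VBZ")]   # top of stack is the last element
queue = [("fish", "NN")]
print(extract_features(stack, queue))
```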





Accurate Parsing with Greedy Search

  • Experiments:

    • WSJ Penn Treebank

      • 1M words of WSJ text

      • Accuracy: ~90% (unlabeled dependency links)

    • Other languages (CoNLL 2006, 2007 shared tasks)

      • Arabic, Basque, Chinese, Czech, Japanese, Greek, Hungarian, Turkish, …

      • about 75% to 92%

  • Good accuracy, fast (linear time), easy to implement!



Maximum Spanning Tree Parsing (McDonald et al., 2005)

  • Dependency tree is a graph (obviously)

    • Words are vertices, dependency links are edges

  • Imagine instead a fully connected weighted graph

    • Each weight is the score for the dependency link

    • Each score is independent of the other dependencies

      • Edge-factored model

  • Find the Maximum Spanning Tree

    • Score for the tree is the sum of the scores of its individual dependencies

  • How are edge weights determined?
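The idea can be sketched by brute force on a toy sentence; the edge scores below are made up, and a real parser derives them from learned feature weights (and uses the Chu-Liu/Edmonds algorithm rather than enumeration):

```python
import itertools

# Words 1..4 of "I ate a sandwich"; node 0 is the artificial root.
words = ["<root>", "I", "ate", "a", "sandwich"]
n = len(words) - 1

# Edge scores score[(head, dependent)] -- hypothetical numbers.
score = {(h, d): -10.0 for h in range(n + 1) for d in range(1, n + 1) if h != d}
score[(0, 2)] = 12.0   # root -> ate
score[(2, 1)] = 9.0    # ate -> I
score[(2, 4)] = 8.0    # ate -> sandwich
score[(4, 3)] = 7.0    # sandwich -> a

def is_tree(heads):
    """heads[d] is the head of word d; valid iff every word reaches node 0
    without revisiting a node (i.e. no cycles)."""
    for d in range(1, n + 1):
        seen, i = set(), d
        while i != 0:
            if i in seen:
                return False
            seen.add(i)
            i = heads[i]
    return True

def brute_force_mst():
    """Score every head assignment that forms a tree and keep the best;
    the tree score is the sum of its individual edge scores."""
    best, best_score = None, float("-inf")
    for hs in itertools.product(range(n + 1), repeat=n):
        heads = (None,) + hs
        if any(heads[d] == d for d in range(1, n + 1)):
            continue                      # a word cannot head itself
        if not is_tree(heads):
            continue
        s = sum(score[(heads[d], d)] for d in range(1, n + 1))
        if s > best_score:
            best, best_score = heads, s
    return best, best_score

heads, total = brute_force_mst()
# Expected arcs: root -> ate, ate -> I, ate -> sandwich, sandwich -> a
print([(words[heads[d]], words[d]) for d in range(1, n + 1)], total)
```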


[Figure: the sentence "I ate a sandwich" (tokens 1 2 3 4) drawn as graph vertices: 0 (root), 1 (I), 2 (ate), 3 (a), 4 (sandwich).]


[Figure: the same graph, now fully connected, with a score on every directed edge (12, −8, −11, 2, 8, −3, …).]





Structured Classification

  • x is a sentence, G is a dependency tree, f(G) is a vector of features for the entire tree

  • Features:

    h(ate):d(sandwich)      hPOS(VBD):dPOS(NN)
    h(ate):d(I)             hPOS(VBD):dPOS(PRP)
    h(sandwich):d(a)        hPOS(NN):dPOS(DT)
    hPOS(VBD)   hPOS(NN)   dPOS(NN)   dPOS(DT)   dPOS(NN)   dPOS(PRP)
    h(ate)   h(sandwich)   d(sandwich)
    … (many more)

  • To assign edge weights, we learn a feature weight vector w



Structured Perceptron

  • Learn a vector of feature weights w

    w = 0
    For N iterations:
        For each training example (xi, Gi):
            G′i = argmaxG′ ∈ GEN(xi) w · f(G′)
            if G′i ≠ Gi:
                w = w + f(Gi) − f(G′i)

  • The same as before, but to find the argmax we use MST, since each G is a tree (which also contains the corresponding input x). If G′i is not the right tree, update the feature vector.
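A toy sketch of the structured perceptron, with a brute-force argmax over all trees standing in for MST inference (feasible only for tiny sentences), a single made-up training example, and only head-word:dependent-word features:

```python
import itertools
from collections import defaultdict

def trees(n):
    """Enumerate every head assignment heads[1..n] (0 = root) that forms
    a valid dependency tree; this brute force stands in for GEN(x)."""
    for hs in itertools.product(range(n + 1), repeat=n):
        heads = (None,) + hs
        if all(reaches_root(heads, d) for d in range(1, n + 1)):
            yield heads

def reaches_root(heads, d):
    seen, i = set(), d
    while i != 0:
        if i in seen:
            return False
        seen.add(i)
        i = heads[i]
    return True

def features(words, heads):
    """Edge-factored f(G): one head-word:dependent-word feature per link
    (a real model adds POS features and many others)."""
    feats = defaultdict(float)
    for d in range(1, len(words)):
        feats[f"h({words[heads[d]]}):d({words[d]})"] += 1.0
    return feats

def dot(w, feats):
    return sum(w[k] * v for k, v in feats.items())

def train(words, gold, n_iter=5):
    """If the highest-scoring tree is not the gold tree, add the gold
    features to w and subtract the predicted ones."""
    w = defaultdict(float)
    n = len(words) - 1
    for _ in range(n_iter):
        pred = max(trees(n), key=lambda G: dot(w, features(words, G)))
        if pred != gold:
            for k, v in features(words, gold).items():
                w[k] += v
            for k, v in features(words, pred).items():
                w[k] -= v
    return w

words = ["<root>", "I", "ate", "a", "sandwich"]
gold = (None, 2, 0, 4, 2)   # root -> ate, ate -> I, ate -> sandwich, sandwich -> a
w = train(words, gold)
pred = max(trees(len(words) - 1), key=lambda G: dot(w, features(words, G)))
print(pred == gold)   # → True
```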



Question: Are there trees that an MST parser can find, but a shift-reduce parser* can’t? (*The shift-reduce parser as described in slides 13-19.)



Accurate Parsing with Edge-Factored Models

  • The Maximum Spanning Tree algorithm for directed trees (Chu & Liu, 1965; Edmonds, 1967) runs in quadratic time

  • Finds the best out of exponentially many trees

    • Exact inference!

  • Edge-factored: each dependency link is considered independently from the others

    • Compare to Shift-Reduce parsing

      • Greedy inference

      • Rich set of features includes partially built trees

  • McDonald and Nivre (2007) show that shift-reduce and MST parsing get similar accuracy, but have different strengths



Parser Ensembles

  • By using different types of classifiers and algorithms, we get several different parsers

  • Ensemble idea: combine the output of several parsers to obtain a single more accurate result

[Figure: Parser A, Parser B, and Parser C each produce their own dependency tree for "I like cheese"; the ensemble combines them into a single output tree.]



Parser Ensembles with Maximum Spanning Trees (Sagae and Lavie, 2006)

  • First, build a graph

    • Create a node for each word in the input sentence (plus one extra “root” node)

    • Each dependency proposed by any of the parsers is a weighted edge

    • If multiple parsers propose the same dependency, add weight to the corresponding edge

  • Then, simply find the MST

    • Maximizes the votes

    • Structure guaranteed to be a dependency tree
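A minimal sketch of the voting scheme, with three hypothetical parser outputs for "I ate a sandwich" and a brute-force MST in place of Chu-Liu/Edmonds:

```python
import itertools
from collections import Counter

words = ["<root>", "I", "ate", "a", "sandwich"]
n = len(words) - 1

# Hypothetical head assignments from three parsers
# (heads[d] is the head of word d; node 0 is the extra root node).
parser_outputs = [
    (None, 2, 0, 4, 2),   # parser A
    (None, 2, 0, 4, 2),   # parser B agrees with A
    (None, 2, 0, 2, 2),   # parser C attaches "a" to "ate" instead
]

# Every proposed dependency adds one vote of weight to its edge.
votes = Counter()
for heads in parser_outputs:
    for d in range(1, n + 1):
        votes[(heads[d], d)] += 1

def is_tree(heads):
    """Valid iff every word reaches the root without a cycle."""
    for d in range(1, n + 1):
        seen, i = set(), d
        while i != 0:
            if i in seen:
                return False
            seen.add(i)
            i = heads[i]
    return True

# MST over the vote-weighted graph, by brute force (toy-sized input).
best = max(
    (h for h in ((None,) + hs for hs in itertools.product(range(n + 1), repeat=n))
     if is_tree(h)),
    key=lambda h: sum(votes[(h[d], d)] for d in range(1, n + 1)),
)
print(best)   # the majority tree: (None, 2, 0, 4, 2)
```

The result is guaranteed to be a tree even when the individual parsers disagree, which simple per-edge majority voting alone would not guarantee.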


[Figure: the combination step by step for "I ate a sandwich" (nodes 0 (root), 1 (I), 2 (ate), 3 (a), 4 (sandwich)): each dependency proposed by Parser A, B, or C adds weight to the corresponding edge, and the final output is the MST of the vote-weighted graph.]



MST Parser Ensembles Are Very Accurate

  • Highest accuracy in the CoNLL 2007 shared task on multilingual dependency parsing (a parser bake-off with 22 teams)

    • Nilsson et al. (2007); Sagae and Tsujii (2007)

  • Improvement depends on the selection of parsers for the ensemble

    • With four parsers with accuracy between 89% and 91%, ensemble accuracy = 92.7%

