Learning with Latent Alignment Structures

Quasi-synchronous Grammar and Tree-edit CRFs for Question Answering and Textual Entailment

Mengqiu Wang

Joint work with Chris Manning, Noah Smith


Task definition

  • At a high-level:

    • Learning the syntactic and semantic relations between two pieces of text

  • Application-specific definition of the relations

    • Question Answering

      Q: Who is the leader of France?

      A: Bush later met with French President Jacques Chirac

    • Machine Translation

      C: 温总理昨天会见了日本首相安培晋三。

      E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.

    • Summarization

      T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.

      S: US rounded up 400 people in Iraq.

    • Textual Entailment (IE, IR, QA, SUM)

      Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."

      Hyp: Mel Sembler represents the U.S.


The Challenges

  • Latent alignment structure

    • QA: Who is the leader of France?

      Bush later met with French President Jacques Chirac

    • MT: 温总理昨天会见了日本首相安培晋三。

      Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.

    • Sum: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.

      US rounded up 400 people in Iraq.

    • RTE: Responding to … the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."

      Mel Sembler represents the U.S.


Other modeling challenges

  • Question: Who is the leader of France?

  • Answer ranking over candidate sentences:

    1. Bush later met with French president Jacques Chirac.

    2. Henri Hadjenberg, who is the leader of France's Jewish community, …

    3. …


Semantic Transformations

  • Q: "Who is the leader of France?"

  • A: Bush later met with French president Jacques Chirac.


Syntactic Transformations

[Figure: dependency trees for the question "Who is the leader of France?" and the answer "Bush met with French president Jacques Chirac", with arcs labeled mod.]


Syntactic Variations

[Figure: dependency trees for the question "Who is the leader of France?" and the candidate "Henri Hadjenberg, who is the leader of France's Jewish community", with arcs labeled mod.]


What’s been done?

  • The latent alignment problem

    • Instead of treating alignment as a latent variable, prior work treats it as a separate task: first find the best alignment, then proceed with the rest of the task.

    • Pros: Usually simple and efficient.

    • Cons: Not very robust; alignment errors cannot be corrected in later steps.

  • Modeling syntax and semantics

    • Extract features from syntactic parse trees and semantic resources, then feed them into a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to make use of syntax.

    • Pros: No need to worry much about trees.

    • Cons: Ad hoc.


What I think an ideal model should do

  • Carry alignment uncertainty into final task

    • Treat alignments as latent variables and jointly learn the proper alignment structure and the overall task.

    • In other words, model the distribution over alignments and sum over all possible alignments at decoding time.

  • Syntax-based and feature-rich models

    • Directly model syntax

    • Enable the use of rich semantic features and features from other world-knowledge resources.
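
To make the contrast with the align-first pipeline concrete, here is a minimal sketch of summing over alignments at decoding time. The scores are hypothetical numbers chosen for illustration, and the function names are not from the talk; the sketch only assumes that each (label, alignment) pair receives an unnormalized log-score.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

# Hypothetical unnormalized log-scores, one per candidate alignment,
# for a single sentence pair under each label.
SCORES = {
    "positive": [2.0, 1.9, 1.8],    # several reasonable alignments
    "negative": [2.1, -3.0, -3.0],  # one slightly better alignment, the rest poor
}

def pipeline_decision(scores):
    """Align first: commit to the single best alignment, then read off its label."""
    return max(scores, key=lambda label: max(scores[label]))

def marginal_posterior(scores):
    """Latent alignment: p(label) is proportional to the sum over alignments."""
    log_mass = {label: logsumexp(s) for label, s in scores.items()}
    log_z = logsumexp(list(log_mass.values()))
    return {label: math.exp(v - log_z) for label, v in log_mass.items()}

PIPELINE = pipeline_decision(SCORES)
POSTERIOR = marginal_posterior(SCORES)
```

With these toy numbers the pipeline commits to the one high-scoring alignment under the negative label, while the marginal prefers the positive label because its probability mass is spread over several plausible alignments.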


Road map

  • Present two models that address the raised issues

    • 1: A model based on Quasi-synchronous Grammar (EMNLP '07)

      • Experiments on Question Answering task

    • 2: A tree-edit CRFs model (current work)

      • Experiments on RTE

  • Discuss and compare these two models

    • Modeling power

    • Pros and cons

  • Future work


Switching gears…

  • Quasi-synchronous Grammar for Question Answering


Tree-edit CRFs for RTE

  • Extends the work of McCallum et al. (UAI 2005) on CRFs for finite-state string edit distance

  • Key attractions:

    • Models the transformation of dependency parse trees (thus directly models syntax), unlike McCallum et al. ’05, which only models word strings

    • Discriminatively trained (not a generative model, unlike QG)

    • Trained on both the positive and negative instances of sentence pairs (QG is only trained on positive Q/A pairs)

    • CRFs: the underlying model is an undirected graphical model (QG is essentially a directed Bayes net)

      • Joint model over alignments (vs. local alignment models in QG)

      • Feature rich


TE-CRFs model in detail

  • First of all, let’s look at the correspondence between alignment (with constraints) and edit operations


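[Figure: the dependency trees of Q ("Who is the leader of France?") and A ("Bush met with French president Jacques Chirac"), each rooted at $. Nodes carry a word, a POS tag, and where available an entity type (is/VB, who/WP/qword, leader/NN, the/DT, of, France/NNP/location; met/VBD, Bush/NNP/person, with, Jacques Chirac/NNP/person, president/NN, French/JJ/location), and arcs are labeled root, subj, obj, det, nmod. Links between the two trees are labeled with edit operations: substitute, insert, delete, and one "fancy substitute".]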

TE-CRFs model in detail

  • Each valid tree edit operation sequence that transforms one tree into the other corresponds to an alignment. A tree edit operation sequence is modeled as a transition sequence among a set of states in an FSM.

[Figure: an FSM with three states (S1, S2, S3) in which every transition is labeled D, S, I (delete, substitute, insert), shown with a lattice of these states for the example edit sequence substitute, insert, substitute, delete, substitute, substitute.]


[Figure: the FSM states (S1, S2, S3) unrolled into a lattice over the edit operation sequence substitute, insert, substitute, delete, substitute, substitute.]

This is for one edit operation sequence. There are many other valid edit sequences, for example:

  • substitute, insert, delete, substitute, substitute, substitute

  • insert, substitute, substitute, delete, substitute, substitute

  • substitute, insert, substitute, substitute, delete, substitute


FSM cont.

[Figure: two copies of the three-state FSM (S1, S2, S3, with every transition labeled D, S, I), one forming the Positive State Set and the other the Negative State Set. Start is connected to each state set, and each state set to Stop, by ε transitions.]


FSM transitions

[Figure: the lattice of state transitions between Start and Stop, with one lattice over S1, S2, S3 for the Positive State Set and another for the Negative State Set.]


Parameterization

[Figure: a single transition between states S1 and S2 labeled with the edit operation substitute, with annotations "positive or negative" and "positive and negative".]
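
One plausible reading of the two state sets, in the spirit of the string-edit CRF of McCallum et al., is that the probability of a label is the total score of all state paths inside that label's state set, divided by the total over both sets. The sketch below computes this with the forward algorithm for a single fixed edit sequence; the full model would also sum over edit sequences. The transition scores in `toy_weight` are hypothetical stand-ins for a learned weights-times-features score, and treating the ε transitions as free is an assumption.

```python
import math

STATES = ("S1", "S2", "S3")
LABELS = ("positive", "negative")

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def toy_weight(label, prev, nxt, op):
    """Hypothetical log-score of one transition in the given state set."""
    op_score = {"substitute": 0.5, "insert": -0.5, "delete": -0.5}[op]
    if label == "negative":
        op_score = -op_score
    self_loop = 0.1 if prev == nxt else 0.0
    return op_score + self_loop

def log_path_mass(edits, label, weight=toy_weight):
    """Forward algorithm: log of the summed scores of all state paths in one state set."""
    # Assumed: the epsilon transition from Start enters any state at no cost.
    alpha = {s: 0.0 for s in STATES}
    for op in edits:
        alpha = {
            nxt: logsumexp([alpha[prev] + weight(label, prev, nxt, op) for prev in STATES])
            for nxt in STATES
        }
    # Assumed: the epsilon transition to Stop is also free.
    return logsumexp(list(alpha.values()))

def label_posterior(edits, weight=toy_weight):
    """p(label | edits): mass of one state set over the mass of both state sets."""
    mass = {label: log_path_mass(edits, label, weight) for label in LABELS}
    log_z = logsumexp(list(mass.values()))
    return {label: math.exp(m - log_z) for label, m in mass.items()}

EDITS = ["substitute", "insert", "substitute", "delete", "substitute", "substitute"]
POSTERIOR = label_posterior(EDITS)
```

The forward recursion costs time linear in the length of the edit sequence, whereas enumerating state paths directly grows exponentially.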


Training using EM

  • Jensen's inequality

  • E-step

  • M-step, using L-BFGS
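
For reference, the standard EM lower bound that these keywords point to can be written as below. This is the textbook derivation, not a reconstruction of the slide's own formulas, and the notation is assumed: $x$ is the sentence pair, $z$ the label, $a$ a latent alignment (an edit and state sequence), $\Lambda$ the parameters, and $q$ an arbitrary distribution over alignments.

```latex
\begin{align}
\log p_\Lambda(z \mid x)
  &= \log \sum_{a} p_\Lambda(z, a \mid x)
   = \log \sum_{a} q(a)\, \frac{p_\Lambda(z, a \mid x)}{q(a)} \\
  &\ge \sum_{a} q(a)\, \log \frac{p_\Lambda(z, a \mid x)}{q(a)}
   && \text{(Jensen's inequality)} \\
\text{E-step:}\quad
  q^{(t)}(a) &= p_{\Lambda^{(t)}}(a \mid z, x) \\
\text{M-step:}\quad
  \Lambda^{(t+1)} &= \arg\max_{\Lambda} \sum_{a} q^{(t)}(a)\, \log p_\Lambda(z, a \mid x)
\end{align}
```

The M-step has no closed form for a log-linear model, which is consistent with solving it numerically using a gradient-based optimizer such as L-BFGS.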


Features for RTE

  • Substitution

    • Same: Word/WordWithNE/Lemma/NETag/Verb/Noun/Adj/Adv/Other

    • Sub/MisSub: Punct/Stopword/ModalWord

    • Antonym/Hypernym/Synonym/Nombank/Country

    • Different: NE/Pos

    • Unrelated words

  • Delete

    • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If

  • Insert

    • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If

  • Tree

    • RootAligned/RootAlignedSameWord

    • Parent,Child,DepRel triple match/mismatch

  • Date/Time/Numerical

    • DateMismatch, hasNumDetMismatch, normalizedFormMismatch
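
As an illustration of how a few of the substitution features might fire, here is a minimal sketch. The feature names are loosely modeled on the list above, but their exact definitions are not given in the slides, so the logic, the node representation (a dict with `word`, `lemma`, `pos`, `ne` keys), and the function name are all assumptions.

```python
def substitution_features(src, tgt):
    """Return indicator features for substituting node src with node tgt.

    Both nodes are dicts with hypothetical keys 'word', 'lemma', 'pos', 'ne'
    ('ne' is None when the node has no named-entity tag).
    """
    feats = {}
    if src["word"].lower() == tgt["word"].lower():
        feats["SameWord"] = 1.0
    if src["lemma"] == tgt["lemma"]:
        feats["SameLemma"] = 1.0
    if src["ne"] is not None and tgt["ne"] is not None:
        if src["ne"] == tgt["ne"]:
            feats["SameNETag"] = 1.0
        else:
            feats["DifferentNE"] = 1.0
    if src["pos"] != tgt["pos"]:
        feats["DifferentPos"] = 1.0
    return feats

FRENCH = {"word": "French", "lemma": "french", "pos": "JJ", "ne": "location"}
FRANCE = {"word": "France", "lemma": "france", "pos": "NNP", "ne": "location"}
FEATS = substitution_features(FRENCH, FRANCE)
```

In a log-linear model, the score of an edit operation would be the dot product of such a feature vector with learned weights, which is what lets arbitrary lexical-semantic resources be plugged in.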


Tree-edit CRFs for Textual Entailment

  • Preliminary results

    • Trained on RTE2 dev, tested on RTE2 test.

    • Model taken after 50 EM iterations.

    • acc: 0.6275, map: 0.6407

  • RTE2 official results

    • Hickl (LCC) acc: 0.7538, map: 0.8082

    • Tatu (LCC) acc: 0.7375, map: 0.7133

    • Zanzotto (Milan & Rome) acc: 0.6388, map: 0.6441

    • Adams (Dallas) acc: 0.6262, map: 0.6282


Comparison: QG vs. TE-CRFs

  • QG

    • Generative

    • Directed, Bayes net, local

    • Allows arbitrary swapping in alignment

    • Allows limited use of semantic features (lexical-semantic log-linear model in mixture model)

    • Computationally cheaper

  • TE-CRFs

    • Discriminative

    • Undirected, CRFs, global

    • No swapping: can't do substitutions that involve swapping (can be extended, see future work)

    • Allows arbitrary semantic features

    • Computationally more expensive


Future work

  • QG

    • Generative

      • Train discriminatively using Noah's Contrastive Estimation

    • Directed, Bayes net, local

      • Higher-order Markovization

    • Allows arbitrary swapping in alignment

    • Allows limited use of semantic features (lexical-semantic log-linear model in mixture model)

    • Computationally cheaper

    • Run RTE experiments

  • TE-CRFs

    • Discriminative

    • Undirected, CRFs, global

    • No swapping

      • Constrained unordered trees

      • Fancy edit operations (e.g. substitute sub-trees)

    • Allows arbitrary semantic features

    • More expensive

    • Run QA and MT alignment experiments


Thank you!

Questions?

