Summarizing documents based on cue phrases and references



Goal: coherent focused summaries

What is a focused summary?

- A summary that states briefly what the document says about the key entity (the focus), within the context of the whole document

Why focused summaries?

For example, when searching the web about an entity:

  • avoid browsing a very long list of links to documents that mention the entity (as returned by a normal search engine)

  • read abstracts that mention the searched entity

  • if the searched entity is of minor importance in a document, it will not appear in a normal abstract


The idea

[Figure: cue-phrases and anaphoric references are used to build the discourse structure, from which the summary is derived using VT (Veins Theory).]


The proposed method (1)

Preparatory phases:

  • POS-tagging

  • Syntactic tagging done by an FDG parser

  • NP-tagging


The proposed method (2)

Step 1: segmentation into elementary discourse units (edu-s)

[Maria went alone to the market]1 because [Simon had to stay at home with the baby.]2 [Simon is a good friend of mine]3 and [he also helped me in a number of situations.]4 For instance [he was very helpful]5 when [I had the problem with the car.]6 [I think she has a lot of trust in him to let him alone with the child.]7 [You know how Maria is]8: [she is not very hurried to give credit to anybody.]9


The proposed method (2)

Step 2: building up sentence-level discourse trees (sdt-s)

[Figure: the example text with its edu-s numbered 1 to 9, and the sdt of the third sentence, built from the markers "for instance" and "when" with edu-s 5 and 6.]


The proposed method (2)

Step 3: anaphora resolution


The proposed method (2)

Step 4: integration of sdt-s in a global structure

[Figure: pdt_(i-1) and sdt_i are combined into pdt_i; a foot node is marked with *.]


The proposed method (2)

Step 5: generating the summary


Step 1: Text segmentation method

  • Identification of finite verbs

  • Extraction of the FDG-sub-tree rooted in each finite verb

  • (If FDG tagging is correct, then every sub-tree will represent a clause)

  • Grouping clauses, if necessary, into discourse units
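The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Token` class, the hand-written head indices and the `finite` flags are assumptions standing in for real FDG parser output, and the final grouping of clauses into larger discourse units is left out.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    head: int             # index of the governing token, -1 for the sentence root
    finite: bool = False  # True for finite verbs


def governing_finite_verb(tokens, i):
    """Climb the dependency links from token i up to the nearest finite verb."""
    while i != -1 and not tokens[i].finite:
        i = tokens[i].head
    return i


def segment(tokens):
    """Return one clause per finite verb, in order of first appearance.

    Each token is assigned to the nearest finite verb above it, so an
    embedded clause is split off from the clause that governs it.
    """
    clauses = {}  # finite-verb index -> word forms of its clause
    for i, tok in enumerate(tokens):
        clauses.setdefault(governing_finite_verb(tokens, i), []).append(tok.form)
    return [" ".join(forms) for forms in clauses.values()]


# Hand-annotated dependency links (hypothetical, not real FDG output).
tokens = [
    Token("Maria", 1), Token("went", -1, finite=True), Token("alone", 1),
    Token("to", 1), Token("the", 5), Token("market", 3),
    Token("because", 8), Token("Simon", 8), Token("had", 1, finite=True),
    Token("to", 10), Token("stay", 8), Token("at", 10), Token("home", 11),
    Token("with", 10), Token("the", 15), Token("baby", 13),
]

for clause in segment(tokens):
    print(clause)
```

In this sketch the cue word "because" stays attached to the clause it introduces; separating markers from clauses would be a further step.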


Step 2: Inference of the sdt-s (1)

  • Inputs: cue words or phrases (markers) and local dependencies between edu-s

  • Output: sentence-level discourse trees (sdt-s)

    • inner nodes are labeled with markers

    • terminal nodes are labeled with edu-s


Step 2: Inference of the sdt-s (2)

Cue-phrases usually suggest patterns for the placement of the connected arguments, nuclearity included:

  • – so –

  • because –, –

  • – and –

[John is determined to pass the NLP exam]1 so, because [he has missed many courses]2 and [was only vaguely implicated at the working sessions,]3 [he will have a hard time until summer.]4

1 so, because 2 and 3, 4

Ambiguities:

  • [1] so [2,3,4]

  • because [2,3,4], [2,3,4]

  • [1,2] and [3,4]

Inferring the sdt = finding the proper arguments and nuclearities


Step 2: Inference of the sdt-s (3): Consistency constraints for elementary sdt-s

The “nesting-arguments” rule

If an edu x ∈ sub-tree(marker_i) ∩ sub-tree(marker_j) with i ≠ j, then one marker is in the other one’s sub-tree.

This rule states that no two inner nodes of the tree can cover crossing text spans on the terminal frontier.
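The rule can be checked mechanically. The sketch below is a simplification under stated assumptions: each marker's sub-tree is reduced to the span of edu-s it covers, given as a hypothetical `(first, last)` pair, and the position of the marker itself is ignored. Two spans are acceptable if they are disjoint or if one contains the other.

```python
from itertools import combinations


def crossing(a, b):
    """True if spans a and b overlap without one containing the other."""
    (a0, a1), (b0, b1) = a, b
    overlap = a0 <= b1 and b0 <= a1
    nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
    return overlap and not nested


def consistent(spans):
    """Check the nesting-arguments rule for a dict marker -> (first, last) edu."""
    return not any(crossing(spans[m], spans[n]) for m, n in combinations(spans, 2))


# One candidate analysis of the John example: every span nests in the next.
good = {"so": (1, 4), "because": (2, 4), "and": (2, 3)}
# A candidate in which "and" (1..3) crosses "because" (2..4).
bad = {"because": (2, 4), "and": (1, 3)}

print(consistent(good), consistent(bad))
```

A filter of this kind could be used to discard candidate argument assignments before choosing among the remaining ones.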


Step 3: Anaphora resolution: The AR-engine

[Figure: text goes into the AR-engine, which can be driven by AR-model1, AR-model2 or AR-model3 and outputs anaphoric links.]

AR-engine is a general framework for anaphora resolution, able to accommodate different AR-models.


The three-layered anaphora resolution process

  • Text layer: reference expressions (RE), e.g. REa, REb, REc, REd, REx

  • Projection layer: projected structures (PS), e.g. PSx

  • Semantic layer: discourse entities (DE), e.g. DE1, DEj, DEm


What is an AR-model?

[Figure: the three-layer diagram (text layer with REs, projection layer with PSx, semantic layer with DEs), annotated with the components of an AR-model.]

  • knowledge sources

  • primary attributes

  • heuristics/rules

  • domain of referential accessibility


Types of anaphorae resolved

  • Common nouns referring to proper nouns

  • Common nouns with different lemmas

  • Pronominal references
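As a toy illustration of the three layers, the sketch below resolves repeated names and pronouns only. It is not one of the AR-models of the AR-engine: the `RE` and `DE` classes, the single `gender` attribute, the matching heuristics and the choice of "all previously introduced entities" as the domain of referential accessibility are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RE:                 # reference expression (text layer)
    text: str
    edu: int              # edu in which the expression occurs
    gender: str           # "f" or "m"; the only primary attribute used here
    pronoun: bool = False


@dataclass
class DE:                 # discourse entity (semantic layer)
    name: str
    gender: str
    mentions: list = field(default_factory=list)


def resolve(expressions):
    """Attach every RE to an existing DE or create a new one."""
    entities = []
    for expr in expressions:
        # Projected structure: the attributes extracted from the RE.
        ps = {"lemma": expr.text.lower(), "gender": expr.gender}
        target = None
        for de in reversed(entities):  # most recently introduced entity first
            same_name = not expr.pronoun and de.name == ps["lemma"]
            agrees = expr.pronoun and de.gender == ps["gender"]
            if same_name or agrees:
                target = de
                break
        if target is None:             # no antecedent found: new entity
            target = DE(ps["lemma"], ps["gender"])
            entities.append(target)
        target.mentions.append(expr.edu)
    return entities


# Names and pronouns of the example text, annotated by hand.
expressions = [
    RE("Maria", 1, "f"), RE("Simon", 2, "m"), RE("Simon", 3, "m"),
    RE("he", 4, "m", True), RE("he", 5, "m", True),
    RE("she", 7, "f", True), RE("him", 7, "m", True),
    RE("Maria", 8, "f"), RE("she", 9, "f", True),
]

for de in resolve(expressions):
    print(de.name, de.mentions)
```

On this input the mention lists agree with the edu-s in which Maria and Simon are referred to in the example text.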


Step 4: Compiling the final discourse structure (1)

[Figure: the five sdt-s of the example text (markers "because", "and", "for instance", "when" and ":" over edu-s 1 to 9); each edu is annotated with the entities it refers to.]

A discourse structure tree must be derived by combining the sdt-s

(a = Maria, b = Simon, c = the child, d = I, empty = any other REs)


Step 4: Compiling the final discourse structure (2)

[Figure: pdt1 = sdt1 ("because" over edu-s 1 and 2); pdt1 + sdt2 ("and" over edu-s 3 and 4) => pdt2.]


Step 4: Compiling the final discourse structure (3)

[Figure: pdt2 + sdt3 ("for instance" and "when", with edu-s 5 and 6) => pdt3.]


Step 4: Compiling the final discourse structure (4)

[Figure: pdt3 + sdt4 (edu 7) => pdt4.]


Step 4: Compiling the final discourse structure (5)

[Figure: pdt4 + sdt5 (":" over edu-s 8 and 9) => pdt5.]


Step 5: Generating the summary (1): Veins Theory

  • Head expression of a node: the sequence of the most important units within the corresponding span of text:

    • the head of a terminal node: its label

    • the head of a non-terminal node: the concatenation of the head expressions of the nuclear children

  • the important units are projected up to the level where the corresponding span is seen as a satellite
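The definition of head expressions translates directly into a recursive function. The sketch below is a minimal illustration: the `Node` class is hypothetical, and the small tree and its nuclearity flags are assumptions chosen to agree with the head values H=5, H=4 and H=3,4 that appear in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: object                 # edu number for a leaf, marker for an inner node
    nuclear: bool = True          # False for a satellite
    children: list = field(default_factory=list)


def head(node):
    """Head expression: own label for a leaf, else heads of the nuclear children."""
    if not node.children:
        return [node.label]
    return [unit for child in node.children if child.nuclear for unit in head(child)]


# Assumed sub-tree: "and" over edu 3 and a "for instance" node, both nuclear;
# "when" is a satellite of edu 4, and edu 6 is a satellite of edu 5.
when = Node("when", False, [Node(5), Node(6, False)])
for_instance = Node("for instance", True, [Node(4), when])
and_node = Node("and", True, [Node(3), for_instance])

print(head(when), head(for_instance), head(and_node))
```

Because satellites contribute nothing, a unit is projected upwards only until its span becomes a satellite, which matches the last bullet above.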


Step 5: Generating the summary (2): Computing head expressions

[Figure: the final discourse tree with the head expression of every node. Each edu is its own head (H=1 to H=9); the inner nodes have H=1; H=2,7; H=2; H=7; H=3,4; H=4; H=5; H=8,9.]


Step 5: Generating the summary (3): Veins

Vein expression of a node: the sequence of units that are required to understand the span of text covered by the node, in the context of the whole discourse.

To understand a piece of text in the context of the whole discourse, one needs the significant units within the span together with other surrounding units.


Step 5: Generating the summary (4): Computing vein expressions

[Figure: vein computation rules. A nuclear node whose parent has V=v gets V=v; a satellite node with H=h whose parent has V=v gets V=seq(h, v).]

Vein expressions are computed top-down starting with the root (vein expression of the root is its head expression)
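A top-down pass of this kind can be sketched as below. Several things here are assumptions: the `Node` class is hypothetical, `seq(h, v)` is taken to merge the two sequences in text order, only the root, nuclear and satellite cases named on this slide are handled, and the tree shape (including the unlabeled inner nodes) is a reconstruction chosen to reproduce the H and V values of the example, not a copy of the original figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: object                 # edu number, marker, or None if unlabeled
    nuclear: bool = True
    children: list = field(default_factory=list)


def head(node):
    """Head expression: own label for a leaf, else heads of the nuclear children."""
    if not node.children:
        return [node.label]
    return [u for c in node.children if c.nuclear for u in head(c)]


def veins(node, parent_vein=None, out=None):
    """Return a dict edu -> vein expression, computed from the root down."""
    out = {} if out is None else out
    if parent_vein is None:       # root: vein = head
        vein = head(node)
    elif node.nuclear:            # nuclear node: inherit the parent's vein
        vein = parent_vein
    else:                         # satellite: seq(h, v), merged in text order
        vein = sorted(set(head(node)) | set(parent_vein))
    if not node.children:
        out[node.label] = vein
    for child in node.children:
        veins(child, vein, out)
    return out


N, S = True, False                # nuclear / satellite
tree = Node("because", N, [
    Node(1, N),
    Node(None, S, [
        Node(None, N, [
            Node(2, N),
            Node("and", S, [
                Node(3, N),
                Node("for instance", N, [
                    Node(4, N),
                    Node("when", S, [Node(5, N), Node(6, S)]),
                ]),
            ]),
        ]),
        Node(None, N, [
            Node(7, N),
            Node(":", S, [Node(8, N), Node(9, N)]),
        ]),
    ]),
])

for edu, vein in veins(tree).items():
    print(edu, vein)
```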


Step 5: Generating the summary (5): Vein expressions

[Figure: the final discourse tree with the vein expression of every node. The values that occur are V=1; V=1,2,7; V=1,2,3,4,7; V=1,2,7,8,9; V=1,2,3,4,5,7; V=1,2,3,4,5,6,7.]


The summaries

  • Maria is referred to in edu-s 1,7,8,9 =>

  • summary focused on Maria {1,2,7,8,9}

Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.

  • Simon is referred to in edu-s 2,3,4,5,7 =>

  • summary focused on Simon {1,2,3,4,5,7}

  • The child is referred to in edu-s 2,7 =>

  • summary focused on the child {1,2,7}

Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child.

  • I is referred to in edu-s 3,4,6,7 =>

  • summary focused on I {1,2,3,4,5,6,7}
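The four summaries are consistent with taking the union of the vein expressions of the edu-s in which the focus is referred to. The sketch below spells this out; the selection rule is inferred from the examples rather than quoted, and the `VEIN` table is an assumed assignment of the listed V values to individual edu-s.

```python
# Vein expression of each edu (assumed mapping of the listed V values to edu-s).
VEIN = {
    1: [1],
    2: [1, 2, 7],
    3: [1, 2, 3, 4, 7],
    4: [1, 2, 3, 4, 7],
    5: [1, 2, 3, 4, 5, 7],
    6: [1, 2, 3, 4, 5, 6, 7],
    7: [1, 2, 7],
    8: [1, 2, 7, 8, 9],
    9: [1, 2, 7, 8, 9],
}

# Edu-s in which each entity is referred to, as listed above.
REFERS = {
    "Maria": [1, 7, 8, 9],
    "Simon": [2, 3, 4, 5, 7],
    "the child": [2, 7],
    "I": [3, 4, 6, 7],
}


def focused_summary(entity):
    """Edu-s of the summary focused on `entity`, in text order."""
    units = set()
    for edu in REFERS[entity]:
        units.update(VEIN[edu])
    return sorted(units)


for entity in REFERS:
    print(entity, focused_summary(entity))
```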


Results

Segmentation step:

When the input contains errors made by the FDG parser, the precision and recall of the segmentation method are around 75%. When the input is corrected (that is, when all the relations between words are right), precision and recall reach 100%.

Anaphora resolution step:

The best results showed 100% precision, with recall ranging from 70% to 100%. These figures should be treated with caution because the corpus we used was small.


Conclusions

  • The proposed method builds on an earlier investigation that showed a correlation between references and vein structure (antecedents can be found along veins; 99.1% of references obey this conjecture)

  • It is a deterministic method in the sense that only one tree is obtained

  • Degrees of non-determinism show up when:

    - building sdt-s, due to different cue-phrase patterns

    - combining sdt-s into a final discourse tree


Further work

  • Assess how far the method can be trusted overall

  • Improve the method of building the global structure (scores for the types of antecedents)

  • Transform it by using CT into a beam-search type of processing

  • Derive more sophisticated sdt integration rules by learning

  • Represent only vein expressions, not the entire tree


Thank you!

