
From extracts to abstracts: human summary production operations for computer-aided summarisation

This research explores the classification and evaluation of human summary production operations for computer-aided summarisation, focusing on improving coherence and readability in summary production.

Presentation Transcript


  1. From extracts to abstracts: human summary production operations for computer-aided summarisation Laura Hasler University of Wolverhampton L.Hasler@wlv.ac.uk CALP 2007: 30.09.07

  2. Overview • Computer-aided summarisation (CAS) • Summary production stage of summarisation • Classification of human summary production operations (and guidelines) • Evaluation of classification (and guidelines derived from it) • Some conclusions and possibilities for future work Laura Hasler: CALP 2007

  3. Computer-aided summarisation • Feasible alternative to fully automatic summarisation given problems of coherence/readability with automatic extracts • Automatic summarisation methods produce an extract (document exploration, relevance assessment) which is then post-edited by user (summary production) • No resources to ensure consistency • Focus of this research on summary production (extract → abstract) to improve coherence and readability

  4. Aim of the research • A) Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986. Last September, the IAEA and the WHO released a report. Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported. B) Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor number 4 on 26 April 1986, concluding that radiation from the accident would kill a total of 4000 people. (h03-ljh)

  5. Classification of operations • 43 pairs of news texts (extract, abstract) • 30% extracts (CAST guidelines) → 20% abstracts • 5 general classes of operations • Atomic: deletion, insertion • Complex: replacement, reordering, merging • Each split into sub-operations (26 in total) • Sub-operations linked to triggers, or recognisable surface forms • Function of units also important
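
The five general classes and their atomic/complex split can be written down as a small data structure. This is only an illustrative sketch: the names `OperationClass`, `ATOMIC`, `COMPLEX` and `is_atomic` are hypothetical, and the 26 sub-operations are left out because the slides do not list them in full.

```python
from enum import Enum


class OperationClass(Enum):
    """The five general classes of summary production operations."""
    DELETION = "deletion"
    INSERTION = "insertion"
    REPLACEMENT = "replacement"
    REORDERING = "reordering"
    MERGING = "merging"


# Atomic operations stand alone; complex ones are built from them.
ATOMIC = frozenset({OperationClass.DELETION, OperationClass.INSERTION})
COMPLEX = frozenset({
    OperationClass.REPLACEMENT,
    OperationClass.REORDERING,
    OperationClass.MERGING,
})


def is_atomic(operation: OperationClass) -> bool:
    """Return True for deletion and insertion, False for the complex classes."""
    return operation in ATOMIC
```

For instance, `is_atomic(OperationClass.MERGING)` is false, reflecting that merging is described as a complex operation.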

  6. Deletion • “The process of removing a unit from a certain place in the extract so it does not appear in the same place in the abstract” • Used alone or as part of complex operations • Very useful for reducing text when used alone • Deletes non-essential units (details, repetitions) • Complete sentences, subordinate clauses, PPs, reporting clauses, determiners, be

  7. Deletion examples • [I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island. (new-sci-B7L-54-ljh) • Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex]. (sci04done-an)
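
In the examples above, square brackets appear to mark the units deleted from the extract. Assuming that reading, a minimal sketch of applying the marked deletions is shown here. The function name `apply_deletions` is hypothetical, and the sketch does not re-capitalise a sentence whose opening unit was deleted.

```python
import re


def apply_deletions(annotated: str) -> str:
    """Drop every [bracketed] unit and tidy the leftover whitespace.

    Assumes brackets are used only to mark deleted units and are not nested.
    """
    without_units = re.sub(r"\[[^\]]*\]", "", annotated)
    return re.sub(r"\s+", " ", without_units).strip()


extract = (
    "Three papers published recently in Science move us a little closer "
    "to understanding the basis of the disease"
    "[, which turns out to be highly complex]."
)
print(apply_deletions(extract))
```

The second slide example loses its subordinate clause and ends at "the basis of the disease."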

  8. Insertion • “The process of adding a unit which is not present in the extract into the abstract” • Used alone or as part of complex operations • Interesting because it adds text to something which is supposed to be reduced • Used to add coherence and to clarify whilst saving space • Connectives, modifiers, ‘formulaic units’, punctuation

  9. Insertion examples • He sees the need to raise public awareness and demystify science and technology as a key point… (new-sci-B7L-75-ljh) [X sees Y as Z] • The TV series Men of Science is now being shown in a few other areas. (new-sci-B7L-69-ljh)

  10. Replacement • “The deletion of one unit and the insertion of a different unit in the same place in the text” • Complex operation, can be used in combination with other complex operations • Useful for avoiding repetition and saving space • Pronominalisation, lexical substitution, NP restructuring, nominalisation, VPs, passivisation, abbreviations

  11. Replacement examples • [Zhanat Carr, a radiation scientist with the WHO in Geneva,] The WHO [says] admits the 5000 deaths were omitted because the report was a "political communication tool". (h03-ljh) • [All this] [is] hardly Culver’s fault. [The same difficulties are to be found in all other parts of evolutionary ecology.] These general difficulties of evolutionary ecology are hardly Culver’s fault. (new-sci-B7L-63-ljh)

  12. Reordering • “The deletion of a unit from one place in the extract and its insertion in a different place in the abstract” • Complex operation, can be used in combination with other complex operations • Sub-functions rather than operations – difficult to sub-classify • Emphasises information, improves coherence and readability

  13. Reordering example • Text about world’s second face transplant, all other sentences about a specific person/operation • Experts predict the number of these operations will rise rapidly as centres around the world gear up to perform the procedure. (h01-ljh) • S2 → last sentence
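
Read as "the second sentence of the extract becomes the last sentence of the abstract", the reordering in this example amounts to a delete-then-insert on a list of sentences. The sketch below assumes that reading; `move_sentence_to_end` is a hypothetical helper and the placeholder strings stand in for the real sentences, which the slide does not reproduce.

```python
from typing import List


def move_sentence_to_end(sentences: List[str], index: int) -> List[str]:
    """Return a copy of `sentences` with the one at `index` moved to the end.

    Mirrors the definition of reordering: the unit is deleted from one
    place and inserted in a different place.
    """
    reordered = list(sentences)             # leave the caller's list untouched
    reordered.append(reordered.pop(index))  # delete, then insert at the end
    return reordered


extract = ["S1", "S2", "S3", "S4"]          # placeholder sentence labels
abstract = move_sentence_to_end(extract, 1)  # index 1 is the second sentence
print(abstract)
```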

  14. Merging • “Taking information from different units in the extract and presenting it as one unit in the abstract” • All other operations can be used • Large class, most difficult to sub-classify – anything (appropriate) goes! • Best embodies abstracting as opposed to extracting – conciseness • Restructuring of clauses/sentences, punctuation/connectives

  15. Merging example • In October 1980 Zuccarelli filed [an expensive] European patent application, covering nine countries including Britain[. … The cost of pushing a European patent through in nine countries is around $10000. The cost of application alone is around $2000 and Zuccarelli has already paid an extra $500 for a further stage of official examination]. (new-sci-B7K-37)

  16. Evaluation • Applied guidelines to a different set of extracts • 25 human-produced extracts + corresponding abstracts • 25 automatically produced extracts + corresponding abstracts • Developed Centering Theory as an evaluation method (evaluation metric) due to unsuitability of existing evaluation methods

  17. Centering Theory (CT) (Grosz, Joshi & Weinstein 1995) • Parametric theory of local coherence and salience • Accounts for coherence using repetitions of entities across consecutive utterances • Uses the relationship between repetitions to derive ‘transitions’ • Transitions are ordered in preference from most to least coherent • Metric developed to reflect the effect of transitions in summaries
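
To make the notion of 'transitions' concrete, here is a sketch of the widely used four-way classification, in which the Shift transition is split into smooth and rough variants (a refinement usually attributed to Brennan, Friedman & Pollard 1987). It compares the backward-looking centre (Cb) of the current and previous utterance and the preferred centre (Cp) of the current one. The slides do not say which transition set or parameter settings the metric uses, so the function name, the `"no-cb"` label and the handling of a missing previous Cb are assumptions made for illustration only.

```python
from typing import Optional

# Usual preference order, from most to least coherent.
PREFERENCE = ["continue", "retain", "smooth-shift", "rough-shift"]


def classify_transition(
    cb_prev: Optional[str],
    cb_cur: Optional[str],
    cp_cur: Optional[str],
) -> str:
    """Classify the transition into the current utterance.

    cb_prev: backward-looking centre of the previous utterance (None if undefined)
    cb_cur:  backward-looking centre of the current utterance (None if it has none)
    cp_cur:  preferred (highest-ranked forward-looking) centre of the current utterance
    """
    if cb_cur is None:
        return "no-cb"  # assumption: utterances without a Cb get their own label
    # Assumption: an undefined previous Cb is treated as "unchanged".
    same_cb = cb_prev is None or cb_cur == cb_prev
    cb_is_cp = cb_cur == cp_cur
    if same_cb:
        return "continue" if cb_is_cp else "retain"
    return "smooth-shift" if cb_is_cp else "rough-shift"


print(classify_transition("report", "report", "report"))
```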

  18. Evaluation 2 • Human judgment obtained to complement CT • Overall, human summary production operations improve texts: CT = 78%; Judge = 82% • Agreement between CT and judge = 70% • Classification and resulting guidelines can be reliably used during post-editing in CAS • CT is useful as an evaluation method
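
An agreement figure of this kind can be computed as simple percentage agreement: the share of extract/abstract pairs on which the CT metric and the human judge reach the same verdict. The slides do not state how agreement was calculated, so this is an assumed method, and the verdict lists below are made-up toy data, not the study's results.

```python
from typing import Sequence


def percent_agreement(first: Sequence[bool], second: Sequence[bool]) -> float:
    """Percentage of items on which two sets of verdicts coincide."""
    if len(first) != len(second) or not first:
        raise ValueError("verdict lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Toy verdicts: True means "the abstract is an improvement on the extract".
ct_verdicts = [True, True, False, True]
judge_verdicts = [True, False, False, True]
print(percent_agreement(ct_verdicts, judge_verdicts))  # 75.0
```

Percentage agreement does not correct for chance agreement, which is one reason a study might report it alongside other measures.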

  19. Conclusions • Analysis and classification of human summary production operations for CAS (→ guidelines) • Evaluation: applying these operations to extracts results in more coherent/readable abstracts • Guidelines can help CAS system users in their task Future work • To use more human summarisers/judges to further validate classification/guidelines • To look at scientific texts (also popular in AS) • To further explore CT for evaluation

  20. Thank you! Any questions?
