Summarisation Work at Sheffield

Robert Gaizauskas, Natural Language Processing Group, Department of Computer Science, University of Sheffield.

Presentation Transcript


  1. Summarisation Work at Sheffield
     Robert Gaizauskas
     Natural Language Processing Group
     Department of Computer Science
     University of Sheffield

  2. Outline
     • Terminology
     • Approach 1: Generation from Templates
     • Approach 2: Coreference Chains
     • Approach 3: Statistical
     AKT Workshop

  3. Terminology
     • Extract vs Abstract
       • Extract - subset of the sentences in the original
       • Abstract - fusion of topics in original + text generation
     • Generic vs User-focused
       • Generic - captures essence of text, independent of user's interests
       • User-focused - summarises content with respect to a particular user interest
     • Indicative vs Informative
       • Indicative - indicates whether document should be examined in more detail
       • Informative - serves as a surrogate for original

  4. Approach 1: Generation from Templates
     • To generate
       • user-focused
       • informative
       • abstracts
       we have used an IE system + simple NL generation techniques to produce simple summaries

  5. Example: A Wall Street Journal Article
     <DOC>
     <DOCID> wsj94_008.0212 </DOCID>
     <DOCNO> 940413-0062. </DOCNO>
     <HL> Who's News: @ Burns Fry Ltd. </HL>
     <DD> 04/13/94 </DD>
     <SO> WALL STREET JOURNAL (J), PAGE B10 </SO>
     <CO> MER </CO>
     <IN> SECURITIES (SCR) </IN>
     <TXT>
     <p> BURNS FRY Ltd. (Toronto) -- Donald Wright, 46 years old, was named executive vice president and director of fixed income at this brokerage firm. Mr. Wright resigned as president of Merrill Lynch Canada Inc., a unit of Merrill Lynch & Co., to succeed Mark Kassirer, 48, who left Burns Fry last month. A Merrill Lynch spokeswoman said it hasn't named a successor to Mr. Wright, who is expected to begin his new position by the end of the month. </p>
     </TXT>
     </DOC>

  6. Example: BNF Definition of a Management Succession Event Template (MUC-6)
     <TEMPLATE> :=
         DOC_NR: "NUMBER" ^
         CONTENT: <SUCCESSION_EVENT> *
     <SUCCESSION_EVENT> :=
         ORGANIZATION: <ORGANIZATION> ^
         POST: "POSITION TITLE" | "no title" ^
         IN_AND_OUT: <IN_AND_OUT> +
         VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK} ^
     <IN_AND_OUT> :=
         PERSON: <PERSON> ^
         NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING} ^
         ON_THE_JOB: {YES, NO, UNCLEAR}
         OTHER_ORG: <ORGANIZATION> -
         REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG} -
     <ORGANIZATION> :=
         ORG_NAME: "NAME" -
         ORG_ALIAS: "ALIAS" *
         ORG_DESCRIPTOR: "DESCRIPTOR" -
         ORG_TYPE: {GOVERNMENT, COMPANY, OTHER} ^
         ORG_LOCALE: LOCALE_STRING {{CITY, PROVINCE, COUNTRY, REGION, UNK} *
         ORG_COUNTRY: NORMALIZED-COUNTRY-or-REGION | COUNTRY-or-REGION-STRING *
     <PERSON> :=
         PER_NAME: "NAME" -
         PER_ALIAS: "ALIAS" *
         PER_TITLE: "TITLE" *

  7. Example: A (Partially) Filled Management Succession Event Template
     <TEMPLATE-9404130062> :=
         DOC_NR: "9404130062"
         CONTENT: <SUCCESSION_EVENT-1>
     <SUCCESSION_EVENT-1> :=
         SUCCESSION_ORG: <ORGANIZATION-1>
         POST: "executive vice president"
         IN_AND_OUT: <IN_AND_OUT-1> <IN_AND_OUT-2>
         VACANCY_REASON: OTH_UNK
     <IN_AND_OUT-1> :=
         IO_PERSON: <PERSON-1>
         NEW_STATUS: OUT
         ON_THE_JOB: NO
     <IN_AND_OUT-2> :=
         IO_PERSON: <PERSON-2>
         NEW_STATUS: IN
         ON_THE_JOB: NO
         OTHER_ORG: <ORGANIZATION-2>
         REL_OTHER_ORG: OUTSIDE_ORG
     <ORGANIZATION-1> :=
         ORG_NAME: "Burns Fry Ltd."
         ORG_ALIAS: "Burns Fry"
         ORG_DESCRIPTOR: "this brokerage firm"
         ORG_TYPE: COMPANY
         ORG_LOCALE: Toronto CITY
         ORG_COUNTRY: Canada
     <ORGANIZATION-2> :=
         ORG_NAME: "Merrill Lynch Canada Inc."
         ORG_ALIAS: "Merrill Lynch"
         ORG_DESCRIPTOR: "a unit of Merrill Lynch & Co."
         ORG_TYPE: COMPANY
     <PERSON-1> :=
         PER_NAME: "Mark Kassirer"
     <PERSON-2> :=
         PER_NAME: "Donald Wright"
         PER_ALIAS: "Wright"
         PER_TITLE: "Mr."

  8. Example: One Use for a Template - Generating a Summary
     • From the completely filled version of the preceding template the LaSIE system generates the following natural language summary:
       BURNS FRY Ltd. named Donald Wright as executive vice president. Donald Wright resigned as president of Merrill Lynch Canada Inc.. Mark Kassirer left as president of BURNS FRY Ltd.
     • Producing summaries in other languages is relatively easy (compared to full machine translation).
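The step from a filled template to a short summary can be sketched as simple slot filling into fixed sentence patterns. This is a minimal illustration only, not LaSIE's generator: the class names, the `generate_summary` function, and the sentence patterns are all hypothetical, and only a subset of the MUC-6 slots is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InAndOut:
    person: str
    new_status: str                  # "IN" or "OUT" (acting variants omitted)
    other_org: Optional[str] = None  # organisation an incoming person came from


@dataclass
class SuccessionEvent:
    organization: str
    post: str
    in_and_out: List[InAndOut]


def _sentence(text: str) -> str:
    """Terminate a sentence without doubling a period after 'Ltd.' or 'Inc.'."""
    return text if text.endswith(".") else text + "."


def generate_summary(event: SuccessionEvent) -> str:
    """Fill fixed sentence patterns (hypothetical) from the event's slots."""
    sentences = []
    for io in event.in_and_out:
        if io.new_status == "IN":
            sentences.append(_sentence(
                f"{event.organization} named {io.person} as {event.post}"))
            if io.other_org:
                sentences.append(_sentence(f"{io.person} left {io.other_org}"))
        elif io.new_status == "OUT":
            sentences.append(_sentence(
                f"{io.person} left as {event.post} of {event.organization}"))
    return " ".join(sentences)


event = SuccessionEvent(
    organization="Burns Fry Ltd.",
    post="executive vice president",
    in_and_out=[
        InAndOut("Donald Wright", "IN", other_org="Merrill Lynch Canada Inc."),
        InAndOut("Mark Kassirer", "OUT"),
    ],
)
print(generate_summary(event))
```

Because the patterns are the only language-specific part, swapping them for patterns in another language leaves the extraction side untouched, which is one way to read the slide's remark about other languages.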

  9. Approach 2: Coreference Chains
     • To generate
       • generic
       • informative
       • extracts
       we have used coreference chains

  10. Approach 2: Coreference Chains (cont)
     • Background:
       • Morris and Hirst (’94) investigated lexical chains - chains of lexically-related words in a text that serve to make texts cohere
       • Barzilay + Elhadad (’97) suggested using lexical chains as a basis for selecting sentences to form a summary - rank chains based on number of links + extent over text
       • Halliday and Hasan (’76) proposed coreference as another major factor contributing to coherence of NL texts
     • Idea:
       • Explore use of coreference chains to produce summaries

  11. Approach 2: Coreference Chains (cont)
     • Technique
       • Use LaSIE to carry out discourse analysis of text, including coreference resolution
       • Extract all coreference chains
       • Rank chains by a metric which counts chain length + extent + starting point
       • Intuition: entities which occur most frequently and most widely in a text are those which the text is most “about”
       • Depending on desired summary length, select m sentences from top n chains
     • Details in Azzam, Humphreys and Gaizauskas ’99
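The ranking and selection steps above can be sketched as follows, assuming coreference resolution has already produced each chain as a list of sentence indices. The scoring formula is an assumption made for illustration (an unweighted sum of length, extent, and earliness); it is not the metric from Azzam, Humphreys and Gaizauskas ’99, and the names `chain_score` and `summarise` are hypothetical.

```python
from typing import List


def chain_score(mention_sentences: List[int], n_sentences: int) -> float:
    """Score a chain by length + extent + starting point (illustrative formula)."""
    length = len(mention_sentences)               # number of mentions
    first, last = min(mention_sentences), max(mention_sentences)
    extent = (last - first + 1) / n_sentences     # fraction of the text spanned
    earliness = 1.0 - first / n_sentences         # earlier start scores higher
    return length + extent + earliness


def summarise(sentences: List[str], chains: List[List[int]],
              top_n: int = 1, max_sentences: int = 2) -> List[str]:
    """Take up to max_sentences sentences from the top_n ranked chains."""
    n = len(sentences)
    ranked = sorted(chains, key=lambda c: chain_score(c, n), reverse=True)
    chosen: List[int] = []
    for chain in ranked[:top_n]:
        for idx in sorted(set(chain)):
            if idx not in chosen and len(chosen) < max_sentences:
                chosen.append(idx)
    # Emit selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Returning the sentences in document order rather than rank order is a design choice here; it tends to keep an extract readable.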

  12. Approach 3: Statistical
     • To generate
       • generic
       • indicative
       • extracts
       we have used a statistical approach based on a set of factors

  13. Approach 3: Statistical (cont)
     • Factors which have been examined in selecting sentences for inclusion in extractive summaries include:
       • number of content words shared with title/headings (T)
       • presence of “cue words” (C)
       • location of sentence in text (L)
       • number of content words discriminative of current text as opposed to corpus of texts from which it is drawn, using, e.g. tf-idf measure (K)
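One way the four factors could be computed per sentence is sketched below. Everything here is an assumption for illustration: the stopword and cue-word lists are tiny placeholders, the tokeniser is a whitespace split, L is taken as a simple "earlier is better" score, and K sums document-level tf-idf weights over the distinct content words of the sentence.

```python
import math
from collections import Counter
from typing import Dict, List

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}
CUE_WORDS = {"significant", "conclusion", "summary"}  # hypothetical cue list


def content_words(text: str) -> List[str]:
    """Lowercase, strip surrounding punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def tfidf_weights(document: str, corpus: List[str]) -> Dict[str, float]:
    """tf-idf of each content word of `document` relative to `corpus`."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(content_words(doc)))
    tf = Counter(content_words(document))
    return {w: count * math.log(n / df[w]) for w, count in tf.items() if df[w]}


def sentence_factors(sentence: str, position: int, n_sentences: int,
                     title: str, weights: Dict[str, float]) -> Dict[str, float]:
    """Compute the T, C, L and K factors for one sentence."""
    words = content_words(sentence)
    return {
        "T": len(set(words) & set(content_words(title))),   # title overlap
        "C": sum(1 for w in words if w in CUE_WORDS),       # cue words present
        "L": 1.0 - position / n_sentences,                  # earlier is higher
        "K": sum(weights.get(w, 0.0) for w in set(words)),  # tf-idf mass
    }
```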

  14. Approach 3: Statistical (cont)
     • Assign a weight to each sentence according to a weighted linear combination of these factors
     • Learn weights to optimise sentence selection as measured against a corpus of extracts + texts
     • Select top ranked sentences up to desired summary length
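The scoring and selection steps can be sketched as below, assuming each sentence already has a dictionary of factor values keyed T, C, L and K. The weights are taken as given; learning them against a corpus of extracts is not shown, and the function names are hypothetical.

```python
from typing import Dict, List


def score(factors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of a sentence's factor values."""
    return sum(weights[name] * value for name, value in factors.items())


def select_sentences(sentences: List[str],
                     factor_rows: List[Dict[str, float]],
                     weights: Dict[str, float],
                     max_sentences: int) -> List[str]:
    """Keep the top-scoring sentences, returned in document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(factor_rows[i], weights),
                    reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


weights = {"T": 1.0, "C": 1.0, "L": 1.0, "K": 1.0}  # placeholder, not learned
rows = [
    {"T": 1, "C": 0, "L": 1.0, "K": 0.5},
    {"T": 0, "C": 0, "L": 0.5, "K": 0.1},
    {"T": 2, "C": 1, "L": 0.0, "K": 1.0},
]
print(select_sentences(["s0", "s1", "s2"], rows, weights, max_sentences=2))
```

With the placeholder weights the three sentences score 2.5, 0.6 and 4.0, so the first and third are kept.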
