Course overview introduction to summarization

Lecture 1

Instructor: Ani Nenkova, 505 Levine, [email protected]
Office hours: Tuesdays 3:15 to 4:15, or by appointment
TA: Annie Louis, [email protected]



Textbook

  • No required text

    • Slides/lecture notes and handouts will be given in class

  • Recommended

    • Speech and Language Processing (second edition, 2007, Prentice-Hall), by Daniel Jurafsky and James Martin

  • Also see

    • Christopher Manning and Hinrich Schütze, “Foundations of Statistical Natural Language Processing”

    • “Advances in Automatic Text Summarization”, edited by Inderjeet Mani and Mark T. Maybury


Grading

  • 5 homeworks (65%)

    • One will be a literature overview assignment

    • One will be at the end of the semester, instead of a final

  • You are encouraged to form teams for the homework (programming) assignments, but all write-ups should be individual

  • Midterm (20%)

  • Class participation (15%)

    • “Submit” 5 questions each week


Late submission policy

  • 5 late days for the semester

    • Can be used for any assignment with no penalty

  • Late submissions after “late days” have been used up will not be graded


What you will learn

  • A lot about summarization and natural language techniques used in summarization

  • Tools and resources

    • Part-of-speech and named entity taggers, parsers, WordNet, WEKA



  • Reading scientific articles

    • Part of the assigned readings

    • Useful skill, regardless of your future job plans

  • Improving writing skills

    • Immensely useful, regardless of your future job plans

    • The literature overview assignment will focus on this, but in other assignments the way you describe your work will also be evaluated



Columbia Newsblaster

  • The academic version


What is the input?

  • News, or clusters of news

    • a single article or several articles on a related topic

  • Email and email threads

  • Scientific articles

  • Health information: patients and doctors

  • Meeting summarization

  • Video


What is the output?

  • Keywords

  • Highlight information in the input

  • Chunks of text or speech taken directly from the input, or a paraphrase that aggregates the input in novel ways

  • Modality: text, speech, video, graphics


Ideal stages of summarization

  • Analysis

    • Input representation and understanding

  • Transformation

    • Selecting important content

  • Realization

    • Generating novel text corresponding to the gist of the input


Most current systems

  • Use shallow analysis methods

    • Rather than full understanding

  • Work by sentence selection

    • Identify important sentences and piece them together to form a summary


Data-driven approaches

  • Rely on features of the input documents that can be easily computed by statistical analysis

  • Word statistics

  • Cue phrases

  • Section headers

  • Sentence position
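The four feature types listed above could be computed roughly as follows. This is a minimal sketch, not the lecture's own method: the function names, the cue-phrase list, and the choice of average word probability as the "word statistics" feature are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical cue phrases; a real system would use a curated list.
CUE_PHRASES = ("in conclusion", "in summary", "significantly")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_features(sentences, header_words=()):
    """Return one feature dictionary per sentence of a single document."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    headers = {w.lower() for w in header_words}
    rows = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        avg_prob = sum(counts[w] / total for w in words) / len(words) if words else 0.0
        rows.append({
            # Word statistics: mean probability of the sentence's words.
            "avg_word_prob": avg_prob,
            # Cue phrases: does the sentence contain a known cue phrase?
            "has_cue_phrase": any(c in sentence.lower() for c in CUE_PHRASES),
            # Section headers: how many header words the sentence repeats.
            "header_overlap": len(headers & set(words)),
            # Sentence position: 0.0 for the first sentence, 1.0 for the last.
            "position": i / max(len(sentences) - 1, 1),
        })
    return rows
```

Each feature is cheap to compute and needs no parsing or external knowledge, which is what makes this family of approaches "shallow".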


Knowledge-based systems

  • Use more sophisticated natural language processing

  • Discourse information

    • Resolve anaphora, text structure

  • Use external lexical resources

    • WordNet, adjective polarity lists, opinion

  • Using machine learning


What are summaries useful for?

  • Relevance judgments

    • Does this document contain information I am interested in?

    • Is this document worth reading?

  • Save time

  • Reduce the need to consult the full document


Multi-document summarization

  • Very useful for presenting and organizing search results

    • Many results are very similar, and grouping closely related documents helps cover more event facets

    • Summarizing similarities and differences between documents


Scientific article summarization

  • Not only what the article is about, but also how it relates to work it cites

  • Determine which approaches are criticized and which are supported

    • Automatic genre-specific summaries are more useful than original paper abstracts


Other uses

  • Document indexing for information retrieval

  • Automatic essay grading, topic identification module



Frequency as indicator of importance

  • The topic of a document will be repeated many times

  • In multi-document summarization, important content is repeated in different sources


Greedy frequency method

  • Compute word probability from input

  • Compute sentence weight as function of word probability

  • Pick best sentence
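The three steps above can be sketched as follows. The slide only says that sentence weight is "a function of word probability"; taking the average probability of the words in the sentence is an assumption made here, and the function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_probabilities(sentences):
    """Step 1: estimate p(w) as count(w) / total number of tokens in the input."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sentence_weight(sentence, probs):
    """Step 2: weight a sentence by the average probability of its words (assumed)."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(probs.get(w, 0.0) for w in words) / len(words)


def pick_best_sentence(sentences):
    """Step 3: return the sentence with the highest weight."""
    probs = word_probabilities(sentences)
    return max(sentences, key=lambda s: sentence_weight(s, probs))
```

Repeating step 3 to build a longer summary tends to select sentences that say the same thing, because the same frequent words keep scoring highly.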


How to deal with redundancy?

Author JK Rowling has won her legal battle in a New York court to get an unofficial Harry Potter encyclopaedia banned from publication.

A U.S. federal judge in Manhattan has sided with author J.K. Rowling and ruled against the publication of a Harry Potter encyclopedia created by a fan of the book series.

  • Shallow techniques not likely to work well
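One way to see why shallow techniques struggle here is to measure word overlap between the two sentences above. The sketch below uses Jaccard similarity over word sets, which is one common shallow measure chosen for illustration, not necessarily the one the lecture had in mind.

```python
import re


def word_set(sentence):
    """Return the set of lowercase alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


s1 = ("Author JK Rowling has won her legal battle in a New York court to get "
      "an unofficial Harry Potter encyclopaedia banned from publication.")
s2 = ("A U.S. federal judge in Manhattan has sided with author J.K. Rowling "
      "and ruled against the publication of a Harry Potter encyclopedia "
      "created by a fan of the book series.")

overlap = jaccard(s1, s2)       # low, although both sentences report the same event
shared = word_set(s1) & word_set(s2)
```

The two sentences share only a handful of words. Spelling variants ("encyclopaedia" vs. "encyclopedia", "JK" vs. "J.K.") and paraphrases ("won her legal battle" vs. "sided with") do not match at the word level, so a surface overlap threshold would be unlikely to flag the pair as redundant.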


Global optimization for content selection

  • What is the best summary? vs What is the best sentence?

  • Form all summaries and choose the best

    • What is the problem with this approach?
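A likely answer to the question above is cost: the number of candidate summaries grows combinatorially with the input size. The sketch below is a brute-force search over all fixed-size subsets; the `score` function is a hypothetical placeholder for whatever measure of summary quality is used.

```python
from itertools import combinations
from math import comb


def best_summary(sentences, k, score):
    """Score every k-sentence subset and return the highest-scoring one."""
    return max(combinations(sentences, k), key=score)


def candidate_count(n, k):
    """Number of distinct k-sentence summaries that n input sentences allow."""
    return comb(n, k)
```

For an input of 100 sentences and a 5-sentence summary, `candidate_count(100, 5)` is 75,287,520, and every one of those candidates would have to be scored. This is why practical systems fall back on greedy selection or approximate optimization.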


Sentence clustering for theme identification

1. PAL was devastated by a pilots' strike in June and by the region's currency crisis.

2. In June, PAL was embroiled in a crippling three-week pilots' strike.

3. Tan wants to retain the 200 pilots because they stood by him when the majority of PAL's pilots staged a devastating strike in June.
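A simple way to group sentences like these into a theme is sketched below. It uses a one-pass greedy grouping with Jaccard similarity over content words; the stopword list, the 0.2 threshold, and the fourth (unrelated) sentence are all assumptions added for illustration, and real systems use richer similarity measures and clustering algorithms.

```python
import re

# Small illustrative stopword list (an assumption, not from the lecture).
STOPWORDS = {"a", "the", "in", "by", "and", "was", "to", "of", "s",
             "they", "him", "when", "because", "on"}


def content_words(sentence):
    """Lowercase tokens of the sentence with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower())) - STOPWORDS


def similarity(a, b):
    """Jaccard similarity over content words."""
    wa, wb = content_words(a), content_words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold=0.2):
    """Add each sentence to the first cluster holding a similar sentence,
    or start a new cluster. Returns lists of sentence indices."""
    clusters = []
    for i, sentence in enumerate(sentences):
        for cluster in clusters:
            if any(similarity(sentence, sentences[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


sentences = [
    "PAL was devastated by a pilots' strike in June and by the region's currency crisis.",
    "In June, PAL was embroiled in a crippling three-week pilots' strike.",
    "Tan wants to retain the 200 pilots because they stood by him when the majority "
    "of PAL's pilots staged a devastating strike in June.",
    "The weather in Manila was mild on Sunday.",  # hypothetical unrelated sentence
]
themes = cluster_sentences(sentences)
```

With these settings the three PAL sentences fall into one cluster because they share "PAL", "pilots", "strike", and "June", while the unrelated sentence forms its own cluster. One sentence per cluster can then represent the theme in the summary.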




Using graph representations

  • Nodes

    • Sentences

    • Discourse entities

  • Arcs

    • Between similar sentences

    • Between related entities
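A sentence graph of the kind described above might be built as follows. This sketch covers only sentence nodes with arcs between similar sentences; discourse-entity nodes are left out. The similarity measure, the 0.3 threshold, and ranking by number of neighbours are assumptions, since the slide does not say how the graph is used.

```python
import re


def words(sentence):
    """Return the set of lowercase alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Jaccard similarity over word sets."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def build_sentence_graph(sentences, threshold=0.3):
    """Nodes are sentence indices; an undirected arc joins two sentences
    whose similarity reaches the threshold."""
    graph = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if similarity(sentences[i], sentences[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def most_connected(graph):
    """Return the node with the most neighbours (a crude centrality measure)."""
    return max(graph, key=lambda node: len(graph[node]))
```

A sentence that is similar to many others is a plausible candidate for the summary, since it expresses content repeated across the input.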


Using machine learning

  • Ask people to select sentences

  • Use these as training examples for machine learning

    • Each sentence is represented as a number of features

    • Based on the features distinguish sentences that are appropriate for a summary and sentences that are not

  • Run on new inputs
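The pipeline above can be sketched with a tiny perceptron written from scratch. The two features (first-sentence indicator and normalised length), the perceptron itself, and the toy training data are all assumptions for illustration; the lecture does not name a particular learner, and a toolkit such as WEKA would normally be used instead.

```python
def features(sentence, index):
    """Represent a sentence as a small feature vector."""
    n_words = len(sentence.split())
    return [
        1.0,                          # bias term
        1.0 if index == 0 else 0.0,   # is this the first sentence?
        min(n_words, 30) / 30.0,      # length, capped and scaled to [0, 1]
    ]


def score(weights, x):
    """Dot product of the weight vector and a feature vector."""
    return sum(w * v for w, v in zip(weights, x))


def train_perceptron(examples, epochs=20):
    """examples: list of (feature_vector, label), label 1 = in summary, 0 = not."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            predicted = 1 if score(weights, x) > 0 else 0
            # Update only on mistakes: (y - predicted) is 0 when correct.
            for j, v in enumerate(x):
                weights[j] += (y - predicted) * v
    return weights


def predict(weights, sentence, index):
    """Label a new sentence: 1 = include in the summary, 0 = leave out."""
    return 1 if score(weights, features(sentence, index)) > 0 else 0


# Hypothetical human judgments for one training document.
training_doc = ["A b c d e f", "x", "y z"]
human_labels = [1, 0, 0]
examples = [(features(s, i), y)
            for i, (s, y) in enumerate(zip(training_doc, human_labels))]
weights = train_perceptron(examples)
```

On a new document, `predict(weights, sentence, index)` is called for each sentence and the sentences labelled 1 form the extract.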


Information ordering

  • In what order to present the selected sentences?

    • An article with permuted sentences will not be easy to understand

  • Very important for multi-document summarization

    • Sentences coming from different documents


Automatic summary edits

  • Some expressions might not be appropriate in the new context

    • References:

      • he

      • Putin

      • Russian Prime Minister Vladimir Putin

  • Discourse connectives

    • However, moreover, subsequently

  • Requires more sophisticated NLP techniques


    Before

    Pinochet was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.


    After

    Gen. Augusto Pinochet, the former Chilean dictator, was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.


    Before

    Turkey has been trying to form a new government since a coalition government led by Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. Demirel consulted Turkey's party leaders immediately after Ecevit gave up.


    After

    Turkey has been trying to form a new government since a coalition government led by Prime Minister Mesut Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Premier-designate Bulent Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. President Suleyman Demirel consulted Turkey's party leaders immediately after Ecevit gave up.
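The Before/After pairs above differ only in the first mention of each person, which is expanded to a full description. A minimal sketch of that rewrite is shown below. It assumes the short and full forms are already known (the `full_forms` mapping is hypothetical input), whereas a real system would have to find them with coreference resolution.

```python
def expand_first_mentions(text, full_forms):
    """Replace the first occurrence of each short name with its full form.

    full_forms maps a short reference (e.g. a surname) to the descriptive
    form wanted at first mention. Later mentions are left unchanged.
    Plain string matching is used, so this assumes no full form contains
    another entry's short name.
    """
    for short, full in full_forms.items():
        text = text.replace(short, full, 1)
    return text


before = ("Ecevit must now try to build a government. "
          "Demirel consulted Turkey's party leaders immediately after Ecevit gave up.")
after = expand_first_mentions(before, {
    "Ecevit": "Premier-designate Bulent Ecevit",
    "Demirel": "President Suleyman Demirel",
})
```

Here only the first "Ecevit" and the first "Demirel" are expanded; the second "Ecevit" stays short, matching the pattern in the slides.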

