
Automatic Summarization

Student: David Kent. Professor: Dr. Rakesh Verma.


Presentation Transcript


  1. Automatic Summarization Student: David Kent Professor: Dr. Rakesh Verma

  2. Types of Summarization • Single Document • Multi-Document • Corpus-Based • Update Summarization Motivation • Late for a meeting • Paying by the letter • War

  3. Human Summary Strategies • Low-Level: Deletion, Copying • Mid-Level: List substitution, Paragraph summary • Higher-Level: Multi-paragraph abstraction, Summary structure

  4. Problems • Understanding • Machine Learning • Mimicking understanding • Bag of Words Model • Vector Analysis and Clustering • TextRank • Partial Linguistic Understanding • WordNet • FrameNet
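
To make the "mimicking understanding" items on the slide above concrete, here is a minimal sketch of a TextRank-style sentence ranker built on a bag-of-words model. It is an illustration only, not the presenters' code. The function names are hypothetical, and cosine similarity is an assumed stand-in for the edge weight (the original TextRank formulation uses a normalized word-overlap measure).

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    # Edge weight between two different sentences is their cosine similarity.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # A sentence with no similar neighbours passes no score on.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

Sentences that share vocabulary with many other sentences end up with higher scores, so an extractive summary can be formed by keeping the top-scoring ones. No meaning is modelled: two sentences are "similar" only because they reuse the same word forms.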

  5. Simple Metrics • Titles and headings • Sentence location • Cue Words/Key Words • Term Frequency-Inverse Document Frequency (TF-IDF) Less Simple Metrics • Vector Analysis • Clustering of words and/or sentences • Use of Lexical Databases (WordNet, FrameNet, etc.)
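
The term frequency and inverse document frequency metric from the slide above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the weighting variant (relative term frequency times the natural log of N over document frequency) is one common choice among several, and the function names `tokenize`, `tf_idf`, and `score_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document in the collection."""
    counts = [Counter(tokenize(d)) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return weights


def score_sentences(sentences, term_weights):
    """Score each sentence by the summed TF-IDF weight of its tokens."""
    return [
        sum(term_weights.get(t, 0.0) for t in tokenize(s))
        for s in sentences
    ]
```

A term that occurs in every document gets weight zero because log(N/N) = 0, which is how the metric discounts words such as "the" without a stop-word list. Note that the inverse document frequency part needs a reference collection of documents.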

  6. Our Technique • Extraction-based • Corpus-free with a mélange of low to mid-level lexical techniques • Relationship to Headings • Sample Summary:
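
The slide above gives no implementation details, so the following is only a guess at what an extraction-based, corpus-free scorer driven by the relationship to headings might look like. Everything here is an assumption made for illustration: the stop-word list, the plain word-overlap score, and the names `content_words` and `extract_summary`.

```python
import re

# A tiny illustrative stop-word list (assumption, not from the slides).
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "for"})


def content_words(text):
    """Lowercased words of the text with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def extract_summary(heading, sentences, k=2):
    """Keep the k sentences sharing the most words with the heading."""
    heading_words = content_words(heading)

    def overlap(sentence):
        return len(heading_words & content_words(sentence))

    # Rank by overlap, breaking ties in favour of earlier sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: (-overlap(sentences[i]), i))
    # Emit the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Because the output consists of sentences copied verbatim from the input, this is extraction and not abstraction, and because it uses only the document and its heading, it needs no external corpus.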

  7. WordNet • Developed at Princeton, hand-built by lexicographers • Every word defined by text, part of speech, and a sense number. • Basic unit of organization: Synonym Set (Synset) • Four forests: Nouns, Verbs, Adjectives, and Adverbs
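
The organization described on the slide above can be pictured with a small data-structure sketch. The entries below are hand-written toy data for illustration, not real WordNet records, and the sense numbers are not claimed to match WordNet's. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Synset:
    """A set of synonymous lemmas sharing one meaning."""
    pos: str                              # part of speech, e.g. "noun"
    lemmas: tuple                         # synonymous word forms
    gloss: str                            # textual definition
    hypernym: Optional["Synset"] = None   # parent in the hierarchy


def hypernym_path(synset):
    """Walk upward from a synset to the root of its tree."""
    path = []
    while synset is not None:
        path.append(synset.lemmas[0])
        synset = synset.hypernym
    return path


# Toy noun hierarchy (illustrative only).
animal = Synset("noun", ("animal",), "a living creature")
canine = Synset("noun", ("canine", "canid"), "a dog-like mammal", hypernym=animal)
dog = Synset("noun", ("dog", "domestic dog"), "a domesticated canine", hypernym=canine)
pursue = Synset("verb", ("chase", "dog", "trail"), "to go after someone")

# A word sense is identified by text, part of speech, and sense number.
senses = {
    ("dog", "noun", 1): dog,
    ("dog", "verb", 1): pursue,
}
```

Looking up `senses[("dog", "noun", 1)]` and calling `hypernym_path` on the result walks from the specific synset toward the root of the noun hierarchy. The same word form can appear in separate hierarchies, one per part of speech, which is the sense in which the slide speaks of four forests.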

  8. Parsing Text • Stanford Part-of-Speech Tagger: determines whether a word is a noun, verb, adjective, or adverb. • Stanford Named Entity Recognizer: tags names, locations, and organizations (proper nouns). • SenseLearner: determines which WordNet sense is most appropriate.
