
I256: Applied Natural Language Processing


Presentation Transcript


  1. I256: Applied Natural Language Processing Marti Hearst Oct 2, 2006

  2. Contents • Introduction and Applications • Types of summarization tasks • Basic paradigms • Single document summarization • Evaluation methods From lecture notes by Nachum Dershowitz & Dan Cohen

  3. Introduction • The problem – Information overload • 4 Billion URLs indexed by Google • 200 TB of data on the Web [Lyman and Varian 03] • Information is created every day in enormous amounts • One solution – summarization • Abstracts promote current awareness • save reading time • facilitate selection • facilitate literature searches • aid in the preparation of reviews • But what is an abstract?? From lecture notes by Nachum Dershowitz & Dan Cohen

  4. Introduction • abstract: • a brief but accurate representation of the contents of a document • goal: • take an information source, extract the most important content from it, and present it to the user in a condensed form and in a manner sensitive to the user’s needs • compression: • the ratio of the length of the summary to the length of the source, i.e., how much of the text to present (e.g., a 100-word summary of a 1,000-word document is 10% compression) From lecture notes by Nachum Dershowitz & Dan Cohen

  5. History • The problem has been addressed since the ’50s [Luhn 58] • Numerous methods are currently being suggested • Most methods still rely on algorithms from the ’50s–’70s • The problem is still hard, yet there are already some applications: • MS Word • www.newsinessence.com by Drago Radev’s research group From lecture notes by Nachum Dershowitz & Dan Cohen

  6. [image slide] From lecture notes by Nachum Dershowitz & Dan Cohen

  7. MS Word AutoSummarize From lecture notes by Nachum Dershowitz & Dan Cohen

  8. Applications • Abstracts for Scientific and other articles • News summarization (mostly multiple document summarization) • Classification of articles and other written data • Web pages for search engines • Web access from PDAs, Cell phones • Question answering and data gathering From lecture notes by Nachum Dershowitz & Dan Cohen

  9. Types of Summaries • Indicative vs Informative • Informative: a substitute for the entire document • Indicative: give an idea of what is there • Background • Does the reader have the needed prior knowledge? • Expert reader vs Novice reader • Query based or General • Query based – a query (or form to be filled) is given, and its questions should be answered • General – general-purpose summarization From lecture notes by Nachum Dershowitz & Dan Cohen

  10. Types of Summaries (input) • Single document vs multiple documents • Domain specific (chemistry) or general • Genre specific (newspaper items) or general From lecture notes by Nachum Dershowitz & Dan Cohen

  11. Types of Summaries (output) • extract vs abstract • Extracts – representative paragraphs/sentences/ phrases/words, fragments of the original text • Abstracts – a concise summary of the central subjects in the document. • Research shows that sometimes readers prefer Extracts! • language chosen for summarization • format of the resulting summary (table/paragraph/key words) From lecture notes by Nachum Dershowitz & Dan Cohen

  12. Methods • Quantitative heuristics, manually scored • Machine-learning based statistical scoring methods • Higher semantic/syntactic structures • Network (graph) based methods • Other methods (rhetorical analysis, lexical chains, co-reference chains) • AI methods From lecture notes by Nachum Dershowitz & Dan Cohen

  13. Quantitative Heuristics • General method: • score each entity (sentence, word); combine scores; choose best sentence(s) • Scoring techniques: • Word frequencies throughout the text (Luhn 58) • Position in the text (Edmundson 69, Lin & Hovy 97) • Title method (Edmundson 69) • Cue phrases in sentences (Edmundson 69) From lecture notes by Nachum Dershowitz & Dan Cohen

  14. Using Word Frequencies (Luhn 58) • Very first work in automated summarization • Assumptions: • Frequent words indicate the topic • Frequent means with reference to the corpus frequency • Clusters of frequent words indicate summarizing sentence • Stemming based on similar prefix characters • Very common words and very rare words are ignored From lecture notes by Nachum Dershowitz & Dan Cohen

  15. Ranked Word Frequency [figure: Zipf’s curve of word frequency vs. rank]

  16. Word frequencies (Luhn 58) • Find consecutive sequences of high-weight keywords • Allow a certain number of gaps of low-weight terms • Sentences with highest sum of cluster weights are chosen From lecture notes by Nachum Dershowitz & Dan Cohen
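
A minimal sketch of this Luhn-style scoring, assuming plain-text sentences and a crude regex tokenizer; the stopword list, frequency threshold, and gap size below are illustrative choices, not Luhn's original parameters:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "that", "for", "it"}

def significant_words(sentences, min_freq=2):
    """High-weight keywords: frequent in the document, not stopwords (illustrative thresholds)."""
    counts = Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))
    return {w for w, c in counts.items() if c >= min_freq and w not in STOPWORDS}

def luhn_score(sentence, sig_words, max_gap=4):
    """Score = (significant words in best cluster)^2 / cluster length."""
    words = re.findall(r"[a-z]+", sentence.lower())
    flags = [w in sig_words for w in words]
    best = 0.0
    i = 0
    while i < len(flags):
        if not flags[i]:
            i += 1
            continue
        # Grow a cluster: allow up to max_gap low-weight words between significant ones.
        last_sig, n_sig = i, 1
        j = i + 1
        while j < len(flags) and j - last_sig <= max_gap:
            if flags[j]:
                last_sig = j
                n_sig += 1
            j += 1
        length = last_sig - i + 1
        best = max(best, n_sig ** 2 / length)
        i = last_sig + 1
    return best

def luhn_summary(sentences, n=3):
    """Return the n sentences with the highest cluster score."""
    sig = significant_words(sentences)
    return sorted(sentences, key=lambda s: luhn_score(s, sig), reverse=True)[:n]
```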

  17. Position in the text (Edmundson 69) • Claim: important sentences occur in specific positions • “lead-based” summary • inverse of position in the document works well for “news” text • Important information occurs in specific sections of the document (introduction/conclusion) From lecture notes by Nachum Dershowitz & Dan Cohen
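
A tiny position scorer in the same spirit (inverse of position plus a lead bonus); the exact weights are an assumption for illustration, not from the paper:

```python
def position_score(index, lead_bonus=3):
    """Earlier sentences score higher (0-based index); the first few get a 'lead' boost.

    The weights are illustrative, not Edmundson's.
    """
    score = 1.0 / (index + 1)      # inverse of position in the document
    if index < lead_bonus:         # lead-based bump, useful for news text
        score += 0.5
    return score

# e.g. position_score(0) > position_score(10)
```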

  18. Title method (Edmundson 69) • Claim: the title of a document indicates its content • Unless editors are being cute • Not usually true for novels • What about blogs …? • Words in the title help find relevant content • Create a list of title words, remove “stop words” • Use those as keywords to find important sentences (for example, with Luhn’s method) From lecture notes by Nachum Dershowitz & Dan Cohen
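
A possible rendering of the title method as code; the stopword list and tokenizer are placeholders:

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}

def title_keywords(title):
    """Title words minus stopwords, as described above."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}

def title_score(sentence, title_words):
    """Number of title keywords that appear in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(words & title_words)

# e.g. title_score("We describe a trainable document summarizer.",
#                  title_keywords("A Trainable Document Summarizer"))  # -> 3
```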

  19. Cue phrases method (Edmundson 69) • Claim: important sentences contain cue words/indicative phrases • “The main aim of the present paper is to describe…” (IND) • “The purpose of this article is to review…” (IND) • “In this report, we outline…” (IND) • “Our investigation has shown that…” (INF) • Some words are considered bonus, others stigma • bonus: comparatives, superlatives, conclusive expressions, etc. • stigma: negatives, pronouns, etc. From lecture notes by Nachum Dershowitz & Dan Cohen
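
A sketch of cue-phrase scoring with hand-picked bonus/stigma lists; the word lists and weights below are made up for illustration:

```python
BONUS = {"significant", "main", "purpose", "conclude", "best", "important", "show"}
STIGMA = {"hardly", "impossible", "he", "she", "it", "they"}   # negatives, pronouns
CUE_PHRASES = ["the main aim", "the purpose of this", "in this report", "we outline"]

def cue_score(sentence, bonus_w=1.0, stigma_w=-1.0, phrase_w=2.0):
    """Add bonus_w per bonus word, stigma_w per stigma word, phrase_w per cue phrase."""
    text = sentence.lower()
    words = text.split()
    score = sum(bonus_w for w in words if w in BONUS)
    score += sum(stigma_w for w in words if w in STIGMA)
    score += sum(phrase_w for p in CUE_PHRASES if p in text)
    return score
```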

  20. Feature combination (Edmundson ’69) • Linear contribution of 4 features • title, cue, keyword, position • the weights are adjusted using training data with any minimization technique • Evaluated on a corpus of 200 chemistry articles • Length ranged from 100 to 3900 words • Judges were told to extract 25% of the sentences, to maximize coherence, minimize redundancy. • Features • Position (sensitive to types of headings for sections) • cue • title • keyword • Best results obtained with: • cue + title + position From lecture notes by Nachum Dershowitz & Dan Cohen
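
The combination itself is just a weighted sum of the four per-sentence scores. A sketch, reusing the illustrative functions from the previous slides' examples (cue_score, title_score, luhn_score, position_score, title_keywords, significant_words); the weights are placeholders that would normally be fit on training data as the slide describes:

```python
def edmundson_score(sentence, index, title_words, sig_words,
                    w_cue=1.0, w_title=1.0, w_key=1.0, w_pos=1.0):
    """Weighted sum of cue, title, keyword, and position scores (weights are placeholders)."""
    return (w_cue * cue_score(sentence)
            + w_title * title_score(sentence, title_words)
            + w_key * luhn_score(sentence, sig_words)
            + w_pos * position_score(index))

def extract(sentences, title, n=5):
    """Rank sentences by the combined score and return the top n in document order."""
    title_words = title_keywords(title)
    sig_words = significant_words(sentences)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: edmundson_score(sentences[i], i, title_words, sig_words),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]   # keep original order
```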

  21. Statistical learning method: Bayesian Classifier (Kupiec et al. 95) • Corpus: 188 document + summary pairs from scientific journals • Training label (binary): whether a sentence is included in the manual extract • Feature set: • sentence length: |S| > 5 • fixed phrases: 26 manually chosen • paragraph: sentence position in paragraph • thematic words • uppercase words: not common acronyms From lecture notes by Nachum Dershowitz & Dan Cohen

  22. Bayesian Classifier (Kupiec et al. 95) • Uses a Bayesian classifier over the sentence features F_1, …, F_k, where S is the set of summary sentences: P(s ∈ S | F_1, …, F_k) = P(F_1, …, F_k | s ∈ S) · P(s ∈ S) / P(F_1, …, F_k) • Assuming statistical independence of the features: P(s ∈ S | F_1, …, F_k) = P(s ∈ S) · ∏_j P(F_j | s ∈ S) / ∏_j P(F_j) From lecture notes by Nachum Dershowitz & Dan Cohen

  23. Bayesian Classifier (Kupiec et al. 95) • Each probability is estimated empirically from a corpus • Higher-probability sentences are chosen for the summary • Performance: • For 25% summaries, 84% precision From lecture notes by Nachum Dershowitz & Dan Cohen
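
A minimal sketch of this naive-Bayes scoring, assuming each sentence has already been mapped to a tuple of discrete feature values and that training data is a list of (features, in_summary) pairs; the function names and add-one smoothing are my assumptions, not from the paper:

```python
from collections import Counter, defaultdict

def train(labelled):
    """labelled: list of (feature_tuple, in_summary) pairs from document/extract pairs.

    Returns empirical estimates of P(s in S), P(F_j = v | s in S), and P(F_j = v).
    """
    n = len(labelled)
    n_pos = sum(1 for _, y in labelled if y)
    p_s = n_pos / n

    cond = defaultdict(Counter)   # cond[j][v]: count of F_j = v among summary sentences
    marg = defaultdict(Counter)   # marg[j][v]: count of F_j = v among all sentences
    for feats, y in labelled:
        for j, v in enumerate(feats):
            marg[j][v] += 1
            if y:
                cond[j][v] += 1

    def p_fj_given_s(j, v):
        return (cond[j][v] + 1) / (n_pos + len(marg[j]))   # add-one smoothing (assumption)

    def p_fj(j, v):
        return (marg[j][v] + 1) / (n + len(marg[j]))

    return p_s, p_fj_given_s, p_fj

def score(feats, p_s, p_fj_given_s, p_fj):
    """P(s in S | F_1..F_k) under the independence assumption on the previous slide."""
    prob = p_s
    for j, v in enumerate(feats):
        prob *= p_fj_given_s(j, v) / p_fj(j, v)
    return prob
```

Ranking sentences by this score and keeping the top fraction (e.g., 25%) yields the extract described on the slide.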

  24. Evaluation methods • When a manual summary is available: 1. choose a granularity (clause; sentence; paragraph), 2. create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match), 3. measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary, 4. measure Recall and Precision. • Otherwise: 1. Intrinsic – how good is the summary as a summary? 2. Extrinsic – how well does the summary help the user? From lecture notes by Nachum Dershowitz & Dan Cohen
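
For the sentence-granularity, exact-match case, step 4 reduces to set overlap; a minimal sketch (the unit representation is whatever strings you choose to pass in):

```python
def precision_recall(system_units, manual_units):
    """Exact-match precision/recall of extracted units against the manual summary."""
    system, manual = set(system_units), set(manual_units)
    overlap = len(system & manual)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(manual) if manual else 0.0
    return precision, recall

# e.g. precision_recall(["s1", "s4", "s7"], ["s1", "s2", "s4"])  # -> (0.67, 0.67)
```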

  25. Intrinsic measures • Intrinsic measures (glass-box): how good is the summary as a summary? • Problem: how do you measure the goodness of a summary? • Studies: compare to an ideal summary (Edmundson, 69; Kupiec et al., 95; Salton et al., 97; Marcu, 97) or score against criteria such as fluency, informativeness, and coverage (Brandow et al., 95) • The summary is evaluated on its own or by comparing it with the source • Is the text cohesive and coherent? • Does it contain the main topics of the document? • Are important topics omitted? From lecture notes by Nachum Dershowitz & Dan Cohen

  26. Extrinsic measures • Extrinsic measures (black-box): how well does the summary help a user with a task? • Problem: does summary quality correlate with task performance? • Studies: GMAT tests (Morris et al., 92); news analysis (Miike et al., 94); IR (Mani and Bloedorn, 97); text categorization (SUMMAC 98; Sundheim, 98) • Evaluation in a specific task • Can the summary be used instead of the document? • Can the document be classified by reading the summary? • Can we answer questions by reading the summary? From lecture notes by Nachum Dershowitz & Dan Cohen

  27. The Document Understanding Conference (DUC) • This is really the Text Summarization Competition • Started in 2001 • Task and Evaluation (for 2001-2004): • Various target sizes were used (10-400 words) • Both single and multiple-document summaries assessed • Summaries were manually judged for both content and readability • Each peer (human or automatic) summary was compared against a single model summary • using SEE (http://www.isi.edu/~cyl/SEE/) • estimates the percentage of information in the model that was covered in the peer • Also used ROUGE (Lin ’04) in 2004 • Recall-Oriented Understudy for Gisting Evaluation • Uses counts of n-gram overlap between candidate and gold-standard summary; assumes fixed-length summaries
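
A rough sketch of the ROUGE-N idea (n-gram recall against a gold-standard summary); the real ROUGE toolkit adds stemming, stopword handling, and multiple references, so treat this as illustrative only:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=2):
    """Clipped n-gram overlap divided by the reference n-gram count (recall-oriented)."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / max(1, sum(ref.values()))
```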

  28. The Document Understanding Conference (DUC) • Made a big change in 2005 • Extrinsic evaluation proposed but rejected (write a natural disaster summary) • Instead: a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions as posed in a DUC topic. • Also indicated a desired granularity of information

  29. The Document Understanding Conference (DUC) • Evaluation metrics for new task: • Grammaticality • Non-redundancy • Referential clarity • Focus • Structure and Coherence • Responsiveness (content-based evaluation) • This was a difficult task to do well in.

  30. Let’s make a summarizer! • Each person (or pair) writes code for one small part of the problem, using Kupiec et al.’s method • We’ll combine the parts in class
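
One way the pieces might fit together, following the Kupiec et al. recipe; every name below is a placeholder for a part someone writes (featurize is a stub, and train/score are assumed to be the earlier naive-Bayes sketch), not the assignment's required interface:

```python
def featurize(sentence, index):
    """Placeholder: each person/pair contributes one feature (length, cue, position, ...)."""
    return (
        len(sentence.split()) > 5,                                # sentence length cut-off
        index == 0,                                               # crude position feature
        any(len(w) > 1 and w.isupper() for w in sentence.split()) # stand-in uppercase feature
    )

def summarize(document_sentences, model, extract_fraction=0.25):
    """Kupiec-style pipeline: featurize each sentence, score it, keep the top fraction."""
    p_s, p_fj_given_s, p_fj = model                      # returned by the earlier train() sketch
    feats = [featurize(s, i) for i, s in enumerate(document_sentences)]
    scores = [score(f, p_s, p_fj_given_s, p_fj) for f in feats]
    k = max(1, int(extract_fraction * len(document_sentences)))
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [document_sentences[i] for i in sorted(top)]  # keep document order
```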

  31. Next Time • More on Bayesian classification • Other summarization approaches (Marcu paper) • Multi-document summarization (Goldstein et al. paper) • In-class summarizer!
