A Summarization Journey

Search and Information Extraction Lab

IIIT Hyderabad

Information Overload

  • Explosive growth of information on the web
  • Failure of information retrieval systems to satisfy the user’s information need
  • Need for sophisticated information access solutions



A summary is a condensed version of the source document(s) with a recognizable genre; its purpose is to give the reader an exact and concise idea of the contents of the source.

Flavors of Summarization

  • Query Focused
  • Opinion/Sentiment
  • Single document

Extract vs. Abstract

An extract is a summary consisting entirely of material from the input text.

An abstract is a summary at least some of whose material is not present in the input, e.g., paraphrases of content or subject categories.

Towards Abstraction

  • Guided Summarization
  • Code Summarization
  • Comparison Summarization
  • Blog Summarization
  • Progressive Summarization
  • Personalized, Cross Lingual
  • Single Document, Query Focused Multi Document

Query Focused Summarization
  • Documents should be ranked in order of probability of relevance to the request or information need, as calculated from whatever evidence is available to the system
  • Query-dependent ranking: Relevance Based Language Models
    • Language models (PHAL)
  • Query-independent ranking: Sentence Prior

RBLM (relevance based language modeling) is an IR approach that computes the conditional probability of relevance from the document and the query.
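
A minimal sketch of how a relevance-model estimate could drive sentence scoring. The slides give no formula, so this assumes the common RM1-style estimate, where P(w|R) is proportional to the sum over documents of P(w|D) times the query likelihood under D. The function names, the additive smoothing constant `mu`, and the averaging in `score_sentence` are all illustrative assumptions, not the presenters' implementation.

```python
from collections import Counter

def doc_model(tokens, vocab, mu=0.5):
    """Additively smoothed unigram model P(w|D) over a fixed vocabulary (mu is an assumed constant)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def relevance_model(docs, query):
    """RM1-style estimate: P(w|R) proportional to sum_D P(w|D) * prod_q P(q|D), uniform document prior."""
    vocab = {w for d in docs for w in d} | set(query)
    scores = dict.fromkeys(vocab, 0.0)
    for d in docs:
        p = doc_model(d, vocab)
        query_likelihood = 1.0
        for q in query:
            query_likelihood *= p[q]
        for w in vocab:
            scores[w] += p[w] * query_likelihood
    norm = sum(scores.values())
    return {w: s / norm for w, s in scores.items()}

def score_sentence(sentence, rel_model):
    """Average relevance probability of the words in a tokenized sentence."""
    if not sentence:
        return 0.0
    return sum(rel_model.get(w, 0.0) for w in sentence) / len(sentence)
```

Sentences whose words co-occur with the query terms in the same documents receive higher scores, which is the query-dependent half of the ranking; a sentence prior would be combined with this score separately.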

  • PHAL: a probabilistic extension to HAL (Hyperspace Analogue to Language) spaces
  • HAL constructs the dependencies of a term w on other terms based on their occurrence in its context in the corpus
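
The HAL construction above can be sketched as a sliding window over the corpus, with closer context terms weighted more heavily. The window size, the linear weighting, and the row normalisation in `dependencies` (one simple way to read the HAL weights probabilistically) are assumptions for illustration; the slides do not specify how PHAL derives its probabilities.

```python
from collections import defaultdict

def build_hal(tokens, window=5):
    """Return hal[w][c]: accumulated weight of term c occurring before w within the window."""
    hal = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            # A term at distance d contributes window - d + 1, so nearer terms weigh more.
            hal[w][tokens[j]] += window - d + 1
    return hal

def dependencies(hal, w):
    """Normalise the HAL row of w into a distribution over its context terms (an assumed reading)."""
    row = hal.get(w, {})
    total = sum(row.values())
    return {c: v / total for c, v in row.items()} if total else {}
```

For the token sequence "the flood hit the town" with `window=2`, the row for "hit" gives "flood" twice the weight of "the", because "flood" is the immediate neighbour.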

DUC Performance

  • 38 systems participated in 2006
  • Significant difference between the first two systems

Extract vs. Abstract Summarization
  • We conducted a study (post TAC 2006)
    • Generated best possible extracts
    • Calculated the scores for these extracts
  • Evaluation with respect to the reference summaries
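
One way such a study could be set up is sketched below: sentences are added greedily so as to maximise overlap with a reference summary. The slides do not say how the best possible extracts were generated, so the greedy search, the unigram-recall objective, and the word budget are all assumptions. Greedy selection only approximates the true best extract.

```python
from collections import Counter

def unigram_recall(summary_tokens, reference_tokens):
    """Clipped unigram overlap divided by reference length (a ROUGE-1-like recall)."""
    ref = Counter(reference_tokens)
    if not ref:
        return 0.0
    summ = Counter(summary_tokens)
    overlap = sum(min(c, summ[w]) for w, c in ref.items())
    return overlap / sum(ref.values())

def greedy_oracle_extract(sentences, reference_tokens, max_words=100):
    """Repeatedly add the sentence that most improves recall; stop when nothing fits or helps."""
    chosen, chosen_tokens = [], []
    remaining = list(range(len(sentences)))
    best = 0.0
    while remaining:
        gains = []
        for i in remaining:
            if len(chosen_tokens) + len(sentences[i]) > max_words:
                continue  # sentence would exceed the word budget
            gains.append((unigram_recall(chosen_tokens + sentences[i], reference_tokens), i))
        if not gains:
            break
        score, i = max(gains)
        if score <= best:
            break
        best = score
        chosen.append(i)
        chosen_tokens += sentences[i]
        remaining.remove(i)
    return sorted(chosen), best
```

The score returned for the oracle extract gives an approximate upper bound on what a purely extractive system can reach against that reference.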
Cross Lingual Summarization
  • A bridge between CLIR and MT
  • Extended our mono-lingual summarization framework to a cross-lingual setting within the RBLM framework
  • Designed a cross-lingual experimental setup using the DUC 2005 dataset
  • Experiments were conducted for the Telugu-English language pair
  • Relative to the mono-lingual baseline, the cross-lingual system reaches about 90% of the ROUGE-SU4 F-measure and about 85% of the ROUGE-2 F-measure
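
To make the comparison concrete, here is a simplified sketch of a ROUGE-2 F-measure and of a score expressed as a percentage of a baseline. It assumes a single reference and pre-tokenized input, and it omits the stemming, stop-word, and multi-reference options of the official ROUGE toolkit. ROUGE-SU4 (skip bigrams plus unigrams) is not implemented here.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge2_f(candidate, reference):
    """Harmonic mean of bigram precision and recall, with clipped overlap counts."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum(min(c, cand[b]) for b, c in ref.items())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def relative_performance(cross_score, mono_score):
    """Cross-lingual score as a percentage of the mono-lingual baseline score."""
    return 100.0 * cross_score / mono_score
```

A figure such as "about 90%" is the value `relative_performance` returns when given the cross-lingual and mono-lingual scores for the same metric.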
Progressive Summarization
  • Emerging area of research in summarization
  • Summarization with a sense of prior knowledge
  • Introduced as “Update Summarization” at DUC 2007, TAC 2008, TAC 2009
    • Generate a short summary of a set of newswire articles, under the assumption that the user has already read a given set of earlier articles.
  • To keep track of temporal news stories
Key challenge
  • To detect information that is not only relevant but also new, given the prior knowledge of the reader
    • Relevant and new vs.
    • Non-relevant and new vs.
    • Relevant and redundant
Three level approach to Novelty Detection

Sentence Scoring

  • Developing new features that capture novelty along with relevance of a sentence
  • NF, NW


  • Sentences are re-ranked based on the amount of novelty they contain
  • ITSim, CoSim
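
A minimal sketch of the re-ranking step, assuming that CoSim denotes cosine similarity against sentences the reader has already seen. The slides do not define ITSim or CoSim, so this reading, the `penalty` weight, and the multiplicative discount are assumptions. ITSim is not sketched.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def rerank_by_novelty(candidates, prior_sentences, penalty=0.5):
    """candidates: list of (relevance_score, tokens). Discount each score by its redundancy."""
    reranked = []
    for relevance, tokens in candidates:
        # Redundancy is the similarity to the closest previously read sentence.
        redundancy = max((cosine(tokens, p) for p in prior_sentences), default=0.0)
        reranked.append((relevance * (1.0 - penalty * redundancy), tokens))
    return sorted(reranked, key=lambda item: item[0], reverse=True)
```

A highly relevant sentence that repeats the earlier cluster drops below a somewhat less relevant sentence that carries new content.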

Summary Generation

  • Select a pool of sentences that contain novel facts; all remaining sentences are filtered out
  • TAC 2008 Update Summarization data for training: 48 topics
    • Each topic is divided into clusters A and B, with 10 documents each
    • The summary for cluster A is a normal summary and the summary for cluster B is an update summary
  • TAC 2009 Update Summarization data for testing: 44 topics
  • The baseline summarizer generates a summary by picking the first 100 words of the last document
  • Run1 – DFS + SL1
  • Run2 – PHAL + KL
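
The baseline described above is simple enough to sketch directly. It assumes the documents are passed in chronological order, so that "last" means most recent, and that words are whitespace-separated; the function name is illustrative.

```python
def baseline_summary(documents, max_words=100):
    """documents: list of document strings, oldest first. Returns the lead of the last one."""
    if not documents:
        return ""
    return " ".join(documents[-1].split()[:max_words])
```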
Personalized Summarization
  • Perception of a text differs with the background of the reader
  • Need to incorporate the user’s background in the summarization process
  • Summarization is a function not only of the input text but also of the reader

Web-based profile creation: personal information available on the web, such as a conference page, a project page, an online paper, or even a weblog.

Estimate the user model P(w | Mu) to incorporate the user in the sentence extraction process.
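
A minimal sketch of how a user model could enter sentence scoring, assuming P(w | Mu) is a maximum-likelihood unigram model over the text of the user's web profile and that it is linearly interpolated with a document model. The interpolation weight `lam`, the `floor` probability for unseen words, and the function names are hypothetical; the slides do not state how the model is estimated or combined.

```python
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over the given tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def personalized_score(sentence, doc_model, user_model, lam=0.7, floor=1e-6):
    """Average of lam * P(w|D) + (1 - lam) * P(w|Mu) over the words of a tokenized sentence."""
    if not sentence:
        return 0.0
    total = 0.0
    for w in sentence:
        total += lam * doc_model.get(w, floor) + (1 - lam) * user_model.get(w, floor)
    return total / len(sentence)
```

With this scoring, two sentences that are equally prominent in the document are separated by how well their vocabulary matches the user's profile.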

Opinion summarization

Sentiment Analysis

  • User-generated content is growing rapidly through blogs
  • Sentiment analysis provides better access to information

  • Textual information on the Web can be categorized as facts and opinions
  • Computational study of opinions and sentiments from a market perspective

Optimize the summary to retain as much sentiment as possible

Sentiment summarization as a two-stage classification problem at the sentence level

Polarity Estimation

  • Opinion/fact
  • Positive/Negative
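
The two-stage view can be sketched as a pipeline in which a sentence is first labelled opinion or fact, and only opinions are passed on for polarity. The tiny word lists below are hypothetical stand-ins for trained classifiers, which the slides do not describe.

```python
POSITIVE = {"good", "great", "excellent", "love"}   # hypothetical toy lexicon
NEGATIVE = {"bad", "poor", "terrible", "hate"}      # hypothetical toy lexicon

def is_opinion(tokens):
    """Stage 1: opinion vs. fact, decided here by the presence of any sentiment word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokens)

def polarity(tokens):
    """Stage 2: positive vs. negative by counting lexicon hits (ties count as positive)."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return "positive" if pos >= neg else "negative"

def classify(sentence):
    """Run both stages on a whitespace-tokenized, lower-cased sentence."""
    tokens = sentence.lower().split()
    if not is_opinion(tokens):
        return "fact"
    return polarity(tokens)
```

Keeping the stages separate means factual sentences never reach the polarity step, so each classifier can be trained and evaluated on its own.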
Comparative summarization
  • Summaries for comparing multiple items belonging to a category
    • The category “Mobile phones” has items such as “Nokia” and “BlackBerry”
  • Comparative summaries provide the properties or facts common to these items and their corresponding values with respect to each item.
    • “Memory”, “Display”, “Battery Life”

Comparative Summaries Generation
  • Attribute Extraction
    • Find the attributes of the product class
  • Attribute Ranking
    • Rank the attributes according to importance in comparison
  • Summary Generation
    • Find the occurrence of attributes in various products
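
The three steps above can be sketched on already-structured input. This assumes the attribute values have been extracted into one dictionary per product and ranks attributes by how many products mention them; both the input shape and the ranking criterion are assumptions, since the slides do not say how importance is measured.

```python
def comparative_summary(product_specs, top_k=3):
    """product_specs: {product: {attribute: value}}. Returns (ranked attributes, rows per product)."""
    # Attribute extraction: collect every attribute mentioned for any product.
    coverage = {}
    for specs in product_specs.values():
        for attribute in specs:
            coverage[attribute] = coverage.get(attribute, 0) + 1
    # Attribute ranking: attributes shared by more products are assumed more useful to compare.
    ranked = sorted(coverage, key=lambda a: (-coverage[a], a))[:top_k]
    # Summary generation: one row per product, with "-" where an attribute is missing.
    rows = {p: [specs.get(a, "-") for a in ranked] for p, specs in product_specs.items()}
    return ranked, rows
```

The result is a small table with one column per ranked attribute, which matches the side-by-side form a comparative summary needs.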
Guided Summarization
  • Query Focused Summarization
    • The user’s information need is expressed as a query along with a narrative
    • A set of documents related to the topic is given
    • The goal is to produce a short, coherent summary focused on answering the query
  • Guided Summarization
    • Each topic is classified into one of a set of predefined categories
    • Each category has a template of important aspects of the topic
    • The summary is expected to answer all the aspects of the template while containing other relevant information
Guided summarization
  • Encourage deeper linguistic and semantic analysis of the source documents instead of relying only on document word frequencies to select important concepts
  • Shares similarity with information extraction
    • Specific information from unstructured text is identified and consequently classified into a set of semantic labels (templates)
    • Makes information more suitable for other information processing tasks
  • A guided summarization system has to produce a readable summary encompassing all the information about the templates
  • Very few investigations have explored the potential of merging summarization with information extraction techniques
Our approach
  • Building a domain model
    • Essential background knowledge for information extraction
  • Sentence Annotations
    • To identify sentences that answer aspects of the template
  • Concept Mining
    • To use semantic concepts instead of words to calculate sentence importance
  • Summary Extraction
    • Modify the summary extraction algorithm to meet these requirements using the sentence annotations
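
A minimal sketch of how sentence annotations could steer extraction toward template coverage. It assumes each sentence arrives with an importance score and the set of template aspects it answers. The greedy rule, which prefers uncovered aspects and then importance, is an illustrative assumption and not the modified algorithm the slides refer to.

```python
def guided_extract(sentences, aspects, max_sentences=3):
    """sentences: list of (text, importance, set_of_aspects_answered)."""
    uncovered = set(aspects)
    pool = list(sentences)
    summary = []
    while pool and len(summary) < max_sentences:
        # Prefer sentences that answer still-uncovered aspects, then higher importance.
        best = max(pool, key=lambda s: (len(s[2] & uncovered), s[1]))
        summary.append(best[0])
        uncovered -= best[2]
        pool.remove(best)
    return summary, uncovered
```

The second return value lists the template aspects left unanswered, which makes it easy to check whether a summary meets the guided task's requirement.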