
The use of unlabeled data to improve supervised learning for text summarization


Presentation Transcript


  1. The use of unlabeled data to improve supervised learning for text summarization MR Amini, P Gallinari (SIGIR 2002) Slides prepared by Jon Elsas for the Semi-supervised NL Learning Reading Group

  2. Presentation Outline • Overview of Document Summarization • Major contribution: semi-supervised logistic Classification EM (CEM) for extract summaries • Evaluation • Baseline Systems • Results

  3. Document Summarization • Motivation: [text volume] >> [user’s time] • Single Document Summarization: • Used for display of search results, automatic ‘abstracting’, browsing, etc. • Multi-Document Summarization: • Describe clusters & document collections, QA, etc. • Problem: What is the summary used for? Does a generic summary exist?

  4. Single Document Summarization example

  5. Document Summarization • Generative Summaries: • Synthetic text produced after analysis of high-level linguistic features: discourse, semantics, etc. • Hard. • Extract Summaries: • Text excerpts (usually sentences) composed together to create the summary • Boils down to a passage classification/ranking problem

  6. Major Contribution • Semi-supervised Logistic Classifying Expectation Maximization (CEM) for passage classification • Advantage over other methods: • Works on a small set of labeled data + a large set of unlabeled data • No modeling assumptions for density estimation • Cons: • (probably) slow; no timing numbers given

  7. Expectation Maximization (EM) • Finds maximum likelihood estimates of parameters when underlying distribution depends on unobserved latent variables. • Maximizes model fit to data distribution • Criterion function:
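
The criterion function was an image in the original slide. It is presumably the standard mixture log-likelihood that EM maximizes; a reconstruction in the usual notation (mixing weights \pi_k, component densities f_k), not the slide's exact formula:

  L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, f_k(x_i; \theta_k)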

  8. Classifying EM (CEM) • Like EM, with the addition of an indicator variable for component membership. • Maximizes ‘quality’ of clustering • Criterion function:
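
This criterion was also an image. Given the description, it is plausibly the classification log-likelihood of Celeux & Govaert's CEM, where the indicator t_{ik} = 1 iff x_i is assigned to component k:

  C(t, \theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} t_{ik} \, \log\bigl( \pi_k \, f_k(x_i; \theta_k) \bigr)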

  9. Semi-supervised generative-CEM • Fix component membership for labeled data. • Criterion function: a labeled-data term plus an unlabeled-data term (reconstructed below)
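
With D_l the labeled set (indicators \tilde{t}_{ik} fixed by the labels) and D_u the unlabeled set (indicators re-estimated at each C-step), the split the original slide annotated as "Labeled Data / Unlabeled Data" is presumably:

  C = \sum_{x_i \in D_l} \sum_{k} \tilde{t}_{ik} \log\bigl( \pi_k f_k(x_i; \theta_k) \bigr) + \sum_{x_i \in D_u} \sum_{k} t_{ik} \log\bigl( \pi_k f_k(x_i; \theta_k) \bigr)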

  10. Semi-supervised logistic-CEM • Use a discriminative classifier (logistic) instead of a generative one. • In the M-step, gradient descent must be re-run to estimate the β's • Criterion: the same labeled-data + unlabeled-data split as above (see the sketch below)
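
The slides describe the loop but give no code. The following is a minimal Python sketch of how such a semi-supervised logistic CEM could look; the function name, learning rate, iteration counts, and random initialization of the unlabeled partition are my assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cem(X_lab, y_lab, X_unlab, n_iters=20, lr=0.1, gd_steps=200, seed=0):
    """Sketch of semi-supervised logistic CEM (hypothetical helper)."""
    X_all = np.vstack([X_lab, X_unlab])
    beta = np.zeros(X_all.shape[1])
    rng = np.random.default_rng(seed)
    # Random initial partition of the unlabeled sentences (an assumption).
    y_unlab = rng.integers(0, 2, len(X_unlab)).astype(float)
    for _ in range(n_iters):
        y_all = np.concatenate([y_lab, y_unlab])
        # M-step: re-estimate beta by gradient descent on the logistic
        # (cross-entropy) objective over the current assignments.
        for _ in range(gd_steps):
            grad = X_all.T @ (sigmoid(X_all @ beta) - y_all) / len(y_all)
            beta -= lr * grad
        # C-step: re-assign unlabeled sentences with the updated classifier;
        # labeled sentences keep their fixed labels (the semi-supervised constraint).
        y_unlab = (sigmoid(X_unlab @ beta) > 0.5).astype(float)
    return beta
```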

  11. Evaluation • Algorithm evaluated against 3 other single-document summarization algorithms • Non-trainable System: passage ranking • Trainable System: Naïve Bayes sentence classifier • Generative-CEM (using full Gaussians) • Precision/Recall with regard to gold-standard extract summaries • The fine print: • All systems used *similar* representation schemes, but not the same…
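
For extract summaries, precision and recall reduce to sentence-set overlap: with E the extracted sentences and G the gold-standard extract,

  P = \frac{|E \cap G|}{|E|}, \qquad R = \frac{|E \cap G|}{|G|}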

  12. Baseline System: Sentence Ranking • Rank sentences using a TF-IDF similarity measure with query expansion (Sim2) • Blind relevance feedback from the top-ranked sentences • WordNet similarity thesaurus • Generic query created with the most frequent words in the training set.
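
A minimal Python sketch of this kind of ranker, using plain TF-IDF cosine similarity as a generic stand-in for the paper's Sim2 measure; the WordNet thesaurus expansion is omitted, and the feedback depth is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences, query_terms, n_feedback=5):
    """Rank sentences against a generic query with one round of
    blind relevance feedback (illustrative, not the paper's Sim2)."""
    vec = TfidfVectorizer()
    S = vec.fit_transform(sentences)
    q = vec.transform([" ".join(query_terms)])
    scores = cosine_similarity(q, S).ravel()
    # Blind relevance feedback: fold the top-ranked sentences into the query.
    top = scores.argsort()[::-1][:n_feedback]
    expanded = " ".join(query_terms) + " " + " ".join(sentences[i] for i in top)
    q2 = vec.transform([expanded])
    return cosine_similarity(q2, S).ravel()
```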

  13. Naïve Bayes Model: Sentence Classification Simple Naïve Bayes classifier trained on 5 features: • Sentence length < t_length {0,1} • Sentence contains ‘cue words’ {0,1} • Sentence query similarity (Sim2) > t_sim {0,1} • Upper-case/Acronym features (count?) • Sentence/paragraph position in text {1, 2, 3}

  14. Logistic-CEM: Sentence Representation Features Features used to train Logistic-CEM: • Normalized sentence length [0, 1] • Normalized ‘cue word’ frequency [0, 1] • Sentence Query Similarity (Sim2) [0, ∞) • Normalized acronym frequency [0, 1] • Sentence/paragraph position in text {1, 2, 3} (All of the binary features are converted to continuous values; see the sketch below.)
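
A small Python sketch of building this continuous feature vector. The cue-word list, the length normalizer (40 tokens), and the acronym regex are illustrative assumptions; the Sim2 score is taken as a precomputed input.

```python
import re

# Illustrative cue-word list; the actual list is not given on the slide.
CUE_WORDS = {"conclusion", "summary", "significantly", "results"}

def sentence_features(sentence, query_sim, position):
    """Continuous feature vector in the spirit of slide 14 (assumptions noted above)."""
    tokens = sentence.split()
    n = max(len(tokens), 1)
    cue_freq = sum(t.lower().strip(".,") in CUE_WORDS for t in tokens) / n
    acro_freq = sum(bool(re.fullmatch(r"[A-Z]{2,}", t)) for t in tokens) / n
    return [
        min(len(tokens) / 40.0, 1.0),  # normalized sentence length in [0, 1]
        cue_freq,                      # normalized cue-word frequency in [0, 1]
        query_sim,                     # sentence-query similarity (Sim2), >= 0
        acro_freq,                     # normalized acronym frequency in [0, 1]
        float(position),               # sentence/paragraph position in {1, 2, 3}
    ]
```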

  15. Results on Reuters dataset

  16. Results on Reuters dataset
