
Multiple Document Summarization using Principal Component Analysis incorporating Semantic Vector Space Model




  1. Multiple Document Summarization using Principal Component Analysis incorporating Semantic Vector Space Model
     Presenter: Suhan Yu

  2. Introduction
     • The ‘information content’ of a document can be measured by the relationship between the document and a corpus of related documents.
     • Multiple Document Summarization System:
       • Finds the common topics in a corpus by matching sentences that say different things about the same topic.

  3. Introduction
     [Diagram labels: Statistical Vector Space Model; Action Word Classifier; WordNet; Action words; Objects; Semantic Vector Space Model; PCA; Score sentence using (1) sentence-length cut-off feature, (2) position feature, (3) keyword weight]

  4. Introduction
     • Analysis of single-document summarization:
       • Kupiec et al.
       • Estimate the probability that a sentence belongs in the summary.
     • Analysis of multiple-document summarization:
       • Regina Barzilay et al., Dragomir R. Radev et al.
       • Summarize multiple documents on the same topic.
       • Try to match sentences with the same meaning in order to align multiple documents.

  5. Statistical VSM construction
     [Figure: matrix with dimensions labeled m … n]
     • Each unique word is defined as a feature; terms are assumed to be independent.
     • Each feature is given a weight:
       • Cue-phrase keyword
       • Topic keyword
       • Term frequency
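
A minimal sketch of the matrix described on this slide, assuming that rows are sentences, that tokens are split on whitespace, and that raw term frequency is the only weight. The slides do not say how the cue-phrase and topic-keyword weights are computed, so they are left out; `build_term_matrix` is a hypothetical name.

```python
from collections import Counter


def build_term_matrix(sentences):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in sentences[i]."""
    tokenized = [s.lower().split() for s in sentences]
    # Every unique word becomes one feature (one column).
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(toks).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


sentences = ["the flood destroyed the bridge", "the bridge was rebuilt"]
vocabulary, matrix = build_term_matrix(sentences)
```

Treating each word as its own column is what makes the independence assumption explicit: no column carries information about any other.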

  6. Semantic VSM construction
     • Built using WordNet.
       • WordNet: http://www.globalwordnet.org/
       • An online lexical reference system in which English nouns, verbs, adjectives and adverbs are organized into synonym sets (synsets).
       • With the help of WordNet, the words in the word vector that belong to the ACTION class can be classified easily.
     • Knowledgebase (KB)
       • A seed wordlist of words that denote appearance or disappearance, such as:
         • Destruction
         • Broke

  7. Identification of Action Words
     • Decide whether each word is an ACTION word or not.
     [Flowchart labels: Input T = {t1, t2, …, tn}; Seed wordlist; WordNet; appearance; disappearance; yes; yes; action; Action Word]

  8. Identification of Action Words
     • Example: determine whether ‘devastation’ is an action word.
     • WordNet gives the following meanings:
       • Desolation: an event that results in total destruction
       • Ravaging
       • Destruction
     • The meanings of Desolation and Destruction clearly fall under the appear/disappear phenomenon.
     • So ‘devastation’ is an action word, and it is appended to the wordlist.
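
The ‘devastation’ example can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: `TOY_WORDNET` is a hypothetical hand-written stand-in for a real WordNet lookup, and the matching rule (any seed word appearing among the related words or glosses) is a guess at what "lies in the phenomenon of appear/disappear" means in practice.

```python
# Seed words taken from the slides (appearance/disappearance vocabulary).
SEED_WORDS = {"destruction", "broke"}

# Hypothetical stand-in for WordNet: word -> related words and glosses.
TOY_WORDNET = {
    "devastation": [
        "desolation: an event that results in total destruction",
        "ravaging",
        "destruction",
    ],
    "bridge": ["a structure that allows people or vehicles to cross an obstacle"],
}


def is_action_word(word, seeds, lexicon):
    """True if the word, or any token in its lexicon entries, is a seed word."""
    if word in seeds:
        return True
    for entry in lexicon.get(word, []):
        tokens = entry.lower().replace(":", " ").split()
        if any(tok in seeds for tok in tokens):
            return True
    return False


def classify(terms, seeds, lexicon):
    """Return the action words found in terms and the grown seed list."""
    seeds = set(seeds)  # copy, so the caller's seed list is not modified
    actions = []
    for term in terms:
        if is_action_word(term, seeds, lexicon):
            actions.append(term)
            seeds.add(term)  # append the new action word to the wordlist
    return actions, seeds


actions, grown = classify(["devastation", "bridge"], SEED_WORDS, TOY_WORDNET)
```

Because each accepted word is added to the wordlist, later terms can match against words discovered earlier in the same pass.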

  9. Finding the Objects of the Action
     • Find the objects.
     • The objects are the nouns or adjectives nearest to the action word.
     • A POS tagger is used to find them: http://ilk.uvt.nl/~zavrel/tagtest.html
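
A minimal sketch of the "nearest noun or adjective" rule, assuming the sentence has already been tagged with Penn Treebank style tags (`NN*` for nouns, `JJ*` for adjectives). The tags below are supplied by hand rather than by the tagger linked on the slide, and `nearest_object` is a hypothetical name.

```python
def nearest_object(tagged, action_index):
    """Return the noun or adjective closest to tagged[action_index],
    or None if the sentence has none. On a tie the earlier word wins."""
    best = None
    best_distance = None
    for i, (word, tag) in enumerate(tagged):
        if i == action_index:
            continue
        if tag.startswith("NN") or tag.startswith("JJ"):
            distance = abs(i - action_index)
            if best_distance is None or distance < best_distance:
                best, best_distance = word, distance
    return best


tagged = [("The", "DT"), ("flood", "NN"), ("destroyed", "VBD"),
          ("the", "DT"), ("old", "JJ"), ("bridge", "NN")]
obj = nearest_object(tagged, 2)  # index 2 is the action word "destroyed"
```

Note that a purely distance-based rule picks "flood" here, which is the grammatical subject; the slides do not say whether direction or syntax is taken into account.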

  10. Classification of Contextual Words
     • Contextual words: action words that are applied to the important objects.
     [Figure label: Term frequency]

  11. Classification of Contextual Words

  12. Classification of Contextual Words

  13. Classification of Contextual Words

  14. Principal Component Analysis
     • PCA is carried out using SVD.
     [Figure: matrix with dimensions labeled m … n]
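
One common way to carry out PCA with SVD is sketched below using NumPy. The slides give no details, so the centring step, the choice of `k`, and the toy matrix are all assumptions; `pca_via_svd` is a hypothetical name.

```python
import numpy as np


def pca_via_svd(X, k):
    """Project the rows of X onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA operates on mean-centred columns
    # Thin SVD: centered = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:k]  # principal directions, one per row
    explained_variance = (S[:k] ** 2) / (X.shape[0] - 1)
    scores = centered @ components.T  # coordinates in the reduced space
    return scores, components, explained_variance


# Toy sentence-by-term matrix (3 sentences, 6 terms).
X = [[1, 1, 1, 0, 2, 0],
     [1, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 0]]
scores, components, variance = pca_via_svd(X, 2)
```

The squared singular values divided by (number of rows - 1) equal the leading eigenvalues of the sample covariance matrix, which is why the SVD can replace an explicit eigendecomposition.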

  15. Sentence Extraction
     • The following features are considered:
       • Sentence-length cut-off feature: only sentences longer than 4 words are considered.
       • Position feature: whether the sentence is in the initial, middle or final part of the document.
       • Keywords: a set of keywords is defined, and the number of keywords present in the sentence is counted.
       • Upper-case feature: sentences containing upper-case words are given additional weight.
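
The four features above could be combined into a score roughly as follows. The slides do not give weights or say which positions are favoured, so the bonus values, the choice to reward the first and last sentence, and the reading of "upper case words" as all-caps tokens are assumptions made for illustration; `score_sentence` is a hypothetical name.

```python
def score_sentence(sentence, index, total, keywords,
                   min_words=4, position_bonus=1.0, upper_bonus=0.5):
    """Score one sentence from length, position, keyword and upper-case features."""
    words = sentence.split()
    # Sentence-length cut-off: sentences of min_words words or fewer score zero.
    if len(words) <= min_words:
        return 0.0
    score = 0.0
    # Position feature (assumed): reward the first and last sentence.
    if index == 0 or index == total - 1:
        score += position_bonus
    stripped = [w.strip(".,;:!?\"'()") for w in words]
    # Keyword feature: one point per keyword occurrence.
    score += sum(1 for w in stripped if w.lower() in keywords)
    # Upper-case feature (assumed): any all-caps token longer than one letter.
    if any(w.isupper() and len(w) > 1 for w in stripped):
        score += upper_bonus
    return score


doc = ["The flood destroyed the old bridge in Pune.",
       "It rained.",
       "Officials from NDMA said the bridge will be rebuilt soon."]
keywords = {"bridge", "flood"}
scores = [score_sentence(s, i, len(doc), keywords) for i, s in enumerate(doc)]
# scores == [3.0, 0.0, 2.5]
```

Sentences would then be ranked by score and the top ones extracted; how this score is combined with the PCA output is not described in the slides.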

  16. Single Document Summary
     • Comparison with MS Word Summarizer and Gnome Summarizer

  17. Multiple Documents Summary

  18. Conclusion and Future work
     • Semantic VSM is better than Statistical VSM.
     • Rearrange the extracted sentences in multiple-document summarization to form an effective summary.
     • Make the system flexible enough to summarize multiple documents that do not necessarily belong to the same topic.
     • Develop a better methodology for incorporating the ACTION word score into the Statistical VSM.
     • Evaluate the system on a large sample of data.
