Hao-Chin Chang Department of Computer Science & Information Engineering

Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling. Zhengchen Zhang, Shuzhi Sam Ge, Hongsheng He. IPM 2012. Hao-Chin Chang, Department of Computer Science & Information Engineering, National Taiwan Normal University. 2011/03/09.

Presentation Transcript


  1. Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling • Zhengchen Zhang, Shuzhi Sam Ge, Hongsheng He • IPM 2012 • Hao-Chin Chang, Department of Computer Science & Information Engineering, National Taiwan Normal University • 2011/03/09

  2. Outline • Introduction • Sentence ranking using embedded graph based sentence clustering • Document modeling • Embedded graph based sentence clustering • Mutual-reinforcement ranking algorithm • Experiment • Conclusion and Future work

  3. Introduction • There are three phases in the framework • Document modeling • The document is modeled by a weighted graph whose vertices represent the sentences of the document • Sentence clustering • The sentences are clustered into different groups to find the latent topics in the story. To alleviate the influence of unrelated sentences in clustering, an embedding process is employed to optimize the document model • Sentence ranking • We propose a framework that employs cluster-level information by considering the mutual effect between clusters, sentences, and terms instead of the relationship between documents, sentences, and terms

  4. Introduction • The sentences are then clustered into different groups according to the distance between two sentences • To alleviate the influence of unrelated sentences in sentence clustering, we employ an embedding process to optimize the graph vertices • The contributions of this paper are summarized as follows • An embedded graph based sentence clustering method is proposed for grouping the sentences of a document; it is robust with respect to different cluster numbers • An iterative ranking method is presented which considers the mutual reinforcement between terms, sentences, and sentence clusters • A document summarization framework that considers sentence cluster information is proposed, and the framework is evaluated using DUC data sets

  5. Sentence ranking using embedded graph based sentence clustering • To reduce the influence of the low cosine similarity weights and to enhance sentence clustering performance, an embedding algorithm is performed on the graph • The sentences are ranked according to the mutual effects between sentences, terms, and clusters, based on the following assumptions • A sentence is assigned a high rank if it is similar to many high ranking clusters and it contains many high ranking terms • The rank of a cluster is high if it contains many high ranking sentences and many high ranking terms • The rank of a term is high if it appears in many high ranking sentences and clusters

  6. Document modeling • The weight wij of an edge denotes the distance between sentences si and sj, measured as the cosine similarity between the vectors of the two sentences
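
The document graph described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the raw term-frequency vectors, the zero diagonal (no self-loops), and the function names are all assumptions, since the slide only states that edge weights are cosine similarities between sentence vectors.

```python
import math
from collections import Counter


def sentence_vector(sentence):
    """Term-frequency vector of a sentence (assumed feature choice)."""
    return Counter(sentence.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_sentence_graph(sentences):
    """Return a symmetric weight matrix w[i][j] over the sentences."""
    vectors = [sentence_vector(s) for s in sentences]
    n = len(vectors)
    weights = [[0.0] * n for _ in range(n)]  # diagonal stays 0 (assumption)
    for i in range(n):
        for j in range(i + 1, n):
            w = cosine_similarity(vectors[i], vectors[j])
            weights[i][j] = weights[j][i] = w
    return weights


graph = build_sentence_graph(["the cat sat", "the cat ran", "dogs bark loudly"])
```

Sentences that share no terms get weight 0; these low or zero weights are the ones the later embedding step is meant to compensate for.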

  7. Embedded graph based sentence clustering • To alleviate the influence of unrelated sentences, we embed the original matrix D of a document into a lower-dimensional space, inspired by Locally Linear Embedding (LLE) • A sentence di, which is a column vector of D, is expressed as a linear combination of its ni most similar sentences dj • where the neighbor set of di is the set of sentences most similar to di
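
The equation on this slide did not survive extraction. Assuming it follows the usual LLE form, the linear combination would read roughly as below. The neighbor-set symbol $\mathcal{N}_i$ is a placeholder introduced here, and $r_{ij}$ is the weight notation used on the next slide.

```latex
d_i \approx \sum_{j \in \mathcal{N}_i} r_{ij}\, d_j ,
\qquad |\mathcal{N}_i| = n_i
```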

  8. Embedded graph based sentence clustering • In graph embedding, we minimize the following cost function of the approximation error to determine the optimal weights rij • Wi = [ri1, …, rin] are the weights connecting di to its neighbors • Partial derivatives are taken with respect to each weight rij • Wi is found by solving the resulting equations
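
The cost function and its solution are missing from the transcript. As a hedged sketch, the following solves for the weights of one sentence the way standard LLE does: minimize the squared reconstruction error of di from its neighbors, subject to the weights summing to one. The sum-to-one constraint, the regularization term, and the function name are assumptions taken from standard LLE, not from the slides.

```python
import numpy as np


def reconstruction_weights(D, i, neighbors, reg=1e-3):
    """Weights that best reconstruct column i of D from the given neighbor columns.

    Standard LLE closed form (assumed here): solve G w = 1, then rescale w
    so that it sums to one, where G is the local Gram matrix.
    """
    Z = D[:, neighbors] - D[:, [i]]          # neighbors with d_i moved to the origin
    G = Z.T @ Z                               # local Gram matrix
    scale = np.trace(G) or 1.0                # fall back to 1.0 if the trace is zero
    G = G + reg * scale * np.eye(len(neighbors))  # regularize for stability
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()


# Toy data: column 2 is the midpoint of columns 0 and 1.
D = np.array([[0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0]])
w = reconstruction_weights(D, 2, [0, 1])
```

In the toy data the third sentence vector lies halfway between the other two, so the two weights come out equal.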

  9. Embedded graph based sentence clustering • The vectors di of the embedded sentences, with enhanced relationships, are obtained by minimizing the cost function • The embedding operation keeps the relationship between a sentence and its set of neighbors • The performance of the embedding operation improves if there are more points in the graph
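
The embedding cost function is also missing from the transcript. Assuming it matches standard LLE, the embedded vectors are the eigenvectors of (I - W)^T (I - W) with the smallest eigenvalues, skipping the first one. The sketch below follows that recipe; the matrix layout (row i of W holds the weights of sentence i, zero for non-neighbors, rows summing to one) and the function name are assumptions.

```python
import numpy as np


def embed_sentences(W, dim):
    """Standard LLE embedding (assumed): rows of the result are embedded sentences."""
    n = W.shape[0]
    residual = np.eye(n) - W
    M = residual.T @ residual
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigenvectors = np.linalg.eigh(M)
    # Skip the first eigenvector, as standard LLE does, and keep the next `dim`.
    return eigenvectors[:, 1:dim + 1]


# Toy weights for four points on a line; each row sums to one.
W = np.array([[0.0, 2.0, -1.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, -1.0, 2.0, 0.0]])
Y = embed_sentences(W, dim=1)
```

The embedded rows of Y would then be passed to the clustering step (KM or AGG in the experiments) in place of the original sentence vectors.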

  10. Mutual-reinforcement ranking algorithm • The goal is to employ the cluster-level information and the latent theme information in the clusters for document summarization • D is the sentence–term matrix of the document • where cluster cj is represented by a vector which is a summary of the vectors of all the sentences in this cluster • The weight of the edge connecting term tl and cluster cj

  11. Mutual-reinforcement ranking algorithm • r(si) is the rank of sentence si, r(cj) is the rank of cluster cj, and r(tl) is the rank of term tl • The sentence with the highest score is selected as part of the summary
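
The update equations for r(si), r(cj), and r(tl) were lost in extraction. The sketch below illustrates the general idea of mutual reinforcement under stated assumptions; it is not the paper's exact update rule. Assumptions: cluster vectors are the sum of their member sentence vectors, sentence-cluster affinity is cosine similarity, term affinities are the raw matrix entries, each rank vector is renormalized to sum to one, and the three contributions are unweighted (the a, b, c parameters from the experiment slide are not modeled).

```python
import numpy as np


def normalize(v):
    """Scale a non-negative vector to sum to one (left unchanged if all zero)."""
    total = v.sum()
    return v / total if total > 0 else v


def unit_rows(M):
    """Scale each row of M to unit Euclidean length."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms


def mutual_reinforcement_rank(D, labels, iterations=50):
    """D: sentence-term matrix (rows = sentences); labels[i]: cluster of sentence i."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    n_sent, n_term = D.shape
    n_clu = labels.max() + 1
    C = np.zeros((n_clu, n_term))            # cluster-term matrix
    for i, c in enumerate(labels):
        C[c] += D[i]                          # assumed: sum of member sentences
    SC = unit_rows(D) @ unit_rows(C).T        # sentence-cluster cosine affinity
    r_s = np.full(n_sent, 1.0 / n_sent)
    r_c = np.full(n_clu, 1.0 / n_clu)
    r_t = np.full(n_term, 1.0 / n_term)
    for _ in range(iterations):
        new_s = normalize(SC @ r_c + D @ r_t)      # sentences from clusters, terms
        new_c = normalize(SC.T @ r_s + C @ r_t)    # clusters from sentences, terms
        new_t = normalize(D.T @ r_s + C.T @ r_c)   # terms from sentences, clusters
        r_s, r_c, r_t = new_s, new_c, new_t
    return r_s, r_c, r_t


D = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]
r_s, r_c, r_t = mutual_reinforcement_rank(D, labels=[0, 0, 1])
best_sentence = int(np.argmax(r_s))
```

Selecting `best_sentence` corresponds to the slide's last bullet: the highest-scoring sentence is taken into the summary. A full summarizer would repeat the selection, typically with some redundancy control that the slides do not describe.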

  12. Experiment

  13. Experiment • KM and agglomerative (AGG) clustering are compared • The MREG algorithm is named KM-TSC-EM • KM-SC uses mutual reinforcement between sentences and clusters only • TSC is short for Term, Sentence and Cluster • EM is short for Embedded Graph based sentence clustering

  14. Experiment • a = 0.25, b = 0.25, c = 0.50 • a = 0.50, b = 0.50, c = 1.0 • a = 0.750, b = 1.0, c = 1.0: ROUGE-1 score 0.31841 • a = 0.750, b = 1.0, c = 0.50: ROUGE-1 score 0.31769

  15. Conclusion • The sentences were clustered into different groups, and an embedding process was employed to reduce the effect of unrelated sentences and to enhance the sentence clustering performance. • Performance comparison of different combinations of components illustrated that the algorithm improved system performance and was more robust with respect to different cluster numbers. • Computer Speech & Language 2012: Camille Guinaudeau, Guillaume Gravier, Pascale Sébillo, "Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation" • Confidence score

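  16. Summarization schematic (figure labels) • S1, S2, S3, …, Sn • Training text corpus • Initial retrieval system, KL(S||D) • Relevance features • Term-frequency features • Context features • Generalization • Collection • Corpus • Generate summary, KL(D||S)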
