

  1. A progressive sentence selection strategy for document summarization You Ouyang, Wenjie Li, Renxian Zhang, Sujian Li, Qin Lu IPM 2013 Hao-Chin Chang Department of Computer Science & Information Engineering National Taiwan Normal University 2013/03/05

  2. Outline • Introduction • Methodology • Experiments and evaluation • Conclusion and future work

  3. Introduction • Many studies have been devoted to document summarization. • It is well acknowledged that sentence selection strategies are very important; they mainly aim at reducing the redundancy among the selected sentences so that the summary can cover more concepts. • Different from existing methods, our study explores the idea of directly examining the uncovered part of the sentences when estimating saliency, in order to maximize the coverage of the summary.

  4. Introduction • To avoid the possible saliency problem, we make use of the subsuming relationship between sentences to improve the saliency measure. • The idea is to use the more significant salient general concepts to help discover the salient supporting concepts. • For example, once we have selected a general word "school" in a summary sentence, we would like to select "student" or "teacher" in the following sentences. • Sentence A: the schools that have vigorous music programs tend to have higher academic performance. • Sentence B: among the lower-income students without music involvement, only 15.5% achieved high math scores. • The question is: when sentence A is selected, how much do we want to include another sentence B to support the ideas in sentence A?

  5. Identifying word relations • Two common approaches: 1. linguistic relation databases such as WordNet; 2. frequency-based statistics such as co-occurrence or pointwise mutual information. • In our study, the target is to identify the subsuming relations between the words in the input documents. • The association of two words is defined by two conditions: P(a|b) ≥ 0.8 and P(b|a) < P(a|b). • Word a subsumes word b if the documents in which b occurs are a subset, or nearly a subset, of the documents in which a occurs.
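The two conditions translate directly into document-level co-occurrence counts. Below is a minimal sketch in Python; the 0.8 threshold comes from the slide, while the function name and the representation of documents as collections of words are illustrative assumptions.

```python
from collections import defaultdict

def find_subsuming_pairs(documents, threshold=0.8):
    """Sketch: word a subsumes word b when the documents containing b
    are (nearly) a subset of those containing a, i.e.
    P(a|b) >= threshold and P(b|a) < P(a|b)."""
    # Map each word to the set of document ids it occurs in.
    occurs_in = defaultdict(set)
    for doc_id, words in enumerate(documents):
        for w in set(words):
            occurs_in[w].add(doc_id)

    pairs = []
    vocab = list(occurs_in)
    for a in vocab:
        for b in vocab:
            if a == b:
                continue
            both = len(occurs_in[a] & occurs_in[b])
            p_a_given_b = both / len(occurs_in[b])   # P(a|b)
            p_b_given_a = both / len(occurs_in[a])   # P(b|a)
            if p_a_given_b >= threshold and p_b_given_a < p_a_given_b:
                pairs.append((a, b))                 # a subsumes b
    return pairs
```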

  6. Identifying word relations • Sentence-level coverage • Sometimes a document set consists of only a few documents. • To obtain more usable information, we study sentence-level co-occurrence statistics instead. • Set-based coverage • Sentence-level co-occurrence is sparser than document-level co-occurrence because sentences are much shorter. • We therefore examine the coverage not only between two words, but also between a word and a word set. • Example: given the two common phrases "King Norodom" and "Prince Norodom", the word "Norodom" is almost entirely covered by the set {King, Prince}.

  7. Identifying word relations • Transitive reduction • The subsuming relation between two words also reflects the recommendation status between them. • For three words a, b, c that satisfy a > b, b > c and a > c (where a > b denotes a subsuming b), the long-range relation a > c is ignored, as sketched below. • Spanned sentence set • For a word w in a document set D whose sentence set is denoted by SD, SPAN(w) is defined as the set of the sentences in SD that contain w.
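Transitive reduction itself is a standard graph operation. The following sketch (hypothetical helper names; assumes the word relations form a DAG given as a set of (a, b) edges meaning a > b) shows how the long-range relation a > c from the slide's example is dropped.

```python
from collections import defaultdict

def transitive_reduction(edges):
    """Remove every long-range relation a > c that is already implied
    by a chain a > b, b > c, keeping only immediate subsuming links."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)

    def reachable(src, dst, banned_edge):
        # DFS over the graph, ignoring the direct edge under test.
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if (node, nxt) == banned_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    # An edge is redundant iff its endpoints stay connected without it.
    return {(a, b) for a, b in edges if not reachable(a, b, (a, b))}

# Slide example: a > b, b > c, a > c  ->  a > c is dropped.
print(transitive_reduction({("a", "b"), ("b", "c"), ("a", "c")}))
# kept edges: ('a', 'b') and ('b', 'c')
```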

  8. Identifying word relations • Given an existing non-empty word set W, the concept coverage of a word w over W is devised to reflect to what extent w brings new information relative to the known information provided by W. • COV(w | W) is defined as the proportion of the sentences in SPAN(w) that also appear in SPAN(W): COV(w | W) = |SPAN(w) ∩ SPAN(W)| / |SPAN(w)|. • The smaller the coverage is, the more new information w is likely to bring to W.
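Since COV(w | W) is just a ratio of sentence-set sizes, it can be computed directly from SPAN. A minimal sketch, assuming sentences are given as sets of words and that SPAN(W) is the union of the spans of the words in W:

```python
def span(word, sentences):
    """SPAN(w): indices of the sentences that contain the word."""
    return {i for i, sent in enumerate(sentences) if word in sent}

def coverage(word, word_set, sentences):
    """COV(w | W): fraction of SPAN(w) already inside SPAN(W).
    Smaller values mean w brings more new information."""
    span_w = span(word, sentences)
    if not span_w or not word_set:
        return 0.0
    span_W = set().union(*(span(v, sentences) for v in word_set))
    return len(span_w & span_W) / len(span_w)

# Toy check with the Norodom example from the previous slide:
sentences = [{"king", "norodom"}, {"prince", "norodom"}, {"cambodia"}]
print(coverage("norodom", {"king", "prince"}, sentences))  # 1.0 -> fully covered
```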

  9. Identifying word relations • When comparing a word w to a former word w0 that already subsumes a set of words S, a relation between w and w0 is aligned subject to two constraints, each taking values in [0, 1].

  10. The definition of the subsuming relationship • Denote the word set of sentence s as W = {w1, …, wm} and the word set of sentence s' as W' = {w'1, …, w'n}. • Connected word: a word wi in W is regarded as "connected" to a word w'j in W' if there is a path from w'j to wi in the relation graph. • If the word that directly connects to wi on this path is wl1, the weight of that edge is COV(wi | wl1). • The strength of the connection between wi and w'j, denoted CON(wi | w'j), is obtained from the COV weights of the edges along the path.

  11. The definition of the subsuming relationship • The conditional saliency (CS for short) of s to s' is calculated as a weighted sum of the importance of all the "connected words" in s with respect to s'.
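The slides do not give the exact formula, so the sketch below only mirrors the description: sum the importance of the connected words, weighted by the connection strength CON. The dictionary-based interfaces (`con`, `importance`) are assumptions for illustration; the paper's exact normalization may differ.

```python
def conditional_saliency(sentence_words, selected_words, con, importance):
    """Sketch of CS(s | s'): accumulate the importance of words in s
    that are connected (via the subsuming relations) to words of the
    already selected sentence s', weighted by CON(w | w')."""
    score = 0.0
    for w in sentence_words:
        for w_prime in selected_words:
            strength = con.get((w, w_prime), 0.0)  # path-product of COV weights
            if strength > 0.0:
                score += strength * importance.get(w, 0.0)
    return score
```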

  12. Progressive sentence selection strategy • The selection process can be viewed as a random walk on a DAG (directed acyclic graph) from the center to its surrounding nodes. • Besides the real words that appear in the input documents, we introduce a virtual word, denoted ROOT-W, which is used as the center of the DAG. • ROOT-W can be viewed as a virtual word that spans the whole sentence set, so that it perfectly covers any actual word.

  13. Progressive sentence selection strategy • A virtual sentence, ROOT-S, is regarded as already selected at the beginning of the sentence selection process. • The conditional saliency of a sentence to ROOT-S indicates its ability to describe the general ideas of the input documents, because the words attached to ROOT-W are the general words.

  14. Progressive sentence selection strategy • The sentence selection process is cast as: first adding ROOT-S to the initial summary, then iteratively adding the sentence that best supports the existing sentence(s) (denoted Sold). • The score of each unselected sentence is based on its maximum conditional saliency to the selected sentences; this maximum saliency indicates how much supporting information the sentence contributes. • When different sentences contain the same "connected words", they receive equal scores, so we use two popular criteria, length and position, to obtain the final sentence score; a sketch of the loop follows below.
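A minimal sketch of the loop described above, assuming a conditional-saliency function `cs(s, s_old)` is available; the length/position weighting from the slide is omitted here, and `ROOT-S` is the virtual sentence from slide 13.

```python
def progressive_select(sentences, cs, length_limit):
    """Sketch: start from the virtual sentence ROOT-S, then repeatedly
    add the unselected sentence whose maximum conditional saliency to
    any already selected sentence is highest."""
    selected = ["ROOT-S"]
    candidates = list(sentences)
    summary_len = 0
    while candidates and summary_len < length_limit:
        # Score each candidate by its best CS to the selected set.
        scored = [(max(cs(s, old) for old in selected), s) for s in candidates]
        best_score, best = max(scored, key=lambda x: x[0])
        selected.append(best)
        candidates.remove(best)
        summary_len += len(best.split())
    return selected[1:]  # drop the virtual ROOT-S
```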

  15. Redundancy control by penalizing repetitive words • To ensure that each selected sentence brings new concepts, a damping factor α is applied to the word importance during the sentence selection process (see the sketch below). • In the extreme case when α equals 0, an effective "connected word" is required not to appear in any selected sentence.
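One possible reading of the damping step, as a sketch (the paper's exact update rule may differ): after each selection, scale down the importance of every word that occurred in the newly selected sentence.

```python
def damp_importance(importance, selected_sentence_words, alpha=0.5):
    """Sketch: multiply the importance of every word of the newly
    selected sentence by alpha. With alpha = 0, a word that already
    appeared in the summary can no longer act as an effective
    'connected word'."""
    return {w: (imp * alpha if w in selected_sentence_words else imp)
            for w, imp in importance.items()}
```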

  16. Experiments and evaluation • Document Understanding Conference (DUC) data. • The proposed summarization methods are first evaluated on a generic multi-document summarization data set, and then extended to several query-focused multi-document summarization data sets. • We use the automatic evaluation toolkit ROUGE to evaluate the system summaries. • The DUC 2004 generic multi-document summarization data set contains 45 document sets, each consisting of 10 documents.

  17. Experiments and evaluation • The resulting summary tends to include more diverse words and thus stands a better chance of sharing more words with the reference summaries, which may lead to a higher ROUGE-1 score. • The ROUGE-2 score may decrease even more, as it requires matching two consecutive words. • The sequential system obtains the highest ROUGE-1 score with a full penalty on repetitive words (α = 0); however, its ROUGE-2 scores drop significantly. • The best ROUGE-2 scores are obtained when α = 0.5, and the dropping rate is much lower for the progressive system.

  18. Experiments and evaluation • This clearly demonstrates the advantage of the progressive sentence selection strategy: it guarantees both the novelty and the saliency of the selected sentences.

  19. Experiments and evaluation • The damping factor is used to handle the redundancy issue. • The reason is that it is more consistent with the word importance estimation method used in the systems, and is therefore better at handling redundancy for the system.

  20. Experiments and evaluation • When the threshold is too small, many unrelated words may be wrongly associated, which unavoidably impairs the reliability of the word relations and leads to worse performance. • When the threshold is too large, the discovered word relations become very limited, which weakens the progressive system.

  21. Experiments and evaluation • The DUC 2005-2007 query-focused multi-document summarization data sets. • The data set in each year contains about 50 topics, each consisting of 25-50 documents. • System-generated summaries are strictly limited to 250 English words in length.

  22. Experiments and evaluation • It is also shown that incorporating the query to refine the word importance is effective for both the progressive system and the sequential system.

  23. Conclusion and future work • In the process, a sentence can be selected either as a general sentence or as a supporting sentence. • The sentence relationship is used to improve the saliency of the supporting sentences. • A single word alone is often insufficient to represent a complex concept, and the sense of a word can be ambiguous in a document set. • In future work, we would like to explore concept relations.

  24. Conclusion and future work • Due to the limitations of current natural language generation techniques, automatic summarization systems still cannot freely compose ideal sentences as humans do. • In the future, we would like to investigate other means of breaking the limitation of the original sentences, such as sentence compression or sentence fusion, which can generate additional candidate sentences to express the desired concepts more accurately.

  25. Speech summarization

  26. Experiments Data • Experimental corpus list • ds2_all_list.txt • List of 100 training documents • ds2_all_list_test.txt • List of 105 test documents • ds2_all_list_train.txt • List of 20 test documents • test_difficult.txt • Extra-information data used by RM/WRM • 2002_News_Content.txt.seg

  27. Experiments Data • Background data (BGDATA) • CNA0102.GT3-7.lm.wid • n-gram, LM score (log base 10, to be converted to base e), LMWID, back-off score • Dictionary • NTNULexicon2003-72K.txt • AcousticWID, LMWID, n-gram, Chinese characters • Zhuyin (phonetic) symbols • syllable WID without tone • syllable WID with tone
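The base conversion noted above is a one-liner: a base-10 log score becomes a natural log when multiplied by ln(10).

```python
import math

# LM scores stored as base-10 logs must be converted to base e
# before being combined with other natural-log probabilities.
def log10_to_ln(score):
    return score * math.log(10)  # ln(x) = log10(x) * ln(10)
```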

  28. Experiments Data • ROUGE dictionary • RougeDict.txt • a1 a2 a3 • a (LMWID)

  29. Sentence modeling • ULM (unigram language model) with KL-divergence ranking
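The slide only names the model, so the following is a hedged sketch of one common ULM/KL setup (not necessarily the exact one used here): build unigram models for the document and for each sentence, then rank sentences by the negative KL divergence between the two.

```python
import math

def unigram_model(words):
    """Maximum-likelihood unigram distribution over a word list."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    total = len(words)
    return {w: c / total for w, c in counts.items()}

def kl_divergence(p, q, vocab, eps=1e-12):
    """KL(P || Q) over a shared vocabulary, with simple eps smoothing
    (an assumption; real systems use proper smoothing schemes)."""
    return sum(p.get(w, eps) * math.log(p.get(w, eps) / q.get(w, eps))
               for w in vocab)

def rank_sentences(doc_words, sentences):
    """Score each sentence by -KL(doc model || sentence model):
    sentences closest to the document distribution rank highest."""
    vocab = set(doc_words)
    p_doc = unigram_model(doc_words)
    scored = [(-kl_divergence(p_doc, unigram_model(s), vocab), s)
              for s in sentences]
    return sorted(scored, key=lambda x: x[0], reverse=True)
```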

  30. Sentence modeling • RM (relevance model)
