
Headline Generation Based on Statistical Translation



Presentation Transcript


  1. Headline Generation Based on Statistical Translation. Michele Banko (Computer Science Department, Johns Hopkins University); Vibhu O. Mittal (Just Research); Michael J. Witbrock (Lycos Inc.). ACL 2000. Presenter: 翁鴻加

  2. Abstract • Extractive approaches cannot generate document summaries shorter than one sentence • Non-extractive approach: statistical models of term selection • Actual headlines are often ungrammatical, incomplete phrases

  3. Introduction • Generating effective summaries requires the ability to select, evaluate, order and aggregate items of information according to their relevance to a subject • Previous work has focused on extractive summarization • Drawbacks: (1) inability to generate coherent summaries shorter than the smallest text span being considered (usually a sentence); (2) the most important information is often scattered across multiple sentences; (3) a tendency to select long sentences

  4. The System • Content selection • Summary generation: (1) length of summaries: fixed length based on document genre; (2) a coherently ordered summary built from the selected content

  5. The System • Assumption: the likelihood of a word appearing in the summary is independent of the other words in the summary (an initial modeling choice)
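The independence assumption on this slide means content selection can be estimated one word at a time. A minimal sketch, assuming the quantity of interest is the probability that a word appears in the headline given that it appears in the document, estimated by simple counting. The function name `selection_probabilities` and the toy data are hypothetical and not taken from the slides.

```python
from collections import Counter

def selection_probabilities(pairs):
    """Estimate P(word in headline | word in document) by counting.

    `pairs` is a list of (headline_tokens, document_tokens) tuples.
    Each word is treated independently of every other word.
    """
    in_doc = Counter()   # documents that contain the word
    in_both = Counter()  # documents whose headline also contains it
    for headline, document in pairs:
        headline_set = set(headline)
        for word in set(document):
            in_doc[word] += 1
            if word in headline_set:
                in_both[word] += 1
    return {word: in_both[word] / in_doc[word] for word in in_doc}

# Toy training pairs (illustrative only).
toy_pairs = [
    (["clinton", "meets", "arafat"],
     ["clinton", "met", "arafat", "today"]),
    (["clinton", "visits", "china"],
     ["clinton", "arrived", "in", "china", "today"]),
]
probs = selection_probabilities(toy_pairs)
```

Words that never occur in a document body (such as "meets" here) get no estimate, which is one reason such a model can only select words already present in the article.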

  6. The System • A bigram language model is used instead of a general n-gram model • In the simplest (zero-level) model, cross-validation is used to learn the weights that combine the component models
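One plausible reading of this slide is a weighted sum of log-probabilities from the content selection model, a headline-length model and a bigram ordering model. The sketch below follows that reading. The weight names `alpha`, `beta`, `gamma`, the add-one smoothing and the `1e-6` floor for unseen events are all assumptions made for illustration; in the slides the weights are learned by cross-validation, which is not shown here.

```python
import math
from collections import Counter

def train_bigram(headlines):
    """Count left contexts and bigrams over tokenized headlines."""
    contexts, bigrams = Counter(), Counter()
    for tokens in headlines:
        padded = ["<s>"] + tokens
        contexts.update(padded[:-1])             # words seen as a left context
        bigrams.update(zip(padded, padded[1:]))  # (previous, current) pairs
    vocab_size = len({w for tokens in headlines for w in tokens})
    return contexts, bigrams, vocab_size

def score(candidate, select_prob, length_prob, bigram_model,
          alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of log-probabilities for a candidate headline."""
    contexts, bigrams, vocab_size = bigram_model
    # Content selection: each word scored independently (assumed floor 1e-6).
    content = sum(math.log(select_prob.get(w, 1e-6)) for w in candidate)
    # Length model: probability of a headline with this many words.
    length = math.log(length_prob.get(len(candidate), 1e-6))
    # Ordering: add-one smoothed bigram probability of each word.
    order, prev = 0.0, "<s>"
    for word in candidate:
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        order += math.log(p)
        prev = word
    return alpha * content + beta * length + gamma * order

# Toy data (illustrative only).
model = train_bigram([["clinton", "meets", "arafat"],
                      ["clinton", "visits", "china"]])
select_prob = {"clinton": 0.9, "meets": 0.5, "arafat": 0.8}
length_prob = {3: 0.4}
good = score(["clinton", "meets", "arafat"], select_prob, length_prob, model)
bad = score(["arafat", "meets", "clinton"], select_prob, length_prob, model)
```

Both candidates contain the same words and have the same length, so only the bigram term separates them, and the ordering seen in training scores higher.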

  7. Experiments • Example article: <HEADLINE> U.S. Pushes for Mideast Peace </HEADLINE> President Clinton met with his top Mideast advisers, including Secretary of State Madeleine Albright and U.S. peace envoy Dennis Ross, in preparation for a session with Israel Prime Minister Benjamin Netanyahu tomorrow. Palestinian leader Yasser Arafat is to meet with Clinton later this week. Published reports in Israel say Netanyahu will warn Clinton that Israel can’t withdraw from more than nine percent of the West Bank in its next scheduled pullback, although Clinton wants a 12-15 percent pullback. • Generated headlines by length in words: (3) Clinton netanyahu arafat; (4) Clinton to mideast peace; (5) Clinton to meet netanyahu arafat; (6) Clinton to meet Netanyahu Arafat Israel

  8. Experiment • Corpus: 25,000 news articles from Reuters, dated between 1/1/1997 and 1/6/1997 • Punctuation stripped, except apostrophes • 44,000 unique tokens in the articles • 15,000 tokens in the headlines • Representing all pairwise conditional probabilities added complexity, so a limited vocabulary was used
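The preprocessing step on this slide (strip punctuation but keep apostrophes) can be sketched as follows. The names `tokenize` and `headline_vocabulary` are hypothetical, and restricting the vocabulary to words seen in headlines is an assumption about what "limited vocabulary" means here.

```python
import re

def tokenize(text):
    """Split text into tokens, dropping punctuation except apostrophes."""
    text = text.replace("\u2019", "'")       # normalize curly apostrophes
    cleaned = re.sub(r"[^\w\s']", " ", text)  # other punctuation becomes a space
    return cleaned.split()

def headline_vocabulary(headlines):
    """Assumed limited vocabulary: every token seen in any headline."""
    return {token for headline in headlines for token in tokenize(headline)}

tokens = tokenize("Israel can\u2019t withdraw, Clinton wants a 12-15 percent pullback.")
vocab = headline_vocabulary(["U.S. Pushes for Mideast Peace"])
```

Note that this simple rule splits "U.S." into "U" and "S" and "12-15" into "12" and "15". Whether the original system handled abbreviations and ranges differently is not stated in the slides.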

  9. Experiments • Lack of sufficient training data • Lexical model • 1000 unseen documents

  10. Multiple Selection Models: POS and Position • Part-of-speech information: learn which word senses are more likely to be part of a headline and how to order them coherently • Position information: estimate the probability of a token appearing in the headline given that it appeared in the 1st, 2nd, 3rd or 4th quartile of the body of the article
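The position model described on this slide can be estimated by counting, per quartile of the article body, how often a token also appears in the headline. A minimal sketch under that assumption; the helper names `quartile` and `position_probabilities` and the toy data are hypothetical.

```python
from collections import Counter

def quartile(index, length):
    """Map a 0-based token position to a quartile number from 1 to 4."""
    return min(4 * index // length, 3) + 1

def position_probabilities(pairs):
    """Estimate P(token in headline | token occurs in quartile q)."""
    seen = Counter()         # token occurrences per quartile
    in_headline = Counter()  # occurrences whose token is in the headline
    for headline, body in pairs:
        headline_set = set(headline)
        for index, token in enumerate(body):
            q = quartile(index, len(body))
            seen[q] += 1
            if token in headline_set:
                in_headline[q] += 1
    return {q: in_headline[q] / seen[q] for q in sorted(seen)}

# Toy pair (illustrative only).
pairs = [(["clinton", "arafat"],
          ["clinton", "met", "arafat", "today",
           "in", "washington", "for", "talks"])]
position_probs = position_probabilities(pairs)
```

In this toy example only the first two quartiles contain headline words, which mirrors the common observation that news articles front-load their key terms.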

  11. Experiments Overlap with headline

  12. Some “equally good” generated headlines count as errors
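The overlap evaluation and its limitation can be illustrated with a simple word-overlap measure. This is a sketch only: the exact metric used in the experiments is not given in the slides, so the fraction of generated words found in the actual headline is an assumption, and the function name `overlap` is hypothetical. Tokens are lowercased by hand for the comparison.

```python
def overlap(generated, reference):
    """Fraction of generated tokens that also occur in the reference headline."""
    if not generated:
        return 0.0
    reference_set = set(reference)
    matches = sum(1 for token in generated if token in reference_set)
    return matches / len(generated)

# Actual headline and two generated headlines from the example article.
actual = ["u.s.", "pushes", "for", "mideast", "peace"]
partial = overlap(["clinton", "to", "mideast", "peace"], actual)
missed = overlap(["clinton", "to", "meet", "netanyahu", "arafat"], actual)
```

Here `partial` is 0.5 and `missed` is 0.0, even though "Clinton to meet netanyahu arafat" is a reasonable summary of the article. That is the effect noted on slide 12: a plausible headline that shares no words with the actual one is scored as an error.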

  13. Conclusion and Future Work • This paper has presented an approach that makes it possible to generate coherent summaries shorter than a single sentence • With a slight generalization of the system, the summaries need not contain any of the words in the original document • Given good corpora, this approach could be applied to Japanese documents with English headlines
