
Automatic Sentence Compression in the MUSA project

Walter Daelemans & Anja Höthker

[email protected]

http://cnts.uia.ac.be

CNTS, University of Antwerp, Belgium

Languages & The Media 2004, Berlin



MUSA

  • MUltilingual Subtitling of multimediA content

    EU IST 5th framework, Sep. 2002 - Feb. 2005

  • Goals

    • Conversion of audio streams into TV subtitles (monolingual)

    • Translation of subtitles into French or Greek



Partners

  • ILSP, Athens: coordination, integration

  • ESAT, KU Leuven: Automatic Speech Recognition

  • CNTS, U. Antwerp: Sentence compression

  • Systran, Paris: Machine Translation

  • BBC, London: Main User, Data provider, Evaluation

  • Lumiere, Athens: Main User, Multilingual Data Provider, Evaluation



Goals for Sentence Compression

  • Automatically and dynamically generate subtitles based on constraints (words and characters)

  • Reduce the time an expert subtitler needs to produce subtitles

  • Provide an architecture that can easily be ported to other languages



Example

  • SPEECH:

    The task force is in place and ready to attack without mercy.

  • Constraints:

    Delete 3 words and 14 characters

  • Compression Module output:

    The task force is [in place and] ready to fight [without mercy]. (Brackets mark candidate deletions; "attack" has been paraphrased to the shorter "fight".)

  • SUBTITLE:

    The task force is ready ...

    ...to fight without mercy.
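
The word and character targets in examples like this one follow mechanically from the subtitle limits. A minimal sketch in Python; the limits of 9 words / 47 characters are chosen only so that the slide's "delete 3 words and 14 characters" comes out, and are not taken from the MUSA specification:

    def deletion_targets(sentence, max_words, max_chars):
        """How many words and characters must be removed before the
        sentence fits the subtitle limits (0 if it already fits)."""
        words = sentence.split()
        return (max(0, len(words) - max_words),
                max(0, len(sentence) - max_chars))

    speech = "The task force is in place and ready to attack without mercy."
    print(deletion_targets(speech, max_words=9, max_chars=47))   # -> (3, 14)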



Approach

  • Remove disfluencies: compress the sentence by removing repetitions introduced by hesitation (see the sketch after this list)

    I, I know that this war, this war will last for years → I know that this war will last for years

  • Paraphrasing: replace part of the input sentence by a shorter paraphrase

    an increasing number of → more and more

  • Rule-Based Approach: compress sentences based on handcrafted deletion rules that combine:

    • Shallow-parsing information (identifying constituents used by deletion rules)

    • Relevance measures (determine in which order to delete constituents)
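
A minimal sketch of the disfluency step mentioned in the first bullet above, assuming repetitions span at most three words; the slide does not specify MUSA's actual detection logic:

    import re

    # Collapse an immediately repeated 1-3 word sequence, e.g.
    # "I, I know" -> "I know" and "this war, this war will" -> "this war will".
    REPEAT = re.compile(r'\b((?:\w+\s+){0,2}\w+),?\s+\1\b', re.IGNORECASE)

    def remove_repetitions(text):
        prev = None
        while prev != text:                  # repeat until nothing changes
            prev, text = text, REPEAT.sub(r'\1', text)
        return text

    print(remove_repetitions("I, I know that this war, this war will last for years"))
    # -> "I know that this war will last for years"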



Shallow Parsing: POS Tagging

The/Det woman/NN will/MD give/VB Mary/NNP a/Det book/NN



Shallow Parsing: Chunking

[The/Det woman/NN]NP [will/MD give/VB]VP [Mary/NNP]NP [a/Det book/NN]NP



Shallow Parsing: Sense Tagging

[The/Det woman/NN]NP-PERSON [will/MD give/VB]VP [Mary/NNP]NP-PERSON [a/Det book/NN]NP-MATERIAL-OBJECT



Shallow Parsing: Relation Finding

(Diagram: relation arcs drawn over the sense-tagged sentence, connecting the person entities "the woman" and "Mary" and the material object "a book" to the verb.)
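
Taken together, the four steps progressively enrich each token. A sketch of the resulting annotations for the slide's example sentence; the field names and the relation labels (SUBJ, IOBJ, OBJ) are illustrative, not MBSP's actual output format:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Token:
        """One word plus the annotation added at each shallow-parsing step."""
        word: str
        pos: str                        # POS tagging:      'Det', 'NN', 'MD', ...
        chunk: Optional[str] = None     # chunking:         'NP', 'VP', ...
        sense: Optional[str] = None     # sense tagging:    'PERSON', 'MATERIAL-OBJECT', ...
        relation: Optional[str] = None  # relation finding: labels assumed here

    sentence = [
        Token("The",   "Det", "NP", "PERSON", "SUBJ"),
        Token("woman", "NN",  "NP", "PERSON", "SUBJ"),
        Token("will",  "MD",  "VP"),
        Token("give",  "VB",  "VP"),
        Token("Mary",  "NNP", "NP", "PERSON", "IOBJ"),
        Token("a",     "Det", "NP", "MATERIAL-OBJECT", "OBJ"),
        Token("book",  "NN",  "NP", "MATERIAL-OBJECT", "OBJ"),
    ]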


MBSP

(Diagram: the Memory-Based Shallow Parser (MBSP) architecture. Incoming text is tokenized (Perl), then passed through a POS tagger (MBT server, backed by TiMBL servers for known and unknown words), a phrase chunker (TiMBL server), a concept tagger (MBT server, again with TiMBL servers for known and unknown words), and a relation finder (TiMBL server). Built with TiMBL 5.0 and MBT 2.0, http://ilk.uvt.nl/)
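
The data flow of the diagram is easiest to see as function composition. In the sketch below every stage is mocked as an in-process function; in the real system the tagger, chunker, concept tagger, and relation finder are separate MBT/TiMBL server processes:

    from functools import reduce

    def tokenize(text):                      # Perl tokenizer
        return text.split()

    def pos_tag(tokens):                     # MBT server + TiMBL known/unknown-word servers
        return [(t, "NN") for t in tokens]   # placeholder: everything tagged NN

    def chunk(tagged):                       # TiMBL phrase chunker
        return [("NP", tagged)]              # placeholder: one big NP

    STAGES = [tokenize, pos_tag, chunk]      # ...then concept tagger, relation finder

    def shallow_parse(text):
        return reduce(lambda data, stage: stage(data), STAGES, text)

    print(shallow_parse("The woman will give Mary a book"))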



Rule-Based Approach (syntax)

  • Deletion rules mark phrases for deletion based on shallow parser output

  • Rules for adverbs, adjectives, PNPs, subordinate sentences, interjections, ...

  • Phrases are deleted iteratively until target compression rate is met
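
A minimal sketch of that iterative loop. The (relevance, start, end) span representation and the stop condition are assumptions; how relevance is computed is described two slides down:

    def compress(words, candidates, target_words, target_chars):
        """Apply deletion candidates, least relevant first, until both
        the word target and the character target are met."""
        deleted_w = deleted_c = 0
        keep = [True] * len(words)
        for relevance, start, end in sorted(candidates):
            if deleted_w >= target_words and deleted_c >= target_chars:
                break                               # compression target reached
            for i in range(start, end):
                if keep[i]:
                    keep[i] = False
                    deleted_w += 1
                    deleted_c += len(words[i]) + 1  # the word plus one space
        return " ".join(w for w, k in zip(words, keep) if k)

    # Spans and ranks as on the "Example" slide further down:
    words = "This is a basic summarizer for English used for demonstration purposes .".split()
    candidates = [(10, 5, 7),     # "for English"
                  (11, 3, 4),     # "basic"
                  (12, 8, 11)]    # "for demonstration purposes"
    print(compress(words, candidates, target_words=2, target_chars=12))
    # -> "This is a basic summarizer used for demonstration purposes ."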



Example Rule ADJECTIVES:

if (POS(word) == JJ && CHUNK(word) != ADJP-END
    && word-1 not in {most, least, more, less})   // keep comparatives and superlatives intact
{
    delete(word)                                  // drop the adjective itself
    if (POS(word-1) == CC && POS(word-2) == JJ)
        delete(word-1)                            // also drop a now-dangling conjunction on the left
    else if (POS(word+1) == CC && POS(word+2) == JJ)
        delete(word+1)                            // ...or on the right
}

Adam's [only] [serious] childhood illness had been measles

The virus triggered an [1 extremely ]1 [2 rare [3 and ]2 fatal ]3 condition (the numbered brackets mark overlapping candidates: "extremely", "rare and", "and fatal")



Relevance Measures (“semantics”)

  • Deletion rules suggest more deletions than are necessary to reach the target compression

  • The system rates the different possibilities and starts by deleting the least important phrases

  • Relevance measures in MUSA are based on (a weighted combination of)

    • Word frequencies (in the BNC corpus)

    • Rule probabilities (as observed in a parallel BBC corpus of transcripts with associated subtitles)

    • Word durations (comparing estimated with actual durations)
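
A hedged sketch of such a combination; the slide names the three signals but not their scaling, sign, or weights, so everything numeric below is an assumption:

    def relevance_score(freq, del_prob, dur_gain, weights=(1.0, 1.0, 1.0)):
        """Rank one deletion candidate; lower scores are deleted first.
        freq     : word frequency in the BNC (frequent = less informative)
        del_prob : how often subtitlers dropped such phrases in the BBC corpus
        dur_gain : speaking time saved by the deletion"""
        w_f, w_r, w_d = weights
        return -(w_f * freq + w_r * del_prob + w_d * dur_gain)

    # Toy numbers: a frequent, often-deleted, slow-to-say phrase ranks lowest.
    print(relevance_score(freq=0.7, del_prob=0.6, dur_gain=0.8))   # ~ -2.1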



Example

  • This is a basic summarizer for English used for demonstration purposes.

  • (NP This) is (NP a basic₁₁ summarizer) (PNP for English)₁₀ used (PNP for demonstration purposes)₁₂. The subscripts are relevance ranks; candidates are deleted in ascending order:

    • This is a basic summarizer used for demonstration purposes

    • This is a summarizer used for demonstration purposes

    • This is a summarizer used



Evaluation Data (Lumiere)

  • MMR - Every Parent's Choice

    • 243 segments

    • 39.5% of the segments need compression

    • Average target compression rate: 4.58 words, 1.98 chars

  • The Tranquiliser Trap

    • 287 segments

    • 50.52% of the segments need compression

    • Average target compression rate: 3.21 words, 2.0 chars


Human Evaluation

                              The Tranquiliser Trap (%)    MMR - Every Parent's Choice (%)
  Syntax                      85                           87
  Semantics                   78                           85
  Compression rate reached    85                           84
  Perfect                     69                           72



Conclusions

  • We presented the Sentence Compression Module of the MUSA system

  • An eclectic system combining statistical techniques for relevance detection with handcrafted deletion rules based on shallow-parser output

  • Evaluation suggests usefulness (with transcripts as input)

  • Future Work

    • Porting to other languages

    • Machine Learning of paraphrases



Demos

  • Sentence Compression http://cnts.uia.ac.be/cgi-bin/anja/musa

  • MUSA demo http://sifnos.ilsp.gr/musa/demos

