WP4-22. Final Evaluation of Subtitle Generator
Vincent Vandeghinste, Pan Yi

CCL – KULeuven

Example

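Transcript:

Het meest spectaculaire aan de daadwerkelijke start van de euro is dat er eigenlijk niets spectaculairs te melden valt.
(English: "The most spectacular thing about the actual launch of the euro is that there is actually nothing spectacular to report.")

Subtitle:

Het meest spectaculaire aan de start van de euro was dat er niets spectaculairs te melden valt.
(English: "The most spectacular thing about the launch of the euro was that there is nothing spectacular to report.")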

Availability Calculator
  • The pronunciation time of the input sentence is used to estimate the number of characters available in the subtitle
  • If the pronunciation time is unknown, estimate it by
    • counting the number of syllables
    • applying the average speaking rate for Dutch
Syllable Counter
  • Rule-based
  • Evaluated on CGN-lexicon combined with FREQ-lists
  • Estimated number of syllables compared against the number of syllables in the phonetic transcripts
  • The syllable count is estimated correctly for 99.63% of all words in CGN
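
To make the idea of a rule-based syllable counter concrete, here is a minimal sketch. The slides do not list the actual rules, so the single rule used here (one syllable per run of consecutive vowel letters) is an assumption chosen for illustration, and the function names are hypothetical. It should not be expected to reach the accuracy reported above.

```python
import re

# Assumption: a crude vowel-group heuristic, NOT the rule set evaluated on the slide.
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Estimate syllables as the number of maximal runs of vowel letters."""
    return len(VOWEL_GROUP.findall(word.lower()))


def count_sentence_syllables(sentence: str) -> int:
    """Sum the per-word estimates over whitespace-separated tokens."""
    return sum(count_syllables(token) for token in sentence.split())
```

A real counter would need extra rules for cases this heuristic gets wrong, such as adjacent vowels that belong to different syllables.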
Availability Calculator
  • When the pronunciation time is not given, it is estimated
  • Subtitle rate: 70 chars / 6 sec = 11.67 chars/sec
  • If the number of chars in the sentence > the number of available chars => compress the sentence
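
The calculation above can be sketched as follows. The 70 characters per 6 seconds come from the slide; the function names are hypothetical, and the Dutch speaking rate is left as a required parameter because the slides do not give its value.

```python
import math

SUBTITLE_CHARS = 70    # from the slide: 70 characters ...
SUBTITLE_SECONDS = 6   # ... displayed for 6 seconds


def estimate_duration(syllables: int, syllables_per_second: float) -> float:
    """Estimate pronunciation time when it is not given.

    syllables_per_second is the average speaking rate; its value is an
    input here because the slides do not state it.
    """
    return syllables / syllables_per_second


def available_chars(duration_seconds: float) -> int:
    """Number of subtitle characters that fit in the pronunciation time."""
    return math.floor(duration_seconds * SUBTITLE_CHARS / SUBTITLE_SECONDS)


def needs_compression(sentence: str, duration_seconds: float) -> bool:
    """True when the sentence has more characters than are available."""
    return len(sentence) > available_chars(duration_seconds)
```

For example, a sentence pronounced in 3 seconds leaves room for 35 characters, so any longer transcript would be passed on to the sentence compressor.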
Sentence Compressor
  • Parallel Corpus
  • Sentence Analysis
  • Sentence Compression
  • Evaluation
Parallel Corpus
  • Sentence aligned
  • Source & Target corpus:
    • Tagging
    • Chunking
    • SSUB detection
  • Chunk alignment
Chunk Alignment

Every 4-gram from the source chunk is compared with every 4-gram from the target chunk.

A = (m / (m + n)) · (L1 + L2) / 2

If A > 0.315, the chunks are aligned.

The F-value for NP/PP alignment is 95%.
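
The alignment decision can be sketched as below. The slide does not define m, n, L1, or L2, so this sketch treats them as already-computed inputs and only reproduces the arithmetic and the 0.315 threshold; the function names are hypothetical.

```python
ALIGN_THRESHOLD = 0.315  # threshold stated on the slide


def alignment_score(m: float, n: float, l1: float, l2: float) -> float:
    """Compute A = (m / (m + n)) * (L1 + L2) / 2.

    The meaning of m, n, L1 and L2 is not given on the slide; they are
    passed in as-is. m + n must be non-zero.
    """
    return (m / (m + n)) * (l1 + l2) / 2


def should_align(m: float, n: float, l1: float, l2: float) -> bool:
    """Align the chunk pair when the score exceeds the threshold."""
    return alignment_score(m, n, l1, l2) > ALIGN_THRESHOLD
```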

Sentence Analysis
  • Tagging (TnT): accuracy = 96.2% (Oostdijk et al., 2002)
  • Chunking
Sentence Analysis (2)
  • SSUB detection
Sentence Compression
  • Use of statistics
  • Use of rules
  • Word reduction
  • Selection of the Compressed Sentence
Use of rules
  • Rules prevent the generation of ungrammatical sentences
  • Rules are of the type:

For every NP, never remove the head noun

  • Rules are applied recursively
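
As an illustration of how such a rule could be applied recursively over nested chunks, here is a minimal sketch. The `Word` and `Chunk` classes, the `is_head` flag, and the function name are all hypothetical; the slides do not describe the system's data structures, and only the single NP rule quoted above is modelled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Word:
    text: str
    is_head: bool = False  # hypothetical flag: head of the enclosing chunk


@dataclass
class Chunk:
    label: str  # e.g. "NP" or "PP"
    children: List[Union["Chunk", Word]] = field(default_factory=list)


def removable_words(chunk: Chunk) -> List[str]:
    """Collect the words that the NP rule does not protect."""
    result: List[str] = []
    for child in chunk.children:
        if isinstance(child, Chunk):
            # Recurse so the rule also holds for embedded chunks.
            result.extend(removable_words(child))
        elif not (chunk.label == "NP" and child.is_head):
            result.append(child.text)
    return result
```

For the NP "de daadwerkelijke start" with "start" marked as head, the sketch reports "de" and "daadwerkelijke" as candidates for removal and never the head noun.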
Word Reduction
  • Example: replace gevangenisstraf ("prison sentence") by straf ("punishment")
  • Counterexample: replace voetbal ("football") by bal ("ball")
  • Makes use of the Wordbuilding module (WP2)
  • Introduces a lot of errors: does it add accuracy?
  • Better integration with the rest of the system should be possible
Selection of the Compressed Sentence
  • All previous steps result in an ordered list of sentence alternatives
    • Supposedly grammatically correct
    • Sentences are ordered by their probability
    • The first (most probable) sentence whose length is smaller than the available number of characters is chosen
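
The selection step can be sketched as a simple scan over the ordered list. The function name is hypothetical, and returning `None` when nothing fits is an assumption standing in for the "No Output" case.

```python
from typing import List, Optional


def select_compressed(candidates: List[str], available: int) -> Optional[str]:
    """Return the first candidate that fits in the available characters.

    candidates must already be ordered from most to least probable.
    Returns None when no candidate is short enough (no output).
    """
    for sentence in candidates:
        if len(sentence) < available:  # "smaller than", so strictly less
            return sentence
    return None
```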
Subtitle Layout Generator

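Actieve of gewezen voetballers
zoals Ruud Gullit of Dennis
Bergkamp moeten het stellen met
nauwelijks anderhalf miljard .

becomes

Actieve of gewezen voetballers
zoals Ruud Gullit of
Dennis Bergkamp moeten het stellen
met nauwelijks anderhalf miljard .

(English: "Active or former footballers such as Ruud Gullit or Dennis Bergkamp have to make do with barely one and a half billion.")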

Conclusion
  • The system approach works very well:
    • If the sentence analysis is correct
    • If there are possible reductions (according to the ruleset)
  • A lot of No Output cases, where the system cannot reduce the sentence:
    • The sentence cannot be reduced (even by humans)
    • The ruleset is too strict, or the sentence analysis is wrong
    • The statistical information is not fine-grained enough
  • Bad output:
    • Wrong sentence analysis (CONJ)
    • Wrong word reductions
Future
  • Near future (within Atranos)
    • Better integration of word-reduction
    • Combine advantages of CNTS approach and CCL approach into one approach
  • Far future (outside Atranos)
    • Better sentence analysis: full parse is needed
    • More fine-grained analysis of parallel corpus