Recognition of video text through temporal integration



Trung Quy Phan, Palaiahnakote Shivakumara, Tong Lu and Chew Lim Tan


Introduction

  • Text extraction from video frames → video search and retrieval


Introduction

  • Low resolution

  • Complex background

  • Unconstrained appearance

  • Temporal information


Problem

  • Input

    • Word bounding box in a reference frame

    • Frame ID

  • Output

    • Binarized image

  • Scope

    • Static texts

    • Linearly moving texts


Approach

  • Tracking

  • Alignment

  • Integration

  • Refinement


1. Tracking

  • Find

    • [tstart, tend] → text framespan

    • Bounding box in each frame → text instance

[Figure: frame timeline tstart … tref … tend]


1. Tracking

  • Text descriptors

  • Stroke Width Transform-SIFT


1. Tracking

  • t = tref + 1, tref + 2, …

  • Initialize search area

  • If matchRatio ≥ 0.1 → estimate new BB (bounding box)

  • Otherwise, tend has been found
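
The forward tracking loop on this slide can be sketched as follows. This is a minimal sketch, not the authors' implementation: `track_forward`, `match_fn` and the (x, y, width, height) box layout are hypothetical names chosen for illustration, and the slides do not say how matchRatio is computed, so the matcher is passed in as a callback. Only the 0.1 threshold comes from the slide.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def track_forward(
    t_ref: int,
    ref_box: Box,
    num_frames: int,
    match_fn: Callable[[int, Box], Tuple[float, Box]],
    min_ratio: float = 0.1,  # threshold stated on the slide
) -> Tuple[int, Dict[int, Box]]:
    """Follow a word from the reference frame until matching fails.

    match_fn(t, search_box) is a hypothetical callback that matches the
    reference descriptors inside search_box in frame t and returns
    (match_ratio, estimated_box).
    """
    boxes: Dict[int, Box] = {t_ref: ref_box}
    t_end = t_ref
    for t in range(t_ref + 1, num_frames):
        # Initialize the search area from the most recent bounding box.
        search_box = boxes[t - 1]
        ratio, new_box = match_fn(t, search_box)
        if ratio < min_ratio:
            break  # too few matches: the previous frame was t_end
        boxes[t] = new_box
        t_end = t
    return t_end, boxes
```

Running the same loop with decreasing t would, under the same assumptions, give tstart.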


2. Alignment

  • Align at pixel-level → better integration

  • Slide reference text mask over individual masks → optimal alignment
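
Sliding one mask over another can be illustrated with a brute-force search over small offsets. This is a sketch under stated assumptions: the slides do not name the alignment score, so the number of overlapping text pixels is used here, and `shift_mask`, `best_offset` and `max_shift` are hypothetical names.

```python
import numpy as np


def shift_mask(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2-D boolean mask by (dy, dx), filling exposed pixels with False."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    src_rows = slice(max(0, -dy), min(h, h - dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    out[dst_rows, dst_cols] = mask[src_rows, src_cols]
    return out


def best_offset(ref_mask: np.ndarray, mask: np.ndarray, max_shift: int = 3):
    """Return the (dy, dx) that maximizes overlap between the shifted
    reference mask and the given mask (assumed scoring rule)."""
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            overlap = np.logical_and(shift_mask(ref_mask, dy, dx), mask).sum()
            if overlap > best_score:
                best, best_score = (dy, dx), int(overlap)
    return best
```

Applying the inverse offset to each text instance would then bring all masks into the reference frame's coordinates before integration.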


3. Integration

  • Text probability map


3. Integration

  • Initial binarization
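
One plausible reading of these two slides is sketched below. It assumes the text probability of a pixel is the fraction of aligned masks that mark it as text, and that the initial binarization thresholds this map. Neither detail is stated on the slides, and the function names and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np


def text_probability_map(aligned_masks):
    """Per-pixel fraction of aligned binary masks that mark the pixel as text
    (an assumed definition of the probability map)."""
    stack = np.stack([np.asarray(m, dtype=float) for m in aligned_masks])
    return stack.mean(axis=0)


def initial_binarization(prob_map, threshold=0.5):
    """Keep pixels whose text probability reaches the (assumed) threshold."""
    return prob_map >= threshold
```

For example, a pixel marked as text in two of three aligned masks gets probability 2/3 and survives a 0.5 threshold, while a pixel marked in none is dropped.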


4. Refinement

  • SWT: rounded strokes

  • Intensity values preserve sharp edges & holes suppress background pixels


Experiments

  • Moving text dataset: English + German

    • 250 words

    • 1,545 characters

    • Bottom to top, right to left and left to right

  • Static text dataset: English

    • 212 words

    • 1,389 characters


Experiments

  • Methods for comparison

    • Niblack (Single)

    • Min/max (Multiple)

    • Average-Min/max (Multiple)

    • Ours (Single)

    • Ours (Multiple)
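
For context, a multiple-frame min/max baseline is commonly understood as a pixel-wise minimum (for bright text) or maximum (for dark text) over aligned frames, which keeps stable text pixels and suppresses a changing background. The sketch below assumes that reading. The slides do not describe the baseline, and `min_max_integrate` and its `bright_text` flag are hypothetical names.

```python
import numpy as np


def min_max_integrate(aligned_frames, bright_text=True):
    """Pixel-wise min (bright text) or max (dark text) over aligned
    grayscale frames; an assumed form of the min/max baseline."""
    stack = np.stack([np.asarray(f) for f in aligned_frames])
    return stack.min(axis=0) if bright_text else stack.max(axis=0)
```

The integrated image would still need a binarization step before it can be passed to an OCR engine.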




Results on Moving Texts

  • Character recognition rate (CRR)

  • Word recognition rate (WRR)


Results on Static Texts

  • Multiple-frame: ~20% improvement over single-frame


Summary

  • A variation of SIFT for robust tracking

  • Integration based on word masks

  • Future work: handle complex text movements

