Recognition of video text through temporal integration

Recognition of Video Text Through Temporal Integration. Trung Quy Phan, Palaiahnakote Shivakumara, Tong Lu and Chew Lim Tan. Introduction: text extraction from video frames → video search and retrieval; low resolution, complex background and unconstrained appearance.

Recognition of Video Text Through Temporal Integration

Trung Quy Phan, Palaiahnakote Shivakumara, Tong Lu and Chew Lim Tan


Introduction

  • Text extraction from video frames → video search and retrieval


Introduction

  • Low resolution

  • Complex background

  • Unconstrained appearance

  • Temporal information


Problem

  • Input

    • Word bounding box in a reference frame

    • Frame ID

  • Output

    • Binarized image

  • Scope

    • Static texts

    • Linearly moving texts


Approach

  • Tracking

  • Alignment

  • Integration

  • Refinement


1. Tracking

  • Find

    • [t_start, t_end] → text framespan

    • Bounding box in each frame → text instance

[Figure: frames t_start … t_end, containing the reference frame t_ref]


1. Tracking

  • Text descriptors

  • Stroke Width Transform (SWT)-SIFT
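The slides name SWT-SIFT descriptors and later use a matchRatio threshold, but they do not spell out how matchRatio is computed. The sketch below is a minimal illustration under our own assumption: matchRatio is taken as the fraction of reference-frame descriptors that find a distinctive nearest neighbour in the candidate frame (Lowe's ratio test). Descriptor extraction is not shown, and `match_ratio` and `lowe_ratio` are hypothetical names.

```python
import math
from typing import List, Sequence

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_ratio(ref: List[Descriptor], cand: List[Descriptor],
                lowe_ratio: float = 0.8) -> float:
    """Fraction of reference descriptors that have a distinctive match in cand.

    Assumption (not from the slides): a reference descriptor counts as matched
    when its nearest candidate is closer than lowe_ratio times the distance to
    the second-nearest candidate. With a single candidate, it is accepted.
    """
    if not ref or not cand:
        return 0.0
    matched = 0
    for d in ref:
        dists = sorted(euclidean(d, c) for c in cand)
        if len(dists) == 1 or dists[0] < lowe_ratio * dists[1]:
            matched += 1
    return matched / len(ref)


if __name__ == "__main__":
    ref = [(0.0, 0.0), (10.0, 10.0)]
    cand = [(0.0, 1.0), (10.0, 9.0), (50.0, 50.0)]
    print(match_ratio(ref, cand))  # 1.0: both reference descriptors match
```

The ratio test rejects ambiguous matches, which matters for text because repeated characters produce many similar descriptors.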


1. Tracking

  • t = t_ref + 1, t_ref + 2, …

  • Initialize search area

  • If matchRatio ≥ 0.1 → estimate new bounding box (BB)

  • Otherwise, t_end has been found
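The forward pass on these slides can be sketched as a loop. This is a minimal sketch, not the authors' code: the search area is assumed to be the previous bounding box grown by a fixed margin, and descriptor matching plus bounding-box estimation are abstracted into a hypothetical `locate` callback that returns a match ratio and an estimated box. Only the 0.1 threshold comes from the slides; `margin`, `expand_box`, `track_forward` and `demo_locate` are illustrative.

```python
from typing import Callable, Dict, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height)
# Hypothetical callback: locate(t, search_area) -> (match_ratio, box or None)
Locator = Callable[[int, BBox], Tuple[float, Optional[BBox]]]


def expand_box(box: BBox, margin: int, frame_w: int, frame_h: int) -> BBox:
    """Grow a box by `margin` pixels on every side, clipped to the frame."""
    x, y, w, h = box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(frame_w, x + w + margin), min(frame_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


def track_forward(t_ref: int, ref_box: BBox, num_frames: int,
                  frame_size: Tuple[int, int], locate: Locator,
                  margin: int = 20, threshold: float = 0.1
                  ) -> Tuple[int, Dict[int, BBox]]:
    """Follow a word forward from t_ref; return t_end and the box per frame."""
    frame_w, frame_h = frame_size
    boxes: Dict[int, BBox] = {t_ref: ref_box}
    box, t_end = ref_box, t_ref
    for t in range(t_ref + 1, num_frames):
        # Search only near the previous box (assumes small per-frame motion).
        search_area = expand_box(box, margin, frame_w, frame_h)
        ratio, new_box = locate(t, search_area)
        if ratio < threshold or new_box is None:
            break  # match lost: the last matched frame is t_end
        box, t_end = new_box, t
        boxes[t] = box
    return t_end, boxes


def demo_locate(t: int, area: BBox) -> Tuple[float, Optional[BBox]]:
    """Toy locator: a word moving 5 px right per frame, gone after frame 7."""
    if t > 7:
        return 0.0, None
    return 0.9, (10 + 5 * (t - 3), 50, 40, 12)


if __name__ == "__main__":
    t_end, boxes = track_forward(3, (10, 50, 40, 12), 20, (320, 240), demo_locate)
    print(t_end, boxes[t_end])  # 7 (30, 50, 40, 12)
```

Running the same loop over t = t_ref − 1, t_ref − 2, … would presumably give t_start; the slides only show the forward direction.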


2. Alignment

  • Align at pixel level → better integration

  • Slide reference text mask over individual masks → optimal alignment
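A minimal sketch of the sliding step, assuming binary masks stored as lists of rows (1 = text pixel) and assuming the "optimal" alignment is the shift with the largest number of overlapping text pixels. The slides state neither the scoring criterion nor the search range, so `max_shift`, `overlap` and `best_offset` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def overlap(ref: Mask, other: Mask, dx: int, dy: int) -> int:
    """Count text pixels shared by ref and other when ref is shifted by (dx, dy)."""
    h, w = len(other), len(other[0])
    score = 0
    for y, row in enumerate(ref):
        for x, v in enumerate(row):
            ox, oy = x + dx, y + dy
            if v and 0 <= oy < h and 0 <= ox < w and other[oy][ox]:
                score += 1
    return score


def best_offset(ref: Mask, other: Mask, max_shift: int = 3) -> Tuple[int, int]:
    """Exhaustively slide ref over other within +/- max_shift pixels."""
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = overlap(ref, other, dx, dy)
            if score > best_score:  # keep the first shift with the top score
                best, best_score = (dx, dy), score
    return best


if __name__ == "__main__":
    ref = [[0] * 5 for _ in range(5)]
    other = [[0] * 5 for _ in range(5)]
    for y in range(1, 4):
        ref[y][1] = 1    # vertical stroke in the reference mask
    for y in range(2, 5):
        other[y][3] = 1  # same stroke, moved 2 px right and 1 px down
    print(best_offset(ref, other))  # (2, 1)
```

Exhaustive search is quadratic in `max_shift`, which stays cheap because tracking already places each text instance close to the reference position.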


3. Integration

  • Text probability map


3. Integration

  • Initial binarization
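One plausible reading of the two integration slides, sketched under assumptions: after alignment, each pixel's text probability is the fraction of frames in which that pixel is text, and the initial binarization keeps pixels whose probability reaches a threshold (0.5 here, i.e. a majority vote). The slides give neither the exact formula nor the threshold; `probability_map`, `initial_binarization` and `threshold` are hypothetical.

```python
from typing import List

Mask = List[List[int]]       # 1 = text pixel, 0 = background
ProbMap = List[List[float]]


def probability_map(aligned: List[Mask]) -> ProbMap:
    """Per-pixel fraction of aligned masks in which the pixel is text."""
    if not aligned:
        raise ValueError("need at least one aligned mask")
    n = len(aligned)
    h, w = len(aligned[0]), len(aligned[0][0])
    return [[sum(m[y][x] for m in aligned) / n for x in range(w)]
            for y in range(h)]


def initial_binarization(prob: ProbMap, threshold: float = 0.5) -> Mask:
    """Mark a pixel as text when its probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


if __name__ == "__main__":
    masks = [[[1, 0, 0]], [[1, 1, 0]], [[0, 1, 0]], [[1, 0, 1]]]
    prob = probability_map(masks)
    print(prob)                        # [[0.75, 0.5, 0.25]]
    print(initial_binarization(prob))  # [[1, 1, 0]]
```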


4. Refinement

  • SWT: rounded strokes

  • Intensity values preserve sharp edges & holes suppress background pixels


Experiments

  • Moving text dataset: English + German

    • 250 words

    • 1,545 characters

    • Movement directions: bottom to top, right to left and left to right

  • Static text dataset: English

    • 212 words

    • 1,389 characters


Experiments

  • Methods for comparison

    • Niblack (Single)

    • Min/max (Multiple)

    • Average-Min/max (Multiple)

    • Ours (Single)

    • Ours (Multiple)
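For reference, two of the baselines can be sketched briefly. Niblack's method thresholds each pixel at T = m + k·s, where m and s are the mean and standard deviation of a local window. k = −0.2 is a commonly used value for dark text on a light background, and bright text can be handled by inverting the image first. The min/max baselines are assumed here to combine aligned frames pixel-wise. Taking the minimum keeps bright static text while suppressing background that changes between frames, and the maximum does the same for dark text. The window radius and function names are illustrative. The Average-Min/max variant is not reproduced because the slides do not define it.

```python
import math
from typing import List

Gray = List[List[float]]
Mask = List[List[int]]


def niblack(gray: Gray, radius: int = 7, k: float = -0.2) -> Mask:
    """Niblack binarization: 1 where the pixel is at or below m + k*s (dark text)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Local window, clipped at the image border.
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            m = sum(window) / len(window)
            s = math.sqrt(sum((v - m) ** 2 for v in window) / len(window))
            out[y][x] = 1 if gray[y][x] <= m + k * s else 0
    return out


def temporal_min(frames: List[Gray]) -> Gray:
    """Pixel-wise minimum over aligned frames (keeps bright, static text)."""
    return [[min(vals) for vals in zip(*rows)] for rows in zip(*frames)]


def temporal_max(frames: List[Gray]) -> Gray:
    """Pixel-wise maximum over aligned frames (keeps dark, static text)."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*frames)]


if __name__ == "__main__":
    img = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
    print(niblack(img, radius=1))                 # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(temporal_min([[[5, 9]], [[7, 3]]]))     # [[5, 3]]
```

Niblack works on one frame only, which is why it serves as the single-frame baseline, while min/max need the aligned multi-frame stack.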




Results on Moving Texts

  • Character recognition rate (CRR)

  • Word recognition rate (WRR)
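The slides name the two metrics without formulas. A minimal sketch under a common convention, assumed rather than taken from the slides: WRR is the fraction of words recognized exactly, and CRR is one minus the total character edit distance divided by the number of ground-truth characters. `levenshtein` and `recognition_rates` are hypothetical helper names.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def recognition_rates(truth: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Return (CRR, WRR) for paired ground-truth and OCR words."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("need equally many, non-empty lists of words")
    total_chars = sum(len(t) for t in truth)
    errors = sum(levenshtein(t, p) for t, p in zip(truth, predicted))
    crr = max(0.0, 1.0 - errors / total_chars) if total_chars else 0.0
    wrr = sum(t == p for t, p in zip(truth, predicted)) / len(truth)
    return crr, wrr


if __name__ == "__main__":
    # One substitution ("O" read as "0") in 9 characters; 1 of 2 words exact.
    print(recognition_rates(["VIDEO", "TEXT"], ["VIDE0", "TEXT"]))
```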


Results on Static Texts

  • Multiple-frame: ~20% improvement over single-frame


Summary

  • A variation of SIFT for robust tracking

  • Integration based on word masks

  • Future work: handle complex text movements

