On combining multiple segmentations in scene text recognition
Sponsored Links
This presentation is the property of its rightful owner.
1 / 18

On Combining Multiple Segmentations in Scene Text Recognition PowerPoint PPT Presentation


  • 128 Views
  • Uploaded on
  • Presentation posted in: General

On Combining Multiple Segmentations in Scene Text Recognition. Lukáš Neumann and Ji ří Matas Centre for Machine Perception, Department of Cybernetics Czech Technical University, Prague. T alk Overview. End-to-End Scene Text Recognition - Problem Introduction The TextSpotter System

Download Presentation

On Combining Multiple Segmentations in Scene Text Recognition

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


On Combining Multiple Segmentations in SceneText Recognition

Lukáš Neumann and Jiří Matas

Centre for Machine Perception, Department of Cybernetics

Czech Technical University, Prague


Talk Overview

  • End-to-End Scene Text Recognition - Problem Introduction

  • The TextSpotter System

  • Character Detection as Extremal Region (ERs) Selection

  • Line formation & Character Recognition

  • Character Ordering

  • Optimal Sequence Selection

  • Experiments


End-to-End Scene Text Recognition

Input: Digital image (BMP, JPG, PNG) / video (AVI)

Lexicon-free method

Output: Set of words in the imageword = (horizontal)rectangular bounding box, text content

Bounding Box=[240;1428;391;1770]

Content="TESCO"


System Overview

  • Multi-scale Character Detection [1]

    with Gaussian Pyramid (new)

  • Text Line Formation [2]

  • Character Recognition [3]

  • Optimal Sequence Selection (new)

[1] L. Neumann, J. Matas, “Real-time scene text localization and recognition”, CVPR 2012

[2] L. Neumann, J. Matas, “Text localization in real-world images using efficiently pruned exhaustive search”, ICDAR 2011

[3] L. Neumann, J. Matas, “A method for text localization and recognition in real-world images”, ACCV 2010


Character Detection - Thresholding

Input image

(PNG, JPEG, BMP)

1Dprojection

<0;255>

(grey scale, hue,…)

Extremal regions with threshold

( =50, 100, 150, 200)


Extremal Regions (ER)

  • Let image I be a mapping I: Z2 S Let S be a totally ordered set, e.g. <0, 255>

  • Let A be an adjacency relation (e.g. 4-neigbourhood)

  • Region Q is a contiguous subset w.r.t. A

  • (Outer) Region Boundary δQ is set of pixels adjacent but not belonging to Q

  • Extremal Region is a region where there exists a threshold that separates the region and its boundary : pQ,qQ : I(p) <   I(q)

 = 32

  • Assuming character is an ER, 3 parameters still have to be determined:

    • Threshold

    • Mapping to a totally order set (colour space projection)

    • Adjacency relation


ER Detection - Threshold Selection

  • Character boundaries are often fuzzy

  • It is very difficult to locally determine the threshold value, typical document processing pipeline (image binarization OCR) leads to inferior results

  • Thresholds that most probably correspond to a character segmentation are selected using a CSER classifier [1], multiple hypotheses for each character are generated

[1] L. Neumann and J. Matas, “Real-time scene text localization and recognition”, CVPR 2012


ER Detection – Threshold Selection

  • p(r|character) estimated at each threshold for each region

  • Only regions corresponding to local maxima selected by the detector

  • Incrementally computed descriptors used for classification [1]

    • Aspect ratio

    • Compactness

    • Number of holes

    • Horizontal crossings

  • Trained AdaBoost classifier with decision trees calibrated to output probabilities

  • Linear complexity, real-time performance (300ms on an 800x600px image)

[1] L. Neumann and J. Matas, “Real-time scene text localization and recognition”, CVPR 2012


ER Detection - Color Space Projection

  • Color space projection maps a color image into a totally ordered set

  • Trade-off between recall and speed (although can be easily parallelized)

  • Standard channels (R, G, B, H, S, I) of RGB / HSI color space

  • 85.6% characters detected in the Intensity channel, combining all channels increases the recall to 94.8%

Source Image

Intensity Channel

(no threshold exists for the letter “A”)

Red Channel


ER Detection - Gaussian Pyramid

  • Pre-processing with a Gaussian pyramid alters the adjacency relation

  • At each level of the pyramid only a certain interval of character stroke widths is amplified

  • Not a major overhead as each level is 4 times faster than the previous one, total processing takes ~ 4/3 of the first level (1 + ¼ + ¼2…)

Characters formed of multiple

small regions

Multiple characters joint together


Character Recognition

  • Regions agglomerated into text lines hypotheses by exhaustive search [1]

  • Each segmentation (region) labeled by a FLANN classifier trained on synthetic data [2]

  • Multiple mutually exclusive segmentations with different label(s) present in each text line hypothesis

P

ilI

n

f

f

m

A

n

[1] Neumann, Matas, Text localization in real-world images using efficiently pruned exhaustive search, ICDAR 2011

[2] Neumann, Matas, A method for text localization and recognition in real-world images”, ACCV 2010


Character Ordering

  • Region A is a predecessor of a region B if A immediately precedes B in a text line

  • Approximated by a heuristic function based on text direction and mutual overlap

  • The relation induces a directed graph for each text line


Optimal Sequence Selection

  • The final region sequence of each text line is selected as an optimal path in the graph, maximizing the total score

  • Unary terms

    • Text line positioning (prefers regions which “sit nicely” in the text line)

    • Character recognition confidence

  • Binary terms (regions pair compatibility score)

    • Threshold interval overlap (prefers that neighboring regions have similar threshold)

    • Language model transition probability (2nd order character model)

Accommodation


Experiments

ICDAR 2011 Dataset – Text Localization

[1] B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform”, CVPR 2010


Experiments

ICDAR 2011 Dataset – Text Localization

[1] C. Shi, C. Wang, B. Xiao, Y. Zhang, and S. Gao, “Scene text detection

using graph model built upon maximally stable extremalregions”, Pattern Recognition Letters, 2013

[2] A. Shahab, F. Shafait, and A. Dengel, “ICDAR 2011 robust reading

competition challenge 2: Reading text in scene images”, ICDAR 2011

[3] L. Neumann and J. Matas, “Real-time scene text localization and recognition”, CVPR 2012

[4] C. Yi and Y. Tian, “Text string detection from natural scenes by structure-based partition and grouping”, Image Processing, 2011

[5] S. M. Hanif and L. Prevost, “Text detection and localization in complex scene images using constrained adaboostalgorithm”, ICDAR 2009


Experiments

ICDAR 2011 Dataset – End-to-End Text Recognition

Percentage of words correctly recognized without any error – case-sensitive comparison (ICDAR 2003 protocol)

[1] L. Neumann and J. Matas, “Real-time scene text localization and recognition”, CVPR 2012


Sample Results on the ICDAR 2011 Dataset

FREEDON

chips

cut

CABOT

PLACF


Conclusions

  • Multi-scale processing / Gaussian Pyramid improves text localization results without a significant impact on speed

  • Combining several channels and postponing the decision about character detection parameters (e.g. binarization threshold) to a later stage improves localization and OCR accuracy

  • Method current state

    • The method placed second in ICDAR 2013 Text Localization competition, 1.4% worse than the winner (f-measure)(unfortunately, end-to-end text recognition is not part of the competition)

    • Online demo available at http://www.textspotter.org/

    • OpenCV implementation of the character detector in progress by the open source community

  • Future work

    • OCR accuracy improvement

    • Overcoming limitations of CC-based methods (e.g. non-linearity non-robustness caused by a single pixel)


  • Login