
(Off-Line) Cursive Word Recognition



  1. (Off-Line) Cursive Word Recognition Tal Steinherz Tel-Aviv University

  2. Cursive Word Recognition Preprocessing Segmentation Feature Extraction Recognition Post Processing

  3. Preprocessing • Skew correction • Slant correction • Smoothing • Reference line finding

  4. Segmentation Motivation • Given a two-dimensional image and a model that expects a one-dimensional input signal, one needs to derive an ordered list of features. • Fragmentation is an alternative in which the resulting pieces carry no literal (letter-level) meaning.

  5. Segmentation Dilemma • To segment or not to segment? That’s the question! • Sayre’s paradox: “To recognize a letter, one must know where it starts and where it ends; to isolate a letter, one must recognize it first”.

  6. Recognition Model • What is the basic (atomic) model? • word (remains identical through training and recognition) • letter (concatenated on demand during recognition) • What are the training implications? • specific = total cover (several samples for each word) • dynamic = brick cover (samples of various words that together include all possible characters/letters)

  7. Basic Word Model (figure: letter sub-models chained in order, from the 1st letter sub-model through the i-th to the last)

  8. Segmentation-Free • In a segmentation-free approach, recognition is based on measuring the distance between observation sequences.

  9. Segmentation-Free (continued) • The most popular metric is Levenshtein’s edit distance, in which one sequence is transformed into another by atomic operations (insertion, deletion and substitution), each associated with its own cost. • Implementations: dynamic programming, HMM.
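
A minimal dynamic-programming sketch of Levenshtein’s edit distance between two observation sequences; the unit costs are illustrative defaults, since the slide only says the operations carry different costs:

```python
def edit_distance(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Dynamic-programming edit distance between sequences a and b."""
    m, n = len(a), len(b)
    # d[i][j] = cost of transforming a[:i] into b[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + del_cost,    # delete a[i-1]
                          d[i][j - 1] + ins_cost,    # insert b[j-1]
                          d[i - 1][j - 1] + match)   # substitute or match
    return d[m][n]
```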

  10. Segmentation-Free (demo) • Each column was translated into a feature vector. • Two types of features: • number of zero-crossings • gradient of the word’s curve

  11. The gradient of the word’s curve at a given pixel column
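
A hedged sketch of these per-column features: it assumes a binary image, counts zero-crossings as ink/background transitions within a column, and approximates the gradient of the word’s curve by the change in the mean ink row between adjacent columns. The original feature definitions may differ.

```python
import numpy as np

def column_features(img):
    """img: 2-D binary array (1 = ink). Returns one feature vector per
    pixel column: [zero-crossings, gradient of the word's curve]."""
    h, w = img.shape
    feats = []
    prev_center = None
    for x in range(w):
        col = img[:, x]
        # zero-crossings: transitions between background and ink in the column
        crossings = int(np.sum(col[1:] != col[:-1]))
        rows = np.nonzero(col)[0]
        center = rows.mean() if rows.size else None
        grad = 0.0
        if center is not None and prev_center is not None:
            grad = center - prev_center   # finite-difference gradient
        if center is not None:
            prev_center = center
        feats.append([crossings, grad])
    return np.asarray(feats, dtype=float)
```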

  12. Letter sub-HMM components (figure: normal transitions and null transitions)

  13. Letter sub-HMM (figure: a left-to-right chain of states connected by normal and null transitions)
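
To make the two transition types concrete, here is a minimal left-to-right letter sub-HMM sketch. Normal transitions consume an observation; null transitions advance without consuming one, letting a letter account for fewer observations than it has states. The state count and probabilities are illustrative assumptions, not values from the talk.

```python
import numpy as np

A_emit = np.array([   # normal (emitting) transitions: consume one feature vector
    [0.6, 0.2, 0.0],  # state 0: self-loop or advance to state 1
    [0.0, 0.6, 0.2],  # state 1: self-loop or advance to state 2
    [0.0, 0.0, 1.0],  # state 2: final state, self-loop only
])
A_null = np.array([   # null transitions: advance without consuming any input,
    [0.0, 0.2, 0.0],  # so a short letter can pass through in fewer observations
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0],
])
# Each row of A_emit + A_null sums to 1, so the two kinds of transition
# together form a proper distribution over successor states.
assert np.allclose((A_emit + A_null).sum(axis=1), 1.0)
```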

  14. Segmentation-Based • In a segmentation-based approach, recognition is based on a complete bipartite matching between blocks of primitive segments and the letters of a word.

  15. Segmentation-Based (continued) • The best match is found by dynamic programming (the Viterbi algorithm). • An HMM implementation is very popular and enhances the model’s capabilities.

  16. Segmentation-Based (demo) • First the word is heuristically segmented. • It is preferable to over-segment a character; nevertheless, a character must not span more than a predefined number of segments. • Each segment is translated into a feature vector.
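
A minimal sketch of the resulting matching problem: a Viterbi-style dynamic program assigns blocks of consecutive primitive segments to the letters of a candidate word, with at most a predefined number of segments per character (the sub-HMM on the next slide allows up to 4). The `letter_cost` function is a hypothetical stand-in for whatever score a letter model assigns to a block of segments.

```python
MAX_SPAN = 4  # predefined maximum number of segments per character

def match_word(segments, word, letter_cost):
    """Best total cost of splitting `segments` into len(word) blocks."""
    n, L = len(segments), len(word)
    INF = float("inf")
    # best[i][j] = best cost of explaining segments[:i] with word[:j]
    best = [[INF] * (L + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, L + 1):
            for span in range(1, min(MAX_SPAN, i) + 1):
                prev = best[i - span][j - 1]
                if prev < INF:
                    c = prev + letter_cost(word[j - 1], segments[i - span:i])
                    best[i][j] = min(best[i][j], c)
    return best[n][L]   # lexicon words can then be ranked by this score
```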

  17. Features in Segments (demo) • Global features: • ascenders, descenders, loops, i dots, t strokes • Local features: • X crossings, T crossings, end points, sharp curvatures, parametric strokes • Non-symbolic features: • pixel moments, pixel distributions, contour codings

  18. Letter sub-HMM, maximum 4 segments per character (figure: a chain of states labeled 1–4)

  19. Two-letter joined sub-HMM, 0.5–3 segments per character (figure: L–M–R state chains for each letter)

  20. Pattern Recognition Issues • Lexicon size: • small (up to 100 words) • limited (between 100 and 1000 words) • infinite (more than 1000 words)

  21. Word Model Extension (figure: the ‘a’ through ‘z’ letter sub-HMMs combined into a single general word model) • A new approach to practicing recognition? Path discriminant (a single general word model; a path = a hypothesis per word).

  22. Online vs. Off-Line • Online – captured by pen-like devices. The input format is a two-dimensional signal of pixel locations as a function of time, (x(t), y(t)). • Off-line – captured by scanning devices. The input format is a two-dimensional image of gray-scale values as a function of location, I(m×n). Strokes have significant width.
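
Purely illustrative shapes for the two input formats (the sample count and image dimensions are assumptions):

```python
import numpy as np

T = 500                                      # number of pen samples (assumed)
online = np.zeros((T, 2))                    # online[t] = (x(t), y(t))
m, n = 120, 400                              # image height and width (assumed)
offline = np.zeros((m, n), dtype=np.uint8)   # I(m x n), gray-scale intensities
```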

  23. Online vs. Off-Line (demo)

  24. Online vs. Off-Line (cont.) • In general, online classifiers are superior to off-line classifiers because some valuable strokes are blurred in the static image. Sometimes temporal information (stroke order) is also a must in order to distinguish between similar objects.

  25. Online Weaknesses Sensitivity to variations in stroke order, stroke number and stroke characteristics: • Shapes that look similar in the image domain might be produced by different sets of strokes. • Many redundant strokes (consecutive superfluous pixels) are byproducts of the continuous nature of cursive handwriting. • Incomplete (open) loops are more frequent.

  26. Off-Line can improve Online • Sometimes the off-line representation enables one to recognize words that are not recognized given the online signal. • An optimal system would combine online and off-line based classifiers.

  27. The desired integration between online and off-line classifiers • Having a single word recognition engine process both the online and off-line data. • This requires an off-line-to-online transformation that extracts an alternative list of strokes, preserving off-line-like features while keeping a consistent order.

  28. The “pseudo-online” transformation (flow diagram): an online signal is projected to the image domain to give a bitmap image (stroke width = 1); “painting” (thickening the strokes) then yields a real-looking static image (stroke width > 1); the pseudo-online transformation converts it into a pseudo-online representation that is fed to the online recognition engine. The classification outputs of the online classifiers and the pseudo-online classifiers are integrated by some combination scheme to produce the recognition results.

  29. Cursive Handwriting Terms • Axis – the main subset of strokes that assembles the backbone, which is the shortest path from left to right, including loops on several occasions. • Tarsi – the other subsets of connected strokes that produce branches, which hang above (in the case of ascenders) or below (in the case of descenders) the axis.

  30. The Pseudo-Online Transformation • Follow the skeleton of the axis from the leftmost pixel until reaching the first intersection with a tarsus. • Surround the tarsus by tracking its contour until returning to the intersection point we started from. • Continue along the axis to the next intersection with a tarsus, and so on until the rightmost pixel is reached. • Loops that are encountered along the axis are also surrounded completely.
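
A structural sketch of the traversal just described. The three helpers are hypothetical placeholders (stubbed here) for image-processing steps the talk does not spell out; only the traversal order is the point.

```python
def axis_skeleton(image):
    """Backbone path from the leftmost to the rightmost pixel (stub)."""
    return []

def tarsi_at(point, image):
    """Tarsi (branches) intersecting the axis at this point (stub)."""
    return []

def contour(piece, start):
    """Contour of a tarsus or loop, starting and ending at `start` (stub)."""
    return []

def pseudo_online_trace(image):
    trace = []
    for point in axis_skeleton(image):         # walk the axis left to right
        trace.append(point)
        for tarsus in tarsi_at(point, image):  # surround each branch and
            trace.extend(contour(tarsus, start=point))  # return to the axis
        # loops encountered along the axis are surrounded the same way
    return trace                               # ordered, off-line-consistent strokes
```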

  31. Computing the axis’s skeleton

  32. Computing the axis’s skeleton (cont.)

  33. Computing the axis’s skeleton (cont.)

  34. Processing the tarsi

  35. Processing the tarsi (cont.)

  36. Handling i-dots

  37. Experimental Setup • The online word recognition engine of Neskovic et al. – satisfies trainability and versatility. • A combination of 6/12 online and pseudo-online classifiers. • Several combination schemes – majority vote, max rule, sum rule. • An extension of HP’s dataset, which can be found in the UNIPEN collection.
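
Minimal sketches of the three combination schemes named above. `P` is a hypothetical (n_classifiers × n_words) array of per-classifier scores over the lexicon; all names are assumptions for illustration.

```python
import numpy as np

def majority_vote(P):
    votes = np.argmax(P, axis=1)          # each classifier's top word
    return np.bincount(votes, minlength=P.shape[1]).argmax()

def max_rule(P):
    return np.max(P, axis=0).argmax()     # word with the highest single score

def sum_rule(P):
    return np.sum(P, axis=0).argmax()     # word with the highest summed score
```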

  38. Experimental Setup (cont.) • Different training sets of 46 writers. • Disjoint validation sets of 9 writers. • A disjoint test set of 11 writers. • The lexicon contains 862 words.

  39. Experimental Results for 6 Classifiers

  40. Experimental Results for 12 Classifiers

  41. Result Analysis • Word level – in 110 word classes (12.8%), at least 7 word samples (10.6%) were correctly recognized only by the combination with the pseudo-online classifiers. • Writer level – for 12 writers (18.2%), at least 65 of the words they produced (7.5%) were correctly recognized only by the combination with the pseudo-online classifiers.

  42. Result Analysis (cont.) • 909 of the input words (5.9%) were correctly recognized by at least one pseudo-online classifier and by none of the 12 online classifiers. • 357 of the input words (2.3%) were correctly recognized by at least 4 of the 12 pseudo-online classifiers and by none of the 12 online classifiers. • For 828 of the input words (5.3%), the difference between the number of pseudo-online and online classifiers that correctly recognized them was 6 or more.

  43. Conclusions • The pseudo-online representation does add information that cannot be obtained by optimizing/extending a combination of online classifiers only.
