On-Line Handwriting Recognition
• Transducer device (digitizer)
• Input: sequence of point coordinates with pen-down/up signals from the digitizer
• Stroke: sequence of points from pen-down to pen-up signals
• Word: sequence of one or more strokes

Work with student Jong Oh. Davi Geiger, Courant Institute, NYU.
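A minimal sketch of how this digitizer input might be represented; the Point, Stroke, and Word names and the example coordinates are illustrative, not taken from the original system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Point:
    x: float
    y: float  # pen coordinates reported by the digitizer

# A stroke is the sequence of points between a pen-down and a pen-up signal.
Stroke = List[Point]

# A word is a sequence of one or more strokes.
Word = List[Stroke]

# Example: a two-stroke word (coordinates are made up for illustration).
word: Word = [
    [Point(0.0, 0.0), Point(0.1, 0.5), Point(0.2, 1.0)],  # first stroke
    [Point(0.5, 0.0), Point(0.5, 1.0), Point(0.3, 0.8)],  # second stroke
]
```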
System Overview
• Processing pipeline: Input → Pre-processing (high-curvature points) → Segmentation → Character Recognizer → Recognition Engine → Word Candidates
• The Recognition Engine also draws on a Dictionary and Context Models.
Segmentation Hypotheses
• High-curvature points and segmentation points
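The slide's illustration is not reproduced. As a rough sketch, high-curvature points along a stroke can be proposed by thresholding the turning angle at each sample; this is an illustrative detector under that assumption, not the authors' exact method, and the threshold value is arbitrary.

```python
import math

def high_curvature_points(stroke, angle_threshold_deg=60.0):
    """Return indices of points where the stroke turns sharply.

    stroke: list of (x, y) tuples sampled from pen-down to pen-up.
    A point is kept if the angle between the incoming and outgoing
    direction vectors exceeds the threshold (a simple curvature proxy).
    """
    indices = []
    for i in range(1, len(stroke) - 1):
        (x0, y0), (x1, y1), (x2, y2) = stroke[i - 1], stroke[i], stroke[i + 1]
        v1 = (x1 - x0, y1 - y0)
        v2 = (x2 - x1, y2 - y1)
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue  # repeated point, no direction defined
        cos_a = max(-1.0, min(1.0, (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)))
        if math.degrees(math.acos(cos_a)) > angle_threshold_deg:
            indices.append(i)
    return indices
```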
Character Recognition I
• Fisher Discriminant Analysis (FDA): improves over PCA (Principal Component Analysis).
• Linear projection from the original space to the projection space: p = W^T x
• Training set: 1040 lowercase letters; test set: 520 lowercase letters
• Test results: 91.5% correct
Fisher Discriminant Analysis
• Between-class scatter matrix: S_B = Σ_{i=1}^{C} N_i (μ_i - μ)(μ_i - μ)^T
  • C: number of classes
  • N_i: number of data vectors in class i
  • μ_i: mean vector of class i; μ: overall mean vector
• Within-class scatter matrix: S_W = Σ_{i=1}^{C} Σ_{j=1}^{N_i} (v_j^i - μ_i)(v_j^i - μ_i)^T
  • v_j^i: j-th data vector of class i
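A minimal NumPy sketch of the two scatter matrices as defined above; the data layout (one array of row vectors per class) is an assumption made for illustration.

```python
import numpy as np

def scatter_matrices(classes):
    """Compute between-class (S_B) and within-class (S_W) scatter matrices.

    classes: list of arrays, one per class, each of shape (N_i, n)
             holding the N_i data vectors of class i as rows.
    """
    all_data = np.vstack(classes)
    mu = all_data.mean(axis=0)              # overall mean vector
    n = all_data.shape[1]
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for data_i in classes:
        N_i = data_i.shape[0]
        mu_i = data_i.mean(axis=0)          # class mean vector
        d = (mu_i - mu).reshape(-1, 1)
        S_B += N_i * d @ d.T                # between-class contribution
        centered = data_i - mu_i
        S_W += centered.T @ centered        # within-class contribution
    return S_B, S_W
```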
Given a projection matrix W (of size n by m) and its linear transformation p = W^T x, the between-class scatter in the projection space is Y_B = W^T S_B W. Similarly, the within-class scatter is Y_W = W^T S_W W.
Fisher Discriminant Analysis (cont.)
• Optimization formulation of the Fisher projection solution: W* = argmax_W |Y_B| / |Y_W| = argmax_W |W^T S_B W| / |W^T S_W W|
  (Y_B, Y_W are the scatter matrices in the projection space)
FDA (continued)
• Construction of the Fisher projection matrix:
  • Compute the n eigenvalues and eigenvectors of the generalized eigenvalue problem S_B w = λ S_W w.
  • Retain the m eigenvectors having the largest eigenvalues; they form the columns of the target projection matrix W.
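A hedged sketch of this construction using SciPy's generalized symmetric eigensolver; the function name and the reuse of the scatter_matrices helper from the previous sketch are assumptions, not the original implementation.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(S_B, S_W, m):
    """Solve S_B w = lambda * S_W w and keep the m leading eigenvectors.

    S_W must be symmetric positive definite (add a small ridge such as
    1e-6 * I if it is singular).  Returns W of shape (n, m); projections
    are then computed as p = W.T @ x.
    """
    eigvals, eigvecs = eigh(S_B, S_W)      # generalized symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]      # eigenvalues, largest first
    return eigvecs[:, order[:m]]
```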
Character Recognition Results
• Training set: 1040 lowercase letters
• Test set: 520 lowercase letters
• Test results: 91.5% correct
Challenge I
• The problem with the previous approach: non-characters are classified as characters. Applied to cursive words, it creates too many nonsense word hypotheses by extracting characters where none seem to exist.
• More generally, one wants to be able to generate shapes and their deformations.
Challenge II
• How to extract reliable local geometric features of images (corners, contour tangents, contour curvature, …)?
• How to group them?
• A large database must be matched against a single input; how can this be done fast?
• Hierarchical clustering of the database, possibly over a tree structure or a more general graph: how to do it? Which criteria to cluster on? Which methods to use?
Recognition Engine
• Integrates all available information, generates and grows the word-level hypotheses.
• Most general form: a graph and its search.
• Hypothesis Propagation Network
Hypothesis Propagation Network
• A lattice of hypothesis cells H(t, m) over time steps t = 1, …, T and character classes m ("a" … "z").
• Each cell keeps a hypothesis list of bounded length and is fed by class m's legal predecessors within a look-back window (range 3, 2, 1).
• Recognition of 85% on 100 words (not good). A hedged sketch of the propagation is given below.
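A sketch of how such a propagation lattice could work: the cells H(t, m), legal predecessors, look-back window, and per-cell list length follow the slide, while the scoring function char_score, the additive score combination, and the pruning rule are assumptions made for illustration only.

```python
import heapq

def propagate_hypotheses(char_score, T, classes, legal_prev,
                         max_span=3, list_len=5):
    """Hedged sketch of a hypothesis-propagation lattice H(t, m).

    char_score(s, t, m): assumed score that segments s..t form character m.
    T: number of segmentation points (time steps).
    classes: iterable of character classes, e.g. "abcdefghijklmnopqrstuvwxyz".
    legal_prev[m]: set of classes allowed to precede m.
    max_span: look-back window, in segmentation points, for one character.
    list_len: number of hypotheses kept per cell (the slide's list length).

    H[t][m] holds (score, word) hypotheses ending at time t with character m;
    the best word candidates are read off the last time step.
    """
    H = [{m: [] for m in classes} for _ in range(T + 1)]
    # Words are assumed to start at segmentation point 0.
    for m in classes:
        for t in range(1, min(max_span, T) + 1):
            H[t][m].append((char_score(0, t, m), m))
    for t in range(1, T + 1):
        for m in classes:
            for s in range(max(1, t - max_span), t):      # look-back window
                for p in legal_prev[m]:
                    for score, word in H[s][p]:
                        H[t][m].append((score + char_score(s, t, m), word + m))
            H[t][m] = heapq.nlargest(list_len, H[t][m])    # prune each cell
    final = [h for m in classes for h in H[T][m]]
    return heapq.nlargest(list_len, final)
```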
Challenge III
• How to search more efficiently in this network, and more generally in Bayesian networks?
Visual Bigram Models (VBM)
• Some characters are very ambiguous when isolated: "9" and "g"; "e" and "l"; "o" and "0"; etc., but become more obvious when put in context.
• Example from the slide: "go" vs. "90", distinguished by character heights and by their relative height ratio and positioning.
VBM: Parameters
• Height Difference Ratio: HDR = (h1 - h2) / h
• Top Difference Ratio: TDR = (top1 - top2) / h
• Bottom Difference Ratio: BDR = (bot1 - bot2) / h
• h1, h2: heights of the two characters; top1, top2 and bot1, bot2: their top and bottom positions; h: normalizing height.
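A small sketch computing these three ratios from two character bounding boxes; taking h to be the height of the joint bounding box is an assumption, since the slide does not define h explicitly.

```python
def visual_bigram_features(box1, box2):
    """Compute the VBM parameters for a pair of adjacent characters.

    box1, box2: (top, bottom) vertical extents of the two characters,
    in image coordinates where top < bottom.
    """
    top1, bot1 = box1
    top2, bot2 = box2
    h1 = bot1 - top1
    h2 = bot2 - top2
    h = max(bot1, bot2) - min(top1, top2)   # assumed: joint bounding-box height
    hdr = (h1 - h2) / h                     # Height Difference Ratio
    tdr = (top1 - top2) / h                 # Top Difference Ratio
    bdr = (bot1 - bot2) / h                 # Bottom Difference Ratio
    return hdr, tdr, bdr
```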
VBM: Ascendancy Categories
• Characters are grouped by ascendancy, so a bigram is described by the ascendancy pair of its two characters: a total of 9 visual bigram categories (instead of 26 × 26 = 676).
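One plausible reading of the nine categories is a 3 × 3 grouping by ascendancy; the letter grouping below is illustrative and may differ from the authors' exact categorization.

```python
# Illustrative grouping of lowercase letters into ascendancy classes; the
# exact grouping used in the original system is not given on the slide.
ASCENDERS = set("bdfhklt")    # letters reaching above the x-height
DESCENDERS = set("gjpqy")     # letters reaching below the baseline
# Everything else is treated as a neutral (x-height) letter.

def ascendancy(ch):
    if ch in ASCENDERS:
        return "ascender"
    if ch in DESCENDERS:
        return "descender"
    return "neutral"

def bigram_category(c1, c2):
    """Map a character pair to one of the 3 x 3 = 9 visual bigram categories."""
    return (ascendancy(c1), ascendancy(c2))

# e.g. bigram_category('g', 'o') == ('descender', 'neutral')
```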
VBM: Test Results