Handwritten Character Recognition Using Hidden Markov Models




Quantifying the marginal benefit of exploiting correlations between adjacent characters and words

Optical Character Recognition

  • Rich field of research with many applicable domains

    • Off-line vs. On-line (includes time-sequence info)

    • Handwritten vs. Typed

    • Cursive vs. Hand-printed

    • Cooperative vs. Random Writers

    • Language-specific differences of grammar and dictionary size

  • We focus on an off-line, mixed-modal English data set with mostly hand-printed and some cursive data

  • Each observation is a monochrome bitmap of one letter, with the segmentation problem already solved for us (but poorly)

  • Pre-processing of the data set (noise filtering and scale normalization) is also assumed done

Common Approaches to OCR

  • Statistical Grammar Rules and Dictionaries

  • Feature Extraction of observations

    • Global features: Moments and invariants of image (e.g., percentage of pixels in certain region, measuring curvature)

    • Local features: Group windows around image pixels

  • Hidden Markov Models

    • Used mostly in cursive domain for easy training and to avoid segmentation issues

    • Most HMM systems use very large models with words as states, combined with the approaches above; this suits domains with a small dictionary and other restrictions
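To make the global/local distinction concrete, here is a minimal sketch of both feature types on a binary character bitmap. The function names, the 4x4 zone size, and the 3x3 window size are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def zone_densities(bitmap, zone_rows=4, zone_cols=4):
    """Global feature: fraction of 'on' pixels in each rectangular zone."""
    rows, cols = bitmap.shape
    feats = []
    for r in range(0, rows, zone_rows):
        for c in range(0, cols, zone_cols):
            feats.append(bitmap[r:r + zone_rows, c:c + zone_cols].mean())
    return np.array(feats)

def local_windows(bitmap, size=3):
    """Local features: flattened size x size window around every interior pixel."""
    rows, cols = bitmap.shape
    half = size // 2
    return np.array([
        bitmap[r - half:r + half + 1, c - half:c + half + 1].ravel()
        for r in range(half, rows - half)
        for c in range(half, cols - half)
    ])

# Hypothetical glyph: a crude vertical stroke on a 16x8 grid.
glyph = np.zeros((16, 8), dtype=np.uint8)
glyph[2:14, 3:5] = 1

print(zone_densities(glyph).shape)  # one density per zone
print(local_windows(glyph).shape)   # one row per interior pixel
```

Zone densities summarize the whole image in a handful of numbers, while the window features keep neighborhood detail at the cost of many more values per character.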

Visualizing the Dataset

  • Data collected from 159 subjects with varying styles, printed and cursive

  • First letter of each word removed so that capital letters need not be handled

  • Each character represented by 16x8 array of bits

  • Character meta-data includes correct labels and end-of-word boundaries

  • Pre-processed into 10 cross-validation folds
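As a rough illustration of the record layout described above, the sketch below renders one character as ASCII art. The dictionary keys (`label`, `fold`, `end_of_word`, `pixels`) and the 16-rows-by-8-columns orientation are assumptions made for the example; the slides do not give the actual file format.

```python
import numpy as np

def render(bits, rows=16, cols=8):
    """Return an ASCII picture of a flat 0/1 pixel vector ('#' = on, '.' = off)."""
    grid = np.asarray(bits).reshape(rows, cols)
    return "\n".join("".join("#" if p else "." for p in row) for row in grid)

# Hypothetical record: field names are illustrative only.
sample = {
    "label": "i",
    "fold": 0,
    "end_of_word": False,
    "pixels": np.zeros(128, dtype=np.uint8),
}
sample["pixels"].reshape(16, 8)[4:14, 3:5] = 1  # draw a vertical stroke in place

print(render(sample["pixels"]))
```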

Our Approach: HMMs

  • Primary Goal: Quantify the impact of correlations between adjacent letters and words

  • Secondary Goal: Learn an accurate classifier for our data set

  • Our Approach: Use an HMM and compare to other algorithms

    • Each of the HMM's 26 states represents one letter of the alphabet

    • Supervised learning of model with labeled data

    • Prior probabilities and transition matrix learned by frequency of letters in training

    • Learning algorithm for emission probabilities uses Naïve Bayes assumption (i.e., pixels conditionally independent given the letter)

    • Viterbi algorithm predicts most probable sequence of states given the observed character pixel maps
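The pipeline in the bullets above can be sketched roughly as follows: count-based priors and transitions, per-pixel Bernoulli emissions under the Naïve Bayes assumption, and Viterbi decoding in log space. This is a sketch under stated assumptions, not the authors' code: the function names, the data layout (one `(labels, pixels)` pair per word), the add-`alpha` pseudo-counts, and the choice to count transitions within words only are all illustrative.

```python
import numpy as np

N_STATES, N_PIXELS = 26, 128

def fit_hmm(words, alpha=1.0):
    """words: list of (labels, pixels); labels is an int array of shape (T,),
    pixels a 0/1 array of shape (T, 128). Returns log-space parameters.
    alpha is a pseudo-count (an assumption) so that no probability is zero."""
    prior = np.full(N_STATES, alpha)
    trans = np.full((N_STATES, N_STATES), alpha)
    on = np.full((N_STATES, N_PIXELS), alpha)
    total = np.full(N_STATES, 2 * alpha)
    for labels, pixels in words:
        for a, b in zip(labels[:-1], labels[1:]):
            trans[a, b] += 1          # transitions counted within a word only
        for y, x in zip(labels, pixels):
            prior[y] += 1             # prior taken as overall letter frequency
            on[y] += x                # per-pixel 'on' counts for letter y
            total[y] += 1
    log_prior = np.log(prior / prior.sum())
    log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
    p_on = on / total[:, None]
    return log_prior, log_trans, np.log(p_on), np.log1p(-p_on)

def emission_loglik(pixels, log_on, log_off):
    """(T, 26) matrix of log P(pixels_t | letter) under Naive Bayes."""
    pixels = np.asarray(pixels, dtype=float)
    return pixels @ log_on.T + (1 - pixels) @ log_off.T

def viterbi(pixels, log_prior, log_trans, log_on, log_off):
    """Most probable letter sequence for one word of character bitmaps."""
    emit = emission_loglik(pixels, log_on, log_off)
    T = len(emit)
    score = log_prior + emit[0]
    back = np.zeros((T, N_STATES), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Working in log space avoids underflow: a product of 128 pixel probabilities per character, chained over a word, quickly becomes too small for floating point.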

Algorithms and Optimizations

  • Learning algorithms implemented and tested:

    • Baseline Algorithm: Naïve Bayes Classifier (no HMM)

    • Algorithm 2: NB with maximum probable classification over a set of shifted observations

      • Motivation was to compensate for correlations between adjacent pixels not included in Naïve Bayes assumption

    • Algorithm 3: HMM with NB assumption

      • Fix for incomplete data: Examples ‘hallucinated’ prior to training

    • Algorithm 4: Optimized HMM with NB assumption

      • Ignore effects of inter-word transitions when learning HMM

    • Algorithm 5: Dictionary Creation and Lookup with NB assumption (no HMM)

      • Geared toward this specific data set with its small dictionary; generalizes poorly to less constrained data sets with larger dictionaries
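Algorithms 2 and 5 might look roughly like the sketch below. It assumes per-letter Naïve Bayes log-parameters `log_on` and `log_off` of shape (26, 128) already exist, that shifts are by a single pixel in each of four directions, and that dictionary entries are lowercase strings matching the stored characters (for this data set, words without their first letter). None of these details are given in the slides.

```python
import numpy as np

def nb_scores(pixels, log_prior, log_on, log_off):
    """log P(letter) + log P(pixels | letter) for one flat 0/1 pixel vector."""
    pixels = np.asarray(pixels, dtype=float)
    return log_prior + log_on @ pixels + log_off @ (1 - pixels)

def shifted_versions(pixels, rows=16, cols=8):
    """The original bitmap plus copies shifted one pixel up, down, left, right.
    Pixels pushed past an edge are lost; the vacated strip is zero-filled."""
    grid = np.asarray(pixels).reshape(rows, cols)
    out = [grid]
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        shifted = np.zeros_like(grid)
        src = grid[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
        shifted[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)] = src
        out.append(shifted)
    return [g.ravel() for g in out]

def classify_shifted(pixels, log_prior, log_on, log_off):
    """Algorithm 2 style: best letter over the original and all shifted copies."""
    scores = [nb_scores(v, log_prior, log_on, log_off) for v in shifted_versions(pixels)]
    return int(np.max(scores, axis=0).argmax())

def classify_word(word_pixels, dictionary, log_on, log_off):
    """Algorithm 5 style: the same-length dictionary word with the highest
    total Naive Bayes log-likelihood, or None if no entry has that length."""
    best, best_score = None, -np.inf
    rows = np.asarray(word_pixels, dtype=float)
    for word in dictionary:
        if len(word) != len(rows):
            continue
        score = sum(
            log_on[ord(ch) - ord("a")] @ px + log_off[ord(ch) - ord("a")] @ (1 - px)
            for ch, px in zip(word, rows)
        )
        if score > best_score:
            best, best_score = word, score
    return best
```

The dictionary lookup scores every candidate word, so its cost grows linearly with dictionary size, which is one reason it suits a small vocabulary.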

Alternative Algorithms and Experimental Setup

  • Other variants considered but not implemented:

    • Joint Bayes parameter estimation (too many probabilities to learn, 2^128 vs. 3,328)

    • HMM with 2nd-order Markov assumption (exponential in number of Viterbi paths)

    • Training Naïve Bayes over a set of shifted and overlaid observations (preprocessing to create thicker boundary)

  • All experiments run with 10-fold cross-validation

  • Results given as averages with standard deviations
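The parameter-count comparison and the evaluation protocol can be illustrated with a short sketch. The `cross_validate` helper and its `train_and_score` callback are hypothetical names, and the use of the sample standard deviation (`ddof=1`) is an assumption; the slides do not say which estimator was reported.

```python
import numpy as np

# Naive Bayes needs one 'on' probability per (letter, pixel) pair.
naive_bayes_params = 26 * 16 * 8        # 3328
# A joint model must cover every possible 128-pixel bitmap.
joint_configurations = 2 ** (16 * 8)    # 2^128

def cross_validate(folds, train_and_score):
    """folds: list of k lists of examples.
    train_and_score(train, test) -> accuracy in [0, 1] (hypothetical callback).
    Returns the mean and sample standard deviation of the k fold accuracies."""
    accuracies = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        accuracies.append(train_and_score(train, test))
    return float(np.mean(accuracies)), float(np.std(accuracies, ddof=1))
```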

Conclusions

  • Naïve Bayes classifier did well on its own (62.7% accuracy, more than 15x better than a random classifier!)

  • Classification on shifted data did worse because shifting loses data at the edges!

  • Small dictionary size of dataset affected results:

    • Optimized HMM w/ NB achieves 71% accuracy

      • Optimizations only marginally significant because of dataset

      • Simpler and more flexible approach for achieving good results on other data sets

    • Dictionary approach is almost perfect with 99.3% accuracy!

      • Demonstrates additional benefit of exploiting domain constraints, grammatical or syntactic rules

      • Not always feasible: dictionary may be unknown, too large, or the data may not be predictable