
Development of an OCR System Third Quarter



Presentation Transcript


  1. Development of an OCR System: Third Quarter
     Nathan Harmata, Period 5

  2. Recap of Goals for 3rd Quarter
     - More heuristics for Character Recognition
     - Make results more “spread out”
     - Minor goal: “generic character models”

  3. Diagram of OCR System
     [Diagram labels, in extraction order: Input Image; Blocks of Text; Image Processing; Lines; Words; Letter; Transformations; Character Recognition; Character Model; Comparison to GCDD and recognition]

  4. Image Processing
     - Uses the whitespace between lines and between words to split the image
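
The slide does not say how the whitespace is detected. One common approach, shown here only as a sketch under that assumption, is to scan for rows that contain no ink and treat each run of inked rows as a text line. The class name `WhitespaceSegmenter`, the method `findLines`, and the `boolean[][]` image representation are all hypothetical, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a binary image into text lines by finding
// rows that contain no ink. Not the author's code.
class WhitespaceSegmenter {
    /** Returns {startRow, endRowExclusive} for each run of rows that contain ink. */
    static List<int[]> findLines(boolean[][] ink) {
        List<int[]> lines = new ArrayList<>();
        int start = -1; // -1 means the scan is currently inside whitespace
        for (int r = 0; r < ink.length; r++) {
            boolean hasInk = false;
            for (boolean px : ink[r]) {
                if (px) { hasInk = true; break; }
            }
            if (hasInk && start < 0) {
                start = r;                       // a new text line begins
            } else if (!hasInk && start >= 0) {
                lines.add(new int[] {start, r}); // a blank row closes the line
                start = -1;
            }
        }
        if (start >= 0) lines.add(new int[] {start, ink.length});
        return lines;
    }
}
```

Running the same scan over columns inside one line would give candidate word and letter boundaries; telling a word gap from a letter gap would need a width threshold, which the slides do not specify.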

  5. Character Recognition
     - Developed new heuristic: GapVector
     - Developed GCD (Generic Character Definitions)
     - Character Models and Attributes

  6. Attribute
     - My two comparison heuristics, SectorVector and GapVector, are extensions of the Attribute class
     - Diagram labels (as extracted): Attribute Description Comparison Scalar Data
     - Method of comparison: uses the specific comparison defined by the overriding class
     - Method of output to database: uses the output of the overriding class
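
A minimal sketch of how such a base class might look in Java. Only the class names `Attribute` and `SectorVector` come from the slides; the method names `difference` and `toDatabaseString`, the two integer fields, and the Euclidean distance are assumptions made for illustration.

```java
// Sketch only: wrapper class and method names are hypothetical.
class AttributeSketch {
    /** Base class: each heuristic supplies its own comparison and output. */
    static abstract class Attribute {
        /** Scalar distance to another attribute of the same kind; 0 means identical. */
        abstract double difference(Attribute other);

        /** Text form written to the character database. */
        abstract String toDatabaseString();
    }

    /** Two-number attribute; what the numbers mean is not stated in the slides. */
    static final class SectorVector extends Attribute {
        final int a;
        final int b;

        SectorVector(int a, int b) {
            this.a = a;
            this.b = b;
        }

        @Override
        double difference(Attribute other) {
            SectorVector o = (SectorVector) other; // assumes same-kind comparison
            return Math.hypot(a - o.a, b - o.b);   // Euclidean distance (assumption)
        }

        @Override
        String toDatabaseString() {
            return "SectorVector " + a + " " + b;
        }
    }
}
```

A GapVector subclass would follow the same pattern with its own comparison rule.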

  7. Character Model
     - Came up with the idea during 2nd Quarter
     - A way of organizing data and making the code cleaner
     - Diagram: a Character Model containing Attributes, stored in a HashMap of Attributes
     - Method of comparison: uses the vector difference of the Attribute vectors
     - Method of output to database: uses the outputs of the Attributes
     - Method of hashing: uses the hashcode of the output
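
The three methods listed above could be sketched as follows. This is one interpretation, not the author's code: "vector difference" is read here as the magnitude of the vector of per-attribute differences, the method names are hypothetical, and a `TreeMap` is swapped in for the HashMap the slide mentions so that the output order (and therefore the hash) is stable.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: names other than CharacterModel and Attribute are hypothetical.
class CharacterModelSketch {
    interface Attribute {
        double difference(Attribute other);
        String output();
    }

    static final class CharacterModel {
        // TreeMap instead of the slide's HashMap, to keep iteration order fixed.
        private final Map<String, Attribute> attributes = new TreeMap<>();

        void put(String name, Attribute attribute) {
            attributes.put(name, attribute);
        }

        /** Magnitude of the vector of per-attribute differences (shared attributes only). */
        double distanceTo(CharacterModel other) {
            double sumSq = 0;
            for (Map.Entry<String, Attribute> e : attributes.entrySet()) {
                Attribute theirs = other.attributes.get(e.getKey());
                if (theirs != null) {
                    double d = e.getValue().difference(theirs);
                    sumSq += d * d;
                }
            }
            return Math.sqrt(sumSq);
        }

        /** Database output: the attribute outputs joined by spaces. */
        String output() {
            StringBuilder sb = new StringBuilder();
            for (Attribute a : attributes.values()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(a.output());
            }
            return sb.toString();
        }

        @Override
        public int hashCode() {
            return output().hashCode(); // hashing uses the hashcode of the output
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CharacterModel
                && ((CharacterModel) o).output().equals(output());
        }
    }

    /** Toy one-number attribute, used only to exercise the model. */
    static final class Scalar implements Attribute {
        final String name;
        final double value;

        Scalar(String name, double value) {
            this.name = name;
            this.value = value;
        }

        public double difference(Attribute other) {
            return Math.abs(value - ((Scalar) other).value);
        }

        public String output() {
            return name + " " + value;
        }
    }
}
```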

  8. Sector Vector - SectorParsing
     - Deals with the major flaw of SlopeField
     - Parses the image into portions that pass the vertical line test
     - Each portion is then transformed into a SlopeField
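
The slides do not define the vertical line test for pixel data. A plausible reading, used in this sketch as an assumption, is that a portion passes when every column contains at most one contiguous run of ink, so the stroke can be treated as a function of x. The class and method names are hypothetical, and how the image is actually cut into portions is not shown.

```java
// Sketch only: an assumed pixel-level version of the vertical line test.
class VerticalLineTest {
    /** True if every column of ink[row][col] holds at most one contiguous run of ink. */
    static boolean passes(boolean[][] ink) {
        if (ink.length == 0) return true;
        int cols = ink[0].length; // assumes a rectangular image
        for (int c = 0; c < cols; c++) {
            int runs = 0;
            boolean prev = false;
            for (int r = 0; r < ink.length; r++) {
                boolean cur = ink[r][c];
                if (cur && !prev) runs++; // a new run of ink starts in this column
                prev = cur;
            }
            if (runs > 1) return false;   // a vertical line crosses the stroke twice
        }
        return true;
    }
}
```

A diagonal stroke passes, while a shape like "c" fails in the columns where its top and bottom arms overlap, which is the kind of case that would need to be split into separate portions.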

  9. Sector Vector - SlopeField

  10. Gap Vector - Theory
      - Separates the different letters into groups
      - Examples: (n, m), (u, v), (Q, o, 0)

  11. Gap Vector - Gap Parsing
      - First find the four corners of the letter
      - Each corner is defined as the intersection of the two paths

  12. Gap Vector - Gap Parsing
      - First idea: relative gap coverages
        - Coverage of pixels on the line that would exist if there were a gap
      - Figure labels: Areas of pixel coverage; Straight-line path
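
One way to read the first idea is as the fraction of sample points along the straight-line path between two corners that land on ink: low coverage would suggest an open side. The sketch below assumes that reading; the class name, the corner arguments, and the simple linear sampling are all hypothetical.

```java
// Sketch only: coverage of ink along the straight path between two corners.
class GapCoverage {
    /** Fraction of sampled points on the segment (ax,ay)-(bx,by) that are ink. */
    static double coverage(boolean[][] ink, int ax, int ay, int bx, int by) {
        int steps = Math.max(Math.abs(bx - ax), Math.abs(by - ay));
        int covered = 0;
        for (int i = 0; i <= steps; i++) {
            double t = steps == 0 ? 0.0 : (double) i / steps;
            int x = (int) Math.round(ax + t * (bx - ax));
            int y = (int) Math.round(ay + t * (by - ay));
            if (ink[y][x]) covered++; // image is indexed as ink[row][col]
        }
        return (double) covered / (steps + 1);
    }
}
```

Both corners are assumed to lie inside the image; a threshold for deciding "gap" from the coverage value is not given in the slides.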

  13. Gap Vector - Gap Parsing
      - Second idea: locations of pixels relative to the gap
      - This ends up being a comparison of the area of the pixels in front of the straight line to the area of the pixels behind it
        - More pixels in front of gap -> no gap
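
The front/behind comparison can be sketched with a cross product, which tells on which side of a directed line a pixel lies. Which side counts as "in front" is not defined in the slides; here the positive side of the line from corner A to corner B is taken as the front, purely as an assumption. The class and method names are hypothetical.

```java
// Sketch only: counts ink pixels on each side of the line from A to B.
class GapSideCount {
    /** Returns {front, behind}; pixels exactly on the line are not counted. */
    static int[] count(boolean[][] ink, int ax, int ay, int bx, int by) {
        int front = 0;
        int behind = 0;
        for (int y = 0; y < ink.length; y++) {
            for (int x = 0; x < ink[y].length; x++) {
                if (!ink[y][x]) continue;
                // Sign of the 2D cross product (B - A) x (P - A) gives the side.
                long cross = (long) (bx - ax) * (y - ay) - (long) (by - ay) * (x - ax);
                if (cross > 0) front++;
                else if (cross < 0) behind++;
            }
        }
        return new int[] {front, behind};
    }

    /** Slide rule: more pixels in front means no gap; ties count as a gap here. */
    static boolean hasGap(boolean[][] ink, int ax, int ay, int bx, int by) {
        int[] c = count(ink, ax, ay, bx, by);
        return c[0] <= c[1];
    }
}
```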

  14. Gap Vector - Gap Parsing

  15. Putting everything together
      - Example Character Model output: c SectorVector -2 3 GapVector R
      - Do this for every letter of the alphabet across many different fonts and average the results
      - To recognize an individual image of a letter, find the best matches in the cache using the “compareTo” method of the CharacterModel class
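
The averaging and best-match steps can be sketched as a nearest-neighbour lookup. This is a simplification under stated assumptions: each model is reduced to a plain numeric vector, distance is Euclidean, and the names `NearestModel`, `average`, and `recognize` are hypothetical. The author's `compareTo` is not shown in the slides and may work differently.

```java
import java.util.List;
import java.util.Map;

// Sketch only: models reduced to numeric vectors for illustration.
class NearestModel {
    /** Component-wise mean of the vectors measured for one letter in several fonts. */
    static double[] average(List<double[]> samples) {
        double[] mean = new double[samples.get(0).length];
        for (double[] s : samples) {
            for (int i = 0; i < mean.length; i++) mean[i] += s[i];
        }
        for (int i = 0; i < mean.length; i++) mean[i] /= samples.size();
        return mean;
    }

    /** Returns the cached letter whose vector is closest to the query. */
    static char recognize(Map<Character, double[]> cache, double[] query) {
        char best = '?';
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<Character, double[]> e : cache.entrySet()) {
            double[] v = e.getValue();
            double sumSq = 0;
            for (int i = 0; i < query.length; i++) {
                double d = v[i] - query[i];
                sumSq += d * d;
            }
            if (sumSq < bestDist) { // squared distance preserves the ordering
                bestDist = sumSq;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With the SectorVector values that the cache slide lists for a (-5 5), b (4 3), s (-2 6) and z (1 4), a query of (-4, 5) would resolve to a. Several letters share the same SectorVector, so a single attribute cannot separate them on its own.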

  16. Character Model Cache

      Letter  SectorVector  GapVector
      a       -5 5
      b       4 3
      c       -2 3          R
      d       -1 3
      e       -2 3
      f       0 3           R
      g       -1 5
      h       0 1
      i       0 2           L
      j       0 4
      k       -2 3          R
      l       0 1
      m       -3 1          T
      n       -1 1
      o       -3 3
      p       4 3
      q       -1 3
      r       0 1           R
      s       -2 6
      t       0 3
      u       0 1           T
      v       -2 1          T
      w       -5 1          T
      x       -4 3          T L
      y       -2 3          T L
      z       1 4           L

      ** Use Java Reflection for generic Attribute handling
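
The note about reflection is not elaborated in the slides. One possible use, sketched here as an assumption, is to rebuild an attribute from a cache record by looking up the class named in the record, so new Attribute types need no parser changes. The wrapper class, the `parse` method, and the single-String constructors are hypothetical; `Class.forName`, `getDeclaredConstructor`, and `newInstance` are standard Java reflection calls.

```java
// Sketch only: builds an attribute from a record such as "SectorVector -2 3".
class ReflectionSketch {
    interface Attribute {
        String output();
    }

    static final class SectorVector implements Attribute {
        private final String data;
        public SectorVector(String data) { this.data = data; }
        public String output() { return ("SectorVector " + data).trim(); }
    }

    static final class GapVector implements Attribute {
        private final String data;
        public GapVector(String data) { this.data = data; }
        public String output() { return ("GapVector " + data).trim(); }
    }

    /** The first token names the class; the rest is passed to its constructor. */
    static Attribute parse(String record) {
        String[] parts = record.split(" ", 2);
        String data = parts.length > 1 ? parts[1] : "";
        try {
            // Nested classes have the binary name Outer$Inner.
            Class<?> cls = Class.forName(ReflectionSketch.class.getName() + "$" + parts[0]);
            return (Attribute) cls.getDeclaredConstructor(String.class).newInstance(data);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Unknown attribute type: " + parts[0], e);
        }
    }
}
```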

  17. Goals for 4th Quarter
      - Get everything working together (almost done, haven’t tested it yet)
      - Think of another heuristic if the results are good enough
      - Make GUI for OCR system
      - Noise removal
