
UN Workshop on Data Capture, Dar es Salaam Session 7 Data Capture


Presentation Transcript


  1. UN Workshop on Data Capture, Dar es Salaam • Session 7: Data Capture • Richard Lang, International Manager

  2. Agenda • OCR: Optical Character Recognition • ICR: Intelligent Character Recognition • DFR: Dynamic Form Recognition

  3. OCR = optical character recognition • Technology was first invented in 1929 • Gustav Tauschek obtained a patent on OCR in Germany • Mechanical device that used templates • First commercial system was installed at Reader's Digest in 1955 • Years later it was donated to the Smithsonian Institution • Today • Recognition of machine-written text is now considered largely a solved problem • Accuracy rates exceed 99%

  4. OCR • Beta Systems is well experienced with these recognition engines in banks • In Germany: OCR-A (⑁ Chair, ⑀ Hook, ⑂ Fork) • In Austria: OCR-B (+ Plus)

  5. ICR: Intelligent Character Recognition • The technique is far ahead of OCR because of the ongoing development of ICR • Handwriting recognition system • Allows different styles of handwriting to be learned by a computer during / before processing to improve accuracy and recognition rates

  6. ICR Process: • Capturing the image with scanners • Processing by ICR and/or OCR • Segmentation is a very important step • Decision whether pixels meeting the homogeneity criteria belong to the foreground or to the background • Human editors can make that decision depending on the context • Compare also computer tomography: from X-ray measurements taken at different angles, the computer can reconstruct the picture • The first step yields only a suitable starting point (sets of pixels) • The growing process then links all nearby pixels (computation of valleys and peaks with a high degree of confidence)
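
The segmentation step described on this slide can be sketched as a simple region-growing pass. This is only an illustration under stated assumptions, not the engine presented in the talk: the image is assumed to be a list of rows of grey values (0 = black ink, 255 = white paper), the function name `segment` and the fixed `threshold` are hypothetical, and 4-connectivity is chosen for simplicity.

```python
from collections import deque


def segment(image, threshold=128):
    """Label connected dark regions; returns (label map, number of regions).

    Assumption: pixels darker than `threshold` are foreground (ink).
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    region_count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] >= threshold or labels[y][x]:
                continue  # background pixel, or already part of a region
            # A new starting point (seed) was found: grow a region from it.
            region_count += 1
            labels[y][x] = region_count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] < threshold and not labels[ny][nx]:
                        labels[ny][nx] = region_count  # link the neighbouring pixel
                        queue.append((ny, nx))
    return labels, region_count
```

Each labelled region is a candidate character or stroke that later stages can crop and classify. A production system would likely use an adaptive threshold rather than a fixed one.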

  7. ICR Process: • Pre-processing • Deskew • Shift, rotate • Stretch
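
As a rough illustration of the shift and stretch steps, the sketch below crops a binary glyph to its bounding box and rescales it to a fixed size with nearest-neighbour sampling. Deskew and rotation are deliberately left out. The function name `normalise_glyph`, the bitmap layout (list of rows, 1 = ink) and the default size are assumptions made for this example.

```python
def normalise_glyph(bitmap, size=8):
    """Crop a binary bitmap (1 = ink) to its bounding box, then rescale to size x size."""
    ink_rows = [y for y, row in enumerate(bitmap) if any(row)]
    ink_cols = [x for x in range(len(bitmap[0])) if any(row[x] for row in bitmap)]
    if not ink_rows:
        # Empty field: return a blank glyph of the target size.
        return [[0] * size for _ in range(size)]
    top, left = ink_rows[0], ink_cols[0]
    box_height = ink_rows[-1] - top + 1
    box_width = ink_cols[-1] - left + 1
    # Shift: sample relative to the bounding box origin.
    # Stretch: map each target pixel back to its nearest source pixel.
    return [
        [bitmap[top + (y * box_height) // size][left + (x * box_width) // size]
         for x in range(size)]
        for y in range(size)
    ]
```

After this step, every character occupies the same frame regardless of where and how large it was written, which makes the later feature extraction comparable across samples.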

  8. ICR Process: • Enhance • Less / More Contrast • Clean up (de-noise, halftone removal) • To enable the recognition engine to give the best results
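
Two common enhancement operations could look roughly like this: a linear contrast stretch and a 3x3 median filter for de-noising. Both are generic techniques chosen for illustration; the slide does not say which filters the presented software uses, and halftone removal is not covered. The function names and the grey-value image layout (list of rows, 0 to 255) are assumptions.

```python
def stretch_contrast(image):
    """Linearly rescale grey values so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]


def median_filter(image):
    """Replace each pixel by the median of its 3x3 neighbourhood (clipped at the borders)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = sorted(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            result[y][x] = window[len(window) // 2]
    return result
```

The median filter removes isolated specks (single dark pixels on white paper) while keeping stroke edges sharper than an averaging filter would.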

  9. ICR Process: • Feature extraction • Data reduction
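
One simple way to picture feature extraction as data reduction is zoning: the glyph is divided into a small grid and each zone is reduced to its ink density. Zoning is a technique picked for this example; the slide does not name the features actually used. The function name `zone_features` and the bitmap layout (list of rows, 1 = ink, at least `zones` pixels in each direction) are assumptions.

```python
def zone_features(bitmap, zones=2):
    """Reduce a binary bitmap to zones x zones ink densities (values between 0 and 1)."""
    height, width = len(bitmap), len(bitmap[0])
    features = []
    for zone_y in range(zones):
        for zone_x in range(zones):
            y0, y1 = zone_y * height // zones, (zone_y + 1) * height // zones
            x0, x1 = zone_x * width // zones, (zone_x + 1) * width // zones
            cells = [bitmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cells) / len(cells))
    return features
```

With `zones=2`, a bitmap of any size is reduced to just four numbers, which is the kind of compact vector a classifier can work with.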

  10. ICR Process: • Classification • A one was written • 90 % = 1 • 8 % = 7 • 2 % = 4
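
The example on this slide (90 % for "1", 8 % for "7", 2 % for "4") can be reproduced with a small sketch that turns raw classifier scores into confidences, ranks the alternatives and rejects the character when the best confidence is too low. The function name `classify`, the score format and the `reject_below` threshold are assumptions for illustration only.

```python
def classify(scores, reject_below=0.5):
    """Return (accepted label or None, ranked list of (label, confidence))."""
    total = sum(scores.values())
    confidences = {label: score / total for label, score in scores.items()}
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    best_label, best_confidence = ranked[0]
    # Reject (None) if even the best alternative is not convincing enough.
    accepted = best_label if best_confidence >= reject_below else None
    return accepted, ranked


accepted, ranked = classify({"1": 90, "7": 8, "4": 2})
```

Here `accepted` is `"1"` and `ranked` lists the alternatives in the order 1, 7, 4. A rejected character would be the one routed to manual correction.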

  11. ICR Algorithm: • Neural Network • kNN: k-Nearest Neighbour • SVM: Support Vector Machine • SVMs simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers
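
Of the algorithms listed, k-nearest neighbour is the easiest to show in a few lines. The sketch below is a minimal version under assumptions: training samples are `(feature_vector, label)` pairs, distance is Euclidean, and the function name `knn_classify` is hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_classify(samples, query, k=3):
    """Return the majority label among the k training samples closest to `query`."""
    nearest = sorted(samples, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((0.0, 0.0), "0"), ((0.1, 0.2), "0"),
    ((1.0, 1.0), "1"), ((0.9, 1.1), "1"), ((1.2, 0.8), "1"),
]
label = knn_classify(training, (0.95, 1.0), k=3)
```

In this toy data set the three nearest neighbours of the query all carry the label `"1"`. Real feature vectors would have many more dimensions, and a real system would use an index structure instead of sorting every sample.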

  12. ICR Process: • For each classification alternative, the corresponding confidence is provided • Recognition can be limited to the most probable characters, e.g. if only the characters 3, 6 and 0 are possible, the engine can be limited to this set and the results are much better • Voting Machine • Usability: • security, • efficiency and • accuracy
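
The two ideas on this slide, limiting recognition to an allowed character set and voting between engines, might be sketched as follows. The slide does not describe the actual voting rule, so a strict majority vote is assumed here, and the function names `restrict` and `vote` are hypothetical.

```python
from collections import Counter


def restrict(confidences, allowed):
    """Drop characters outside `allowed` and renormalise the remaining confidences."""
    kept = {char: conf for char, conf in confidences.items() if char in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}  # no allowed character received any confidence
    return {char: conf / total for char, conf in kept.items()}


def vote(engine_results):
    """Return the label chosen by a strict majority of engines, or None to reject."""
    label, count = Counter(engine_results).most_common(1)[0]
    return label if count > len(engine_results) / 2 else None


limited = restrict({"8": 0.5, "3": 0.3, "6": 0.2}, allowed={"3", "6", "0"})
best = max(limited, key=limited.get)
```

In this example the engine's unrestricted first choice was "8", but once the field is known to accept only 3, 6 or 0, the best remaining candidate is "3". A field for which `vote` returns `None` would be sent to manual correction.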

  13. Dynamic Field Recognition • No fixed position is required • If only ½ of the form is available, that ½ is still readable • No special forms are required • No timing tracks are necessary on the forms for OMR, but results are available at the same time; no cleaning of LEDs in the scanner is necessary • Robust against vertical / horizontal stretching or shrinking (e.g. different printers)

  14. Dynamic Field Recognition • Recognizes: • features (word as pixel cloud) • boxes, • lines and • symbols

  15. Hardware / Software Requirements • Hardware • Scanner • PC • Network • Disc storage, only necessary if images are needed for audit purposes • Software • Scan software • One recognition and voting software for OMR, OCR, ICR, Barcode

  16. OMR

  17. ICR Advantages • Better than: • Manual keying • 90 % (plus) correct keys; manual = higher substitution rate than automated recognition • Time-consuming • Deliberate manipulation possible • OMR, because OMR is space-consuming • OCR, because OCR is limited to machine-written text and therefore of limited use

  18. ICR Advantages • Clear accuracy for OMR because of dirt removal by software, depending on the mark size and shape • Can detect lines and ignore dirt • Clear result

  19. ICR Advantages • Barcode, • OCR, • OMR, • and ICR recognition with one software package

  20. ICR Advantages • Pro: • Only rejected characters/fields need correction; the rest of the form is untouched • Open to new technologies in the future: faster, better quality • With standardized correction mode • Handwriting of the corresponding country will be recognized • The previously mentioned advantages do not have to be repeated here

  21. Thank you for your attention
