UNSD Census Workshop

Presentation Transcript


  1. UNSD Census Workshop. Day 2 - Session 7. Data Capture: Intelligent Character Recognition. Andy Tye – International Manager. DRS are Worldwide specialists in Census data capture (www.drs.co.uk).

  2. Intelligent Character Recognition (ICR) Elements: Form design; Hardware/Software requirements; Scanners; Computer infrastructure; Workflow; Accuracy; Advantages; Disadvantages.

  3. Data Capture - ICR. Forms design: Typical stock grade paper (90 GSM); Corner Stones advised; Dropout colour is recommended but not essential.

  4. Hardware requirements
     • Image Scanners (TWAIN or ISIS)
     • Database Server (full redundancy)
     • Storage Server – Terabytes (RAID 5, mirrored, etc.)
     • Network (Gb preferred)
     • Administrator PC
     • CSPro PCs
     • Key correction PCs (Verification)
     • Character Inspection PCs (Mass verification - optional)
     • Scanner PCs
     • Automatic data capture PCs
     Software requirements
     • MS-SQL or other database
     • Data Storage, Archive and Retrieval
     • Backup Software
     • Software for Administrator PC
     • CSPro for analysis and reporting PCs
     • Software for Key correction PCs
     • Software for Character inspection PCs
     • Software for Scanner PCs
     • Software for automatic data capture

  5. Typical Workflow: ICR

  6. Typical Workflow: Paper Movement – Processing Centre/s

  7. Typical Workflow: Receiving

  8. Typical Workflow: Logging/Checking. Open Batch; Verify Contents; Register Batch.

  9. Typical Workflow: Sifting. Orientation; Other Forms.

  10. Typical Workflow: Spine removal. Cut Booklets; 30,000/day.

  11. Typical Workflow: Scanning. Double Sided; High Speed; Double Detection; Ease of Use.

  12. Typical Workflow: Scanning/sorting. Automatic Identification; Data Capture.

  13. Typical Workflow: Storage. Conditions; Retrieval; Space.

  14. Typical Workflow: Image Movement/Data Extraction – Processing Centre/s

  15. Typical Workflow: Image interpretation. Automated Process; Background Task; Page Identification; De-skew; Image Clean up; Pre-defined Areas.

  16. Typical Workflow: Character inspection. Tiling; High Confidence; Operator Decision; Field Context; Tall to Short.

  17. Typical Workflow: Key correction. Low Confidence; Operator Decision; From Context; External Verification.
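
Slides 16 and 17 describe two operator queues split by recognition confidence. The routing can be sketched as follows. This is a minimal illustration and not the DRS implementation: the `RecognisedChar` type, the `route` function, and the 0.90 threshold are all hypothetical, and it assumes each recognised character carries a confidence score between 0 and 1. It also assumes that "tiling" means presenting characters recognised as the same value side by side, so an operator can spot the odd one out.

```python
from dataclasses import dataclass


@dataclass
class RecognisedChar:
    field: str         # name of the form field the character came from
    value: str         # character proposed by the ICR engine
    confidence: float  # assumed to be a score in the range 0..1


# Hypothetical cut-off; a real system would tune this per field type.
HIGH_CONFIDENCE = 0.90


def route(chars, threshold=HIGH_CONFIDENCE):
    """Split characters into (character inspection, key correction) queues."""
    inspection, key_correction = [], []
    for ch in chars:
        if ch.confidence >= threshold:
            inspection.append(ch)      # mass verification by tiling
        else:
            key_correction.append(ch)  # operator re-keys from field context
    # Group identical proposed values together so they can be tiled.
    inspection.sort(key=lambda c: c.value)
    return inspection, key_correction
```

For example, routing three characters with confidences 0.99, 0.42 and 0.95 sends the first and third to character inspection and the second to key correction.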

  18. Typical Workflow: Key Correction. ASCII File; CSV Format; 1 Line/Form; CSPro Import.
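
The output described on slide 18 (an ASCII file in CSV format with one line per form) can be sketched with Python's standard `csv` module. The field names and the `export_forms` helper are hypothetical, and the header row is included only for readability; the slides do not say whether the CSPro import step expects one.

```python
import csv
import io

# Hypothetical field layout; a real census form defines its own.
FIELDS = ["form_id", "household_size", "age", "sex"]


def export_forms(forms, out):
    """Write one CSV line per captured form to the text stream `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for form in forms:
        writer.writerow(form)


def demo():
    """Export two made-up forms and return the CSV text."""
    buf = io.StringIO()
    export_forms(
        [
            {"form_id": "000001", "household_size": "4", "age": "35", "sex": "1"},
            {"form_id": "000002", "household_size": "2", "age": "61", "sex": "2"},
        ],
        buf,
    )
    return buf.getvalue()
```

Keeping every value as text avoids losing leading zeros in identifiers such as `000001`.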

  19. Typical Workflow: ICR

  20. Accuracy. This is always the first question.
     Handprint:
     • Numeric only in isolated fields: 98%
     • Numeric only in semi-constrained fields: 95-96%
     • Alpha upper case only: 90%
     • Alpha lower case only: 85-87%
     • Alpha mixed case: 75-80%
     • Alpha/Numeric mixed case: 50% or less
     Reduce by 5% if there are special characters (anything other than a-z and 0-9).
     The accuracy level post data correction (i.e. the final output accuracy) should be 100% (subject to good operators).

  21. Accuracy continued… The accuracy of all modern ICR engines is pretty much comparable. The major differences between suppliers' solutions are the methods and workflow utilised with each offering. False positive detection takes 10 times longer than entry of characters recognised with low confidence; false positives (substitutions) are the most expensive errors.
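
The 10-to-1 ratio on slide 21 can be turned into a rough operator-time estimate. This is a back-of-envelope sketch only: the `correction_time` function and the one-second keying time are assumptions made for illustration, and only the factor of 10 comes from the slide.

```python
def correction_time(low_confidence_chars, false_positive_chars,
                    seconds_per_keyed_char=1.0, false_positive_factor=10):
    """Estimate operator seconds spent on correction.

    seconds_per_keyed_char is an assumed figure; false_positive_factor
    is the "10 times longer" ratio quoted on the slide.
    """
    keying = low_confidence_chars * seconds_per_keyed_char
    detection = (false_positive_chars * seconds_per_keyed_char
                 * false_positive_factor)
    return keying + detection
```

Under these assumptions, 1,000 low-confidence characters and 100 false positives cost the same operator time (1,000 seconds each), which illustrates why substitutions are described as the most expensive errors.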

  22. Accuracy continued… Accuracy can be improved by:
     • Restricting the responses to any given question
     • Using external verification
     • Using multiple ICR engines to ‘vote’ (which is expensive)
     • Training your ICR engines on local handwriting styles (if possible)
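
The multi-engine ‘vote’ mentioned on slide 22 can be sketched as a simple majority vote over the characters proposed by each engine. The `vote` function and its `min_agreement` parameter are hypothetical; real products may weight engines or use confidence scores instead.

```python
from collections import Counter


def vote(readings, min_agreement=2):
    """Return the character most engines agree on, or None.

    readings is a list with one proposed character per ICR engine.
    None means there is no clear winner, so the character would go
    to an operator for key correction.
    """
    if not readings:
        return None
    ranked = Counter(readings).most_common(2)
    top_value, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_count < min_agreement:
        return None
    return top_value
```

With three engines, `vote(["7", "7", "1"])` accepts `"7"`, while `vote(["7", "1", "4"])` returns `None` because no two engines agree.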

  23. Advantages
     • No specialist hardware required
     • An image archive of every form can be produced automatically
     • Very high speed scanning can be achieved
     • Both OMR and ICR can be interpreted using ICR software
     • Forms designed for ICR are relatively easy to fill in
     • Locally printed forms can be used
     • Allows capturing much more complex data than with OMR alone

  24. Disadvantages
     • Significant hardware/software and trained IT staff will be required
     • Accuracy dependent on manual intervention
     • High calibre IT staff are required to support the ICR system
     • More complex cost/benefit analysis than with OMR alone

  25. Indicative Costs
     For a 65 million population census (20M single-sided A4 household forms), with a processing period of 12 weeks (8 hours/day, 5 days/week):
     • Hardware: $800k-$1M in total
     • Software: $700k-$1.3M in total
     • Total indicative costs: $1.5M to $2.3M
     • No. of staff: 100-190 in total (6-10 Managers, 94-180 PC Operators)
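
The figures on slide 25 can be cross-checked with a few lines of arithmetic. The inputs below are taken from the slide; the throughput and cost-per-form values are derived here and do not appear in the presentation. Note that the quoted total matches hardware plus software, so staff costs appear not to be included.

```python
FORMS = 20_000_000  # single-sided A4 household forms
WEEKS, DAYS_PER_WEEK, HOURS_PER_DAY = 12, 5, 8

working_days = WEEKS * DAYS_PER_WEEK          # 60 days
working_hours = working_days * HOURS_PER_DAY  # 480 hours

# Required sustained throughput to finish within the processing period.
forms_per_day = FORMS / working_days
forms_per_hour = FORMS / working_hours

# (low, high) ranges in US dollars, as quoted on the slide.
hardware = (800_000, 1_000_000)
software = (700_000, 1_300_000)
total = (hardware[0] + software[0], hardware[1] + software[1])

# Derived: hardware and software cost per form.
cost_per_form = (total[0] / FORMS, total[1] / FORMS)

# (low, high) head count: managers plus PC operators.
staff = (6 + 94, 10 + 180)

print(f"{forms_per_day:,.0f} forms/day, {forms_per_hour:,.0f} forms/hour")
print(f"total ${total[0]:,} to ${total[1]:,}")
print(f"${cost_per_form[0]:.3f} to ${cost_per_form[1]:.3f} per form")
print(f"staff {staff[0]} to {staff[1]}")
```

The sums reproduce the slide's totals ($1.5M to $2.3M and 100 to 190 staff), and the schedule implies roughly 333,000 forms per day across the processing centre or centres.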

  26. Summary. ICR offers considerable flexibility at the cost of higher skilled IT personnel. The single most important factor for timely and accurate data capture is to make sure ‘the forms are filled in correctly and are returned in good condition’.

  27. UNSD Census Workshop Day 2 - Session 7. Thank you for listening. Andy Tye – International Manager
