
December 2007




Presentation Transcript


  1. December 2007. The Census Challenge: Data Capture for Census Projects

  2. “Counted” by eFLOW worldwide: 1,374,026,304

  3. TIS’s Experience in Census Projects: Largest market share worldwide in information capture for census projects

  4. Governmental projects: • Australian Department of Defense • Brazilian Department of Statistics • Chilean Social Security Office • US Social Security Administration • Turkish Ministry of Finance • Argentinean National Institute of Statistics • Population census worldwide: India, Italy, South Africa, Brazil, Ireland, Kenya, … And from other segments: • IBM • T-Mobile • BP-British Petroleum • 3M Europe • AQA Examination Board U.K. • Comicrom Service Bureau • BCBG Vermont • BKK • And a few more…

  5. Our Partners – Across the Globe

  6. What do we supply? The eFLOW Unified Content Platform (diagram labels: eFlow, Application Forms, Export, Image files, Office Docs, Email, Web Pages, PDFs, Text files)

  7. The eFLOW platform in censuses (diagram labels: eFlow, Census Data, Paper File, Census Images)

  8. Traditional Data Capture: Back-Office • Mail Room • Scanning • Data Entry • End Users • Document prep • Sorting • Manual key from image

  9. Intelligent Document Capture: Back-Office • Mail Room • Scanning • Data Entry • End Users • Document prep • No sorting • Reduces manual data entry by 40-70% • Increases accuracy and consistency

  10. TIS’s Experience in Census Projects: • India 2002 (912 million A4 images in 18 months) • Italy 2002 (800 million forms in 180 days) • Brazil 2000 (333 million forms in 100 days) • South Africa 2001 (144 million forms in 130 days) • Ireland 2001 (40 million forms in 120 days) • Germany (DP) 1999 (36 million forms in 30 days) • Cyprus 2002 (5.2 million forms in 80 days) • Turkey 1997, 2000 (18 million forms in 30 days) • Kenya 2000 (16 million forms in 80 days) • Slovak Republic 2001 (16 million forms in 80 days) • Hong Kong 2001 (10 million forms in 50 days) • Irish Census 2006 (50 million forms in 100 days)
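
The volumes and durations on this slide translate into an average daily throughput. The sketch below is only a back-of-the-envelope calculation over a few of the listed projects; it assumes the workload was spread evenly over the stated number of days, which the slide does not claim.

```python
# Figures copied from the slide: (project, forms in millions, days).
projects = [
    ("Italy 2002", 800, 180),
    ("Brazil 2000", 333, 100),
    ("South Africa 2001", 144, 130),
    ("Germany (DP) 1999", 36, 30),
    ("Irish Census 2006", 50, 100),
]


def forms_per_day(millions, days):
    """Average daily throughput in forms, assuming an even workload."""
    return millions * 1_000_000 / days


throughput = {name: forms_per_day(m, d) for name, m, d in projects}

# Print the projects from highest to lowest average daily volume.
for name, rate in sorted(throughput.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:,.0f} forms/day")
```

Real projects have ramp-up periods and peaks, so actual peak-day volumes would be higher than these averages.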

  11. TIS Main Advantages in the Census Arena: • Largest market share worldwide in the processing of census projects • Extensive experience in the design, development and implementation of data capture for census projects • Proven solution in census data capture • Data capture platform (paper, electronic, mobile) and not a recognition product • Successful cooperation with local partners in providing census solutions (knowledge transfer, co-implementation, support) • Coding tasks and data validations performed on the data capture platform: a cost-effective solution

  12. The evolution of data capture in census projects: eFLOW, from OCR to an IDR solution

  13. The evolution of data capture in census projects: Key From Paper and Key From Image • Manual data entry (key from paper): • Slow • High error rate in the data entry process • Requires recruitment, training and management of personnel • Key from image: • Archive • Approx. 30-40% faster than key from paper

  14. The evolution of data capture in census projects: OMR (hardware readers for checkboxes) • Requires specially printed forms and special scanners • Cannot handle handwritten or printed data • Forms are not user-friendly • Cannot handle double-sided forms • More answer options require more space, which increases paper, handling and printing costs • Not flexible; difficult to adjust to other applications once the census is over • No possibility to add business rules: computation, validations, coding

  15. The evolution of data capture in census projects: Automated data capture (eFLOW) • Requires less human intervention and makes it possible to complete census data capture much faster (less space, fewer salaries, less hardware) • Ensures data integrity: enables automatic and manual online validations, exception handling and coding • The most advanced and proven technology today, recommended by the UN and used by most countries for census projects • Full flexibility in the type of data gathered (checkbox, handwritten, alpha and numeric, barcode…) • Provides all the capabilities of OMR and much more • Creates a correlation between the image and the actual form • Remote capabilities enable all forms to be scanned locally and then sent to a central site for processing

  16. The evolution of data capture in census projects: Intelligent data capture, or intelligent data recognition (IDR) • Automated data capture, plus: • Smart: automatic classification of documents; understands and differentiates between various types of documents and languages; based on state-of-the-art machine learning algorithms • Freedom: artificial intelligence algorithms provide enough information for the system to find the location of the fields on its own

  17. Census-Specific Issues (common issues) and how TIS answers them: • Peak volume challenge • Long-term project • Data integrity • Capture of form identification • Data validation procedures • Automated recognition • Voting algorithms • Data storage • Image storage • Personal data confidentiality • Statistical coding

  18. Peak Volume Challenge • The Challenge: • Process very high volumes of forms in a pre-defined period • The Goal: • To successfully gather population data while meeting a planned schedule and budget • Proposed Solution: • Use a data capture platform approach rather than a “character capture” approach • Optimal combination of technological and operational solutions • Use the data capture platform for coding and edits • Reduce risk by using an off-the-shelf product backed by extensive experience in similar projects • On-line operation control tools

  19. On-line operation control tools: eFLOW’s Controller workstation

  20. The Controller: Layout View

  21. A Long-Term Project • Challenge No. 1: • Rapid changes in technology • The Goal: • Use new technologies in the actual census • Proposed Solution: • Open system (recognition engines, connectivity) • Continuously developed product • Census-focused company

  22. A Long-Term Project (cont.) • Challenge No. 2: • Post-census usage of the data capture system • The Goal: • Use the system for ongoing data capture • Proposed Solution: • Business: outsourcing, renting or purchase • Technical: • Break the system down into a few smaller, independent systems (scalable system, flexible software and hardware infrastructure) • A powerful set-up utility makes it possible to reuse the system later for other ongoing projects (statistical surveys, governmental service bureau)

  23. Statistical Coding & Editing • The Challenge: • Bottlenecks occur due to an insufficient number of statistical experts and/or inefficient procedures • The Goal: • Maintain the general throughput of the system by avoiding pre- and post-data-capture coding • Proposed Solution: • Using automated recognition and/or key-from-image: • Computer-assisted coding as part of the data capture system • The code and edit tasks are performed on the data capture platform: a cost-effective solution

  24. ICR & Look-up Table: Computer-assisted coding by statistical experts as part of the data capture system (2nd-level repair).
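
One way to picture computer-assisted coding with a look-up table is a fuzzy match between the ICR text and the table's descriptions, leaving the final choice to the expert. The sketch below is an illustration only: the occupation table, its codes and the `suggest_codes` helper are hypothetical, and the matching uses Python's standard `difflib`, not any TIS component.

```python
import difflib

# Hypothetical look-up table (description -> statistical code), for illustration.
OCCUPATION_CODES = {"TEACHER": "2320", "FARMER": "6110", "NURSE": "2221"}


def suggest_codes(icr_text, table=OCCUPATION_CODES, limit=3, cutoff=0.6):
    """Return (description, code) candidates ranked by similarity to the ICR text.

    An empty list means nothing was close enough, so the field would go to
    manual coding by an expert.
    """
    candidates = difflib.get_close_matches(
        icr_text.strip().upper(), list(table), n=limit, cutoff=cutoff
    )
    return [(name, table[name]) for name in candidates]


print(suggest_codes("teachr"))  # a misread or misspelled answer
print(suggest_codes("xyz"))     # nothing similar in the table
```

The `cutoff` value controls how aggressive the suggestions are; a real project would tune it against verified samples.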

  25. Data Storage • The Challenge: • The need to store large volumes of data and images • The Goal: • Optimization of resources (network, storage facilities) • Proposed Solution: • Using TIS’s unique “Form Out” module: • Reduces network traffic • Reduces storage media • No need for dropout ink (saves printing costs)

  26. Image Storage

  27. An uncompressed census form (200 dpi) occupies 950 Kb; compressing it with CCITT Group 4 reduces it to 100 Kb; FormOut! reduces the same form to only 6 Kb!
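
The three sizes on this slide imply the compression ratios computed below. This is a rough sketch that reads the slide's "Kb" as kilobytes, takes 1 GB as 10^6 KB, and uses a 50-million-form volume purely as an assumed example; none of these assumptions come from the slide itself.

```python
# Per-form sizes from the slide, read here as kilobytes (an assumption).
sizes_kb = {
    "uncompressed (200 dpi)": 950,
    "CCITT Group 4": 100,
    "FormOut!": 6,
}

baseline = sizes_kb["uncompressed (200 dpi)"]

# Compression ratio of each format relative to the uncompressed image.
ratios = {name: baseline / size for name, size in sizes_kb.items()}


def storage_gb(forms, size_kb):
    """Total storage in GB for a number of forms, taking 1 GB = 10**6 KB."""
    return forms * size_kb / 1_000_000


for name, size in sizes_kb.items():
    total = storage_gb(50_000_000, size)  # assumed example volume
    print(f"{name}: {ratios[name]:.1f}x smaller, {total:,.0f} GB for 50M forms")
```

Under these assumptions, 50 million forms need about 47,500 GB uncompressed, 5,000 GB with CCITT Group 4 and 300 GB at 6 KB per form.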

  28. How do we do it? (diagram labels: ROI, Original TIFF, EFI, DIF)

  29. Personal Data Confidentiality (Security) • The Challenge: • Avoid the exposure of personal information • The Goal: • Minimize image and data exposure in the data capture system through complete access control • Proposed Solution: Multi-level access control: • Overall system/segment level: a set number of workstations • User level: personal log-in and permissions for each user • Computer screens: anonymous images in “field mode” • On-line centralized security control (“Controller”)

  30. Data Validation Procedures • The Challenge: • Substitution errors (“computer mistakes”) occur • The Goal: • Eliminate substitution errors and handle invalid responses during the data capture stage, to speed up the release of results • Proposed Solution: • Limiting the possible answers, e.g. look-up tables, dictionaries, dates, single OMR response, set numeric range… • Use of multiple recognition engines (“voting”) • Multi-level comparisons: field level, form level, batch level • Logical validations: automatic and manual
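
The "limiting the possible answers" idea can be sketched as a set of field-level checks. The example below is a minimal illustration, not eFLOW's validation API: the field names, the marital-status table, the 0-120 age range and the date format are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical look-up table, for illustration only.
MARITAL_STATUS = {"SINGLE", "MARRIED", "WIDOWED", "DIVORCED"}


def in_lookup(value, table):
    """Look-up table check: the answer must be one of the allowed values."""
    return value.strip().upper() in table


def in_range(value, low, high):
    """Set numeric range check; non-numeric text fails."""
    try:
        number = int(value)
    except ValueError:
        return False
    return low <= number <= high


def is_date(value, fmt="%d/%m/%Y"):
    """Date check: the text must parse as a real calendar date."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def single_mark(marks):
    """Single OMR response check: exactly one checkbox is marked."""
    return sum(1 for m in marks if m) == 1


def validate_form(form):
    """Return the names of the fields that fail their check."""
    checks = {
        "marital_status": in_lookup(form["marital_status"], MARITAL_STATUS),
        "age": in_range(form["age"], 0, 120),
        "birth_date": is_date(form["birth_date"]),
        "sex": single_mark(form["sex"]),
    }
    return [field for field, ok in checks.items() if not ok]


good = {"marital_status": "married", "age": "34",
        "birth_date": "12/05/1973", "sex": [True, False]}
bad = {"marital_status": "maried", "age": "340",
       "birth_date": "31/02/1973", "sex": [True, True]}
print(validate_form(good))
print(validate_form(bad))
```

Fields that fail would be routed to exception handling or manual repair rather than silently accepted, which is how such checks catch substitution errors early.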

  31. Automated Recognition

  32. OCR/ICR Engines: JUSTICR • ABBYY • KADMOS • RICOH • OCE • INLITE • EXPERVISION • PARASCRIPT • A2IA • TIS

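  33. Virtual Engine Example: three engines (ICR A, ICR B, ICR C) read the same field as *oshua, Jo*hu* and J*sh*a, where * marks an unrecognized character; the voting method combines them into “Joshua”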
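
The slide's example can be reproduced with a simple per-character majority vote. The sketch below is an assumed scheme for illustration, not TIS's actual voting algorithm: it treats `*` as "no reading", requires the engine outputs to have equal length, and accepts a character only when at least `min_votes` engines agree.

```python
from collections import Counter

REJECT = "*"  # marks a character an engine could not recognize


def vote(readings, min_votes=2):
    """Combine equal-length engine outputs by per-character majority vote."""
    if len({len(r) for r in readings}) != 1:
        raise ValueError("all readings must have the same length")
    result = []
    for chars in zip(*readings):
        # Count only real readings; rejected characters cast no vote.
        counts = Counter(c for c in chars if c != REJECT)
        if not counts:
            result.append(REJECT)
            continue
        char, votes = counts.most_common(1)[0]
        # Keep the winner only if enough engines agree on it.
        result.append(char if votes >= min_votes else REJECT)
    return "".join(result)


print(vote(["*oshua", "Jo*hu*", "J*sh*a"]))  # prints "Joshua"
```

Positions that remain `*` after voting would be the ones sent to a human for key-from-image repair.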

  34. Form Design • The Challenge: • System efficiency (throughput) • The Goal: • Improve recognition results • Proposed Solution: • Recommended guidelines (paper developed by TIS): • Consider the need to restrict the optional answers to a limited number of desired possibilities • Choose between: • Mark response (checkbox) • Numeric response • Alphabetic response • A combination of the above

  35. Census Forms Examples

  36. Data Types: • OCR – Optical Character Recognition (machine print and barcodes) • ICR – Intelligent Character Recognition (handwriting) • OMR – Optical Mark Recognition (checkboxes)

  37. Data Types: OCR, ICR, OMR

  38. Why choose TIS? • Extensive experience in real census and other high-volume form processing projects; largest market share worldwide in the processing of census projects • Data capture platform (paper, electronic, mobile) and not a recognition product • Successful cooperation with local partners in providing census solutions (knowledge transfer, co-implementation, support) • Maximum flexibility and redundancy, which ensures meeting the timetable for releasing census results • Financially stable company, listed on NASDAQ since 1996
