Presentation Transcript

DATA CAPTURE IN CENSUS OF INDIA

Registrar General & Census Commissioner, India

Visit our website at www.censusindia.gov.in

FEATURES OF INDIAN CENSUS
  • India is a large country with a population of more than a billion, which makes its Census one of the world's largest administrative and statistical exercises
  • Diversity in languages: schedules are filled in 16 languages
  • 2 million enumerators were deployed in the 2001 Census; the number is likely to increase further in the 2011 Census
FEATURES OF INDIAN CENSUS (Contd..)

The Census is conducted using the ‘canvasser’ method, in two phases:

  • House-listing
  • Population Enumeration

The Census Organization has experimented with new IT innovations since the beginning.

Technology is required particularly for data capture and processing, mainly because of the large volume of data and the need for speedier tabulation and release of Census results.

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Important Considerations

  • Conventional data entry is not suitable for such a large volume of data (228 million schedules for a population of 1,028 million).
  • Availability of advanced IT tools and techniques.
  • Need to capture and process all the collected information.
  • Complexities in data entry due to the multiplicity of languages and responses and the large (A3) size of the Census Schedule.
DATA CAPTURE & PROCESSING IN 2001 CENSUS

Important Considerations (Contd..)

  • Retrieval of original documents for correction is labor-intensive.
  • Reduce the time span for processing and releasing results from 5-8 years to 3-5 years.
  • Compact, reliable and efficient archival system.
  • Better workflow management.
DATA CAPTURE & PROCESSING IN 2001 CENSUS

Selection and Consequent Action

  • Evaluation of various available technologies (OMR/OCR/ICR).
      • Trial run with NCS and DRS OMR.
      • Trial run with various ICR vendors.
  • Opted for ICR technology (TIS eFlow).
  • IT infrastructure in all 15 Data Centers upgraded to meet the new requirements.
DATA CAPTURE & PROCESSING IN 2001 CENSUS

Model Conceived for Implementation

  • Services of a System Integrator (SI) hired to guide and assist in the implementation of ICR technology
  • A unique model for outsourcing: the SI works in our premises, in order to
      • allow better communication and control
      • maintain data security, safety and confidentiality
  • Capacity building (training and guiding IT staff)
  • Production-linked payment to the SI

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Work Flow of ORGI (TIS eFlow characteristics)

  • Designs the data capture workflow
  • Presents a graphical view of the system
  • Monitors the processing and workflow in real time
  • Allows applications to be customized and custom features to be added

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Work Flow Modules

  • Scan Portal, File Portal, Controller FormID, Manual FormID RC
  • Processing [OCR/ICR]
  • Tile, Completion, CAC & Exception
  • Export

DATA CAPTURE & PROCESSING IN 2001 CENSUS

ORGI Workflow Stages

[Flow diagram: Prepare Batch → Scanning → Recognition → Tiling → Completion → Exception → Export/Archival → ASCII file → Server]
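The stages named on this slide can be read as an ordered pipeline. The following is a minimal sketch of that ordering only, not of eFlow itself: the `Stage` enum, `next_stage` and `route` are hypothetical names, and the rule that a batch visits the Exception stage only when an operator has referred something to it is an assumption drawn from the step descriptions later in the presentation.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages in processing order (names are illustrative)."""
    PREPARE_BATCH = 1
    SCANNING = 2
    RECOGNITION = 3
    TILING = 4
    COMPLETION = 5
    EXCEPTION = 6
    EXPORT = 7


def next_stage(stage, needs_exception=False):
    """Return the stage that follows `stage`, or None after EXPORT.

    Assumption: EXCEPTION is visited only when an operator has referred
    a case onward; otherwise COMPLETION goes straight to EXPORT.
    """
    if stage is Stage.EXPORT:
        return None
    if stage is Stage.COMPLETION and not needs_exception:
        return Stage.EXPORT
    return Stage(stage + 1)


def route(needs_exception=False):
    """List every stage a batch passes through, starting at PREPARE_BATCH."""
    path, stage = [], Stage.PREPARE_BATCH
    while stage is not None:
        path.append(stage.name)
        stage = next_stage(stage, needs_exception)
    return path
```

Calling `route(needs_exception=True)` walks all seven stages, while `route()` skips the Exception stage.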

DATA CAPTURE & PROCESSING IN 2001 CENSUS

LAN Setup - ORGI Data Centers

[Network diagram of the data center LAN]

  • Scanning station: forms are fed through the scanner(s) batch by batch
  • Server: form images are stored on a network disk
  • Recognition stations: character images are automatically recognised, field by field
  • Tiling & Completion stations (Tile/Correction): unrecognised characters are corrected by operators
  • Exception stations: supervisors handle exceptional cases referred by operators
  • Controller station: supervisor monitors the workflow and balances the load across the stages of operation
  • Export station: supervisor exports completed batches as an ASCII file for further processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

eFlow customization

  • Customization of the scanning software for batching the images
  • Optimization of batch size for network movement of images and data
  • Customization of workflow management to reduce the workload on the Manual Identification station
DATA CAPTURE & PROCESSING IN 2001 CENSUS

eFlow customization (Contd..)

  • Development of new management information tools for operators, daily production status, etc.
  • Creation of JUSTICR.mdb to recognize the handwriting patterns of Indian enumerators
  • Creation and implementation of various static and dynamic dictionaries for CAC
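One way to picture static and dynamic dictionaries for Computer Assisted Coding is a lookup that first consults a fixed list and then a list that grows as operators resolve new spellings. This is a rough sketch under that assumption; the `CodingDictionary` class, its methods and the code value `"L01"` are all hypothetical and do not describe how eFlow stores or uses its dictionaries.

```python
class CodingDictionary:
    """Static entries are fixed up front; dynamic entries are learned later."""

    def __init__(self, static_entries):
        # Static dictionary: loaded once, never modified during operation.
        self.static = {self._key(k): v for k, v in static_entries.items()}
        # Dynamic dictionary: filled in as operators code unseen responses.
        self.dynamic = {}

    @staticmethod
    def _key(text):
        # Normalise case and whitespace so "  hindi " matches "Hindi".
        return " ".join(text.upper().split())

    def lookup(self, text):
        """Return the code for a written response, or None if unknown."""
        key = self._key(text)
        if key in self.static:
            return self.static[key]
        return self.dynamic.get(key)

    def learn(self, text, code):
        """Record an operator's coding decision for a new spelling."""
        key = self._key(text)
        if key not in self.static:
            self.dynamic[key] = code


# Hypothetical usage: "L01" is a made-up language code.
languages = CodingDictionary({"Hindi": "L01"})
languages.learn("Hindee", "L01")
```

After the `learn` call, a later response spelled "hindee" resolves without operator involvement, which is the throughput gain a dynamic dictionary is meant to give.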

DATA CAPTURE & PROCESSING IN 2001 CENSUS

  • Results Achieved
    • For the first time, 100% of the data was captured, processed and released within five years of the Census
    • Auto-recognition rate of 90%, with false positives below 2%
    • Considerable financial savings
    • Assimilation of IT skills within the organisation
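The recognition figures above can be made concrete with a small calculation. The slides do not define the two rates, so this sketch assumes the recognition rate is the share of all characters the engines recognised automatically, and the false positive rate is the share of those auto-recognised characters that turned out to be wrong. The function names and the sample counts are hypothetical.

```python
def recognition_metrics(total_chars, auto_recognised, wrongly_recognised):
    """Return (recognition_rate, false_positive_rate) as fractions.

    Assumed definitions (not stated in the slides):
      recognition_rate    = auto_recognised / total_chars
      false_positive_rate = wrongly_recognised / auto_recognised
    """
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if not 0 <= wrongly_recognised <= auto_recognised <= total_chars:
        raise ValueError("counts must satisfy 0 <= wrong <= auto <= total")
    rate = auto_recognised / total_chars
    false_positive = wrongly_recognised / auto_recognised if auto_recognised else 0.0
    return rate, false_positive


def meets_target(rate, false_positive, min_rate=0.90, max_false_positive=0.02):
    """Check the figures quoted on the slide: 90% recognition, under 2% false positives."""
    return rate >= min_rate and false_positive < max_false_positive


# Hypothetical counts for one batch of 1,000 characters.
rate, fp = recognition_metrics(1000, 920, 9)
batch_ok = meets_target(rate, fp)
```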
DATA CAPTURE & PROCESSING IN 2001 CENSUS

  • Results Achieved (Contd..)
  • Manual coding was replaced by Computer Assisted Coding (CAC)
          • Scheduled Caste / Scheduled Tribe
          • Languages spoken, education level
          • Migration particulars, NIC and NCO
  • Indigenous data capture for other projects
          • Economic Census
          • Sample Registration System
          • Verbal Autopsy
DATA CAPTURE & PROCESSING IN 2001 CENSUS

Difficulties Experienced

  • Unable to use color drop-out at the scanning stage
  • Difficult to handle bad images during the scanning stage
  • Bad/Back images due to variation in paper and print quality
  • Overwriting and use of whitener; grid lines recognized as the digit 1
  • Limited ability to recognize Indian languages reduced throughput
DATA CAPTURE & PROCESSING IN 2001 CENSUS

Difficulties Experienced (Contd..)

  • Operational constraints in Manual Identification
  • No powerful tools for online load balancing among the various stages of eFlow
  • Lack of concurrent quality checks at each stage of eFlow
  • Lack of auto-coding features for textual responses
  • Failure to recognize even a single image means the whole batch must be redone
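The last difficulty above interacts with batch size: the larger the batch, the more likely it contains at least one image that fails. As a rough illustration, and assuming each image fails independently with the same probability (a simplification the slides do not make), the chance that a batch has to be redone is 1 - (1 - p)^n. The function name and the rates used here are hypothetical.

```python
def batch_redo_probability(batch_size, image_failure_rate):
    """Probability that at least one image in a batch fails recognition.

    Assumes failures are independent and equally likely for every image,
    which is a simplification rather than a measured property.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0.0 <= image_failure_rate <= 1.0:
        raise ValueError("image_failure_rate must be between 0 and 1")
    return 1.0 - (1.0 - image_failure_rate) ** batch_size


# Hypothetical comparison of two batch sizes at a 0.1% per-image failure rate.
small_batch_risk = batch_redo_probability(50, 0.001)
large_batch_risk = batch_redo_probability(200, 0.001)
```

Under these assumptions the larger batch always carries the higher redo risk, which is one reason batch size is worth tuning alongside network considerations.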


LESSONS LEARNT FOR FUTURE

    • Outsourcing in controlled environment beneficial and cost-effective
    • Good quality of paper
    • ICR friendly Form Design
    • Use of Bar Code for better work flow and Inventory management
    • Good quality printing
LESSONS LEARNT FOR FUTURE (Contd..)

    • Special training for enumerators in filling in the forms
    • For CAC, use knowledge-based dictionaries to increase throughput
    • Use concurrent quality check procedures along the lines of those in the USA and UK
DATA CAPTURE & PROCESSING

Technology for 2011 Census
  • Continuation of ICR Technology
    • International and national experience shows that, to date, there is no better substitute for scanning and ICR technology
    • Expertise and competence in using ICR technology are already available in the organization
DATA CAPTURE & PROCESSING

Technology for 2011 Census (Contd..)

  • Use more efficient scanners with image enhancement, noise removal, color drop-out, better throughput, and on-the-spot detection and correction of bad images through built-in software.
  • Use an improved version of the ICR software with better recognition and built-in enhanced workflow management.
  • Use the new Auto/Computer Assisted Coding features in the ICR software.


Thank you.

Visit Our Website at

www.censusindia.gov.in

Steps involved in e-Flow Process

Intelligent Character Recognition (ICR) technology is used to extract handwritten or machine-printed (typeset) characters from scanned images and generate a computer-processable data file. In brief, the following steps are involved in using ICR technology.

  • Scanning: Paper-based forms are scanned to create bitmap image files.
  • File Portal: The image file registration module in eFlow; it registers the scanned images as input to the next activity.
  • Form Identification: Automatically identifies the images of the various schedules based on the Empty Form Image (EFI) templates created during the design stage.
Steps involved in e-Flow Process
  • Manual Identification: Forms left unidentified because of bad images are matched manually by the operator on the computer with the help of EFIs.
  • Processing: This module is the heart and brain of the ICR technology. It automatically recognizes the data (numerals/alphabetic characters) from the images with the help of various engines (CGK, AEG, KADMOS, TISICR, etc.).
  • Tile: This module displays the images of characters recognized as the same digit together in one place, so that any character wrongly recognized by the system can be spotted and corrected, which improves the accuracy and quality of the data.
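The Tile step amounts to grouping character snippets by the value the engines assigned, so that an odd one out is easy to spot among its neighbours. A minimal sketch of that grouping follows; the function names, the snippet identifiers and the hand-off of flagged snippets to a later correction step are illustrative assumptions, not the eFlow interface.

```python
from collections import defaultdict


def build_tiles(recognised):
    """Group snippet ids by recognised character.

    `recognised` is an iterable of (snippet_id, recognised_char) pairs.
    """
    tiles = defaultdict(list)
    for snippet_id, char in recognised:
        tiles[char].append(snippet_id)
    return dict(tiles)


def reject_from_tile(tiles, char, snippet_ids):
    """Remove snippets an operator flagged as wrong from the tile for `char`.

    Returns the flagged snippet ids so they can be sent on for correction.
    """
    flagged_ids = set(snippet_ids)
    current = tiles.get(char, [])
    flagged = [s for s in current if s in flagged_ids]
    tiles[char] = [s for s in current if s not in flagged_ids]
    return flagged


# Hypothetical snippets: "c" was read as 3 but the operator disagrees.
tiles = build_tiles([("a", "3"), ("b", "8"), ("c", "3")])
for_correction = reject_from_tile(tiles, "3", ["c"])
```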
STEPS INVOLVED IN eFLOW PROCESS
  • Completion: Characters that are unrecognized, or that were marked as wrongly recognized in Tiling, are presented for correction alongside their images.
  • Exception: Any character image that the operator at the Completion station (module) cannot make out is corrected at the Exception station by an officer competent to make the decision.
  • Export: The system exports the data generated in the above steps to the server for further processing such as editing, aggregation and tabulation.
EXAMPLE – USE OF WHITENER

Casual writing pattern

VOTING IN PROCESSING

  • ICR 1 reads 3
  • ICR 2 reads 3
  • ICR 3 reads 8
  • ICR 4 reads 3

Majority = 3

Unanimous = ?
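The slide's example, in which three of four engines read 3 and one reads 8, can be reproduced with a simple vote. This is a sketch of majority and unanimous voting in general, not of the voting rule eFlow actually applies; the `vote` function and its `policy` parameter are hypothetical. A result of `None` stands for the "?" on the slide, meaning the character is left unresolved.

```python
from collections import Counter


def vote(readings, policy="majority"):
    """Combine several engines' readings of one character.

    policy="majority":  accept the value read by more than half the engines.
    policy="unanimous": accept a value only if every engine agrees.
    Returns None when the policy is not satisfied.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    if policy == "unanimous":
        return value if count == len(readings) else None
    if policy == "majority":
        return value if count > len(readings) / 2 else None
    raise ValueError(f"unknown policy: {policy!r}")


readings = ["3", "3", "8", "3"]          # ICR 1 to ICR 4, as on the slide
by_majority = vote(readings)             # "3"
by_unanimity = vote(readings, "unanimous")  # None: the engines disagree
```

A stricter policy sends more characters to manual correction but lowers the chance of a false positive slipping through.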


COMPLETION STATION

[Field mode display]

EXCEPTION STATION

[Screenshot; labelled areas: Form, Field, Date, Original Form Image Viewer, Exception Area]

HOUSEHOLD SCHEDULE - SIDE A

[Form image; labelled fields: Religion, Name of SC/ST, Mother Tongue & Other languages, Education]

HOUSEHOLD SCHEDULE - SIDE B

[Form image; labelled fields: NCO, NCO, NIC, Place of Birth & Last residence]

DATA CAPTURE & PROCESSING

Selection of technology (OMR/OCR/ICR) in 2001

  • Recognition of handwritten descriptive entries in different languages is beyond the capabilities of the known ICR software. A conscious decision was therefore taken to recognize only numeric characters, leaving the rest to be handled through image-enabled Computer Assisted Coding (CAC). The following key features were introduced in the data capture solution.
  • Parameters for selecting the ICR software:
  • Highest recognition rate and lowest percentage of false positives, with customization and assured support and training
  • Organized workflow in a LAN environment with centralized controls and a Computer Assisted Coding facility
  • Built-in quality enhancement tools to trap wrongly recognized characters and facilitate corrective action
  • Use of multiple engines with a voting algorithm
  • Ability to incorporate validation rules to trap inconsistent entries or wrong recognition
  • Learning capabilities of the engines
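Validation rules of the kind mentioned above can be expressed as named checks run against each captured record, with any failure flagging the record for review. The sketch below is illustrative only: the field names, the code values and the three rules are hypothetical and are not taken from the Census schedules or from the ICR software.

```python
def check_record(record, rules):
    """Return the names of every rule the record fails, in rule order."""
    return [name for name, rule in rules if not rule(record)]


# Hypothetical rules over hypothetical fields and codes.
RULES = [
    # A misread digit can easily turn 34 into 134.
    ("age_in_range", lambda r: 0 <= r["age"] <= 120),
    # Assumed coding: 1 and 2 are the only valid values.
    ("sex_code_valid", lambda r: r["sex"] in (1, 2)),
    # Assumed coding: marital_status 1 means never married.
    ("child_not_married", lambda r: not (r["age"] < 10 and r["marital_status"] != 1)),
]

clean = check_record({"age": 34, "sex": 1, "marital_status": 2}, RULES)
suspect = check_record({"age": 134, "sex": 1, "marital_status": 2}, RULES)
```

Here `clean` is an empty list and `suspect` names the failed range check, so only the second record would be routed back for correction.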
DATA CAPTURE & PROCESSING
  • Parameters for selecting the scanner
    • Speed to match our volume
    • Duty cycle (life and production tolerance)
    • Duplex scanning is a must
    • Minimum resolution of 200 dpi
    • Image enhancement facilities such as noise removal, deskewing, cropping and contrast adjustment
    • Hopper size and scanning path (U, J or flat belt)
    • Maintenance and training services
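Matching scanner speed to volume is a matter of simple arithmetic. The sketch below estimates how many working days a given fleet would need; only the figure of 228 million schedules comes from the presentation, while the scanner count, hourly throughput and daily hours are hypothetical values chosen for illustration.

```python
import math


def scanning_days(total_sheets, scanners, sheets_per_hour, hours_per_day):
    """Working days needed to scan `total_sheets`, rounded up.

    Ignores downtime, rescans and uneven loads across centers.
    """
    if min(total_sheets, scanners, sheets_per_hour, hours_per_day) <= 0:
        raise ValueError("all arguments must be positive")
    sheets_per_day = scanners * sheets_per_hour * hours_per_day
    return math.ceil(total_sheets / sheets_per_day)


# 228 million schedules (from the slides); the other inputs are assumptions.
days = scanning_days(228_000_000, scanners=30, sheets_per_hour=5000, hours_per_day=16)
```

With these assumed inputs the fleet scans 2.4 million sheets a day, so the estimate comes to 95 working days; halving the hourly throughput doubles it.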
DATA CAPTURE & PROCESSING

Selection of Scanner/Hardware/ICR software

  • A high-level technical committee evaluated and selected the above items on the basis of capabilities demonstrated by the various vendors
  • As a result, CMC was selected as System Integrator, and ACER and HP for computer hardware running Windows NT 4.0
  • Kodak Module 7520 scanner; TIS for the ICR software
  • National Informatics Centre carried out the LAN cabling and inspection of hardware
  • Upgrade of 15 Data Centers

SETUP AT D.D.E. CENTRES

15 Locations (State Capitals)


DATA CAPTURE & PROCESSING

  • Role of the Integrator
    • Supply, Installation and On-site Maintenance of SCANNERS.
    • Supply, Installation of Form Processing Software.
    • Manage LAN and load balancing from one stage to another.
    • Provide Software Core-Team centrally at ORGI HQ.
    • Impart operational training to the staff at each location.
    • Provide Software Personnel at each site
    • Provide scanner operators and carry out Scanning operations
    • Achieve > 90% recognition rate and < 2% false positive