Lyne Guertin, Census Data Processing and Estimation Section, Social Survey Methods Division

Editing the 2011 Census data with CANCEIS and options considered for 2016. Lyne Guertin, Census Data Processing and Estimation Section, Social Survey Methods Division, Methodology Branch, Statistics Canada. UNECE, April 28-30, 2014.

Presentation Transcript


  1. Editing the 2011 Census data with CANCEIS and options considered for 2016. Lyne Guertin, Census Data Processing and Estimation Section, Social Survey Methods Division, Methodology Branch, Statistics Canada. UNECE, April 28-30, 2014. Statistics Canada • Statistique Canada

  2. Outline
     • Overview of CANCEIS
     • Recent improvements to CANCEIS and to the 2011 E&I strategy
     • Options considered for 2016

  3. Part 1: Overview of CANCEIS (CANadian Census Edit and Imputation System)

  5. CANCEIS users: domestic users (other than Census)
     • National Household Survey
     • Canadian Income Survey
     • Survey on Financial Security
     • Survey of Household Spending
     • Longitudinal and International Study of Adults

  6. Other countries (users, past users, or exploring CANCEIS)
     • Argentina, Australia, Brazil, Israel, Italy, Japan, New Zealand, Peru, Switzerland, UK, USA
     • CSPA initiative (Common Statistical Processing Architecture): targeted CANCEIS in a pilot with New Zealand to test portability

  7. Imputation methods available
     • Deterministic imputation
     • Donor imputation, based upon the principles of minimum change and of preserving the distribution of the data

  8. New Imputation Methodology (NIM)
     • Developed by Mike Bankier in the 1990s
     • Apply edits: search for invalid values, missing values and inconsistencies
     • Classify records as Passed or Failed
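The edit step can be sketched in Python as follows. This is a minimal illustration, not CANCEIS code: the three edits, the field names `age` and `marital`, and the helper names are all hypothetical. Each edit is written as a conflict rule, so a record fails when at least one rule fires.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# An edit is a named rule that returns True when the record is in conflict.
Edit = Tuple[str, Callable[[Record], bool]]

# Hypothetical edits for illustration only; these are not actual Census edits.
EDITS: List[Edit] = [
    # Missing value
    ("age_missing", lambda r: r.get("age") is None),
    # Invalid value
    ("age_out_of_range",
     lambda r: r.get("age") is not None and not 0 <= r["age"] <= 120),
    # Inconsistency between two fields
    ("married_under_15",
     lambda r: r.get("age") is not None and r["age"] < 15
     and r.get("marital") == "married"),
]


def failed_edits(record: Record, edits: List[Edit] = EDITS) -> List[str]:
    """Return the names of all edits that the record violates."""
    return [name for name, in_conflict in edits if in_conflict(record)]


def classify(records: List[Record],
             edits: List[Edit] = EDITS) -> Tuple[List[Record], List[Record]]:
    """Split records into (passed, failed) according to the edits."""
    passed: List[Record] = []
    failed: List[Record] = []
    for rec in records:
        (failed if failed_edits(rec, edits) else passed).append(rec)
    return passed, failed
```

Passed records become the pool of potential donors; failed records go on to imputation.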

  9. New Imputation Methodology (NIM) (cont'd)
     • Perform donor imputation
     • Step 1: establish a list of the best donors (i.e. those that most resemble the failed record)
     • Step 2: find the best imputation actions for these donors
     • Step 3: select an imputation action at random
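The three donor imputation steps can be sketched as below. This is a simplified illustration under stated assumptions, not the CANCEIS algorithm: the edits in `passes`, the mismatch-count `distance`, and the exhaustive search in `minimal_actions` are hypothetical stand-ins, and the random choice here is uniform.

```python
import random
from itertools import combinations
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def passes(record: Record) -> bool:
    """Hypothetical edits: age must be present; nobody under 15 is married."""
    age = record.get("age")
    if age is None:
        return False
    return not (age < 15 and record.get("marital") == "married")


def distance(failed: Record, donor: Record) -> int:
    """Stand-in distance: number of fields on which the two records differ."""
    return sum(1 for field in failed if failed[field] != donor[field])


def nearest_donors(failed: Record, donors: List[Record], k: int) -> List[Record]:
    """Step 1: keep the k passed records that most resemble the failed one."""
    return sorted(donors, key=lambda d: distance(failed, d))[:k]


def minimal_actions(failed: Record, donor: Record) -> List[Record]:
    """Step 2: smallest sets of donor values that make the record pass."""
    differing = [f for f in failed if failed[f] != donor[f]]
    for size in range(1, len(differing) + 1):
        found = []
        for subset in combinations(differing, size):
            candidate = {**failed, **{f: donor[f] for f in subset}}
            if passes(candidate):
                found.append(candidate)
        if found:  # stop at the first size that works (minimum change)
            return found
    return []


def impute(failed: Record, donors: List[Record], k: int = 2,
           rng: Optional[random.Random] = None) -> Optional[Record]:
    """Step 3: pick one of the candidate imputation actions at random."""
    rng = rng or random.Random()
    actions = [action
               for donor in nearest_donors(failed, donors, k)
               for action in minimal_actions(failed, donor)]
    return rng.choice(actions) if actions else None
```

Changing as few fields as possible is what keeps the imputed record close to what the respondent reported, while drawing values from real donors helps preserve the distribution of the data.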

  10. Advantages of this methodology
      • Can deal with non-linear edits
      • Data-driven imputation
      • Offers a practical solution to an operational problem
      • Allows simplification of edits: use the minimum set in relation to the donor chosen, which is computationally efficient

  11. CANCEIS features
      • Categorical, numerical and alphanumeric variables
      • Large numbers of edits and large data files
      • Portable, flexible and efficient
      • All parameterized, so easy to customize
      • Ten different distance functions to find the best donors, which cover different types of variables

  12. Distance measure for potential donors
      $D_{fp} = \sum_{i} w_i \, D_i(V_{fi}, V_{pi})$, summed over all paired fields $i$, where
      • $V_{fi}$ is the value of matching variable $i$ for the failed record;
      • $V_{pi}$ is the value of matching variable $i$ for the passed record;
      • $w_i$ is the weight of variable $i$ ($w_i \ge 0$);
      • $D_i$ is the distance function chosen for variable $i$ ($0 \le D_i \le 1$).
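A minimal Python sketch of this distance, assuming it is the weighted sum of the per-variable distances over the matching variables. The two per-variable functions shown (`exact_match` and `scaled_abs_diff`) are hypothetical examples that respect the 0 to 1 range; they are not taken from the ten functions CANCEIS provides.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
DistanceFn = Callable[[Any, Any], float]


def exact_match(a: Any, b: Any) -> float:
    """Hypothetical categorical distance: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def scaled_abs_diff(max_diff: float) -> DistanceFn:
    """Hypothetical numeric distance: |a - b| / max_diff, capped at 1."""
    def dist(a: float, b: float) -> float:
        return min(abs(a - b) / max_diff, 1.0)
    return dist


def record_distance(failed: Record, passed: Record,
                    weights: Dict[str, float],
                    dist_fns: Dict[str, DistanceFn]) -> float:
    """Weighted sum of per-variable distances over the matching variables."""
    return sum(w * dist_fns[var](failed[var], passed[var])
               for var, w in weights.items())


# Age counts twice as much as sex when ranking potential donors.
weights = {"age": 2.0, "sex": 1.0}
dist_fns = {"age": scaled_abs_diff(10), "sex": exact_match}
d = record_distance({"age": 30, "sex": "F"}, {"age": 35, "sex": "M"},
                    weights, dist_fns)
```

A variable with weight 0 is ignored in the search, and a larger weight makes a mismatch on that variable more costly.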

  13. CANCEIS system components
      • Inputs: Data, Data Dictionary, System Parameters, Decision Logic Tables
      • CANCEIS components: Donor Imputation, Deterministic Imputation
      • Outputs: Reports & Logs, Imputed Data

  14. Part 2: Recent improvements to CANCEIS and to the 2011 E&I strategy

  15. Improvements
      • For 2011, CANCEIS was rewritten in C# (C-sharp) in a .NET environment
      • Easier to maintain
      • Improved efficiency (lower processing time)
      • Increased stability

  16. Improvements (cont'd)
      • Multi-threading now possible in donor imputation
      • Enables processing of multiple failed units at one time
      • Increases performance and reduces processing time
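The idea of processing several failed units at one time can be sketched with a thread pool. This is an illustration in Python rather than the C# used by CANCEIS; `DONORS`, `impute_unit`, and `impute_all` are hypothetical names, and the nearest-by-age rule is only a stand-in for donor imputation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

Record = Dict[str, Optional[int]]

# Hypothetical pool of records that passed the edits.
DONORS: List[Record] = [
    {"age": 25, "income": 30000},
    {"age": 60, "income": 45000},
]


def impute_unit(failed: Record) -> Record:
    """Fill missing fields from the donor closest in age (a stand-in rule)."""
    donor = min(DONORS, key=lambda d: abs(d["age"] - failed["age"]))
    return {k: (donor[k] if v is None else v) for k, v in failed.items()}


def impute_all(failed_units: List[Record], workers: int = 4) -> List[Record]:
    """Impute failed units concurrently; results keep the input order."""
    # Each failed unit only reads the donor pool, so units are independent
    # and can be handed to separate worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(impute_unit, failed_units))
```

In CPython, threads mainly help when the work releases the interpreter lock; for CPU-bound work a process pool is the usual alternative.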

  17. Improvements (cont'd)
      • CANCEIS is more user friendly
      • Before: could handle only .txt files (inputs/outputs)
      • Now: also handles data dictionaries in Excel and creates summary reports in HTML

  18. Improvements (cont'd)
      • Increased content and level of detail in the logs
      • Facilitates troubleshooting
      • Facilitates validating the desired strategy for each module

  19. New features added
      • Additional flexibility in specifying imputation parameters
      • New parameter to specify that the staged search will not stop until an excellent donor is found
      • Continue to search if the target quality is not reached
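One way to picture a staged search that continues until a target quality is reached is sketched below. The slides do not define the stages or the quality measure, so this rests on assumptions: a stage is taken to be a batch of potential donors, quality is taken to be a distance to the failed record, and `staged_search` and its parameters are hypothetical names.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Donor = TypeVar("Donor")


def staged_search(stages: Iterable[Iterable[Donor]],
                  distance_to: Callable[[Donor], float],
                  target: float) -> Optional[Tuple[float, Donor]]:
    """Search stage by stage; stop once the best donor meets the target.

    Returns (distance, donor) for the best donor seen, or None if every
    stage was empty.
    """
    best: Optional[Tuple[float, Donor]] = None
    for stage in stages:
        for donor in stage:
            d = distance_to(donor)
            if best is None or d < best[0]:
                best = (d, donor)
        # Target quality reached: no need to look at later stages.
        if best is not None and best[0] <= target:
            break
    return best
```

With a loose target the search stops at the first stage that yields an acceptable donor; with a strict target it keeps going through later stages and falls back to the best donor seen if none qualifies.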

  20. Modification to the 2011 E&I strategy
      • Group these five processes into one ethnocultural process:
      • Place of birth of parents
      • Immigration status
      • Aboriginal status
      • Citizenship
      • Visible minorities

  21. Modification to the 2011 E&I strategy (cont'd)
      • Goal: increase data coherence between processes by using one single donor to impute all variables
      • Goal: reduce manual fixes after E&I
      • Challenge: manage lots of edits and data

  22. Part 3: Options considered for 2016

  23. Planning E&I strategy for 2016
      • Evaluating the use of administrative data as an alternative source of data
      • Exploring if the language processes could be grouped (mother tongue, home language, official language)
      • Exploring if steps within processes could be grouped
      • Exploring if processes could be run in parallel
      • Goals: improve quality, reduce processing time

  24. Continue improving CANCEIS to serve future requirements of the Census
      • Research and development ongoing, done by programmers and methodologists
      • CANCEIS v5.2 to be released by Dec. 2014:
      • Allowing DLTs (Decision Logic Tables) and System Parameters in Excel
      • Revisited contents of Inputs & Outputs
      • Standardized naming convention
      • Improvements to default values of parameters

  25. Will offer the CANVERT conversion tool
      • Ensures smooth transition from v5.1 to v5.2
      • Updated documentation will be provided:
      • Basic User Guide (with two simple examples and basic features)
      • Comprehensive User Guide (with more examples, and all features)

  26. For more information, please contact: Lyne Guertin (1-613-951-4543), lyne.guertin@statcan.gc.ca. Thank you for your attention!