a long-standing cooperation to serve research

a long-standing cooperation to serve research. Peter Doorn (DANS) Ruben Dood (CBS). REGIONAL WORKSHOP 16th & 17th October 2014, Athens. Contents. Past: Long-standing collaboration dating back to 1960s Steinmetz Archives acquires punch-card collections

Presentation Transcript


  1. a long-standing cooperation to serve research Peter Doorn (DANS) Ruben Dood (CBS) REGIONAL WORKSHOP 16th & 17th October 2014, Athens

  2. Contents Past: • Long-standing collaboration dating back to the 1960s • Steinmetz Archives acquires punch-card collections • Scientific Statistical Agency founded in 1994 • Digitization of historical censuses Present: • Covenant governing the collaboration • De-anonymized microdata sets of samples archived at DANS • Remote & On-Site Access: separate contracts with CBS Future: • Linked data, big data, more data…

  3. One of my first publications Problems of comparing and accessing the Housing Survey and the Labour Force Sample Survey of the CBS (1985) Tape of the Labour Force Sample Survey 1981

  4. Digitization of historical censuses, 1996 - … CBS archive & library: analog data warehouse Former CBS building in Voorburg

  5. Joint websites www.volkstellingen.nl 2004 1999 2015

  6. Archiving and digital restoration of the 1960 census Program code by Rinus Deurloo merged with data records: 1115100421115 6302120001000995581111405057126086200 B"("N3=")"5ZD,10B 1760 1115100421110 1306363301000075-81111718035817732405 SC2+NSC3); 1770 1115100421116 1305352202000900521111205041728284204 ",/")"); 1780 1115100421119 4303430001000930521111203038829276500 B"("N3=")"5ZD,10B 1790 Punch errors or bitrot? 121586010 3855413012060 3 52701322981010 1060 12158W000 3755113010010 2 52801322981010 0061 121586001 3406713012050 00 0152701322981010 1860 Missing and double records: • 11 million anonymized inhabitants on 11 million punch cards • 2000 punch cards per box, 5500 boxes • Stored in two locations in the late 1960s and early 1970s • Punch cards read in 1973, stored on tape • Further inventory of the census data in 1982 • Analysed and cleaned in 1994 • Re-analysis and digital restoration in 2004
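The box arithmetic and the "punch errors or bitrot?" question on this slide can be illustrated with a small sketch. It assumes, purely for illustration, that every whitespace-separated field in the three sample records is meant to be numeric; the slide does not document the record layout, and the function name `flag_suspect_fields` is hypothetical.

```python
# Minimal sketch, not the restoration procedure actually used for the 1960 census.
# Assumption: every whitespace-separated field in these records should be digits only.

CARDS_PER_BOX = 2000
BOXES = 5500
total_cards = CARDS_PER_BOX * BOXES  # matches the "11 million punch cards" on the slide


def flag_suspect_fields(record: str) -> list[str]:
    """Return the fields of a record that contain a non-digit character."""
    return [field for field in record.split() if not field.isdigit()]


# The three sample records shown under "Punch errors or bitrot?"
records = [
    "121586010 3855413012060 3 52701322981010 1060",
    "12158W000 3755113010010 2 52801322981010 0061",
    "121586001 3406713012050 00 0152701322981010 1860",
]

for record in records:
    suspects = flag_suspect_fields(record)
    if suspects:
        print(f"suspect fields {suspects} in record: {record}")
```

Under that assumption only the second record is flagged, because of the letter W in its first field. A check like this cannot tell a punch error from later media decay; it only locates candidates for manual inspection.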

  7. Statistics Netherlands process (government) agencies datasets and administrations Companies survey and administrations Individuals administrations Internet survey Call center Interview CBS=Statistics Netherlands (SN)

  8. Linking ‘anonymous’ datasets Person Company Household Location RIN

  9. Connecting with previous years 2010 RIN RIN RIN RIN 2011 2012 2013
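The idea on slides 8 and 9, that datasets without direct identifiers can still be joined through a shared RIN key across years, can be sketched as follows. The slides do not say how the RIN is constructed, so this assumes it is simply an opaque identifier present in each yearly file; the function `link_by_rin`, the field names and all values are hypothetical.

```python
# Minimal sketch under stated assumptions; not Statistics Netherlands' actual linkage code.
from collections import defaultdict


def link_by_rin(yearly_datasets: dict[int, list[dict]]) -> dict[str, dict[int, dict]]:
    """Group the records of several yearly datasets by their RIN.

    Returns {rin: {year: attributes}}, with the RIN itself left out of the attributes.
    """
    linked: dict[str, dict[int, dict]] = defaultdict(dict)
    for year, records in yearly_datasets.items():
        for record in records:
            attributes = {key: value for key, value in record.items() if key != "rin"}
            linked[record["rin"]][year] = attributes
    return dict(linked)


# Hypothetical yearly files; no names or addresses, only the opaque key.
yearly = {
    2010: [{"rin": "R001", "employed": True}, {"rin": "R002", "employed": False}],
    2011: [{"rin": "R001", "employed": True}],
    2012: [{"rin": "R002", "employed": True}],
}

linked = link_by_rin(yearly)
print(sorted(linked["R001"]))  # the years in which R001 appears
```

The same join works for any unit listed on slide 8 (person, company, household, location), provided each unit type has its own key.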

  10. Kinds of data Secure Use Files • For SN employees only. Data at individual level. Datasets can be linked. • Scientific research under very strict conditions. Scientific Use Files • Also: Microdata Under Contract, Controlled Circulation Files. • Data at individual level, but not as detailed as secure use • Datasets cannot be linked Public Use Files • Can be freely distributed.
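The three kinds of files can be summarized as a small lookup structure. This is only a sketch of the slide's content: properties the slide does not state are recorded as `None`, and marking the first two kinds as not freely distributable is an inference from the access conditions, not a statement on the slide. The class and field names are hypothetical.

```python
# Minimal sketch of the slide's classification; names and layout are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileKind:
    name: str
    individual_level: Optional[bool]  # None = not stated on the slide
    linkable: Optional[bool]          # None = not stated on the slide
    freely_distributable: bool        # False is inferred for the first two kinds


KINDS = [
    FileKind("Secure Use Files", individual_level=True, linkable=True,
             freely_distributable=False),
    FileKind("Scientific Use Files", individual_level=True, linkable=False,
             freely_distributable=False),
    FileKind("Public Use Files", individual_level=None, linkable=None,
             freely_distributable=True),
]


def linkable_kinds() -> list[str]:
    """Names of the kinds whose datasets the slide says can be linked."""
    return [kind.name for kind in KINDS if kind.linkable is True]


print(linkable_kinds())
```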

  11. Risk management The risk is that a single person, household or company can be recognized in the dataset. With Secure Use Files this is unavoidable. Hence very strict conditions. With Scientific Use Files this is still possible, therefore we provide extra security measures.

  12. Costs concerning types of data

  13. Costs concerning types of data Microdata services

  14. MicroDataServices (MiDaS) Reserved for: • Institutions mentioned in the Statistics Netherlands Act: universities, planning agencies and Eurostat. • Institutions/projects admitted by the Central Committee for Statistics. • no direct administrative authority, • explicit research aim, • publish for public use and • a good reputation

  15. Conditions for access to secure use files • Access is limited to the datasets that are required for the research at hand. • Proof of employment for each researcher. • Confidentiality declaration by each researcher. • IT-security: Citrix • Output screening by SN. • Additional biometric identification

  16. Biometric identification

  17. Current use • Over 2000 well-documented datasets. • Over 100 customer organisations (mainly universities, planning agencies and statistical research agencies). • Over 500 registered researchers. • 8 On Site, 100 Remote Access work stations. Over 15 of the RA-sites are not in the Netherlands. • Over 10 new projects per month. • Over 280 active projects at this moment.

  18. Scientific Use Files Can be part of a contract between DANS and Statistics Netherlands. Can also be funded by a third party. Are published in DANS EASY format on the DANS website.

  19. Present: Covenant CBS-DANS • Period: 2011-2015 • CBS makes available (micro-) data for scientific users entitled or authorized by CBS, according to a yearly agreed list of data files. The files are delivered with sufficient documentation and a code book • DANS archives these files in a durable way and makes them available to those entitled or authorized by CBS • DANS pays CBS a yearly sum (116 thousand euros since 2013) • CBS and DANS jointly propose better data access for research to ministries

  20. CBS Datasets in DANS EASY archive of which 87 protected microdata files

  21. Joint publications [Including related projects: Historical GIS, HASH, HDNG]

  22. Joint seminars & conferences

  23. DANS at CBS Volleyball tournament

  24. Present & future: Linked Data Challenge: link information from tables across time and to information in the outside world • 3 Types of Censuses [Population/Occupation/Housing] • 17 census years • 2288 census tables • 33283 annotations • 17 million characters [Diagram labels: Tables T1, T2, T3; Harmonization; Linked data from tables (RDF); Linked Open Data cloud]
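To make "linked data from tables (RDF)" concrete, the sketch below turns one table cell into subject-predicate-object triples. It is an illustration only: the namespace `http://example.org/census/`, the predicate names, the function `cell_to_triples` and the cell value are all hypothetical and are not the vocabulary used by the project behind these slides.

```python
# Minimal sketch of table-to-triples conversion; all URIs and values are made up.

BASE = "http://example.org/census/"  # hypothetical namespace, not a real vocabulary


def cell_to_triples(table_id: str, row_header: str, column_header: str, value: int):
    """Describe one table cell as a list of (subject, predicate, object) triples."""
    cell = f"{BASE}{table_id}/cell/{row_header}-{column_header}"
    return [
        (cell, f"{BASE}inTable", f"{BASE}{table_id}"),
        (cell, f"{BASE}rowHeader", row_header),
        (cell, f"{BASE}columnHeader", column_header),
        (cell, f"{BASE}value", value),
    ]


# One hypothetical cell from a table called T1.
triples = cell_to_triples("T1", "municipality_A", "men", 1234)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Because every cell carries its row and column headers as explicit statements, cells from different census years can later be connected by mapping those headers to shared, harmonized terms.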

  25. 110,585,567 total triples • 10,272,862 marked-cell triples • 389,132 hierarchical row headers • 7,960,911 data cells • 61,110 column headers • 3,609 row properties • 2,150 titles • 1,581,546 row headers • 274,404 metadata cells [Images: original paper table and digital spreadsheet, Census 1869, Province of Noord-Brabant] http://www.cedar-project.nl/ http://lod.cedar-project.nl
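The seven component counts on this slide add up exactly to the marked-cell figure, which can be checked with a few lines. The dictionary keys below simply repeat the slide's labels; treating the seven items as a breakdown of the marked cells is a reading of the slide, supported by the sum.

```python
# Arithmetic check of the figures on the slide.

TOTAL_TRIPLES = 110_585_567
MARKED_CELL_TRIPLES = 10_272_862

components = {
    "hierarchical row headers": 389_132,
    "data cells": 7_960_911,
    "column headers": 61_110,
    "row properties": 3_609,
    "titles": 2_150,
    "row headers": 1_581_546,
    "metadata cells": 274_404,
}

component_sum = sum(components.values())
print(component_sum == MARKED_CELL_TRIPLES)  # True

# Share of all triples that describe marked cells.
share = MARKED_CELL_TRIPLES / TOTAL_TRIPLES
print(f"{share:.1%}")
```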

  26. Additional DANS wishes for future and further cooperation • More data (e.g. on health, economy, etc.) • More data • More data • More user groups entitled/authorized to access CBS data (and data produced in collaboration with CBS) • Use of CBS register data for sample drawing for research projects • Joint project proposal on e.g. linked data and big data • Archiving of historical CBS datasets by DANS • CBS certification (as trustworthy digital repository)

  27. Thank you for your attention www.dans.knaw.nl www.narcis.nl www.cbs.nl peter.doorn@dans.knaw.nl Twitter: @pkdoorn @DANSKNAW
