
Taking Advantage of DDI 3.0

IASSIST 2007: Workshop

Montreal

Presenters
  • Wendy Thomas
    • Minnesota Population Center
  • Arofan Gregory
    • Open Data Foundation
  • Joachim Wackerow
    • GESIS-ZUMA
Afternoon’s Schedule, 1:30 – 5:00
  • 1:30 Introduction and Overview
  • 2:00 Maintainable objects
  • 2:30 Questions to Variables to Data
  • 3:00 Break
  • 3:30 Creating groups
  • 4:00 URNs and Versioning
  • 4:30 What is it good for?
Ground Rules
  • Time periods are approximate; we’ll adjust them to address your interests
  • Ask questions
  • There’s a lot to cover so we may suggest continuing a discussion one-on-one during break or after the workshop
  • Materials from the workshop will be posted on the DDI site
Basic Element Types

Differences from 2.1:
  • Not every element is identifiable
  • Many individual elements or complex elements may be versioned
  • A number of complex elements can be separately maintained

3.0 Modules and Schemas (one is not necessarily the other)
  • Modules
    • Reflect closely related sets of information, similar to the sections of the DDI DTD
    • Modules can be held as separate XML instances and be included in a large instance by either inclusion or reference
    • All modules are maintainable, but not all maintainables are modules
  • Schemas
    • Each .xsd file is a schema
    • Some schemas are modules
    • Some schemas are substitution sets
    • Some schemas simply contain elements that are used by multiple schemas or may require more frequent updates
    • Some schemas are “borrowed”
Schemas
  • archive
  • comparative
  • conceptualcomponent
  • datacollection
  • dataset
  • dcelements
  • DDIprofile
  • ddi-xhtml11
  • ddi-xhtml11-model-1
  • ddi-xhtml11-modules-1
  • group
  • inline_ncube_recordlayout
  • instance
  • logicalproduct
  • ncube_recordlayout
  • organization
  • physicaldataproduct
  • physicalinstance
  • reusable
  • simpledc20021212
  • studyunit
  • tabular_ncube_recordlayout
  • xml
Schemas: MODULES
Schemas: SUBSTITUTIONS
Schemas: REUSE
Schemas: BORROWED
Schemas: Maintainable Schemas
2.1 Sections to 3.0 Schema

DDI 2.1 sections:
  • 1.0 Document Description
    • Citation
    • Guide
    • Document Status
    • Source Document
  • 2.0 Study Description
    • Citation
    • Study Information
    • Methodology
    • Data Access
    • Other Material
  • 3.0 File Description
    • File Text
    • Location Map
  • 4.0 Data Description
  • 5.0 Other Material

DDI 3.0 schemas:
  • Instance
  • Archive
  • Study Unit
  • Conceptual Components
  • Abstract / Purpose / Coverage
  • DataCollection
    • Methodology
    • Collection Event
    • Question Scheme
    • Instrument
    • Processing Event
  • LogicalProduct
    • Data Relationships
    • Category Scheme
    • Code Scheme
    • Variable Scheme
    • NCube
  • PhysicalDataProduct
    • Gross Record Structure
    • Base Record Structure
  • PhysicalInstance
    • File Identification
    • Statistics
Why the big change?
  • Documentation focused on the Codebook remains a value-added commodity
    • It becomes the de facto responsibility of the archive rather than the producer
  • Producers capture information that helps them do their work
    • The Life Cycle model focuses on the flow of data through a system [production oriented]
  • Codebooks should be an output from the documentation process, not the sole commodity
Tighter Control, More Required Items
  • To support processing, we needed tighter control over element and attribute contents
    • Schemas provide tighter element-level control
    • Profiles allow for customization of coverage
  • A large set of required elements provides a more consistent base for programming
Reuse and Replacement
  • Reuse of elements means that similar actions are handled in similar ways
    • Questions and Variables use the same category and coding schemes
    • Instrument flow logic is found in comparison and coding instructions for variables
    • Identification and referencing are handled in a consistent manner
  • Replacement by substitution allows for addressing changing technologies without making major changes in existing schemas
    • Physical data structures
    • Data types (microdata, aggregate data, future coverage types?)
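As a rough illustration of element reuse, here is a minimal sketch with hypothetical Python classes (not the DDI schema itself): a question's response domain and a variable's representation both point at the same code scheme object, so a category label is defined once and any correction is seen by both. The scheme ID, question text, and variable name are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CodeScheme:
    # Hypothetical stand-in for a DDI code scheme: code value -> category label.
    id: str
    codes: dict = field(default_factory=dict)


@dataclass
class Question:
    id: str
    text: str
    response_domain: CodeScheme  # a reference to the scheme, not a copy


@dataclass
class Variable:
    id: str
    name: str
    representation: CodeScheme  # the same scheme object the question uses


yes_no = CodeScheme("CS_YESNO", {"1": "Yes", "2": "No"})
q = Question("Q1", "Are you currently employed?", yes_no)
v = Variable("V1", "EMPLOYED", yes_no)

# Because both refer to one object, a label correction is made in one place.
yes_no.codes["2"] = "No, not employed"
print(q.response_domain.codes["2"], "|", v.representation.codes["2"])
```

Sharing the object by reference is the point here; in an XML instance the same effect comes from referencing the scheme by its identification rather than repeating it.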
Maintainable Objects
  • Publishing persistent parts
    • Concept lists
    • Question Banks
    • Coding schemes
    • Comparison mapping
    • Study description
Support for Registries
  • A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled method.
  • Metadata registries are used whenever data must be used consistently within an organization or group of organizations.

http://en.wikipedia.org/wiki/Metadata_registry

Examples of Registry Users
  • Organizations that transmit data using structures such as XML, Web Services or EDI
  • Organizations that need consistent definitions of data across time, between organizations or between processes. For example, when an organization builds a data warehouse
  • Organizations that are attempting to break down "silos" of information captured within applications or proprietary file formats

http://en.wikipedia.org/wiki/Metadata_registry

Capturing and Reusing Metadata
  • Whether metadata is captured at inception or created after the fact, some sections must be completed before others can be created
  • Capturing metadata at the point of inception, in a non-proprietary structure that can be transferred into and out of processing software, provides an incentive to create metadata throughout the life cycle of the data
Metadata flow
  • DDI is built on the life cycle of the data, and some information naturally occurs earlier than other information
  • Reuse of and reference to certain types of information such as universe, concepts, categories, and coding schemes prescribe a creation order
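One way to see why reuse prescribes a creation order is to treat "refers to" as a dependency graph and sort it topologically. This is a minimal sketch using Python's standard `graphlib` (3.9+); the dependency edges are assumptions based on the reuse described in these slides (codes refer to categories; questions and variables refer to universe, concepts and code schemes; NCubes and record structures refer to variables), not a normative DDI ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each item lists the items it refers to,
# which therefore have to exist before it can be created.
depends_on = {
    "Universe Scheme": [],
    "Concept Scheme": [],
    "Category Scheme": [],
    "Coding Scheme": ["Category Scheme"],
    "Question Scheme": ["Universe Scheme", "Concept Scheme", "Coding Scheme"],
    "Variable Scheme": ["Universe Scheme", "Concept Scheme",
                        "Coding Scheme", "Question Scheme"],
    "NCube": ["Variable Scheme"],
    "Record Structure": ["Variable Scheme"],
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

Any order the sorter returns is valid; items with no mutual dependency (for example the universe and concept schemes) may come out in either order, which matches the idea that several pieces can be created in parallel.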
[Figure: order in which metadata is created, in steps labelled STEP 1, STEP 2, STEP 3 (optional), STEP 4, STEP 5, STEP 7 and STEP 8. Items shown: Universe Scheme, Concept Scheme, Category Scheme, Coding Scheme, Organization / Individual, StudyUnit Citation, StudyUnit Coverage, Question Scheme, Instrument, Variable Scheme, NCube, Data Relationships, Remaining Logical Product items, Record Structure, Remaining Physical Data Product Items, Physical Instance, Processing Event (coding), Archive / Group / etc.]

Questions to Variables
  • Question Development Software
    • Identifying universe and concepts
    • Building or importing question text and response domains
  • Instrument Development Software (CAI)
    • Organizing questions and flow logic
    • Capturing raw response data and process data
  • Data Processing Software
    • Data cleaning and verification
    • Recoding and/or deriving new data elements using existing or new categories or coding schemes
  • DDI links the stages

Mapping Relationships [Preparing data for the user]
  • Information previously provided in the Guide
  • Now found in the logical product under Data Relationships
  • Physical expression of the linking relationships is found in the physical data product
What it tells you
  • Record Type
    • How many record types you have
    • How you can tell what record type it is
    • Whether it provides support for a “multi-part” record
  • How to identify a “unique” record within a record type
  • How to link one record type to another
Multiple Parts / Complex ID

SF1 050 000101 27053 DATA

SF1 050 000102 27053 DATA

SF1 160 002201 27 48000 DATA

SF1 160 002202 27 48000 DATA


Unique within the file: LOGRECNO

If SUMLEV = 050, the geographic identifier is STATE and COUNTY

If SUMLEV = 160, the geographic identifier is STATE and PLACE
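The rules above say that which fields make up a record's geographic identifier depends on the value of SUMLEV. A minimal sketch of applying them, assuming the fields are whitespace-separated exactly as shown in the sample records (FILEID, SUMLEV, LOGRECNO, geography, DATA) with STATE and COUNTY packed into one token on the 050 records; real SF1 files are laid out differently, and the function names are hypothetical.

```python
SAMPLE = """\
SF1 050 000101 27053 DATA
SF1 050 000102 27053 DATA
SF1 160 002201 27 48000 DATA
SF1 160 002202 27 48000 DATA"""


def parse_record(line: str) -> dict:
    fields = line.split()
    sumlev, logrecno = fields[1], fields[2]
    if sumlev == "050":
        # County records: STATE and COUNTY share one token, e.g. "27053".
        geo = fields[3]
        return {"SUMLEV": sumlev, "LOGRECNO": logrecno,
                "STATE": geo[:2], "COUNTY": geo[2:]}
    if sumlev == "160":
        # Place records: STATE and PLACE are separate tokens.
        return {"SUMLEV": sumlev, "LOGRECNO": logrecno,
                "STATE": fields[3], "PLACE": fields[4]}
    raise ValueError(f"unsupported summary level: {sumlev}")


def geographic_key(rec: dict) -> tuple:
    # The identifying fields depend on the summary level.
    if rec["SUMLEV"] == "050":
        return (rec["SUMLEV"], rec["STATE"], rec["COUNTY"])
    return (rec["SUMLEV"], rec["STATE"], rec["PLACE"])


records = [parse_record(line) for line in SAMPLE.splitlines()]

# LOGRECNO on its own is unique within the file.
logrecnos = [r["LOGRECNO"] for r in records]
assert len(set(logrecnos)) == len(logrecnos)

for r in records:
    print(r["LOGRECNO"], geographic_key(r))
```

In DDI 3.0 this kind of conditional identification is the information the Data Relationships section is meant to carry, so software can build the keys instead of hard-coding them as done here.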

Grouping
  • Grouping allows two or more studies to be grouped together
  • Grouping can be done
    • by design
    • after the fact
    • virtually
Grouping by design
  • Uses inheritance
    • to reuse rather than rewrite
    • to handle a form of multiple inheritance, first by hierarchy and then by reference
  • Uses comparison to describe changes that take place in a series over time
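Hierarchical inheritance ("reuse rather than rewrite") behaves like a layered lookup: a study unit sees everything its group defines unless a nearer level overrides it. A minimal sketch using Python's `collections.ChainMap`, with hypothetical keys and values borrowed from the panel-study example that follows (person-level information, a DM-to-Euro currency change); inheritance by reference is not shown.

```python
from collections import ChainMap

# Hypothetical metadata at three levels of a panel-study group. The keys
# and values are illustrative placeholders, not DDI element names.
group = {
    "person_level": "PERSON LEVEL INFORMATION",  # shared by every wave
    "currency": "DM",
}
subgroup_2002_2003 = {"currency": "Euro"}  # overrides one inherited item
wave_2003 = {"topics": ["Satisfaction with life", "School Degree"]}  # adds its own

# Lookups search the wave first, then its subgroup, then the group,
# so the nearest definition wins and nothing is rewritten.
wave_2003_view = ChainMap(wave_2003, subgroup_2002_2003, group)
wave_1999_view = ChainMap({}, group)  # a wave outside the subgroup inherits directly

print(wave_2003_view["currency"], wave_1999_view["currency"])
print(wave_2003_view["person_level"])
```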
Grouping after the fact
  • Uses comparison to describe equivalent or similar objects and how they compare
    • Questions
    • Concepts
    • Variables
    • Category Schemes
    • Coding Schemes
      • including capturing the process for recodes
Person-level information, all waves 1997-2003
  • All waves inherit person-level information from the group
  • Comparable by design

Waves 1997-2003: Satisfaction with life, School Degree
  • All waves contain common topical data on Satisfaction with life and School Degree

Currency fields change between 2001 and 2002
  • Wave 1997-2001: currency DM
  • Wave 2002-2003: currency Euro

Question and data expanded between 1998 and 1999
  • Wave 1997-1998: Size of Company
  • Wave 1999-2003: Size of Company, Concerns about Euro

This set of questions is included periodically
  • Waves 1997, 2000, 2001: Computer Usage


DDI3: GROUPING and COMPARISON

  • Example
    • Standard Eurobarometer 1970 ff.
    • Occupation of respondent
    • Changing “wave standard” category schemes
    • Variations across countries - translation, question/variable structure

DDI3: GROUPING and COMPARISON

TREND: OCCUPATION

R: WHAT IS YOUR OCCUPATION?

  • Example
    • Harmonized systematic 3-digit coding
    • Distinguishing major categories and sub-categories
    • Avoiding artificial changes in occupational structures over time

Source: Jan W. van Deth: Using Published Survey Data, in: Harkness/Van de Vijver/Mohler: Cross-Cultural Survey Methods


DDI3: GROUPING and COMPARISON

TREND: OCCUPATION VARIABLE NAME: OCCUP

R: WHAT IS YOUR OCCUPATION?

110. FARMER / FISHERMAN (SKIPPERS) <until EB29>

111. FARMER

112. FISHERMAN

120. <SELF EMPLOYED> PROFESSIONAL (LAWYER, MEDICAL PRACTITIONER, ACCOUNTANT, ARCHITECT, ...)

130. OWNER OF A SHOP, CRAFTSMEN, BUSINESS PROPRIETOR

131. OWNER OF A SHOP, CRAFTSMEN, OTHER SELF EMPLOYED PERSON

132. BUSINESS PROPRIETORS, OWNER (FULL OR PARTNER) OF A COMPANY

210. EMPLOYED PROFESSIONAL (LAWYER, MEDICAL PRACTITIONER, ACCOUNTANT, ARCHITECT, ...)

220. EXECUTIVE, TOP MANAGEMENT, DIRECTOR <starting with EB30:> GENERAL MANAGEMENT <starting with EB37:> GENERAL MANAGEMENT, DIRECTOR OR TOP MANAGEMENT

230. MIDDLE MANAGEMENT, OTHER MANAGEMENT (DEPARTMENT HEAD, JUNIOR MANAGER, TEACHER, TECHNICIAN)

310. EMPLOYED POSITION, WORKING MAINLY AT A DESK

311. WHITE COLLAR – OFFICE WORKER <until EB29>

312. OTHER OFFICE EMPLOYEES <EB30 to EB36>

320. NON-OFFICE EMPLOYEES, NON MANUAL WORKERS (SERVICE SECTOR, E.G. SHOP ASSISTANT ETC.)

321. EMPLOYED POSITION, NOT AT A DESK BUT TRAVELLING (SALESMAN, DRIVER, ...)

...

540. UNEMPLOYED <starting with EB30:> TEMPORARILY NOT WORKING, UNEMPLOYED

998. DK / NA

999. INAP

  • Resource
DDI3: GROUPING and COMPARISON

  • “Internal” standard:
    • Mannheim Trend File
  • “External” standards:
    • ESOMAR
    • EB-CH / OSF
    • ISCO-88

Eurobarometer 3-29 Wave Standard

[Q…. ] OCCUPATION OF SELF:

SELF EMPLOYED

01. FARMERS, FISHERMEN (SKIPPERS)

02. PROFESSIONAL - LAWYERS, ACCOUNTANTS, ETC.

03. BUSINESS - OWNERS OF SHOPS, CRAFTSMEN, PROPRIETORS

EMPLOYED

04. MANUAL WORKER

05. WHITE COLLAR - OFFICE WORKER

06. EXECUTIVE, TOP MANAGEMENT, DIRECTOR

NOT EMPLOYED

07. RETIRED

08. HOUSEWIFE, NOT OTHERWISE EMPLOYED

09. STUDENT, MILITARY SERVICE

10. UNEMPLOYED

00. DK/NA

DE - Eurobarometer ? - 17

[ F… ] Are you personally in employment?

Self-employed

1. Farmers

2. Liberal professions (e.g. doctor, lawyer)

3. Small, medium-sized and larger self-employed

Employed

4. Workers / skilled workers

5. Salaried employees / civil servants

6. Senior salaried employees / senior civil servants

Not employed

7. Retired / pensioner

8. Housewives (not otherwise employed)

9. Pupils, students, apprentices

0. Unemployed

DE - Eurobarometer 18 - 22

[F. ] Are you personally in employment?

1. Employed full-time (incl. temporarily unemployed)

2. Employed part-time (incl. temporarily unemployed)

3. Retired, pensioner (previously employed)

4. Retired, pensioner (not previously employed)

5. (In training) Apprentice

6. (In education) Pupil, student

7. Not employed, but previously employed

8. Never employed

DE - EB 18 - 23

[F. ] What occupation do you currently have, or what occupation did you last have?

11. Lower-level salaried employees

12. Mid-level salaried employees

13. Qualified salaried employees

14. Executive salaried employees

15. Unskilled workers

16. Semi-skilled workers

17. Skilled workers

18. Highly qualified skilled workers

21. Small self-employed

22. Medium-sized self-employed

23. Larger self-employed

24. Liberal professions (e.g. doctor, lawyer)

25. Civil servants, lower grade

26. Civil servants, middle grade

27. Civil servants, upper grade

28. Civil servants, senior grade

31. Self-employed farmers - small (under 5 ha)

32. Self-employed farmers - medium (5 to under 20 ha)

33. Self-employed farmers - large (20 ha +)
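Comparison after the fact can capture a recode from one category scheme to another as data. The sketch below is a hypothetical, partial map from the EB 3-29 wave standard to the three-digit trend codes, inferred only by matching the category labels visible on these slides; it is not the Mannheim Trend File's actual mapping, and codes whose trend counterpart is not shown (04, 07, 08, 09) are deliberately left unmapped and reported.

```python
# Hypothetical partial comparison map: EB 3-29 wave standard -> trend codes,
# inferred from matching labels on the slides only.
EB3_29_TO_TREND = {
    "01": "110",  # FARMERS, FISHERMEN (SKIPPERS) -> FARMER / FISHERMAN (SKIPPERS)
    "02": "120",  # PROFESSIONAL -> <SELF EMPLOYED> PROFESSIONAL
    "03": "130",  # BUSINESS - OWNERS OF SHOPS ... -> OWNER OF A SHOP, ... PROPRIETOR
    "05": "311",  # WHITE COLLAR - OFFICE WORKER
    "06": "220",  # EXECUTIVE, TOP MANAGEMENT, DIRECTOR
    "10": "540",  # UNEMPLOYED
    "00": "998",  # DK/NA
}


def recode(values):
    """Recode wave-standard values; return (recoded values, unmapped source codes)."""
    recoded, unmapped = [], set()
    for value in values:
        target = EB3_29_TO_TREND.get(value)
        if target is None:
            unmapped.add(value)  # keep track of gaps instead of guessing
        recoded.append(target)
    return recoded, unmapped


out, missing = recode(["01", "06", "04", "00"])
print(out, sorted(missing))
```

Keeping the map as data rather than burying it in a processing script is what lets the recode itself be documented, compared, and reused.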

DDI3: GROUPING and COMPARISON

[Diagram: group standards and inheritance for EB occupation]
  • Group: EB OCCUPATION Trend Standard (DataCollection, LogicalProduct)
  • Subgroups: DE 3-16, DE 17, DE 18-22, DE 18-23, DE 23-29, DE 24-29, DE 30-36, DE 37 ff.
  • Study units (overwriting, additions): EB 30, EB …, EB 36, each with DataCollection and LogicalProduct
  • Resource standards: EBCH 2000-2003, OFS/ISCO-88
  • Relationships shown: group standards by inheritance; subgroups with overwriting and additions by inheritance; subgroups without overwriting by comparison to instances; resource standards by comparison to instances

Versioning and Maintenance
  • There are three classes of objects:
    • Identifiable (has ID)
    • Versionable (has version and ID)
    • Maintainable (has agency, version, and ID)
  • Very often, identifiable items such as Codes and Variables are maintained in parent schemes
Rationale
  • Because several organizations are involved in the creation of a set of metadata throughout the lifecycle flow:
    • Rules for maintenance, versioning, and identification must be universal
    • Reference to other organizations’ metadata is necessary for re-use – and very common
Maintenance Rules
  • A maintenance agency is identified by its domain name (as for its website and e-mail)
  • Maintenance agencies own the objects they maintain
    • Only they are allowed to change or version the objects
  • Other organizations may reference external items in their own schemes, but may not change those items
    • You can make a copy which you maintain, but once you do that, you own it!
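The ownership rules can be sketched as a check on the acting agency. This is a minimal sketch with a hypothetical class and method names: only the maintaining agency may change a scheme, while another agency can take a copy, which it then owns and may change freely. `example.org` stands in for any other organization.

```python
class MaintenanceError(Exception):
    """Raised when an agency tries to change an object it does not maintain."""


class MaintainableScheme:
    # Hypothetical model of the maintenance rules: only the owning agency
    # may change the scheme; anyone else must make (and then own) a copy.
    def __init__(self, agency, scheme_id, items=None):
        self.agency = agency  # maintenance agency, identified by domain name
        self.scheme_id = scheme_id
        self.items = dict(items or {})  # own copy of the contents

    def change(self, acting_agency, key, value):
        if acting_agency != self.agency:
            raise MaintenanceError(f"{acting_agency} does not maintain {self.scheme_id}")
        self.items[key] = value

    def copy_for(self, new_agency, new_scheme_id):
        # Once copied, the new agency owns and maintains the copy.
        return MaintainableScheme(new_agency, new_scheme_id, self.items)


original = MaintainableScheme("pop.umn.edu", "CodeScheme_1", {"1": "Yes", "2": "No"})

try:
    original.change("example.org", "2", "No answer")
except MaintenanceError as err:
    print("refused:", err)

local_copy = original.copy_for("example.org", "CodeScheme_1_copy")
local_copy.change("example.org", "2", "No answer")  # allowed: example.org owns the copy
print(original.items["2"], "|", local_copy.items["2"])
```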
Versioning Rules
  • If an object changes in any way, its version changes
  • This will change the version of any containing maintainable object
  • Typically, objects grow and are versioned as they move through the lifecycle
  • Versionable objects inherit their agency from the maintainable scheme they live in
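A minimal sketch of the versioning rules, assuming "major.minor" version strings in which any change bumps the minor number (the actual numbering policy is up to the maintenance agency). The classes are hypothetical; the example mirrors the 1.1 spelling-correction case used for Variable V1 in this workshop.

```python
from dataclasses import dataclass, field


def bump(version: str) -> str:
    # Assumption: versions are "major.minor" and a change bumps the minor part.
    major, minor = version.split(".")
    return f"{major}.{int(minor) + 1}"


@dataclass
class Variable:
    id: str
    label: str
    version: str = "1.0"


@dataclass
class VariableScheme:
    agency: str  # versionable contents inherit this agency
    id: str
    version: str = "1.0"
    variables: dict = field(default_factory=dict)

    def update_label(self, var_id: str, new_label: str) -> None:
        var = self.variables[var_id]
        if var.label == new_label:
            return  # nothing changed, so no new version
        var.label = new_label
        var.version = bump(var.version)    # the changed object gets a new version
        self.version = bump(self.version)  # and so does its containing maintainable


scheme = VariableScheme("pop.umn.edu", "VScheme_2")
scheme.variables["V1"] = Variable("V1", "Emplyoment status")
scheme.update_label("V1", "Employment status")  # spelling correction
print(scheme.version, scheme.variables["V1"].version)
```

A real system would also keep the earlier versions available for anyone still referencing them; this sketch mutates in place only to keep the rule itself visible.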
Identifiable Rules
  • Identifiers are assigned to each identifiable object, and are unique within their maintained parent scheme
  • Identifiable objects inherit their version from their containing versionable parent (if any)
  • Identifiable objects inherit their maintaining agency from the maintainable object they live in
Referencing
  • When referencing an object, you must provide:
    • The maintenance agency
    • The identifier
    • The version
  • Often, these are inherited from a maintained scheme
    • This is part of their identification
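Since a reference is only complete with agency, identifier, and version, it can be modelled as a triple that a registry resolves. A minimal sketch with a hypothetical in-memory registry and helper: the agency is always inherited from the maintained scheme, and the version is inherited too unless the reference states one explicitly (that defaulting policy is an assumption for illustration).

```python
from typing import NamedTuple


class Ref(NamedTuple):
    agency: str
    object_id: str
    version: str


# Hypothetical registry keyed by the full (agency, id, version) triple.
REGISTRY = {
    Ref("pop.umn.edu", "V1", "1.0"): "Variable V1, version 1.0",
    Ref("pop.umn.edu", "V1", "1.1"): "Variable V1, version 1.1 (spelling correction)",
}


def make_ref(object_id, *, scheme_agency, scheme_version, version=None):
    # Agency comes from the maintained scheme; the version does too unless
    # the reference gives one explicitly.
    return Ref(scheme_agency, object_id, version or scheme_version)


inherited = make_ref("V1", scheme_agency="pop.umn.edu", scheme_version="1.1")
explicit = make_ref("V1", scheme_agency="pop.umn.edu", scheme_version="1.1", version="1.0")
print(REGISTRY[inherited])
print(REGISTRY[explicit])
```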
Identification
  • Identification can be by URN or a series of fields
  • The fields make up the parts of the URN and can be used to compose it
  • A number of fields can inherit information from a maintainable parent
Parts of the Identification Series

Identifiable element identification, with a Variable as an example:
  • ID: V1
  • Identifying Agency: pop.umn.edu
  • Version: 1.1 [default is 1.0]
  • Version Date: 2007-02-10
  • Version Responsibility: Wendy Thomas
  • Version Rationale: Spelling correction
The URN

urn:ddi:3_0:VariableScheme:pop.umn.edu:VScheme_2:1_1:Variable:V1:1_1

  • Declares that it is a DDI version 3.0 element
  • Tells the type of maintainable object being referenced
  • Gives the identifying agency of the scheme
  • Tells the type of object and its unique ID
    • Note that this includes both the maintainable ID and the element ID, because uniqueness is maintained within a maintainable object rather than within the agency
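Because the identification fields make up the parts of the URN, the URN can be composed from them and split back into them. A minimal sketch in which the field order and the "1.1" to "1_1" version spelling are read off the single example URN in this section; the published DDI 3.0 URN syntax may differ, and the function and key names are hypothetical.

```python
def to_urn_version(version: str) -> str:
    # The example writes version 1.1 as "1_1" inside the URN.
    return version.replace(".", "_")


def compose_urn(scheme_type, agency, scheme_id, scheme_version,
                object_type, object_id, object_version, ddi_version="3_0"):
    parts = ["urn", "ddi", ddi_version, scheme_type, agency,
             scheme_id, to_urn_version(scheme_version),
             object_type, object_id, to_urn_version(object_version)]
    return ":".join(parts)


URN_KEYS = ["scheme", "namespace", "ddi_version", "scheme_type", "agency",
            "scheme_id", "scheme_version", "object_type", "object_id",
            "object_version"]


def parse_urn(urn: str) -> dict:
    values = urn.split(":")
    if len(values) != len(URN_KEYS) or values[:2] != ["urn", "ddi"]:
        raise ValueError(f"not a URN of the form shown above: {urn}")
    return dict(zip(URN_KEYS, values))


urn = compose_urn("VariableScheme", "pop.umn.edu", "VScheme_2", "1.1",
                  "Variable", "V1", "1.1")
print(urn)
print(parse_urn(urn)["agency"], parse_urn(urn)["object_id"])
```

Splitting on ":" works here because none of the example's parts contain a colon; agency domain names contain dots but not colons.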
DDI What Is It Good For?
  • There are some obvious differences between DDI 2.* and DDI 3.*
    • Ability to capture comparative information
    • Ability to re-use and share metadata
    • Ability to mark up data in XML
    • Greater ability to facilitate data discovery and relationships
    • It is designed to capture lifecycle information as it occurs, and in a way that is useful during production
    • It is machine-actionable – not just documentary
  • All of this comes with added complexity
  • It also supports greater interoperability between organizations
  • Here are a few examples…
Scenario 1: Upstream Metadata Capture
  • Because there is support throughout the lifecycle, you can capture the metadata as it occurs
  • It is re-useable throughout the lifecycle
    • It is versionable as it is modified across the lifecycle
  • It supports production at each stage of the lifecycle
    • It moves into and out of the software tools used at each stage
Scenario 2: Reuse of Metadata
  • You can reuse many types of metadata, benefiting from the work of others
    • Concepts
    • Variables
    • Categories and codes
    • Geography
    • Questions
  • Promotes interoperability and standardization across organizations
  • Can capture (and re-use) common cross-walks
Scenario 3: Virtual Data
  • When researchers use data, they often combine variables from several sources
    • This can be viewed as a “virtual” data set
    • The re-coding and harmonization process can be captured as useful metadata
    • The researcher’s data set can be re-created from this metadata
    • Comparability of data from several sources can be expressed
Scenario 4: Mining the Archive
  • With metadata about relationships and structural similarities
    • You can automatically identify potentially comparable data sets
    • You can navigate the archive’s contents at a high level
    • You have much better detail at a low level across divergent data sets
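Automatically flagging potentially comparable data sets can be as simple as measuring how much of their referenced metadata they share. A minimal sketch, assuming each data set is summarised by the identifiers of the concepts and code schemes its variables reference; the data set names, identifiers, Jaccard similarity measure, and 0.5 threshold are all illustrative choices, not part of DDI.

```python
# Hypothetical summaries: the concepts and code schemes each data set references.
datasets = {
    "EB 30": {"concept:occupation", "codes:trend_occupation"},
    "EB 36": {"concept:occupation", "codes:trend_occupation",
              "concept:life_satisfaction"},
    "Survey X": {"concept:income"},
}


def jaccard(a: set, b: set) -> float:
    # Share of referenced metadata the two data sets have in common.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def comparable_pairs(threshold=0.5):
    names = sorted(datasets)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(datasets[x], datasets[y])
            if score >= threshold:
                pairs.append((x, y, round(score, 2)))
    return pairs


print(comparable_pairs())
```

Shared references only make sense because metadata is reused by reference; two archives that each retyped the same code scheme would score zero here.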