Using XML Mapper and XMLMAP to Read Data Documented by Data Documentation Initiative (DDI) Files - PowerPoint PPT Presentation




Presentation Transcript


Using XML Mapper and XMLMAP to Read Data Documented by Data Documentation Initiative (DDI) Files

Larry Hoyle

Policy Research Institute

University of Kansas


Overview

  • A SAS program reads an XML metadata file and writes a SAS program to read the raw data file described by the metadata file.

[Diagram: makeReader.sas reads 2825.xml through the XMLMAP file DDI.map and writes Read2825.sas, which reads Da2825.txt into Work.ICPSR2825Household, Work.ICPSR2825Family, and Work.ICPSR2825Person]

DDI

  • “an international effort to establish a standard for technical documentation describing social science data” - http://www.icpsr.umich.edu/DDI/index.html


DDI Files

  • XML

    • DTD - http://www.icpsr.umich.edu/DDI/users/dtd/index.html

  • Metadata about:

    • The DDI file itself

    • The study that collected the data

    • The data file

    • Variables within the data file

    • Other Material


The Minimal DDI File

<?xml version="1.0"?>
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Howdy World: Valid but Useless Metadata</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>
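The first task is pulling information out of the XML. The sketch below does this in Python rather than SAS, purely for illustration. It reads the title from the codebook above with the standard library's `xml.etree.ElementTree`. It assumes the file declares no default XML namespace; a namespaced DDI file would need namespace-qualified paths.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The minimal DDI codebook shown above.
MINIMAL_DDI = """<?xml version="1.0"?>
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Howdy World: Valid but Useless Metadata</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>"""


def study_title(ddi_xml: str) -> Optional[str]:
    """Return the text of codeBook/stdyDscr/citation/titlStmt/titl, or None."""
    root = ET.fromstring(ddi_xml)  # root element is <codeBook>
    # The path is relative to <codeBook>; findtext returns None if any step is missing.
    return root.findtext("stdyDscr/citation/titlStmt/titl")


print(study_title(MINIMAL_DDI))  # Howdy World: Valid but Useless Metadata
```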


Real Example: ICPSR 6084 Raw Data File

100001 161132146211115555299 9991199 99911219
200001 49992
30000102534 000325222641942 3834101202
100002 12213112421111212112222221121 2122 12
200002 12221
30000202574 000756221622052 4261103202
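Each case in this file spans several physical records. The Python sketch below groups the records into cases and checks the card sequence in each case. It assumes three records per case (recPrCas) and that column 1 of every record holds its card number (cardno), as the sample suggests. Only the leading columns are used, because the spacing in the rest of each line may not have survived transcription.

```python
# Sample physical records from the slide: two cases, three records each.
RAW = """100001 161132146211115555299 9991199 99911219
200001 49992
30000102534 000325222641942 3834101202
100002 12213112421111212112222221121 2122 12
200002 12221
30000202574 000756221622052 4261103202"""

REC_PER_CASE = 3  # recPrCas from the file's <dimensns>


def split_cases(text, rec_per_case=REC_PER_CASE):
    """Group physical records into cases, checking card numbers 1..n in column 1."""
    lines = text.splitlines()
    if len(lines) % rec_per_case:
        raise ValueError("record count is not a multiple of records per case")
    expected = [str(n) for n in range(1, rec_per_case + 1)]
    cases = []
    for i in range(0, len(lines), rec_per_case):
        group = lines[i:i + rec_per_case]
        cards = [rec[0] for rec in group]  # column 1 of each record
        if cards != expected:
            raise ValueError(f"unexpected card numbers {cards} at record {i + 1}")
        cases.append(group)
    return cases


for case in split_cases(RAW):
    print(case[2][1:6])  # record 3, columns 2-6 (Python slice 1:6)
```

In SAS the same grouping is done by `#n` line pointers inside a single INPUT statement, as the following slides show.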


ICPSR 6084 – About The Study

<citation>
  <titlStmt>
    <titl>CBS News Monthly Poll #2, August 1992</titl>


ICPSR 6084 – About The File

<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>


ICPSR 6084 – Reading The File With SAS

<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>
  <recNumTot>4,638</recNumTot>
</dimensns>

infile 'C:\DDRIVE\data\icpsr\data\6084\da6084.txt' LRECL=80 PAD;
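The `dimensns` values map directly onto the INFILE statement. The Python sketch below is an illustration, not makeReader.sas itself. It strips the thousands separators from the counts. It then checks that recNumTot equals caseQnty × recPrCas (1,546 × 3 = 4,638). Finally, it builds the INFILE line, taking LRECL from logRecL.

```python
import xml.etree.ElementTree as ET

DIMENSNS = """<dimensns>
  <caseQnty>1,546</caseQnty>
  <varQnty>70</varQnty>
  <logRecL>80</logRecL>
  <recPrCas>3</recPrCas>
  <recNumTot>4,638</recNumTot>
</dimensns>"""


def read_dimensions(xml_text):
    """Return {element name: integer value}, dropping thousands separators."""
    dims = ET.fromstring(xml_text)
    return {child.tag: int(child.text.strip().replace(",", "")) for child in dims}


def infile_statement(path, dims):
    # LRECL comes from logRecL; PAD blank-fills short records up to that length.
    return f"infile '{path}' LRECL={dims['logRecL']} PAD;"


dims = read_dimensions(DIMENSNS)
if dims["caseQnty"] * dims["recPrCas"] != dims["recNumTot"]:
    raise ValueError("recNumTot does not equal caseQnty * recPrCas")
print(infile_statement(r"C:\DDRIVE\data\icpsr\data\6084\da6084.txt", dims))
```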


More From ICPSR 6084 – first variable


More From ICPSR 6084 – Reading the first variable

input #1 cardno 1-1


More From ICPSR 6084 – another variable

#3 respno 2-6
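The column positions themselves come from each variable's description. The Python sketch below emits the column-input specification. It assumes the positions are recorded in DDI `location` elements with `StartPos`, `EndPos` and `RecSegNo` attributes. The XML fragment is hypothetical, built to reproduce the two INPUT lines on these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical <var> fragments matching the slides:
# cardno is record 1, columns 1-1; respno is record 3, columns 2-6.
VARS = """<dataDscr>
  <var name="cardno"><location StartPos="1" EndPos="1" RecSegNo="1"/></var>
  <var name="respno"><location StartPos="2" EndPos="6" RecSegNo="3"/></var>
</dataDscr>"""


def column_input_spec(var):
    """Build '#record name start-end' for one <var> element."""
    loc = var.find("location")
    start, end = loc.get("StartPos"), loc.get("EndPos")
    rec = loc.get("RecSegNo", "1")  # assumption: record 1 when the attribute is absent
    return f"#{rec} {var.get('name')} {start}-{end}"


for var in ET.fromstring(VARS).iter("var"):
    print(column_input_spec(var))  # "#1 cardno 1-1", then "#3 respno 2-6"
```

The sketch treats every variable as numeric. A character variable would also need a `$` after its name.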


The Tasks

  • Pull the necessary information from a hierarchical XML file into SAS as tables

    • Use the XML LIBNAME engine with an XMLMAP file

  • Use that information in SAS to read the raw data file


Making the XMLMAP File – SAS XML Mapper


Defining Tables – What Defines Rows

Drag the element that defines rows to the root of the XMLMAP structure


Defining Tables – Row Defined


Defining Tables – What Defines Columns

Drag an element that defines a column to the root of the table
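XML Mapper's row/column model can be imitated in a few lines of Python. This shows what the two drag operations define. The row path selects the repeating element, giving one table row per match. Each column path is then evaluated relative to that element.

This is only an analogy. It neither reads nor writes an XMLMAP file, and it uses ElementTree's limited XPath subset. The leading `@` for attributes is this sketch's own convention, and the `labl` text in the sample is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DDI fragment; the label text is invented for illustration.
DDI = """<codeBook>
  <dataDscr>
    <var name="cardno"><labl>Card number</labl></var>
    <var name="respno"><labl>Respondent number</labl></var>
  </dataDscr>
</codeBook>"""


def extract_table(xml_text, row_path, columns):
    """Return one dict per element matched by row_path.

    columns maps a column name to a path relative to the row element;
    a leading '@' selects an attribute of the row element instead.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall(row_path):  # each match becomes one row
        row = {}
        for col, path in columns.items():
            if path.startswith("@"):
                row[col] = elem.get(path[1:])
            else:
                row[col] = elem.findtext(path)
        rows.append(row)
    return rows


table = extract_table(DDI, "dataDscr/var", {"name": "@name", "labl": "labl"})
for row in table:
    print(row)
```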


Defining Tables – Column Defined


Viewing the XMLMAP File


Viewing the XMLMAP File – Row Path


Viewing the XMLMAP File – Column Path


Viewing Sample SAS Code


Previewing the Table


XML Mapper Limitations

  • A sample XML file will not necessarily contain every element that other files of the same type can contain.

    • Use the XML Schema instead of an XML file.

  • An XML Schema file may not work:

    • The XML file type may be defined by a DTD rather than a schema.

    • The XML Schema may be too complex for XML Mapper.


What Then?

  • You can use XML Mapper to create an initial XMLMAP file and then edit it by hand.


Lots of Tables From DDI, Mostly for Comments

DATADSCR_VAR
DATADSCR_VARGRP
DATADSCR_VAR_CATGRY
DATADSCR_VAR_INVALRNG
DATADSCR_VAR_INVALRNG_ITEM
DATADSCR_VAR_INVALRNG_RANGE
DATADSCR_VAR_VALRNG_ITEM
DATADSCR_VAR_VALRNG_RANGE
DOCDSCR_CITATION__AUTHENTY
DOCDSCR_CITATION__COPYRIGHT
DOCDSCR_CITATION__IDNO
DOCDSCR_CITATION__OTHID
DOCDSCR_CITATION__PRODDATE
DOCDSCR_CITATION__PRODUCER
DOCDSCR_CITATION__TITL
FILEDSCR_FILETXT
FILEDSCR_FILETXT_RECGRP
STDYDSCR_CITATION_BIBLCIT
STDYDSCR_CITATION_TITLSTMT
STDYDSCR_CITATION_VERSTMT
STDYDSCR_CITATION__AUTHENTY
STDYDSCR_CITATION__COPYRIGHT
STDYDSCR_CITATION__DISTRBTR
STDYDSCR_CITATION__FUNDAG
STDYDSCR_CITATION__GRANTNO
STDYDSCR_CITATION__PRODDATE
STDYDSCR_CITATION__PRODUCER
STDYDSCR_CITATION__SOFTWARE
STDYDSCR_METHOD__COLLMODE
STDYDSCR_METHOD__DATACOLLECTOR
STDYDSCR_METHOD__FREQUENC
STDYDSCR_METHOD__RESINSTRU
STDYDSCR_METHOD__SAMPPROC
STDYDSCR_METHOD__TIMEMETH
STDYDSCR_METHOD__WEIGHT
STDYDSCR_STDYINFO_ABSTRACT
STDYDSCR_STDYINFO__ANLYUNIT
STDYDSCR_STDYINFO__COLLDATE
STDYDSCR_STDYINFO__DATAKIND
STDYDSCR_STDYINFO__GEOGCOVER
STDYDSCR_STDYINFO__KEYWORD
STDYDSCR_STDYINFO__NATION
STDYDSCR_STDYINFO__TIMEPRD
STDYDSCR_STDYINFO__TOPCCLAS
STDYDSCR_STDYINFO__UNIVERSE


Write a SAS Program – Metadata Comment

data _null_;
  file reader lrecl=1024;
  length vEdited $ 2000;
  set DDIfile.stdyDscr_citation_titlStmt;
  if _n_=1 then put '/*' /
    ' SAS program to read ' agency ' ' IDNo;
  stdyDscrTitl = compbl(tranwrd(translate(stdyDscrTitl, ' ', '09'x),
    '*/', '*_/'));
  put 'Study Title' _n_ ': ' stdyDscrTitl;
  altTitl = compbl(tranwrd(translate(altTitl,
    ' ', '09'x), '*/', '*_/'));
  put ' ' altTitl;

Generated output:

/*
 SAS program to read ICPSR 6084
Study Title1 : CBS News Monthly Poll #2, August 1992
 August National Poll II, Republican National Convention
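The TRANSLATE/TRANWRD/COMPBL chain exists because the titles are written inside a generated `/* ... */` comment. Any `*/` in a title would end the comment early, and the rest of the title would then be parsed as SAS code. The Python sketch below applies the same cleanup. Its `header_comment` helper is hypothetical and closes the comment itself, since the slide does not show where the original program closes it.

```python
import re


def comment_safe(text):
    """Make metadata text safe inside a generated /* ... */ comment.

    Mirrors compbl(tranwrd(translate(x, ' ', '09'x), '*/', '*_/')).
    """
    text = text.replace("\t", " ")     # translate(): tab -> blank
    text = text.replace("*/", "*_/")   # tranwrd(): break any comment terminator
    return re.sub(" {2,}", " ", text)  # compbl(): collapse runs of blanks


def header_comment(agency, id_no, titles):
    """Hypothetical helper: build the header comment from (title, altTitl) pairs."""
    lines = ["/*", f" SAS program to read {agency} {id_no}"]
    for n, (title, alt_title) in enumerate(titles, start=1):
        lines.append(f"Study Title{n} : {comment_safe(title)}")
        lines.append(f" {comment_safe(alt_title)}")
    lines.append("*/")  # this sketch closes the comment here
    return "\n".join(lines)


print(header_comment("ICPSR", "6084", [(
    "CBS News Monthly Poll #2, August 1992",
    "August National Poll II, Republican National Convention",
)]))
```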


Cntlin file for Formats

data makeTheFormats;
  input fmtname $ 1-7 type $ 9-9 start $ 11-26 default 28-35 /
        label :&$512.;
datalines;
V00006f N 1                1
Yes
V00006f N 3                1
Converted Refusal
;
run;

proc format cntlin=makeTheFormats;
run;
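The CNTLIN lines can be generated from DDI category elements (`catgry` with `catValu` and `labl`). The padding in the Python sketch below follows the column positions in the INPUT statement above. Three parts of it are assumptions rather than anything confirmed by the slides:

- The `var` fragment is hypothetical.
- The format name is the variable name plus "f", a rule inferred from `V00006f`.
- DEFAULT is set to the longest label length so that formatted values are not truncated.

```python
import xml.etree.ElementTree as ET

# Hypothetical DDI <var>; catgry, catValu and labl are DDI element names.
VAR_XML = """<var name="V00006">
  <catgry><catValu>1</catValu><labl>Yes</labl></catgry>
  <catgry><catValu>3</catValu><labl>Converted Refusal</labl></catgry>
</var>"""


def cntlin_datalines(var_xml):
    """Return datalines for the makeTheFormats step: a fixed-column line, then a label line."""
    var = ET.fromstring(var_xml)
    fmtname = var.get("name") + "f"  # hypothetical rule inferred from "V00006f"
    if len(fmtname) > 7:
        raise ValueError(f"{fmtname} does not fit in columns 1-7")
    cats = [(c.findtext("catValu").strip(), c.findtext("labl").strip())
            for c in var.findall("catgry")]
    # Sketch's choice: DEFAULT = longest label, so formatted values are not cut short.
    default = max(len(label) for _, label in cats)
    lines = []
    for start, label in cats:
        # fmtname 1-7, type 9, start 11-26, default 28-35, as in the INPUT statement
        lines.append(f"{fmtname:<7} N {start:<16} {default:<8}".rstrip())
        lines.append(label)  # read by "/ label :&$512." from the next line
    return lines


print("\n".join(cntlin_datalines(VAR_XML)))
```

The `&` modifier stops reading at two consecutive blanks. A label containing a double blank would therefore be cut short, so a COMPBL-style cleanup of labels may be worth applying first.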


Input

  • Logic to produce different input statements for:

    • Fixed column data

    • Delimited data
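The Python sketch below shows that branching. Each variable is described by a hypothetical tuple layout: (name, is_char, record, start, end). Fixed-column files get column input with `#n` line pointers. Delimited files get list input, which relies on the INFILE statement carrying options such as DSD and DLM=.

```python
def input_statement(variables, delimited):
    """Build a SAS INPUT statement from (name, is_char, rec, start, end) tuples."""
    parts = []
    for name, is_char, rec, start, end in variables:
        dollar = " $" if is_char else ""
        if delimited:
            # List input: fields are read in file order; positions are ignored.
            parts.append(f"{name}{dollar}")
        else:
            # Column input with a line pointer to the record holding the field.
            parts.append(f"#{rec} {name}{dollar} {start}-{end}")
    return "input " + " ".join(parts) + ";"


VARIABLES = [("cardno", False, 1, 1, 1), ("respno", False, 3, 2, 6)]
print(input_statement(VARIABLES, delimited=False))  # input #1 cardno 1-1 #3 respno 2-6;
print(input_statement(VARIABLES, delimited=True))   # input cardno respno;
```

The sketch assumes one record per case for delimited files. Under list input SAS gives character variables a default length of 8, so a fuller generator would also emit LENGTH or INFORMAT statements.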


Output

  • Multiple datasets if different record types

  • Separate keep dataset options

  • Logic to output to appropriate dataset

Generated code:

if left(_RecordSetIdentifier) eq left("2 ") then DO;
  output ICPSR2825FAMILY ;
END;

Code in makeReader.sas that writes it:

if "&fileStructureType" eq "hierarchical" then do;
  put 'if left(_RecordSetIdentifier) eq left("' catValu '") then DO;'
    / " output " safeAgency +(-1) safeIDNo +(-1) rectype ';' / 'END;' //;
end;
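The same pattern can be sketched in Python. It produces one IF/OUTPUT block per record type, plus a DATA statement that gives each output dataset its own KEEP= list. Only the "2" to FAMILY mapping appears on the slide. The other record-type values and all variable names in the KEEP= lists are hypothetical.

```python
def data_statement(agency, id_no, keeps):
    """DATA statement creating one dataset per record type, each with its own KEEP= list."""
    parts = [f"{agency}{id_no}{rectype}(keep={' '.join(names)})"
             for rectype, names in keeps.items()]
    return "data " + " ".join(parts) + ";"


def output_blocks(agency, id_no, record_types):
    """IF/OUTPUT blocks routing each record-type value to its dataset."""
    blocks = []
    for value, rectype in record_types.items():
        blocks.append(
            # The trailing blank inside left("...") mirrors the list PUT on the slide.
            f'if left(_RecordSetIdentifier) eq left("{value} ") then DO;\n'
            f" output {agency}{id_no}{rectype} ;\n"
            "END;"
        )
    return "\n\n".join(blocks)


# Hypothetical mappings; only "2" -> FAMILY comes from the slide.
RECORD_TYPES = {"1": "HOUSEHOLD", "2": "FAMILY", "3": "PERSON"}
KEEPS = {"HOUSEHOLD": ["hhvar1"], "FAMILY": ["famvar1"], "PERSON": ["pervar1"]}

print(data_statement("ICPSR", "2825", KEEPS))
print(output_blocks("ICPSR", "2825", RECORD_TYPES))
```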


Some of the Limitations

  • DDI can describe nCubes, geographic coverage, and variable groups

    • The current makeReader.sas can’t handle these

  • The DDI definition includes recursive elements

    • E.g., recGrps within recGrps

    • The current makeReader.sas would not find nested elements
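The nesting problem is easy to demonstrate in Python. The fragment below is simplified and hypothetical: its record groups are named after the ICPSR 2825 datasets, and the nesting order is assumed. A fixed one-level path finds only the outermost `recGrp`, while a recursive walk finds every group together with its parent.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical structure: record groups nested inside record groups.
FILE_STRC = """<fileStrc>
  <recGrp rectype="Household">
    <recGrp rectype="Family">
      <recGrp rectype="Person"/>
    </recGrp>
  </recGrp>
</fileStrc>"""

root = ET.fromstring(FILE_STRC)

# A fixed one-level path sees only the outermost group.
top_level = [g.get("rectype") for g in root.findall("recGrp")]


def group_parents(elem, parent=None):
    """Recursively list (rectype, parent rectype) for every nested recGrp."""
    pairs = []
    for child in elem.findall("recGrp"):
        name = child.get("rectype")
        pairs.append((name, parent))
        pairs.extend(group_parents(child, name))  # descend into nested groups
    return pairs


print(top_level)            # ['Household']
print(group_parents(root))  # [('Household', None), ('Family', 'Household'), ('Person', 'Family')]
```

If parent links are not needed, `root.iter("recGrp")` also visits every nested group in document order.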


Questions?


About the Speaker

Larry Hoyle
Associate Scientist
Policy Research Institute,
University of Kansas
1541 Lilac Lane
Lawrence, KS 66044-3177
LarryHoyle@ku.edu

