UIS Data Transformation and Validations
As it pertains to the SDMX TWG EXL Initiative

Gathering Data
Each data point to be collected is described with dimensions prior to collection.
A unique identifier is assigned to each data point/dimensional grouping.
Data is collected via surveys.
EMC_ID: Internal unique identifier used to store data. Each EMC_ID summarizes the set of dimensions for a data point that we collect.
In this case, the data point refers to ENROLLMENT (EC_UNIT=210) in ISCED 1 (EC_ISCED = 10). Labels for each dimensional value are stored in separate dimension tables.
For a more human-readable format, each EMC_ID used in indicator definitions is also given an alphanumeric code that summarizes the dimensions. In this case, “E.1” is used, for ENROLLMENT in ISCED 1.
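As a rough illustration of how an EMC_ID, its dimension values, and its alphanumeric code relate, the sketch below models one data point in Python. The class name `DataPoint`, the lookup dictionaries, the `describe` helper, and the EMC_ID value 1001 are all hypothetical; only EC_UNIT=210, EC_ISCED=10, and the code “E.1” come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    emc_id: int    # internal unique identifier (the value used below is made up)
    ec_unit: int   # 210 = ENROLLMENT, per the example in the text
    ec_isced: int  # 10 = ISCED 1, per the example in the text
    code: str      # human-readable alphanumeric code, e.g. "E.1"


# Stand-ins for the separate dimension tables that hold the labels.
UNIT_LABELS = {210: "ENROLLMENT"}
ISCED_LABELS = {10: "ISCED 1"}


def describe(dp: DataPoint) -> str:
    """Resolve the dimension values of a data point to their labels."""
    return f"{dp.code}: {UNIT_LABELS[dp.ec_unit]} in {ISCED_LABELS[dp.ec_isced]}"


example = DataPoint(emc_id=1001, ec_unit=210, ec_isced=10, code="E.1")
print(describe(example))
```

Keeping the labels in separate lookup tables mirrors the description above, where the data point itself stores only coded dimension values.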
Concept: Redundant Data Check
Description: UIS surveys often include redundant cells so that the value entered in one cell can be verified against another, guarding against human input error.
Purpose: Verify that one cell equals another, redundant, cell
Method: Validates that a specific “MASTER” cell is equal to each of its redundant cells. Redundant cells are identified by having all dimensional values equal to those of the master cell, with the exception of the PRIORITY dimension.
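The method described above can be sketched in Python as follows. This is a minimal illustration, not the UIS implementation: the cell layout (a dict with `dims` and `value` keys), the function names, and the sample values are assumptions made for the example.

```python
def without_priority(dims):
    """Return the dimensions of a cell with PRIORITY left out."""
    return {k: v for k, v in dims.items() if k != "PRIORITY"}


def find_redundant(master, cells):
    """Cells whose dimensions match the master on everything except PRIORITY."""
    key = without_priority(master["dims"])
    return [c for c in cells
            if c is not master and without_priority(c["dims"]) == key]


def check_redundant(master, cells):
    """Return the redundant cells whose value differs from the master cell."""
    return [c for c in find_redundant(master, cells)
            if c["value"] != master["value"]]


# Hypothetical survey cells for illustration.
master = {"dims": {"EC_UNIT": 210, "EC_ISCED": 10, "PRIORITY": 1}, "value": 500}
cells = [
    master,
    {"dims": {"EC_UNIT": 210, "EC_ISCED": 10, "PRIORITY": 2}, "value": 500},
    {"dims": {"EC_UNIT": 210, "EC_ISCED": 10, "PRIORITY": 3}, "value": 505},
    {"dims": {"EC_UNIT": 210, "EC_ISCED": 20, "PRIORITY": 1}, "value": 300},
]

mismatches = check_redundant(master, cells)
print([c["dims"]["PRIORITY"] for c in mismatches])
```

In this sample, the PRIORITY 3 cell is flagged because it disagrees with the master, while the ISCED 20 cell is ignored because it is not redundant with it.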
Indicators are defined using MathML, with custom tags implemented by the UIS.
<en>Graduation age population</en>
<fr>Population d age de graduation</fr>
<roll wildcard="isc" list="1,2.GPV,2"/>
<roll wildcard="sex" list="F,T"/>
<offset wildcard="age" low="Ag1" up="Ag25">
<d src="POP" >P.(age).(sex)</d>
<synonym wildcard="isc" use="2.A.GPV" for="2.GPV" />
<synonym wildcard="isc" use="2.A.GPV" for="2" />
<!-- relative change is not greater than 10% -->
<formula cid="12" cids="1" range="0.1">
<roll wildcard="sex" list="M,F,T" />
<roll wildcard="isc" list="0,1,23,4,56" />
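The comment and the range="0.1" attribute in the fragment above suggest a check that a relative change stays within 10 percent. The fragment does not say what the change is measured against, so the sketch below assumes a comparison between a current value and a previous value; the function name and its handling of missing or zero values are likewise assumptions.

```python
def relative_change_ok(current, previous, tolerance=0.1):
    """Return True if |current - previous| / |previous| is within tolerance.

    Returns None when either value is missing, since the check cannot run.
    """
    if current is None or previous is None:
        return None
    if previous == 0:
        # Relative change is undefined against zero; accept only "no change".
        return current == 0
    return abs(current - previous) / abs(previous) <= tolerance


print(relative_change_ok(105, 100))  # 5 percent change
print(relative_change_ok(120, 100))  # 20 percent change
```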
Alternate Comparison for INCLUSION
(when the data is included in the master cell)
By default, if the above data point is missing, the calculated indicator will also be labeled as missing.
The MG=“2” code above alters the behavior of the data point: missing data for this data point will now be considered ‘nil’, or 0.
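The two behaviors for missing data can be contrasted with a small Python sketch. It assumes an indicator that is a simple sum of data points, with `None` standing for a missing value; the function name `evaluate_sum`, the `nil_if_missing` parameter, and the code “E.2” are hypothetical and only mimic the effect described for MG=“2”.

```python
def evaluate_sum(values, nil_if_missing=frozenset()):
    """Sum data points, propagating missing values unless marked as nil.

    values: mapping of data point code -> number, or None if missing.
    nil_if_missing: codes whose missing values count as 0 (the MG="2" effect).
    """
    total = 0
    for code, value in values.items():
        if value is None:
            if code in nil_if_missing:
                continue  # missing is treated as nil, i.e. 0
            return None   # default: the whole indicator is missing
        total += value
    return total


data = {"E.1": 100, "E.2": None}
print(evaluate_sum(data))                          # default behavior
print(evaluate_sum(data, nil_if_missing={"E.2"}))  # missing treated as 0
```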