Eurostat edit 2012
This presentation is the property of its rightful owner.
Sponsored Links
1 / 34

Eurostat EDIT 2012 PowerPoint PPT Presentation


  • 56 Views
  • Uploaded on
  • Presentation posted in: General

Eurostat EDIT 2012. Functional Presentation. EDIT Introduction. EDIT allows users to import data, perform a set of predefined operations on the imported datasets and export data resulted from these processing operations. Validations / Computations Record Vertical Hierarchical

Download Presentation

Eurostat EDIT 2012

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Eurostat edit 2012

Eurostat EDIT 2012

Functional Presentation


Edit introduction

EDIT Introduction

EDIT allows users to import data, perform a set of predefined operations on the imported datasets and export data resulted from these processing operations.

Validations / Computations

  • Record

  • Vertical

  • Hierarchical

    Dataset Operations

  • Copy, Merge, Alter, Aggregate, etcsee Scripting manual.


Edit introduction 2

EDIT Introduction (2)

Data Validation tool, allowing users to import data, run validation programs and export results

The validation process relies on a custom Scripting Language

Web-based User Interface

Data and Metadata isolated into independent Domains


Specialised functions

Specialised functions

Time series outliers (Terror)

Berthelot-Hidiroglu

Sigma Gap

Programmable functions


Technology overview

Technology Overview

Web Based Interface

  • Unified interface for both the local version and the server deployment

  • EUROSTAT Look & Feel

  • Light interface, simplified workflows

    RDBMS (Oracle and PostgreSQL)

    ECAS or local authentication ( end year: SMS)


Edit integration capabilities

EDIT Integration Capabilities

Exposes full API as Web Services

Integrated with EDAMIS

  • detect incoming files and process them in unattended mode

  • publish validation results to the Feedback Channel

    Integrated with the SDMX Registry

  • fetch DSDs into EDIT structures

  • load codelists from the Registry


Edamis integration

EDAMIS Integration

EDAMIS can send data to EDIT by placing the files in a configurable location

EDIT detects metadata based on the EDAMIS naming convention

EDIT performs the processing in unattended mode

EDIT acts as a client for the EDAMIS Feedback Channel Web Service in order to publish the results of a job execution


Sdmx registry integration

SDMX Registry Integration

EDIT can import DSDs or codelists

EDIT acts as a client for the SDMX Registry Web Services in order to fetch DSD files and codelists data

The DSD file is broken down into EDIT components

  • Key families are translated to EDIT formats

  • Codelists are translated to EDIT Lookups

  • An EDIT Program is created performing lookup validations and basic checks on the dimension fields

    A specific importer has been implemented to process codelist data


Eve integration

EVE Integration

EVE Rules can be imported into EDIT or executed from an external file during the EDIT Job Execution

An EDIT component can translate EVE rules defined in XML files to EDIT Scripting Language


Important principles

Important principles

1) From Microdata to macrodata

2) Scripting principle (symbols/placeholders)

3) Editing seen as a case of complex computations

4) Multidataset approach

5) Cube approach in computations


Rule layout

Rule layout

  • Rule name

  • Rule type

  • Rule body

  • Error part (msg,selected vars)

  • Then compute part

  • Else compute part


Edit scripting language capabilities

EDIT Scripting Language Capabilities

  • Custom Scripting Language designed specifically for data validation

  • Tries to be as simple as possible and still flexible enough to fit the requirements of any existing domain

  • Allows the definition of Formats and validation Programs

  • Formats (Dataset Definitions) describe the structure of the data (Format Definition Language)

  • Validation Programs describe the validation rules and are composed from a set of steps with inputs and outputs (Program Definition Language)


Edit standalone installation

EDIT Standalone Installation

  • Standalone Installation supported for Windows XP and Windows 7

  • Simple installation wizard

  • Shortcuts are created in the Start Menu


Edit user types

EDIT User Types

  • User

    • Executes jobs on datasets

  • Programmer

    • Manages the Metadata needed by the User to execute jobs

    • Sets up the unattended mode configuration

  • Administrator

    • Manages users and permissions


User module functionality

User Module Functionality

  • Change Password

    • Change the password of the user(when not logged through ECAS)

  • Dataset Import/Export

    • Import and export data to and from the System

    • Monitor any ongoing import/export processes

  • Job Execution

    • Execute validation programs on imported datasets

    • View the results of a Job Execution


User workflow

User Workflow


Programming module main functional capabilities

Programming ModuleMain Functional Capabilities

  • Formats and Programs Definition

    • Define Metadata using the editors in the User Interface

    • Import/Export Metadata from/to external TXT files

    • Import/Export to Oracle

  • Data Import/Export

    • Import Auxiliary Data (lookup datasets)

  • Job Execution

    • Execute validation programs on imported datasets


Programming module workflows

Programming Module Workflows


Programming module format definition

Programming Module - Format Definition


Programming module program definition

Programming Module – Program Definition


Programming module import auxiliary data

Programming Module – Import Auxiliary Data


Programming module unattended mode configuration

Programming Module – Unattended Mode Configuration


Administration module main functional capabilities

Administration ModuleMain Functional Capabilities

  • User Management

    • Manage the Users and their permissions

  • User Group Management

    • Manage the User Groups and their members

  • Domain Management

    • Manage the Domains


Dataset format cvts

Dataset format CVTS

FORMAT cvts_4 {

DESCRIPTION "CVTS Format";

FIELDS {

COUNTRY {

DESCRIPTION "None";

CAPTION "None";

TYPE STRING;

LENGTH 2;

}

ENTERPR {

DESCRIPTION "None";

CAPTION "None";

TYPE NUMBER;

LENGTH 6;

}

REFYEAR {

DESCRIPTION "None";

CAPTION "None";

TYPE NUMBER;

LENGTH 4;

}

WEIGHT {

DESCRIPTION "None";

CAPTION "None";

TYPE DOUBLE;

LENGTH 20.10;

}

NACE_SP {

DESCRIPTION "None";

CAPTION "None";

TYPE STRING;

LENGTH 5;


Program cvts annex8

Program CVTS annex8

PROGRAM cvts_4 {

INPUT cvts_4 inputDataSet;<= all input datasets we use for the validation

INPUT LANGUAGES_LIST LANGUAGES;

INPUT COUNTRIES_LIST COUNTRIES;

INPUT NUTS_LIST NUTS;

INPUT NACE_LIST NACE;

STEPS {<= can be multi-step program (for example separately ERRORS .. WARNINGS )

VALIDATION annex8_error {

INPUTinputDataSet; <= main dataset being validated

LOOKUP LANGUAGES;

LOOKUP COUNTRIES;<= lookup tables

LOOKUP NUTS;

LOOKUP NACE;

ERROR err_annex8_error;<= output log - error dataset

RULES {

RECORD FL001 {

CONDITIONinLookup (COUNTRY, COUNTRIES, "CODE"); <= check validity of the COUNTRY code using lookup table

ERRMSG "Rule FL1 failed for field [COUNTRY]: See EU Manual for valid list of codes (annex 12)" SEVERITY "Error" (COUNTRY) ;

}

RECORD FL002 {

CONDITION (ENTERPR>=0 AND ENTERPR<=999996);

ERRMSG "Rule FL2 failed for field [ENTERPR]: In the range 0 to 999996" SEVERITY "Error" (ENTERPR) ;

}

RECORD FL004 {

CONDITION (strToDouble (NACE_SP)>=2001 AND strToDouble (NACE_SP)<=2020);

ERRMSG "Rule FL4 failed for field [NACE_SP]:In the range 2001 to 2020 - See Manual (annex 1)" SEVERITY "Error" (NACE_SP) ;

}

RECORD FL005 {

CONDITION in (SIZE_SP, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

ERRMSG "Rule FL5 failed for field [SIZE_SP]: In the range 0 to 9" SEVERITY "Error" (SIZE_SP) ;

}

RECORD FL171 {

CONDITION (NOT isNull (A1bis)) -> inLookup (A1bis, NACE, "CODE");

ERRMSG "Rule FL171 failed for field [A1bis]: NACE rev 1.1" SEVERITY "Warning" (A1bis) ;

}

RECORD FL172 {

CONDITION (NOT isNull (A2bis)) -> ( (A2bis >=0 and A2bis <= 999996) or A2bis = 999999);

ERRMSG "Rule FL172 failed for field [A2bis]: In the range of 0-999996 or 999999" SEVERITY "Warning" (A2bis) ;

}


Complex program example 1

Complex program example (1)

PROGRAM ComputationExamples {

INPUTcountryDsdinputData;

STEPS {

VALIDATION checkValues {

INPUTinputData;

ERROR errorData1;

RULES {

RECORDpureRecord {

PRICE := 20;

}

RECORDconditionalRecord {

CONDITIONisNull(VALUE);

THEN {

VALUE := PRICE * QUANTITY;

}

ELSE {

PRICE := VALUE / 5;

QUANTITY := VALUE / PRICE;

}

}

VERTICALpureVertical {

EXPRESSION {

KEYS COUNTRY, CTYPE, MONTH, PRODUCT;

TRKEYS COUNTRY;

VALUE['TOTAL'] := nvl(VALUE['TOTAL'],0);

}

}


Complex program example 2

Complex program example (2)

VERTICAL conditionalVertical {

EXPRESSION {

KEYS COUNTRY, CTYPE, MONTH, PRODUCT;

TRKEYS COUNTRY;

CONDITION2 * VALUE['TOTAL'] = sum(VALUE[*]);

THEN {

VALUE['FR'] := VALUE['TOTAL'] / 3;

VALUE['GB'] := VALUE['TOTAL'] / 2;

VALUE['TOTAL'] := sum(VALUE[*]) - VALUE['TOTAL'];

}

ELSE {

VALUE['TOTAL'] := sum(VALUE[*]) - VALUE['TOTAL'];

}

}

}

VERTICAL multipleTranspositionsComputation {

EXPRESSION {

KEYS COUNTRY, CTYPE, MONTH, PRODUCT;

TRKEYS COUNTRY, PRODUCT;

VALUE['TOTAL']['GAS'] := nvl(VALUE['TOTAL']['GAS'],0);

}

}


Complex program example 3

Complex program example (3)

VERTICAL multipleTranspositionsCondition {

EXPRESSION {

KEYS COUNTRY, CTYPE, MONTH, PRODUCT;

TRKEYS COUNTRY, PRODUCT;

CONDITION VALUE['TOTAL']['GAS'] > 5000;

THEN {

VALUE['TOTAL']['GAS'] := VALUE['TOTAL']['OIL'] * 2;

VALUE['TOTAL']['GLD'] := VALUE['TOTAL']['OIL'] * 5;

}

ELSE {

VALUE['TOTAL']['GLD'] := VALUE['TOTAL']['GAS'] + 3000;

}

}

}

}

}

DATAOPERATION sortData {

SORT {

INPUT inputData;

ORDER MONTH ASC, CTYPE ASC, COUNTRY ASC, PRODUCT ASC;

}

}

}

}


Accepted data formats

Accepted data formats

Gesmes/ BOP ITS, BOP FDI

UNA:+.? '

UNB+UNOC:3+FR2+4D0+100929:1637+IREF000243++GESMES/TS'

UNH+MREF000001+GESMES:2:1:E6'

BGM+74'

NAD+Z02+ECB'

NAD+MR+4D0'

NAD+MS+FR2'

IDE+10+EUROSTAT_BOP_01 reporting'

DSI+BOP_FDI_A'

STS+3+7'

DTM+242:201009291637:203'

DTM+Z02:20072009:702'

IDE+5+EUROSTAT_BOP_01'

GIS+AR3'

GIS+1:::-'

ARR++A:FR:N:2:330:N:4A:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F‘ <= multi-year 2007, 2008, 2009 observations

ARR++A:FR:N:2:330:N:4F:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'

ARR++A:FR:N:2:330:N:7Z:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'

ARR++A:FR:N:2:330:N:A1:E:1100:9999:20072009:702:5824:A:F+5930:A:F+4204:A:F'

ARR++A:FR:N:2:330:N:A1:E:1495:9999:20072009:702:5828:A:F+5932:A:F+4206:A:F'

CSV (with or without header) /SBS, CVTS,TOURISM

9H; 2008; LT; 2; B-N_X_K642; 11930; 16236; ; ; ; ; UNIT; ; ; ; ; ; TT0; ; ; ; ; D08

9H; 2008; LT; 3; B-N_X_K642; 11930; 1001; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

9H; 2008; LT; 4; B-N_X_K642; 11930; 529; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

9H; 2008; LT; 30; B-N_X_K642; 11930; 17766; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

9H; 2008; LT; 2; B-E; 11930; 1138; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

9H; 2008; LT; 3; B-E; 11930; 104; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

9H; 2008; LT; 4; B-E; 11930; 61; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08

FLR example

001E20100121814 00 804.822

001E20100121816 93 5295.549

001E20100121814 99 6166.24

001E20100125290334 581.371

FLR example 2

2010010011 010252000405595911005909580E 01ZZZZZ 2691.966 2734482.0 0.0

2010010011 010252000405595911004009600E 01ZZZZZ 237.543 341202.0 0.0


Program with parameter s

Program with parameter(s)

PROGRAM SBS_ANNEX1_SingleSeries {

INPUT SBS_DATA input1;

PARAMETER P_T NUMBER;

PARAMETERSET PARAMETERS {

P_T = 2009;

}

STEPS {

VALIDATION Validation {

INPUT input1;

ERRORErrorLog;

RULES {

VERTICAL Rule001 {

EXPRESSION {

KEYS SERIES, YEAR, TER_UNIT, SIZECLASS, ECO_ACTIVITY, VARIABLE;

TRKEYS VARIABLE;

CONDITION SERIES = '1A' AND YEAR =p_T AND countMissing(aux_val['12150'],aux_val['12110'])=0 -> aux_val['12150'] <= aux_val['12110'];

ERRMSG '12150 > 12110' SEVERITY 'Warning' (aux_val['12150'],aux_val['12110']) ;

}

}


Functions data types operators 1

Functions, data types, operators (1)

Functions

These following function calls are supported:

A

  • abs(Double) – absolute value

  • ascii(Char) – returns the ASCII code for a character

    B

  • between(Double, Double, Double) – verifies if a number is inside a closed interval

  • between(Double, Double, Double, Boolean, Boolean) – verifies if a number is inside an interval allowing the user to specify if the interval is closed or open at each end

  • between(String, String, String) – same as above

  • between(String, String, String, Boolean, Boolean) – same as above

    C

  • ceiling(Double) – ceiling for number (Ex: ceiling(3.2) => 4)

  • chr(Integer) – returns the ASCII character for the ASCII code

  • concatenate(String…) – concatenate Strings

  • countMissing(List) – returns the number of null values in the list

  • count(List) – returns the number of elements in the list

    E

  • exp(Double)

    F

  • firstIndexOf(String toSearch, String searchIn, Double startingFrom) – first occurance of the toSearch String in the searchIn String

  • floor(Double) – floor for number (Ex: ceiling(3.2) => 3)

    G

  • getRowCount(datasetReference) – returns the number of rows for the specified dataset reference

Data Types

There are four types of data:

  • Boolean

  • Double

  • Number

  • String

    OperatorsThere are three types of operators in the SL Expressions:

    Boolean operators, used to evaluate expressions into a true/false result:

  • OR

  • AND

  • NOT

  • -> (implication)

  • = (equals)

  • <> (not equals)

  • > (greater than)

  • < (lower than)

  • >= (greater than or equal to)

  • <= (lower than or equal to)

  • Computation operators, used to produce a value result following evaluation:

  • + (plus)

  • – (minus)

  • * (multiply)

  • / (divide)

  • Assignment operator, used to assign a value to an operand

  • := (supports assignment to NULL, value := NULL)


Functions data types operators 2

Functions, data types, operators (2)

N

  • nvl(Boolean, Boolean) – if first argument value is null return second argument

  • nvl(Double, Double)

  • nvl(String, String)

    O

  • occurs(String S, String s) – returns the number of occurences of s in S

    P

  • printf(String, String/Double…) – offers the capabilities of the printf method

  • pow(Double N, Double n) – N**n

    R

  • right(String S, Double N) – returns the last N characters from S

  • round(Double N, Double n) – round N till n decimals (Ex: round(4.46, 1) => 4.5)

  • rtrim(String) – trim right side of String

    S

  • str(Double N1, Double N2, Double N3) – Ex: str(30.25, 7, 3)= “_BLANK_30.250“

  • strToDouble(String) – convert a String into a Double – return null if String cannot be converted

  • substring(String, Double N, Double n) – substring starting from N, counting n characters

  • sum(Double List) – sum of elements from list

  • sqrt(Double) – returns the square root of the value

    T

  • trim(String) – trim String

  • transcode (“targetField”, lookupReference, “lookupField”, lookupValue) – performs a lookup based on the specified field and value and returns the value of the target field on the matching row

    U

  • upper(String) – switch to upper case

  • uniqueInList(List) – checks if a list contains unique values

I

  • in(Boolean, Boolean List) – check to see if a value is inside a list

  • in(Double, Double List)

  • in(String, String List)

  • identicalInList(List) – checks that all elements in a list are identical

  • isIdentical(List, List, …) – returns a boolean indicating if each list contains identical elements (elements are identical inside a single list) – Ex: isIdentical(price[*],quantity[*],value[*])

  • isUnique(List, List, …) – returns a boolean indicating if the combination of elements from each list for all the index positions are unique– Ex: isUnique(price[*],quantity[*],value[*])

  • isTrue(Boolean) – checks if a boolean is true

  • isNull(Boolean/String/Double) – checks if a value is NULL

  • inLookup(value1, value2, …, datasetReference, “fieldName1”, “fieldName2”,… ) – returns a boolean indicating whether or not the value or combination of values is defined for the fieldname or combination of fieldnames in the specified dataset

    L

  • lastIndexOf(String toSearch, String searchIn, Double startingFrom) – last occurance of the toSearch String in the searchIn String

  • length(String) – returns the length of the String

  • like(String, String) – compares two strings in a SQL manner

  • log(Double)

  • ln(Double)

  • ltrim(String) – trim left side of String

  • lower(String) – switch to lower case

  • left(String S, Double N) – returns the first N characters from S

    M

  • max(Double List) – maximum value from a list

  • min(Double List) – minimum value from a list

  • mean(Double List) – computes the average value not counting null values

  • missingMean(Double List) – computes the average value counting null values as zero

  • mod(Double N, Double n) – N%n


2013 functionalities

2013 functionalities

  • Scalability improvements

  • Gesmes full integration

  • Internationalisation and interface improvements


Thank you for your attention

THANK YOU FOR YOUR ATTENTION

EDIT? Hmm


  • Login