Data Standards Workflow

  • Extract: raw data

    • Store raw data in subversion to keep track of history

  • Transform: scripts

    • Add meta information

    • Script to convert raw data into netcdf

  • Load: database

    • Stored files (netcdf) accessible through the web

  • Provide: charts & maps

Tools and websites

  • OpenEarthRawData

  • OpenEarth OPeNDAP

  • OpenEarthTools


Transform

  • Add metadata

  • Store in netcdf

  • Save script in subversion


Transform

Add metadata

  • Use the INSPIRE metadata editor to store information about the dataset.

    • http://www.inspire-geoportal.eu/inspireEditor.htm

    • Click "Launch editor"


Transform – add metadata

validation

Turn validation on


Transform – add metadata

File identification

Location in subversion

micore


Transform – add metadata

quality

History of your data.


Transform – add metadata

constraints

Please fill in limitations of use.


Transform – add metadata

Save metadata file

  • Save metadata file (local)

  • Add to subversion (local)

  • Commit => metadata into subversion (remote)

  • Store in course/Pcnumber/inspire_description.xml


Transform

Store in netcdf

  • What’s netcdf?

  • Write a script to transform data into netcdf

  • Using CF convention


Transform – store in netcdf - netcdf

What is netcdf

  • Data format defined by Unidata

  • Data store used for coverage data and multidimensional data

  • CF Metadata convention


Transform – store in netcdf - netcdf

What is netcdf

  • An array-based data structure for storing multidimensional data

  • N-dimensional coordinate systems

    • X coordinate (e.g. longitude)

    • Y coordinate (e.g. latitude)

    • Z coordinate (e.g. altitude)

    • Time dimension

    • … other dimensions

  • Variables – support for multiple variables

    • Temperature, humidity, pressure, salinity, etc.

  • Geometry – implicit or explicit

    • Regular grid (implicit)

    • Irregular grid

    • Points
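
To make the array-based model concrete, here is a minimal sketch in Python (standard library only, rather than the Matlab used later in the slides). The variable name, the coordinate values and the temperatures are all made up for illustration; a real netcdf file holds the same pieces (dimensions, coordinate variables, data variables) in binary form.

```python
# Illustrative only: a tiny in-memory picture of the netcdf data model.
# All names and values below are assumptions, not data from the course.

# Dimensions: name -> length
dimensions = {"time": 2, "y": 2, "x": 3}

# Coordinate variables: one 1-D array per dimension
coordinates = {
    "time": [1999, 2000],
    "y": [52.0, 52.5],
    "x": [4.0, 4.5, 5.0],
}

# A data variable is an N-dimensional array plus the dimensions it spans.
temperature = {
    "dimensions": ("time", "y", "x"),
    "values": [
        [[10.1, 10.4, 10.9], [9.8, 10.0, 10.2]],
        [[11.0, 11.2, 11.5], [10.6, 10.7, 10.9]],
    ],
}


def shape(nested):
    """Return the shape of a regular nested list, e.g. (2, 2, 3)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def value_at(variable, **index):
    """Look up one value by naming an index for every dimension."""
    cell = variable["values"]
    for dim in variable["dimensions"]:
        cell = cell[index[dim]]
    return cell


# The array shape must match the lengths of the dimensions it spans.
expected = tuple(dimensions[d] for d in temperature["dimensions"])
assert shape(temperature["values"]) == expected

print(value_at(temperature, time=1, y=0, x=2))
```

On a regular grid the geometry is implicit: the position of a value follows from its indices and the coordinate arrays, so no per-value coordinates need to be stored.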


Transform – store in netcdf - netcdf

Storing Multidimensional Data

[Figure: a block of data along the X, Y and Z axes, labelled "14 numbers" and "32 numbers"]


Transform – store in netcdf - netcdf

Data Model

Data model for netcdf and other formats.

Also usable for HDF, OPeNDAP, GRIB, etc. See the netCDF Java library for details.


Transform – store in netcdf – netcdf - applications

ArcGIS

ArcGIS also reads and writes netcdf files.


Transform – store in netcdf - netcdf

Your favorite text editor

xml representation of a netcdf file


Transform – store in netcdf - netcdf

Other Tools

Not so stable.

Very useful

IDV

NCO

# diff

ncdiff -v time file1.nc file2.nc diff.nc # the third file name is the output (diff.nc is a placeholder)

# compression & packing

ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% lossy compression)

# selecting variables by regex

ncks -v '^Q..' in.nc # Q01--Q99, QAA--QZZ, etc.

Web hyperslabs, cool!
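
Assuming "web hyperslabs" refers to subsetting a remote file through OPeNDAP, the request is just a URL with a constraint of the form `variable[start:stride:stop]` per dimension. The sketch below, in Python, only builds such a URL; the server address is a placeholder, and the variable name `height` is borrowed from the transect example later in these slides.

```python
def hyperslab_url(dataset_url, variable, slices):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices holds one (start, stride, stop) tuple per dimension.
    Note that stop is inclusive in DAP2, unlike Python slicing.
    The ".ascii" suffix asks the server for a plain-text response.
    """
    constraint = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}.ascii?{variable}{constraint}"


# Hypothetical server and file name, for illustration only.
url = hyperslab_url(
    "http://example.org/opendap/transect.nc",
    "height",
    [(0, 1, 2), (0, 1, 9)],  # all 3 years, first 10 points
)
print(url)
```

Only the requested block travels over the network, which is what makes hyperslabs attractive for large files.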


Transform – store in netcdf - script

Write script

  • Read raw data

    • Read header line

    • Read data

    • Read all data

    • Create function to read all data

    • Use function in Matlab

  • Raw data into empty netcdf file

    • Create empty netcdf file

    • Add dimensions and variables

    • Store variables

  • Read values


Transform – store in netcdf - script

Reading raw data into memory

  • Use a Matlab function such as the following to read the file data into an array

    • fscanf


Transform – store in netcdf - script

Example: Transect.txt file

Location: OpenEarthRawData\course\example\raw

Each block starts with a header line (year, number of points), followed by the points as X Z X Z … pairs. Unused positions on the last line of a block are filled with 99999999999.

1999 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221
-110 8231 -105 9841 -100 10971 -95 12171 -90 12951
…
200 -2415 210 -2995 220 -3595 99999999999 99999999999
2000 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221
-110 8231 -105 9841 -100 10971 -95 12171 -90 12951
…


Transform – store in netcdf - script

Read header line

>> fid = fopen('..\raw\transect.txt')
fid =
    15
>> header = fscanf(fid, '%d', 2)
header =
    2000
    58
>> year = header(1)
year =
    2000
>> npoint = header(2)
npoint =
    58


Transform – store in netcdf - script

Read data

1. Read the data

>> % read data
>> data = fscanf(fid, '%d', npoint*2)
data =
    -150
    3741
    -140
    3581
    -135

2. Reshape into two rows (X on top, Z below)

>> data = reshape(data, [2, npoint])
data =
Columns 1 through 7
    -150 -140 -135 -130
    3741 3581 3531 3541

3. Transpose so that X and Z become column vectors

>> % use column vectors
>> data = data'
data =
    -150 3741
    -140 3581
    -135 3531
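
For readers who want to check the reshape-and-transpose step outside Matlab, here is the same idea as a small Python sketch (standard library only). It starts from the first values of the `fscanf` output shown above and splits the interleaved X Z X Z … sequence into one row per point.

```python
# First values of the fscanf output shown above: X Z X Z ...
data = [-150, 3741, -140, 3581, -135, 3531]

x = data[0::2]  # every second value, starting at the first: X
z = data[1::2]  # every second value, starting at the second: Z

# One (X, Z) row per point, the same layout as data' in Matlab.
rows = list(zip(x, z))
print(rows)
```

Slicing with a step of two plays the role of `reshape(data, [2, npoint])`: it relies on the values being strictly interleaved, exactly as the Matlab version does.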


Transform – store in netcdf - script

Read all data

% preallocate all data (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the line (fill values)
    fgetl(fid);
    i = i + 1;
end
% close the file again
fclose(fid);


Transform – store in netcdf - script

Create a function

function transect = readtransect(filename)
% preallocate all data (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen(filename);
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the line (fill values)
    fgetl(fid);
    i = i + 1;
end
% close the file again
fclose(fid);
transect = struct('series', transectseries, ...
    'distance', coastward_distance, 'time', time);
end


Transform – store in netcdf - script

Use the new function

>> data = readtransect('..\raw\transect.txt')

data =

series: [3x58 double]

distance: [58x1 double]

time: [3x1 double]
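
The Matlab function hard-codes the sizes (3 years, 58 points). As a hedged alternative, the Python sketch below (standard library only) reads the same block layout without knowing the sizes in advance. The function name `read_transect` and the shortened sample text are assumptions made for illustration: the sample uses two pairs per line instead of five, and it assumes, as in the example file, that the last line of a block is padded with 99999999999.

```python
import io


def read_transect(stream):
    """Read every 'year npoint' block; return series, distance and time."""
    time, series, distance = [], [], []
    while True:
        header = stream.readline().split()
        if not header:  # end of file, or a blank line
            break
        year, npoint = int(header[0]), int(header[1])
        numbers = []
        while len(numbers) < 2 * npoint:
            line = stream.readline()
            if not line:
                raise ValueError(f"file ends inside the {year} block")
            numbers.extend(int(token) for token in line.split())
        numbers = numbers[: 2 * npoint]  # ignore padding on the last line
        distance = numbers[0::2]         # X values, kept from the last block
        series.append(numbers[1::2])     # Z values, one row per year
        time.append(year)
    return {"series": series, "distance": distance, "time": time}


# Shortened, partly made-up sample in the layout of transect.txt.
sample = io.StringIO(
    "1999 3\n"
    "-135 3531 -130 3541\n"
    "-125 3631 99999999999 99999999999\n"
    "2000 3\n"
    "-135 3531 -130 3541\n"
    "-125 3631 99999999999 99999999999\n"
)

transect = read_transect(sample)
print(transect["time"], transect["distance"])
```

Like the Matlab version, this keeps only one distance vector, so it assumes every year is measured at the same X positions.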


Transform – store in netcdf - script

Loading data into netcdf

  • What does a netcdf file look like

  • Required meta information


Transform – store in netcdf - script

Netcdf file

transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200,
        210, 220 ;
    year = 1999, 2000, 2001 ;
    height =
        353, 354, … -142, -146, -170, -206, -232, -273, -309, -346,
        -375, -388,
        -32, … -92, -110, -127, -143, -156, -177, -211, -259,
        -303, -334 ;
}
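
The text listing above is CDL, the plain-text notation for netcdf files. As a rough illustration of how dimensions, variables and data fit together, the Python sketch below (standard library only) renders a small transect in the same layout. The function name `to_cdl` and the second row of heights are made up; the resulting text could then be turned into a binary file with the standard `ncgen` utility, for example `ncgen -o transect.nc transect.cdl`.

```python
def to_cdl(name, distance, years, height):
    """Render transect data as CDL text in the layout shown above."""

    def join(values):
        return ", ".join(str(v) for v in values)

    # One line of heights per year, separated by commas.
    rows = ",\n    ".join(join(row) for row in height)
    return (
        f"netcdf {name} {{\n"
        "dimensions:\n"
        f"  coastward = {len(distance)} ;\n"
        f"  time = {len(years)} ;\n"
        "variables:\n"
        "  float coastward_distance(coastward) ;\n"
        '    coastward_distance:unit = "metre" ;\n'
        "  float year(time) ;\n"
        '    year:unit = "year" ;\n'
        "  float height(time, coastward) ;\n"
        '    height:unit = "metre" ;\n'
        "data:\n"
        f"  coastward_distance = {join(distance)} ;\n"
        f"  year = {join(years)} ;\n"
        f"  height =\n    {rows} ;\n"
        "}\n"
    )


# Two points and two years only; the second height row is invented.
cdl = to_cdl("transect", [-135, -130], [1999, 2000], [[353, 354], [350, 352]])
print(cdl)
```

The dimension lengths are derived from the data rather than typed in, so the header and the data section cannot drift apart.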


Transform – store in netcdf - script

Create an empty netcdf file

>> outputfile = 'transect.nc';
>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
variables:
}


Transform – store in netcdf - script

Add dimensions

nc_add_dimension(outputfile, 'coastward', 58)
nc_add_dimension(outputfile, 'time', 3)
nc_dump(outputfile)

netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
}

help nc_add_dimension


Transform – store in netcdf - script

Add variables

% coastward distance along the transect
coastwardVariable = struct(...
    'Name', 'coastward_distance', ...
    'Nctype', 'float', ...
    'Dimension', {{'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, coastwardVariable);

% year of each measurement
timeVariable = struct(...
    'Name', 'year', ...
    'Nctype', 'float', ...
    'Dimension', {{'time'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'year') ...
    );
nc_addvar(outputfile, timeVariable);

% height as a function of time and coastward distance
heightVariable = struct(...
    'Name', 'height', ...
    'Nctype', 'float', ...
    'Dimension', {{'time', 'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, heightVariable);

nc_dump(outputfile)

help nc_addvar


Transform – store in netcdf - script

Result

netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward), shape = [58]
        coastward_distance:unit = "metre"
    float year(time), shape = [3]
        year:unit = "year"
    float height(time,coastward), shape = [3 58]
        height:unit = "metre"
}


Transform – store in netcdf - script

Store variables

nc_varput(outputfile, 'height', data.series)

nc_varput(outputfile, 'year', data.time)

nc_varput(outputfile, 'coastward_distance', data.distance)

help nc_varput


Transform – store in netcdf - script

Result: Netcdf file

transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200,
        210, 220 ;
    year = 1999, 2000, 2001 ;
    height =
        353, 354, … -142, -146, -170, -206, -232, -273, -309, -346,
        -375, -388,
        -32, … -92, -110, -127, -143, -156, -177, -211, -259,
        -303, -334 ;
}


Transform – store in netcdf - script

Read values

surface(nc_varget(outputfile, 'height')')


Transform – store in netcdf - convention

CF convention

Standard used by USGS, NOAA, ArcGIS, GDAL

Climate and Forecast (CF) Convention

http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html

Initially developed for

  • Climate and forecast data

  • Atmosphere, surface and ocean model-generated data

  • Also used for observational datasets

  • CF is the most widely used convention for geospatial netCDF data.


Transform – store in netcdf - convention

Improve output

  • Store extra attributes

    • Title

    • Author

    • Standard_name
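
As a hedged illustration of what "improved output" might look like, the Python sketch below (standard library only) collects CF-style attributes for the transect example. Note that CF expects the attribute name `units` (plural), where the earlier slides write `unit`, and that time is normally stored as a count since a reference date rather than as a bare year. The reference date, the title and author texts, and the choice of `altitude` as the standard name for `height` are all assumptions made for this sketch; `author` follows the slide and is not one of the global attributes that CF itself describes.

```python
from datetime import date

EPOCH = date(1970, 1, 1)  # reference date chosen for this sketch


def year_to_days(year):
    """Days between the reference date and 1 January of the given year."""
    return (date(year, 1, 1) - EPOCH).days


# Attributes of the file as a whole.
global_attributes = {
    "Conventions": "CF-1.6",
    "title": "Example transect heights",  # placeholder text
    "author": "Your name",                # placeholder text
}

# Attributes per variable; the standard names are assumptions.
variable_attributes = {
    "time": {"standard_name": "time", "units": "days since 1970-01-01"},
    "coastward_distance": {"long_name": "coastward distance", "units": "m"},
    "height": {"standard_name": "altitude", "units": "m"},
}

# The years from the transect example, expressed in the time units above.
time_values = [year_to_days(year) for year in (1999, 2000, 2001)]
print(time_values)
```

With a `units` string of this form, CF-aware tools can interpret the time axis without knowing anything about the dataset.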


Transform – save script

Save script

  • Save script (local, using matlab https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/)

  • Add to subversion (local)

  • Commit => script into subversion (remote)

