Data standards workflow
[Diagram: the workflow stages Extract, Transform, Load and Provide, connecting Raw data, Scripts, Database, and Charts & Maps]

  • Store raw data in Subversion to keep track of its history

  • Add meta information and write a script to convert the raw data into netCDF

  • Make the stored netCDF files accessible through the web

Tools and websites

  • OpenEarthRawData

  • OpenEarth OPeNDAP

  • OpenEarthTools


Transform

  • Add metadata

  • Store in netcdf

  • Save script in subversion


Add metadata

  • Use the INSPIRE metadata editor to store information about the dataset.

    • http://www.inspire-geoportal.eu/inspireEditor.htm

    • Click Launch Editor.


Validation

Turn validation on.


File identification

Enter the location of the data in Subversion (example value: micore).


Quality

Describe the history of your data.


Constraints

Fill in the limitations of use.


Save metadata file

  • Save the metadata file (local)

  • Add it to Subversion (local)

  • Commit the metadata to Subversion (remote)

  • Store it as course/Pcnumber/inspire_description.xml
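The save, add and commit steps can also be driven from MATLAB by calling the Subversion command-line client with `system`. A minimal sketch, assuming `svn` is installed and on the path and that MATLAB's current folder is the root of your OpenEarthRawData working copy; `PCnumber` and the commit message are placeholders. The commands are only built, not run, until you set `runCommands` to true.

```matlab
% Placeholder path inside the working copy: replace PCnumber with your own.
metadataFile = fullfile('course', 'PCnumber', 'inspire_description.xml');

% Build the Subversion commands; double quotes protect paths with spaces.
addCmd    = sprintf('svn add "%s"', metadataFile);
commitCmd = sprintf('svn commit -m "%s" "%s"', ...
    'Add INSPIRE metadata description', metadataFile);

runCommands = false;   % set to true to execute the commands
if runCommands
    % Schedule the file for addition in the working copy (local).
    [status, out] = system(addCmd);
    disp(out)
    if status == 0
        % Send the file to the repository (remote).
        [status, out] = system(commitCmd);
        disp(out)
    end
end
```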


Store in netcdf

  • What is netCDF?

  • Write a script to transform data into netCDF

  • Use the CF convention


What is netCDF?

  • A data format defined by Unidata

  • A data store for coverage data and other multidimensional data

  • Typically used together with the CF metadata convention


What is netcdf (continued)

[Figure: data cube with X, Y and Z axes and a time (T) axis]

  • An array-based data structure for storing multidimensional data

  • N-dimensional coordinate systems

    • X coordinate (e.g. longitude)

    • Y coordinate (e.g. latitude)

    • Z coordinate (e.g. altitude)

    • Time dimension

    • … other dimensions

  • Variables – support for multiple variables

    • Temperature, humidity, pressure, salinity, etc.

  • Geometry – implicit or explicit

    • Regular grid (implicit)

    • Irregular grid

    • Points
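To make the array model concrete, here is a minimal sketch that uses MATLAB's built-in netCDF functions (`nccreate`, `ncwrite`, `ncread`, `ncinfo`) rather than the `nc_*` toolbox functions used later in this course. It defines X, Y, Z and time dimensions and stores two variables on the same regular grid; the grid is implicit because only the 1-D coordinate axes are written, not a coordinate for every cell. The file name, grid sizes, spacing and variable names are illustrative assumptions.

```matlab
filename = 'grid_example.nc';          % hypothetical output file
if exist(filename, 'file'), delete(filename); end

nx = 4; ny = 3; nz = 2; nt = 5;        % illustrative grid sizes

% Coordinate variables: one 1-D variable per dimension, named after it.
nccreate(filename, 'x',    'Dimensions', {'x', nx});
nccreate(filename, 'y',    'Dimensions', {'y', ny});
nccreate(filename, 'z',    'Dimensions', {'z', nz});
nccreate(filename, 'time', 'Dimensions', {'time', nt});

% Two data variables sharing the same 4-D grid (MATLAB order: x, y, z, time).
dims = {'x', nx, 'y', ny, 'z', nz, 'time', nt};
nccreate(filename, 'temperature', 'Dimensions', dims);
nccreate(filename, 'salinity',    'Dimensions', dims);

% Regular grid: the axes alone define every cell position.
ncwrite(filename, 'x', (0:nx-1)' * 100);   % 100 m spacing (assumed)
ncwrite(filename, 'y', (0:ny-1)' * 100);
ncwrite(filename, 'z', [-1; -2]);
ncwrite(filename, 'time', (1:nt)');
ncwrite(filename, 'temperature', 10 + rand(nx, ny, nz, nt));
ncwrite(filename, 'salinity',    35 * ones(nx, ny, nz, nt));

% List the variables stored in the file.
info = ncinfo(filename);
disp({info.Variables.Name})
```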


Storing Multidimensional Data

[Figure: grid with X, Y and Z axes; annotations compare a storage size of 14 numbers with one of 32 numbers]


Data Model

The Common Data Model is used for netCDF and other formats.

It also covers HDF, OPeNDAP, GRIB, etc. See the netCDF-Java library, which implements it, for details.
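In MATLAB, `ncinfo` returns the building blocks of this data model (dimensions, variables and attributes) as nested structures, and it also accepts an OPeNDAP URL instead of a local file name. A minimal sketch, assuming the built-in MATLAB netCDF functions are available; the small file it creates, with a hypothetical `depth` variable on a `station` dimension, exists only to have something to inspect.

```matlab
filename = 'model_example.nc';   % hypothetical file name
if exist(filename, 'file'), delete(filename); end
nccreate(filename, 'depth', 'Dimensions', {'station', 3});
ncwrite(filename, 'depth', [5; 7; 9]);
ncwriteatt(filename, 'depth', 'units', 'm');
ncwriteatt(filename, '/', 'title', 'Data model example');

info = ncinfo(filename);

% Dimensions: name and length.
for k = 1:numel(info.Dimensions)
    fprintf('dimension %s = %d\n', info.Dimensions(k).Name, info.Dimensions(k).Length);
end

% Variables: name, dimensions and attributes.
for k = 1:numel(info.Variables)
    v = info.Variables(k);
    fprintf('variable %s(%s)\n', v.Name, strjoin({v.Dimensions.Name}, ', '));
    for m = 1:numel(v.Attributes)
        fprintf('  %s:%s = %s\n', v.Name, v.Attributes(m).Name, num2str(v.Attributes(m).Value));
    end
end

% Global attributes.
for m = 1:numel(info.Attributes)
    fprintf('global :%s = %s\n', info.Attributes(m).Name, num2str(info.Attributes(m).Value));
end
```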


ArcGIS

ArcGIS can also read and write netCDF files.


Your favorite text editor

An XML representation of a netCDF file, such as NcML, can be viewed and edited in any text editor.


Other Tools

IDV: not so stable.

NCO: very useful. For example:

# diff
ncdiff -v time file1.nc file2.nc

# compression & packing
ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% lossy compression)

# selecting variables by regex
ncks -v '^Q..' in.nc # Q01--Q99, QAA--QZZ, etc.

NCO can also extract hyperslabs from files served over the web, which is very convenient.
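A hyperslab is a regular subset of an array, given by a start index, a count and a stride per dimension. MATLAB's built-in `ncread` takes exactly these arguments and works the same way on a local file or an OPeNDAP URL. A minimal sketch on a small local file; the file name, the variable `h` and its values are illustrative.

```matlab
filename = 'hyperslab_example.nc';   % hypothetical file; an OPeNDAP URL works the same way
if exist(filename, 'file'), delete(filename); end
nccreate(filename, 'h', 'Dimensions', {'x', 6, 'time', 4});
ncwrite(filename, 'h', reshape(1:24, 6, 4));   % element (r, c) = (c-1)*6 + r

% Start at x = 2, time = 1; read 3 x-values and 2 time steps,
% taking every second time step (stride 2 along time).
start  = [2 1];
count  = [3 2];
stride = [1 2];
sub = ncread(filename, 'h', start, count, stride);
disp(sub)
```

Only the selected 3 x 2 block is read, which matters most when the source is a large remote dataset.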


Write script

  • Read the raw data

    • Read the header line

    • Read the data

    • Read all data

    • Create a function to read all data

    • Use the function in MATLAB

  • Write the raw data into an empty netCDF file

    • Create an empty netCDF file

    • Add dimensions and variables

    • Store the variables

  • Read the values back


Reading raw data into memory

  • Use a MATLAB file-reading function to read the file data into an array, for example:

    • fscanf


Example: transect.txt file

Each record starts with a header line (year, number of points), followed by the points as X Z pairs, and is terminated by an end-of-record marker (99999999999 99999999999):

1999 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221
-110 8231 -105 9841 -100 10971 -95 12171 -90 12951
…
200 -2415 210 -2995 220 -3595 99999999999 99999999999
2000 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221
-110 8231 -105 9841 -100 10971 -95 12171 -90 12951
…

Location: OpenEarthRawData\course\example\raw
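To try the reading steps without access to the repository, a small file in the same layout can be written with `fprintf`. A minimal sketch: the file name is hypothetical, and it keeps only the first three points of the 1999 and 2000 records shown above, so each header says 3 instead of 58.

```matlab
filename = 'transect_sample.txt';      % hypothetical file name
years    = [1999 2000];
x        = [-135 -130 -125];           % first three X values from the example
z        = [3531 3541 3631; ...        % Z values for 1999
            3531 3541 3631];           % Z values for 2000
sentinel = 99999999999;                % end-of-record marker

fid = fopen(filename, 'w');
for i = 1:numel(years)
    fprintf(fid, '%d %d\n', years(i), numel(x));  % header: year, number of points
    xz = [x; z(i, :)];                             % 2 x npoint: X on row 1, Z on row 2
    fprintf(fid, '%d %d ', xz);                    % X Z pairs (column-major order)
    fprintf(fid, '%d %d\n', sentinel, sentinel);   % marker closes the record line
end
fclose(fid);
type(filename)
```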


Read header line

>> fid = fopen('..\raw\transect.txt')
fid =
    15
>> header = fscanf(fid, '%d', 2)
header =
        2000
          58
>> year = header(1)
year =
        2000
>> npoint = header(2)
npoint =
    58


Read data

Step 1: read the X Z pairs of one record

>> % read data
data = fscanf(fid, '%d', npoint*2)
data =
  -150
  3741
  -140
  3581
  -135

Step 2: reshape into a 2 x npoint matrix

>> data = reshape(data, [2, npoint])
data =
  Columns 1 through 7
  -150  -140  -135  -130
  3741  3581  3531  3541

Step 3: transpose into column vectors

>> % use column vectors
data = data'
data =
  -150  3741
  -140  3581
  -135  3531

Combined, the body of the reading loop is:

% read header
header = fscanf(fid, '%d', 2);
year = header(1);
% store year in time
time(i) = year;
npoint = header(2);
% read data
data = fscanf(fid, '%d', npoint*2);
data = reshape(data, [2, npoint]);
% use column vectors
data = data';


Read all data

% preallocate all data: rows = time (3 years), columns = coastward points (58)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);

% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while ~feof(fid)
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the line (end-of-record marker)
    fgetl(fid);
    i = i + 1;
end
% close the file
fclose(fid);


Create a function

function transect = readtransect(filename)
% READTRANSECT reads all yearly records of a transect text file.

% preallocate all data: rows = time, columns = coastward points
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);

% open file and get file id
fid = fopen(filename);
i = 1;
while ~feof(fid)
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the line (end-of-record marker)
    fgetl(fid);
    i = i + 1;
end
% close the file
fclose(fid);

transect = struct('series', transectseries, ...
    'distance', coastward_distance, 'time', time);
end
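`readtransect` preallocates exactly 3 years and 58 points, and its `feof` test fails with an index error if the file ends with extra blank lines. A minimal sketch of a reading loop without those assumptions: it grows the arrays one record at a time and stops as soon as no complete header can be read. It still assumes every record has the same number of points; the tiny test file it writes first (`transect_small.txt`) and its values are hypothetical.

```matlab
% Hypothetical two-record test file in the transect.txt layout.
filename = 'transect_small.txt';
fid = fopen(filename, 'w');
fprintf(fid, '1999 3\n-10 5 0 3 10 -2 99999999999 99999999999\n');
fprintf(fid, '2000 3\n-10 6 0 2 10 -3 99999999999 99999999999\n');
fclose(fid);

time = [];
series = [];
distance = [];
fid = fopen(filename, 'r');
while true
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break                                 % no complete header left: end of data
    end
    npoint = header(2);
    data = fscanf(fid, '%d', [2, npoint])';  % npoint x 2: columns X, Z
    fgetl(fid);                               % skip the end-of-record marker
    time(end+1, 1) = header(1);               % grow by one year
    series(end+1, :) = data(:, 2)';           % one row of heights per year
    distance = data(:, 1);                    % same X positions in every record
end
fclose(fid);

transect = struct('series', series, 'distance', distance, 'time', time);
```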


Use the new function

>> data = readtransect('..\raw\transect.txt')
data =
      series: [3x58 double]
    distance: [58x1 double]
        time: [3x1 double]


Loading data into netcdf

  • What does a netCDF file look like?

  • Required meta information


Netcdf file

transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200,
        210, 220 ;
    year = 1999, 2000, 2001 ;
    height =
        353, 354, … -142, -146, -170, -206, -232, -273, -309, -346,
        -375, -388,
        -32, … -92, -110, -127, -143, -156, -177, -211, -259,
        -303, -334 ;
}


Create an empty netcdf file

>> outputfile = 'transect.nc';
>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
variables:
}
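For comparison, MATLAB's built-in low-level interface (the `netcdf` package) can create the same empty file. A minimal sketch; the file name is a placeholder, and `NC_CLOBBER` overwrites any existing file with that name.

```matlab
outputfile = 'transect_empty.nc';                % placeholder file name
ncid = netcdf.create(outputfile, 'NC_CLOBBER');  % new file, opened in define mode
netcdf.close(ncid);                              % write the (empty) header and close
ncdisp(outputfile)                               % no dimensions, no variables
```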


Add dimensions

nc_add_dimension(outputfile, 'coastward', 58)
nc_add_dimension(outputfile, 'time', 3)
nc_dump(outputfile)

netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
}

help nc_add_dimension
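The same two dimensions can be defined with MATLAB's built-in low-level functions `netcdf.create` and `netcdf.defDim`. A minimal sketch; the file name is a placeholder, and the file stays in define mode until `netcdf.close`.

```matlab
outputfile = 'transect_dims.nc';                 % placeholder file name
ncid = netcdf.create(outputfile, 'NC_CLOBBER');  % overwrite if it exists
coastwardDimId = netcdf.defDim(ncid, 'coastward', 58);
timeDimId      = netcdf.defDim(ncid, 'time', 3);
netcdf.close(ncid);                              % leave define mode and close
ncdisp(outputfile)
```

The returned dimension IDs are what `netcdf.defVar` expects later when variables are defined on these dimensions.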


Add variables

coastwardVariable = struct( ...
    'Name', 'coastward_distance', ...
    'Nctype', 'float', ...
    'Dimension', {{'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
);
nc_addvar(outputfile, coastwardVariable);

timeVariable = struct( ...
    'Name', 'year', ...
    'Nctype', 'float', ...
    'Dimension', {{'time'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'year') ...
);
nc_addvar(outputfile, timeVariable);

heightVariable = struct( ...
    'Name', 'height', ...
    'Nctype', 'float', ...
    'Dimension', {{'time', 'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
);
nc_addvar(outputfile, heightVariable);

nc_dump(outputfile)

help nc_addvar
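With MATLAB's built-in high-level functions, `nccreate` defines a variable (creating the file and any missing dimensions) and `ncwriteatt` adds attributes. A minimal sketch that reproduces the three variables above; the file name is a placeholder. MATLAB lists dimensions in column-major order, so the CDL declaration `height(time, coastward)` is written here as `{'coastward', 58, 'time', 3}`.

```matlab
outputfile = 'transect_vars.nc';   % placeholder file name
if exist(outputfile, 'file'), delete(outputfile); end

% 'single' is MATLAB's name for the netCDF float type.
nccreate(outputfile, 'coastward_distance', 'Dimensions', {'coastward', 58}, ...
    'Datatype', 'single', 'Format', 'classic');
nccreate(outputfile, 'year', 'Dimensions', {'time', 3}, 'Datatype', 'single');
% Column-major order: shows up in CDL as height(time, coastward).
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3}, ...
    'Datatype', 'single');

% Same 'unit' attributes as in the nc_addvar example.
ncwriteatt(outputfile, 'coastward_distance', 'unit', 'metre');
ncwriteatt(outputfile, 'year', 'unit', 'year');
ncwriteatt(outputfile, 'height', 'unit', 'metre');
ncdisp(outputfile)
```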


Result

netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward), shape = [58]
        coastward_distance:unit = "metre"
    float year(time), shape = [3]
        year:unit = "year"
    float height(time,coastward), shape = [3 58]
        height:unit = "metre"
}


Store variables

nc_varput(outputfile, 'height', data.series)
nc_varput(outputfile, 'year', data.time)
nc_varput(outputfile, 'coastward_distance', data.distance)

help nc_varput
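The built-in counterpart of `nc_varput` is `ncwrite`. A minimal sketch, assuming the variables are defined with `nccreate` from MATLAB's high-level interface; `data` holds placeholder values with the same shapes as the output of `readtransect`. Because the MATLAB-side shape of `height` is coastward x time (58 x 3) while `data.series` is time x coastward (3 x 58), the series is transposed before writing.

```matlab
outputfile = 'transect_store.nc';   % placeholder file name
if exist(outputfile, 'file'), delete(outputfile); end

% Placeholder data with the shapes returned by readtransect.
data = struct('series', reshape(1:174, 3, 58), ...   % 3 x 58, time x coastward
              'distance', (1:58)', ...               % 58 x 1
              'time', [1999; 2000; 2001]);           % 3 x 1

nccreate(outputfile, 'coastward_distance', 'Dimensions', {'coastward', 58}, 'Datatype', 'single');
nccreate(outputfile, 'year',   'Dimensions', {'time', 3}, 'Datatype', 'single');
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3}, 'Datatype', 'single');

% Transpose the series to coastward x time; cast to the file's float type.
ncwrite(outputfile, 'height', single(data.series'));
ncwrite(outputfile, 'year', single(data.time));
ncwrite(outputfile, 'coastward_distance', single(data.distance));
```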


Result: Netcdf file

The resulting transect.nc contains the same dimensions, variables, attributes and data as the target file shown on the "Netcdf file" slide.


Read values

surface(nc_varget(outputfile, 'height')')
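With the built-in `ncread`, the array already comes back as coastward x time (58 x 3), so the transpose used with `nc_varget` is not needed. A minimal sketch; the file name is a placeholder and `peaks` only supplies stand-in values for the height data.

```matlab
outputfile = 'transect_read.nc';   % placeholder file name
if exist(outputfile, 'file'), delete(outputfile); end
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3});
z = peaks(58);                      % stand-in values, not real transect data
ncwrite(outputfile, 'height', z(:, 1:3));

height = ncread(outputfile, 'height');   % 58 x 3: coastward x time
figure
surface(height)                           % no transpose needed here
view(3)
```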


CF convention

A standard used by USGS, NOAA, ArcGIS and GDAL.

Climate and Forecast (CF) Convention

http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html

Initially developed for:

  • Climate and forecast data

  • Atmosphere, surface and ocean model-generated data

CF is also used for observational datasets, and it is the most widely used convention for geospatial netCDF data.


Improve output

  • Store extra attributes, for example:

    • title

    • author

    • standard_name
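A minimal sketch of such extra attributes, written with MATLAB's built-in `ncwriteatt` ('/' addresses the global attributes). CF uses the attribute name `units` (plural), a `Conventions` global attribute, and time expressed as an offset from a reference date such as `days since 1999-01-01`. The convention version, title, author and the `standard_name` chosen for `height` are assumptions to check against the CF documents and the CF standard name table; `author` is not a CF-defined attribute.

```matlab
outputfile = 'transect_cf.nc';   % placeholder file name
if exist(outputfile, 'file'), delete(outputfile); end
nccreate(outputfile, 'time',   'Dimensions', {'time', 3});
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3});

% Global attributes ('/' is the root of the file).
ncwriteatt(outputfile, '/', 'Conventions', 'CF-1.6');                % assumed CF version
ncwriteatt(outputfile, '/', 'title', 'Transect heights (example)');  % placeholder text
ncwriteatt(outputfile, '/', 'author', 'Your Name');                  % placeholder, not a CF attribute

% Variable attributes: CF names are 'units' (plural) and 'standard_name'.
ncwriteatt(outputfile, 'time', 'standard_name', 'time');
ncwriteatt(outputfile, 'time', 'units', 'days since 1999-01-01');
ncwriteatt(outputfile, 'height', 'long_name', 'height along the transect');
ncwriteatt(outputfile, 'height', 'units', 'm');
ncwriteatt(outputfile, 'height', 'standard_name', 'altitude');       % assumption: check the table

ncdisp(outputfile)
```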


Save script

  • Save the script locally from MATLAB, in your working copy of https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/

  • Add it to Subversion (local)

  • Commit the script to Subversion (remote)

