Project sponsors

  • Earth System Grid - DOE/SciDAC

  • Coupled Energetics and Dynamics of Atmospheric Regions - NSF/GEO/ATM

  • Virtual Solar-Terrestrial Observatory - NSF/CISE/SCI

  • Related DODS/OPeNDAP work - NASA and NCAR/HAO

Fox


Overview

  • Report on experience with data ‘systems’ and data ‘frameworks’

  • CEDARWEB

  • Earth System Grid

  • Compare and contrast success in terms of use(rs)

  • Technology integration - when and how does it work and scale?

  • Outline a merged approach for Virtual Observatory concept

CEDARWEB: heritage

  • CEDAR is a large scientific and technical community focusing on the Earth’s middle and upper atmosphere. The program features ground-based observing networks, models and integrative studies. Funded by NSF, in third phase (3rd decade)

  • CEDAR data history

    • Started in 1983 as an incoherent scatter radar database, held as a tape archive (data back to 1966)

    • Grew by late 80’s adding other instruments, models, indices

    • Went on-line in early 90’s (became a single-tiered data system)

    • Web access in 1996, three versions of the interface

  • Holdings - some satellite data, geophysical indices, models (GCM, empirical, tides, etc.), ISRs, HF Radars, Digisondes, FPIs, IR Michelson Interferometers, Spectrometers, Airglow Imagers, All-Sky Cameras, LIDARs, Multi-Channel Photometers, MST Radars, MF Radars, LF Radars, Meteor Wind Radars, Campaigns, Presentations, Surveys, Jobs, Workshops, etc.

  • Community of 600+, with 300+ registered users and ~100 active data users per year

  • NCAR tasked with community support, and especially in the early days to ‘take care’ of the data and work with data providers and users

  • Significant effort in catalogs, metadata, controlled vocabulary

  • System has labored to get past the code/mnemonic schemes and the base data format of the past

CEDAR pre-web

Data query, selection and retrieval interface, without any integrated tools or ability to preview data before retrieving it.

CEDARWEB 3.x

Data query, selection and retrieval interface, with integrated tools, e.g. ability to plot (preview) data before retrieving it.

CEDARWEB 3.1

Ability to quickly plot data to assess suitability, quality, and produce a quick copy with some customization for a preliminary study.

Experience: CEDARWEB

Don’t just provide data, but also build in community information and ancillary information that is of value.

Inside CEDARWEB

  • Rich metadata; categorized

  • OPeNDAP for data access and transport

  • MySQL for catalog and user records

  • HTTPS and cookies for session authentication

  • Script-enabled interface with plotting built in (ION) delivers HTML to browsers

  • ‘Hides’ organizational data record structure (sort of)

  • Low-level data product, but also high-level

  • Disconnect between delivery of data and attributes

  • Today: framework is inside the data system!
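
To make the catalog bullet concrete, here is a minimal sketch of a categorized catalog lookup. It uses Python's built-in sqlite3 as a stand-in for MySQL; the table layout, category names and years are invented for illustration and are not the CEDARWEB schema.

```python
import sqlite3

# In-memory database standing in for the MySQL catalog (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (instrument TEXT, category TEXT, first_year INTEGER)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [
        ("ISR", "radar", 1966),       # made-up rows for the example
        ("FPI", "optical", 1990),
        ("MF Radar", "radar", 1995),
    ],
)

# A controlled vocabulary lets the interface query by category, not by file layout.
rows = conn.execute(
    "SELECT instrument FROM catalog WHERE category = ? ORDER BY instrument",
    ("radar",),
).fetchall()
radar_instruments = [row[0] for row in rows]
print(radar_instruments)
```

The point of the sketch is only that the user selects by category and instrument while the record structure stays hidden behind the query.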

Experience: CEDARWEB

CEDARWEB has been developed and improved over more than 10 years of interaction with users, data providers, and a community steering committee. Each of these elements has directly contributed to changes in what services are provided, what information and materials are made available via the web site and what levels of authorization and authentication are required.

Biggest lesson: the systems approach has worked because of the heritage of the data collection, but users (especially new or very experienced ones) see a barrier to entry and don’t understand where the system starts and stops.

http://cedarweb.hao.ucar.edu

Earth System Grid Overview

  • The goal of ESG is to make climate data – particularly climate model data – an easily accessible community resource. The project is funded by the SciDAC program: Scientific Discovery through Advanced Computing.

  • Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.

  • Multiple interfaces to ESG will allow researchers to focus on science rather than issues of data transfer, format, and data set manipulation.

  • Foundation is Globus Grid technology

ESG: U.S. Collaborations & Development

  • ANL: Computational grids & grid-based applications

  • LBNL: Climate storage facility

  • LLNL: Model diagnostics & inter-comparison

  • USC/ISI: Computational grids & grid-based applications

  • ORNL: Climate storage & computational resources

  • LANL: Next generation coupled models & computing

  • NCAR: Climate change prediction and scenarios


ESG leverages existing software and projects

  • DODS/OPeNDAP: Distributed Oceanographic Data System (Unidata)

    • Integrations of Globus GridFTP, DODS data access

  • THREDDS: THematic Real‑time Environmental Distributed Data Services (Unidata)

  • LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)

    • Works with CDAT, Ferret, GrADS, …

  • CDAT: Climate Data Analysis Tools (PCMDI), includes CDMS: Climate Data Management System, VCDAT visualization

  • Community Data Portal project (NCAR)

  • NCL (NCAR)

  • Globus Grid technology (ANL, ISI): GridFTP, CAS (Community Authorization Service)


The Earth System Grid

[Architecture diagram: ESG deployment across ANL, ISI, LBNL, LLNL, NCAR, NERSC and ORNL, colour-keyed by service type: data storage, security, metadata, transport, analysis & viz, monitoring and framework services. Components shown include DISK, HPSS and MSS storage; gridFTP server/client and HRM; GSI, CAS server and clients, MyProxy server and client; MySQL, Xindice, RLS, MCS, OGSA-DAIS, auth metadata and THREDDS catalogs; openDAPg server with NCL and CDAT openDAPg clients; LAS server; TOMCAT, AXIS and GRAM; SLAMON daemon.]


Community Data Portal

[Screenshot of the Community Data Portal, with callouts for free text search, authentication, applications, Live Access, news and the THREDDS catalog.]

LAS/CDAT: Example of a Web-based Data Portal

  • Technology: Web Based (end user requirements)

    • LAS, DODS, ESG (i.e., Globus), CDAT

  • Portal should hide/simplify the Grid for users

    • Single sign-on

    • Community-based authorization

    • Simplified resource location

    • Remote job submission, management

  • Accesses the ESG Grid Testbed

ESG: Example of a Web-based Data Portal (serving 40+ simulations: AMIP, CMIP, and PCM)

Metadata-centric view of ESG services

[Diagram: metadata services shown together with the kinds of metadata they hold (location, access and authorization, aggregation, cataloguing, content, annotation & history, and logging metadata) and the ESG services that depend on them: data transport, user authentication and authorization, data analysis & visualization, data browsing, data search & discovery, and system monitoring and control.]

ESG Metadata Services Architecture

3-layer architecture:

  • Metadata Holdings: physical metadata content, stored in a system of relational and/or XML native databases

  • Core Metadata Services: modules and libraries that mediate all access to the Metadata Holdings (insert, update, delete, query) and expose an API that hides the specific implementation of the databases and query languages

  • High Level Metadata Services: system of applications that use the Core Metadata Services, each fulfilling a specific atomic function; these are the services invoked by external clients
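
The three layers above can be sketched as follows. This is only an illustration of the layering idea, not ESG code: every class and function name is hypothetical, an in-memory dictionary stands in for the relational/XML databases, and the dataset records are made up.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class MetadataHoldings(ABC):
    """Layer 1: physical metadata content (relational or native XML database)."""

    @abstractmethod
    def put(self, dataset_id: str, record: dict) -> None: ...

    @abstractmethod
    def remove(self, dataset_id: str) -> None: ...

    @abstractmethod
    def records(self) -> dict[str, dict]: ...


class InMemoryHoldings(MetadataHoldings):
    """Hypothetical stand-in for a real database backend."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def put(self, dataset_id: str, record: dict) -> None:
        self._store[dataset_id] = dict(record)

    def remove(self, dataset_id: str) -> None:
        self._store.pop(dataset_id, None)

    def records(self) -> dict[str, dict]:
        return dict(self._store)


class CoreMetadataService:
    """Layer 2: mediates all access; callers never touch the backend directly."""

    def __init__(self, holdings: MetadataHoldings) -> None:
        self._holdings = holdings

    def insert(self, dataset_id: str, record: dict) -> None:
        self._holdings.put(dataset_id, record)

    def delete(self, dataset_id: str) -> None:
        self._holdings.remove(dataset_id)

    def query(self, **criteria: str) -> list[str]:
        # Return the ids of all records whose fields match every criterion.
        return sorted(
            dataset_id
            for dataset_id, record in self._holdings.records().items()
            if all(record.get(key) == value for key, value in criteria.items())
        )


def search_and_discovery(core: CoreMetadataService, model: str) -> list[str]:
    """Layer 3: one atomic high-level function built only on the core API."""
    return core.query(model=model)


core = CoreMetadataService(InMemoryHoldings())
core.insert("run-001", {"model": "PCM", "variable": "tas"})      # made-up records
core.insert("run-002", {"model": "other", "variable": "tas"})
print(search_and_discovery(core, "PCM"))
```

Swapping InMemoryHoldings for a different backend would leave the two upper layers untouched, which is the property the architecture is after.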


[Layered diagram, top to bottom:]

ESG CLIENTS API & USER INTERFACES

  • Publishing

  • Analysis & visualization

  • Search & discovery

  • Administration

  • Browsing & display

HIGH LEVEL METADATA SERVICES

  • Metadata extraction

  • Metadata annotation

  • Metadata & data registration

  • Metadata browsing

  • Metadata search, query & discovery

  • Metadata aggregation

  • Metadata validation

  • Metadata conversion

  • Metadata display

CORE METADATA SERVICES

  • Metadata access (update, insert, delete, query)

  • Service translation library

METADATA HOLDINGS

  • Replica Location Services

  • Metadata Cataloguing Services

  • THREDDS catalogs

  • XML DB

ESG Metadata Services Goal Functionality

  • Services responsible for the creation, management and utilization of metadata associated with geophysical data

  • Functionality:

    • Metadata extraction (automatically, from files in different formats and according to various possible metadata standards)

    • Metadata conversion (from one standard to another)

    • Metadata aggregation (associated with data collections)

    • Metadata annotation (manually by humans)

    • Metadata validation (basic quality control of metadata)

    • Registration (population of metadata holdings)

    • Harvesting (combination of metadata from different repositories)

    • Metadata browsing and display (for humans)

    • Search and discovery of data through metadata

    • Metadata query (by agents or clients for data analysis and visualization)
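
As a rough illustration of two items in the list above, conversion and validation, here is a minimal sketch. Both "conventions", the field mapping and the list of required fields are invented for the example; real standards would need far richer handling.

```python
# Hypothetical mapping between two made-up metadata conventions.
FIELD_MAP = {"title": "name", "var": "variable", "units": "units"}
REQUIRED_FIELDS = ("name", "variable", "units")


def convert(record: dict, field_map: dict = FIELD_MAP) -> dict:
    """Metadata conversion: rename known fields, drop fields with no mapping."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def validate(record: dict, required=REQUIRED_FIELDS) -> list:
    """Metadata validation: list the required fields that are missing or empty."""
    return [field for field in required if not record.get(field)]


source_record = {"title": "Surface air temperature", "var": "tas", "comment": "draft"}
converted = convert(source_record)
problems = validate(converted)
print(converted)
print(problems)
```

Here the converted record lacks a units field, so validation reports it; a registration step would presumably refuse or flag such a record before populating the holdings.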

ESG Metadata Services Current Development

Currently have in production the following technologies:

  • Replica Location Services: database to manage and index multiple copies of the same data stored at different centers

  • Metadata Cataloguing Services: relational database to store scientific metadata (developed for high energy physics and geophysical data)

  • XML native (**) and SQL databases

  • THREDDS (by Unidata): system for hierarchical cataloguing of datasets and associated metadata (http://www.unidata.ucar.edu/projects/THREDDS)

  • NcML (netCDF Markup Language): XML language for encoding of metadata associated with data in netCDF format (and more…)
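
The idea behind a replica location service, one logical name indexed to several physical copies, can be sketched in a few lines. This is a toy model and not the Globus RLS API; the class, the logical file name and the host names are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy index from a logical file name to the URLs of its physical copies."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        self._replicas[logical_name].add(physical_url)

    def lookup(self, logical_name):
        # .get() avoids creating an empty entry for unknown names.
        return sorted(self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
catalog.register("pcm/run1/tas.nc", "gsiftp://site-a.example.org/data/tas.nc")
catalog.register("pcm/run1/tas.nc", "gsiftp://site-b.example.org/archive/tas.nc")
replicas = catalog.lookup("pcm/run1/tas.nc")
print(replicas)
```

A client can then pick whichever copy is closest or cheapest to reach, without the user ever seeing the physical locations.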

ESG Metadata Policy

  • Premise: the geophysical sciences are too broad and complex to impose a single, all-encompassing metadata standard that captures the relevant information for all datasets, projects, instruments and scientists

  • ESG will not mandate use of any metadata schema or convention

  • Allow data providers and scientists to use their metadata of choice; provide technologies and tools to store and access metadata through common services (MCS, XML DB, THREDDS catalogs)

  • Encourage development and reuse of a limited set of domain-specific standards (climate data, radar data, airborne instrumentation, etc.), encoding in XML (according to community-developed schemas), interoperability and combination of schemas (XML namespaces and RDF-based ontologies - developed but not used)
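
To show how XML namespaces let one record combine elements from more than one schema, here is a small sketch using Python's standard xml.etree.ElementTree. Both namespace URIs and the element names are invented for the example and do not correspond to any ESG schema.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are made up for this illustration.
CLIMATE_NS = "http://example.org/ns/climate"
RADAR_NS = "http://example.org/ns/radar"

document = f"""
<dataset xmlns:c="{CLIMATE_NS}" xmlns:r="{RADAR_NS}">
  <c:variable>tas</c:variable>
  <r:instrument>MF radar</r:instrument>
</dataset>
""".strip()

root = ET.fromstring(document)

# ElementTree addresses namespaced elements as "{namespace-uri}localname".
variable = root.find(f"{{{CLIMATE_NS}}}variable").text
instrument = root.find(f"{{{RADAR_NS}}}instrument").text
print(variable, instrument)
```

Each community keeps its own vocabulary, and a common service can still read both parts of the record because the namespace identifies which schema each element belongs to.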

OPeNDAP for ESG II

  • Since ~1995, DODS has been based on HTTP and a CGI-style architecture

  • Two concerns

    • Application support and performance of HTTP

    • Housekeeping abilities of CGI architecture

  • Solution: evolve OPeNDAP, the discipline-neutral aspect of DODS

OPeNDAP ctd.

  • Data transport protocol and access protocol separated

  • Revised server architecture

  • Address Grid-style authentication

  • Memory management

  • Exception handling

  • Make all these changes while retaining interoperation with HTTP and CGI

  • Advanced requirements: URL should support more than one dataset, or object, i.e. aggregation
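
For context on the last bullet, this is roughly what a single-dataset request looks like in the existing HTTP-based protocol: a DAP2 URL with a response suffix and an array hyperslab constraint. The helper function, server name and variable are hypothetical; how an aggregating URL would extend this syntax is not specified here and is not shown.

```python
def dap2_url(base_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL with an array hyperslab constraint.

    slices holds one (start, stride, stop) tuple per dimension; stop is inclusive.
    response is the DAP2 suffix, e.g. "dds", "das", "dods" or "ascii".
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical server and dataset; only the URL shape matters.
url = dap2_url(
    "http://server.example.org/opendap/data/tas.nc",
    "tas",
    [(0, 1, 11), (10, 2, 20)],
)
print(url)
```

Note that one URL names exactly one dataset, which is the limitation the aggregation requirement sets out to remove.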


OPeNDAP 3.x vs OPeNDAP-g Architecture

OPeNDAP 3.x:

  • Simple and easy to install

  • One CGI process per URL request

  • Limited memory management – external

  • Limited scalability

  • Limited status reporting to web server

  • Returns data stream from one format

OPeNDAP-g:

  • Standalone server or httpd module

  • Can manage multiple daemon processes

  • Strong memory management – internal

  • Reuse processes, scales

  • Coupled to OPeNDAP server for status

  • Returns multiple formats in a single stream, multiple protocols

Status

  • Refactor core classes to remove http/libwww, etc.

  • Operational/production release of standalone OPeNDAP server (no dependence on web server)

  • Multi-protocol support: file, http, GridFTP, ftp, etc.

  • Re-architected for aggregation support and performance

  • Run OPeNDAP server as a client to GridFTP server

  • Portal application client in production, prototype of netCDF client operational

  • Authentication is handled outside OPeNDAP server

  • URL syntax is more complex
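
One way to picture multi-protocol support with the transport separated from the access layer is a dispatch on the URL scheme. This is a sketch under stated assumptions, not the OPeNDAP-g implementation: the handler functions are hypothetical and only describe the transfer instead of moving bytes, and the use of the gsiftp scheme for GridFTP is an assumption of this example.

```python
from urllib.parse import urlparse


# Hypothetical transport handlers; each one only describes what it would do.
def via_file(url):
    return f"read local file {url.path}"


def via_http(url):
    return f"HTTP GET from {url.netloc}{url.path}"


def via_ftp(url):
    return f"FTP retrieve from {url.netloc}{url.path}"


def via_gridftp(url):
    return f"GridFTP transfer from {url.netloc}{url.path}"


TRANSPORTS = {
    "file": via_file,
    "http": via_http,
    "ftp": via_ftp,
    "gsiftp": via_gridftp,
}


def access(dataset_url: str) -> str:
    """Access layer: parse the request, then hand off to the matching transport."""
    url = urlparse(dataset_url)
    transport = TRANSPORTS.get(url.scheme)
    if transport is None:
        raise ValueError(f"no transport registered for scheme {url.scheme!r}")
    return transport(url)


print(access("gsiftp://host.example.org/data/tas.nc"))
print(access("file:///tmp/tas.nc"))
```

Adding a protocol then means registering one more handler, with no change to the code that interprets the data request itself.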

ESG: Framework experience

  • ESG is a highly collaborative effort that allows users to quickly access data storage facilities holding petabytes of raw or processed data in an application-independent manner.

  • Payoffs of this distributed collaborative infrastructure have included:

    • Distributed data-sharing, RLS works! SRM/HRM work! OPeNDAP-g works!

    • Simplified data discovery of climate data, the work on metadata paid off! Scalability?

    • Large-scale climate data processing and analysis via highly integrated portal

    • Increased collaboration among climate research scientists, people use it!

    • Aid in climate assessments and estimates of future climate variability and trends, IPCC!

  • Authentication and authorization have been a significant challenge

    • GSI to CAS

    • MyProxy - session based and seems to work well, more compatible with heterogeneous framework services

    • SAML is working for multi-file batch transfer

ESG: Framework experience

  • Privatization

    • Portal interface (and much of the holdings) are cloned

    • Closed communities are breeding dead-end developments, e.g. delivering netCDF

  • Transport - GridFTP versus HTTP

    • Server to server

    • Very good performance

    • Depends on a very specific version of GridFTP server (stripped)

    • Clients are not as capable due to the ‘weight’ of Globus, and revert to HTTP

  • Scalability and response times (data AND metadata)

    • Framework architecture supports re-layering for tuning

  • Service monitoring

    • to support the distributed collaborative infrastructure

    • need most or all services to really make a production environment work

  • Many Globus services not used (GRIS, MDS, GIIS, … )

  • Feeling lucky? Try out ESG by visiting the website at: http://www.earthsystemgrid.org

Success?

  • Users are generally happy

  • Exploited new technology components

    • Integration - when and how does it work and scale?

      • XML

      • SQL

      • DODS

      • OPeNDAP and OPeNDAP-g

    • Portals

    • P2P - clients are not as ready as we think

  • Globus provides a suite of framework components, some are easier to integrate than others, some just don’t fit our use-cases and architecture

  • Data framework - e.g. OPeNDAP has been extremely successful

User needs

In discussions with data providers and users, the needs are clear:

“Fast access to ‘portable’ data, in a way that works with the tools we have; information must be easy to access, retrieve and work with.”

Too often users (and data providers) have to deal with the organizational structure of the data sets, which varies significantly: data may be stored at one site in a small number of large files while similar data may be stored at another site in a large number of relatively smaller files. There is an equally large problem with the range of metadata descriptions for the data. Users often only want subsets of the data and struggle with getting them efficiently. One user expresses it as:

“(Please) solve the interface problem.”

Vision for building science cyberinfrastructure

Use-case, then requirements

Then derive architecture and choose technology components

Build a working system for users from the start

Get your funding source and community to commit to an evolving architecture

If you choose a major framework technology, e.g. Globus, OPeNDAP, THREDDS, partner with them

Data framework - e.g. OPeNDAP has been extremely successful

One paradigm

Goal - find the right balance of data/model holdings, portals and client software that a researcher can use without effort or interference, as if all the materials were available on his/her local computer.

E.g.

The Virtual Solar-Terrestrial Observatory (VSTO) is proposed to be:

  • a distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental and model databases in the fields of solar, solar-terrestrial and space physics

    Comprises:

  • a system-like framework which provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use

Virtual Observatory? Need better glue

  • Basic problem: schemas are categorized rather than developed from an object model/class hierarchy, which significantly limits non-human use. However, they all form the basis for organizing catalog interfaces for all types of data, images, etc.

  • This limits data systems utilizing frameworks and prevents frameworks from truly interoperating (SOAP, WSDL only a start)

  • Directories, e.g. NASA GCMD, CEDAR catalog, FITS (flat) keyword/value pairs, are being turned into ontologies (SWEET, VSTO)

  • Markup languages, e.g. ESML, SPDML, ESG/ncML are excellent bases

  • Evolve, recast, merge (where appropriate) using formal processes, tools with intended use in mind - for interface specifications, reasoning, validation, etc. beyond the usual search and access

Summary

  • Basic success in both data systems and data framework approaches

  • Satisfying user and sponsor needs (from ‘just’ to ‘outstanding’)

  • Experience with Globus ranges from very good to not ready for our needs

  • Experience with OPeNDAP is very good, especially with core services

  • Scalability and performance require an adaptable architecture which is something system-level interfaces can still hide from the user

  • Challenge - to bring these attributes to a framework, i.e. in which the user is more exposed

  • Interoperate, interoperate, interoperate - interface, interface, interface

  • User interfaces still require significant HCI efforts

  • Metadata services are extremely important
