Earth System Grid - ESG

Earth System Grid - ESG. Toni Saarinen, Tite4 Tomi Ruuska, Tite4. ESG Overview. Earth System Grid enables management, discovery, distributed access, processing and analysis of distributed terascale climate research data

Presentation Transcript


  1. Earth System Grid - ESG. Toni Saarinen, Tite4; Tomi Ruuska, Tite4

  2. ESG Overview
     • The Earth System Grid enables management, discovery, distributed access, processing, and analysis of distributed terascale climate research data
     • A “Collaboratory Pilot Project” funded by the DOE (Department of Energy) SciDAC program
     • Builds upon ESG-I, the Globus Toolkit, and DataGrid technologies

  3. ESG Overview
     • The main goal of ESG is to make climate data an easily accessible community resource.
     • Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical.
     • The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.
     • Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and data set manipulation.

  4. ESG Participants
     • ANL: Argonne National Laboratory (Argonne, IL)
     • ISI: Information Sciences Institute (Marina del Rey, CA)
     • LANL: Los Alamos National Laboratory (Los Alamos, NM)
     • LBNL: Lawrence Berkeley National Laboratory (Berkeley, CA)
     • LLNL: Lawrence Livermore National Laboratory (Livermore, CA)
     • NCAR: National Center for Atmospheric Research (Boulder, CO)
     • NERSC: National Energy Research Scientific Computing Center (Oakland, CA)
     • ORNL: Oak Ridge National Laboratory (Oak Ridge, TN)
     • USC: University of Southern California (Los Angeles, CA)

  5. ESG History
     • ESG-I: DOE NGI (Next Generation Internet) project
       - Focus on high-performance data movement, Grid-enabled versions of LLNL tools
       - Early successes include bandwidth challenge at SC’2001, significant technology output
       - Experimental deployments only, at participating sites
     • ESG-II: DOE SciDAC (Scientific Discovery through Advanced Computing) project
       - “Smart servers” for server-side data reduction
       - Integration with common “thin” clients, e.g. DODS and Data Portals
       - Client software in the hands of environmental scientists
       - Production deployments at participating instances

  6. Climate Grid Example for Ocean Model
     • Temperature(i,j)
     • Latitude(i,j)
     • Longitude(i,j)
     • Lat_bounds(i,j,4)
     • Lon_bounds(i,j,4)
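
The variables listed on slide 6 suggest a curvilinear grid in which every cell (i,j) carries its own centre latitude and longitude plus four corner bounds. The following is a minimal sketch of that layout in plain Python. The 2 x 3 grid size, all numeric values, the corner ordering, and the helper names `make_field` and `find_cell` are assumptions made for illustration and do not come from the slides.

```python
# Hypothetical 2 x 3 grid; every value below is made up for illustration.
NI, NJ = 2, 3


def make_field(fn):
    """Build an NI x NJ nested list by evaluating fn(i, j) for each cell."""
    return [[fn(i, j) for j in range(NJ)] for i in range(NI)]


# Latitude(i,j), Longitude(i,j): cell-centre coordinates.
latitude = make_field(lambda i, j: 10.0 + i)
longitude = make_field(lambda i, j: 100.0 + j)

# Temperature(i,j): one value per cell.
temperature = make_field(lambda i, j: 15.0 + 0.5 * i + 0.25 * j)

# Lat_bounds(i,j,4), Lon_bounds(i,j,4): four corners per cell.
# Assumed order: lower-left, lower-right, upper-right, upper-left.
lat_bounds = make_field(lambda i, j: [
    latitude[i][j] - 0.5, latitude[i][j] - 0.5,
    latitude[i][j] + 0.5, latitude[i][j] + 0.5,
])
lon_bounds = make_field(lambda i, j: [
    longitude[i][j] - 0.5, longitude[i][j] + 0.5,
    longitude[i][j] + 0.5, longitude[i][j] - 0.5,
])


def find_cell(lat, lon):
    """Return (i, j) of the first cell whose corner bounding box
    contains the point, or None if no cell does."""
    for i in range(NI):
        for j in range(NJ):
            lats, lons = lat_bounds[i][j], lon_bounds[i][j]
            if min(lats) <= lat <= max(lats) and min(lons) <= lon <= max(lons):
                return (i, j)
    return None
```

Storing explicit corner bounds lets a client locate the cell containing a point without assuming the grid is regular. The bounding-box test here is only an approximation for cells that are not axis-aligned.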

  7. Geographical Overview

  8. ESG-II Architecture

  9. ESG Components
     • Globus Toolkit (ANL, ISI)
     • GridFTP data transfer
     • GRAM resource access
     • Community Authorization Service (CAS)
     • Replica Location Service (RLS)
     • Metadata Catalog Service (MCS)
     • Web interface (NCAR) and workflow manager
     • Hierarchical Resource Manager (HRM) (LBNL)
     • Storage Resource Manager
     • Metadata (NCAR, LLNL, ISI)
     • OpenDAP-G (NCAR, ANL)
     • Live Access Server (NCAR)
     Demonstration workflow: metadata search; replica location and transfer; user authentication; data analysis and visualization

  10. The Globus Toolkit™
     • An Open Source Project
     • Security
     • Directory, Metadata, and Replica Services
     • Resource Management
     • Data Access and Management
     • Distributed Computation
     • Open Grid Services Architecture (OGSA)
     • Reliable, persistent web services

  11. The Globus Toolkit™
     • Globus middleware supports linking distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.
     • GridFTP: a high-performance, secure, robust data transfer mechanism (protocol, server, client library).
     • ESG is integrating OpenDAP (DODS protocol) with the GridFTP protocol.
     • Single sign-on using the Grid Security Infrastructure
     • Proxy certificates
     • Community Authorization Service (CAS)
     • Replica Location Service: manages copying and placement of files in a distributed environment.
     • Logical vs. physical files
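
The "logical vs. physical files" distinction on slide 11 can be pictured as a catalog that maps one location-independent logical name to any number of physical copies. The sketch below is a toy in-memory version written for illustration only. It is not the Globus Replica Location Service API, and the class name `ReplicaCatalog`, its methods, and the example host names and paths are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy mapping from logical file names to physical replica URLs.

    Illustrative only; this does not reproduce any real RLS interface.
    """

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that a physical copy of the logical file exists."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical URLs (empty list if none)."""
        return list(self._replicas.get(logical_name, []))

    def pick(self, logical_name, preferred_host):
        """Return a replica on preferred_host if one exists,
        otherwise the first registered replica."""
        replicas = self.lookup(logical_name)
        if not replicas:
            raise KeyError(logical_name)
        for url in replicas:
            if urlparse(url).hostname == preferred_host:
                return url
        return replicas[0]


# Hypothetical usage: one logical file with copies at two made-up sites.
catalog = ReplicaCatalog()
catalog.register("pcm/run1/ocean_temp.nc",
                 "gsiftp://storage-a.example.org/data/ocean_temp.nc")
catalog.register("pcm/run1/ocean_temp.nc",
                 "gsiftp://storage-b.example.org/archive/ocean_temp.nc")
nearest = catalog.pick("pcm/run1/ocean_temp.nc", "storage-b.example.org")
```

Because users and tools refer only to the logical name, replicas can be added, moved, or removed without changing anything on the client side.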

  12. Distributed Data Access Protocol
     • Grid + OpenDAP
     • Transparency
     • Performance
     • Security
     • Resource Management
     • Analysis functions
     [Figure: application stack diagram. Labels: Typical Application, Distributed Application, Application; netCDF lib, OpenDAP Client, ESG client; OpenDAP via http, OpenDAP via Grid; ESG Grid + DODS; OpenDAP Server, ESG Server; Data (local), Data (remote), Big Data (remote)]

  13. ESG Metadata Services
     [Figure: layered diagram, top to bottom]
     • ESG clients, API & user interfaces: publishing; analysis & visualization; search & discovery; administration; browsing & display
     • High-level metadata services: metadata extraction; metadata annotation; metadata & data registration; metadata browsing; metadata query; metadata aggregation; metadata validation; metadata display; metadata discovery
     • Core metadata services: metadata access (update, insert, delete, query); service translation library
     • Metadata holdings: mirror; Dublin Core XML files; Data & Metadata Catalog; Dublin Core database; CF database; COMMENTS XML files

  14. Resource Management
     • Hierarchical Resource Manager
       - queues file transfer requests
       - reorders requests to optimize Parallel FTP
       - monitors progress and error messages
       - re-schedules failed transfers
       - enforces local resource policy
     • Storage Resource Management
       - Manage space
       - Manage files on behalf of a user
       - Manage file sharing
       - Get files from remote locations when necessary
       - Manage multi-file requests
       - Provide grid access to/from mass storage
       - Transfer protocol negotiation
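
The queueing, reordering, and re-scheduling behaviour listed for the Hierarchical Resource Manager on slide 14 can be sketched in a few lines. This is a simplified illustration and not the HRM implementation: the function name `run_transfers`, the `(source_host, filename)` request shape, the retry limit, and the choice to reorder by sorting on source host are all assumptions standing in for whatever the real system does.

```python
from collections import deque


def run_transfers(requests, transfer, max_attempts=3):
    """Process file transfer requests with reordering and retries.

    requests: iterable of (source_host, filename) tuples (assumed shape).
    transfer: callable taking one request, returning True on success.
    Returns (done, failed) lists in completion order.
    """
    # Reorder: sorting groups requests by source host, an assumed
    # stand-in for the real optimization of transfer order.
    queue = deque(sorted(requests))
    attempts = {}
    done, failed = [], []

    while queue:
        request = queue.popleft()
        attempts[request] = attempts.get(request, 0) + 1
        if transfer(request):
            done.append(request)
        elif attempts[request] < max_attempts:
            # Re-schedule the failed transfer at the back of the queue.
            queue.append(request)
        else:
            failed.append(request)
    return done, failed


def make_flaky_transfer(fail_once):
    """Return a fake transfer function that fails the first attempt
    for each request in fail_once and succeeds otherwise."""
    seen = set()

    def transfer(request):
        if request in fail_once and request not in seen:
            seen.add(request)
            return False
        return True

    return transfer


# Hypothetical requests against made-up hosts.
example_requests = [
    ("b.example.org", "f2.nc"),
    ("a.example.org", "f1.nc"),
    ("b.example.org", "f3.nc"),
]
done, failed = run_transfers(
    example_requests,
    make_flaky_transfer({("b.example.org", "f2.nc")}),
)
```

Passing the transfer function in as a parameter keeps the scheduling logic separate from the transport, so the same loop could be exercised with a fake transfer in tests.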

  15. Live Access Server
     • General-purpose Web server for geoscience data sets
     • Directs communications between a user and an application running under a Web server
     • Converts requests into a series of commands that perform the actual data access

  16. ESG Data Portal. Goal: make large ESG data sets easily accessible to scientists for production use

  17. [Figure: ESG deployment diagram spanning the sites LBNL, ANL, NCAR, LLNL, ORNL, and ISI. Components shown: disk; HPSS (High Performance Storage System); MSS (Mass Storage System); SRM (Storage Resource Management); gridFTP servers, CAS-enabled striped gridFTP servers, and a striped gridFTP client; openDAPg servers; CAS (Community Authorization Services) and CAS client; MyProxy server and MyProxy client; GRAM gatekeeper; Tomcat servlet engine; LAS (Live Access Server); MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client. Connection labels: gridFTP, SOAP, RMI]

  18. ESG: Strategies & Goals
     • Move data a minimal amount, keep it close to computational point of origin when possible
       - Data access protocols, distributed analysis
     • When we must move data, do it fast and with a minimum amount of human intervention
       - Storage Resource Management, fast networks
     • Keep track of what we have, particularly what’s on deep storage
       - Metadata and Replica Catalogs
     • Harness a federation of sites
       - Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

  19. ESG Development in 2003
     • Metadata Conventions and Services
     • Application groups deciding on one (or more) metadata schemas
     • Better MCS support for XML schema
     • Distribution and federation of heterogeneous metadata catalogs
     • Integration of DODS server and GridFTP data transport protocol
     • Customization of Replica Location Service for ESG
     • Storage Resource Manager (from LBNL) to optimize storage transfers
     • Community authorization service to provide fine-grained access control
