NASA GSFC Landsat Data Continuity Mission (LDCM) Grid Prototype (LGP)

Beth Weinstein, NASA GSFC. May 8, 2006. LDCM Grid Prototype (LGP) introduction: a Grid infrastructure allows scientists at resource-poor sites access to remote resource-rich sites and enables greater scientific research.

Presentation Transcript


  1. NASA GSFC Landsat Data Continuity Mission (LDCM) Grid Prototype (LGP)
     Beth Weinstein, NASA GSFC
     May 8, 2006

  2. LDCM Grid Prototype (LGP) Introduction
     • A Grid infrastructure allows scientists at resource-poor sites access to remote resource-rich sites
       • Enables greater scientific research
       • Maximizes existing resources
       • Limits the expense of building new facilities
     • The objective of the LDCM Grid Prototype (LGP) is to assess the applicability and effectiveness of a data grid to serve as the infrastructure for research scientists to generate Landsat-like data products

  3. LGP Milestones
     • Capability 1 (C1) (12/03 - 12/04)
       • Demonstrated a basic grid infrastructure to enable a science user to run their program on a specified resource in a virtual organization
       • Virtual organization (VO) included GSFC labs and USGS EROS resources
       • Basic Globus Toolkit 2.4 (e.g. GSI, GridFTP, GRAM)
     • Capability 2 (C2) (12/04 - 9/05)
       • Demonstrated an expanded grid infrastructure to allow the dynamic allocation of resources to enable a specific science application
       • VO included NASA GSFC labs, USGS EROS, University of Maryland (UMD)
       • Workflow enabled
     • NASA ROSES ACCESS A.26 (1/06 – 1/08)
       • Land Cover Change Processing and Analysis System: LC-ComPS

  4. Capability 1 Science Scenario (figure: LEDAPS L7ESR and MODIS MOD09GHK scenes)

  5. Capability 1 Summary
     • Prepare two heterogeneous data sets at different remote locations for like “footprint” comparison from a science user’s home site
     • The MODIS Reprojection Tool (MRT) serves as our “typical science application” developed at the science user’s site (GSFC Building 32 in the demo)
       • mrtmosaic and resample (subset and reproject)
       • Operates on MODIS and LEDAPS (Landsat surface reflectance) scenes
     • Data distributed at remote facilities
       • NASA GSFC Building 23 (MODIS scenes)
       • USGS EROS (LEDAPS scenes)
     • Solves a realistic scientific scenario using grid-enabled resources

  6. Capability 2 Science Scenario (figure)
     • Landsat Scene 1: Path/Row 182/61, Date 2/12/2002
     • Landsat Scene 2: Path/Row 182/61, Date 6/4/2002
     • Result: 2002 182/61 Composite

  7. Capability 2 Summary
     • Create direct reflectance composite products using Landsat data
     • Blender Task 1 scenario and modules were contributed by Jeff Masek and Feng Gao
     • Modules
       • lndcal: calibration
       • lndcsm: cloud shadow mask
       • lndsr: surface reflectance
       • lndreg: registration
       • lndcom: composite
     • Input data
       • Up to 5 Landsat scenes, spatially coincident
       • GSFC ancillary data:
         • TOMS (ozone)
         • Reanalysis (water vapor)
     • Output data: 1 LEDAPS/Blender composite scene

  8. Capability 2 Scenario (diagram)
     • Input: EROS pool, 2001 Landsat Scenes 1 to 4
     • Per scene: lndcal → lndcsm → (ancillary inputs) → lndsr
     • lndreg (shown three times for the four scenes)
     • lndcom
     • Output: 30 m resolution 2001 composite product (single path-row)
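The module chain in this scenario can be sketched as a small planner that lists the steps for one composite. This is a minimal sketch, not the LGP implementation: the slides do not give the modules' real arguments, so each step is just a tuple of a module name and scene identifiers. It also assumes that the first scene is the registration reference and therefore skips lndreg, which is one way to read the three lndreg boxes drawn for four scenes.

```python
from typing import List, Tuple

# Module order taken from the slides; the argument convention is hypothetical.
PER_SCENE_MODULES = ["lndcal", "lndcsm", "lndsr"]


def plan_composite(scenes: List[str]) -> List[Tuple[str, ...]]:
    """Return an ordered list of (module, scene...) steps for one composite."""
    if not scenes:
        raise ValueError("at least one input scene is required")
    steps: List[Tuple[str, ...]] = []
    base = scenes[0]  # assumption: the first scene is the registration reference
    for scene in scenes:
        for module in PER_SCENE_MODULES:
            steps.append((module, scene))
        if scene != base:
            # Non-base scenes are registered against the base scene.
            steps.append(("lndreg", scene, base))
    # A single compositing step consumes every processed scene.
    steps.append(("lndcom", *scenes))
    return steps


if __name__ == "__main__":
    for step in plan_composite(["scene1", "scene2", "scene3", "scene4"]):
        print(" ".join(step))
```

For four scenes the plan contains twelve per-scene steps, three lndreg steps, and one final lndcom step.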

  9. Capability 2 Virtual Organization (network diagram)
     • Hosts (numbers in parentheses as shown on the slide)
       • UMD1 (2): UMD, College Park
       • Edclxs66 (2): USGS EROS, Sioux Falls, SD
       • LGP23 (4): GSFC B23/W316
       • MacCl23 (12): GSFC B23/W316
       • LGP32 (2): Science User_1, GSFC B32/C101
     • Networks
       • USGS EROS: 1 Gbps backbone
       • GSFC SEN: 1 Gbps backbone
       • MAX (College Park): OC48, 2.4 Gbps backbone
       • vBNS+ (Chicago): OC48, 2.4 Gbps backbone
       • Link labels: 1 Gbps; OC12, 622 Mbps; shared with DREN
     • Diagram regions: Capability 2, Capability 3
     • Legend
       • SEN: Science and Engineering Network
       • MAX: Mid-Atlantic Crossroads
       • DREN: Defense Research and Engineering Network
       • vBNS+: Very high Performance Network Service

  10. Capability 2 Grid Workflow
      • In Capability 1, jobs were run on a specific resource
      • In Capability 2, workflow provided the ability to submit a job to the “Grid” (VO)
      • Leverage distributed resource sharing and collaboration on a large scale
      • Grid resource management
        • Automatic allocation of grid resources
        • Subtask management
        • Reliable job completion
      • Leverage idle CPU cycles

  11. Capability 2 Workflow Software: Karajan
      • Karajan provides grid workflow functions
      • Includes a task management language and an execution engine
      • Integrated with the Java Commodity Grid (CoG) Kit
      • Includes a task scheduler
      • Runs gridExecute and gridTransfer tasks on grid resources
      • Manages both local and remote resources
      • Specifies workflow using XML
      • Supplies command-line and GUI interfaces
      Figure: Karajan – Globus Grid Architecture (labels: Java CoG 4_0_a1, Karajan, Globus Toolkit 2.4.3, Globus Gatekeeper, GridFTP, GRAM)

  12. User Configuration File Specification
      • User creates product.spec configuration file
      • Path, row, and acquisition date provided for each input scene

        # product.spec example file
        host: edclxs66.cr.usgs.gov
        base_directory: /data/LEDAPS
        182 062 20010719 base
        182 062 20030215
        182 062 20040218
        182 062 20040609
        -
        # default to host and base_directory specified above
        182 061 20020212 base
        182 061 20020604
        182 061 20040101
        182 061 20040218
        182 061 20040711
        -
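The product.spec example can be read with a short parser. The sketch below is an illustration only: the slides show one example file and no formal grammar, so the rules here are inferred and should be treated as assumptions. It assumes `host:` and `base_directory:` lines set defaults that carry over to later groups, each scene line is `path row date` with an optional trailing `base` marker, a line containing only `-` closes one composite, and `#` starts a comment. The names `Scene`, `Composite`, and `parse_product_spec` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    path: str
    row: str
    acq_date: str  # YYYYMMDD, kept as text
    is_base: bool = False


@dataclass
class Composite:
    host: Optional[str]
    base_directory: Optional[str]
    scenes: List[Scene] = field(default_factory=list)


def parse_product_spec(text: str) -> List[Composite]:
    """Parse product.spec text into composites (format inferred from the example)."""
    composites: List[Composite] = []
    host: Optional[str] = None
    base_dir: Optional[str] = None
    current: List[Scene] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line == "-":
            # Assumed group terminator: close the current composite.
            composites.append(Composite(host, base_dir, current))
            current = []
        elif line.startswith("host:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("base_directory:"):
            base_dir = line.split(":", 1)[1].strip()
        else:
            parts = line.split()
            if len(parts) == 3:
                current.append(Scene(parts[0], parts[1], parts[2]))
            elif len(parts) == 4 and parts[3] == "base":
                current.append(Scene(parts[0], parts[1], parts[2], is_base=True))
            else:
                raise ValueError(f"unrecognized line: {raw!r}")
    if current:
        # Tolerate a final group without a closing "-".
        composites.append(Composite(host, base_dir, current))
    return composites
```

Applied to the example file, this yields two composites on the same host, with four and five scenes respectively, each with its first scene flagged as the base.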

  13. Capability 2 Architecture (diagram)
      • product.spec → driver.pl → driver.xml → Karajan
      • <parallel>: Host 1 runs create_composite1.xml, Host 2 runs create_composite2.xml, and so on for further hosts
      • <sequential>, within each composite:
        • For each scene <path, row, acqDate>, starting with the base scene: lndpm, lndcal, lndcsm, lndsr
        • Then lndreg, lndcom, and Copy_output
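The parallel and sequential structure of this architecture can be illustrated without Karajan. The sketch below is a plain Python stand-in that uses a thread pool in place of Karajan's `<parallel>` element and an ordinary loop in place of `<sequential>`. It does not use Karajan's XML or the Java CoG Kit, and `run_step` is a hypothetical callback standing in for a remote job submission.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_composite(name: str, steps: List[str],
                  run_step: Callable[[str, str], None]) -> str:
    """Run one composite's steps strictly in order (the sequential part)."""
    for step in steps:
        run_step(name, step)
    return name


def run_all(workflows: Dict[str, List[str]],
            run_step: Callable[[str, str], None],
            max_workers: int = 2) -> List[str]:
    """Run independent composites concurrently (the parallel part)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_composite, name, steps, run_step)
                   for name, steps in workflows.items()]
        # result() re-raises any exception from a failed composite.
        return [f.result() for f in futures]


if __name__ == "__main__":
    steps = ["lndpm", "lndcal", "lndcsm", "lndsr", "lndreg", "lndcom", "copy_output"]
    run_all({"composite1": steps, "composite2": steps},
            lambda name, step: print(f"{name}: {step}"))
```

Steps from different composites may interleave in the output, but the order within each composite is preserved.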

  14. Capability 2 Performance
      • Processing benchmarks (table not reproduced in the transcript). *Each composite had 4 input scenes
      • Transfer rates
        • File transfer using 8 parallel streams
        • Raw data files (TIF): 57 Mb in 45-50 sec. (~1.26 Mbps)
        • Final output file (HDF): 1.25 Gb in 5 minutes (~4 Mbps)
      • Conclusion: larger files transfer more efficiently
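The quoted transfer rates are simply size divided by elapsed time, which is easy to check. The sketch below is unit-agnostic, since it returns a rate in whatever size unit is passed in. The only assumption is that 1.25 G equals 1250 M (a decimal prefix). The function name `transfer_rate` is hypothetical.

```python
def transfer_rate(size: float, seconds: float) -> float:
    """Average rate in size-units per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size / seconds


# Figures from the slide; 1.25 G is taken as 1250 M (an assumption).
raw_fast = transfer_rate(57, 45)
raw_slow = transfer_rate(57, 50)
final = transfer_rate(1250, 5 * 60)

print(f"raw files:  {raw_slow:.2f} to {raw_fast:.2f} per second")
print(f"final file: {final:.2f} per second")
print(f"ratio:      {final / raw_fast:.1f}x")
```

The raw-file rate comes out between about 1.14 and 1.27 and the final-file rate at about 4.17, in line with the rounded figures on the slide, so the large file moved at roughly three times the rate of the small ones.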

  15. Performance Research and Potential Plans
      • Benchmarked processing rates for producing up to 50 output scenes
      • Completed initial analysis of transfer and processing rates obtained using Netlogger
        • Netlogger provides the ability to monitor applications within a complex distributed environment in order to determine exactly where time is spent
      • Room for optimization
        • Analyze process flows to optimize running in an operational setting and implement the optimization strategies below
        • Complete input file compression on data host prior to file transfer
        • Increase the parallelization
          • Parallel runs of multiple input scenes for a single composite
          • Parallel file transfer
        • Add more CPUs and maximize CPU utilization
        • Look at error handling and the possibility of automatic restarting of jobs
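Compressing an input file on the data host before transfer, one of the optimizations listed here, might look like the following. This is a minimal sketch under stated assumptions: the slides do not name a compression tool, so gzip from the Python standard library is used as a stand-in, and the function names are hypothetical. How much is saved depends entirely on the data, so the benefit would need to be measured against the added CPU time.

```python
import gzip
import shutil
from pathlib import Path


def compress_for_transfer(src: Path) -> Path:
    """Write src + '.gz' next to src and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    return dst


def decompress_after_transfer(src: Path) -> Path:
    """Restore the original file from a '.gz' file on the receiving host."""
    if src.suffix != ".gz":
        raise ValueError("expected a .gz file")
    dst = src.with_suffix("")  # scene.tif.gz -> scene.tif
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```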

  16. LGP Lessons Learned
      • The Open Source environment can be very beneficial
        • Reuse, collaboration incentive
        • “Hardened software” (i.e. GSI)
      • A surprising amount of time was spent on basic network administration and security
        • Network performance
        • Firewall/ports
      • Maintaining configuration management across independent agencies and centers is difficult
        • MapCenter: system status tool (QA/Calibration)
      • Understanding the processing flow and modules is required for optimization
        • One size doesn’t fit all (at least not yet)
        • Allow for remote processing; dynamic ancillary data
        • CPU intensive vs. data intensive
      • Karajan is somewhat immature, but we have passed on requests to CoG developers
        • Karajan does provide the basic framework for creating workflows in an operational setting. Functionality not provided by the basic framework is being provided by external wrapper scripts
        • Developed workaround to pass environment variables across processing runs
        • Provided wrapper script to pass arguments to underlying Globus executables
        • Very elementary job scheduler

  17. Current and Future Work
      • LDCM Grid Prototype work will continue
      • Receiving NASA ROSES ACCESS A.26 funding for Land Cover Change Processing and Analysis System (LC-ComPS)
      • Use grid technology to allow regional and continental-scale land cover analysis at high resolution
      • Use Globus 4.0 as the underlying Grid infrastructure
      • Improve error handling in the workflow scripts and handle automatic restarting of tasks in the event of failures
      • Expand the “pool” of machines in VO
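Automatic restarting of failed tasks, as planned here, is commonly handled with a bounded retry wrapper. The sketch below shows that general pattern in Python. It is not taken from the LGP workflow scripts, and `run_with_restart`, `max_attempts`, and `delay_s` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_restart(task: Callable[[], T],
                     max_attempts: int = 3,
                     delay_s: float = 0.0) -> T:
    """Call task() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:  # a real workflow would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_s)  # pause before restarting the task
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

A task that fails transiently, for example because a remote host is briefly unreachable, is retried up to the limit, while a task that keeps failing surfaces an error with the last underlying exception attached.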

  18. Backup Slides

  19. Acknowledgements
      • Sponsors
        • LDCM: Bill Ochs, Matt Schwaller
        • Code 500/580: Peter Hughes, Julie Loftis
      • LGP Team members
        • Jeff Lubelczyk (Lead)
        • Gail McConaughy (Branch Principal)
        • Beth Weinstein (SW Lead)
        • Ben Kobler (HW, Networks)
        • Eunice Eng (SW Dev, Data)
        • Valerie Ward (SW Dev, Apps)
        • Ananth Rao ([SGT] SW Arch/Dev, Grid Expert)
        • Brooks Davis ([Aerospace Corp] Grid Expert)
        • Wayne Yu ([QSS] Sys Admin)
      • GSFC Science Input
        • Jeff Masek (Blender)
        • Feng Gao (Blender)
      • USGS EROS
        • Stu Doescher (Mgmt)
        • Chris Doescher (POC)
        • John Dwyer
        • Tom Mcelroy
        • Mike Neiers (Sys Support)
        • Cory Ranschau (Sys Admin)
      • University of Maryland (UMD)
        • Paul Davis
        • Gary Jackson

  20. Acronym List
      • ACCESS: Advancing Collaborative Connections for Earth-Sun System Science
      • EROS: Earth Resources Observation and Science
      • FTP: File Transfer Protocol
      • GASS: Globus Access to Secondary Storage
      • GRAM: Grid Resource Allocation & Management
      • GSI: Grid Security Infrastructure
      • LC-ComPS: Land Cover Change Processing and Analysis System
      • LDCM: Landsat Data Continuity Mission
      • LEDAPS: Landsat Ecosystem Disturbance Analysis Adaptive Processing System
      • LGP: LDCM Grid Prototype
      • LP DAAC: Land Processes Distributed Active Archive Center
      • MDS: Monitoring & Discovery System
      • MODIS: Moderate Resolution Imaging Spectroradiometer
      • MRT: MODIS Reprojection Tool
      • ROSES: Research Opportunities in Space and Earth Sciences
