CEDPS Data Services


  1. CEDPS Data Services
     Ann Chervenak, USC Information Sciences Institute

  2. Goals of CEDPS Data Area
     • Assist DOE applications with petascale data management requirements
     • Includes assisting with evaluation and deployment of existing services
        • Globus GridFTP for secure, efficient data transfer
        • Replica Location Service for data registration and discovery
        • Data Replication Service
        • Condor NeST, etc.
     • Development of new functionality
        • Improvements to GridFTP for better resource management
        • Policy-driven data placement services

  3. New Data Services in CEDPS
     • Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
     • Managed Object Placement Service (MOPS), an enhancement to today’s GridFTP that allows for management of:
        • Space
        • Bandwidth
        • Connections
        • Other resources needed at the endpoints of data transfers
     • Data placement and distribution services that implement different data distribution and placement behaviors

  4. Extending GridFTP: The Managed Object Placement Service (MOPS)
     Functionality that will be added:
     • Adding resource management to GridFTP
        • Memory usage limitation
        • Enforce appropriate storage usage
        • Enforce appropriate bandwidth usage
        • Eliminates the potential to consume too many system resources
     • Bandwidth and storage reservation
     • Transfer scheduling
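The slide does not say how MOPS enforces bandwidth limits. As one common way to illustrate the idea, the sketch below uses a token bucket, which is a swapped-in technique and not taken from the MOPS implementation. The class name, parameters, and the injectable clock are all assumptions made for this example.

```python
import time


class TokenBucket:
    """Illustrative token-bucket limiter (hypothetical, not MOPS code).

    Tokens represent bytes. They accumulate at rate_bytes_per_s up to
    capacity_bytes, and a send is allowed only if enough tokens exist.
    """

    def __init__(self, rate_bytes_per_s, capacity_bytes, clock=time.monotonic):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(capacity_bytes)
        self.tokens = float(capacity_bytes)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, nbytes):
        """Spend nbytes of tokens and return True, or return False if over budget."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A transfer loop would call `try_send(len(chunk))` before writing each chunk and wait briefly when it returns False. The bucket capacity bounds the burst size, while the refill rate bounds the long-run average bandwidth.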

  5. MOPS
     • Released under the CEDPS project
     • MOPS 1.0 is available at http://www.cedps.net/wiki/index.php/Software
     • Includes:
        • Optimization for transfers of many small files
        • Globus fork (Gfork): an inetd-like service that allows state to be maintained across connections
        • Gfork plugin for GridFTP: allows dynamic addition and removal of data movers and limits memory usage
        • Lotman: manages storage
        • GridFTP plugin that enforces storage usage policies using Lotman
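The slides describe Lotman only as managing storage and backing a GridFTP plugin that enforces storage usage policies; its actual interface is not shown. The sketch below is a hypothetical model of the general idea: a "lot" is treated as a storage allocation with a fixed capacity, and writes that would exceed it are refused. `Lot`, `LotFullError`, `reserve`, and `release` are names invented for this example.

```python
class LotFullError(Exception):
    """Raised when a write would exceed the lot's capacity (hypothetical)."""


class Lot:
    """Hypothetical storage allocation with a hard capacity in bytes."""

    def __init__(self, owner, capacity_bytes):
        self.owner = owner
        self.capacity = capacity_bytes
        self.used = 0

    def reserve(self, nbytes):
        # Refuse the request outright rather than partially filling the lot.
        if self.used + nbytes > self.capacity:
            raise LotFullError(
                f"{self.owner}: {nbytes} bytes requested, "
                f"{self.capacity - self.used} available"
            )
        self.used += nbytes

    def release(self, nbytes):
        # Never let accounting go negative if a release is repeated.
        self.used = max(0, self.used - nbytes)
```

In this model, a storage-policy plugin would call `reserve` before accepting an incoming file and `release` when the file is deleted.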

  6. GridFTP - New Features
     • GridFTP over UDT
        • Users can substitute UDT for TCP
        • UDT provides a reliable layer on top of UDP
        • 4-5 times performance improvement over TCP
     • GridFTP over SSH
        • globus-url-copy (the GridFTP client) uses the standard ssh program to remotely start a GridFTP server as the user
        • stdin/stdout becomes the control channel
        • No data channel authentication
     • GridFTP Where there’s FTP (GWFTP)
        • A proxy server that allows use of any FTP client to transfer data to/from a GridFTP server
     • GFork
        • An inetd-like service that allows sharing of state between sessions
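As a rough illustration of how the UDT and SSH features above are selected from the client, the sketch below builds a `globus-url-copy` argument list. The `-udt` option and the `sshftp://` URL scheme are given as recalled from Globus Toolkit documentation and should be checked against the installed version. The helper name `build_copy_command` and the host and paths in the usage note are placeholders.

```python
def build_copy_command(src_url, dst_url, use_udt=False):
    """Return an argv list for globus-url-copy (illustrative helper).

    use_udt adds the -udt option, assumed here to switch the data
    channel from TCP to UDT.
    """
    cmd = ["globus-url-copy"]
    if use_udt:
        cmd.append("-udt")
    cmd += [src_url, dst_url]
    return cmd
```

For example, `build_copy_command("sshftp://user@host.example.org/data/in.dat", "file:///tmp/in.dat")` would describe a GridFTP-over-SSH download. The resulting list could be passed to `subprocess.run` on a machine where the client is installed.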

  7. Data Placement Services: Motivation
     • Scientific applications often perform complex computational analyses that consume and produce large data sets
     • Computational and storage resources distributed in the wide area
     • The placement of data onto storage systems can have a significant impact on
        • performance of applications
        • reliability and availability of data sets
     • We want to identify data placement policies that distribute data sets so that they can be
        • staged into or out of computations efficiently
        • replicated to improve performance and reliability

  8. Layered Data Placement Architecture
     • Decide where to place objects and replicas in the distributed Grid environment
     • Policy-driven, based on needs of application and the Virtual Organization
     • Effectively creates a placement workflow that is passed to the Reliable Distribution Service Layer for execution
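The layering above (a policy decides where replicas go and emits a workflow for a lower layer to execute) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the CEDPS placement service: the policy ("prefer sites with the most free space"), the `Site` and `Transfer` types, and `plan_placement` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    free_bytes: int


@dataclass(frozen=True)
class Transfer:
    dataset: str
    source: str
    destination: str


def plan_placement(dataset, size_bytes, source, sites, replicas):
    """Pick up to `replicas` destination sites and return the transfers to run.

    Assumed policy: skip the source site and any site without enough free
    space, then prefer sites with the most free space (ties broken by name).
    """
    candidates = [
        s for s in sites if s.name != source and s.free_bytes >= size_bytes
    ]
    candidates.sort(key=lambda s: (-s.free_bytes, s.name))
    return [Transfer(dataset, source, s.name) for s in candidates[:replicas]]
```

The returned list plays the role of the placement workflow: the policy layer only decides what should move where, and a separate distribution layer would be responsible for carrying out each `Transfer` reliably.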

  9. Higher-Level Data Placement Services
     • Recently released first generation of data placement service
     • Seeking application input on requirements for placement services they need
     “Data Placement for Scientific Applications in Distributed Environments,” Ann Chervenak, Ewa Deelman, Miron Livny, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi, in Proceedings of Grid 2007 Conference, Austin, TX, September 2007.

  10. Summary of CEDPS Data Services
     • Goal is to assist DOE applications with petascale data management requirements
     • Help applications evaluate and deploy existing services (GridFTP, RLS, etc.)
     • New development to meet additional application requirements
        • Improvements to GridFTP for better resource management
        • Policy-driven data placement services
     • Actively seeking DOE applications to use services and help define requirements
