CEDPS Data Services Ann Chervenak USC Information Sciences Institute
Goals of CEDPS Data Area
• Assist DOE applications with petascale data management requirements
• Includes assisting with evaluation and deployment of existing services:
  • Globus GridFTP for secure, efficient data transfer
  • Replica Location Service (RLS) for data registration and discovery
  • Data Replication Service
  • Condor NeST, etc.
• Development of new functionality:
  • Improvements to GridFTP for better resource management
  • Policy-driven data placement services
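The Replica Location Service is built from two kinds of catalogs: Local Replica Catalogs (LRCs), which map logical file names to physical copies at a site, and Replica Location Indexes (RLIs), which record which LRCs hold mappings for a given logical name. The Python sketch below models that two-step discovery in memory. The class and method names are hypothetical and are not the RLS client API; where the deployed service has LRCs send periodic state updates to RLIs over the network, this sketch simply copies state.

```python
from collections import defaultdict


class LocalReplicaCatalog:
    """Hypothetical in-memory model of an RLS Local Replica Catalog (LRC):
    maps logical file names (LFNs) to physical file names (PFNs) at one site."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._mappings: dict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        # Record that a physical copy of `lfn` exists at location `pfn`.
        self._mappings[lfn].add(pfn)

    def lookup(self, lfn: str) -> set[str]:
        # Return a copy so callers cannot mutate the catalog's state.
        return set(self._mappings.get(lfn, set()))

    def lfns(self) -> set[str]:
        return set(self._mappings)


class ReplicaLocationIndex:
    """Hypothetical in-memory model of an RLS Replica Location Index (RLI):
    maps each LFN to the names of the LRCs that know about it."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)
        self._catalogs: dict[str, LocalReplicaCatalog] = {}

    def update_from(self, lrc: LocalReplicaCatalog) -> None:
        # Stand-in for an LRC-to-RLI update: copy the LRC's current LFN list.
        self._catalogs[lrc.name] = lrc
        for lfn in lrc.lfns():
            self._index[lfn].add(lrc.name)

    def find_replicas(self, lfn: str) -> set[str]:
        # Two-step discovery: RLI -> candidate LRCs -> physical file names.
        pfns: set[str] = set()
        for name in self._index.get(lfn, set()):
            pfns |= self._catalogs[name].lookup(lfn)
        return pfns


if __name__ == "__main__":
    site_a = LocalReplicaCatalog("site-a")
    site_a.register("run42.dat", "gsiftp://a.example.org/data/run42.dat")
    site_b = LocalReplicaCatalog("site-b")
    site_b.register("run42.dat", "gsiftp://b.example.org/store/run42.dat")
    rli = ReplicaLocationIndex()
    rli.update_from(site_a)
    rli.update_from(site_b)
    print(sorted(rli.find_replicas("run42.dat")))
```

Splitting the index from the site catalogs lets each site stay authoritative for its own files while clients still get a single place to start a search.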
New Data Services in CEDPS
• Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
• Managed Object Placement Service: an enhancement to today’s GridFTP that allows for management of:
  • Space
  • Bandwidth
  • Connections
  • Other resources needed at the endpoints of data transfers
• Data placement and distribution services that implement different data distribution and placement behaviors
Extending GridFTP: The Managed Object Placement Service (MOPS)
Functionality to be added:
• Resource management for GridFTP:
  • Limit memory usage
  • Enforce appropriate storage usage
  • Enforce appropriate bandwidth usage
  • Prevent the server from consuming too many system resources
• Bandwidth and storage reservation
• Transfer scheduling
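One common way to enforce a bandwidth limit is a token bucket: tokens (bytes) accumulate at a fixed rate up to a burst capacity, and a sender may only transmit bytes it holds tokens for. The Python sketch below shows that technique under the assumption that a data mover asks the limiter before sending each block; it is an illustration, not the MOPS implementation, and the `TokenBucket` class and its methods are hypothetical. The clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter (not MOPS code): allows `rate`
    bytes/second on average, with bursts of up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_consume(self, nbytes: int) -> bool:
        """Deduct tokens and return True if `nbytes` may be sent now."""
        self._refill()
        if nbytes <= self._tokens:
            self._tokens -= nbytes
            return True
        return False

    def delay_for(self, nbytes: int) -> float:
        """Seconds to wait before `nbytes` could be sent (0.0 if sendable now)."""
        if nbytes > self.capacity:
            raise ValueError("request is larger than the bucket capacity")
        self._refill()
        return max(0.0, (nbytes - self._tokens) / self.rate)
```

A data mover would call `try_consume` before writing a block and sleep for `delay_for` seconds when refused; the burst capacity bounds how far a transfer can exceed its average rate.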
MOPS
• Released under the CEDPS project
• MOPS 1.0 is available at http://www.cedps.net/wiki/index.php/Software
• Includes:
  • Optimization for transferring many small files
  • Globus fork (GFork): an inetd-like service that allows state to be maintained across connections
  • GFork plugin for GridFTP: allows dynamic addition/removal of data movers and limits memory usage
  • Lotman: storage management
  • GridFTP plugin that enforces storage usage policies using Lotman
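Storage policy enforcement of the kind the Lotman plugin performs can be pictured as a quota check run before a write is accepted. The Python sketch below assumes a simple model in which each "lot" is a named allocation with a byte capacity and a set of files charged against it; the `Lot`, `LotManager`, and `QuotaError` names are hypothetical and do not reflect Lotman's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Lot:
    """Hypothetical storage lot: a named allocation with a byte quota."""
    name: str
    capacity: int
    files: dict[str, int] = field(default_factory=dict)  # path -> size in bytes

    @property
    def used(self) -> int:
        return sum(self.files.values())


class QuotaError(Exception):
    """Raised when a write would push a lot past its capacity."""


class LotManager:
    """Hypothetical lot bookkeeping; not the Lotman API."""

    def __init__(self) -> None:
        self._lots: dict[str, Lot] = {}

    def create_lot(self, name: str, capacity: int) -> Lot:
        lot = Lot(name, capacity)
        self._lots[name] = lot
        return lot

    def get(self, name: str) -> Lot:
        return self._lots[name]

    def admit_write(self, lot_name: str, path: str, size: int) -> None:
        """Policy check a transfer plugin could run before accepting a write."""
        lot = self._lots[lot_name]
        # Overwriting a file replaces its old size instead of adding to it.
        projected = lot.used - lot.files.get(path, 0) + size
        if projected > lot.capacity:
            raise QuotaError(f"{path}: {size} bytes would exceed lot {lot_name!r} "
                             f"({lot.used}/{lot.capacity} bytes used)")
        lot.files[path] = size

    def release(self, lot_name: str, path: str) -> None:
        # Free the space charged for `path`, if it was recorded.
        self._lots[lot_name].files.pop(path, None)
```

Rejecting the write up front, rather than letting a transfer fail partway when the disk fills, is what makes the policy "enforced" from the client's point of view.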
GridFTP - New Features
• GridFTP over UDT
  • Users can substitute UDT for TCP
  • UDT provides a reliable transport layer on top of UDP
  • 4-5 times performance improvement over TCP
• GridFTP over SSH
  • globus-url-copy (the GridFTP client) uses the standard ssh program to start a GridFTP server remotely as the user
  • The ssh session’s stdin/stdout become the control channel
  • No data channel authentication
• GridFTP Where there’s FTP (GWFTP)
  • A proxy server that allows any FTP client to transfer data to/from a GridFTP server
• GFork
  • An inetd-like service that allows state to be shared between sessions
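With GridFTP over SSH, the client starts the server through ssh and then exchanges control commands over that process's stdin and stdout. The Python sketch below illustrates only this mechanism: a local child process stands in for the remotely started server, and `TOY_SERVER` is a made-up line protocol that borrows FTP-style reply codes. It does not implement GridFTP, and a real client would launch `ssh` rather than a local interpreter.

```python
import subprocess
import sys

# Made-up stand-in for a server started over ssh: one command per line on
# stdin, one reply per line on stdout. Not the GridFTP protocol.
TOY_SERVER = r'''
import sys
print("220 toy server ready", flush=True)
for line in sys.stdin:
    cmd = line.strip()
    if cmd == "QUIT":
        print("221 bye", flush=True)
        break
    print(f"200 ok {cmd}", flush=True)
'''


def run_session(commands: list[str]) -> list[str]:
    """Send each command over the child's stdin and collect one reply line each."""
    # A real GridFTP-over-SSH client would spawn ssh to the remote host here;
    # a local Python child plays that role in this sketch.
    proc = subprocess.Popen(
        [sys.executable, "-c", TOY_SERVER],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    assert proc.stdin is not None and proc.stdout is not None
    replies = [proc.stdout.readline().rstrip("\n")]  # server greeting
    for cmd in commands:
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()  # the child must see the command before we read
        replies.append(proc.stdout.readline().rstrip("\n"))
    proc.stdin.close()
    proc.wait()
    return replies


if __name__ == "__main__":
    for reply in run_session(["TYPE I", "QUIT"]):
        print(reply)
```

Because ssh already authenticates the user and encrypts the pipe, the control channel needs no separate security handshake; as the slide notes, the data channel is a different matter.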
Data Placement Services: Motivation
• Scientific applications often perform complex computational analyses that consume and produce large data sets
• Computational and storage resources are distributed in the wide area
• The placement of data onto storage systems can have a significant impact on:
  • performance of applications
  • reliability and availability of data sets
• We want to identify data placement policies that distribute data sets so that they can be:
  • staged into or out of computations efficiently
  • replicated to improve performance and reliability
Layered Data Placement Architecture
• Decide where to place objects and replicas in the distributed Grid environment
• Policy-driven, based on the needs of the application and the Virtual Organization
• Effectively creates a placement workflow that is passed to the Reliable Distribution Service Layer for execution
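The layering separates deciding what to move from actually moving it. The Python sketch below assumes a simple hypothetical policy ("keep at least n copies of every file") that emits a placement workflow as a list of transfers, plus a separate executor that retries failed transfers, standing in for the reliable distribution layer. The `n_copy_policy`, `execute_reliably`, and `Transfer` names are illustrative and are not the interfaces of the CEDPS placement services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transfer:
    lfn: str      # logical file name
    source: str   # site holding an existing copy
    dest: str     # site that should receive a new copy


def n_copy_policy(replicas: dict[str, set[str]], sites: list[str],
                  n: int) -> list[Transfer]:
    """Hypothetical policy layer: plan transfers so each file has >= n copies."""
    plan: list[Transfer] = []
    for lfn, have in sorted(replicas.items()):
        if not have:
            continue  # no existing copy to replicate from
        source = sorted(have)[0]
        missing = [s for s in sites if s not in have]
        for dest in missing[: max(0, n - len(have))]:
            plan.append(Transfer(lfn, source, dest))
    return plan


def execute_reliably(plan: list[Transfer],
                     do_transfer: Callable[[Transfer], bool],
                     max_attempts: int = 3) -> list[Transfer]:
    """Hypothetical execution layer: retry each transfer; return permanent failures."""
    failed: list[Transfer] = []
    for t in plan:
        for _ in range(max_attempts):
            if do_transfer(t):
                break
        else:  # loop finished without a successful attempt
            failed.append(t)
    return failed
```

Keeping the plan as plain data means a different policy can be swapped in without touching the retry logic, and the executor can be replaced by a real transfer service without changing the policy.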
Higher-Level Data Placement Services
• Recently released the first generation of the data placement service
• Seeking application input on requirements for the placement services they need
“Data Placement for Scientific Applications in Distributed Environments,” Ann Chervenak, Ewa Deelman, Miron Livny, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi, in Proceedings of the Grid 2007 Conference, Austin, TX, September 2007.
Summary of CEDPS Data Services
• Goal is to assist DOE applications with petascale data management requirements
• Help applications evaluate and deploy existing services (GridFTP, RLS, etc.)
• New development to meet additional application requirements:
  • Improvements to GridFTP for better resource management
  • Policy-driven data placement services
• Actively seeking DOE applications to use services and help define requirements