
GriPhyN & iVDGL Architectural Issues


Presentation Transcript


  1. Grid Physics Network / International Virtual Data Grid Laboratory • GriPhyN & iVDGL Architectural Issues • GGF5 BOF: Data Intensive Applications, Common Architectural Issues and Drivers • Edinburgh, 23 July 2002 • Mike Wilde, Argonne National Laboratory

  2. Project Summary • Principal requirements • IT research: virtual data and transparent execution • Grid building: deploy an international grid lab at scale • Components developed/used • Virtual Data Toolkit; Linux deployment platform • Virtual Data Catalog, request planner and executor, DAGMan, NeST • Scale of current testbeds • ATLAS Test Grid – 8 sites • CMS Test Grid – 5 sites • Compute nodes: ~900 @ UW, UofC, UWM, UTB, ANL • >50 researchers and grid-builders working on IT research challenge problems and demos • Future directions (2002 & 2003) • Extensive work on virtual data, planning, catalog architecture, and fault tolerance

  3. Chimera Overview • Concept: Tools to support management of transformations and derivations as community resources • Technology: Chimera virtual data system including virtual data catalog and virtual data language; use of GriPhyN virtual data toolkit for automated data derivation • Results: Successful early applications to CMS and SDSS data generation/analysis • Future: Public release of prototype, new apps, knowledge representation, planning

  4. “Chimera” Virtual Data Model • Transformation designers create programmatic abstractions • Simple or compound; augment with metadata • Production managers create bulk derivations • Can materialize data products or leave virtual • Users track their work through derivations • Augment (replace?) the scientist’s log book • Definitions can be augmented with metadata • The key to intelligent data retrieval • Issues relating to metadata propagation

  5. CMS Pipeline in VDL-0

  begin v /usr/local/demo/scripts/cmkin_input.csh
    file i ntpl_file_path
    file i template_file
    file i num_events
    stdout cmkin_param_file
  end
  begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
    pre cms_env_var
    stdin cmkin_param_file
    stdout cmkin_log
    file o ntpl_file
  end
  begin v /usr/local/demo/scripts/cmsim_input.csh
    file i ntpl_file
    file i fz_file_path
    file i hbook_file_path
    file i num_trigs
    stdout cmsim_param_file
  end
  begin v /usr/local/demo/binaries/cms121.exe
    condor copy_to_spool=false
    condor getenv=true
    stdin cmsim_param_file
    stdout cmsim_log
    file o fz_file
    file o hbook_file
  end
  begin v /usr/local/demo/binaries/writeHits.sh
    condor getenv=true
    pre orca_hits
    file i fz_file
    file i detinput
    file i condor_writeHits_log
    file i oo_fd_boot
    file i datasetname
    stdout writeHits_log
    file o hits_db
  end
  begin v /usr/local/demo/binaries/writeDigis.sh
    pre orca_digis
    file i hits_db
    file i oo_fd_boot
    file i carf_input_dataset_name
    file i carf_output_dataset_name
    file i carf_input_owner
    file i carf_output_owner
    file i condor_writeDigis_log
    stdout writeDigis_log
    file o digis_db
  end

  [Diagram: pipeline stages pythia_input, pythia.exe, cmsim_input, cmsim.exe, writeHits, writeDigis]

  6. Data Dependencies – VDL-1

  TR tr1( out a2, in a1 ) {
    profile hints.exec-pfn = "/usr/bin/app1";
    argument stdin = ${a1};
    argument stdout = ${a2};
  }
  TR tr2( out a2, in a1 ) {
    profile hints.exec-pfn = "/usr/bin/app2";
    argument stdin = ${a1};
    argument stdout = ${a2};
  }
  DV x1->tr1( a2=@{out:file2}, a1=@{in:file1} );
  DV x2->tr2( a2=@{out:file3}, a1=@{in:file2} );

  [Diagram: dependency chain file1 -> x1 -> file2 -> x2 -> file3; file2 is produced by derivation x1 and consumed by derivation x2, so x2 depends on x1]

  7. Executor Example: Condor DAGMan • Directed Acyclic Graph Manager • Specify the dependencies between Condor jobs using a DAG data structure • Manage dependencies automatically (e.g., "don't run job B until job A has completed successfully") • Each job is a "node" in the DAG • Any number of parent or child nodes • No loops • [Diagram: jobs A, B, C, and D connected by dependency edges] • Slide courtesy of Miron Livny, U. Wisconsin
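
  The dependency rule DAGMan enforces can be pictured with a short sketch. This is not DAGMan or Condor code: the job names and the run_job stand-in below are hypothetical, and the sketch only illustrates that a node is released for execution once all of its parents have completed successfully.

  # Minimal Python sketch of the DAGMan dependency rule (illustration only).
  from collections import deque

  def run_job(name):
      print(f"running {name}")
      return True  # stand-in for a successful Condor job

  def execute_dag(parents):
      """parents maps each node to the set of nodes it must wait for."""
      done = set()
      ready = deque(n for n, deps in parents.items() if not deps)
      while ready:
          node = ready.popleft()
          if not run_job(node):
              raise RuntimeError(f"job {node} failed; its dependents never run")
          done.add(node)
          for n, deps in parents.items():
              if n not in done and n not in ready and deps <= done:
                  ready.append(n)

  # "Don't run B or C until A has completed; don't run D until B and C have."
  execute_dag({"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}})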

  8. Chimera Application: Sloan Digital Sky Survey Analysis • Question: size distribution of galaxy clusters? • Built on the Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs) • [Figure: galaxy cluster size distribution] • Joint work with Jim Annis, Steve Kent, FNAL

  9. Cluster-Finding Data Pipeline • [Diagram: multi-stage pipeline (stages numbered 1-5) relating catalog, tsObj, field, brg, cluster, and core data products]

  10. Small SDSS Cluster-Finding DAG

  11. And Even Bigger: 744 Files, 387 Nodes • [Figure: the full cluster-finding DAG]

  12. Vision: Distributed Virtual Data Service • [Diagram: applications at local sites access regional centers and Tier 1 centers through a distributed virtual data service composed of multiple VDCs]

  13. Knowledge Management - Strawman Architecture • Knowledge-based requests are formulated in terms of science data • E.g., "give me a specific transform of channels c, p, and t over time range t0-t1" • Finder locates the data files • Translates the range "t0-t1" into a set of files • Coder creates an execution plan and defines derivations from known transformations • Can deal with missing files (e.g., file c in the LIGO example) • The knowledge request is answered in terms of datasets • Coder translates datasets into logical files (or objects, queries, tables, …) • Planner translates logical entities into physical entities
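
  A rough Python sketch of this strawman flow may help. Every name in it (finder, coder, planner, the catalog fields) is a hypothetical stand-in for the corresponding service, not a GriPhyN interface.

  # Hypothetical sketch: Finder maps a time range to logical files, Coder emits
  # derivations for files that do not yet exist, Planner maps logical to physical.

  def finder(channels, t0, t1, catalog):
      """Translate the science-level range t0-t1 over some channels into logical files."""
      return [f for f in catalog
              if f["channel"] in channels and f["t0"] >= t0 and f["t1"] <= t1]

  def coder(logical_files, existing):
      """Define a derivation (from a known transformation) for each missing file."""
      return [{"derive": f["lfn"], "using": "known transformation"}
              for f in logical_files if f["lfn"] not in existing]

  def planner(logical_files, replicas):
      """Map each logical file to a physical replica, or mark it for derivation."""
      return {f["lfn"]: replicas.get(f["lfn"], "<derive first>")
              for f in logical_files}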

  14. GriPhyN/PPDG Data Grid Architecture • [Diagram: an Application produces an abstract DAG; the Planner, consulting Catalog Services (MCAT, GriPhyN catalogs), Monitoring (MDS), Info Services (MDS), Replica Management (GDMP), and Policy/Security (GSI, CAS), produces a concrete DAG; the Executor (DAGMan, Kangaroo) runs it against Compute Resources (Globus GRAM) and Storage Resources (GridFTP, GRAM, SRM) through a Reliable Transfer Service]

  15. Common Problem #1: (Evolving) View of the Data Grid Stack • Publish-Subscribe Service (GDMP) • Reliable Replication • Storage Element Manager • Reliable File Transfer • Replica Location Service • Data Transport (GridFTP) • Local Replica Catalog (flat or hierarchical) • Storage Element

  16. Architectural Complexities

  17. Common Problem #2: Request Planning • Map of grid resources • Incoming work to plan • Queue? With lookahead? • Status of grid resources • State (up/down) • Load (current, queued, and anticipated) • Reservations • Policy • Allocation (commitment of resource to VO or group based on policy) • Ability to change decisions dynamically
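
  These inputs can be pictured feeding a single site-selection decision, as in the rough sketch below. The data structures and the load-based scoring are hypothetical, not a GriPhyN planner interface.

  # Hypothetical sketch: pick a site using state, load, reservations, and policy.

  def eligible(site, vo, policy, reservations, now):
      """A site is usable if it is up, the VO holds an allocation there, and no
      other VO's reservation covers the current time."""
      if not site["up"] or policy.get((site["name"], vo), 0.0) <= 0.0:
          return False
      return not any(r["site"] == site["name"] and r["vo"] != vo
                     and r["start"] <= now < r["end"] for r in reservations)

  def choose_site(sites, vo, policy, reservations, now):
      """Among eligible sites, prefer the lowest current + queued + anticipated load.
      Returning None means: defer and re-plan when grid status or policy changes."""
      candidates = [s for s in sites if eligible(s, vo, policy, reservations, now)]
      if not candidates:
          return None
      return min(candidates, key=lambda s: s["running"] + s["queued"] + s["anticipated"])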

  18. Policy • Focus is on resource allocation (not on security) • Allocation examples: • "CMS should get 80% of the resources at Caltech" (averaged monthly) • "The Higgs group has high priority at BNL until 8/1" • Need to apply fair-share scheduling to the grid • Need to understand the allocation models dictated by funders and data centers
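
  The arithmetic behind an allocation statement like the Caltech example is simple to state. The sketch below is illustrative only; the CPU-hour accounting numbers are made up.

  # Illustrative check of "CMS should get 80% of the resources at Caltech,
  # averaged monthly", given hypothetical monthly CPU-hour accounting.

  def within_fair_share(vo_cpu_hours, site_cpu_hours, target_fraction=0.80):
      """True if the VO's share of the site's monthly usage is at or below its target."""
      return site_cpu_hours == 0 or vo_cpu_hours / site_cpu_hours <= target_fraction

  # CMS used 70,000 of Caltech's 100,000 delivered CPU-hours this month,
  # so it is still inside its 80% allocation.
  print(within_fair_share(70_000, 100_000))  # True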

  19. Grids as overlays on shared resources

  20. Grid Scheduling Problem • Given an abstract DAG representing logical work: • Where should each compute job be executed? • What do site and VO policy say? • What does grid "weather" dictate? • Where is the required data now? • Where should data results be sent? • Stop and re-schedule computations? • Suspend or de-prioritize work in progress to let higher-priority work go through? • Degree of policy control? • Is a "grid" an entity, i.e., an "aggregator" of resources? • How is data placement coordinated with planning? • Use of an execution profiler in the planner architecture: • Characterize the resource needs of an app over time • Parameterize the resource requirements of an app by its parameters • What happens when things go wrong?
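
  One of the questions above is what an execution profiler would hand the planner. A hypothetical sketch of such a profile follows; neither the phases nor the numbers come from GriPhyN, they only illustrate resource needs over time parameterized by one application parameter (here, the number of events).

  # Hypothetical execution profile: per-phase resource estimates as a function
  # of one application parameter (num_events). Illustration only.

  def profile(num_events):
      """Return (phase, cpu_hours, peak_memory_mb, output_gb) estimates."""
      return [
          ("generate",    0.02 * num_events,  512, 0.001 * num_events),
          ("simulate",    0.50 * num_events, 2048, 0.010 * num_events),
          ("reconstruct", 0.10 * num_events, 1024, 0.002 * num_events),
      ]

  # A planner could sum these when placing a 10,000-event request, and
  # re-plan if observed usage diverges from the profile.
  total_cpu_hours = sum(cpu for _, cpu, _, _ in profile(10_000))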

  21. Policy and the Planner • Planner considers: • Policy (fairly static, from CAS/SAS) • Grid status • Job (user/group) resource consumption history • Job profiles (resources over time) from Prophesy

  22. Open Issues – Planner (1) • Does the planner have a queue? If so, how does it manage that queue? • How many planners are there? Is it a service? • How is responsibility partitioned between the planner and the executor (cluster scheduler)? • How many other entities need to be coordinated? • RFT, DAPman, SRM, NeST, …? • How to wait on reliable file transfers? • How does the planner estimate times if it has only partial responsibility for when/where things run? • How is data placement planning coordinated with request planning?

  23. Open Issues – Planner (2) • Clearly need incremental planning (e.g., for analysis) • Stop and re-schedule computations? • Suspend or de-prioritize work in progress to let higher-priority work go through? • Degree of policy control? • Is the "grid" an entity? • Use of an execution profiler in the planner architecture: • Characterize the resource requirements of an app over time • Parameterize the resource requirements of an app with respect to its (salient) parameters • What happens when things go wrong?

  24. Issue Summary • Consolidate the data grid stack • Reliable file transfer • Reliable replication • Replica catalog and virtual data catalog scaled for global use • Define interfaces and locations of planners • Unify job workflow representation around DAGs • Define how to state and manage policy • Strategies for fault tolerance – similar to replanning for weather and policy changes? • Evolution of services to OGSA
