
Project Status Report : SAMGrid



Presentation Transcript


  1. Project Status Report : SAMGrid
     • SAMGrid Management, Status, Operations – Merritt
     • SAMGrid Development I. – Veseli
     • SAMGrid Development II. – Kennedy
     • SAMGrid Future Plans – St. Denis
     Computing Division Project Status Report

  2. SAMGrid Project Description
     • Purpose: Provide data handling services to Run II experiments and other interested experiments with similar problems. These services should scale in performance and convenience for cataloging and delivery of Petabyte-sized datasets, and should evolve to become available in relevant Grid environments.
     • Current stakeholders: CDF, DØ, MINOS, CD
     • Duration: Development effort is expected to extend through ’07 as the components move to become Grid services. High-level maintenance (i.e., effort that includes the capability to respond to feature requests) is expected to continue through at least the data collection lifetime of the stakeholders.

  3. The SAM-Grid Team
     • Revised Management Plan went into effect Dec 03
     Project Co-Leaders: Wyatt Merritt, CD/DØCA; Rick St. Denis, CDF/U Glasgow
     Project Technical Co-Leaders: Rob Kennedy, CD/CDF; Sinisa Veseli, CD/DØCA
     CCF: Andrew Baranovski, Gabriele Garzoglio, Igor Terekhov
     CEPA: Carmenita Moore, Steve White (0.5 FTE)
     CDF: Randy Herber, Art Kreymer, Stefan Stonjek (GS)
     DØCA: Lauri Loebel Carpenter, Robert Illingworth, Adam Lyon

  4. The SAM-Grid Team - Extended
     Database support (CSS-DSG): Diana Bonham, Anil Kumar
     Associated internal projects:
       RUNJOB (with CMS)
       Authorization Project (with CMS, still being defined)
     Associated external projects:
       PPDG: Sankalp Jain, Aditya Nishandar
       GridPP: Morag Burgon-Lyon, Valeria Bartsch, Iain Bertram, Dave Evans, Peter Love
       SBIR II: Matt Vranicar, Jeremy Simmons, Josh Gramlich, Ngan MacDonald, John Grace

  5. SAMGrid Project Management & Organization
     • Project co-leaders
       • Represent largest stakeholders: requirements & priorities
       • Run weekly design meetings
     • Project technical leaders
       • Run weekly operations meeting
       • Conduct subproject assessments
     • Active Subprojects: C++ API, DBServer, JIM, H Stream Reco for CDF, Caching, Chains & Links, CDF DFC, Test Harness, Linux deploy of DBServers, Config Man
     • Planned Subprojects: Request system, Autodest, Further monitoring (MIS)
     • Related Subprojects: d0tools, SBIR II, Condor mods, workflow packages for CDF & D0, Authorization & Accounting
     • Recently completed Subprojects: Python API, V5.1 Schema Design, Batch Adapter, D0 Online dcache TDP, 1st Gen Monitoring Tools, Data Dimensions Grammar

  6. SAMGrid Components
     • Event/File Catalog for metadata (contents & processing) and locations
     • Dbservers for accessing catalog
     • Station servers for file delivery to projects
     • Optimizer
     • File storage server
     • Interface to station cache and MSS (samcp)
     • JIM components for Grid job submission & monitoring
     • User API
     • C++ client API

  7. Status and deployments of SAMGrid For DØ
     • Operational @ FNAL: online, reco farm, d0mino, cab, new cab, clued0
     • Operational @ Monte Carlo production sites
     • Operational @ remote analysis sites: ~20 active, ~40 deployed
     • Operational 11/03 – 2/04 for remote reconstruction: IN2P3, UKGrid (Manchester/ICL/RAL), WestGRID, GridKA, NIKHEF -- 97M events reprocessed remotely
     • Stats: ~78K proj FNAL, >14K proj remote (since 1/1/03); 60 billion evts, 3 PB, 8 M files consumed (all D0 stations)
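The consumption totals on slide 7 (60 billion events, 3 PB, 8 M files) imply some rough per-file and per-event averages. The sketch below works them out; it assumes decimal units (1 PB = 10^15 bytes), which the slide does not specify, and the function and variable names are purely illustrative. Because the totals count consumption across all D0 stations, a file read more than once is presumably counted more than once, so these are averages per delivery rather than per unique file.

```python
# Back-of-the-envelope averages from the D0 totals quoted on slide 7.
# Assumption: decimal units (1 PB = 10**15 bytes); the slide does not say.
EVENTS_CONSUMED = 60 * 10**9   # "60 billion evts"
BYTES_CONSUMED = 3 * 10**15    # "3 PB"
FILES_CONSUMED = 8 * 10**6     # "8 M files"


def consumption_averages(events, total_bytes, files):
    """Return average bytes/file, events/file and bytes/event."""
    return {
        "bytes_per_file": total_bytes / files,
        "events_per_file": events / files,
        "bytes_per_event": total_bytes / events,
    }


averages = consumption_averages(EVENTS_CONSUMED, BYTES_CONSUMED, FILES_CONSUMED)

print(f"average file size : {averages['bytes_per_file'] / 10**6:.0f} MB")
print(f"events per file   : {averages['events_per_file']:.0f}")
print(f"bytes per event   : {averages['bytes_per_event'] / 10**3:.0f} kB")
```

Under these assumptions the averages come to about 375 MB per file, 7,500 events per file, and 50 kB per event.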

  8. D0

  9. D0 Files: 4000-8000 Files/Day

  10. D0 Files Per Month By Year (plot covering 1999-2003; annotations: 100,000 files, Run II Start)

  11. D0 Total Files: 2.5 Million Files Served

  12. D0 Total Data Moved: 700 TB moved

  13. Status and deployments of SAMGrid For CDF
     • Operational 24/7 to store online metadata
     • Operational at remote stations: ~15 active, ~30 deployed. Large recent increase: Fla. Wkshp!
     • In testing for Monte Carlo production
     • File delivery tests up to 20 TB on testcaf
     • Statistics: ~3000 proj total (since 1/1/03)
     • Note CDF usage pattern is different from DØ: CDF moves more GB (but not more events) because it does not use a small summary format like the DØ thumbnail.

  14. CDF Florida DH Workshop (Now 20!)
     • 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days.
     • 3 in Asia, 4 in Europe
     • 6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
     • Sam installation now: initsam cdf <stationname>
     • Follow-up on April 1.
     • Each site has a local user support person to reduce load on core development team.
     • Generally: Security ate 80% of the effort!


  16. 2 TB/Day: Karlsruhe

  17. CDF Dcache on CAF: ALL CDF on CAF reads 20 TB/Day
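For scale, the daily volumes quoted on slides 16 and 17 (2 TB/day for Karlsruhe, 20 TB/day for all CDF reads on CAF) can be converted to sustained transfer rates. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a rate spread evenly over 24 hours; real traffic would be burstier. The helper name is illustrative only.

```python
# Convert a daily data volume into an average sustained rate.
# Assumptions: decimal units and a perfectly even load over the day.
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mb_per_s(tb_per_day):
    """Average rate in MB/s needed to move tb_per_day terabytes in one day."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


karlsruhe_rate = sustained_mb_per_s(2)   # slide 16: 2 TB/day
caf_rate = sustained_mb_per_s(20)        # slide 17: 20 TB/day

print(f"Karlsruhe: {karlsruhe_rate:.1f} MB/s")
print(f"CAF dCache: {caf_rate:.1f} MB/s")
```

Averaged over a full day, these volumes correspond to roughly 23 MB/s and 231 MB/s respectively.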


  20. Status and deployments of JIM
     • Job broker, execution and submission site software, job monitor, client software for grid job submission
     • Deployment plan for DØ Monte Carlo
       • Test at 3 sites (Manchester, CCIN2P3, Wisconsin) with basic functionality and measure efficiency of job completion
       • Verify use by experimenter for job submission (this week)
       • Add merging
       • Move to production at these 3 sites (DØ milestone: Mar 1)
       • Add remainder of DØ MC sites (Lancaster, SAR, NIKHEF, Prague)
       • Improve brokering algorithm

  21. JIM Issues
     • Site operational requirements (e.g. clock synch, disk & node reliability, OS issues)
     • Experiment operational requirements (e.g. code footprint may exceed site capability and is variable w/ release)
     • File transfer capabilities & policies: cf. mtg this week w/ GridKA rep
     • Allocation of services to head node vs worker nodes
     • Sandboxing mechanisms (last week design mtg)
     • Merging mechanism, brokering (this week design mtg)

  22. Operational Model
     • Experiments provide shifters for 1st line problem fielding and solving
     • Project provides on-call list from developers
     • At DØ, on average ~60–80% of problems are answered by shifters
     • Classes of problems
       • Routine jobs like adding info to database
       • Less routine: cleanup after failed stores
       • Answering user questions regarding usage
       • Updating documentation
       • Investigating user reports of problems, and problems visible in project monitoring tools
       • Providing solutions for problems

  23. Operations Outlook
     • Improve documentation with aim of improving shifter & user ability to diagnose/solve problems
     • Expect doubling of central station capacity at DØ
     • Expect transition to more SAM usage at CDF
     • Expect Grid operations in production for simulation, first at DØ then at CDF
