
THE INFN GRID PROJECT


Presentation Transcript


  1. THE INFN GRID PROJECT
  • Scope: study and develop a general INFN computing infrastructure, based on Grid technologies, to be validated (as a first use case) by implementing distributed Regional Centre prototypes for the LHC experiments ATLAS, CMS and ALICE and, later on, also for other INFN experiments (Virgo, Gran Sasso, …)
  • Project status:
    • Outline of the proposal submitted to INFN management on 13 January 2000
    • 3-year duration
    • Next meeting with INFN management on 18 February
    • Feedback documents from the LHC experiments by end of February (sites, FTEs, …)
    • Final proposal to INFN by end of March

  2. INFN & "Grid-related projects"
  • Globus tests
  • "Condor on WAN" as a general-purpose computing resource
  • "GRID" working group to analyze viable and useful solutions (LHC computing, Virgo, …)
  • Global architecture that allows strategies for the discovery, allocation, reservation and management of resource collections
  • MONARC project related activities

  3. Evaluation of the Globus Toolkit
  • 5-site testbed (Bologna, CNAF, LNL, Padova, Roma1)
  • Use case: CMS HLT studies
    • MC production → complete HLT chain
  • Services to test/implement:
    • Resource management
      • fork() → interface to different local resource managers (Condor, LSF)
      • Resources chosen by hand → smart broker implementing a global resource manager (a toy sketch of the idea follows below)
    • Data mover (GASS, GsiFTP, …)
      • to stage executables and input files
      • to retrieve output files
    • Bookkeeping (is this worth a general tool?)
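The "smart broker" above is only a requirement at this stage; the following is a minimal Python sketch of the idea it refers to: pick a submission site automatically instead of by hand. Site names, the free-CPU attribute and the ranking rule are assumptions, not part of the project documents.

# Illustrative only: a toy "smart broker" replacing the manual choice of a
# Globus resource. Site names, attributes and the ranking rule are assumptions.

from dataclasses import dataclass

@dataclass
class Site:
    name: str          # e.g. the gatekeeper contact string
    jobmanager: str    # local resource manager behind the gatekeeper
    free_cpus: int     # idle CPUs (would come from an information service)

def choose_site(sites, cpus_needed):
    """Return the eligible site with the most free CPUs, or None."""
    eligible = [s for s in sites if s.free_cpus >= cpus_needed]
    return max(eligible, key=lambda s: s.free_cpus, default=None)

if __name__ == "__main__":
    testbed = [
        Site("cnaf.infn.it", "lsf", 12),
        Site("lnl.infn.it", "condor", 30),
        Site("pd.infn.it", "fork", 2),
    ]
    best = choose_site(testbed, cpus_needed=8)
    print("submit to:", best.name if best else "no site available")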

  4. Use Case: CMS HLT studies

  5. Status
  • Globus installed on 5 Linux PCs at 3 sites
  • Globus Security Infrastructure: works!
  • MDS: initial problems accessing data (long response times and timeouts); a query sketch follows below
  • GRAM, GASS, Gloperf: work in progress
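MDS in the Globus Toolkit of this period was an LDAP-based directory service, which is consistent with the response-time and timeout problems noted above. A minimal sketch of such a query with an explicit client-side timeout follows, assuming the python-ldap package; the host name, port and search base are placeholders, not the testbed's actual configuration.

# Illustrative only: querying an LDAP-based MDS server with a hard client-side
# timeout. Host, port and search base are assumptions.

import ldap  # python-ldap

MDS_URI = "ldap://grid005.cnaf.infn.it:2135"   # hypothetical GRIS endpoint
BASE_DN = "o=Globus, c=US"                     # placeholder search base

def query_mds(filterstr="(objectclass=*)", timeout_s=10):
    conn = ldap.initialize(MDS_URI)
    try:
        # search_st() raises ldap.TIMEOUT if no answer arrives within timeout_s
        return conn.search_st(BASE_DN, ldap.SCOPE_SUBTREE, filterstr, None, 0, timeout_s)
    except ldap.TIMEOUT:
        print("MDS query timed out after", timeout_s, "s")
        return []
    finally:
        conn.unbind_s()

if __name__ == "__main__":
    for dn, attrs in query_mds():
        print(dn)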

  6. Condor on WAN: objectives
  • Large INFN project of the Computing Commission involving ~20 sites
  • INFN collaboration with the Condor Team, UWISC
  • First goal: Condor "tuning" on the WAN
    • Verify Condor reliability and robustness in a wide-area-network environment
    • Verify suitability to INFN computing needs
    • Measure the network I/O impact

  7. Second goal: the network as a Condor resource
  • Dynamic checkpointing and checkpoint-domain configuration
  • Pool partitioned into checkpoint domains (a dedicated checkpoint server for each domain)
  • A checkpoint domain is defined according to:
    • the presence of a sufficiently large CPU capacity
    • the presence of a set of machines with efficient network connectivity
  • Sub-pools

  8. Checkpointing: next step
  • Distributed dynamic checkpointing
    • Pool machines select the "best" checkpoint server (from a network point of view)
    • The association between execution machine and checkpoint server is decided dynamically (a sketch of one possible selection rule follows below)
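One way to read "best checkpoint server from a network view" is: each execution machine probes the candidate servers and picks the one that answers fastest. The sketch below uses the TCP connect time as the probe; the server names, the port and the probing method are assumptions and this is not how Condor itself performs the selection.

# Illustrative only: pick the "closest" checkpoint server by measuring the TCP
# connect time towards each candidate. Hosts and port are placeholders.

import socket
import time

CKPT_SERVERS = ["ckpt.cnaf.infn.it", "ckpt.lnl.infn.it", "ckpt.mi.infn.it"]
CKPT_PORT = 5651   # placeholder port

def connect_time(host, port, timeout_s=2.0):
    """Return the time needed to open a TCP connection, or None on failure."""
    start = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.time() - start
    except OSError:
        return None

def best_ckpt_server(servers=CKPT_SERVERS, port=CKPT_PORT):
    timed = [(t, h) for h in servers if (t := connect_time(h, port)) is not None]
    return min(timed)[1] if timed else None

if __name__ == "__main__":
    print("selected checkpoint server:", best_ckpt_server())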

  9. Implementation
  Characteristics of the INFN Condor pool:
  • Single pool
    • to optimize CPU usage of all INFN hosts
  • Sub-pools
    • to define policies/priorities on resource usage
  • Checkpoint domains
    • to guarantee the performance and the efficiency of the system
    • to reduce the network traffic due to checkpointing activity

  10. INFN Condor pool on WAN: checkpoint domains
  [Map of the GARR-B network (155 Mbps ATM) showing the INFN sites, their network access points (PoPs), the main transport nodes and the number of hosts per site; the default checkpoint domain and the Central Manager are at CNAF (Bologna), with a 155 Mbps EsNet link towards the USA. Current scale: ~180 machines and 6 checkpoint servers; planned scale: 500-1000 machines and 25 checkpoint servers.]

  11. Management
  • Central management (condor-admin@infn.it)
  • Local management (condor@infn.it)
  • Steering committee
  • Software maintenance contract with the Condor support team of the University of Wisconsin-Madison

  12. INFN-GRID project requirements
  Networked workload management:
  • Optimal co-allocation of data, CPU and network for a specific "grid/network-aware" job (a toy cost model is sketched below)
  • Distributed scheduling (data and/or code migration)
  • Unscheduled/scheduled job submission
  • Management of heterogeneous computing systems
  • Uniform interface to various local resource managers and schedulers
  • Priorities and policies on resource (CPU, data, network) usage
  • Bookkeeping and web user interface
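The co-allocation requirement can be illustrated with a toy cost model: for each candidate site, estimate the completion time as the execution/queue time plus the time needed to move the missing input data over the available bandwidth, and pick the minimum. All numbers, site names and attributes below are invented; a real broker would obtain them from the information and monitoring services.

# Illustrative only: a toy cost model for "grid/network-aware" scheduling.

from dataclasses import dataclass

@dataclass
class Candidate:
    site: str
    cpu_time_s: float        # estimated execution + queue time at this site
    local_data_gb: float     # part of the input already present at the site
    bandwidth_mbps: float    # usable bandwidth from the nearest replica

def completion_time(c, input_gb):
    missing_gb = max(0.0, input_gb - c.local_data_gb)
    transfer_s = missing_gb * 8000.0 / c.bandwidth_mbps   # GB -> Mbit, then / (Mbit/s)
    return c.cpu_time_s + transfer_s

def schedule(candidates, input_gb):
    return min(candidates, key=lambda c: completion_time(c, input_gb))

if __name__ == "__main__":
    sites = [
        Candidate("cnaf",  cpu_time_s=3600, local_data_gb=100, bandwidth_mbps=155),
        Candidate("lnl",   cpu_time_s=1800, local_data_gb=0,   bandwidth_mbps=34),
        Candidate("roma1", cpu_time_s=2400, local_data_gb=50,  bandwidth_mbps=155),
    ]
    best = schedule(sites, input_gb=100)
    print("best site:", best.site)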

  13. Project requirements (cont.)
  Networked data management:
  • Universal name space: transparent, location independent (a toy replica catalogue is sketched below)
  • Data replication and caching
  • Data mover (scheduled/interactive, at object/file/DB granularity)
  • Loose synchronization between replicas
  • Application metadata, interfaced with a DBMS (e.g. Objectivity, …)
  • Network service definition for a given application
  • End-system network protocol tuning
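The universal-name-space and replication requirements can be pictured as a replica catalogue: a logical file name maps to several physical replicas, and the application resolves it to the "closest" one. The catalogue contents, site names and the closeness rule (a per-site preference list) below are assumptions.

# Illustrative only: a toy replica catalogue mapping a logical file name (LFN)
# to physical replicas (PFNs), resolved via a simple site preference.

REPLICA_CATALOG = {
    "lfn:/cms/hlt/digis_run42.db": [
        "gsiftp://se.cnaf.infn.it/data/digis_run42.db",
        "gsiftp://se.pd.infn.it/data/digis_run42.db",
    ],
}

# Preferred source sites for a given client site, ordered by "network closeness".
SITE_PREFERENCE = {
    "lnl":   ["pd.infn.it", "cnaf.infn.it"],
    "roma1": ["cnaf.infn.it", "pd.infn.it"],
}

def resolve(lfn, client_site):
    """Return the physical replica hosted at the most preferred site."""
    replicas = REPLICA_CATALOG.get(lfn, [])
    for site in SITE_PREFERENCE.get(client_site, []):
        for pfn in replicas:
            if site in pfn:
                return pfn
    return replicas[0] if replicas else None

if __name__ == "__main__":
    print(resolve("lfn:/cms/hlt/digis_run42.db", "lnl"))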

  14. Project requirements (cont.)
  Application monitoring/management:
  • Performance: "instrumented systems" with timing information and analysis tools (an instrumentation sketch follows below)
  • Run-time analysis of collected application events
  • Bottleneck analysis
  • Dynamic monitoring of Grid resources to optimize resource allocation
  • Failure management
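A minimal sketch of what "instrumented systems with timing information" could mean: a decorator records a timestamped event for every call of an instrumented step, and the collected events can then be inspected for bottlenecks. The event format, the in-memory event list and the example steps are assumptions.

# Illustrative only: minimal application instrumentation for timing events.

import functools
import time

EVENTS = []   # (start timestamp, step name, duration in seconds)

def instrumented(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append((start, func.__name__, time.time() - start))
    return wrapper

@instrumented
def stage_input():
    time.sleep(0.2)   # stands in for a real data transfer

@instrumented
def run_reconstruction():
    time.sleep(0.1)   # stands in for the real computation

if __name__ == "__main__":
    stage_input()
    run_reconstruction()
    # Rudimentary bottleneck analysis: which step dominated the wall time?
    slowest = max(EVENTS, key=lambda e: e[2])
    print("slowest step:", slowest[1], "(%.2f s)" % slowest[2])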

  15. Project requirements (cont.)
  Computing fabric and general utilities for a globally managed Grid:
  • Configuration management of computing facilities
  • Automatic software installation and maintenance
  • System, service and network monitoring with global alarm notification and automatic recovery from failures (a minimal check/alarm loop is sketched below)
  • Resource usage accounting
  • Security of Grid resources and infrastructure usage
  • Information service
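The monitoring/alarm/recovery requirement, reduced to its simplest form: a loop that runs a health check per service and, on failure, raises an alarm and attempts a recovery action. The service names, check commands and recovery commands below are placeholders; a real fabric monitor would also aggregate alarms centrally and keep accounting records.

# Illustrative only: a minimal monitor/alarm/recovery loop.

import subprocess
import time

SERVICES = {
    # name: (check command, placeholder recovery command)
    "gatekeeper": (["pgrep", "-f", "globus-gatekeeper"], ["echo", "restarting gatekeeper"]),
    "gris":       (["pgrep", "-f", "slapd"],             ["echo", "restarting gris"]),
}

def check(cmd):
    return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode == 0

def notify(service):
    # Placeholder for a real alarm channel (mail, pager, central console, ...)
    print("ALARM:", service, "is down")

def monitor_once():
    for name, (check_cmd, recover_cmd) in SERVICES.items():
        if not check(check_cmd):
            notify(name)
            subprocess.run(recover_cmd)   # attempt automatic recovery

if __name__ == "__main__":
    while True:
        monitor_once()
        time.sleep(60)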

  16. Grid Tools
