
Building Large Scale Fabrics – A Summary


Presentation Transcript


  1. Building Large Scale Fabrics – A Summary Marcel Kunze, FZK

  2. Observation
  • Everybody seems to need unprecedented amounts of CPU, disk and network bandwidth
  • Trend towards PC-based computing fabrics and commodity hardware:
    • LCG (CERN), L. Robertson
    • CDF (Fermilab), M. Neubauer
    • D0 (Fermilab), I. Terekhov
    • Belle (KEK), P. Krokovny
    • HERA-B (DESY), J. Hernandez
    • LIGO, P. Shawhan
    • Virgo, D. Busculic
    • AMS, A. Klimentov
  • Considerable cost savings w.r.t. a RISC-based farm: not enough 'bang for the buck' (M. Neubauer)
  Marcel Kunze - FZK

  3. AMS02 Benchmarks
  Execution time of the AMS "standard" job compared to CPU clock 1)
  1) V. Choutko, A. Klimentov, AMS note 2001-11-01
  Marcel Kunze - FZK
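The benchmark compares execution time against CPU clock. As a rough illustration of what such a comparison assumes, the sketch below estimates execution time under perfect inverse scaling with clock speed; the numbers in it are placeholders, not figures from the AMS note.

```python
# Minimal sketch: if execution time scaled inversely with CPU clock,
# a job's time on a faster box could be estimated like this.
# The reference numbers below are placeholders, not the AMS note's figures.

def scaled_exec_time(ref_time_s: float, ref_clock_mhz: float, new_clock_mhz: float) -> float:
    """Estimate execution time assuming perfect inverse scaling with clock."""
    return ref_time_s * ref_clock_mhz / new_clock_mhz

if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(scaled_exec_time(ref_time_s=3600.0, ref_clock_mhz=800.0, new_clock_mhz=1200.0))
```

In practice the scaling is not perfectly linear (memory and I/O matter), which is exactly why measured benchmarks like the AMS one are needed.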

  4. Fabrics and Networks: Commodity Equipment
  Needed for LHC at CERN in 2006:
  • Storage: raw recording rate 0.1 – 1 GB/sec, accumulating at 5-8 PetaBytes/year, 10 PetaBytes of disk
  • Processing: 200'000 of today's (2001) fastest PCs
  • Networks: 5-10 Gbps between main Grid nodes
  • Distributed computing effort to avoid congestion: 1/3 at CERN, 2/3 elsewhere
  Marcel Kunze - FZK
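The accumulation figure follows from integrating the recording rate over the machine's live time. The short sketch below reproduces that arithmetic; the live-time fraction is an assumed value, not one quoted on the slide.

```python
# Back-of-the-envelope sketch of the storage numbers on this slide:
# a sustained recording rate integrated over the accelerator's live time.
# The live-time fraction is an assumption for illustration only.

SECONDS_PER_YEAR = 365 * 24 * 3600

def petabytes_per_year(rate_gb_per_s: float, live_fraction: float) -> float:
    """Accumulated data volume in PB/year for a given recording rate."""
    bytes_per_year = rate_gb_per_s * 1e9 * SECONDS_PER_YEAR * live_fraction
    return bytes_per_year / 1e15

if __name__ == "__main__":
    for rate in (0.1, 1.0):        # GB/sec, the range quoted on the slide
        print(rate, "GB/s ->", round(petabytes_per_year(rate, live_fraction=0.3), 1), "PB/yr")
```

With an assumed ~30% live time the quoted range of recording rates indeed lands in the few-PB-per-year regime.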

  5. PC Cluster 5 (Belle): 1U servers, Pentium III 1.2 GHz, 256 CPUs (128 nodes) Marcel Kunze - FZK

  6. 3U PC Cluster 6 (blade server): LP Pentium III 700 MHz, 40 CPUs (40 nodes) Marcel Kunze - FZK

  7. Disk Storage Marcel Kunze - FZK

  8. IDE Performance Marcel Kunze - FZK

  9. Basic Questions
  • Compute farms contain several thousand computing elements
  • Storage farms contain thousands of disk drives
  • How to build scalable systems?
  • How to build reliable systems?
  • How to operate and maintain large fabrics?
  • How to recover from errors?
  • EDG deals with the issue (P. Kunszt)
  • IBM deals with the issue (N. Zheleznykh): Project Eliza, self-healing clusters
  • Several ideas and tools are already on the market
  Marcel Kunze - FZK
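A quick way to see why error recovery must be automated at this scale: with thousands of independent components, failures become routine. The sketch below estimates the expected failure rate from an assumed per-component MTBF; both numbers are illustrative, not measurements from any of the farms above.

```python
# Rough sketch of the expected failure rate for a farm of N components,
# assuming independent failures and a per-component MTBF.
# The MTBF value below is an assumption for illustration only.

def expected_failures_per_week(n_components: int, mtbf_hours: float) -> float:
    """Expected number of component failures per week (Poisson mean)."""
    return n_components * (24 * 7) / mtbf_hours

if __name__ == "__main__":
    # e.g. 2000 disks with an assumed 400,000 h MTBF
    print(round(expected_failures_per_week(2000, 400_000.0), 2), "failures/week")
```

Even with optimistic MTBF figures, a fabric of thousands of drives sees failures every week, so manual recovery does not scale.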

  10. Storage Scalability
  • Difficult to scale up to systems of thousands of components and keep a single system image: NFS automounter, symbolic links, etc. (M. Neubauer, CAF: ROOTD does not need this and allows direct worldwide access to distributed files without mounts)
  • Scalability in size and throughput by means of storage virtualisation
  • Allows setting up non-TCP/IP based systems to handle multiple GB/s
  Marcel Kunze - FZK
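The CAF remark about ROOTD can be illustrated with a short PyROOT snippet that opens a remote file by URL instead of relying on an NFS mount. A minimal sketch, assuming a working ROOT installation with PyROOT and a reachable rootd server; the host name and path are hypothetical.

```python
# Hedged sketch of the CAF-style access mentioned above: opening a remote
# file through rootd with PyROOT instead of mounting it via NFS/automounter.
# Requires ROOT with PyROOT; host and path below are hypothetical.

import ROOT

# "root://" URLs are served by rootd (later xrootd) without any mount.
url = "root://dataserver.example.org//store/run1234/events.root"  # hypothetical

f = ROOT.TFile.Open(url)
if f and not f.IsZombie():
    f.ls()          # list the keys in the remote file
    f.Close()
else:
    print("could not open", url)
```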

  11. Virtualisation of Storage (diagram)
  • Data servers mount virtual storage as SCSI devices
  • Input via a load-balancing switch (Internet/Intranet)
  • Shared data access (Oracle, PROOF)
  • Storage Area Network (FC-AL, InfiniBand, …)
  • 200 MB/s sustained; scalability
  Marcel Kunze - FZK
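The load-balancing switch in the diagram routes incoming requests across the data servers. The toy sketch below shows the basic idea of picking the least-loaded server; the server names and load values are made up for illustration and say nothing about the actual setup shown on the slide.

```python
# Conceptual sketch only: a load-balancing front end picks the
# least-loaded data server for each incoming request.
# Server names and load figures are hypothetical.

from typing import Dict

def pick_server(load_by_server: Dict[str, float]) -> str:
    """Return the data server with the lowest current load."""
    return min(load_by_server, key=load_by_server.get)

if __name__ == "__main__":
    loads = {"ds01": 0.72, "ds02": 0.31, "ds03": 0.55}   # hypothetical
    print("route request to", pick_server(loads))
```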

  12. Storage Elements (M. Gasthuber)
  • PNFS = Perfectly Normal FileSystem
  • Stores metadata with the data: 8 hierarchies of file tags
  • Migration of data (hierarchical storage systems): dCache
  • Development of DESY and Fermilab
  • ACLs, Kerberos, ROOT-aware
  • Web monitoring
  • Cached as well as direct tape access
  • Fail-safe
  Marcel Kunze - FZK
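PNFS keeps the metadata described above inside the namespace itself and commonly exposes directory tags through special ".(tag)(name)" pseudo-files. A minimal sketch that reads one such tag; treat the mount point, tag name and exact dot-file syntax as assumptions to be checked against the local dCache/PNFS documentation.

```python
# Hedged sketch: PNFS exposes directory tags (the metadata hierarchies
# mentioned above) as special "dot files" inside the namespace.
# Mount point, tag name and the ".(tag)(...)" convention are assumptions.

import os

def read_pnfs_tag(directory: str, tag: str) -> str:
    """Read a PNFS directory tag via its pseudo-file, e.g. .(tag)(sGroup)."""
    tag_path = os.path.join(directory, f".(tag)({tag})")
    with open(tag_path) as fh:
        return fh.read().strip()

if __name__ == "__main__":
    # Hypothetical mount point and tag name.
    print(read_pnfs_tag("/pnfs/example.org/data/belle", "sGroup"))
```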

  13. Necessary admin tools (A. Manabe)
  • System (SW) installation / update: Dolly++ (image cloning)
  • Configuration: Arusha (http://ark.sourceforge.net), LCFGng (http://www.lcfg.org)
  • Status monitoring / system health check:
    • CPU/memory/disk/network utilization: Ganglia 1), Palantir 2)
    • (Sub-)system service sanity check: PIKT 3), Pica 4), cfengine
  • Command execution: WANI, a web-based remote command executor
  1) http://ganglia.sourceforge.net 2) http://www.netsonde.com 3) http://pikt.org 4) http://pica.sourceforge.net/wtf.html
  Marcel Kunze - FZK
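On the monitoring side, Ganglia's gmond daemon publishes its cluster state as XML to any client that connects to its TCP port (8649 by default), so a health check can be scripted without a dedicated agent. A minimal sketch, assuming a reachable gmond and the usual HOST/NAME/REPORTED attributes in its XML output; the host name is hypothetical.

```python
# Small sketch of the status-monitoring side: Ganglia's gmond daemon dumps
# its cluster state as XML to anyone who connects to its TCP port
# (8649 by default). Host name is hypothetical; error handling is minimal.

import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host: str, port: int = 8649) -> ET.Element:
    """Fetch and parse the cluster-state XML published by gmond."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

if __name__ == "__main__":
    root = fetch_gmond_xml("gmond.example.org")        # hypothetical host
    for host in root.iter("HOST"):
        print(host.get("NAME"), host.get("REPORTED"))
```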

  14. WANI is implemented on the 'Webmin' GUI (screenshot: Start, command input, node selection) Marcel Kunze - FZK

  15. Command execution result (screenshot): host names and results from 200 nodes on one page Marcel Kunze - FZK

  16. Stdout and stderr output (screenshots) Marcel Kunze - FZK
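WANI itself is a Webmin-based GUI; its core function, running one command on many nodes and collecting per-host stdout and stderr, can be sketched with plain ssh. The node names below are hypothetical and passwordless ssh (key-based, BatchMode) is assumed.

```python
# Sketch of the WANI idea with plain ssh: run one command on many nodes
# in parallel and collect stdout/stderr per host. Nodes are hypothetical.

import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_node(node, command):
    """Run a command on one node via ssh; return (node, stdout, stderr)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True, text=True, timeout=60,
    )
    return node, proc.stdout, proc.stderr

if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 6)]     # hypothetical hosts
    with ThreadPoolExecutor(max_workers=16) as pool:
        for node, out, err in pool.map(lambda n: run_on_node(n, "uptime"), nodes):
            print(node, out.strip(), err.strip())
```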

  17. CPU Scalability
  • The current tools scale up to ~1000 CPUs (in the previous example, 10000 CPUs would require checking 50 pages)
  • Autonomous operation required
  • Intelligent self-healing clusters
  Marcel Kunze - FZK
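A toy sketch of what "autonomous operation" could mean in practice: poll each node and trigger a recovery action instead of waiting for an operator. The liveness check and the recovery step below are placeholders for site-specific logic, not anything prescribed by Project Eliza or EDG.

```python
# Toy sketch of a self-healing loop: poll nodes and take an automatic
# recovery action instead of paging an operator. The check and the
# recovery command are placeholders for site-specific logic.

import subprocess
import time

def node_is_alive(node):
    """Crude liveness check: a single ping with a short timeout."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", node],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0

def heal(node):
    """Placeholder recovery action (e.g. drain from batch, power-cycle)."""
    print(f"[self-heal] {node} unreachable, removing it from the batch pool")

if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 4)]      # hypothetical hosts
    while True:
        for node in nodes:
            if not node_is_alive(node):
                heal(node)
        time.sleep(60)                                  # poll once a minute
```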

  18. Resource Scheduling
  • Problem: how to access local resources from the Grid?
  • Local batch queues vs. global batch queues
  • Extension of Dynamite (University of Amsterdam) to work with Globus: Dynamite-G (I. Shoshmina)
  • Open question: how do we deal with interactive applications on the Grid?
  Marcel Kunze - FZK
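The local-versus-global question can be made concrete with a thin dispatcher that submits a job either to the local batch system or to a Grid gatekeeper. The sketch below assumes a PBS-style qsub locally and the Globus globus-job-submit client on the Grid side; a real site would substitute its actual tools (LSF, Condor, Dynamite-G, …) and gatekeeper name.

```python
# Sketch of a local-vs-Grid dispatcher. Assumes a PBS-style `qsub` and the
# Globus `globus-job-submit` client; gatekeeper and script are hypothetical.

import subprocess

def submit(script, where="local", gatekeeper="grid.example.org"):
    """Submit a job script locally or to a Grid gatekeeper; return the job id."""
    if where == "local":
        cmd = ["qsub", script]
    else:
        cmd = ["globus-job-submit", gatekeeper, "/bin/sh", script]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

if __name__ == "__main__":
    print(submit("analysis_job.sh", where="local"))     # hypothetical script
```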

  19. Conclusions
  • A lot of tools exist
  • A lot of work still needs to be done in the fabric area in order to get reliable, scalable systems
  Marcel Kunze - FZK
