
Building Large Scale Fabrics – A Summary


Presentation Transcript


  1. Building Large Scale Fabrics – A Summary Marcel Kunze, FZK

  2. Observation
  • Everybody seems to need unprecedented amounts of CPU, disk and network bandwidth
  • Trend towards PC-based computing fabrics and commodity hardware:
    • LCG (CERN), L. Robertson
    • CDF (Fermilab), M. Neubauer
    • D0 (Fermilab), I. Terekhov
    • Belle (KEK), P. Krokovny
    • HERA-B (DESY), J. Hernandez
    • LIGO, P. Shawhan
    • Virgo, D. Busculic
    • AMS, A. Klimentov
  • Considerable cost savings w.r.t. a RISC-based farm: not enough 'bang for the buck' (M. Neubauer)
  Marcel Kunze - FZK

  3. AMS02 Benchmarks
  Execution time of the AMS "standard" job compared to CPU clock 1)
  1) V. Choutko, A. Klimentov, AMS note 2001-11-01
  Marcel Kunze - FZK
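The benchmark compares execution time against CPU clock. As a rough illustration of what such a comparison assumes, the sketch below estimates execution time under perfect inverse scaling with clock speed; the numbers in it are placeholders, not figures from the AMS note.

```python
# Minimal sketch: if execution time scaled inversely with CPU clock,
# a job's time on a faster box could be estimated like this.
# The reference numbers below are placeholders, not the AMS note's figures.

def scaled_exec_time(ref_time_s: float, ref_clock_mhz: float, new_clock_mhz: float) -> float:
    """Estimate execution time assuming perfect inverse scaling with clock."""
    return ref_time_s * ref_clock_mhz / new_clock_mhz

if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(scaled_exec_time(ref_time_s=3600.0, ref_clock_mhz=800.0, new_clock_mhz=1200.0))
```

In practice the scaling is not perfectly linear (memory and I/O matter), which is exactly why measured benchmarks like the AMS one are needed.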

  4. Fabrics and Networks: Commodity Equipment
  Needed for LHC at CERN in 2006:
  • Storage: raw recording rate 0.1 – 1 GB/sec, accumulating at 5-8 PetaBytes/year, 10 PetaBytes of disk
  • Processing: 200'000 of today's (2001) fastest PCs
  • Networks: 5-10 Gbps between main Grid nodes
  • Distributed computing effort to avoid congestion: 1/3 at CERN, 2/3 elsewhere
  Marcel Kunze - FZK
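The accumulation figure follows from integrating the recording rate over the machine's live time. The short sketch below reproduces that arithmetic; the live-time fraction is an assumed value, not one quoted on the slide.

```python
# Back-of-the-envelope sketch of the storage numbers on this slide:
# a sustained recording rate integrated over the accelerator's live time.
# The live-time fraction is an assumption for illustration only.

SECONDS_PER_YEAR = 365 * 24 * 3600

def petabytes_per_year(rate_gb_per_s: float, live_fraction: float) -> float:
    """Accumulated data volume in PB/year for a given recording rate."""
    bytes_per_year = rate_gb_per_s * 1e9 * SECONDS_PER_YEAR * live_fraction
    return bytes_per_year / 1e15

if __name__ == "__main__":
    for rate in (0.1, 1.0):        # GB/sec, the range quoted on the slide
        print(rate, "GB/s ->", round(petabytes_per_year(rate, live_fraction=0.3), 1), "PB/yr")
```

With an assumed ~30% live time the quoted range of recording rates indeed lands in the few-PB-per-year regime.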

  5. PC Cluster 5 (Belle): 1U servers, Pentium III 1.2 GHz, 256 CPUs (128 nodes) Marcel Kunze - FZK

  6. 3U PC Cluster 6 (blade server): LP Pentium III 700 MHz, 40 CPUs (40 nodes) Marcel Kunze - FZK

  7. Disk Storage Marcel Kunze - FZK

  8. IDE Performance Marcel Kunze - FZK

  9. Basic Questions
  • Compute farms contain several thousand computing elements
  • Storage farms contain thousands of disk drives
  • How to build scalable systems?
  • How to build reliable systems?
  • How to operate and maintain large fabrics?
  • How to recover from errors?
  • EDG deals with the issue (P. Kunszt)
  • IBM deals with the issue (N. Zheleznykh): Project Eliza, self-healing clusters
  • Several ideas and tools are already on the market
  Marcel Kunze - FZK
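A quick way to see why error recovery must be automated at this scale: with thousands of independent components, failures become routine. The sketch below estimates the expected failure rate from an assumed per-component MTBF; both numbers are illustrative, not measurements from any of the farms above.

```python
# Rough sketch of the expected failure rate for a farm of N components,
# assuming independent failures and a per-component MTBF.
# The MTBF value below is an assumption for illustration only.

def expected_failures_per_week(n_components: int, mtbf_hours: float) -> float:
    """Expected number of component failures per week (Poisson mean)."""
    return n_components * (24 * 7) / mtbf_hours

if __name__ == "__main__":
    # e.g. 2000 disks with an assumed 400,000 h MTBF
    print(round(expected_failures_per_week(2000, 400_000.0), 2), "failures/week")
```

Even with optimistic MTBF figures, a fabric of thousands of drives sees failures every week, so manual recovery does not scale.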

  10. Storage Scalability
  • Difficult to scale up to systems of thousands of components and keep a single system image: NFS automounter, symbolic links, etc. (M. Neubauer, CAF: ROOTD does not need this and allows direct worldwide access to distributed files without mounts)
  • Scalability in size and throughput by means of storage virtualisation
  • Allows setting up non-TCP/IP based systems to handle multiple GB/s
  Marcel Kunze - FZK
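The CAF remark about ROOTD can be illustrated with a short PyROOT snippet that opens a remote file by URL instead of relying on an NFS mount. A minimal sketch, assuming a working ROOT installation with PyROOT and a reachable rootd server; the host name and path are hypothetical.

```python
# Hedged sketch of the CAF-style access mentioned above: opening a remote
# file through rootd with PyROOT instead of mounting it via NFS/automounter.
# Requires ROOT with PyROOT; host and path below are hypothetical.

import ROOT

# "root://" URLs are served by rootd (later xrootd) without any mount.
url = "root://dataserver.example.org//store/run1234/events.root"  # hypothetical

f = ROOT.TFile.Open(url)
if f and not f.IsZombie():
    f.ls()          # list the keys in the remote file
    f.Close()
else:
    print("could not open", url)
```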

  11. Virtualisation of Storage (diagram)
  • Data servers mount virtual storage as SCSI devices
  • Input via a load-balancing switch (Internet/Intranet)
  • Shared data access (Oracle, PROOF)
  • Storage Area Network (FC-AL, InfiniBand, …)
  • 200 MB/s sustained; scalability
  Marcel Kunze - FZK
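The load-balancing switch in the diagram routes incoming requests across the data servers. The toy sketch below shows the basic idea of picking the least-loaded server; the server names and load values are made up for illustration and say nothing about the actual setup shown on the slide.

```python
# Conceptual sketch only: a load-balancing front end picks the
# least-loaded data server for each incoming request.
# Server names and load figures are hypothetical.

from typing import Dict

def pick_server(load_by_server: Dict[str, float]) -> str:
    """Return the data server with the lowest current load."""
    return min(load_by_server, key=load_by_server.get)

if __name__ == "__main__":
    loads = {"ds01": 0.72, "ds02": 0.31, "ds03": 0.55}   # hypothetical
    print("route request to", pick_server(loads))
```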

  12. Storage Elements (M. Gasthuber)
  • PNFS = Perfectly Normal FileSystem
  • Stores metadata with the data: 8 hierarchies of file tags
  • Migration of data (hierarchical storage systems): dCache
  • Development of DESY and Fermilab
  • ACLs, Kerberos, ROOT-aware
  • Web monitoring
  • Cached as well as direct tape access
  • Fail-safe
  Marcel Kunze - FZK
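PNFS keeps the metadata described above inside the namespace itself and commonly exposes directory tags through special ".(tag)(name)" pseudo-files. A minimal sketch that reads one such tag; treat the mount point, tag name and exact dot-file syntax as assumptions to be checked against the local dCache/PNFS documentation.

```python
# Hedged sketch: PNFS exposes directory tags (the metadata hierarchies
# mentioned above) as special "dot files" inside the namespace.
# Mount point, tag name and the ".(tag)(...)" convention are assumptions.

import os

def read_pnfs_tag(directory: str, tag: str) -> str:
    """Read a PNFS directory tag via its pseudo-file, e.g. .(tag)(sGroup)."""
    tag_path = os.path.join(directory, f".(tag)({tag})")
    with open(tag_path) as fh:
        return fh.read().strip()

if __name__ == "__main__":
    # Hypothetical mount point and tag name.
    print(read_pnfs_tag("/pnfs/example.org/data/belle", "sGroup"))
```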

  13. Necessary admin tools (A. Manabe)
  • System (SW) installation / update: Dolly++ (image cloning)
  • Configuration: Arusha (http://ark.sourceforge.net), LCFGng (http://www.lcfg.org)
  • Status monitoring / system health check:
    • CPU/memory/disk/network utilization: Ganglia 1), Palantir 2)
    • (Sub-)system service sanity check: PIKT 3), Pica 4), cfengine
  • Command execution: WANI, a web-based remote command executor
  1) http://ganglia.sourceforge.net 2) http://www.netsonde.com 3) http://pikt.org 4) http://pica.sourceforge.net/wtf.html
  Marcel Kunze - FZK
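On the monitoring side, Ganglia's gmond daemon publishes its cluster state as XML to any client that connects to its TCP port (8649 by default), so a health check can be scripted without a dedicated agent. A minimal sketch, assuming a reachable gmond and the usual HOST/NAME/REPORTED attributes in its XML output; the host name is hypothetical.

```python
# Small sketch of the status-monitoring side: Ganglia's gmond daemon dumps
# its cluster state as XML to anyone who connects to its TCP port
# (8649 by default). Host name is hypothetical; error handling is minimal.

import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host: str, port: int = 8649) -> ET.Element:
    """Fetch and parse the cluster-state XML published by gmond."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

if __name__ == "__main__":
    root = fetch_gmond_xml("gmond.example.org")        # hypothetical host
    for host in root.iter("HOST"):
        print(host.get("NAME"), host.get("REPORTED"))
```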

  14. WANI is implemented on the 'Webmin' GUI (screenshot: Start, command input, node selection) Marcel Kunze - FZK

  15. Command execution result (screenshot): host names and results from 200 nodes on one page Marcel Kunze - FZK

  16. Stdout and stderr output (screenshots) Marcel Kunze - FZK
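WANI itself is a Webmin-based GUI; its core function, running one command on many nodes and collecting per-host stdout and stderr, can be sketched with plain ssh. The node names below are hypothetical and passwordless ssh (key-based, BatchMode) is assumed.

```python
# Sketch of the WANI idea with plain ssh: run one command on many nodes
# in parallel and collect stdout/stderr per host. Nodes are hypothetical.

import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_node(node, command):
    """Run a command on one node via ssh; return (node, stdout, stderr)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True, text=True, timeout=60,
    )
    return node, proc.stdout, proc.stderr

if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 6)]     # hypothetical hosts
    with ThreadPoolExecutor(max_workers=16) as pool:
        for node, out, err in pool.map(lambda n: run_on_node(n, "uptime"), nodes):
            print(node, out.strip(), err.strip())
```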

  17. CPU Scalability
  • The current tools scale up to ~1000 CPUs (in the previous example, 10000 CPUs would require checking 50 pages)
  • Autonomous operation required
  • Intelligent self-healing clusters
  Marcel Kunze - FZK
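A toy sketch of what "autonomous operation" could mean in practice: poll each node and trigger a recovery action instead of waiting for an operator. The liveness check and the recovery step below are placeholders for site-specific logic, not anything prescribed by Project Eliza or EDG.

```python
# Toy sketch of a self-healing loop: poll nodes and take an automatic
# recovery action instead of paging an operator. The check and the
# recovery command are placeholders for site-specific logic.

import subprocess
import time

def node_is_alive(node):
    """Crude liveness check: a single ping with a short timeout."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", node],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0

def heal(node):
    """Placeholder recovery action (e.g. drain from batch, power-cycle)."""
    print(f"[self-heal] {node} unreachable, removing it from the batch pool")

if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 4)]      # hypothetical hosts
    while True:
        for node in nodes:
            if not node_is_alive(node):
                heal(node)
        time.sleep(60)                                  # poll once a minute
```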

  18. Resource Scheduling
  • Problem: how to access local resources from the Grid?
  • Local batch queues vs. global batch queues
  • Extension of Dynamite (University of Amsterdam) to work with Globus: Dynamite-G (I. Shoshmina)
  • Open question: how do we deal with interactive applications on the Grid?
  Marcel Kunze - FZK
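The local-versus-global question can be made concrete with a thin dispatcher that submits a job either to the local batch system or to a Grid gatekeeper. The sketch below assumes a PBS-style qsub locally and the Globus globus-job-submit client on the Grid side; a real site would substitute its actual tools (LSF, Condor, Dynamite-G, …) and gatekeeper name.

```python
# Sketch of a local-vs-Grid dispatcher. Assumes a PBS-style `qsub` and the
# Globus `globus-job-submit` client; gatekeeper and script are hypothetical.

import subprocess

def submit(script, where="local", gatekeeper="grid.example.org"):
    """Submit a job script locally or to a Grid gatekeeper; return the job id."""
    if where == "local":
        cmd = ["qsub", script]
    else:
        cmd = ["globus-job-submit", gatekeeper, "/bin/sh", script]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

if __name__ == "__main__":
    print(submit("analysis_job.sh", where="local"))     # hypothetical script
```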

  19. Conclusions
  • A lot of tools exist
  • A lot of work still needs to be done in the fabric area in order to get reliable, scalable systems
  Marcel Kunze - FZK
