
Physics Analysis on RISC Machines: Experiences at CERN (Saclay, 20 June 1994)

Presentation Transcript


  1. Physics Analysis on RISC Machines: Experiences at CERN - Saclay, 20 June 1994 - Frédéric Hemmer, Computing & Networks Division, CERN, Geneva, Switzerland

  2. CERN - The European Laboratory for Particle Physics • Fundamental research in particle physics • Designs, builds & operates large accelerators • Financed by 19 European countries • SFR 950M budget - operation + new accelerators • 3,000 staff • Experiments conducted by a small number of large collaborations: 400 physicists, 50 institutes, 18 countries, using experimental apparatus costing 100s of MSFR

  3. Computing at CERN • computers are everywhere • embedded microprocessors • 2,000 personal computers • 1,400 scientific workstations • RISC clusters, even mainframes • estimate 40 MSFR per year (+ staff)

  4. Central Computing Services • 6,000 users • Physics data processing traditionally: mainframes + batch, with emphasis on reliability and utilisation level • Tapes: 300,000 active volumes, 22,000 tape mounts per week

  5. Application Characteristics • inherent coarse-grain parallelism (at event or job level) • Fortran • modest floating point content • high data volumes • disks • tapes, tape robots • moderate, but respectable, data rates - a few MB/sec per fast RISC cpu • an obvious candidate for RISC clusters - and a major challenge
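
A minimal sketch (not CERN code) of the coarse-grain parallelism referred to above: the Event record and analyse() routine are invented for illustration, but the point is that each iteration of the event loop is independent, so N workstations can run N such jobs on disjoint event files with no communication between them.

    /* Minimal sketch, not CERN code: the event loop that makes the workload
     * embarrassingly parallel. Event and analyse() are invented here;
     * real jobs would use the experiment's Fortran code. */
    #include <stdio.h>

    typedef struct { double px, py, pz, e; } Event;   /* hypothetical record */

    static void analyse(const Event *ev)
    {
        (void)ev;   /* physics code would go here */
    }

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s eventfile\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        Event ev;
        long n = 0;
        /* Each iteration is independent of every other one, so many machines
         * can each run this job on a disjoint event file - the coarse-grain
         * parallelism exploited by CSF and SHIFT. */
        while (fread(&ev, sizeof ev, 1, f) == 1) {
            analyse(&ev);
            n++;
        }
        fclose(f);
        printf("processed %ld events\n", n);
        return 0;
    }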

  6. CORE - Centrally Operated Risc Environment • Single management domain • Services configured for specific applications and groups, but with common system management • Focus on data - external access to tape and disk services from the CERN network, or even from outside CERN

  7. CERN CORE Physics Services - equipment installed or on order, January 1994 (block diagram, les robertson /cn) • CSF Simulation Facility: 25 H-P 9000-735, H-P 9000-750 front end • SHIFT data-intensive services - processors: 24 SGI, 11 DEC Alpha, 9 H-P, 2 SUN, 1 IBM; embedded disk: 1.1 TeraBytes • Central Data Services - shared tape servers: 3 tape robots, 21 tape drives, 6 EXABYTEs, 7 IBM and SUN servers; shared disk servers: 260 GBytes, 6 SGI, DEC and IBM servers • PIAF - Interactive Analysis Facility: 5 H-P 9000-755 • Home directories & registry: SPARCservers, 100 GB RAID disk • Scalable Parallel Processors: 8-node SPARCcenter, 32-node Meiko CS-2 (early 1994) • also shown: Baydel RAID disks, tape juke box, consoles & monitors, SPARCstations, all interconnected by the CERN network

  12. CSF - Central Simulation Facility • second generation, joint project with H-P • H-P 750 interactive host (shared, load-balanced job queues), with tape servers, on Ethernet and FDDI • 25 H-P 735s - 48 MB memory, 400 MB disk • one job per processor • generates data on local disk, staged out to tape at end of job • long jobs (4 to 48 hours) • very high cpu utilisation: >97% • very reliable: >1 month MTBI

  13. SHIFT - Scalable, Heterogeneous, Integrated Facility • Designed in 1990 • fast access to large amounts of disk data • good tape support • cheap & easy to expand • vendor independent • mainframe quality • First implementation in production within 6 months

  14. Design choices • Unix + TCP/IP • system-wide batch job queues - “single system image”, targeting Cray-style service quality • pseudo distributed file system - assumes no read/write file sharing • distributed tape staging model (disk cache of tape files) • the tape access primitives are: copy disk file to tape, copy tape file to disk (see the sketch below)
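
A minimal sketch of the two tape primitives just listed, assuming only that the tpread/tpwrite commands shown later on the Remote Tape Copy System slide are available from a job; the volume names and paths below are placeholders, not real data.

    /* Sketch only: the two tape access primitives of the staging model,
     * expressed as calls to the tpread/tpwrite commands described on the
     * Remote Tape Copy System slide. Volumes and paths are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>

    /* copy tape file to disk */
    static int tape_to_disk(const char *vol, const char *path)
    {
        char cmd[512];
        snprintf(cmd, sizeof cmd, "tpread -v %s %s", vol, path);
        return system(cmd);
    }

    /* copy disk file to tape */
    static int disk_to_tape(const char *vol, const char *path)
    {
        char cmd[512];
        snprintf(cmd, sizeof cmd, "tpwrite -v %s %s", vol, path);
        return system(cmd);
    }

    int main(void)
    {
        /* stage in, let the job work on the disk copy, stage results out */
        if (tape_to_disk("XY0001", "/shift/pool1/data/file26") != 0) return 1;
        /* ... the analysis job reads and writes disk files here ... */
        return disk_to_tape("XY0002", "/shift/pool1/data/results26") != 0;
    }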

  15. The Software Model • disk servers, cpu servers, stage servers, queue servers and tape servers connected by an IP network • define functional interfaces -> scalable • heterogeneous • distributed

  16. Basic Software • Unix Tape Subsystem (multi-user, labels, multi-file operation) • Fast Remote File Access System • Remote Tape Copy System • Disk Pool Manager • Tape Stager • Clustered NQS batch system • Integration with standard I/O packages - FATMEN, RZ, FZ, EPIO, .. • Network Operation • Monitoring

  17. Unix Tape Control • tape daemon • operator interface / robot interface • tape unit allocation / deallocation • label checking, writing

  18. Remote Tape Copy System • selects a suitable tape server • initiates the tape-disk copy • examples: • tpread -v CUT322 -g SMCF -q 4,6 pathname • tpwrite -v IX2857 -q 3-5 file3 file4 file5 • tpread -v UX3465 `sfget -p opaldst file34`

  19. Remote File Access System - RFIO • high performance, reliability (improves on NFS) • C I/O compatibility library, Fortran subroutine interface • rfio daemon started by open on the remote machine • optimised for specific networks • asynchronous operation (read ahead) • optional vector pre-seek - an ordered list of the records which will probably be read next
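
A minimal sketch of how the C I/O compatibility library might be used, under the assumption that the RFIO entry points mirror the POSIX calls they replace (rfio_open, rfio_read, rfio_close) and are declared in an RFIO header; the header name, the server node and the file path are placeholders to be checked against the SHIFT documentation.

    /* Sketch only: reading a remote disk-server file through the RFIO
     * C compatibility library. The header name and the "node:path" form
     * are assumptions; consult the SHIFT documentation for exact names. */
    #include <stdio.h>
    #include <fcntl.h>
    #include "rfio.h"                     /* assumed RFIO header */

    int main(void)
    {
        char buf[32768];                  /* 32 KB records, as in the FDDI tests */
        int  fd, n;

        /* remote file on a disk server, written as "node:path" (placeholder) */
        fd = rfio_open("shd01:/shift/shd01/data6/ws/panzer/file26", O_RDONLY, 0644);
        if (fd < 0) { fprintf(stderr, "rfio_open failed\n"); return 1; }

        /* sequential reads; the rfio daemon may read ahead asynchronously */
        while ((n = rfio_read(fd, buf, sizeof buf)) > 0) {
            /* process n bytes of the record here */
        }

        rfio_close(fd);
        return 0;
    }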

  20. A disk pool is a collection of Unix file systems, possibly on several nodes (e.g. sgi1, dec24 and sun5 in the figure), viewed as a single chunk of allocatable space

  21. Disk Pool Management • allocation of files to pools and filesystems • pools can be public or private • pools can be temporary or permanent • capacity management • name server • garbage collection • example: sfget -p opaldst file26 may create a file like /shift/shd01/data6/ws/panzer/file26 (an illustrative allocation sketch follows below)
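
Purely illustrative sketch, not the actual Disk Pool Manager: it shows one plausible way to model a pool as a set of node/filesystem members and to allocate a new file on the member with the most free space. The node names come from the previous slide's figure; the paths, sizes and allocation rule are invented.

    /* Illustrative only - not the real Disk Pool Manager. A pool is modelled
     * as node:filesystem members; allocation picks the member that currently
     * has the most free space. All numbers and paths are invented. */
    #include <stdio.h>

    typedef struct {
        const char *node;        /* e.g. "sgi1", "dec24", "sun5" */
        const char *filesystem;  /* a mount point on that node */
        long        free_mb;     /* free space known to the manager */
    } PoolMember;

    typedef struct {
        const char *name;        /* e.g. "opaldst" */
        PoolMember *members;
        int         nmembers;
    } DiskPool;

    /* choose the member filesystem with the most free space for a new file */
    static const PoolMember *allocate(const DiskPool *p)
    {
        const PoolMember *best = NULL;
        for (int i = 0; i < p->nmembers; i++)
            if (best == NULL || p->members[i].free_mb > best->free_mb)
                best = &p->members[i];
        return best;
    }

    int main(void)
    {
        PoolMember m[] = { { "sgi1",  "/shift/sgi1/data1",   900 },
                           { "dec24", "/shift/dec24/data3", 1500 },
                           { "sun5",  "/shift/sun5/data2",   400 } };
        DiskPool pool = { "opaldst", m, 3 };
        const PoolMember *pm = allocate(&pool);
        printf("allocate new file on %s:%s\n", pm->node, pm->filesystem);
        return 0;
    }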

  22. Tape Stager • implements a disk cache of magnetic tape files • integrates the Remote Tape Copy System & Disk Pool Management • queues concurrent requests for the same tape file • provides full error recovery - restage and/or operator control on hardware/system error, garbage collection initiated if the disk is full • supports disk pools & single (private) file systems • available from any workstation (a stage-in sketch follows below)
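
A sketch of the stage-in idea only, not the real stager: it reuses the sfget and tpread commands quoted on earlier slides and treats the pool as a cache, copying from tape only when the file is not already on disk. The pool, volume and file names are taken from those examples or invented; the real stager additionally queues concurrent requests, retries on error and triggers garbage collection.

    /* Sketch of the stage-in logic described above (not the real stager).
     * sfget and tpread are the commands shown on earlier slides; the pool
     * and volume names are placeholders taken from those examples. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>

    /* returns 0 and fills 'path' with the disk copy of (vol, file) */
    static int stagein(const char *vol, const char *file, char *path, size_t len)
    {
        char cmd[512];

        /* ask the Disk Pool Manager where the file lives in the pool;
         * sfget creates the (empty) file if it does not yet exist */
        snprintf(cmd, sizeof cmd, "sfget -p opaldst %s", file);
        FILE *p = popen(cmd, "r");
        if (p == NULL) return -1;
        if (fgets(path, (int)len, p) == NULL) { pclose(p); return -1; }
        pclose(p);
        path[strcspn(path, "\n")] = '\0';

        /* cache hit: the file has already been staged and is non-empty */
        struct stat st;
        if (stat(path, &st) == 0 && st.st_size > 0) return 0;

        /* cache miss: copy the tape file into the pool via a tape server */
        snprintf(cmd, sizeof cmd, "tpread -v %s %s", vol, path);
        return system(cmd) == 0 ? 0 : -1;
    }

    int main(void)
    {
        char path[512];
        if (stagein("UX3465", "file34", path, sizeof path) != 0) return 1;
        printf("staged to %s\n", path);
        return 0;
    }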

  23. Tape Stager - independent stage control for each disk pool. [Diagram: a user job on a cpu server issues "stagein tape, file" to the stage control; the stage control calls "sfget file" to allocate space on a disk server and "tpread / rtcopy tape, file" on a tape server to copy the tape file onto the disk server, which the job then accesses via RFIO.]

  24. SHIFT Status - equipment installed or on order, January 1994
      group    configuration                                               cpu (CU*)  disk (GB)
      OPAL     SGI Challenge 4-cpu + 8-cpu (R4400, 150 MHz);
               two SGI 340S 4-cpu (R3000, 33 MHz)                             290        590
      ALEPH    SGI Challenge 4-cpu (R4400, 150 MHz); eight DEC 9000-400       216        200
      DELPHI   two H-P 9000/735                                                52        200
      L3       SGI Challenge 4-cpu (R4400, 150 MHz)                            80        300
      ATLAS    H-P 9000/755                                                    26         23
      CMS      H-P 9000/735                                                    26         23
      SMC      SUN SPARCserver 10, 4/630                                       22          4
      CPLEAR   DEC 3000-300AXP, 500AXP                                         29         10
      CHORUS   IBM RS/6000-370                                                 15         15
      NOMAD    DEC 3000-500 AXP                                                19         15
      Totals                                                                  775       1380
      * CERN Units: one CU equals approx. 4 SPECints (for comparison, the CERN IBM mainframe: 120 CU, 600 GB)

  25. Current SHIFT Usage • 60% cpu utilisation • 9,000 tape mounts per week, 15% write - still some way from holding the active data on disk • MTBI for cpu and disk servers: 400 hours for an individual server • MTBF for disks: 160K hours • a maturing service, but it does not yet surpass the quality of the mainframe

  26. CORE Networking • Ethernet + Fibronics hubs - aggregate 2 MBytes/sec sustained • FDDI + GigaSwitch - 2-3 MBytes/sec sustained • UltraNet 1 Gbps backbone - 6 MBytes/sec sustained • links the simulation service, the IBM mainframe, the SHIFT cpu, disk and tape servers, the home directories, and the connection to CERN & external networks

  27. FDDI Performance (September 1993) • 100 MByte disk file read/written sequentially using 32 KB records • client: H-P 735, server: SGI Crimson with SEAGATE Wren 9 disk system
               read          write
      NFS      1.6 MB/sec    300 KB/sec
      RFIO     2.7 MB/sec    1.7 MB/sec

  28. PIAF - Parallel Interactive Data Analysis Facility(R.Brun, A.Nathaniel, F.Rademakers CERN) • the data is “spread” across the interactive server cluster • the user formulates a transaction on his personal workstation • the transaction is executed simultaneously on all servers • the partial results are combined and returned to the user’s workstation
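
The scatter/merge pattern just described, as an illustrative local sketch only (not the PIAF code): each "worker" fills a partial histogram from its own slice of the events and the client sums the partial results. The histogram, event counts and slicing are invented for the example; in PIAF the workers run remotely on the server cluster.

    /* Illustrative sketch of the PIAF idea (not the PIAF implementation):
     * run the same query on every worker's slice of the data, then merge
     * the partial results on the client - here a simple histogram sum. */
    #include <stdio.h>
    #include <string.h>

    #define NBINS    8
    #define NWORKERS 4

    /* stand-in for a remote worker: fill a partial histogram from its slice */
    static void worker_query(int worker, long partial[NBINS])
    {
        memset(partial, 0, NBINS * sizeof(long));
        for (long ev = worker; ev < 1000; ev += NWORKERS)   /* fake event slice */
            partial[ev % NBINS]++;
    }

    int main(void)
    {
        long total[NBINS] = { 0 }, partial[NBINS];

        /* "scatter" the transaction to every worker, then merge the results */
        for (int w = 0; w < NWORKERS; w++) {
            worker_query(w, partial);
            for (int b = 0; b < NBINS; b++)
                total[b] += partial[b];
        }

        for (int b = 0; b < NBINS; b++)
            printf("bin %d: %ld events\n", b, total[b]);
        return 0;
    }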

  29. PIAF Architecture - [Diagram: the user's personal workstation runs the display manager and the PIAF client; the PIAF Service consists of a PIAF server coordinating several PIAF workers.]

  30. Scalable Parallel Processors • embarrassingly parallel application - therefore in competition with workstation clusters • SMPs and SPPs should do a better job for SHIFT than loosely coupled clusters • computing requirements will increase by three orders of magnitude over the next ten years • R&D project started, funded by ESPRIT (GPMIMD2): 32-processor Meiko CS-2, 25 man-years of development

  31. Conclusion • Workstation clusters have replaced mainframes at CERN for physics data processing • For the first time, we see computing budgets come within reach of the requirements • Very large, distributed & scalable disk and tape configurations can be supported • Mixed manufacturer environments work, and allow smooth expansion of the configuration • Network performance is the biggest weakness in scalability • Requires a different operational style & organisation from mainframe services

  32. Operating RISC machines • SMPs are easier to manage • SMPs require less manpower • Distributed management is not yet robust • The network is THE problem • Much easier than mainframes, and • ... cost effective
