
CASPUR Site Report


Presentation Transcript


  1. CASPUR Site Report
     Andrei Maslennikov
     Group Leader - Systems
     RAL, April 1999

  2. Topics (covered briefly):
  • Central computers
  • Other nodes
  • Network
  • Distributed storage
  • Tape-related systems
  • CASPUR and HEP
  • Gentes/Ateneo project
  • Short-term plans

  3. Central computers
  • Alpha SMP Cluster 4100 - 28 processors - DU 4.0d
    - interactive (front-end): 1 x 400 MHz / 2 GB
    - parallel batch (LSF): 4 x 400 MHz / 1 GB + 2 x 600 MHz / 2 GB
    - 1999: 20 more EV6 processors (or an upgrade to them); 32-processor “Wildfire”?
  • Sun SMP - 22 processors - Solaris 2.6
    - interactive + parallel batch (LSF): 1 x 3500 / 336 MHz / 2 GB (8 processors)
    - parallel batch (LSF): 1 x 4500 / 336 MHz / 3.6 GB (14 processors)
    - 1999: waiting for new SMP models
  • IBM SP2 - 32 processors - AIX 4.3.2++ / PSSP 2.4++
    - interactive: 4 thin nodes (390)
    - serial batch (LSF): 12 thin nodes
    - parallel batch + interactive (EASY): 16 thin nodes
    - 1999: waiting for an SP3 offer (need SMP nodes with 4-16 processors)

  4. Other nodes
  • Some 200 UNIX nodes under our direct supervision
    (all UNIX flavours, single nodes and clusters).
  • Around 100 PCs running Windows and Linux.
  • Worth mentioning:
    - Linux Beowulf cluster (10 PPro 200 + 4 PII 400),
      running MPI with the GAMMA protocol on Digital FE cards
      (a minimal MPI sketch follows below);
    - graphics nodes: 2 Alpha 533au (2 CPUs each) with 4D51T
      and 4D60T cards with 64 MB of texture memory;
    - 2 Power3 biprocessor AIX nodes.
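The Beowulf cluster above runs message-passing codes (MPI over GAMMA on Fast Ethernet). As an illustration of that programming style only, here is a minimal MPI "ring" example; it is a sketch written with the modern mpi4py Python bindings rather than the C/Fortran MPI the cluster actually ran, and the message contents are made up.

```python
# Minimal MPI ring exchange (sketch): every rank sends a token to its
# right-hand neighbour and receives one from its left-hand neighbour.
# mpi4py is used purely for illustration; real codes on the cluster
# would have been C/Fortran MPI over GAMMA.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's id within the communicator
size = comm.Get_size()        # total number of MPI processes

right = (rank + 1) % size     # neighbour to send to
left = (rank - 1) % size      # neighbour to receive from

token = {"from": rank, "payload": rank * rank}   # arbitrary demo payload

# sendrecv avoids the deadlock a naive blocking send/recv pair could cause
received = comm.sendrecv(token, dest=right, source=left)

print(f"rank {rank}/{size}: got {received} from rank {left}")
```

Launched, for example, with `mpiexec -n 4 python ring.py`; each process prints the token it received from its left neighbour.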

  5. Network
  • In 1998 our LAN became fully switched; we currently
    have around 100 100baseT switch ports.
  • Switch hardware: several Cabletron and Compaq switches
    interconnected via Gigabit Ethernet; we also use virtual LANs.
  • Principal nodes are on FDDI (22 DEC GigaSwitch ports).
  • Planning to try Gigabit Ethernet at host level;
    a few GE cards are already under test on Sun and Linux
    (see the throughput sketch below).
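Host-level Gigabit Ethernet testing of the kind mentioned above usually comes down to a memory-to-memory throughput measurement between two hosts. Below is a minimal sketch of such a probe using plain TCP sockets in Python; the port number and transfer sizes are arbitrary illustrative choices, not a description of the actual test setup.

```python
# Minimal memory-to-memory TCP throughput probe (sketch).
# Run "python tput.py server" on one host, then
# "python tput.py client <server-host>" on the other.
import socket, sys, time

PORT = 5001                    # arbitrary test port
CHUNK = 64 * 1024              # 64 KB send/recv buffer
TOTAL = 256 * 1024 * 1024      # move 256 MB in total

def server() -> None:
    # Accept one connection and count the bytes received until the peer closes.
    with socket.create_server(("", PORT)) as srv:
        conn, addr = srv.accept()
        with conn:
            received = 0
            while True:
                data = conn.recv(CHUNK)
                if not data:
                    break
                received += len(data)
    print(f"received {received / 1e6:.1f} MB from {addr[0]}")

def client(host: str) -> None:
    # Stream TOTAL bytes of zeros and report the send-side throughput.
    payload = b"\0" * CHUNK
    start = time.time()
    with socket.create_connection((host, PORT)) as sock:
        sent = 0
        while sent < TOTAL:
            sock.sendall(payload)
            sent += len(payload)
    elapsed = time.time() - start
    print(f"sent {sent / 1e6:.1f} MB in {elapsed:.2f} s "
          f"-> {sent * 8 / elapsed / 1e6:.0f} Mbit/s")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
```

On a healthy Gigabit Ethernet link such a test should report several hundred Mbit/s; a result stuck near 100 Mbit/s usually points at a duplex or negotiation problem rather than the card itself.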

  6. Distributed Storage
  • A TCP/IP-less datastore with true data sharing across platforms
    is not yet available, so we are still investing in both NFS and
    AFS solutions.
  • NFS is mainly used as a store for large data files, and as an
    element of the Staging System.
  • AFS is used for home directories and as a store for collections
    of various ready-to-run software. We currently run 6 cells with
    some 300 GB online, also over the WAN.

  7. NFS: one more Filer
  • Current NFS server: an F540 Network Appliance Filer with
    150 GB of formatted RAID space on FE and FDDI.
  • Just ordered: another Filer (F760 / 600 MHz / 1 GB) with
    300 GB of RAID disk and GE/FDDI network interfaces:
    - 3 times more NFS ops/sec than the F540
    - allows for clustering (better scalability)

  8. AFS: news since the last report
  • Purchased the AFS source code. This allowed us to compile AFS on Solaris/Intel
    (thanks to Rainer Toebbicke / CERN, who proved that this is possible).
  • The University of Rome-3 also went Solaris/Intel for DB (3 servers).
  • The Abdus Salam Centre for Theoretical Physics joined our AFS licence.
  • Upgraded the central servers (now 3 Alpha 500au on FE and FDDI).
    They have proved very stable and performant.
  • We go Fibre Channel!
    - Just ordered 280 GB of RAID-5/FC from Artecon
    - Dual active-active controllers
    - Gadzoox hubs and HBAs from Genroco
    - This system will replace most of the on-site AFS disks.

  9. Tape access
  • During 1998, all services which use the tape robotics
    operated seamlessly: AFS and ADSM backups, staging.
  • Some 80 GB were deep-archived via the Staging System.
  • With the F540 Filer we stage at 4+ MB/s, almost at the
    limit of the Timberline tape drives (see the back-of-envelope
    sketch below).
  • In 1999 we plan to replace the STK silo with a 9840 library:
    - doubles the tape speed
    - BABAR-compliant
    - smaller maintenance fees
    - frees physical space in the computer centre.
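To put the quoted figures in perspective, a quick back-of-envelope calculation helps: at the observed 4 MB/s, the roughly 80 GB archived in 1998 corresponds to a few hours of pure streaming time, and a drive that "doubles the tape speed" halves that. The sketch below uses only the numbers quoted on this slide; the 9840 rate is simply assumed to be twice the current one, not taken from a spec sheet.

```python
# Back-of-envelope staging-time estimate, using only the figures from this
# slide (80 GB archived, 4 MB/s observed staging rate, "doubles the tape
# speed" for the 9840).  The drive ratings are assumptions, not specs.

def staging_hours(data_gb: float, rate_mb_per_s: float) -> float:
    """Hours of pure streaming needed to move data_gb at rate_mb_per_s."""
    seconds = (data_gb * 1024) / rate_mb_per_s   # GB -> MB, then divide by rate
    return seconds / 3600

ARCHIVED_GB = 80                       # volume deep-archived in 1998
TIMBERLINE_RATE = 4.0                  # MB/s, observed with the F540
STK9840_RATE = 2 * TIMBERLINE_RATE     # slide: the 9840 doubles the speed

print(f"Timberline: {staging_hours(ARCHIVED_GB, TIMBERLINE_RATE):.1f} h")  # ~5.7 h
print(f"9840:       {staging_hours(ARCHIVED_GB, STK9840_RATE):.1f} h")     # ~2.8 h
```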

  10. CASPUR and HEP
  • Geographical AFS system support for INFN.
  • Regular ASIS mirroring over the WAN to 17 INFN sections across Italy.
  • Linux system support for INFN:
    - Linux tree maintenance
    - AFS-enabled bootable Linux CDs at the latest patch level.
  • Software collaboration with CERN (ASIS, Linux, AFS).
  • Regional Centre for BABAR: full-scale system support.

  11. Gentes/Ateneo project
  • Scope: provide a turnkey computing environment for a generic
    research organization / university department.
  • Fully Intel-based.
  • Desktop on Linux and/or WNT.
  • Just 4 Intel machines make up the core:
    - entry-point Linux host with a firewall
    - AFS fileserver on Solaris
    - management Linux host with the YARD DBMS and https tools
    - general services (mail, web, print, efax, ppp, majordomo, etc.)
      on a single Linux (SMP) machine.
  • WNT/Linux AFS-based integration: single password,
    common filestore, YARD ODBC.
  • Client installation: cloning with Norton Ghost.
  • Progressing well. First presentation: June 1999.

  12. Some short-term plans
  • Compile the AFS 3.5 server on Solaris/Intel:
    - will improve performance for en-masse serving of small files.
  • Test FC on Linux (QLogic card):
    - first to provide RAID space for the mail spool
    - next to take a look at the Global File System (with Seagate disks).
  • Test FC on AIX:
    - CASPUR will probably be asked to propose a set of
      high-availability services for PCM; IBM DFS with
      FC RAID might make a good combination.
  • Try LoadLeveler on Solaris:
    - LSF is becoming too expensive (they charge per CPU).
