Planned Machines: ASCI Purple, ALC and M&IC MCR

  1. Planned Machines: ASCI Purple, ALC and M&IC MCR
     Presented to SOS7
     Mark Seager, seager@llnl.gov, 925-423-3141
     ICCD ADH for Advanced Technology, Lawrence Livermore National Laboratory
     This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.

  2. Q1: What is unique in structure and function of your machine?
     • Purple’s unique structure is fat SMPs with 16 rails of Federation interconnect
     • MCR+ALC’s unique structure is the shared global file system
     • However, the most important point is that applications are highly mobile between Purple, MCR+ALC, White, Q and other clusters of SMP systems…

  3. Purple’s unique structure is fat SMPs with 16 rails of interconnect
     [Architecture diagram: 191 parallel batch/interactive/visualization nodes, login/NFS nodes, and I/O nodes, with 16 Federation links per SMP in four switch planes forming the system data and control networks, plus a Fibre Channel 2 I/O network]
     • Purple System
       • 100 TF/s peak; 30-45 TF/s delivered on sPPM+UMT2000
       • 50 TB memory, 2.0 PB of disk @ 108 GB/s delivered
       • 197 x 64-way Armada SMPs with 16 Federation links each
       • 4 Login/network nodes for login/NFS
       • 8x10 Gb/s for parallel FTP on each Login node
       • All external networking is 1-10 Gb/s Ethernet
       • Clustered I/O services for the cluster-wide file system
       • Fibre Channel 2 I/O attach does not extend
     • Programming/Usage Model
       • Application launch over all compute nodes, up to 8,192 tasks
       • 1 MPI task/CPU and shared memory, full 64-bit support
       • Scalable MPI (MPI_Allreduce, buffer space)
       • Likely usage: multiple MPI tasks/node with 4-16 OpenMP threads per MPI task (see the hybrid sketch below)
       • Single STDIO interface
       • Parallel I/O to a single file, or multiple serial I/O (1 file per MPI task)
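
The likely-usage bullets above (multiple MPI tasks per node with 4-16 OpenMP threads per MPI task, plus a scalable MPI_Allreduce) correspond to the standard hybrid MPI+OpenMP pattern. Below is a minimal sketch of that pattern in C; the task and thread counts in the comments are illustrative choices, not Purple-specific settings.

```c
/* Hybrid MPI + OpenMP sketch: several MPI tasks per SMP node, each task
 * spreading its work over OpenMP threads.
 * Build/run lines are illustrative only, e.g.:
 *   mpicc -fopenmp hybrid.c -o hybrid
 *   mpirun -np 8 ./hybrid        (8 tasks on one 64-way SMP, 8 threads each)
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, ntasks;

    /* Request an MPI library that tolerates threaded callers. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    /* Each MPI task does its share of the work with OpenMP threads
     * (4-16 threads per task in the usage model described above). */
    double local = 0.0;
    #pragma omp parallel for reduction(+ : local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i + rank);

    /* Scalable collective: every task contributes, every task gets the sum. */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)    /* single STDIO interface: only rank 0 prints */
        printf("tasks=%d  threads/task=%d  sum=%.6f\n",
               ntasks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```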

  4. Unique feature of ALC+MCR is the Lustre Lite shared file system†
     [Architecture diagram: 924 P4 compute nodes on a 960-port (10x96D32U+4x80D48U) QsNet Elan3 switch and 1,116 P4 compute nodes on a 1,152-port (10x96D32U+4x96D32U) QsNet Elan3 switch, each with 100BaseT control; 2 MetaData (fail-over) Servers; 32 Gateway nodes @ 140 MB/s delivered Lustre I/O over 2x1GbE; 2 Login nodes with 4 Gb-Enet; 2 Service nodes; GbEnet federated switches feeding aggregated OSTs for a single Lustre file system]
     †Cluster-wide file system leverages the DOE/NNSA ASCI PathForward Open Source Lustre development
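
A cluster-wide file system such as the Lustre configuration above is what allows every MPI task on every node to write into one shared file (rather than one file per task, as noted on the Purple slide). The sketch below shows that pattern with portable MPI-IO; the file path is a hypothetical example, not an actual ALC/MCR mount point.

```c
/* Parallel I/O to a single shared file: each MPI task writes its own
 * block at a disjoint offset, which only works because the underlying
 * file system is visible cluster-wide.  The path below is made up. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK 1024               /* doubles written per task (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fill this task's block with recognizable data. */
    double *buf = malloc(BLOCK * sizeof(double));
    for (int i = 0; i < BLOCK; i++)
        buf[i] = rank + i * 1e-6;

    /* All tasks open the same file on the shared file system. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/p/lustre/demo.dat",    /* hypothetical path */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank-determined offsets keep the writes disjoint, so the result
     * is one contiguous global file rather than one file per task. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
    MPI_File_write_at(fh, offset, buf, BLOCK, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```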

  5. Q2: What characterizes your applications? Examples are: intensities of message passing, memory utilization, computing, I/O, and data.
     • Applications are characterized as multi-physics package simulations
     • All applications are compute/communications intensive
     • Each package pushes the performance envelope along a different dimension
     • Some packages are MPI latency dominated
     • Some packages are MPI bandwidth dominated (see the ping-pong sketch below)
     • Memory bandwidth is a critical factor, but expensive memory subsystems don’t perform much better than commodity ones…
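
The latency-dominated vs. bandwidth-dominated distinction above is usually quantified with a simple two-task ping-pong: small messages expose per-message latency, large messages expose sustained bandwidth. Below is a minimal sketch; the message sizes and repetition counts are arbitrary choices, not the actual benchmarks used on these machines.

```c
/* Two-task MPI ping-pong: small messages measure latency, large messages
 * measure bandwidth.  Run with exactly 2 tasks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2)
        MPI_Abort(MPI_COMM_WORLD, 1);

    char *buf = malloc(1 << 20);                 /* 1 MiB message buffer */
    for (int bytes = 8; bytes <= (1 << 19); bytes *= 16) {
        const int reps = 100;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++) {
            if (rank == 0) {                     /* send, then wait for echo */
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {                             /* echo back what arrived */
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time */
        if (rank == 0)
            printf("%8d bytes  %10.2f us  %8.1f MB/s\n",
                   bytes, t * 1e6, bytes / t / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```

At small sizes the reported time is dominated by the fixed per-message cost (latency); at the largest sizes the MB/s column approaches the sustainable link bandwidth, which is the quantity the bandwidth-dominated packages care about.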

  6. Q3: What prior experience guided you to this choice?
     • Mission and applications
     • Budgets
     • Politics
     • Delivered performance
     • Balanced risk and cost performance

  7. Strategic Approach: straddle multiple curves to balance the risk and opportunity of new disruptive technologies
     • Any given technology curve is ultimately limited by Moore’s Law
     • Three complementary curves…
       • Vendor-integrated SMP clusters (IBM SP, HP SC)
         • Delivers to today’s stockpile’s demanding needs
         • Production environment
         • For “must have” deliverables now
       • IA32/IA64/AMD + Linux clusters
         • Delivers the transition for the next generation
         • “Near production” environment
         • Provides cycles for science
         • Provides cycles for the stockpile
         • Leading to next-generation production systems
         • These are the capacity systems in a strategic capacity/capability mix
       • Cell-based (IBM BG/L)
         • Delivers an affordable path to petaFLOP/s
         • Research environment, leading the transition to petaflop systems?
         • Are there other paths to a breakthrough regime by 2006-7?
     [Chart: Performance vs. Time (Today → FY05), "Straddle strategy for stability and preeminence": mainframes (RIP); vendor-integrated SMP clusters (IBM SP, HP SC); IA32/IA64/AMD + Linux; cell-based (IBM BG/L); cost points $10M/TF (White), $7M/TF (Q), $2M/TF (Purple C), $1.2M/TF (MCR), $500K/TF, $170K/TF]

  8. Q4: Other than your own machine, for your needs, what are the best and worst machines? And why?
     • Clusters of SMPs with a full node OS make system administration and programming much easier, but scalability is an issue
     • Vectors suck
       • The 10x potential speed-up from vectorization on Cray YMP-class machines yielded only a 1.5-2x delivered performance boost to stockpile codes (see the Amdahl’s-law sketch below)
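
The gap between the 10x potential and the 1.5-2x delivered speed-up in the last bullet is essentially Amdahl's law: whatever fraction of the runtime does not vectorize bounds the overall gain. The small sketch below works the numbers; the vectorizable fractions are illustrative assumptions, not measurements of the stockpile codes.

```c
/* Amdahl's-law illustration: a 10x vector speed-up applied to only part
 * of the runtime gives a much smaller overall speed-up.
 *   speedup = 1 / ((1 - f) + f / s),  f = vectorizable fraction, s = 10  */
#include <stdio.h>

int main(void)
{
    const double s = 10.0;                        /* vector-unit speed-up */
    const double fractions[] = { 0.4, 0.6, 0.8 }; /* illustrative fractions */

    for (int i = 0; i < 3; i++) {
        double f = fractions[i];
        double speedup = 1.0 / ((1.0 - f) + f / s);
        printf("vectorizable fraction %2.0f%% -> overall speed-up %.1fx\n",
               f * 100.0, speedup);
    }
    return 0;
}
/* Prints roughly 1.6x, 2.2x and 3.6x: unless most of the runtime
 * vectorizes, the delivered boost stays near the 1.5-2x range reported. */
```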
