
NCAR’s Data-Centric Supercomputing Environment Yellowstone


Presentation Transcript


  1. NCAR’s Data-Centric Supercomputing Environment Yellowstone • December 21, 2011 • Anke Kamrath, OSD Director/CISL • anke@ucar.edu

  2. Overview • Strategy • Moving from Process to Data-Centric Computing • HPC/Data Architecture • What we have today at ML • What’s planned for NWSC • Storage/Data/Networking – Data in Flight • WAN and High-Performance LAN • Central Filesystem Plans • Archival Plans NWSC Now Open!

  3. Evolving the Scientific Workflow • Common data movement issues • Time-consuming to move data between systems • Bandwidth to archive system is insufficient • Lack of sufficient disk space • Need to evolve data management techniques • Workflow management systems • Standardize metadata information • Reduce/eliminate duplication of datasets (e.g., CMIP5) • User Education • Effective methods for understanding workflow • Effective methods for streamlining workflow
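
One concrete way to chip away at dataset duplication (the CMIP5 case mentioned above) is to fingerprint files by content hash before replicating them. A minimal sketch in Python, assuming the datasets are ordinary files on a POSIX filesystem; the /glade path and the SHA-256 choice are illustrative, not part of the slide:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; groups with >1 entry are duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            groups[sha256_of(p)].append(p)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    # Hypothetical collections path, used only for illustration.
    for digest, paths in find_duplicates(Path("/glade/data01/cmip5")).items():
        print(digest[:12], *paths, sep="\n  ")
```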

  4. Traditional Workflow: Process-Centric Data Model

  5. Evolving Scientific Workflow: Information-Centric Data Model

  6. Current Resources @ Mesa Lab: GLADE, BLUEFIRE, LYNX

  7. NWSC: Yellowstone Environment • Yellowstone – HPC resource, 1.55 PFLOPS peak • GLADE – central disk resource, 11 PB (2012), 16.4 PB (2014) • Geyser & Caldera – DAV clusters • NCAR HPSS Archive – 100 PB capacity, ~15 PB/yr growth • 1Gb/10Gb Ethernet (40Gb+ future) • Data Transfer Services • Science Gateways – RDA, ESG • Partner Sites – Remote Vis, XSEDE Sites • High-bandwidth, low-latency HPC and I/O networks – FDR InfiniBand and 10Gb Ethernet

  8. NWSC-1 Resources in a Nutshell • Centralized Filesystems and Data Storage Resource (GLADE) • >90 GB/sec aggregate I/O bandwidth, GPFS filesystems • 10.9 PetaBytes initially -> 16.4 PetaBytes in 1Q2014 • High Performance Computing Resource (Yellowstone) • IBM iDataPlex Cluster with Intel Sandy Bridge EP† processors with Advanced Vector Extensions (AVX) • 1.552 PetaFLOPs – 29.8 bluefire-equivalents – 4,662 nodes – 74,592 cores • 149.2 TeraBytes total memory • Mellanox FDR InfiniBand full fat-tree interconnect • Data Analysis and Visualization Resource (Geyser & Caldera) • Large Memory System with Intel Westmere EX processors • 16 nodes, 640 cores, 16 TeraBytes memory, 16 NVIDIA Kepler GPUs • GPU-Computation/Vis System with Intel Sandy Bridge EP processors with AVX • 16 nodes, 128 SB cores, 1 TeraByte memory, 32 NVIDIA Kepler GPUs • Knights Corner System with Intel Sandy Bridge EP processors with AVX • 16 nodes, 128 SB cores, 992 KC cores, 1 TeraByte memory - Nov’12 delivery †“Sandy Bridge EP” is the Intel® Xeon® E5-2670 processor
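
As a sanity check on how the headline numbers relate, a quick back-of-the-envelope calculation; the 8 double-precision FLOPs per cycle per core is the usual figure assumed for Sandy Bridge with AVX and is not stated on the slide:

```python
# Back-of-the-envelope check of the Yellowstone headline numbers.
nodes = 4662
cores_per_node = 16
clock_ghz = 2.6
flops_per_cycle = 8          # assumed: AVX double-precision FLOPs/cycle/core on Sandy Bridge EP
mem_per_node_gb = 32

cores = nodes * cores_per_node                              # 74,592 cores
peak_pflops = cores * clock_ghz * flops_per_cycle / 1e6     # cores * GHz * flops/cycle -> GFLOPs; /1e6 -> PFLOPs
total_mem_tb = nodes * mem_per_node_gb / 1000               # 149.2 TB (decimal, matching the slide)

print(f"{cores:,} cores, {peak_pflops:.3f} PFLOPs peak, {total_mem_tb:.1f} TB memory")
```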

  9. GLADE • 10.94 PB usable capacity → 16.42 PB usable (1Q2014) • Estimated initial file system sizes • collections ≈ 2 PB RDA, CMIP5 data • scratch ≈ 5 PB shared, temporary space • projects ≈ 3 PB long-term, allocated space • users ≈ 1 PB medium-term work space • Disk Storage Subsystem • 76 IBM DCS3700 controllers & expansion drawers • 90 2-TB NL-SAS drives/controller • add 30 3-TB NL-SAS drives/controller (1Q2014) • GPFS NSD Servers • 91.8 GB/s aggregate I/O bandwidth; 19 IBM x3650 M4 nodes • I/O Aggregator Servers (GPFS, GLADE-HPSS connectivity) • 10-GbE & FDR interfaces; 4 IBM x3650 M4 nodes • High-performance I/O interconnect to HPC & DAV • Mellanox FDR InfiniBand full fat-tree • 13.6 GB/s bidirectional bandwidth/node
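
The usable-capacity figures follow from the drive counts if one assumes an 8+2 parity layout (roughly 80% of raw capacity usable); that ratio is an assumption for illustration, not something the slide states:

```python
# How the GLADE usable-capacity figures follow from the drive counts,
# assuming an 8+2 parity layout (80% usable) -- an assumption, not stated on the slide.
controllers = 76
usable_fraction = 8 / 10          # assumed data-to-total ratio for RAID-6-style 8+2 arrays

initial_raw_pb = controllers * 90 * 2 / 1000        # 90 x 2 TB NL-SAS drives per controller
expansion_raw_pb = controllers * 30 * 3 / 1000      # +30 x 3 TB drives per controller (1Q2014)

initial_usable = initial_raw_pb * usable_fraction                      # ~10.94 PB
final_usable = (initial_raw_pb + expansion_raw_pb) * usable_fraction   # ~16.42 PB
print(f"initial {initial_usable:.2f} PB usable, after expansion {final_usable:.2f} PB usable")
```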

  10. NCAR Disk Storage Capacity Profile

  11. Yellowstone – NWSC High-Performance Computing Resource • Batch Computation • 4,662 IBM dx360 M4 nodes – 16 cores, 32 GB memory per node • Intel Sandy Bridge EP processors with AVX – 2.6 GHz clock • 74,592 cores total – 1.552 PFLOPs peak • 149.2 TB total DDR3-1600 memory • 29.8 Bluefire equivalents • High-Performance Interconnect • Mellanox FDR InfiniBand full fat-tree • 13.6 GB/s bidirectional bw/node • <2.5 µs latency (worst case) • 31.7 TB/s bisection bandwidth • Login/Interactive • 6 IBM x3650 M4 Nodes; Intel Sandy Bridge EP processors with AVX • 16 cores & 128 GB memory per node

  12. NCAR HPC Profile 30x Bluefire performance

  13. Geyser and Caldera – NWSC Data Analysis & Visualization Resource • Geyser: Large-memory system • 16 IBM x3850 nodes – Intel Westmere-EX processors • 40 cores, 1 TB memory, 1 NVIDIA Kepler Q13H-3 GPU per node • Mellanox FDR full fat-tree interconnect • Caldera: GPU computation/visualization system • 16 IBM x360 M4 nodes – Intel Sandy Bridge EP/AVX • 16 cores, 64 GB memory, 2 NVIDIA Kepler Q13H-3 GPUs per node • Mellanox FDR full fat-tree interconnect • Knights Corner system (November 2012 delivery) • Intel Many Integrated Core (MIC) architecture • 16 IBM Knights Corner nodes • 16 Sandy Bridge EP/AVX cores, 64 GB memory, 1 Knights Corner adapter per node • Mellanox FDR full fat-tree interconnect

  14. Erebus – Antarctic Mesoscale Prediction System (AMPS) • IBM iDataPlex Compute Cluster • 84 IBM dx360 M4 Nodes; 16 cores, 32 GB • Intel Sandy Bridge EP; 2.6 GHz clock • 1,344 cores total – 28 TFLOPs peak • Mellanox FDR InfiniBand full fat-tree • 0.54 Bluefire equivalents • Login Nodes • 2 IBM x3650 M4 Nodes • 16 cores & 128 GB memory per node • Dedicated GPFS filesystem • 57.6 TB usable disk storage • 9.6 GB/sec aggregate I/O bandwidth • Erebus, on Ross Island, is Antarctica’s most famous volcanic peak and is one of the largest volcanoes in the world – within the top 20 in total size and reaching a height of 12,450 feet.

  15. Yellowstone Software • Compilers, Libraries, Debugger & Performance Tools • Intel Cluster Studio (Fortran, C++, performance & MPI libraries, trace collector & analyzer) 50 concurrent users • Intel VTune Amplifier XE performance optimizer 2 concurrent users • PGI CDK (Fortran, C, C++, pgdbg debugger, pgprof) 50 concurrent users • PGI CDK GPU Version (Fortran, C, C++, pgdbg debugger, pgprof) for DAV systems only, 2 concurrent users • PathScale EckoPath (Fortran, C, C++, PathDB debugger) 20 concurrent users • Rogue Wave TotalView debugger 8,192 floating tokens • IBM Parallel Environment (POE), including IBM HPC Toolkit • System Software • LSF-HPC Batch Subsystem / Resource Manager • IBM has purchased Platform Computing, Inc. (developers of LSF-HPC) • Red Hat Enterprise Linux (RHEL) Version 6 • IBM General Parallel Filesystem (GPFS) • Mellanox Universal Fabric Manager • IBM xCAT cluster administration toolkit

  16. NCAR HPSS Archive Resource • NWSC • Two SL8500 robotic libraries (20k cartridge capacity) • 26 T10000C tape drives (240 MB/sec I/O rate each) and T10000C media (5 TB/cartridge, uncompressed) initially; +20 T10000C drives ~Nov 2012 • >100 PB capacity • Current growth rate ~3.8 PB/year • Anticipated NWSC growth rate ~15 PB/year • Mesa Lab • Two SL8500 robotic libraries (15k cartridge capacity) • Existing data (14.5 PB): • 1st & 2nd copies will be ‘oozed’ to new media @ NWSC, begin 2012 • New data @ Mesa: • Disaster-recovery data only • T10000B drives & media to be retired • No plans to move Mesa Lab SL8500 libraries (more costly to move than to buy new under AMSTAR Subcontract) Plan to release an “AMSTAR-2” RFP 1Q2013, with target for first equipment delivery during 1Q2014 to further augment the NCAR HPSS Archive.
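
A rough sizing sketch pulling the slide's figures together; the headroom estimate assumes the ~15 PB/yr NWSC growth rate applies from day one, which is an assumption:

```python
# Rough sizing of the NWSC HPSS archive from the slide's figures.
cartridges = 20_000
tb_per_cartridge = 5                 # T10000C media, uncompressed
capacity_pb = cartridges * tb_per_cartridge / 1000        # 100 PB

drives_initial, drives_added = 26, 20
drive_mb_s = 240
agg_gb_s = (drives_initial + drives_added) * drive_mb_s / 1000   # ~11 GB/s after the Nov 2012 add

existing_pb = 14.5                   # Mesa Lab data to be 'oozed' over
growth_pb_per_year = 15              # anticipated NWSC growth rate
years_to_fill = (capacity_pb - existing_pb) / growth_pb_per_year  # ~5.7 years of headroom

print(f"{capacity_pb:.0f} PB capacity, {agg_gb_s:.1f} GB/s aggregate tape bandwidth, "
      f"~{years_to_fill:.1f} years headroom at the NWSC growth rate")
```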

  17. Yellowstone Physical Infrastructure

  18. Yellowstone allocations (% of resource) NCAR’s 29% represents 170 million core-hours per year for Yellowstone alone (compared to less than 10 million per year on Bluefire) plus a similar fraction of the DAV and GLADE resources.

  19. Yellowstone Schedule

  20. Data in Flight to NWSC • NWSC Networking • Central Filesystem • Migrating from GLADE-ML to GLADE-NWSC • Archive • Migrating HPSS data from ML to NWSC


  22. BiSON to NWSC • Initially three 10G circuits active • Two 10G connections back to Mesa Lab for internal traffic • One 10G direct to FRGP for general Internet2 / NLR / Research and Education traffic • Options for dedicated 10G connections for high performance computing to other BiSON members • System is engineered for 40 individual lambdas • Each lambda can be a 10G, 40G, or 100G connection • Independent lambdas can be sent each direction around the ring (two ADVA shelves at NWSC – one for each direction) • With a major upgrade system could support 80 lambdas • 100Gbps * 80 channels * 2 paths = 16Tbps
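
The 16 Tbps upper bound follows directly from the channel arithmetic; a trivial check:

```python
# Checking the slide's upper-bound figure for a fully upgraded BiSON ring.
lambdas = 80                  # after the major upgrade; 40 as initially engineered
gbps_per_lambda = 100         # each lambda can run at 10, 40, or 100 Gbps
paths = 2                     # independent lambdas sent each direction around the ring

aggregate_tbps = lambdas * gbps_per_lambda * paths / 1000
print(f"{aggregate_tbps:.0f} Tbps aggregate")      # 16 Tbps, matching the slide
```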

  23. High-performance LAN • Data Center Networking (DCN) • high-speed data center computing with 1G and 10G client-facing ports for supercomputer, mass storage, and other data center components • redundant design (e.g., multiple chassis and separate module connections) • future option for 100G interfaces • Juniper awarded RFP after NSF approval • Access switches: QFX3500 series • Network core and WAN edge: EX8200 series switch/routers • Includes spares for lab/testing purposes • Juniper training for NETS staff early Jan-2012 • Deploy Juniper equipment late Jan-2012

  24. Moving Data… Ugh

  25. Migrating GLADE Data • Temporary work spaces (/glade/scratch, /glade/user) • No data will automatically be moved to NWSC • Allocated project spaces (/glade/projxx) • New allocations will be made for the NWSC • No data will automatically be moved to NWSC • Data transfer option so users may move data they need • Data collections (/glade/data01, /glade/data02) • CISL will move data from ML to NWSC • Full production capability will need to be maintained during the transition • Network Impact • Current storage max performance is 5 GB/s • Can sustain ~2 GB/s for reads while under a production load • Will move 400 TB in a couple of days; however, we will saturate a 20 Gb/s network link
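
A quick estimate of why 400 TB takes "a couple of days" on a saturated 20 Gb/s link; the 80% efficiency figure is illustrative, not from the slide:

```python
# Sanity check on the collections-move estimate: 400 TB over a saturated 20 Gb/s link.
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days to move `size_tb` terabytes over a `link_gbps` link at the given efficiency."""
    seconds = size_tb * 1e12 * 8 / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400

print(f"{transfer_days(400, 20):.1f} days at full line rate")                   # ~1.9 days
print(f"{transfer_days(400, 20, efficiency=0.8):.1f} days at 80% efficiency")   # illustrative overhead factor
```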

  26. Migrating Archive • What Migrates? • Data: 15 PB and counting… • Format: MSS to HPSS • Tape: Tape B (1 TB tapes) to Tape C (5 TB tapes) • Location: ML to NWSC • HPSS at NWSC to become primary site in Spring 2012 • 1-day outage when metadata servers get moved • ML HPSS will remain as Disaster Recovery Site • Data migration will take until early 2014 • Will throttle migration to not overload the network
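
For a feel of the sustained rate this migration implies, a small estimate assuming roughly 21 months between spring 2012 and early 2014 (the exact window is an assumption, not stated on the slide):

```python
# What sustained rate the archive migration implies, assuming roughly 21 months
# between spring 2012 and early 2014 (the window is approximate, not from the slide).
data_pb = 15
months = 21
seconds = months * 30 * 86_400

avg_mb_s = data_pb * 1e15 / seconds / 1e6      # ~275 MB/s average, slightly more than one T10000C stream
print(f"~{avg_mb_s:.0f} MB/s sustained to finish on schedule")
```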

  27. Whew… A lot of work ahead! Questions?
