NCAR’s Data-Centric Supercomputing Environment Yellowstone

December 21, 2011
Anke Kamrath, OSD Director/CISL
anke@ucar.edu

Presentation Transcript
Overview
  • Strategy
    • Moving from Process to Data-Centric Computing
  • HPC/Data Architecture
    • What we have today at ML
    • What’s planned for NWSC
  • Storage/Data/Networking – Data in Flight
    • WAN and High-Performance LAN
    • Central Filesystem Plans
    • Archival Plans

NWSC Now Open!

Evolving the Scientific Workflow
  • Common data movement issues
    • Time consuming to move data between systems
    • Bandwidth to archive system is insufficient
    • Lack of sufficient disk space
  • Need to evolve data management techniques
    • Workflow management systems
    • Standardize metadata information
    • Reduce/eliminate duplication of datasets (e.g., CMIP5)
  • User Education
    • Effective methods for understanding workflow
    • Effective methods for streamlining workflow
Traditional Workflow

Process-Centric Data Model

Evolving Scientific Workflow

Information-Centric Data Model

NWSC: Yellowstone Environment

Architecture overview (from the slide diagram):
  • Yellowstone – HPC resource, 1.55 PFLOPS peak
  • GLADE – central disk resource, 11 PB (2012), 16.4 PB (2014)
  • Geyser & Caldera – DAV clusters
  • NCAR HPSS Archive – 100 PB capacity, ~15 PB/yr growth
  • High-bandwidth, low-latency HPC and I/O networks – FDR InfiniBand and 10Gb Ethernet
  • 1Gb/10Gb Ethernet (40Gb+ future) connecting data transfer services, science gateways (RDA, ESG), partner sites, remote visualization, and XSEDE sites

NWSC-1 Resources in a Nutshell
  • Centralized Filesystems and Data Storage Resource (GLADE)
    • >90 GB/sec aggregate I/O bandwidth, GPFS filesystems
    • 10.9 PetaBytes initially -> 16.4 PetaBytes in 1Q2014
  • High Performance Computing Resource (Yellowstone)
    • IBM iDataPlex Cluster with Intel Sandy Bridge EP† processors with Advanced Vector Extensions (AVX)
    • 1.552 PetaFLOPs – 29.8 Bluefire equivalents – 4,662 nodes – 74,592 cores
    • 149.2 TeraBytes total memory
    • Mellanox FDR InfiniBand full fat-tree interconnect
  • Data Analysis and Visualization Resource (Geyser & Caldera)
    • Large Memory System with Intel Westmere EX processors
      • 16 nodes, 640 cores, 16 TeraBytes memory, 16 NVIDIA Kepler GPUs
    • GPU-Computation/Vis System with Intel Sandy Bridge EP processors with AVX
      • 16 nodes, 128 SB cores, 1 TeraByte memory, 32 NVIDIA Kepler GPUs
    • Knights Corner System with Intel Sandy Bridge EP processors with AVX
      • 16 nodes, 128 SB cores, 992 KC cores, 1 TeraByte memory - Nov’12 delivery

†“Sandy Bridge EP” is the Intel® Xeon® E5-2670 processor

GLADE
  • 10.94 PB usable capacity → 16.42 PB usable (1Q2014)

Estimated initial file system sizes

    • collections ≈ 2 PB – RDA, CMIP5 data
    • scratch ≈ 5 PB – shared, temporary space
    • projects ≈ 3 PB – long-term, allocated space
    • users ≈ 1 PB – medium-term work space
  • Disk Storage Subsystem
    • 76 IBM DCS3700 controllers & expansion drawers
      • 90 2-TB NL-SAS drives/controller
      • add 30 3-TB NL-SAS drives/controller (1Q2014)
  • GPFS NSD Servers
    • 91.8 GB/s aggregate I/O bandwidth; 19 IBM x3650 M4 nodes
  • I/O Aggregator Servers (GPFS, GLADE-HPSS connectivity)
    • 10-GbE & FDR interfaces; 4 IBM x3650 M4 nodes
  • High-performance I/O interconnect to HPC & DAV
    • Mellanox FDR InfiniBand full fat-tree
    • 13.6 GB/s bidirectional bandwidth/node
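
The usable-capacity figures above can be reproduced from the drive counts with a simple overhead assumption. A minimal sketch in Python, assuming an 8+2 RAID-6 layout (80% efficiency), which is not stated on the slide but matches the quoted numbers:

```python
# Rough GLADE capacity check (a sketch; the 8+2 RAID-6 efficiency is an assumption).
controllers = 76
raid_efficiency = 0.8                      # assumed: 8 data + 2 parity drives per array

initial_raw_tb = controllers * 90 * 2      # 90 x 2-TB NL-SAS drives per controller
upgrade_raw_tb = controllers * 30 * 3      # add 30 x 3-TB drives per controller (1Q2014)

initial_usable_pb = initial_raw_tb * raid_efficiency / 1000                   # ~10.94 PB
final_usable_pb = (initial_raw_tb + upgrade_raw_tb) * raid_efficiency / 1000  # ~16.42 PB

# Per-NSD-server share of the aggregate I/O bandwidth
bw_per_nsd_gb_s = 91.8 / 19                # ~4.8 GB/s per x3650 M4 NSD server

print(round(initial_usable_pb, 2), round(final_usable_pb, 2), round(bw_per_nsd_gb_s, 1))
```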
Yellowstone: NWSC High-Performance Computing Resource
  • Batch Computation
    • 4,662 IBM dx360 M4 nodes – 16 cores, 32 GB memory per node
    • Intel Sandy Bridge EP processors with AVX – 2.6 GHz clock
    • 74,592 cores total – 1.552 PFLOPs peak
    • 149.2 TB total DDR3-1600 memory
    • 29.8 Bluefire equivalents
  • High-Performance Interconnect
    • Mellanox FDR InfiniBand full fat-tree
    • 13.6 GB/s bidirectional bw/node
    • <2.5 µs latency (worst case)
    • 31.7 TB/s bisection bandwidth
  • Login/Interactive
    • 6 IBM x3650 M4 Nodes; Intel Sandy Bridge EP processors with AVX
    • 16 cores & 128 GB memory per node
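
The headline figures on this slide are mutually consistent. A quick check in Python, assuming 8 double-precision FLOPs per clock per core (the usual figure for Sandy Bridge with AVX) and counting memory in decimal terabytes:

```python
# Consistency check for the Yellowstone batch-compute figures
# (assumes 8 DP FLOPs/cycle/core for Sandy Bridge EP with AVX; decimal TB/PB units).
nodes = 4662
cores_per_node = 16
clock_ghz = 2.6
flops_per_cycle = 8
mem_per_node_gb = 32
bw_per_node_gb_s = 13.6                       # bidirectional FDR InfiniBand bandwidth

cores = nodes * cores_per_node                              # 74,592 cores
peak_pflops = cores * clock_ghz * flops_per_cycle / 1e6     # ~1.552 PFLOPs
total_mem_tb = nodes * mem_per_node_gb / 1000               # ~149.2 TB
bisection_tb_s = (nodes / 2) * bw_per_node_gb_s / 1000      # ~31.7 TB/s (full fat-tree)

print(cores, round(peak_pflops, 3), round(total_mem_tb, 1), round(bisection_tb_s, 1))
```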
NCAR HPC Profile

30x Bluefire performance

Geyser and Caldera: NWSC Data Analysis & Visualization Resource
  • Geyser: Large-memory system
    • 16 IBM x3850 nodes – Intel Westmere-EX processors
    • 40 cores, 1 TB memory, 1 NVIDIA Kepler Q13H-3 GPU per node
    • Mellanox FDR full fat-tree interconnect
  • Caldera: GPU computation/visualization system
    • 16 IBM dx360 M4 nodes – Intel Sandy Bridge EP/AVX
    • 16 cores, 64 GB memory, 2 NVIDIA Kepler Q13H-3 GPUs per node
    • Mellanox FDR full fat-tree interconnect
  • Knights Corner system (November 2012 delivery)
    • Intel Many Integrated Core (MIC) architecture
    • 16 IBM Knights Corner nodes
    • 16 Sandy Bridge EP/AVX cores, 64 GB memory, 1 Knights Corner adapter per node
    • Mellanox FDR full fat-tree interconnect
Erebus: Antarctic Mesoscale Prediction System (AMPS)

  • IBM iDataPlex Compute Cluster
    • 84 IBM dx360 M4 Nodes; 16 cores, 32 GB
    • Intel Sandy Bridge EP; 2.6 GHz clock
    • 1,344 cores total – 28 TFLOPs peak
    • Mellanox FDR InfiniBand full fat-tree
    • 0.54 Bluefire equivalents
  • Login Nodes
    • 2 IBM x3650 M4 Nodes
    • 16 cores & 128 GB memory per node
  • Dedicated GPFS filesystem
    • 57.6 TB usable disk storage
    • 9.6 GB/sec aggregate I/O bandwidth
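
The same peak-performance arithmetic applies to the smaller Erebus cluster; a brief sketch (again assuming 8 DP FLOPs/cycle/core):

```python
# Erebus (AMPS) peak-performance check (assumes 8 DP FLOPs/cycle/core).
nodes, cores_per_node, clock_ghz, flops_per_cycle = 84, 16, 2.6, 8
cores = nodes * cores_per_node                           # 1,344 cores
peak_tflops = cores * clock_ghz * flops_per_cycle / 1e3  # ~28 TFLOPs
print(cores, round(peak_tflops, 1))
```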

Erebus, on Ross Island, is Antarctica’s most famous volcanic peak and is one of the largest volcanoes in the world – within the top 20 in total size and reaching a height of 12,450 feet.

Yellowstone Software
  • Compilers, Libraries, Debugger & Performance Tools
    • Intel Cluster Studio (Fortran, C++, performance & MPI libraries, trace collector & analyzer) – 50 concurrent users
    • Intel VTune Amplifier XE performance optimizer – 2 concurrent users
    • PGI CDK (Fortran, C, C++, pgdbg debugger, pgprof) – 50 concurrent users
    • PGI CDK GPU Version (Fortran, C, C++, pgdbg debugger, pgprof) – DAV systems only, 2 concurrent users
    • PathScale EckoPath (Fortran, C, C++, PathDB debugger) – 20 concurrent users
    • Rogue Wave TotalView debugger – 8,192 floating tokens
    • IBM Parallel Environment (POE), including IBM HPC Toolkit
  • System Software
    • LSF-HPC Batch Subsystem / Resource Manager
      • IBM has purchased Platform Computing, Inc. (developers of LSF-HPC)
    • Red Hat Enterprise Linux (RHEL) Version 6
    • IBM General Parallel Filesystem (GPFS)
    • Mellanox Universal Fabric Manager
    • IBM xCAT cluster administration toolkit
NCAR HPSS Archive Resource
  • NWSC
    • Two SL8500 robotic libraries (20k cartridge capacity)
    • 26 T10000C tape drives (240 MB/sec I/O rate each) and T10000C media (5 TB/cartridge, uncompressed) initially; +20 T10000C drives ~Nov 2012
    • >100 PB capacity
    • Current growth rate ~3.8 PB/year
    • Anticipated NWSC growth rate ~15 PB/year
  • Mesa Lab
    • Two SL8500 robotic libraries (15k cartridge capacity)
    • Existing data (14.5 PB):
      • 1st & 2nd copies will be ‘oozed’ to new media @ NWSC, beginning in 2012
    • New data @ Mesa:
      • Disaster-recovery data only
    • T10000B drives & media to be retired
    • No plans to move Mesa Lab SL8500 libraries (more costly to move than to buy new under AMSTAR Subcontract)

Plan to release an “AMSTAR-2” RFP 1Q2013, with target for first equipment delivery during 1Q2014 to further augment the NCAR HPSS Archive.
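
For a sense of scale, the library and drive counts above translate into the following capacity and bandwidth figures. A hedged sketch: the headroom estimate assumes the ~14.5 PB of existing Mesa Lab data is oozed into the same NWSC libraries and that growth stays near ~15 PB/year.

```python
# NWSC archive capacity and tape-bandwidth arithmetic (a sketch; see assumptions above).
cartridges = 20_000
tb_per_cartridge = 5                      # T10000C, uncompressed
drives_initial, drives_added = 26, 20
mb_s_per_drive = 240

capacity_pb = cartridges * tb_per_cartridge / 1000                                # 100 PB
agg_bw_initial_gb_s = drives_initial * mb_s_per_drive / 1000                      # ~6.2 GB/s
agg_bw_late_2012_gb_s = (drives_initial + drives_added) * mb_s_per_drive / 1000   # ~11 GB/s
years_of_headroom = (capacity_pb - 14.5) / 15                                     # ~5.7 years

print(capacity_pb, agg_bw_initial_gb_s, agg_bw_late_2012_gb_s, round(years_of_headroom, 1))
```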

Yellowstone allocations (% of resource)

NCAR’s 29% represents 170 million core-hours per year for Yellowstone alone (compared to less than 10 million per year on Bluefire) plus a similar fraction of the DAV and GLADE resources.
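
A rough cross-check of the 170 million core-hour figure (a sketch; the implied ~90% delivered-hours factor is inferred, not stated on the slide):

```python
# Cross-check of NCAR's 29% Yellowstone share (a sketch; availability factor is inferred).
cores = 74_592
hours_per_year = 8_760
ncar_share = 0.29

ideal_share_mhrs = cores * hours_per_year * ncar_share / 1e6   # ~189 M core-hours/year
implied_availability = 170 / ideal_share_mhrs                  # ~0.90

print(round(ideal_share_mhrs), round(implied_availability, 2))
```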

Data in Flight to NWSC
  • NWSC Networking
  • Central Filesystem
    • Migrating from GLADE-ML to GLADE-NWSC
  • Archive
    • Migrating HPSS data from ML to NWSC
BiSON to NWSC
  • Initially three 10G circuits active
    • Two 10G connections back to Mesa Lab for internal traffic
    • One 10G direct to FRGP for general Internet2 / NLR / Research and Education traffic
  • Options for dedicated 10G connections for high performance computing to other BiSON members
  • System is engineered for 40 individual lambdas
    • Each lambda can be a 10G, 40G, or 100G connection
  • Independent lambdas can be sent each direction around the ring (two ADVA shelves at NWSC – one for each direction)
    • With a major upgrade, the system could support 80 lambdas
    • 100 Gbps × 80 channels × 2 paths = 16 Tbps
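
The aggregate-capacity arithmetic above generalizes to the other supported lambda rates; a small sketch assuming every lambda runs at the same rate in both ring directions:

```python
# BiSON ring capacity at the upgraded 80-lambda configuration
# (a sketch; assumes uniform lambda rates on both ring directions).
lambdas = 80
directions = 2
for gbps in (10, 40, 100):
    total_tbps = gbps * lambdas * directions / 1000
    print(f"{gbps}G lambdas -> {total_tbps:.1f} Tbps aggregate")
```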
High-Performance LAN
  • Data Center Networking (DCN)
    • High-speed data center computing with 1G and 10G client-facing ports for the supercomputer, mass storage, and other data center components
    • Redundant design (e.g., multiple chassis and separate module connections)
    • Future option for 100G interfaces
  • Juniper awarded RFP after NSF approval
    • Access switches: QFX3500 series
    • Network core and WAN edge: EX8200 series switch/routers
    • Includes spares for lab/testing purposes
  • Juniper training for NETS staff early Jan-2012
  • Deploy Juniper equipment late Jan-2012
Migrating GLADE Data
  • Temporary work spaces (/glade/scratch, /glade/user)
    • No data will automatically be moved to NWSC
  • Allocated project spaces (/glade/projxx)
    • New allocations will be made for the NWSC
    • No data will automatically be moved to NWSC
    • Data transfer option so users may move data they need
  • Data collections (/glade/data01, /glade/data02)
    • CISL will move data from ML to NWSC
    • Full production capability will need to be maintained during the transition
  • Network Impact
    • Current storage maximum performance is 5 GB/s
    • Can sustain ~2 GB/s for reads while under a production load
    • Moving 400 TB takes only a couple of days, but doing so will saturate a 20 Gb/s network link
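
The timing behind that last bullet follows directly from the sustained read rate; a minimal sketch assuming the 2 GB/s production-load read rate is the bottleneck:

```python
# "400 TB in a couple of days" arithmetic (a sketch; assumes 2 GB/s sustained reads).
data_tb = 400
sustained_gb_s = 2
link_gb_s = 20 / 8                                 # a 20 Gb/s link is 2.5 GB/s

days = data_tb * 1000 / sustained_gb_s / 86_400    # ~2.3 days
link_utilization = sustained_gb_s / link_gb_s      # ~80% of the 20 Gb/s link

print(round(days, 1), f"{link_utilization:.0%}")
```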
Migrating Archive
  • What Migrates?
    • Data: 15 PB and counting…
    • Format: MSS to HPSS
    • Tape: T10000B media (1 TB cartridges) to T10000C media (5 TB cartridges)
    • Location: ML to NWSC
  • HPSS at NWSC to become primary site in Spring 2012
    • One-day outage when the metadata servers are moved
    • ML HPSS will remain as Disaster Recovery Site
    • Data Migration will take until early 2014
    • Migration will be throttled to avoid overloading the network
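
For context, the average rate implied by moving ~15 PB between a spring-2012 start and an early-2014 finish works out to roughly one T10000C drive's worth of bandwidth, which is consistent with throttling the migration. A sketch, with the ~21-month window being an assumption based on the dates above:

```python
# Implied average archive-migration rate (a sketch; the 21-month window is assumed).
data_pb = 15
months = 21
seconds = months * 30 * 86_400

avg_mb_s = data_pb * 1e9 / seconds     # ~275 MB/s, about one T10000C drive (240 MB/s)
print(round(avg_mb_s))
```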