NCAR’s Data-Centric Supercomputing Environment: Yellowstone

December 21, 2011
Anke Kamrath, OSD Director/CISL (anke@ucar.edu)


Overview

  • Strategy

    • Moving from Process to Data-Centric Computing

  • HPC/Data Architecture

    • What we have today at ML

    • What’s planned for NWSC

  • Storage/Data/Networking – Data in Flight

    • WAN and High-Performance LAN

    • Central Filesystem Plans

    • Archival Plans

NWSC Now Open!


Evolving the Scientific Workflow

  • Common data movement issues

    • Time-consuming to move data between systems

    • Bandwidth to archive system is insufficient

    • Lack of sufficient disk space

  • Need to evolve data management techniques

    • Workflow management systems

    • Standardize metadata information

    • Reduce/eliminate duplication of datasets (e.g., CMIP5)

  • User Education

    • Effective methods for understanding workflow

    • Effective methods for streamlining workflow


Traditional Workflow

Process-Centric Data Model


Evolving Scientific Workflow

Information-Centric Data Model


Current Resources @ Mesa Lab

  • GLADE

  • BLUEFIRE

  • LYNX

NWSC: Yellowstone Environment

  • Yellowstone: HPC resource, 1.55 PFLOPS peak

  • GLADE: central disk resource, 11 PB (2012), 16.4 PB (2014)

  • Geyser & Caldera: DAV clusters

  • NCAR HPSS Archive: 100 PB capacity, ~15 PB/yr growth

  • 1Gb/10Gb Ethernet (40Gb+ future): data transfer services, science gateways (RDA, ESG), partner sites, remote vis, XSEDE sites

  • High-bandwidth, low-latency HPC and I/O networks: FDR InfiniBand and 10Gb Ethernet


NWSC-1 Resources in a Nutshell

  • Centralized Filesystems and Data Storage Resource (GLADE)

    • >90 GB/sec aggregate I/O bandwidth, GPFS filesystems

    • 10.9 PetaBytes initially -> 16.4 PetaBytes in 1Q2014

  • High Performance Computing Resource (Yellowstone)

    • IBM iDataPlex Cluster with Intel Sandy Bridge EP† processors with Advanced Vector Extensions (AVX)

    • 1.552 PetaFLOPs – 29.8 bluefire-equivalents – 4,662 nodes – 74,592 cores

    • 149.2 TeraBytes total memory

    • Mellanox FDR InfiniBand full fat-tree interconnect

  • Data Analysis and Visualization Resource (Geyser & Caldera)

    • Large Memory System with Intel Westmere EX processors

      • 16 nodes, 640 cores, 16 TeraBytes memory, 16 NVIDIA Kepler GPUs

    • GPU-Computation/Vis System with Intel Sandy Bridge EP processors with AVX

      • 16 nodes, 128 SB cores, 1 TeraByte memory, 32 NVIDIA Kepler GPUs

    • Knights Corner System with Intel Sandy Bridge EP processors with AVX

      • 16 nodes, 128 SB cores, 992 KC cores, 1 TeraByte memory - Nov’12 delivery

†“Sandy Bridge EP” is the Intel® Xeon® E5-2670 processor


GLADE

  • 10.94 PB usable capacity → 16.42 PB usable (1Q2014); the capacity arithmetic is sketched after this slide

    Estimated initial file system sizes

    • collections ≈ 2 PB (RDA, CMIP5 data)

    • scratch ≈ 5 PB (shared, temporary space)

    • projects ≈ 3 PB (long-term, allocated space)

    • users ≈ 1 PB (medium-term work space)

  • Disk Storage Subsystem

    • 76 IBM DCS3700 controllers & expansion drawers

      • 90 2-TB NL-SAS drives/controller

      • add 30 3-TB NL-SAS drives/controller (1Q2014)

  • GPFS NSD Servers

    • 91.8 GB/s aggregate I/O bandwidth; 19 IBM x3650 M4 nodes

  • I/O Aggregator Servers (GPFS, GLADE-HPSS connectivity)

    • 10-GbE & FDR interfaces; 4 IBM x3650 M4 nodes

  • High-performance I/O interconnect to HPC & DAV

    • Mellanox FDR InfiniBand full fat-tree

    • 13.6 GB/s bidirectional bandwidth/node
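
A minimal consistency check of those capacity numbers, assuming decimal TB/PB and backing the usable-to-raw ratio out of the quoted 10.94 PB (the slide does not state the underlying RAID scheme):

```python
# GLADE capacity sketch (assumptions: decimal units; usable/raw ratio
# is inferred from the quoted 10.94 PB, not from a stated RAID layout).
controllers = 76
raw_2012_tb = controllers * 90 * 2        # 90 x 2-TB NL-SAS drives per controller
raw_2014_add_tb = controllers * 30 * 3    # 30 x 3-TB drives added per controller (1Q2014)

usable_ratio = 10.94e3 / raw_2012_tb      # ~0.80
usable_2014_pb = (raw_2012_tb + raw_2014_add_tb) * usable_ratio / 1e3

print(f"raw 2012: {raw_2012_tb / 1e3:.2f} PB, usable/raw: {usable_ratio:.2f}")
print(f"projected usable 1Q2014: {usable_2014_pb:.2f} PB")  # ~16.4 PB, in line with the slide's 16.42 PB
```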


NCAR Disk Storage Capacity Profile


Yellowstone: NWSC High-Performance Computing Resource

  • Batch Computation

    • 4,662 IBM dx360 M4 nodes – 16 cores, 32 GB memory per node

    • Intel Sandy Bridge EP processors with AVX – 2.6 GHz clock

    • 74,592 cores total – 1.552 PFLOPS peak (peak arithmetic sketched after this slide)

    • 149.2 TB total DDR3-1600 memory

    • 29.8 Bluefire equivalents

  • High-Performance Interconnect

    • Mellanox FDR InfiniBand full fat-tree

    • 13.6 GB/s bidirectional bw/node

    • <2.5 µs latency (worst case)

    • 31.7 TB/s bisection bandwidth

  • Login/Interactive

    • 6 IBM x3650 M4 Nodes; Intel Sandy Bridge EP processors with AVX

    • 16 cores & 128 GB memory per node
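
A back-of-envelope check of the quoted peak performance and total memory, assuming 8 double-precision FLOPs per clock per core for Sandy Bridge EP with AVX (the rate the 1.552 PFLOPS figure implies, not one stated on the slide):

```python
# Yellowstone peak-performance sketch (assumption: 8 DP FLOPs/cycle/core with AVX).
nodes, cores_per_node = 4_662, 16
clock_ghz, flops_per_cycle = 2.6, 8

cores = nodes * cores_per_node                             # 74,592
peak_pflops = cores * clock_ghz * flops_per_cycle / 1e6    # GFLOPS -> PFLOPS
memory_tb = nodes * 32 / 1e3                               # 32 GB DDR3-1600 per node

print(cores, f"{peak_pflops:.3f} PFLOPS", f"{memory_tb:.1f} TB")
# -> 74592  1.552 PFLOPS  149.2 TB, matching the slide
```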


NCAR HPC Profile

30x Bluefire performance


Geyser and Caldera: NWSC Data Analysis & Visualization Resource

  • Geyser: Large-memory system

    • 16 IBM x3850 nodes – Intel Westmere-EX processors

    • 40 cores, 1 TB memory, 1 NVIDIA Kepler Q13H-3 GPU per node

    • Mellanox FDR full fat-tree interconnect

  • Caldera: GPU computation/visualization system

    • 16 IBM dx360 M4 nodes – Intel Sandy Bridge EP/AVX

    • 16 cores, 64 GB memory, 2 NVIDIA Kepler Q13H-3 GPUs per node

    • Mellanox FDR full fat-tree interconnect

  • Knights Corner system (November 2012 delivery)

    • Intel Many Integrated Core (MIC) architecture

    • 16 IBM Knights Corner nodes

    • 16 Sandy Bridge EP/AVX cores, 64 GB memory, 1 Knights Corner adapter per node

    • Mellanox FDR full fat-tree interconnect


Erebus: Antarctic Mesoscale Prediction System (AMPS)

  • IBM iDataPlex Compute Cluster

    • 84 IBM dx360 M4 nodes; 16 cores, 32 GB memory per node

    • Intel Sandy Bridge EP; 2.6 GHz clock

    • 1,344 cores total – 28 TFLOPs peak

    • Mellanox FDR InfiniBand full fat-tree

    • 0.54 Bluefire equivalents

  • Login Nodes

    • 2 IBM x3650 M4 Nodes

    • 16 cores & 128 GB memory per node

  • Dedicated GPFS filesystem

    • 57.6 TB usable disk storage

    • 9.6 GB/sec aggregate I/O bandwidth


Erebus, on Ross Island, is Antarctica’s most famous volcanic peak and is one of the largest volcanoes in the world – within the top 20 in total size and reaching a height of 12,450 feet.


Yellowstone Software

  • Compilers, Libraries, Debugger & Performance Tools

    • Intel Cluster Studio (Fortran, C++, performance & MPI libraries, trace collector & analyzer) – 50 concurrent users

    • Intel VTune Amplifier XE performance optimizer – 2 concurrent users

    • PGI CDK (Fortran, C, C++, pgdbg debugger, pgprof) – 50 concurrent users

    • PGI CDK GPU Version (Fortran, C, C++, pgdbg debugger, pgprof) – DAV systems only, 2 concurrent users

    • PathScale EkoPath (Fortran, C, C++, PathDB debugger) – 20 concurrent users

    • Rogue Wave TotalView debugger – 8,192 floating tokens

    • IBM Parallel Environment (POE), including IBM HPC Toolkit

  • System Software

    • LSF-HPC Batch Subsystem / Resource Manager

      • IBM has purchased Platform Computing, Inc. (developers of LSF-HPC)

    • Red Hat Enterprise Linux (RHEL) Version 6

    • IBM General Parallel Filesystem (GPFS)

    • Mellanox Universal Fabric Manager

    • IBM xCAT cluster administration toolkit


NCAR HPSS Archive Resource

  • NWSC

    • Two SL8500 robotic libraries (20k cartridge capacity)

    • 26 T10000C tape drives (240 MB/sec I/O rate each) and T10000C media (5 TB/cartridge, uncompressed) initially; +20 T10000C drives ~Nov 2012

    • >100 PB capacity (capacity and ingest-rate arithmetic sketched after this slide)

    • Current growth rate ~3.8 PB/year

    • Anticipated NWSC growth rate ~15 PB/year

  • Mesa Lab

    • Two SL8500 robotic libraries (15k cartridge capacity)

    • Existing data (14.5 PB):

      • 1st & 2nd copies will be ‘oozed’ to new media @ NWSC, beginning in 2012

    • New data @ Mesa:

      • Disaster-recovery data only

    • T10000B drives & media to be retired

    • No plans to move Mesa Lab SL8500 libraries (more costly to move than to buy new under AMSTAR Subcontract)

The plan is to release an “AMSTAR-2” RFP in 1Q2013, with first equipment delivery targeted for 1Q2014 to further augment the NCAR HPSS Archive.
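
Two small sanity checks on those archive figures, assuming decimal units: the library capacity implied by the cartridge count, and the average ingest rate needed to absorb the anticipated growth:

```python
# NCAR HPSS archive sketch (assumptions: decimal TB/PB, uncompressed media).
cartridges, tb_per_cartridge = 20_000, 5
capacity_pb = cartridges * tb_per_cartridge / 1e3      # -> 100 PB, the ">100 PB" on the slide

growth_pb_per_year = 15
seconds_per_year = 365 * 86_400
avg_ingest_mb_s = growth_pb_per_year * 1e9 / seconds_per_year
drives_busy = avg_ingest_mb_s / 240                    # each T10000C streams at 240 MB/s

print(f"{capacity_pb:.0f} PB, {avg_ingest_mb_s:.0f} MB/s average, ~{drives_busy:.1f} drives")
# -> 100 PB; ~476 MB/s, i.e. roughly two T10000C drives streaming around the clock
```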


Yellowstone Physical Infrastructure


Yellowstone Allocations (% of resource)

NCAR’s 29% represents 170 million core-hours per year for Yellowstone alone (compared to less than 10 million per year on Bluefire) plus a similar fraction of the DAV and GLADE resources.
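
For scale, a rough sketch of what 29% of Yellowstone’s core-hours amounts to; the quoted 170 million core-hours comes to roughly 90% of the raw wall-clock share, presumably reflecting maintenance and other unallocated time (the exact accounting is not stated on the slide):

```python
# NCAR allocation sketch (assumption: the 170M figure nets out some
# fraction of wall-clock availability; the slide does not say how much).
cores, hours_per_year, ncar_share = 74_592, 8_760, 0.29
raw_share_mh = cores * hours_per_year * ncar_share / 1e6   # ~189.5M core-hours at 100% uptime
print(f"{raw_share_mh:.1f}M raw, quoted 170M = {170 / raw_share_mh:.0%} of that")
```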


Yellowstone Schedule


Data in Flight to NWSC

  • NWSC Networking

  • Central Filesystem

    • Migrating from GLADE-ML to GLADE-NWSC

  • Archive

    • Migrating HPSS data from ML to NWSC



BiSON to NWSC

  • Initially three 10G circuits active

    • Two 10G connections back to Mesa Lab for internal traffic

    • One 10G direct to FRGP for general Internet2 / NLR / Research and Education traffic

  • Options for dedicated 10G connections to other BiSON members for high-performance computing traffic

  • System is engineered for 40 individual lambdas

    • Each lambda can be a 10G, 40G, or 100G connection

  • Independent lambdas can be sent each direction around the ring (two ADVA shelves at NWSC – one for each direction)

    • With a major upgrade, the system could support 80 lambdas

    • 100 Gbps × 80 channels × 2 paths = 16 Tbps (spelled out below)
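
The upgraded-ring ceiling from the last bullet, spelled out using the numbers on the slide:

```python
# BiSON aggregate-capacity sketch: 80 lambdas at 100 Gb/s, one set per ring direction.
gbps_per_lambda, lambdas, ring_directions = 100, 80, 2
total_tbps = gbps_per_lambda * lambdas * ring_directions / 1_000
print(total_tbps)   # -> 16.0 Tbps
```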


High-Performance LAN

  • Data Center Networking (DCN)

    • High-speed data center networking with 1G and 10G client-facing ports for the supercomputer, mass storage, and other data center components

    • Redundant design (e.g., multiple chassis and separate module connections)

    • Future option for 100G interfaces

  • Juniper awarded RFP after NSF approval

    • Access switches: QFX3500 series

    • Network core and WAN edge: EX8200 series switch/routers

    • Includes spares for lab/testing purposes

  • Juniper training for NETS staff early Jan-2012

  • Deploy Juniper equipment late Jan-2012


Moving Data… Ugh


Migrating GLADE Data

  • Temporary work spaces (/glade/scratch, /glade/user)

    • No data will automatically be moved to NWSC

  • Allocated project spaces (/glade/projxx)

    • New allocations will be made for the NWSC

    • No data will automatically be moved to NWSC

    • A data transfer option will let users move the data they need

  • Data collections (/glade/data01, /glade/data02)

    • CISL will move data from ML to NWSC

    • Full production capability will need to be maintained during the transition

  • Network Impact

    • Current storage maximum performance is 5 GB/s

    • Can sustain ~2 GB/s for reads while under a production load

    • At that rate 400 TB can be moved in a couple of days; however, the transfer will saturate a 20 Gb/s network link (see the sketch below)
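
A minimal sketch of that last bullet’s arithmetic, assuming decimal units throughout:

```python
# Time and network load for moving 400 TB at the sustainable ~2 GB/s read rate.
data_tb, rate_gb_s, link_gb_s = 400, 2.0, 20 / 8   # 20 Gb/s link = 2.5 GB/s
days = data_tb * 1e3 / rate_gb_s / 86_400
link_utilisation = rate_gb_s / link_gb_s
print(f"{days:.1f} days at {link_utilisation:.0%} of the 20 Gb/s link")
# -> ~2.3 days while using ~80% of the link, effectively saturating it for the duration
```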


Migrating Archive

  • What Migrates????

    • Data: 15 PB and counting…

    • Format: MSS to HPSS

    • Tape: T10000B media (1 TB cartridges) to T10000C media (5 TB cartridges)

    • Location: ML to NWSC

  • HPSS at NWSC to become primary site in Spring 2012

    • One-day outage when the metadata servers are moved

    • ML HPSS will remain as Disaster Recovery Site

    • Data Migration will take until early 2014

    • Migration will be throttled so it does not overload the network (see the sketch below)
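
A rough sketch of the rate that schedule implies, assuming roughly a two-year window from spring 2012 to early 2014 and decimal units:

```python
# Average transfer rate implied by migrating ~15 PB in about two years.
data_pb, years = 15, 2
avg_mb_s = data_pb * 1e9 / (years * 365 * 86_400)
print(f"{avg_mb_s:.0f} MB/s sustained")   # -> ~238 MB/s
# About one T10000C drive (240 MB/s) streaming around the clock, so throttling
# the copy to protect the network is still compatible with the early-2014 target.
```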


Whew… A lot of work ahead! Questions?

