Research Computing

Research Computing
University of South Florida
Providing Advanced Computing Resources for Research and Instruction through Collaboration


Mission

Provide advanced computing resources required by a major research university

Software

Hardware

Training

Support


User Base

40 Research groups

6 Colleges

100 faculty

300 students


Hardware

The system was built on the condominium model and consists of 300 nodes (2,400 processors)

University provides infrastructure and some computational resources

Faculty funding provides the bulk of the computational resources


Software

Over 50 scientific codes

Installation

Integration

Upgrades

Licensing


Support Personnel

Provide all systems administration

Software support

One-on-one consulting

System efficiency improvements

Users are no longer just the traditional “number crunchers”


Current Projects

Consolidating the last standalone cluster (of appreciable size)

Advanced Visualization Center

Group of 19 Faculty applied for funding

Personnel

Training

Large, high-resolution 3D display


Current Projects

New computational resources

Approximately 100 nodes

GPU resources

Upgrade parallel file system

Virtual Clusters

HPC for the other 90%

FACC


Florida State University's Shared HPC

Building and Maintaining Sustainable Research Computing at FSU


Shared-FSU HPC Mission

  • Support multidisciplinary research

  • Provide a general access computing platform

  • Encourage cost sharing by departments with dedicated computing needs

  • Provide a broad base of support and training opportunities


Turn-key Research Solution: Participation is Voluntary

  • University provides staffing

  • University provides general infrastructure

    • Network fabrics

    • Racks

    • Power/Cooling

  • Additional buy-in incentives

    • Leverage better pricing as a group

    • Matching funds

  • Offer highly flexible buy-in options

    • Hardware purchase only

    • Short-term Service Level Agreements

    • Long-term Service Level Agreements

  • Aim for 50% of hardware costs covered by buy-in


Research Support @ FSU

  • 500 plus users

  • 33 Academic Units

  • 5 Colleges


HPC Owner Groups

  • 2007

    • Department of Scientific Computing

    • Center for Ocean-Atmosphere Prediction Studies

    • Department of Meteorology

  • 2008

    • Gunzburger Group (Applied Mathematics)

    • Taylor Group (Structural Biology)

    • Department of Scientific Computing

    • Kostov Group (Chemical & Biomedical Engineering)

  • 2009

    • Department of Physics (HEP, Nuclear, etc.)

    • Institute of Molecular Biophysics

    • Bruschweiler Group (National High Magnetic Field Laboratory)

    • Center for Ocean-Atmosphere Prediction Studies (with the Department of Oceanography)

    • Torrey Pines Institute of Molecular Studies

  • 2010

    • Chella Group (Chemical Engineering)

    • Torrey Pines Institute of Molecular Studies

    • Yang Group (Institute of Molecular Biophysics)

    • Meteorology Department

    • Bruschweiler Group

    • Fajer Group (Institute of Molecular Biophysics)

    • Bass Group (Biology)


Research Support @ FSU

  • Publications

    • Macromolecules

    • Bioinformatics

    • Systematic Biology

    • Journal of Biogeography

    • Journal of Applied Remote Sensing

    • Journal of Chemical Theory and Computation

    • Physical Review Letters

    • Journal of Physical Chemistry

    • Proceedings of the National Academy of Sciences

    • Biophysical Journal

    • PLoS Pathogens

    • Journal of Virology

    • Journal of the American Chemical Society

    • The Journal of Chemical Physics

    • PLoS Biology

    • Ocean Modeling

    • Journal of Computer-Aided Molecular Design


FSU’s Shared-HPC, Stage 1: InfiniBand-Connected Cluster

[Diagram: Shared-HPC cluster and parallel file system (pfs) in the Sliger Data Center.]


FSU’s Shared-HPC, Stage 2: Alternative Backfilling

[Diagram: Condor pool in the DSL Building added alongside the Shared-HPC cluster and parallel file system (pfs) in the Sliger Data Center.]


Condor Usage

  • ~1,000 processor cores available for single-processor computations (a minimal submit sketch follows this list)

  • 2,573,490 processor hours used since Condor was made available to all HPC users in September

  • Seven users have been using Condor from HPC

  • Dominant users are Evolutionary Biology, Molecular Dynamics, and Statistics (the same users who were submitting numerous single-processor jobs)

  • Two workshops introducing it to HPC users
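
To make the single-processor backfill workflow concrete, here is a minimal sketch of queuing a batch of independent Condor jobs. It is illustrative only: the executable, file names, and job count are hypothetical placeholders, and it assumes the HTCondor command-line tools (condor_submit) are available on the submit host.

```python
import subprocess

# Hypothetical parameter sweep: 50 independent single-core runs of ./analyze,
# each handed its Condor process ID as an argument. All names are placeholders.
submit_description = """\
universe     = vanilla
executable   = analyze
arguments    = $(Process)
request_cpus = 1
output       = out.$(Process)
error        = err.$(Process)
log          = sweep.log
queue 50
"""

with open("sweep.sub", "w") as handle:
    handle.write(submit_description)

# Hand the submit description to Condor; requires condor_submit on PATH.
subprocess.run(["condor_submit", "sweep.sub"], check=True)
```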




FSU’s Shared-HPC, Stage 3: Scalable SMP

[Diagram: Condor pool (DSL Building) alongside the Shared-HPC cluster, parallel file system (pfs), and SMP nodes (Sliger Data Center).]

FSU’s Shared-HPC, Stage 3: Scalable SMP

  • One Moab queue for SMP or very-large-memory jobs (an example resource request is sketched after the node list below)

  • Three “nodes”

    • M905 blade with 16 cores and 64GB mem

    • M905 blade with 24 cores and 64GB mem

    • 3Leaf system with up to 132 cores and 528 GB mem
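
As a rough illustration of how a job might be steered to one of these nodes, the sketch below builds a Moab msub command requesting a whole large-memory blade. The queue name, batch script, walltime, and memory figures are assumptions for illustration, not FSU's actual configuration.

```python
import subprocess

# Hypothetical request for one 24-core, large-memory blade via Moab's msub.
# Queue name, memory, walltime, and the batch script are placeholders.
cmd = [
    "msub",
    "-q", "smp",                       # assumed name of the SMP/large-memory queue
    "-l", "nodes=1:ppn=24,mem=60gb",   # whole M905-class blade, most of its 64 GB
    "-l", "walltime=24:00:00",
    "openmp_job.sh",                   # user's batch script (placeholder)
]
subprocess.run(cmd, check=True)
```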


[Diagram: Condor pool (DSL Building); Shared-HPC cluster, parallel file system (pfs), and SMP nodes (Sliger Data Center); file server (fs) and visualization (Vis) nodes (DSL Data Center).]

Interactive Cluster: Functions

  • Facilitates data exploration

  • Provides venue for software not well suited for a batch scheduled environment

    • (e.g., some MATLAB, VMD, R, and Python workloads)

  • Provides access to hardware not typically found on standard desktops/laptops/mobile devices (e.g., large memory, high-end GPUs)

  • Provides licensing and configuration support for software applications and libraries


Interactive Cluster: Hardware Layout

  • 8 high-end CPU based host nodes

    • Multi-core Intel or AMD processors

    • 4 to 8 GB of memory per core

    • 16X PCIe connectivity

    • QDR IB connectivity to Lustre storage

    • IP (read-only) connectivity to Panasas

    • 10 Gbps connectivity to campus network backbone

  • One C410x external PCIe chassis

    • Compact

    • IPMI management

    • Supports up to 16 NVIDIA Tesla M2050 GPUs

      • Up to 16.48 teraflops
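
The 16.48-teraflops figure presumably assumes the Tesla M2050's peak single-precision rate of roughly 1.03 teraflops per card: 16 × 1.03 ≈ 16.48 teraflops.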


[Diagram: Condor pool (DSL Building); Shared-HPC cluster, parallel file system (pfs), and SMP nodes (Sliger Data Center); file server (fs), visualization (Vis), and database/web (Db.Web) nodes (DSL Data Center).]


Web/Database Hardware: Function

  • Facilitates creation of data-analysis pipelines/workflows

  • Favored by external funding agencies

    • Demonstrated cohesive Cyberinfrastructure

    • Fits well into required Data Management Plans (NSF)

  • Intended to facilitate access to data on secondary storage or cycles on an owner's share of the HPC system

  • Basic software install; no development support

  • Bare metal or VM


Web/Database Hardware: Examples


FSU Research CI

[Diagram: FSU research cyberinfrastructure: HTC, HPC, SMP, database and web services, storage, and visualization/interactive resources.]


Florida State University's Shared HPC

  • Universities are by design multifaceted and lack a singular focus of support

  • Local HPC resources should also be multifaceted and have a broad basis of support


University of Florida

HPC Center

HPC Summit


Short History

Started in 2003

2004 Phase I: CLAS – Avery – OIT

2005 Phase IIa: COE – 9 investors

2007 Phase IIb: COE – 3 investors

2009 Phase III: DSR – 17 investors – ICBR – IFAS

2011 Phase IV: 22 investors


Budget

Total budget

2003-2004 $0.7 M

2004-2005 $1.8 M

2005-2006 $0.3 M

2006-2007 $1.2 M

2007-2008 $1.6 M

2008-2009 $0.4 M

2009-2010 $0.9 M


Hardware

4,500 cores

500 TB storage

InfiniBand connected

In three machine rooms

Connected by 20 Gbit/sec Campus Research Network


System Software

Red Hat Enterprise Linux

via the free CentOS distribution

upgraded once per year

Lustre file system

mounted on all nodes

Scratch only

Backup available through CNS service

Requires a separate agreement between the researcher and CNS


Other Software

Moab scheduler (commercial license)

Intel compilers (commercial license)

Numerous applications

Open and commercial


Operation

Shared cluster

some hosted systems

300 users

90% - 95% utilization


Investor Model

Normalized Computing Unit (NCU)

$400 per NCU

One core in a fully functional system (RAM, disk, shared file system)

For 5 years


Investor Model

Optional Storage Unit (OSU)

$140 per OSU

1 TB of file storage (RAID) on one of a few global parallel file systems (Lustre)

For 1 year


Other Options

Hosted system

Buy all hardware, we operate

No sharing

Pay as you go

Agree to pay monthly bill

Equivalent (almost) to the $400 NCU prorated on a monthly basis

Works out to roughly $0.009 per core-hour (see the cost sketch below)

Cheaper than Amazon's Elastic Compute Cloud (EC2)
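
A quick back-of-the-envelope check of that rate, assuming an NCU simply buys one core for five years of continuous availability (the slides do not spell out the exact proration policy):

```python
# One Normalized Computing Unit (NCU): $400 for one core over 5 years.
ncu_price_dollars = 400
years = 5
hours_per_year = 365 * 24  # ignoring leap days

core_hours = years * hours_per_year           # 43,800 core-hours
hourly_rate = ncu_price_dollars / core_hours  # dollars per core-hour
monthly_rate = ncu_price_dollars / (years * 12)

print(f"${hourly_rate:.4f} per core-hour")        # ~$0.0091
print(f"${monthly_rate:.2f} per core per month")  # ~$6.67
```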



UM CCS Mission Statement

  • UM CCS is establishing nationally and internationally recognized research programs, focusing on those of an interdisciplinary nature, and actively engaging in computational research to solve the complex technological problems of modern society. We provide a framework for promoting collaborative and multidisciplinary activities across the University and beyond.


CCS Overview

  • Started in June 2007

  • Faculty Senate approval in 2008

  • Four Founding Schools: A&S, CoE, RSMAS, Medical

  • Offices across all campuses

  • ~30 FTEs

  • Data Center at the NAP of the Americas


UM CCS Research Programs and Cores

  • Physical Science & Engineering

  • Data Mining

  • Computational Biology & Bioinformatics

  • Visualization

  • Computational Chemistry

  • Social Systems Informatics

  • High Performance Computing

  • Software Engineering


Quick Facts

  • Over 1,000 UM users

  • 5,200 Linux-based cluster cores

  • 1,500 Power-based cluster cores

  • ~2.0 PB of storage

  • 4.0 PB of backup

  • More at:

    • http://www.youtube.com/watch?v=JgUNBRJHrC4

    • www.ccs.miami.edu


High Performance Computing

  • UM-wide resource provides the academic community & research partners with comprehensive HPC resources:

    • Hardware & Scientific Software Infrastructure

    • Expertise in Designing & Implementing HPC Solutions

    • Designing & Porting Algorithms & Programs to Parallel Computing Models

  • Open access to compute processing (first come, first served)

    • Peer Review for large projects – Allocation Committee

    • Cost Center for priority access

  • HPC services

    • Storage Cloud

    • Visualization and Data Analysis Cloud

    • Processing Cloud

