
Introduction to Scientific Computing

Shubin Liu, Ph.D.

Research Computing Center

University of North Carolina at Chapel Hill

Course Goals
  • An introduction to high-performance computing and UNC Research Computing Center
  • Available Research Computing hardware facilities
  • Available software packages
  • Serial/parallel programming tools and libraries
  • How to efficiently make use of Research Computing facilities on campus
Agenda
  • Introduction to High-Performance Computing
  • Hardware Available
    • Servers, storage, file systems, etc.
  • How to Access
  • Programming Tools Available
    • Compilers & Debugger tools
    • Utility Libraries
    • Parallel Computing
  • Scientific Packages Available
  • Job Management
  • Hands-on Exercises (2nd hour)

The PPT format of this presentation is available here:

http://its2.unc.edu/divisions/rc/training/scientific/

Pre-requisites
  • An account on Emerald cluster
  • UNIX Basics

Getting started: http://help.unc.edu/?id=5288

Intermediate: http://help.unc.edu/?id=5333

vi Editor: http://help.unc.edu/?id=152

Customizing: http://help.unc.edu/?id=208

Shells: http://help.unc.edu/?id=5290

ne Editor: http://help.unc.edu/?id=187

Security: http://help.unc.edu/?id=217

Data Management: http://help.unc.edu/?id=189

Scripting: http://help.unc.edu/?id=213

HPC Application: http://help.unc.edu/?id=4176

About Us

ITS – Information Technology Services

http://its.unc.edu

http://help.unc.edu

Physical locations:

401 West Franklin St.

211 Manning Drive

10 Divisions/Departments:

Information Security
IT Infrastructure and Operations
Research Computing Center
Teaching and Learning
User Support and Engagement
Office of the CIO
Communication Technologies
Communications
Enterprise Applications
Finance and Administration

Research Computing Center

Where and who are we and what do we do?

ITS Manning: 211 Manning Drive

Website

http://its.unc.edu/research-computing.html

Groups

Infrastructure -- Hardware

User Support -- Software

Engagement -- Collaboration

About Myself

Ph.D. in Chemistry, UNC-CH

Currently Senior Computational Scientist @ Research Computing Center, UNC-CH

Responsibilities:

Support Computational Chemistry/Physics/Material Science software

Support Programming (FORTRAN/C/C++) tools, code porting, parallel computing, etc.

Conduct research and engagement projects in Computational Chemistry

Development of DFT theory and concept tools

Applications in biological and material science systems

What is Scientific Computing?
  • Short Version
    • To use high-performance computing (HPC) facilities to solve real scientific problems.
  • Long Version, from Wikipedia
    • Scientific computing (or computational science) is the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyze and solve scientific and engineering problems. In practical use, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines.

What is Scientific Computing?

[Diagram] Viewed by scientific discipline, scientific computing sits at the overlap of the natural sciences, the engineering sciences, computer science, and applied mathematics. Viewed operationally, it spans the application layer, the theory/model layer, the algorithm layer, and the hardware/software layer. Viewed from the computing perspective, scientific computing overlaps with high-performance computing and parallel computing.
What is HPC?
  • Computing resources which provide more than an order of magnitude more computing power than current top-end workstations or desktops – generic, widely accepted.
  • HPC ingredients:
    • large capability computers (fast CPUs)
    • massive memory
    • enormous (fast & large) data storage
    • highest capacity communication networks (Myrinet, 10 GigE, InfiniBand, etc.)
    • specifically parallelized codes (MPI, OpenMP)
    • visualization
Why HPC?
  • What are the three-dimensional structures of all of the proteins encoded by an organism's genome and how does structure influence function, both spatially and temporally?
  • What patterns of emergent behavior occur in models of very large societies?
  • How do massive stars explode and produce the heaviest elements in the periodic table?
  • What sort of abrupt transitions can occur in Earth’s climate and ecosystem structure? How do these occur and under what circumstances?
  • If we could design catalysts atom-by-atom, could we transform industrial synthesis?
  • What strategies might be developed to optimize management of complex infrastructure systems?
  • What kind of language processing can occur in large assemblages of neurons?
  • Can we enable integrated planning and response to natural and man-made disasters that prevent or minimize the loss of life and property?

http://www.nsf.gov/pubs/2005/nsf05625/nsf05625.htm

Measure of Performance

FLOPS units: Mega FLOPS (x10^6), Giga FLOPS (x10^9), Tera FLOPS (x10^12), Peta FLOPS (x10^15), Exa FLOPS (x10^18), Zetta FLOPS (x10^21), Yotta FLOPS (x10^24)
http://en.wikipedia.org/wiki/FLOPS

Single-CPU LINPACK results, units in MFLOPS (x10^6):

Machine/CPU Type                   | LINPACK Performance | Peak Performance
Intel Pentium 4 (2.53 GHz)         | 2355                | 5060
NEC SX-6/1 (1 proc., 2.0 ns)       | 7575                | 8000
HP rx5670 Itanium2 (1 GHz)         | 3528                | 4000
IBM eServer pSeries 690 (1300 MHz) | 2894                | 5200
Cray SV1ex-1-32 (500 MHz)          | 1554                | 2000
Compaq ES45 (1000 MHz)             | 1542                | 2000
AMD Athlon MP1800+ (1530 MHz)      | 1705                | 3060
Intel Pentium III (933 MHz)        | 507                 | 933
SGI Origin 2000 (300 MHz)          | 533                 | 600
Intel Pentium II Xeon (450 MHz)    | 295                 | 450
Sun UltraSPARC (167 MHz)           | 237                 | 333

Reference: http://performance.netlib.org/performance/html/linpack.data.col0.html

How to Quantify Performance? TOP500
  • A list of the 500 most powerful computer systems over the world
  • Established in June 1993
  • Compiled twice a year (June & November)
  • Using the LINPACK Benchmark code (solving the dense linear system Ax = b)
  • Organized by world-wide HPC experts, computational scientists, manufacturers, and the Internet community
  • Homepage: http://www.top500.org

TOP500: November 2007

TOP 5 (plus UNC), units in GFLOPS (1 GFLOPS = 1000 MFLOPS):

Rank | Installation Site / Year | Computer / Processors / Manufacturer | Rmax | Rpeak
1  | DOE/NNSA/LLNL, United States / 2007 | BlueGene/L eServer Blue Gene Solution / 212,992 / IBM | 478,200 | 596,378
2  | Forschungszentrum Juelich (FZJ), Germany / 2007 | JUGENE - Blue Gene/P Solution / 65,536 / IBM | 167,300 | 222,822
3  | SGI/New Mexico Computing Applications Center (NMCAC), United States / 2007 | SGI Altix ICE 8200, Xeon quad core 3.0 GHz / 14,336 / SGI | 126,900 | 172,032
4  | Computational Research Laboratories, TATA SONS, India / 2007 | EKA - Cluster Platform 3000 BL460c, Xeon 53xx 3 GHz, Infiniband / 14,240 / Hewlett-Packard | 117,900 | 170,800
5  | Government Agency, Sweden / 2007 | Cluster Platform 3000 BL460c, Xeon 53xx 2.66 GHz, Infiniband / 13,728 / Hewlett-Packard | 102,800 | 146,430
36 | University of North Carolina, United States / 2007 | Topsail - PowerEdge 1955, 2.33 GHz, Cisco/Topspin Infiniband / 4,160 / Dell | 28,770 | 38,821

TOP500: June 2008

[Table not reproduced] Rmax and Rpeak values are in TFLOPS; power data are in kW for the entire system.

Shared/Distributed-Memory Architecture

[Diagram: in a shared-memory machine, all CPUs connect through a bus to a single pool of memory; in a distributed-memory machine, each CPU has its own local memory (M) and the CPUs communicate over a network.]

Shared memory: single address space. All processors have access to a pool of shared memory. Methods of memory access: bus and crossbar. (Examples: chastity/zephyr, happy/yatta, cedar/cypress, sunny)

Distributed memory: each processor has its own local memory, and message passing must be used to exchange data between processors. (Examples: Baobab, the new Dell cluster)

What is a Beowulf Cluster?
  • A Beowulf system is a collection of personal computers built from commodity off-the-shelf hardware components, interconnected by a system-area network, and configured to operate as a single parallel computing platform (e.g., via MPI), running an open-source operating system such as LINUX.
  • Main components:
    • PCs running the LINUX OS
    • Inter-node connection with Ethernet, Gigabit Ethernet, Myrinet, InfiniBand, etc.
    • MPI (Message Passing Interface)
What is Parallel Computing ?
  • Concurrent use of multiple processors to process data
    • Running the same program on many processors.
    • Running many programs on each processor.
Advantages of Parallelization
  • Cheaper, in terms of Price/Performance Ratio
  • Faster than equivalently expensive uniprocessor machines
  • Handle bigger problems
  • More scalable: the performance of a particular program may be improved by execution on a large machine
  • More reliable: in theory, if some processors fail, we can simply use others

Catch: Amdahl's Law

Speedup = 1 / (s + p/n), where s is the serial (non-parallelizable) fraction of the program, p = 1 - s is the parallelizable fraction, and n is the number of processors.
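
As a quick illustration (the numbers are chosen for this example and are not from the original slide): if 90% of a program can be parallelized (p = 0.9, s = 0.1), then on n = 16 processors the speedup is 1 / (0.1 + 0.9/16) ≈ 6.4, and no matter how many processors are added the speedup can never exceed 1/s = 10.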

Parallel Programming Tools
  • Shared-memory architecture
    • OpenMP
  • Distributed-memory architecture
    • MPI, PVM, etc.
OpenMP
  • An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
  • What does OpenMP stand for?
    • Open specifications for Multi Processing via collaborative work between interested parties from the hardware and software industry, government and academia.
  • Comprised of three primary API components:
    • Compiler Directives
    • Runtime Library Routines
    • Environment Variables
  • Portable:
    • The API is specified for C/C++ and Fortran
    • Multiple platforms have been implemented including most Unix platforms and Windows NT
  • Standardized:
    • Jointly defined and endorsed by a group of major computer hardware and software vendors
    • Expected to become an ANSI standard later???

OpenMP Example (FORTRAN)

      PROGRAM HELLO
      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +        OMP_GET_THREAD_NUM
C     Fork a team of threads, giving them their own copies of variables
!$OMP PARALLEL PRIVATE(TID)
C     Obtain and print the thread id
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID
C     Only the master thread does this
      IF (TID .EQ. 0) THEN
         NTHREADS = OMP_GET_NUM_THREADS()
         PRINT *, 'Number of threads = ', NTHREADS
      END IF
C     All threads join the master thread and disband
!$OMP END PARALLEL
      END
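
The API is also specified for C/C++ (as noted on the previous slide). For comparison, here is a minimal C sketch of the same program; it is not part of the original slides. It would be compiled with an OpenMP-capable C compiler, e.g. "icc -openmp hello_omp.c" with the Intel compilers on cedar, with OMP_NUM_THREADS set before running.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int tid, nthreads;

    /* Fork a team of threads, each with its own copy of tid */
    #pragma omp parallel private(tid)
    {
        /* Obtain and print the thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only the master thread reports the team size */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }   /* all threads join the master thread and disband */
    return 0;
}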

The Message Passing Model
  • Parallelization scheme for distributed memory.
  • Parallel programs consist of cooperating processes, each with its own memory.
  • Processes send data to one another as messages
  • Messages can be passed around among compute processes
  • Messages may have tags that can be used to sort them
  • Messages may be received in any order.
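
To make the model concrete, below is a minimal sketch (not from the original slides) of two cooperating processes exchanging one tagged message using MPI, the interface introduced on the next slide:

#include <stdio.h>
#include "mpi.h"

/* Run with at least 2 processes: rank 0 sends an integer with tag 99,
   rank 1 receives it. Each process has its own copy of "data". */
int main(int argc, char **argv)
{
    int rank, data = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);          /* dest = 1, tag = 99 */
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &status); /* source = 0, tag = 99 */
        printf("Rank 1 received %d from rank 0\n", data);
    }

    MPI_Finalize();
    return 0;
}
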
MPI: Message Passing Interface
  • Message-passing model
  • Standard (specification)
    • Many implementations (almost each vendor has one)
    • MPICH and LAM/MPI, from the public domain, are the most widely used
    • GLOBUS MPI for grid computing
  • Two phases:
    • MPI 1: Traditional message-passing
    • MPI 2: Remote memory, parallel I/O, and dynamic processes
  • Online resources
    • http://www-unix.mcs.anl.gov/mpi/index.htm
    • http://www-unix.mcs.anl.gov/mpi/mpich/
    • http://www.lam-mpi.org/
    • http://www.mpi-forum.org
    • http://www-unix.mcs.anl.gov/mpi/tutorial/learning.html

A Simple MPI Code

C version:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    printf("Hello world\n");
    MPI_Finalize();
    return 0;
}

FORTRAN version:

      include 'mpif.h'
      integer myid, ierr, numprocs
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
      write(*,*) 'Hello from ', myid
      write(*,*) 'Numprocs is ', numprocs
      call MPI_FINALIZE(ierr)
      end
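
As a sketch of how such a code is typically built and launched with MPI compiler wrappers (the exact wrapper and launcher names depend on the MPI installation; the Emerald-specific commands are given later in this presentation):

mpicc  -O -o hello_c hello.c      (compile the C version)
mpif77 -O -o hello_f hello.f      (compile the FORTRAN version)
mpirun -np 4 ./hello_c            (launch the executable on 4 processes)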

Other Parallelization Models
  • VIA: Virtual Interface Architecture -- standards-based cluster communications
  • PVM: a portable message-passing programming system designed to link separate host machines into a single, manageable "virtual machine". It is largely an academic effort, and there has not been much development since the 1990s.
  • BSP: Bulk Synchronous Parallel model, a generalization of the widely researched PRAM (Parallel Random Access Machine) model
  • Linda: a concurrent programming model from Yale, built around the primary concept of a "tuple space"
  • HPF: High Performance Fortran, a standard parallel programming language for shared- and distributed-memory systems (PGI's pghpf is one implementation)

RC Servers @ UNC-CH
  • SGI Altix 3700 – SMP, 128 CPUs, cedar/cypress
  • Emerald LINUX cluster – distributed memory, ~400 CPUs, emerald
    • yatta/p575 IBM AIX nodes
  • Dell LINUX cluster – distributed memory, 4,160 CPUs, topsail
IBM P690/P575 SMP
  • IBM pSeries 690/P575 Model 6C4, Power4+ Turbo, 32 1.7 GHz processors
  • Access to 4 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /netscr
  • OS: IBM AIX 5.3 Maintenance Level 04
  • login node: emerald.isis.unc.edu
  • compute node:
    • yatta.isis.unc.edu 32 CPUs
    • P575-n00.isis.unc.edu 16 CPUs
    • P575-n01.isis.unc.edu 16 CPUs
    • P575-n02.isis.unc.edu 16 CPUs
    • P575-n03.isis.unc.edu 16 CPUs
SGI Altix 3700 SMP
  • Servers for Scientific Applications such as Gaussian, Amber, and custom code
  • Login node: cedar.isis.unc.edu
  • Compute node: cypress.isis.unc.edu
  • Cypress: SGI Altix 3700bx2 - 128 Intel Itanium2 processors (1600 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 6 MB L3 cache, and 4 GB of shared memory per processor (512 GB total memory)
  • Two 70 GB SCSI System Disks as /scr
SGI Altix 3700 SMP
  • Cedar: SGI Altix 350 - 8 Intel Itanium2 processors (1500 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 4 MB L3 cache, and 1 GB of shared memory per processor (8 GB total memory); two 70 GB SATA system disks
  • RHEL 3 with Propack 3, Service Pack 3
  • No AFS (HOME & pkg space) access
  • Scratch Disk:

/netscr, /nas, /scr

Emerald Cluster
  • General purpose Linux Cluster for Scientific and Statistical Applications
  • Machine Name: Emerald.isis.unc.edu
  • 2 Login Nodes: IBM BladeCenter, one Xeon 2.4GHz, 2.5GB RAM and one Xeon 2.8GHz, 2.5GB RAM
  • 18 Compute Nodes: Dual AMD Athlon 1600+ 1.4GHz MP Processor, Tyan Thunder MP Motherboard, 2GB DDR RAM on each node
  • 6 Compute Nodes: Dual AMD Athlon 1800+ 1.6GHz MP Processor, Tyan Thunder MP Motherboard, 2GB DDR RAM on each node
  • 25 Compute Nodes: IBM BladeCenter, Dual Intel Xeon 2.4GHz, 2.5GB RAM on each node
  • 96 Compute Nodes: IBM BladeCenter, Dual Intel Xeon 2.8GHz, 2.5GB RAM on each node
  • 15 Compute Nodes: IBM BladeCenter, Dual Intel Xeon 3.2GHz, 4.0GB RAM on each node
  • Access to 10 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
  • Login: emerald.isis.unc.edu
  • OS: RedHat Enterprise Linux 3.0
  • TOP500: 395th place in the June 2003 release.
Dell LINUX Cluster, Topsail
  • 520 dual nodes (4,160 CPUs), Xeon (EM64T)
  • 3.6 GHz, 2 MB L2 cache, 2 GB memory per CPU
  • InfiniBand inter-node connection
  • Not AFS mounted, not open to the general public
  • Access based on peer-reviewed proposal
  • HPL: 6.252 TFLOPS; 74th on the June 2006 TOP500 list, 104th on the November 2006 list, and 25th on the June 2007 list (28.77 TFLOPS after upgrade)
Topsail
  • Login node: topsail.unc.edu, 8 CPUs @ 2.3 GHz Intel EM64T with 2x4 MB L2 cache (Model E5345/Clovertown), 12 GB memory
  • Compute nodes: 4,160 CPUs @ 2.3 GHz Intel EM64T with 2x4 MB L2 cache (Model E5345/Clovertown), 12 GB memory
  • Shared disk: (/ifs1) 39 TB IBRIX parallel file system
  • Interconnect: InfiniBand 4x SDR
  • Resource management is handled by LSF v6.2, through which all computational jobs are submitted for processing
File Systems
  • AFS (Andrew File System): AFS is a distributed network file system that enables files from any AFS machine across the campus to be accessed as easily as files stored locally.
    • As ISIS HOME for all users with an ONYEN – the Only Name You’ll Ever Need
    • Limited quota: 250 MB for most users [type “fs lq” to view]
    • Current production version openafs-1.3.8.6
    • Files backed up daily [ ~/OldFiles ]
    • Directory/File tree: /afs/isis/home/o/n/onyen
      • For example: /afs/isis/home/m/a/mason, where “mason” is the ONYEN of the user
    • Accessible from emerald, happy/yatta
    • But not from cedar/cypress, topsail
    • Recommended to compile, run I/O intensive jobs on /scr or /netscr
    • More info: http://help.unc.edu/?id=215#d0e24
Basic AFS Commands
  • To add or remove packages
    • ipm add pkg_name, ipm remove pkg_name
  • To find out space quota/usage
    • fs lq
  • To see and renew AFS tokens (needed for read/write access), which expire in 25 hours
    • tokens, klog
  • Over 300 packages installed in AFS pkg space
    • /afs/isis/pkg/
  • More info available at
    • http://its.unc.edu/dci/dci_components/afs/
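
For illustration, a typical session with these commands might look like the following (the package name here is just an example):

ipm add gaussian       (subscribe to the gaussian package from AFS package space)
fs lq                  (check the quota and usage of your AFS home directory)
tokens                 (list your AFS tokens and when they expire)
klog                   (obtain a fresh token once the old one has expired)
ipm remove gaussian    (unsubscribe from the package when it is no longer needed)
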
Data Storage
  • Local Scratch: /scr – local to a machine
    • Cedar/cypress: 2x500 GB SCSI System Disks
    • Topsail: /ifs1 39 TB IBRIX Parallel File System
    • Happy/yatta: 2x500 GB Disk Drives
    • For running jobs, temporary data storage, not backed up
  • Network Attached Storage (NAS) – for temporary storage
    • /nas/uncch, /netscr
    • 7 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
    • For running jobs and temporary data storage, not backed up
    • Shared by all login and compute nodes (cedar/cypress, happy/yatta, emerald)
  • Mass Storage (MS) – for permanent storage
    • Never run jobs using files in ~/ms (compute nodes do not have ~/ms access)
    • Mounted for long-term data storage on all scientific computing servers' login nodes as ~/ms ($HOME/ms)
Subscription of Services
  • Have an ONYEN ID
    • The Only Name You’ll Ever Need
  • Eligibility: Faculty, staff, postdoc, and graduate students
  • Go to http://onyen.unc.edu
Access to Servers
  • To Emerald
    • ssh emerald.isis.unc.edu
  • To cedar
    • ssh cedar.isis.unc.edu
  • To Topsail
    • ssh topsail.unc.edu
Programming Tools
  • Compilers
    • FORTRAN 77/90/95
    • C/C++
  • Utility Libraries
    • BLAS, LAPACK, FFTW, SCALAPACK
    • IMSL, NAG
    • NetCDF, GSL, PETSc
  • Parallel Computing
    • OpenMP
    • PVM
    • MPI (MPICH, LAM/MPI, MPICH-GM)
Compilers: SMP Machines
  • Cedar/Cypress – SGI Altix 3700, 128 CPUs
    • 64-bit Intel Compiler versions 9.1 and 10.1, /opt/intel
      • FORTRAN 77/90/95: ifort/ifc/efc
      • C/C++: icc/ecc
    • 64-bit GNU compilers
      • FORTRAN 77 f77/g77
      • C and C++ gcc/cc and g++/c++
  • Yatta/P575 – IBM P690/P575, 32/64CPUs
    • XL FORTRAN 77/90 8.1.0.3 xlf, xlf90
    • C and C++ AIX 6.0.0.4 xlc, xlC
Compilers: LINUX Cluster
  • Absoft ProFortran Compilers
    • Package Name: profortran
    • Current Version: 7.0
    • FORTRAN 77 (f77): Absoft FORTRAN 77 compiler version 5.0
    • FORTRAN 90/95 (f90/f95): Absoft FORTRAN 90/95 compiler version 3.0
  • GNU Compilers
    • Package Name: gcc
    • Current Version: 4.1.2
    • FORTRAN 77 (g77/f77): 3.4.3, 4.1.2
    • C (gcc): 3.4.3, 4.1.2
    • C++ (g++/c++): 3.4.3, 4.1.2
  • Intel Compilers
    • Package Name: intel_fortran intel_CC
    • Current Version: 10.1
    • FORTRAN 77/90 (ifc): Intel LINUX compiler version 8.1, 9.0, 10.1
    • CC/C++ (icc): Intel LINUX compiler version 8.1, 9.0, 10.1
  • Portland Group Compilers
    • Package Name: pgi
    • Current Version: 7.1.6
    • FORTRAN 77 (pgf77): The Portland Group, Inc. pgf77 v6.0, 7.0.4, 7.1.3
    • FORTRAN 90 (pgf90): The Portland Group, Inc. pgf90 v6.0, 7.0.4, 7.1.3
    • High Performance FORTRAN (pghpf): The Portland Group, Inc. pghpf v6.0, 7.0.4, 7.1.3
    • C (pgcc): The Portland Group, Inc. pgcc v6.0, 7.0.4, 7.1.3
    • C++ (pgCC): The Portland Group, Inc. v6.0, 7.0.4, 7.1.3

LINUX Compiler Benchmark

Benchmark (rank in parentheses) | Absoft ProFortran 90 | Intel FORTRAN 90 | Portland Group FORTRAN 90 | GNU FORTRAN 77
Molecular Dynamics (CPU time)   | 4.19 (4)             | 2.83 (2)         | 2.80 (1)                  | 2.89 (3)
Kepler (CPU time)               | 0.49 (1)             | 0.93 (2)         | 1.10 (3)                  | 1.24 (4)
Linpack (CPU time)              | 98.6 (4)             | 95.6 (1)         | 96.7 (2)                  | 97.6 (3)
Linpack (MFLOPS)                | 182.6 (4)            | 183.8 (1)        | 183.2 (3)                 | 183.3 (2)
LFK (CPU time)                  | 89.5 (4)             | 70.0 (3)         | 68.7 (2)                  | 68.0 (1)
LFK (MFLOPS)                    | 309.7 (3)            | 403.0 (2)        | 468.9 (1)                 | 250.9 (4)
Total Rank                      | 20                   | 11               | 12                        | 17

  • For reference only. Notice that performance is code and compilation flag dependent. For each benchmark, three identical runs were performed and the best CPU timing of the three is listed in the table.
  • Optimization flags: Absoft -O, Portland Group -O4 -fast, Intel -O3, GNU -O
Profilers & Debuggers
  • SMP machines
    • Happy: dbx, prof, gprof
    • Cedar: gprof
  • LINUX Cluster
    • PGI: pgdebug, pgprof, gprof
    • Absoft: fx, xfx, gprof
    • Intel: idb, gprof
    • GNU: gdb, gprof
Utility Libraries
  • Mathematical Libraries
    • IMSL, NAG, etc.
  • Scientific Computing
    • Linear Algebra
      • BLAS, ATLAS
      • EISPACK
      • LAPACK
      • SCALAPACK
    • Fast Fourier Transform, FFTW
    • BLAS/LAPACK, ScaLAPACK
    • The GNU Scientific Library, GSL
    • Utility Libraries, netCDF, PETSc, etc.
Utility Libraries
  • SMP Machines
    • Yatta/P575: ESSL (Engineering and Scientific Subroutine Library), -lessl
      • BLAS
      • LAPACK
      • EISPACK
      • Fourier Transforms, Convolutions and Correlations, and Related Computations
      • Sorting and Searching
      • Interpolation
      • Numerical Quadrature
      • Random Number Generation
      • Utilities
Utility Libraries
  • SMP Machines
      • Cedar/Cypress: MKL (Intel Math Kernel Library) 8.0,

-L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lsolver -lvml -lguide

          • BLAS
          • LAPACK
          • Sparse Solvers
          • FFT
          • VML (Vector Math Library)
          • Random-Number Generators
Utility Libraries for Emerald Cluster
  • Mathematical Libraries
    • IMSL
      • The IMSL Libraries are a comprehensive set of mathematical and statistical functions
      • From Visual Numerics, http://www.vni.com
      • Functions include: optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, and many more

      • Available in FORTRAN and C
      • Package name: imsl
      • Required compiler: Portland Group compiler, pgi
      • Installed on AFS ISIS package space, /afs/isis/pkg/imsl
      • Current default version 4.0, latest version 5.0
      • To subscribe to IMSL, type “ipm add pgi imsl”
      • To compile a C code, code.c, using IMSL:

pgcc -O $CFLAGS code.c -o code.x $LINK_CNL_STATIC

Utility Libraries for Emerald Cluster
  • Mathematical Libraries
    • NAG
      • NAG produces and distributes numerical, symbolic, statistical, visualisation and simulation software for the solution of problems in a wide range of applications in such areas as science, engineering, financial analysis and research.
      • From Numerical Algorithms Group, http://www.nag.co.uk
      • Functions include: optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, multivariate factor analysis, linear algebra, and random number generators

      • Available in FORTRAN and C
      • Package name: nag
      • Available platform: SGI IRIX, SUN Solaris, IBM AIX, LINUX
      • Installed on AFS ISIS package space, /afs/isis/pkg/nag
      • Current default version 6.0
      • To subscribe to NAG, type “ipm add nag”
Utility Libraries for Emerald Cluster
  • Scientific Libraries
    • Linear Algebra
      • BLAS, LAPACK, LAPACK90, LAPACK++, ATLAS, SPARSE-BLAS, SCALAPACK, EISPACK, FFTPACK, LANCZOS, HOMPACK, etc.
      • Source code downloadable from the website: http://www.netlib.org/liblist.html
      • Compiler dependent
      • BLAS and LAPACK available for all four compilers in AFS ISIS package space: gcc, profortran, intel, and pgi
      • SCALAPACK available for pgi and intel compilers
      • Assistance available if other versions are needed
Utility Libraries for Emerald Cluster
  • Scientific Libraries
    • Other Libraries: not fully implemented yet and thus please be cautious and patient when using them
      • FFTW http://www.fftw.org/
      • GSL http://www.gnu.org/software/gsl/
      • NetCDF http://www.unidata.ucar.edu/software/netcdf/
      • NCO http://nco.sourceforge.net/
      • HDF http://hdf.ncsa.uiuc.edu/hdf4.html
      • OCTAVE http://www.octave.org/
      • PETSc http://www-unix.mcs.anl.gov/petsc/petsc-as/
      • ……
    • If you think more libraries are of broad interest, please recommend them to us
Parallel Computing
  • SMP Machines:
    • OpenMP
      • Compilation:
        • Use “-qsmp=omp” flag on happy
        • Use “-openmp” flag on cedar
      • Environmental Variable Setup
        • setenv OMP_NUM_THREADS n
    • MPI
      • Compilation:
        • Use “-lmpi” flag on cedar
        • Use MPI capable compilers, e.g., mpxlf, mpxlf90, mpcc, mpCC
    • Hybrid (OpenMP and MPI): Do both!
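
Putting the flags above together, hypothetical compile lines might look like the following (the source-file names are placeholders; the compiler names are those listed on the earlier compiler slides, and on AIX the thread-safe invocations such as xlf90_r are commonly used for OpenMP):

ifort -openmp -O code.f90 -o code.x      (OpenMP on cedar/cypress with the Intel compiler)
setenv OMP_NUM_THREADS 8                 (set the number of threads before running)
xlf90_r -qsmp=omp code.f90 -o code.x     (OpenMP on happy/yatta with the IBM XL compiler)
ifort -O code.f90 -lmpi -o code.x        (MPI on cedar/cypress, linking the system MPI library)
mpxlf90 -O code.f90 -o code.x            (MPI on yatta/P575 with an MPI-capable compiler wrapper)
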
Parallel Computing With Emerald Cluster
  • Setup
    • Two MPI implementations are available, each added with “ipm add”: MPICH (package mpich) and LAM/MPI (package mpi-lam)
    • Either implementation can be combined with the GNU, Absoft ProFortran, Portland Group, or Intel compilers for F77, F90, C, and C++; the per-compiler language support follows the compiler table on the next slide

Parallel Computing With Emerald Cluster
  • Setup

Vendor / Language  | Package Name            | FORTRAN 77 | FORTRAN 90 | C    | C++
GNU                | gcc                     | g77        | --         | gcc  | g++
Absoft ProFortran  | profortran              | f77        | f95        | --   | --
Portland Group     | pgi                     | pgf77      | pgf90      | pgcc | pgCC
Intel              | intel_fortran, intel_CC | ifc        | ifc        | icc  | icc

Commands for parallel MPI compilation (with mpich or mpi-lam): mpif77, mpif90, mpicc, mpiCC

Parallel Computing With Emerald Cluster
  • Setup
    • AFS packages to be “ipm add”-ed
    • Notice the order: the compiler is always added first
    • Add ONLY ONE compiler into your environment

Compiler           | MPICH                                | MPI-LAM
GNU                | ipm add gcc mpich                    | ipm add gcc mpi-lam
Absoft ProFortran  | ipm add profortran mpich             | ipm add profortran mpi-lam
Portland Group     | ipm add pgi mpich                    | ipm add pgi mpi-lam
Intel              | ipm add intel_fortran intel_CC mpich | ipm add intel_fortran intel_CC mpi-lam

Parallel Computing With Emerald Cluster
  • Compilation
    • To compile an MPI FORTRAN 77 code, code.f, into an executable, exec:  % mpif77 -O -o exec code.f
    • For a FORTRAN 90/95 code, code.f90:  % mpif90 -O -o exec code.f90
    • For a C code, code.c:  % mpicc -O -o exec code.c
    • For a C++ code, code.cc:  % mpiCC -O -o exec code.cc
Scientific Packages
  • Available in AFS package space
  • To subscribe to a package, type “ipm add pkg_name”, where pkg_name is the name of the package. For example, “ipm add gaussian”
  • To remove it, type “ipm remove pkg_name”
  • All packages are installed at the /afs/isis/pkg/ directory.

For example, /afs/isis/pkg/gaussian.

  • Categories of scientific packages include:
    • Quantum Chemistry
    • Molecular Dynamics
    • Material Science
    • Visualization
    • NMR Spectroscopy
    • X-Ray Crystallography
    • Bioinformatics
    • Others

Scientific Package: Quantum Chemistry

Software       | Package Name    | Platforms  | Current Version | Parallel
ABINIT         | abinit          | IRIX/LINUX | 4.3.3           | Yes (MPI)
ADF            | adf             | LINUX      | 2002.02         | Yes (PVM)
Cerius2        | cerius2         | IRIX/LINUX | 4.10            | Yes (MPI)
GAMESS         | gamess          | IRIX/LINUX | 2003.9.6        | Yes (MPI)
Gaussian       | gaussian        | IRIX/LINUX | 03E01           | Yes (OpenMP)
MacroModel     | macromodel      | IRIX       | 7.1             | No
MOLFDIR        | molfdir         | IRIX       | 2001            | No
Molpro         | molpro          | IRIX/LINUX | 2006.6          | Yes (MPI)
NWChem         | nwchem          | IRIX/LINUX | 5.1             | Yes (MPI)
MaterialStudio | materisalstudio | LINUX      | 4.2             | Yes (MPI)
CPMD           | cpmd            | IRIX/LINUX | 3.9             | Yes (MPI)
ACES2          | aces2           | IRIX       | 4.1.2           | No

Scientific Package: Molecular Dynamics

Software   | Package Name | Platforms  | Current Version | Parallel
Amber      | amber        | IRIX/LINUX | 9.1             | MPI
NAMD/VMD   | namd, vmd    | IRIX/LINUX | 2.5             | MPI
Gromacs    | gromcs       | IRIX/LINUX | 3.2.1           | MPI
InsightII  | insightII    | IRIX       | 2000.3          | --
MacroModel | macromodel   | IRIX       | 7.1             | --
PMEMD      | pmemd        | IRIX/LINUX | 3.0.0           | MPI
Quanta     | quanta       | IRIX       | 2005            | MPI
Sybyl      | sybyl        | IRIX/LINUX | 7.1             | --
CHARMM     | charmm       | IRIX       | 3.0B1           | MPI
TINKER     | tinker       | LINUX      | 4.2             | --
O          | o            | IRIX       | 9.0.7           | --

Molecular & Scientific Visualization

Software    | Package Name | Platforms      | Current Version
AVS         | avs          | IRIX           | 5.6
AVS Express | avs-express  | IRIX           | 6.2
Cerius2     | cerius2      | IRIX/LINUX     | 4.9
DINO        | dino         | IRIX           | 0.8.4
ECCE        | ecce         | IRIX           | 2.1
GaussView   | gaussian     | IRIX/LINUX/AIX | 4.0
GRASP       | grasp        | IRIX           | 1.3.6
InsightII   | insightII    | IRIX/LINUX     | 2000.3
MOIL-VIEW   | moil-view    | IRIX           | 9.1
MOLDEN      | molden       | IRIX/LINUX     | 4.0
MOLKEL      | molkel       | IRIX           | 4.3
MOLMOL      | molmol       | IRIX           | 2K.1
MOLSCRIPT   | molscript    | IRIX           | 2.1.2
MOLSTAR     | molstar      | IRIX/LINUX     | 1.0

Molecular & Scientific Visualization (continued)

Software    | Package Name | Platforms      | Current Version
MOVIEMOL    | moviemol     | IRIX           | 1.3.1
NBOView     | nbo          | IRIX/LINUX     | 5.0
QUANTA      | quanta       | IRIX/LINUX     | 2005
RASMOL      | rasmol       | IRIX/LINUX/AIX | 2.7.3
RASTER3D    | raster3d     | IRIX/LINUX     | 2.7c
SPARTAN     | spartan      | IRIX           | 5.1.3
SPOCK       | spock        | IRIX           | 1.7.0p1
SYBYL       | sybyl        | IRIX/LINUX     | 7.1
VMD         | vmd          | IRIX/LINUX     | 1.8.2
XtalView    | xtalview     | IRIX           | 4.0
XMGR        | xmgr         | IRIX           | 4.1.2
GRACE       | grace        | IRIX/LINUX     | 5.1.2
IMAGEMAGICK | imagemagick  | IRIX/LINUX/AIX | 6.2.1.3
GIMP        | gimp         | IRIX/LINUX/AIX | 1.0.2
XV          | xv           | IRIX/LINUX/AIX | 3.1.0a

NMR & X-Ray Crystallography

Software  | Package Name | Platforms  | Current Version
CNSsolve  | cnssolve     | IRIX/LINUX | 1.1
AQUA      | aqua         | IRIX/LINUX | 3.2
BLENDER   | blender      | IRIX       | 2.28a
BNP       | bnp          | IRIX/LINUX | 0.99
CAMBRIDGE | cambridge    | IRIX       | 5.26
CCP4      | ccp4         | IRIX/LINUX | 4.2.2
CNX       | cns          | IRIX/LINUX | 2002
FELIX     | felix        | IRIX/LINUX | 2004
GAMMA     | gamma        | IRIX       | 4.1.0
MOGUL     | mogul        | IRIX/LINUX | 1.0
Phoelix   | phoelix      | IRIX       | 1.2
TURBO     | turbo        | IRIX       | 5.5
XPLOR-NIH | xplor_nih    | IRIX/LINUX | 2.11.2
XtalView  | xtalview     | IRIX       | 4.0

Scientific Package: Bioinformatics

Software         | Package Name | Platforms  | Current Version
BIOPERL          | bioperl      | IRIX       | 1.4.0
BLAST            | blast        | IRIX/LINUX | 2.2.6
CLUSTALX         | clustalx     | IRIX       | 8.1
EMBOSS           | emboss       | IRIX       | 2.8.0
GCG              | gcg          | LINUX      | 11.0
Insightful Miner | iminer       | IRIX       | 3.0
Modeller         | modeller     | IRIX/LINUX | 7.0
PISE             | pise         | LINUX      | 5.0a
SEAVIEW          | seaview      | IRIX/LINUX | 1.0
AUTODOCK         | autodock     | IRIX       | 3.05
DOCK             | dock         | IRIX/LINUX | 5.1.1
FTDOCK           | ftdock       | IRIX       | 1.0
HEX              | hex          | IRIX       | 2.4

Why do We Need Job Management Systems?
  • “Whose job you run in addition to when and where it is run, may be as important as how many jobs you run!”
  • Effectively optimizes the utilization of resources
  • Effectively optimizes the sharing of resources
  • Often referred to as Resource Management Software, Queuing Systems, Job Management Systems, etc.
Job Management Tools
  • PBS - Portable Batch System
    • Open Source Product Developed at NASA Ames Research Center
  • DQS - Distributed Queuing System
    • Open Source Product Developed by SCRI at Florida State University
  • LSF - Load Sharing Facility
    • Commercial Product from Platform Computing, Already Deployed at UNC-CH ITS Computing Servers
  • Codine/Sun Grid Engine
    • Commercial Version of DQS from Gridware, Inc. Now owned by SUN.
  • Condor
    • A Restricted Source ‘Cycle Stealing’ Product From The University of Wisconsin
  • Others Too Numerous To Mention

Operations of LSF

[Diagram: a user job submitted with bsub on the submission host goes through the batch API to the Master Batch Daemon (MBD) on the master host; using load information gathered by the Load Information Managers (LIMs) on all hosts and collected by the Master LIM (MLIM), the MBD places the job in a queue and dispatches it to the Slave Batch Daemon (SBD) on an execution host; a child SBD starts the user job under the Remote Execution Server (RES), and status flows back to the master host and the submitting user.]

LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server

Common LSF Commands

lsid – a good choice of LSF command to start with
lshosts / bhosts – show all of the nodes that the LSF system is aware of
bsub – submits a job interactively or in batch using LSF batch scheduling and the queue layer of the LSF suite
bjobs – displays information about a recently run job; use the -l option to view a more detailed accounting
bqueues – displays information about the batch queues; again, the -l option gives a more thorough description
bkill <job ID#> – kills the job with job ID number #
bhist -l <job ID#> – displays historical information about jobs; the -a flag displays information about both finished and unfinished jobs
bpeek -f <job ID#> – displays the stdout and stderr output of an unfinished job with job ID #
bhpart – displays information about host partitions
bstop – suspends an unfinished job
bswitch – switches unfinished jobs from one queue to another

More about LSF
  • Type “jle” -- checks job efficiency
  • Type “bqueues” for all queues on one cluster/machine (-m); Type “bqueues -l queue_name” for more info about the queue named “queue_name”
  • Type “busers” for user job slot limits
  • Specific for Baobab:
    • cpufree – to check how many free/idle CPUs are available
    • pending – to check how many jobs are still pending
    • bfree – to check how many free job slots are available (see “bfree -h”)

LSF Queues on Emerald Clusters

Queue   | Description
int     | Interactive jobs
now     | Preemptive debugging queue, 10 min wall-clock limit, 2 CPUs
week    | Default queue, one-week wall-clock limit, up to 32 CPUs/user
month   | Long-running serial-job queue, one-month wall-clock limit, up to 4 jobs per user
staff   | ITS Research Computing staff queue
manager | For use by LSF administrators

How to Submit Jobs via LSF on Emerald Clusters
  • Jobs to Interactive Queue

bsub -q int -m cedar -Ip my_interactive_job

  • Serial Jobs

bsub -q week -m cypress my_batch_job

  • Parallel OpenMP Jobs

setenv OMP_NUM_THREADS 4

bsub -q week -n 4 -m cypress my_parallel_job

  • Parallel MPI Jobs

bsub -q week -n 4 -m cypress mpirun -np 4 my_parallel_job
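
The same options can also be collected in a short script and submitted with “bsub < script”. A hypothetical script for the 4-CPU MPI job above might look like this (the script and executable names are placeholders; #BSUB lines are LSF's embedded-option syntax):

#!/bin/csh
#BSUB -q week                  # queue to submit to
#BSUB -n 4                     # number of CPUs requested
#BSUB -m cypress               # host to run on
#BSUB -o my_parallel_job.%J    # standard output file (%J expands to the job ID)
mpirun -np 4 my_parallel_job

Submit it with: bsub < myjob.csh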

Peculiars of Emerald Cluster

CPU Type        | Resources (-R)
Xeon 2.4 GHz    | Xeon24, blade, …
Xeon 2.8 GHz    | Xeon28, blade, …
Xeon 3.2 GHz    | Xeon32, blade, …
16-way IBM P575 | p5aix, …

Parallel job submission (esub, -a) | Wrapper
lammpi                             | lammpirun_wrapper
mpichp4                            | mpichp4_wrapper

Notice that the -R and -a flags are mutually exclusive in one command line.

Run Jobs on Emerald LINUX Cluster
  • Interactive Jobs

bsub -q int -R xeon28 -Ip my_interactive_job

  • Syntax for submitting a serial job is:

bsub -q queuename -R resources executable

    • For example

bsub -q week -R blade my_executable

  • To run a MPICH parallel job on AMD Athlon machines with, say, 4 CPUs,

bsub -q idle -n 4 -a mpichp4 mpirun.lsf my_par_job

  • To run LAM/MPI parallel jobs on IBM BladeCenter machines with, say, 4 CPUs:

bsub -q week -n 4 -a lammpi mpirun.lsf my_par_job

Final Friendly Reminders
  • Never run jobs on login nodes
    • For file management, coding, compilation, etc., purposes only
  • Never run jobs outside LSF
    • Fair sharing
  • Never run jobs on your AFS ISIS home or ~/ms; instead, run them on /scr, /netscr, or /nas
    • Slow I/O response, limited disk space
  • Move your data to mass storage after jobs are finished and remove all temporary files on scratch disks
    • Scratch disk not backed up, efficient use of limited resources
    • Old files will automatically be deleted without notification
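
For example, after a job finishes one might copy the results from scratch space to mass storage and then clean up (the paths and file names here are illustrative only):

mkdir -p ~/ms/myjob                             (create a directory in mass storage)
cp /netscr/onyen/myjob/output.dat ~/ms/myjob/   (copy results to mass storage)
rm -r /netscr/onyen/myjob                       (remove temporary files from scratch)
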
Online Resources
  • Get started with Research Computing:

http://www.unc.edu/atn/hpc/getting_started/index.shtml?id=4196

  • Programming Tools

http://www.unc.edu/atn/hpc/programming_tools/index.shtml

  • Scientific Packages

http://www.unc.edu/atn/hpc/applications/index.shtml?id=4237

  • Job Management

http://www.unc.edu/atn/hpc/job_management/index.shtml?id=4484

  • Benchmarks

http://www.unc.edu/atn/hpc/performance/index.shtml?id=4228

  • High Performance Computing

http://www.beowulf.org

http://www.top500.org

http://www.linuxhpc.org

http://www.supercluster.org/

Short Courses
  • Introduction to Scientific Computing
  • Introduction to Emerald
  • Introduction to Topsail
  • LINUX: Introduction
  • LINUX: Intermediate
  • MPI for Parallel Computing
  • OpenMP for Parallel Computing
  • MATLAB: Introduction
  • STATA: Introduction
  • Gaussian and GaussView
  • Introduction to Computational Chemistry
  • Shell Scripting
  • Introduction to Perl

http://learnit.unc.edu (click “Current Schedule of ITS Workshops”)

Hands-on Exercises
  • If you haven’t done so yet
    • Subscribe to the Research Computing services
    • Access via SecureCRT or X-Win32 to emerald, topsail, etc.
    • Create a working directory for yourself on /netscr or /scr
    • Get to know basic AFS, UNIX commands
    • Get to know the Baobab Beowulf cluster
  • Compile serial and one parallel (MPI) codes on Emerald
  • Get familiar with basic LSF commands
  • Get to know the packages available in AFS space
  • Submit jobs via LSF using serial or parallel queues

Please direct comments/questions about research computing to E-mail: research@unc.edu
Please direct comments/questions pertaining to this presentation to E-mail: shubin@email.unc.edu