How to make PC Cluster Systems?

Tomo Hiroyasu

Doshisha University

Kyoto, Japan

tomo@is.doshisha.ac.jp


Cluster

  • clus·ter n.
      • A group of the same or similar elements gathered or occurring closely together; a bunch: “She held out her hand, a small tight cluster of fingers” (Anne Tyler).
      • Linguistics. Two or more successive consonants in a word, as cl and st in the word cluster.

A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working cooperatively together as a single, integrated computing resource.


Evolutionary Computation

Features

It simulates the mechanisms of heredity and evolution in living creatures.

It can be applied to many types of problems.

It requires huge computational costs.

A population consists of many individuals.

Tasks can be divided into sub-tasks.

High Performance Computing

Top500 (http://www.top500.org)

Ranking   Name                 # Proc   Rmax (Gflops)
1         ASCI White           8192     4938
2         ASCI Red             9632     2379
3         ASCI Blue Pacific    5808     2144
4         ASCI Blue            6144     1608
5         SP Power III         1336     1417

Parallel Computers


Commodity Hardware

Networking: Internet, LAN, WAN, Gigabit, wireless, etc.

CPU: Pentium, Alpha, Power, etc.

PCs + Networking

PC Clusters


Why PC Cluster?

High capability

Low cost

Easy to set up

Easy to use

Possession (you can have your own machine)

Hardware: commodity, off-the-shelf

Software: open source, freeware

Peopleware: university students and staff, lab nerds

Top500 (http://www.top500.org)

Ranking   Name                     # Proc   Rmax (Gflops)
60        Los Lobos                512      237
84        CPlant Cluster           580      232.6
126       CLIC PIII 800 MHz        528      143.3
215       Kepler PIII 650 MHz      196      96.2
396       SCore II/PIII 800 MHz    132      64.7


Contents of this tutorial

Concept of PC Clusters

Small Cluster

Advanced Cluster

Hardware

Software

Books, Web sites, …

Conclusions


Beowulf Cluster

http://beowulf.org/

A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking running any one of several open-source Unix-like operating systems.

Some Linux clusters are built for reliability instead of speed. These are not Beowulfs.

The Beowulf Project was started by Donald Becker when he moved to CESDIS in early 1994. CESDIS was located at NASA's Goddard Space Flight Center, and was operated for NASA by USRA.


Avalon

http://cnls.lanl.gov/Frames/avalon-a.html

Los Alamos National Laboratory

Alpha (140) + Myrinet

Beowulf

The first Beowulf in the Top500 ranking


The Berkeley NOW project

http://now.cs.berkeley.edu/

The Berkeley NOW project is building system support for using a network of workstations (NOW) to act as a distributed supercomputer on a building-wide scale.

April 30, 1997: NOW makes LINPACK Top 500!

June 15, 1998: NOW Retreat Finale


Cplant Cluster

http://www.cs.sandia.gov/cplant/

Sandia National Laboratory

Alpha(580) + Myrinet


RWCP Cluster

http://pdswww.rwcp.or.jp/

A typical Japanese cluster

SCore, OpenMP

Myrinet


Doshisha Cluster

http://www.is.doshisha.ac.jp/cluster/index.html

Pentium III 0.8 GHz (256) + Fast Ethernet

Pentium III 1.0 GHz (2 x 64) + Myrinet 2000


Simple Cluster

8 nodes + gateway (file server)

Fast Ethernet

Switching Hub

About $10,000


What do we need?

Normal PCs

Hardware: CPU, memory, motherboard, hard disk, case, network card, cable, hub


What do we need?

Software: OS, tools (editor, compiler, parallel library)

Message Passing Libraries

PVM (Parallel Virtual Machine)

http://www.epm.ornl.gov/pvm/pvm_home.html

PVM was developed at Oak Ridge National Laboratory and the University of Tennessee.

MPI (Message Passing Interface)

http://www-unix.mcs.anl.gov/mpi/index.html

MPI is an API specification for message passing.

1992: MPI Forum established

1994: MPI-1

1997: MPI-2

Implementations of MPI

Free implementations

MPICH

LAM

WMPI: Windows 95/NT

CHIMP/MPI

MPI Light

Vendor implementations (for parallel computers)

MPI/PRO


Procedure of constructing clusters

Prepare several PCs

Connect the PCs

Install the OS and tools

Install development tools and a parallel library


Installing MPICH/LAM

On Red Hat (RPM packages):

# rpm -ivh lam-6.3.3b28-1.i386.rpm

# rpm -ivh mpich-1.2.0-5.i386.rpm

On Debian (dpkg / apt-get):

# dpkg -i lam2_6.3.2-3.deb

# dpkg -i mpich_1.1.2-11.deb

# apt-get install lam2

# apt-get install mpich
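After installation, an MPI program is typically compiled with the mpicc wrapper and started with mpirun; a minimal check, assuming two processes and that LAM's run-time has already been started with lamboot where LAM is used (hello.c is just an example file name):

$ mpicc hello.c -o hello

$ mpirun -np 2 ./hello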

Parallel programming (MPI)

[Diagram: a user submits jobs/tasks through a gateway to a massively parallel computer or a PC cluster]


Initialization

Communicator

Acquiring the number of processes

Acquiring rank

Termination

Programming style sheet

#include "mpi.h"

int main( int argc, char **argv )

{

MPI_Init(&argc, &argv ) ;

MPI_Comm_size( …… );

MPI_Comm_rank( …… ) ;

/* parallel procedure */

MPI_Finalize( ) ;

return 0 ;

}
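A complete, compilable version of this skeleton, with the elided arguments filled in the usual way (MPI_COMM_WORLD as the communicator; the variable names are only illustrative), might look like this:

#include "mpi.h"

int main( int argc, char **argv )
{
    int nprocs, myrank;

    MPI_Init( &argc, &argv );                    /* initialization */
    MPI_Comm_size( MPI_COMM_WORLD, &nprocs );    /* acquiring the number of processes */
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );    /* acquiring the rank of this process */

    /* parallel procedure */

    MPI_Finalize( );                             /* termination */
    return 0;
}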

Communications

One by one (point-to-point) communication

Group (collective) communication

[Diagram: Process A and Process B sending and receiving data]

One by one communication

[Sending]

MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm )

void *buf: sending buffer starting address (IN)

int count: number of data elements (IN)

MPI_Datatype datatype: data type (IN)

int dest: rank of the receiving process (IN)

int tag: message tag (IN)

MPI_Comm comm: communicator (IN)

One by one communication

[Receiving]

MPI_Recv( void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status )

void *buf: receiving buffer starting address (OUT)

int source: rank of the sending process (IN)

int tag: message tag (IN)

MPI_Status *status: status (OUT)

~Hello.c~

#include <stdio.h>

#include "mpi.h"

int main(int argc, char *argv[])

{

int myid,procs,src,dest,tag=1000,count;

char inmsg[10],outmsg[]="hello";

MPI_Status stat;

MPI_Init(&argc,&argv);

MPI_Comm_rank(MPI_COMM_WORLD,&myid);

count=sizeof(outmsg)/sizeof(char);

if(myid == 0){

src = 1; dest = 1;

MPI_Send(&outmsg,count,MPI_CHAR,dest,tag,MPI_COMM_WORLD);

MPI_Recv(&inmsg,count,MPI_CHAR,src,tag,MPI_COMM_WORLD,&stat);

printf("%s from rank %d\n",&inmsg,src);

}else{

src = 0; dest = 0;

MPI_Recv(&inmsg,count,MPI_CHAR,src,tag,MPI_COMM_WORLD,&stat);

MPI_Send(&outmsg,count,MPI_CHAR,dest,tag,MPI_COMM_WORLD);

printf("%s from rank %d\n",&inmsg,src);

}

MPI_Finalize();

return 0;

}


One by one communication

The separate receive and send calls

MPI_Recv(&inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);

MPI_Send(&outmsg, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

can be replaced by a single combined call:

MPI_Sendrecv(&outmsg, count, MPI_CHAR, dest, tag, &inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
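As an illustration of why MPI_Sendrecv is convenient, here is a small sketch (not from the original slides) in which every process passes its rank around a ring; the single MPI_Sendrecv call avoids the deadlock that a careless ordering of blocking sends and receives could cause:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, nprocs, left, right, sendval, recvval;
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    right = (rank + 1) % nprocs;             /* neighbour to send to      */
    left  = (rank + nprocs - 1) % nprocs;    /* neighbour to receive from */
    sendval = rank;

    /* send own rank to the right neighbour, receive from the left one */
    MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
                 &recvval, 1, MPI_INT, left,  0,
                 MPI_COMM_WORLD, &stat);

    printf("rank %d received %d from rank %d\n", rank, recvval, left);

    MPI_Finalize();
    return 0;
}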


[Figure: plot of the integrand y over the interval x = 0 to 1 (y-axis from 0 to 4)]

Calculation of PI (approximation)

- Parallel conversion -

The integration interval is divided into subsections.

Each subsection is allotted to a processor.

The results of the calculations are assembled (a code sketch follows the group-communication slides below).

Group communication

Broadcast

MPI_Bcast( void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm )

int root: rank of the sending (root) process

void *buf: the broadcast data

Group Communication
  • Communication and operation (reduce): MPI_Reduce( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm )

MPI_Op op: operation handle, e.g. MPI_SUM, MPI_MAX, MPI_MIN, MPI_PROD

int root: rank of the receiving (root) process
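Putting the pi slide and these group-communication routines together, here is a minimal sketch of the parallel pi approximation (not taken from the original slides); it assumes the standard integrand 4/(1 + x*x) on [0,1] and the midpoint rule, with the number of subsections broadcast from rank 0 and the partial sums assembled with MPI_Reduce:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, n, i;
    double h, x, local_sum = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    if (myid == 0) n = 100000;                /* number of subsections */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = 1.0 / n;                              /* width of one subsection */
    for (i = myid; i < n; i += procs) {       /* each process takes every procs-th strip */
        x = (i + 0.5) * h;                    /* midpoint of the strip */
        local_sum += 4.0 / (1.0 + x * x);
    }
    local_sum *= h;

    /* assemble the partial results on rank 0 */
    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) printf("pi is approximately %.10f\n", pi);

    MPI_Finalize();
    return 0;
}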


Hardware

CPU

Intel Pentium III, IV

AMD Athlon

Transmeta Crusoe

http://www.intel.com/

http://www.amd.com/

http://www.transmeta.com/


Hardware

Network

Ethernet, Gigabit Ethernet, Myrinet, QsNet, Giganet, SCI, Atoll, VIA, InfiniBand

Wake On LAN


Hardware

Hard disk: SCSI, IDE, RAID

Diskless Cluster

http://www.linuxdoc.org/HOWTO/Diskless-HOWTO.html


Hardware

Case

Box: inexpensive

Rack: compact, easier maintenance


OS

Linux Kernels

Open source

Freeware

Features

The /proc file system

Loadable kernel modules

Virtual consoles

Package management


OS

Linux Kernels

http://www.kernel.org/

Linux Distributions

Red Hat www.redhat.com

Debian GNU/Linux www.debian.org

S.u.S.E. www.suse.com

Slackware www.slackware.org


[Diagram: one server connected to several clients]

Administration software

NFS (Network File System)

NIS (Network Information System)

NTP (Network Time Protocol)


Resource Management and Scheduling

Process distribution

Load balance

Job scheduling of multiple tasks

CONDOR

http://www.cs.wisc.edu/condor/

DQS

http://www.scri.fsu.edu/~pasko/dqs.html

LSF

http://www.platform.com/index.html

The Sun Grid Engine

http://www.sun.com/software/gridware/


Tools for Program Development

Editor: Emacs

Language: C, C++, Fortran, Java

Compiler:

GNU http://www.gnu.org/

NAG http://www.nag.co.uk

PGI http://www.pgroup.com/

VAST http://www.psrv.com/

Absoft http://www.absoft.com/

Fujitsu http://www.fqs.co.jp/fort-c/

Intel http://developer.intel.com/software/products/compilers/index.htm


Tools for Program Development

Make

CVS

Debugger: gdb, TotalView http://www.etnus.com


Free MPI Implementations

MPICH

http://www-unix.mcs.anl.gov/mpi/index.html

Easy to use

High portability

for UNIX, NT/Win, Globus

LAM

http://www.lam-mpi.org/

High availability


MPICH vs. LAM (SMP)

DGA

gcc (2.95.3), mpicc, -O2 -funroll-loops


MPICH vs. LAM (number of processes)

DGA

gcc (2.95.3), mpicc, -O2 -funroll-loops


Profiler

MPE (MPICH)

Paradyn http://www.cs.wisc.edu/paradyn/

Vampir

http://www.pallas.de/pages/vampir.htm


Message passing library for Win

PVM: PVM3.4, WPVM

MPI: mpich, WMPI (Critical Software), MPICH/NT (Mississippi State Univ.), MPI/Pro (MPI Software Technology)


Cluster Distribution

FAI http://www.informatik.uni-koeln.de/fai/

Alinka http://www.alinka.com/

Mosix http://www.mosix.cs.huji.ac.il/

Bproc http://www.beowulf.org/software/bproc.html

Scyld http://www.scyld.com/

SCore

http://pdswww.rwcp.or.jp/dist/score/html/index.html


Math Library

PHiPAC from Berkeley

FFTW from MIT www.fftw.org

ATLAS (Automatically Tuned Linear Algebra Software)

www.netlib.org/atlas/

ATLAS is an adaptive software architecture; it is faster than all other portable BLAS implementations and comparable with the machine-specific libraries provided by the vendor.


Math Library

PETSc

PETSc is a large suite of data structures and routines for both uniprocessor and parallel scientific computing.

http://www-fp.mcs.anl.gov/petsc/


Models of Parallel GAs

Master-Slave (micro-grained)

Cellular (fine-grained)

Distributed GAs (island model, coarse-grained)


Master-Slave model

[Diagram: a master node (crossover, mutation, evaluation, selection) connected to many client nodes, each evaluating individuals]

a) The master delivers each individual to a slave.

b) The slave returns the evaluated value as soon as it finishes the calculation.

c) The master then sends a not-yet-evaluated individual to that slave.
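A minimal sketch of this master-slave evaluation loop (not from the original slides), assuming at least two processes, a toy fitness function, and one individual per message; the master hands out individuals, collects fitness values from whichever slave answers first, and keeps the slaves busy until the whole population is evaluated:

#include <stdio.h>
#include "mpi.h"

#define POP_SIZE 20

/* toy fitness function; a real GA would evaluate a full chromosome */
static double evaluate(double x) { return x * x; }

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {                      /* master: crossover, mutation, selection would go here */
        double pop[POP_SIZE], fitness[POP_SIZE], result;
        int i, sent = 0, received = 0, slave, idx;

        for (i = 0; i < POP_SIZE; i++) pop[i] = (double)i;

        /* a) deliver one individual to every slave */
        for (slave = 1; slave < nprocs && sent < POP_SIZE; slave++, sent++)
            MPI_Send(&pop[sent], 1, MPI_DOUBLE, slave, sent, MPI_COMM_WORLD);

        /* b) collect results; c) immediately hand out the next individual */
        while (received < POP_SIZE) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &stat);
            idx = stat.MPI_TAG;           /* the tag carries the individual's index */
            fitness[idx] = result;
            received++;
            if (sent < POP_SIZE) {
                MPI_Send(&pop[sent], 1, MPI_DOUBLE, stat.MPI_SOURCE, sent, MPI_COMM_WORLD);
                sent++;
            }
        }

        /* tell the slaves to stop (tag POP_SIZE is used as a stop signal) */
        for (slave = 1; slave < nprocs; slave++)
            MPI_Send(&result, 1, MPI_DOUBLE, slave, POP_SIZE, MPI_COMM_WORLD);

        for (i = 0; i < POP_SIZE; i++)
            printf("individual %d: fitness %f\n", i, fitness[i]);
    } else {                              /* slave: evaluate whatever arrives */
        double x, f;
        while (1) {
            MPI_Recv(&x, 1, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
            if (stat.MPI_TAG == POP_SIZE) break;   /* stop signal */
            f = evaluate(x);
            MPI_Send(&f, 1, MPI_DOUBLE, 0, stat.MPI_TAG, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}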


Books

“Building Linux Clusters”

"How to Build a Beowulf"

“High Performance Cluster Computing”


Web sites

IEEE Computer Society Task Force on Cluster Computing

http://www.ieeetfcc.org/

White Paper

http://www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/

Cluster top 500

http://clusters.top500.org/

Beowulf Project

http://www.beowulf.org/

Beowulf Underground

http://www.beowulf-underground.org/


In this tutorial….

Concept of cluster system

How to build systems

Parallel Genetic Algorithms


SSI (Single System Image)

Entry point

File directory

Control point

Virtual Network

Memory Space

Job Manager

User Interface

Misc


Global Computing (GRID)

Powerful calculation resources

e.g. SETI@home, Project RC5

Several different types of computers cooperate via the Internet.