IET Lecture on Cluster Design – John Tindle 2008

Presentation Transcript

The University of Sunderland Cluster Computer

IET Lecture by John Tindle

Northumbria Network, ICT Group

Monday 11 February 2008


Overview of talk

  • SRIF3 and Potential Vendors

  • General Requirements

  • Areas of Application

  • Development Team

  • Cluster Design

  • Cluster System Hardware + Software

  • Demonstrations


United Kingdom – Science Research Investment Fund (SRIF)

  • The Science Research Investment Fund (SRIF) is a joint initiative by the Office of Science and Technology (OST) and the Department for Education and Skills (DfES). The purpose of SRIF is to contribute to higher education institutions' (HEIs) long-term sustainable research strategies and address past under-investment in research infrastructure.


SRIF3

  • SRIF3 - 90% and UoS - 10%

  • Project duration about two years

  • Made operational by late December 2007

  • Heriot-Watt University - coordinator


Potential Grid Computer Vendors

  • Dell – selected vendor

  • CompuSys – SE England

  • Streamline - Midlands

  • Fujitsu - Manchester

  • ClusterVision - Dutch

  • OCF - Sheffield



General requirements

  • High performance general purpose computer

  • Built using standard components

  • Commodity off the shelf (COTS)

  • Low cost PC technology

  • Reuse existing skills - Ethernet

  • Easy to maintain - hopefully


Designed for Networking Experiments

  • Require flexible networking infrastructure

  • Modifiable under program control

  • Managed switch required

  • Unmanaged switch often employed in standard cluster systems

  • Fully connected programmable intranet


System Supports

  • Rate limiting

  • Quality of service (QoS)

  • Multiprotocol Label Switching (MPLS)

  • VLANs and VPNs

  • IPv4 and IPv6 supported in hardware

  • Programmable queue structures


Special requirements 1

  • Operation at normal room temperature

  • Typical existing systems require

    • a low air inlet temperature, below 5 °C

    • a dedicated server room with air conditioning

  • Low acoustic noise output

  • Dual boot capability

  • Windows or Linux in any proportion


Special requirements 2 continued

  • Concurrent processing, for example

    • 75% of the cores (boxes) running Windows

    • 25% of the cores (boxes) running Linux

  • CPU power control – 4 levels

  • High resolution displays for media and data visualisation


Advantages of design

  • Heat generated is not vented to the outside atmosphere

  • Air conditioning running costs are not incurred

  • Heat is used to heat the building

  • Compute nodes (height 2U) use relatively large diameter low noise fans



Areas of application

1. Media systems – 3D rendering

2. Networking experiments

MSc Network Systems – large cohort

3. Engineering computing

4. Numerical optimisation

5. Video streaming

6. IP Television


Application cont 1

7. Parallel distributed computing

8. Distributed databases

9. Remote teaching experiments

10. Semantic web

11. Search large image databases

12. Search engine development

13. Web based data analysis


Application cont 2

14. Computational fluid dynamics

15. Large scale data visualisation using high resolution colour computer graphics


UoS Cluster Development Team

  • From left to right

  • Kevin Ginty

  • Simon Stobart

  • John Tindle

  • Phil Irving

  • Matt Hinds

  • Note - all wearing Dell tee shirts




Work Area – at last, all up and running!


UoS Estates Department

  • Very good project work was completed by the UoS Estates Department

    • Electrical network design

    • Building air flow analysis

      • Computing Terraces

    • Heat dissipation

    • Finite element (FE) study and analysis

    • Work area refurbishment



Cluster Hardware

  • The system has been built using

  • Dell compute nodes

  • Cisco networking components

  • Grid design contributions from both Dell and Cisco


Basic Building Block

  • Compute nodes

  • Dell PE2950 server

  • Height 2U

  • Two dual core processors

  • Four cores per box

  • RAM 8G, 2G per core

  • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/


Compute Nodes

  • Network interface cards 3 off

  • Local disk drives 250G SATA II

  • The large amount of RAM facilitates virtual computing experiments

  • VMWare server and MS VirtualPC


Cisco 6509 switch

  • Cisco Catalyst 6509 (1 off)

  • Supervisor Engine 720 (2 off)

  • Central network switch for the cluster

  • RSM router switch module

  • Provides


6509 Provides

  • 720 Mbps full duplex (4 off port cards)

  • Virtual LANs - VLAN

  • Virtual private networks - VPN

  • Link bandwidth throttling

  • Traffic prioritisation, QoS

  • Network experimentation


Cluster Intranet

  • The network has three buses

  • Data

  • IPC

  • IPMI


1. Data bus

  • User data bus

  • A normal data bus required for interprocessor communication between user applications


2. IPC Bus

  • Inter process communication (IPC)

  • “The Microsoft Windows operating system provides mechanisms for facilitating communications and data sharing between applications.

  • Collectively, the activities enabled by these mechanisms are called interprocess communications (IPC). Some forms of IPC facilitate the division of labor among several specialized processes”.


IPC Bus continued

  • “Other forms of IPC facilitate the division of labor among computers on a network”.

  • Ref Microsoft Website

  • IPC is controlled by the OS

  • For example IPC is

    • Used to transfer and install new disk images on compute nodes

  • Disk imaging is a complex operation


3. IPMI Bus

  • IPMI

    • Intelligent Platform Management Interface (IPMI) specification defines a set of common interfaces to computer hardware and firmware which system administrators can use to monitor system health and manage the system.


Master Rack A

  • Linux and Microsoft

  • 2 – PE2950 control nodes

  • 5 – PE1950 web servers

  • Cisco Catalyst 6509

  • 2 × Supervisor Engine 720 modules

  • 4 × 48-port cards (192 ports)


Master Rack A cont

  • Compute nodes require

    • 40*3 = 120 connections

  • Disk storage 1 – MD1000

  • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/

  • Master rack resilient to mains failure

  • Power supply

    • 6 kVA APC (hard wired 24 Amp PSU)


Master Rack A KVM Switch

  • Ethernet KVM switch

  • Keyboard, Video display, Mouse - KVM

  • Provides user access to the head nodes

  • Windows head node, named – “Paddy”

  • Linux head node, named - “Max”

  • Movie USCC MVI_6991.AVI


Rack B Infiniband

  • InfiniBand is a switched fabric communications link primarily used in high-performance computing.

  • Its features include quality of service and failover and it is designed to be scalable.

  • The InfiniBand architecture specification defines a connection between processor nodes and high performance I/O nodes.


Infiniband Rack B

  • 6 – PE2950 each with two HCAs

  • 1 – Cisco 7000P router

  • Host channel adapter (HCA) link

  • http://157.228.27.155/website/CLUSTER-GRID/Cisco-docs1/HCA/

  • Infiniband

  • http://en.wikipedia.org/wiki/InfiniBand


Cisco Infiniband

  • Cisco 7000p

  • High speed bus – 10 Gbit/s

  • Low latency – less than 1 microsecond

  • Infiniband connects 6 compute nodes

    • 24 CPU cores

  • High speed serial communication


Infiniband

  • Many parallel channels

  • PCI Express bus (serial DMA)

  • Direct memory access (DMA)


General compute Rack C

  • 11 – PE2950 compute nodes

  • Product details


Racks

  • A × 1 – 2 control nodes (+ 5 servers), GigE

  • B × 1 – 6 Infiniband (overlay)

  • C × 3 – 11 each (33), GigE

  • N × 1 – 1 (Cisco Netlab + VoIP)

  • Total compute nodes

    • 2+6+33+1 = 42


Rack Layout

  • - C C B A C N -

  • F C C B A C N F

  • Future expansion – F

  • KVM video - MVI_6994.AVI


Summary - Dell Server 2950

  • Number of nodes 40 + 1 (Linux head) + 1 (Windows head)

  • Number of compute nodes 40

  • Intel Xeon Woodcrest 2.66GHz

  • Two dual core processors

  • GigE NICs – 3 off per server

  • RAM 8G, 2G per core

  • Disks 250G SATA II


Summary - cluster speedup

  • Compare time taken to complete a task

  • Time on cluster = 1 hour

  • Time using a single CPU = 160 hours

  • 160/24 ≈ 6.7 days, approximately 1 week

  • A speedup of roughly 160, in line with the 160 cores available (40 nodes × 4 cores)

  • Facility available for use by companies

  • “Software City” startup companies


Data storage

  • Master nodes via PERC5e to MD1000 using 15 x 500G SATA drives

  • Disk storage 7.5T

  • Linux 7 disks

  • MS 2003 Server HPC 8 disks

  • MD1000 URL

  • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs2/


Power

  • Total maximum load generated by Dell cluster cabinets

  • Total load = 20,742 W (approximately 20.7 kW)

  • Values determined using Dell's integrated system design tool

    • Power and Noise


Web servers

  • PE1950

  • Height 1U

  • Five servers

  • Web services

  • Domain controller, DNS, DHCP etc

  • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/


Access Workstations

  • Dell workstations (10 off)

  • Operating Systems WinXP Pro

  • HD displays LCD (4 off)

    • Size 32 inch wall mounted

  • Graphics NVS285 – 8*2 GPUs

  • Graphics NVS440 – 2*4 GPU

  • Graphics processor units

  • Support for HDTV



Movie USCC

  • MVI_6992.AVI



Cluster Software

  • Compute Node Operating Systems

  • Scientific Linux (based on Red Hat)

  • MS Windows Server 2003

    • High performance computing - HPC


Scali

  • Scali Management

    • software to manage high performance cluster computers

  • Scali is used to control the cluster

    • start and stop processes, upload data/code and

    • schedule tasks

  • Scali datasheet

  • http://www.scali.com/

  • http://157.228.27.155/website/CLUSTER-GRID/Scali/


Other software

  • Apache web services

  • Tomcat, Java server side programming

  • Compilers C++, Java

  • Servers FTPD

  • 3D modelling and animation

    • Blender

    • Autodesk 3DS Max software



Virtual Network Security Experiment - example

  • Virtual Network VMWare Appliances

  • Components

    (1) NAT router

    (2) WinXP-sp2 attacks FC5 across network

    (3) Network hub - interconnection

    (4) Firewall - protection

    (5) Fedora Core FC5 target system


Network Security Experiment

[Slide diagram: a VMware host running XP Pro SP2 (eth0, on the NAT network VMnet8) attacks across the Ethernet segments through a NAT firewall with red and green interfaces (eth1/eth0); the firewall forwards port 80 from the red side to the FC5 machine's IP; FC5 (eth0, on the hub network VMnet4) runs an Apache (httpd) web server.]


Security Experiment

  • A total of 5 virtual networking devices using just one compute box

  • Port scanning attack (Nessus)

  • Intrusion detection (Snort)

  • Tunnelling using SSH and PuTTY

  • RAM required 500K+ for each network component


Cisco Netlab

  • Cisco Netlab provides

  • Remote access to network facilities for experimental purposes

  • Netlab is installed in the Network cabinet

  • Plus

  • VoIP demonstration system for teaching purposes



Current Research – Network Planning

  • Network Planning Research

  • Network model using OOD

  • Hybrid parallel search algorithm based upon features of

    • Parallel genetic algorithm (GA)

    • Particle swarm optimisation (PSO)

  • Ring of communicating processes (a minimal sketch follows)
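
A minimal sketch of the ring idea, assuming each worker holds one candidate solution, perturbs it locally (standing in for the GA/PSO update) and exchanges its best value with its ring neighbours. The worker count, queue sizes and update rule are illustrative assumptions, not the USCC implementation.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.ArrayBlockingQueue;

  // Illustrative ring of communicating search processes (minimisation).
  // Each worker improves its own candidate and passes its best value to the
  // next worker in the ring; this sketches the topology only.
  public class RingSearchSketch {
      public static void main(String[] args) {
          final int workers = 4;
          final List<ArrayBlockingQueue<Double>> inbox = new ArrayList<>();
          for (int i = 0; i < workers; i++) inbox.add(new ArrayBlockingQueue<>(16));

          for (int i = 0; i < workers; i++) {
              final int id = i;
              new Thread(() -> {
                  double best = Math.random() * 100.0;               // initial candidate
                  for (int gen = 0; gen < 1000; gen++) {
                      double trial = best + (Math.random() - 0.5);   // local GA/PSO-style move
                      if (trial < best) best = trial;
                      inbox.get((id + 1) % workers).offer(best);     // send best to next neighbour
                      Double received = inbox.get(id).poll();        // receive from previous neighbour
                      if (received != null && received < best) best = received;
                  }
                  System.out.println("worker " + id + " best = " + best);
              }).start();
          }
      }
  }

In the real system each worker would be a compute-node process and the ring links would be network connections over the cluster intranet rather than in-memory queues.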


Network Planning Research

  • Web services

  • Server side programs - JSP

  • FTPDaemon, URL objects, XML

  • Pan-Reif solver (see the sketch after this list)

    • based on Newton's Method

  • Steve Turner PhD student

    • Submit May 2008 – first to use USCC

  • UoS Cluster Computer USCC
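
As a pointer to the numerical idea named above, a tiny Newton's method illustration in Java follows; the function f(x) = x*x - 2 is purely an example and is not part of the network-planning model or the Pan-Reif solver itself.

  // Newton's method: iteratively refine x so that f(x) = 0.
  // Here f(x) = x*x - 2, so the iteration converges to sqrt(2).
  public class NewtonSketch {
      public static void main(String[] args) {
          double x = 1.0;                   // initial guess
          for (int i = 0; i < 8; i++) {
              double f  = x * x - 2.0;      // f(x)
              double df = 2.0 * x;          // f'(x)
              x = x - f / df;               // Newton update
          }
          System.out.println("sqrt(2) approx = " + x);
      }
  }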




DSL Network Plan Schematic Diagram


Numerical output from GA optimiser – PON Equipment



Demonstrations

  • IPTV

  • Java test program


Demonstration 1 - IPTV

  • IP television demonstration

  • IP – Internet Protocol

  • VideoLAN Client – VLC

  • Number of servers and clients – 10

  • Video streams standard definition

    • 4 to 5Mbps

  • Multicasting Class D addressing
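
To make the multicast point concrete, here is a minimal Java sketch of a client joining a Class D group and receiving stream datagrams. The group address and port are illustrative assumptions, not the values used in the demonstration; a real client such as VLC would decode the payload.

  import java.net.DatagramPacket;
  import java.net.InetAddress;
  import java.net.MulticastSocket;

  // Minimal multicast receiver: joining the group causes the host to send an
  // IGMP membership report, so the network forwards the stream to this client.
  public class MulticastReceiverSketch {
      public static void main(String[] args) throws Exception {
          InetAddress group = InetAddress.getByName("239.1.1.1"); // Class D address (illustrative)
          MulticastSocket socket = new MulticastSocket(5004);     // port is an assumption
          socket.joinGroup(group);
          byte[] buffer = new byte[1500];
          DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
          for (int i = 0; i < 1000; i++) {
              socket.receive(packet); // one UDP datagram of the video stream
              // a real player would hand packet.getData() to a decoder
          }
          socket.leaveGroup(group);
          socket.close();
      }
  }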


IPTV

  • IGMP

    • Internet Group Management Protocol

  • Video streams HD 16Mbps

  • HD only uses 1.6% of 1Gbps

  • Rudolf Nureyev dancing

  • Six Five Special 1957

    • Don Lang and the Frantic Five

    • New dance demonstration - Bunny Hop


Demonstration 2

  • Java demonstration test program

  • Compute node processes 40

  • Workstation server 1

  • Communication via UDP

  • Graphical display on local server of data sent from compute nodes

  • Network configuration – star
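
A minimal sketch of what one compute-node process in this star configuration might do: send a small UDP datagram carrying a computed value to the hub server on the workstation. The hub address is the one listed in ipadd.txt on the next slide; the port number and message format are assumptions.

  import java.net.DatagramPacket;
  import java.net.DatagramSocket;
  import java.net.InetAddress;

  // One compute-node process sending a result to the hub server over UDP.
  public class NodeSenderSketch {
      public static void main(String[] args) throws Exception {
          InetAddress hub = InetAddress.getByName("192.168.1.50"); // hub server (from ipadd.txt)
          int port = 4000;                                         // assumed port number
          DatagramSocket socket = new DatagramSocket();
          String message = "node 1 val=75.3";                      // assumed message format
          byte[] data = message.getBytes();
          socket.send(new DatagramPacket(data, data.length, hub, port));
          socket.close();
      }
  }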


Star network


Cluster configuration file

  • Description of File ipadd.txt

  • 1 Node id

  • 192.168.1.50 Hub server address

  • 192.168.1.5 Previous Compute Node

  • 192.168.1.7 Next Compute Node

  • 192.168.1.51 Hub2 spare

  • Equation

  • double val = 100 * ( 0.5 + Math.exp(-t/tau) * 0.5 * Math.sin(theta)) ;
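
A hedged sketch of how a node process might read ipadd.txt with the layout above and evaluate the displayed equation; only the file layout and the formula come from the slide, while the values of t, tau and theta are illustrative.

  import java.io.BufferedReader;
  import java.io.FileReader;

  // Read the per-node configuration file and compute the demo value.
  public class NodeConfigSketch {
      public static void main(String[] args) throws Exception {
          BufferedReader in = new BufferedReader(new FileReader("ipadd.txt"));
          int nodeId      = Integer.parseInt(in.readLine().trim()); // 1  node id
          String hub      = in.readLine().trim();                   // 192.168.1.50  hub server
          String previous = in.readLine().trim();                   // 192.168.1.5   previous compute node
          String next     = in.readLine().trim();                   // 192.168.1.7   next compute node
          String hubSpare = in.readLine().trim();                   // 192.168.1.51  spare hub
          in.close();

          double t = 2.0, tau = 5.0, theta = Math.PI / 3.0;         // illustrative inputs
          double val = 100 * (0.5 + Math.exp(-t / tau) * 0.5 * Math.sin(theta));
          System.out.println("Node " + nodeId + " reports val = " + val + " to hub " + hub);
      }
  }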


Screenshot of hub server bar graph display


USCC configuration

  • Single demo in a compute node

    • Dirs 1+4 = 5 (top level + one per core)

  • All compute nodes

    • 40*5 = 200

  • Workstations 10

    • 10*200 = 2000

  • Ten demos

    • 10*2000 = 20,000 directories to set up


Java program to configure cluster
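
The slide showed the program as a screenshot; the following is only a guess at its general shape, assuming it creates one top-level directory per compute node plus one per core (the 1 + 4 = 5 directories per node counted on the previous slide). All paths and names are hypothetical.

  import java.io.File;

  // Create the demo directory tree: one top-level directory per node and one
  // sub-directory per core, i.e. 5 directories per node for a single demo.
  public class ClusterDirSetupSketch {
      public static void main(String[] args) {
          int nodes = 40;
          int coresPerNode = 4;
          for (int n = 1; n <= nodes; n++) {
              File nodeDir = new File("demo1/node" + n);   // hypothetical path
              nodeDir.mkdirs();
              for (int c = 1; c <= coresPerNode; c++) {
                  new File(nodeDir, "core" + c).mkdirs();
              }
          }
          System.out.println("Created " + (nodes * (1 + coresPerNode)) + " directories");
      }
  }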


UoS Cluster Computer Inaugural Event


UoS Cluster Computer Inaugural Event

  • Date: Thursday 24 April 2008

  • Time: 5.30pm

  • Venue: St Peter’s Campus

  • Three speakers (each 20 minutes)

    • John MacIntyre - UoS

    • Robert Starmer - Cisco San Jose

    • TBA - Dell Computers


USCC Inaugural Event

  • Attendance is free

  • Anyone wishing to attend is asked to register beforehand to facilitate catering

  • Contact via email

    [email protected]


The End

  • Thank you for your attention

  • Any questions?

  • Slides and further information available at

  • http://157.228.27.155/website/CLUSTER-GRID/

