
The University of Sunderland Cluster Computer


Presentation Transcript


  1. The University of Sunderland Cluster Computer IET Lecture by John Tindle Northumbria Network, ICT Group Monday 11 February 2008

  2. Overview of talk • SRIF3 and Potential Vendors • General Requirements • Areas of Application • Development Team • Cluster Design • Cluster System Hardware + Software • Demonstrations

  3. United Kingdom – Science Research Investment Fund (SRIF) • The Science Research Investment Fund (SRIF) is a joint initiative by the Office of Science and Technology (OST) and the Department for Education and Skills (DfES). The purpose of SRIF is to contribute to higher education institutions' (HEIs) long-term sustainable research strategies and address past under-investment in research infrastructure.

  4. SRIF3 • Funding split: SRIF3 90%, UoS 10% • Project duration about two years • Made operational by late December 2007 • Heriot-Watt University - coordinator

  5. Potential Grid Computer Vendors • Dell – selected vendor • CompuSys – SE England • Streamline - Midlands • Fujitsu - Manchester • ClusterVision - Netherlands • OCF - Sheffield

  6. General requirements

  7. General requirements • High performance general purpose computer • Built using standard components • Commodity off the shelf (COTS) • Low cost PC technology • Reuse existing skills - Ethernet • Easy to maintain - hopefully

  8. Designed for Networking Experiments • Require flexible networking infrastructure • Modifiable under program control • Managed switch required • Unmanaged switch often employed in standard cluster systems • Fully connected programmable intranet

  9. System Supports • Rate limiting • Quality of service (QoS) • Multiprotocol Label Switching (MPLS) • VLANs and VPNs • IPv4 and IPv6 supported in hardware • Programmable queue structures
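
Slides 8 and 9 state that the intranet is modifiable under program control via the managed switch, but they do not show how that is scripted in practice. As an illustration only, the sketch below uses the netmiko Python library to push a small VLAN change to a Cisco IOS device such as the 6509; the management address, credentials, VLAN number and port are all hypothetical.

    # Illustrative sketch: reconfiguring the managed switch from a script.
    # Assumes the netmiko library; host, credentials, VLAN and port numbers
    # are hypothetical - the lecture does not say how the intranet is driven.
    from netmiko import ConnectHandler

    switch = {
        "device_type": "cisco_ios",
        "host": "10.0.0.1",        # hypothetical management address of the 6509
        "username": "admin",       # hypothetical credentials
        "password": "secret",
    }

    # Create a VLAN for a networking experiment and move one node's port into it.
    commands = [
        "vlan 110",
        "name media-experiment",
        "interface GigabitEthernet1/10",
        "switchport",
        "switchport mode access",
        "switchport access vlan 110",
    ]

    conn = ConnectHandler(**switch)
    print(conn.send_config_set(commands))
    conn.disconnect()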

  10. Special requirements 1 • Operation at normal room temperature • Typical existing systems require • a low air inlet temperature (< 5 degrees C) • a dedicated server room with air conditioning • Low acoustic noise output • Dual boot capability • Windows or Linux in any proportion

  11. Special requirements 2 continued • Concurrent processing, for example • 75% of the boxes' cores running Windows • 25% of the boxes' cores running Linux • CPU power control – 4 levels • High resolution displays for media and data visualisation

  12. Advantages of design • Heat generated is not vented to the outside atmosphere • Air conditioning running costs are not incurred • The heat is used to warm the building • Compute nodes (height 2U) use relatively large diameter, low noise fans

  13. Areas of application

  14. Areas of application 1. Media systems – 3D rendering 2. Networking experiments MSc Network Systems – large cohort 3. Engineering computing 4. Numerical optimisation 5. Video streaming 6. IP Television

  15. Application cont 1 7. Parallel distributed computing 8. Distributed databases 9. Remote teaching experiments 10. Semantic web 11. Search large image databases 12. Search engine development 13. Web based data analysis

  16. Application cont 2 14. Computational fluid dynamics 15. Large scale data visualisation using high resolution colour computer graphics

  17. UoS Cluster Development Team • From left to right • Kevin Ginty • Simon Stobart • John Tindle • Phil Irving • Matt Hinds • Note - all wearing Dell tee shirts

  18. UoS Team

  19. UoS Cluster

  20. Work Area - At last, all up and running!

  21. UoS Estates Department • Very good project work was completed by the UoS Estates Department • Electrical network design • Building air flow analysis • Computing Terraces • Heat dissipation • Finite element (FE) study and analysis • Work area refurbishment

  22. Cluster Hardware

  23. Cluster Hardware • The system has been built using • Dell compute nodes • Cisco networking components • Grid design contributions from both Dell and Cisco

  24. Basic Building Block • Compute nodes • Dell PE2950 server • Height 2U • Two dual-core processors • Four cores per box • RAM 8G, 2G per core • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/

  25. Compute Nodes • Network interface cards - 3 off • Local disk drives - 250G SATA II • The large amount of RAM facilitates virtual computing experiments • VMware Server and MS Virtual PC

  26. Cisco 6509 switch • Cisco 6509 URL (1 off) • Cisco 720 supervisor engines (2 off) • Central network switch for the cluster • RSM - router switch module • Provides

  27. 6509 Provides • 720 Gbps switching fabric, full duplex • Port cards (4 off) • Virtual LANs - VLAN • Virtual private networks - VPN • Link bandwidth throttling • Traffic prioritisation, QoS • Network experimentation

  28. Cluster Intranet • The network has three buses • Data • IPC • IPMI

  29. 1. Data bus • User data bus • A normal data bus required for interprocessor communication between user applications
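
The slides do not name the message-passing layer that runs over this bus. Purely as an illustration, here is a minimal MPI point-to-point exchange in Python using mpi4py (an assumed library choice), of the kind that would travel over the data network between two compute nodes.

    # Minimal sketch of interprocessor communication between two processes.
    # mpi4py is an assumption - the lecture does not name the MPI library used.
    # Run with, for example:  mpirun -np 2 python ping.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # Process 0 sends a small Python object to process 1 over the data network.
        comm.send({"task": "render-frame", "frame": 42}, dest=1, tag=0)
    elif rank == 1:
        msg = comm.recv(source=0, tag=0)
        print(f"rank 1 received: {msg}")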

  30. 2. IPC Bus • Inter process communication (IPC) • “The Microsoft Windows operating system provides mechanisms for facilitating communications and data sharing between applications. • Collectively, the activities enabled by these mechanisms are called interprocess communications (IPC). Some forms of IPC facilitate the division of labor among several specialized processes”.

  31. IPC Bus continued • “Other forms of IPC facilitate the division of labor among computers on a network” • Ref: Microsoft website • IPC is controlled by the OS • For example, IPC is used to transfer and install new disk images on compute nodes • Disk imaging is a complex operation

  32. 3. IPMI Bus • IPMI • Intelligent Platform Management Interface (IPMI) specification defines a set of common interfaces to computer hardware and firmware which system administrators can use to monitor system health and manage the system.
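
As a concrete (and purely illustrative) example of what the IPMI bus enables, the sketch below polls a node's baseboard management controller with the open-source ipmitool client from Python. The BMC address and credentials are hypothetical, and the lecture does not say which management tool is actually used on the cluster.

    # Illustrative sketch: querying node power state over the IPMI network
    # with ipmitool. BMC address and credentials are hypothetical.
    import subprocess

    def power_status(bmc_host, user, password):
        """Ask a node's baseboard management controller for its power state."""
        result = subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", bmc_host,
             "-U", user, "-P", password, "chassis", "power", "status"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()   # e.g. "Chassis Power is on"

    print(power_status("10.0.1.21", "root", "changeme"))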

  33. Master Rack A • Linux and Microsoft • 2 – PE2950 control nodes • 5 – PE1950 web servers • Cisco Catalyst 6509 • 2 * Supervisor Engine 720 • 4 * 48 port cards (192 ports)

  34. Master Rack A cont • Compute nodes require • 40*3 = 120 connections • Disk storage 1 – MD1000 • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/ • Master rack resilient to mains failure • Power supply • 6 kVA APC (hard wired 24 Amp PSU)

  35. Master Rack A KVM Switch • Ethernet KVM switch • Keyboard, Video display, Mouse - KVM • Provides user access to the head nodes • Windows head node, named – “Paddy” • Linux head node, named - “Max” • Movie USCC MVI_6991.AVI

  36. Rack B Infiniband • InfiniBand is a switched fabric communications link primarily used in high-performance computing. • Its features include quality of service and failover and it is designed to be scalable. • The InfiniBand architecture specification defines a connection between processor nodes and high performance I/O nodes.

  37. Infiniband Rack B • 6 – PE2950 each with two HCAs • 1 – Cisco 7000P router • Host channel adapter (HCA) link • http://157.228.27.155/website/CLUSTER-GRID/Cisco-docs1/HCA/ • Infiniband • http://en.wikipedia.org/wiki/InfiniBand

  38. Cisco Infiniband • Cisco 7000P • High speed bus - 10 Gbit/s • Low latency - < 1 microsecond • Infiniband - 6 compute nodes • 24 CPU cores • High speed serial communication
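
A back-of-envelope calculation shows what these figures mean for a message passed between nodes (the 1 MB message size is just an example).

    # Rough illustration of the InfiniBand figures quoted above.
    latency_s = 1e-6             # quoted latency, < 1 microsecond
    bandwidth_bps = 10e9         # quoted link speed, 10 Gbit/s

    message_bytes = 1_000_000    # example message size (hypothetical)
    transfer_s = latency_s + (message_bytes * 8) / bandwidth_bps
    print(f"1 MB transfer takes roughly {transfer_s * 1e3:.2f} ms")   # ~0.80 ms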

  39. Infiniband • Many parallel channels • PCI Express bus (serial DMA) • Direct memory access (DMA)

  40. General compute Rack C • 11 – PE2950 compute nodes • Product details

  41. Racks • A * 1 - 2 control nodes (+ 5 servers), GigE • B * 1 - 6 Infiniband (overlay) • C * 3 - 11 each (33), GigE • N * 1 - 1 (Cisco Netlab + VoIP) • Total compute nodes • 2 + 6 + 33 + 1 = 42

  42. Rack Layout • - C C B A C N - • F C C B A C N F • Future expansion – F • KVM video - MVI_6994.AVI

  43. Summary - Dell Server 2950 • Number of nodes 40 + 1 (Linux) + 1 (Windows) • Number of compute nodes 40 • Intel Xeon Woodcrest 2.66GHz • Two dual-core processors • GigE NICs – 3 off per server • RAM 8G, 2G per core • Disks 250G SATA II

  44. Summary - cluster speedup • Compare time taken to complete a task • Time on cluster = 1 hour • Time using a single CPU = 160 hours, or • 160/24 ≈ 6.7 days, approx. 1 week • Facility available for use by companies • “Software City” startup companies
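
The quoted figures imply a speedup of about 160, which matches the 40 nodes * 4 cores = 160 compute cores summarised on the previous slide, i.e. near-ideal scaling for this particular task. A quick check of the arithmetic:

    # Check of the speedup figures on this slide, assuming the task scales
    # almost linearly across the 160 compute cores (40 nodes x 4 cores).
    serial_hours = 160             # time on a single CPU
    cluster_hours = 1              # time on the cluster
    cores = 40 * 4                 # 160 compute cores

    speedup = serial_hours / cluster_hours    # 160x
    efficiency = speedup / cores              # 1.0 would be perfect scaling
    serial_days = serial_hours / 24           # ~6.7 days, approx. one week
    print(f"speedup {speedup:.0f}x, efficiency {efficiency:.0%}, "
          f"single-CPU run would take {serial_days:.1f} days")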

  45. Data storage • Master nodes via PERC5e to MD1000 using 15 x 500G SATA drives • Disk storage 7.5T • Linux 7 disks • MS 2003 Server HPC 8 disks • MD1000 URL • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs2/

  46. Power • Total maximum load generated by Dell cluster cabinets • Total load = 20,742 W (approx. 20.7 kW) • Values determined by using Dell's integrated system design tool • Power and Noise
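
Read as watts, the quoted total works out at roughly 500 W per compute node, which is plausible for a 2U dual-socket server; the quick division below assumes the load is dominated by the 42 compute nodes (head nodes, web servers and switches also draw from the same budget).

    # Rough sanity check of the cabinet load figure (assumed to be in watts).
    total_load_w = 20_742
    compute_nodes = 42
    print(f"~{total_load_w / compute_nodes:.0f} W per node, as an upper bound")  # ~494 W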

  47. Web servers • PE1950 • Height 1U • Five servers • Web services • Domain controller, DNS, DHCP, etc. • http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/

  48. Access Workstations • Dell workstations (10 off) • Operating Systems WinXP Pro • HD displays LCD (4 off) • Size 32 inch wall mounted • Graphics NVS285 – 8*2 GPUs • Graphics NVS440 – 2*4 GPU • Graphics processor units • Support for HDTV

  49. Block Diagram

  50. Movie USCC • MVI_6992.AVI
