High performance cluster computing architectures and systems
Download
1 / 44

High Performance Cluster Computing Architectures and Systems - PowerPoint PPT Presentation


  • 829 Views
  • Updated On :

High Performance Cluster Computing Architectures and Systems. Book Editor: Rajkumar Buyya Slides Prepared by: Hai Jin. Internet and Cluster Computing Center. Introduction. Need more computing power Improve the operating speed of processors & other components

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'High Performance Cluster Computing Architectures and Systems' - Jeffrey


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
High performance cluster computing architectures and systems l.jpg

High Performance Cluster ComputingArchitectures and Systems

Book Editor: Rajkumar Buyya

Slides Prepared by: Hai Jin

Internet and Cluster Computing Center


Introduction l.jpg
Introduction

  • Need more computing power

    • Improve the operating speed of processors & other components

      • constrained by the speed of light, thermodynamic laws, & the high financial costs for processor fabrication

    • Connect multiple processors together & coordinate their computational efforts

      • parallel computers

      • allow the sharing of a computational task among multiple processors


Era of computing l.jpg
Era of Computing

  • Rapid technical advances

    • the recent advances in VLSI technology

    • software technology

      • OS, PL, development methodologies, & tools

    • grand challenge applications have become the main driving force

  • Parallel computing

    • one of the best ways to overcome the speed bottleneck of a single processor

    • good price/performance ratio of a small cluster-based parallel computer


Need of more computing power grand challenge applications l.jpg

Geographic

Information

Systems

Need of more Computing Power:Grand Challenge Applications

Solving technology problems using computer modeling, simulation and analysis

Aerospace

Life Sciences

CAD/CAM

Digital Biology

Military Applications


Parallel computer architectures l.jpg
Parallel Computer Architectures

  • Taxonomy

    • based on how processors, memory & interconnect are laid out

  • Massively Parallel Processors (MPP)

  • Symmetric Multiprocessors (SMP)

  • Cache-Coherent Nonuniform Memory Access (CC-NUMA)

  • Distributed Systems

  • Clusters

  • Grids


Parallel computer architectures6 l.jpg
Parallel Computer Architectures

  • MPP

    • A large parallel processing system with a shared-nothing architecture

    • Consist of several hundred nodes with a high-speed interconnection network/switch

    • Each node consists of a main memory & one or more processors

      • Runs a separate copy of the OS

  • SMP

    • 2-64 processors today

    • Shared-everything architecture

    • All processors share all the global resources available

    • Single copy of the OS runs on these systems


Parallel computer architectures7 l.jpg
Parallel Computer Architectures

  • CC-NUMA

    • a scalable multiprocessor system having a cache-coherent nonuniform memory access architecture

    • every processor has a global view of all of the memory

  • Distributed systems

    • considered conventional networks of independent computers

    • have multiple system images as each node runs its own OS

    • the individual machines could be combinations of MPPs, SMPs, clusters, & individual computers

  • Clusters

    • a collection of workstations of PCs that are interconnected by a high-speed network

    • work as an integrated collection of resources

    • have a single system image spanning all its nodes


Towards low cost parallel computing l.jpg
Towards Low Cost Parallel Computing

  • Parallel processing

    • linking together 2 or more computers to jointly solve some computational problem

    • since the early 1990s, an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards networks of workstations

    • the rapid improvement in the availability of commodity high performance components for workstations and networks

      Low-cost commodity supercomputing

    • from specialized traditional supercomputing platforms to cheaper, general purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations

    • need to standardization of many of the tools and utilities used by parallel applications (ex) MPI, HPF


Windows of opportunities l.jpg
Windows of Opportunities

  • Parallel Processing

    • Use multiple processors to build MPP/DSM-like systems for parallel computing

  • Network RAM

    • Use memory associated with each workstation as aggregate DRAM cache

  • Software RAID

    • Redundant array of inexpensive disks

    • Use the arrays of workstation disks to provide cheap, highly available, & scalable file storage

    • Possible to provide parallel I/O support to applications

    • Use arrays of workstation disks to provide cheap, highly available, and scalable file storage

  • Multipath Communication

    • Use multiple networks for parallel data transfer between nodes


Cluster computer and its architecture l.jpg
Cluster Computer and its Architecture

  • A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource

  • A node

    • a single or multiprocessor system with memory, I/O facilities, & OS

    • generally 2 or more computers (nodes) connected together

    • in a single cabinet, or physically separated & connected via a LAN

    • appear as a single system to users and applications

    • provide a cost-effective way to gain features and benefits



Prominent components of cluster computers i l.jpg
Prominent Components of Cluster Computers (I)

  • Multiple High Performance Computers

    • PCs

    • Workstations

    • SMPs (CLUMPS)

    • Distributed HPC Systems leading to Metacomputing


Prominent components of cluster computers iii l.jpg
Prominent Components of Cluster Computers (III)

  • High Performance Networks/Switches

    • Ethernet (10Mbps),

    • Fast Ethernet (100Mbps),

    • Gigabit Ethernet (1Gbps)

    • SCI (Dolphin - MPI- 12micro-sec latency)

    • ATM

    • Myrinet (1.2Gbps)

    • Digital Memory Channel

    • FDDI


Prominent components of cluster computers v l.jpg
Prominent Components of Cluster Computers (V)

  • Fast Communication Protocols and Services

    • Active Messages (Berkeley)

    • Fast Messages (Illinois)

    • U-net (Cornell)

    • XTP (Virginia)


Prominent components of cluster computers vi l.jpg
Prominent Components of Cluster Computers (VI)

  • Cluster Middleware

    • Single System Image (SSI)

    • System Availability (SA) Infrastructure

  • Hardware

    • DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques

  • Operating System Kernel/Gluing Layers

    • Solaris MC, Unixware, GLUnix

  • Applications and Subsystems

    • Applications (system management and electronic forms)

    • Runtime systems (software DSM, PFS etc.)

    • Resource management and scheduling software (RMS)

      • CODINE, LSF, PBS, NQS, etc.


Prominent components of cluster computers vii l.jpg
Prominent Components of Cluster Computers (VII)

  • Parallel Programming Environments and Tools

    • Threads (PCs, SMPs, NOW..)

      • POSIX Threads

      • Java Threads

    • MPI

      • Linux, NT, on many Supercomputers

    • PVM

    • Software DSMs (TreadMark)

    • Compilers

      • C/C++/Java

      • Parallel programming with C++ (MIT Press book)

    • RAD (rapid application development tools)

      • GUI based tools for PP modeling

    • Debuggers

    • Performance Analysis Tools

    • Visualization Tools


Prominent components of cluster computers viii l.jpg
Prominent Components of Cluster Computers (VIII)

  • Applications

    • Sequential

    • Parallel / Distributed (Cluster-aware app.)

      • Grand Challenge applications

        • Weather Forecasting

        • Quantum Chemistry

        • Molecular Biology Modeling

        • Engineering Analysis (CAD/CAM)

        • ……………….

      • PDBs, web servers,data-mining


Key operational benefits of clustering l.jpg
Key Operational Benefits of Clustering

  • High Performance

  • Expandability and Scalability

  • High Throughput

  • High Availability


Clusters classification iii l.jpg
Clusters Classification (III)

  • Node Hardware

    • Clusters of PCs (CoPs)

      • Piles of PCs (PoPs)

    • Clusters of Workstations (COWs)

    • Clusters of SMPs (CLUMPs)


Clusters classification v l.jpg
Clusters Classification (V)

  • Node Configuration

    • Homogeneous Clusters

      • All nodes will have similar architectures and run the same OSs

    • Heterogeneous Clusters

      • All nodes will have different architectures and run different OSs


Clusters classification vi l.jpg
Clusters Classification (VI)

  • Levels of Clustering

    • Group Clusters (#nodes: 2-99)

      • Nodes are connected by SAN like Myrinet

    • Departmental Clusters (#nodes: 10s to 100s)

    • Organizational Clusters (#nodes: many 100s)

    • National Metacomputers (WAN/Internet-based)

    • International Metacomputers (Internet-based, #nodes: 1000s to many millions)

      • Metacomputing

      • Web-based Computing

      • Agent Based Computing

        • Java plays a major in web and agent based computing


Commodity components for clusters iii l.jpg
Commodity Components for Clusters (III)

  • Disk and I/O

    • Overall improvement in disk access time has been less than 10% per year

    • Amdahl’s law

      • Speed-up obtained from faster processors is limited by the slowest system component

    • Parallel I/O

      • Carry out I/O operations in parallel, supported by parallel file system based on hardware or software RAID


Cluster middleware ssi l.jpg
Cluster Middleware & SSI

  • SSI

    • Supported by a middleware layer that resides between the OS and user-level environment

    • Middleware consists of essentially 2 sublayers of SW infrastructure

      • SSI infrastructure

        • Glue together OSs on all nodes to offer unified access to system resources

      • System availability infrastructure

        • Enable cluster services such as checkpointing, automatic failover, recovery from failure, & fault-tolerant support among all nodes of the cluster


What is single system image ssi l.jpg
What is Single System Image (SSI) ?

  • A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.

  • SSI makes the cluster appear like a single machine to the user, to applications, and to the network.

  • A cluster without a SSI is not a cluster


Ssi boundaries an applications ssi boundary l.jpg

SSI

Boundary

SSI Boundaries -- an applications SSI boundary

Batch System

(c) In search

of clusters


Single system image benefits l.jpg
Single System Image Benefits

  • Provide a simple, straightforward view of all system resources and activities, from any node of the cluster

  • Free the end user from having to know where an application will run

  • Free the operator from having to know where a resource is located

  • Let the user work with familiar interface and commands and allows the administrators to manage the entire clusters as a single entity

  • Reduce the risk of operator errors, with the result that end users see improved reliability and higher availability of the system


Single system image benefits cont d l.jpg
Single System Image Benefits (Cont’d)

  • Allowing centralize/decentralize system management and control to avoid the need of skilled administrators from system administration

  • Present multiple, cooperating components of an application to the administrator as a single application

  • Greatly simplify system management

  • Provide location-independent message communication

  • Help track the locations of all resource so that there is no longer any need for system operators to be concerned with their physical location

  • Provide transparent process migration and load balancing across nodes.

  • Improved system response time and performance


Resource management and scheduling rms l.jpg
Resource Management and Scheduling (RMS)

  • RMS is the act of distributing applications among computers to maximize their throughput

  • Enable the effective and efficient utilization of the resources available

  • Software components

    • Resource manager

      • Locating and allocating computational resource, authentication, process creation and migration

    • Resource scheduler

      • Queueing applications, resource location and assignment

  • Reasons using RMS

    • Provide an increased, and reliable, throughput of user applications on the systems

    • Load balancing

    • Utilizing spare CPU cycles

    • Providing fault tolerant systems

    • Manage access to powerful system, etc

  • Basic architecture of RMS: client-server system


Services provided by rms l.jpg
Services provided by RMS

  • Process Migration

    • Computational resource has become too heavily loaded

    • Fault tolerant concern

  • Checkpointing

  • Scavenging Idle Cycles

    • 70% to 90% of the time most workstations are idle

  • Fault Tolerance

  • Minimization of Impact on Users

  • Load Balancing

  • Multiple Application Queues


Computing platforms evolution breaking administrative barriers l.jpg

2100

2100

2100

2100

2100

2100

2100

2100

2100

Computing Platforms EvolutionBreaking Administrative Barriers

?

PERFORMANCE

Administrative Barriers

Individual

Group

Department

Campus

State

National

Globe

Inter Planet

Universe

Desktop

(Single Processor)

SMPs or SuperComputers

Local

Cluster

Enterprise

Cluster/Grid

Inter Planet

Cluster/Grid ??

Global

Cluster/Grid


Why do we need metacomputing l.jpg
Why Do We Need Metacomputing?

  • Our computational needs are infinite, whereas our financial resources are finite

    • users will always want more & more powerful computers

    • try & utilize the potentially hundreds of thousands of computers that are interconnected in some unified way

    • need seamless access to remote resources



What is grid l.jpg
What is Grid ?

  • An infrastructure that couples

    • Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDA, and so on) …

    • Software (e.g., renting expensive special purpose applications on demand)

    • Databases (e.g., transparent access to human genome database)

    • Special Instruments (e.g., radio telescope--SETI@Home Searching for Life in galaxy, Austrophysics@Swinburne for pulsars)

    • People (may be even animals who knows ?)

  • Across the Internet and presents them as an unified integrated (single) resource

http://www.csse.monash.edu.au/~rajkumar/ecogrid/


Conceptual view of the grid l.jpg
Conceptual view of the Grid

Leading to Portal (Super)Computing


Grid application drivers l.jpg
Grid Application-Drivers

  • Old and new applications getting enabled due to coupling of computers, databases, instruments, people, etc.

    • (distributed) Supercomputing

    • Collaborative engineering

    • High-throughput computing

      • large scale simulation & parameter studies

    • Remote software access / Renting Software

    • Data-intensive computing

    • On-demand computing


The grid impact l.jpg
The Grid Impact

“The global computational grid is expected to drive the economy of the 21st century similar to the electric power grid that drove the economy of the 20th century”


Metacomputer design objectives and issues ii l.jpg
MetacomputerDesign Objectives and Issues (II)

  • Underlying Hardware and Software Infrastructure

    • A metacomputing environment must be able to operate on top of the whole spectrum of current and emerging HW & SW technology

    • An ideal environment will provide access to the available resources in a seamless manner such that physical discontinuities such as difference between platforms, network protocols, and administrative boundaries become completely transparent


Metacomputer design objectives and issues iii l.jpg
MetacomputerDesign Objectives and Issues (III)

  • Middleware – The Metacomputing Environment

    • Communication services

      • needs to support protocols that are used for bulk-data transport, streaming data, group communications, and those used by distributed objects

    • Directory/registration services

      • provide the mechanism for registering and obtaining information about the metacomputer structure, resources, services, and status

    • Processes, threads, and concurrency control

      • share data and maintain consistency when multiple processes or threads have concurrent access to it


Metacomputer design objectives and issues v l.jpg
MetacomputerDesign Objectives and Issues (V)

  • Middleware – The Metacomputing Environment

    • Security and authorization

      • confidentiality: prevent disclosure of data

      • integrity: prevent tampering with data

      • authorization: verify identity

      • accountability: knowing whom to blame

    • System status and fault tolerance

    • Resource management and scheduling

      • efficiently and effectively schedule the applications that need to utilize the available resource in the metacomputing environment


Metacomputer design objectives and issues vi l.jpg
MetacomputerDesign Objectives and Issues (VI)

  • Middleware – The Metacomputing Environment

    • Programming tools and paradigms

      • include interface, APIs, and conversion tools so as to provide a rich development environment

      • support a range of programming paradigms

      • a suite of numerical and other commonly used libraries should be available

    • User and administrative GUI

      • intuitive and easy to use interface to the services and resources available

    • Availability

      • easily port on to a range of commonly used platforms, or use technologies that enable it to be platform neutral


Metacomputing projects l.jpg
Metacomputing Projects

  • Globus (from Argonne National Laboratory)

    • provides a toolkit on a set of existing components to build metacomputing environments

  • Legion (from the University of Virginia)

    • provides a high-level unified object model out of new and existing components to build a metasystem

  • Webflow (from Syracuse University)

    • provides a Web-based metacomputing environment


Globus i l.jpg
Globus (I)

  • A computational grid

    • A hardware and software infrastructure to provide dependable, consistent, and pervasive access to high-end computational capabilities, despite the geographical distribution of both resources and users

  • A layered architecture

    • high-level global services are built upon essential low-level core local services

  • Globus Toolkit (GT)

    • a central element of the Globus system

    • defines the basic services and capabilities required to construct a computational grid

    • consists of a set of components that implement basic services

    • provides a bag of services

    • only possible when the services are distinct and have well-defined interfaces (API)


Globus ii l.jpg
Globus (II)

  • Globus Alliance

  • http://www.globus.org

  • GT 3.0

    • Resources management (GRAM)

    • Information Service (MDS)

    • Data Management (GridFTP)

    • Security (GSI)

  • GT 4.0 (2005)

    • Execution management

    • Information Services

    • Data management

    • Security

    • Common runtime (WS)


The impact of metacomputing l.jpg
The Impact of Metacomputing

  • Metacomputing is an infrastructure that can bond and unify globally remote and diverse resources

  • At some stage in the future, our computing needs will be satisfied in same pervasive and ubiquitous manner that we use the electricity power grid


ad