Building a Low-Cost Supercomputer

Dr. Tim McGuire

Sam Houston State University

ACET 2000

Austin, TX


Acknowledgments

  • Most treatments of cluster computing (including this one) are heavily based on the seminal work of Greg Pfister (IBM Research, Austin), In Search of Clusters

  • The concept of Beowulf clusters originated with Donald J. Becker and Thomas Sterling at the Center of Excellence in Space Data and Information Sciences, NASA Goddard Space Flight Center


  • There are three ways to do anything faster:

  • Work harder

    • "Crunch Time" is familiar to all of us

  • Work smarter

    • Better to find a way to reduce the work needed

  • Get help

    • Certainly works, but we all know about committees ...

In a computer ...

  • Working Harder

    Get a faster processor

  • Working Smarter

    Use a better algorithm

  • Getting Help

    Parallel processing

Working Harder -- Faster Processors

  • The effect of faster processors is astonishing

    • The effective speed of the x86 family of processors has increased nearly 50% per year

    • RISC architectures have sustained a 60% annual cumulative growth rate

  • These trends will likely continue for the foreseeable future

Working Smarter -- Better Algorithms

  • The increases in speed made possible by better algorithms dwarf the accomplishments of faster hardware

  • Binary search on 1 billion items takes at most 30 comparisons, versus a maximum of one billion comparisons using linear search
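To check these comparison counts, here is a minimal Python sketch (the function names are hypothetical) that counts probes in a textbook binary search. It searches a `range` object, which is sorted and indexable, so a billion "items" need no memory; each loop iteration is counted as one comparison.

```python
def binary_search(items, target):
    """Search a sorted sequence; return (index or -1, number of probes).

    Each loop iteration examines one element, so one probe is counted per
    iteration (a three-way compare is treated as a single comparison).
    """
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def worst_case_binary(n):
    """Worst-case probes for n items: floor(log2(n)) + 1."""
    return n.bit_length()


def worst_case_linear(n):
    """Worst case for linear search: every item is examined."""
    return n


if __name__ == "__main__":
    n = 10**9
    items = range(n)  # sorted and indexable without storing a billion values
    for target in (0, n // 3, n - 1, -5):
        index, probes = binary_search(items, target)
        print(target, index, probes)
    print("worst case, binary:", worst_case_binary(n))  # 30
    print("worst case, linear:", worst_case_linear(n))  # 1000000000
```

Counting with `bit_length` avoids floating-point `log2` rounding: 2^29 is about 537 million and 2^30 is about 1.07 billion, so a billion items need 30 probes in the worst case.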

Getting Help -- Parallel Processing

  • Covert parallel processing

    pipelining, vector processing, etc.

    really equivalent to faster hardware

  • Overt parallelism

    Done via software

  • "Parallelism is the wave of the future -- and always will be"

Early Attempts at Parallelism

  • Von Neumann thought it was too hard, and gave us the "Von Neumann bottleneck"

  • The 1960s ILLIAC IV project was the first great attempt at parallel processing (as well as an attempt to advance circuit and software technology).

  • The Japanese Fifth Generation Project launched another wave, including the Grand Challenge problems

Microprocessor Revolution

  • Microprocessors have had a superior price/performance ratio

  • "All you have to do is gang a whole bunch of them together"

  • The problem is "All you also have to do is program them to work together"

  • Programming costs much more than hardware

Highly Parallel Computing

  • Finally, in the early '90s, microprocessors became fast and powerful enough that a practical-sized aggregation of them seemed the only feasible way to exceed supercomputer speeds

  • Even Cray Research (T3D) got into the act

"Lowly" Parallel Processing

  • Mid-to-late '90s -- military downsizing (among other things) caused funding to dry up

  • However …

    • Microprocessors kept getting faster … a lot faster

      • With overall performance doubling each year, in 4 years what needed 256 processors can be done with 16 instead.

    • System availability became a mass market issue

      • Since computers are so cheap, buy two (or more) for redundancy in case one fails and use them both, interconnected by a network
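The arithmetic behind the 256-to-16 claim, assuming that per-processor performance exactly doubles every year and that the workload parallelizes perfectly (neither holds exactly in practice); here $P_0$ is the number of processors a job needs today and $t$ is the number of years elapsed:

$$
P(t) = \frac{P_0}{2^{t}}, \qquad P(4) = \frac{256}{2^{4}} = \frac{256}{16} = 16
$$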

SMP -- One Form of "Cheap" Parallelism

  • Symmetric multiprocessors have been around for some time and have certain advantages over clusters

    • Typically, these have been shared memory systems -- few communication problems

The Big Distinction -- Programming

  • How you program SMP systems is substantially different from programming clusters: Their programming models are different

  • If you explicitly exploit SMP in an application, it's essentially impossible to efficiently exploit clusters in the same program
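To make the difference between the two programming models concrete, the sketch below (not from the original talk) computes the same sum twice: SMP-style, with threads adding into one shared total under a lock, and cluster-style, with workers that keep their data private and report partial sums as messages. Python threads and a `queue.Queue` stand in for processors and the interconnect on a single machine; on a real Beowulf the message-passing version would be written against MPI or PVM. The function names are hypothetical.

```python
import queue
import threading


def _chunks(data, n_workers):
    """Split data into at most n_workers contiguous slices."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_memory_sum(data, n_workers=4):
    """SMP style: all workers add into one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(part):
        nonlocal total
        partial = sum(part)
        with lock:  # coordination happens through shared memory
            total += partial

    threads = [threading.Thread(target=worker, args=(p,))
               for p in _chunks(data, n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, n_workers=4):
    """Cluster style: workers keep private data and only send messages."""
    inbox = queue.Queue()  # stands in for the network to the master node
    parts = _chunks(data, n_workers)

    def worker(part):
        inbox.put(sum(part))  # "send" one message to the master

    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    total = sum(inbox.get() for _ in parts)  # master "receives" one message per worker
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(shared_memory_sum(data), message_passing_sum(data))  # 500500 500500
```

The shared-memory version depends on every worker seeing the same `total`; the message-passing version would still work if each worker ran on a separate machine, because the only thing that crosses between workers is the message.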

Why Clusters?

  • The Standard Litany

  • Why Now ?

  • Why Not Now?

The Standard Litany

  • Performance

  • Availability

  • Price/Performance Ratio

  • Incremental Growth

  • Scaling

  • Scavenging


Performance

  • No matter what form or measure of performance one is seeking (throughput, response time, turnaround time, etc.), it is straightforward to claim that one can get even more of it by using a bunch of machines at the same time.

  • Only occasionally does one hear the admission that a "tad bit" of new programming will be needed for anything to work correctly.


Availability

  • Having a computer shrivel up into an expensive paperweight can be a lot less traumatic if it's not unique, but rather one of a herd.

  • The work done by the dear departed sibling can be redistributed among the others (fail-soft computing)
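Fail-soft redistribution can be sketched in a few lines, assuming the work is a list of independent tasks dealt out round-robin. `assign`, `fail_node`, and the node and task names are hypothetical; a real cluster would also have to detect the failure and know which tasks were unfinished, which this sketch takes as given.

```python
from itertools import cycle


def assign(tasks, nodes):
    """Deal tasks out to nodes round-robin; returns {node: [tasks]}."""
    if not nodes:
        raise ValueError("need at least one node")
    plan = {node: [] for node in nodes}
    for task, node in zip(tasks, cycle(nodes)):
        plan[node].append(task)
    return plan


def fail_node(plan, dead):
    """Drop a failed node and deal its unfinished tasks to the survivors."""
    orphans = plan.pop(dead)
    survivors = list(plan)
    if not survivors:
        raise RuntimeError("no surviving nodes left to take over the work")
    for task, node in zip(orphans, cycle(survivors)):
        plan[node].append(task)
    return plan


if __name__ == "__main__":
    plan = assign([f"task{i}" for i in range(8)],
                  ["node1", "node2", "node3", "node4"])
    plan = fail_node(plan, "node3")  # node3 becomes a paperweight
    for node, tasks in plan.items():
        print(node, tasks)
```

Each survivor ends up with slightly more work, but no task is lost; that is the "fail-soft" behavior the slide describes.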

Price/Performance Ratio

  • Clusters and other forms of computer aggregation are typically collections of machines that individually have very good performance for their price.

  • The promise is that the aggregate retains the price/performance of its individual members.

Incremental Growth

  • To the degree that one really does attain greater performance and availability with a group of computers, one should be able to enhance both by merely adding more machines.

  • Replacing machines should not be necessary.


Scaling

  • "Scalable" is, unfortunately, a buzzword

  • What it does deal with is how big a computer system can usably get.

  • It is a crucial element in the differentiation between clusters and symmetric multiprocessors.


Scavenging

  • "Look at all those unused CPU cycles spread across all the desktops in our network…"

  • Unused cycles are free.

  • However, how do you get and manage them? -- this complicates cluster support very significantly

The Benefits are Real

  • But how does one take advantage of them?

  • The hardware provides the potential.

  • The fulfillment lies in the software, and unfortunately, software isn't riding the exponential growth curve.

Why Now?

  • Three Trends

    • Fat Boxes -- very high performance microprocessors

    • Fat Pipes -- standard high-speed communication

    • Thick Glue -- standard tools for distributed computing

  • One Market Requirement

    • High Availability

Fat Boxes

  • Microprocessors have kept, and will keep, getting faster.

  • Supercomputers in the classic style are extinct for practical purposes

  • Mass-market, inexpensive microprocessors have crawled up the tailpipe of the workstation market just like workstations crawled up the tailpipe of minicomputers and mainframes earlier.

  • There are no more supercomputers, there is only supercomputing.

Fat Pipes

  • Commodity off-the-shelf (COTS) networking parts have achieved communication performance that was previously possible only with expensive, proprietary techniques

  • Standardized communication facilities such as

    • ATM -- Asynchronous Transfer Mode

    • Switched Gigabit Ethernet

    • FCS -- Fibre Channel Standard

  • Performance of gigabytes per second is possible.

Thick Glue

  • Standard tools for distributed computing such as TCP/IP

  • Intranets, the Internet, and the World Wide Web

  • Tool sets for distributed system administration

  • PVM (Parallel Virtual Machine) and MPI (Message Passing Interface)

Requirement for High Availability

  • Nobody has ever wanted computers to break.

  • However, never before has high availability been a significant issue in the mass-market computer arena.

  • Clusters are uniquely capable of answering the needs of both sides of the spectrum and are much cheaper than hardware-based fault-tolerant approaches.

Why Not Now?

  • If they're so good, why haven't clusters become the most common mode of computation?

    • Lack of "single system image" software

    • Limited exploitation

Lack of Single System Image Software

  • Replacing a single large computer with a cluster means that many systems will have to be managed rather than one.

  • The available distributed management tools are just tools, not turnkey systems

  • 50% of the cost of a computer system is staffing, rather than hardware, software, or maintenance

Limited Exploitation

  • Only relatively few types of subsystems now exploit the ability of clusters to provide both scalable performance and high availability.

  • This is a direct result of substantial difficulties that arise in parallel programming.

  • The problem is not hardware, it's software

An Exception

  • For one kind of parallel system, the software issues have been addressed to a large degree: The symmetric multiprocessor (SMP)

  • Of necessity, it requires a single system image

Definitions, Distinctions, and Comparisons

  • Definition

  • Distinction from Parallel Systems

  • Distinctions from Distributed Systems

  • Comparisons and Contrasts


Definition

  • A cluster is a type of parallel or distributed system that:

    • consists of a collection of interconnected stand-alone computers, and

    • is used as a single, unified computing resource

  • We define them as a subparadigm of distributed (or parallel) systems

Distinction from Parallel Systems

  • A useful analogy:

    • This is a dog

    • (a single computer)

A Pack of Dogs

  • And this is a pack of dogs (running in parallel)

  • (a cluster)

A Savage Multiheaded Pooch

  • … or, pardon the abbreviation, "SMP"

  • (This pooch is no relation to Kerberos (Cerberus in Latin), who guards both the gates of Hades and distributed systems. He has only three heads.)

Dog Packs and SMPs are Similar

  • Both are more potent than just plain dogs

  • They can both bring down larger prey than a plain single dog.

  • They eat more and eat faster than a single dog

Dog Packs and SMPs are Different

  • Scaling

  • Availability

  • System Management

  • Software Licensing

Scaling Differences

  • The Savage Multiheaded Pooch can take many bites at once

  • What happens when it tries to swallow?

    • It needs a larger throat, stomach, intestines, etc.

    • Similarly, to scale SMPs, you must beef up the entire machine

  • When you add another dog to a dog pack, you add a whole dog. You don't have to do anything to the other dogs.

    • Likewise, clusters


Availability

  • If an SMP breaks a leg …

    "that dog won't hunt" … no matter how many heads it has.

  • If a member of the pack is injured, the rest of the pack can still bring down prey.

System Management

  • You only have to walk an SMP once.

  • It takes a good deal more effort to train a pack of dogs to behave.

  • With the SMP, all you have to do is get the heads to learn basic cooperation (and that should be built into the operating system.)

Licensing (Dogs or Software)

  • If you get a license for an SMP, you'll probably only need one license

  • For a cluster of dogs, you'll need one per dog

Distinctions from Distributed Systems

  • The distinction between clusters and distributed systems is not as clear (and a lot of people confuse the two).

  • We'll try. The salient points are:

    • Internal Anonymity

    • Peer Relationship

    • Clusters as part of a Distributed System

Internal Anonymity

  • Nodes in a distributed system necessarily retain their own individual identities

  • The elements of a cluster are usually viewed from outside the cluster as anonymous

    • Internally, they may be differentiated, but externally the jobs are submitted to the cluster, not, for example, to cluster node #4
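One way to picture internal anonymity is a front end that accepts jobs for the cluster as a whole and chooses a node itself, so callers never name a node. This is a minimal sketch: the `Cluster` class, its least-loaded placement policy, and the job costs are illustrative assumptions, not the behavior of any particular Beowulf scheduler.

```python
import heapq


class Cluster:
    """Front end that hides node identities behind a single submit() call."""

    def __init__(self, node_names):
        # Heap of (queued work, node name): the least-loaded node is on top.
        self._load = [(0, name) for name in node_names]
        heapq.heapify(self._load)
        self._placement = {}  # internal bookkeeping only

    def submit(self, job_id, cost):
        """Queue a job on whichever node currently has the least work."""
        load, node = heapq.heappop(self._load)
        heapq.heappush(self._load, (load + cost, node))
        self._placement[job_id] = node
        return job_id  # the caller gets a job handle, not a node

    def _node_of(self, job_id):
        """Internal view: inside the cluster, nodes are differentiated."""
        return self._placement[job_id]


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3", "node4"])
    for i, cost in enumerate([5, 3, 8, 1, 2, 7]):
        cluster.submit(f"job{i}", cost)  # no node number anywhere
```

From outside, every job goes to "the cluster"; only the front end's private bookkeeping knows that, say, the fifth job landed on the fourth node.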

Peer Relationship

  • Distributed systems

    • use an underlying communication layer that is peer-to-peer

    • at a higher level, they are often organized into a client-server paradigm

  • Clusters

    • underlying communication is peer-to-peer

    • organization is also peer-to-peer (with some minor exceptions)

Clusters as part of a Distributed System

  • Clusters usually exist in the context of a distributed system

  • In this case, they are viewed by the distributed system as a single node

    • For example, the cluster could serve as a compute engine

    • It also could serve as, say, a DBMS server in the client-server paradigm (but that's not the organization we want to consider in this presentation)

Beowulf Clusters

  • The Beowulf project was initiated in 1994 under the sponsorship of the NASA HPCC program to explore how computing could be made "cheaper better faster".

  • They termed this PoPC -- a Pile of PCs

The "Pile of PCs" Approach

  • Very similar to COW (cluster of workstations) and shares the roots of NOW (network of workstations), but emphasizes:

    • COTS (commodity off the shelf) components

    • dedicated processors (rather than scavenging cycles from idle workstations)

    • a private system area network (enclosed SAN rather than exposed LAN)

What Beowulf Adds

  • Beowulf adds to the PoPC model by emphasizing

    • no custom components

    • easy replication from multiple vendors

    • scalable I/O

    • a freely available software base

    • using freely available distributed computing tools with minimal changes

    • a collaborative design

Advantages of the Beowulf Approach

  • No single vendor owns the rights to the product -- not vulnerable to single vendor decisions

  • Approach permits technology tracking -- using the best, most recent components at the best price

  • Allows "just in place" configuration -- permits flexible and user driven decisions

Software for Beowulf

  • Exploits readily available, usually free software systems

  • These systems are as sophisticated, robust, and efficient as commercial-grade software

  • Derived from community-wide collaborations in operating systems, languages, compilers, and parallel computing libraries

Operating Systems, etc.

  • Two of the operating systems used are

    • Linux (Slackware, RedHat, and Debian distributions are all used)

    • FreeBSD

  • Both have

    • commercial distributors and support

    • full X Window System support

    • a variety of shells

    • a variety of quality compilers

    • message passing libraries, such as PVM and MPI

Beowulf Architecture

  • Beowulf clusters have been assembled around every new generation of commodity CPUs since the first 100 MHz 486DX4 in 1994

    • The idea here is to use fast but cheap CPUs

  • We also need to interconnect them with a private network that is fast but cheap

    • Originally used channel-bonded 10Mbit/sec Ethernet with multiple Ethernet cards per CPU because the 100 MHz processors were faster than the network

    • When 100Mbit/sec Ethernet cards and switches became available, channel bonding was discarded


  • Mostly, Intel 80x86 CPUs have been used, but Beowulf-class clusters have been constructed from other chips such as the DEC Alpha

  • Fast Ethernet is most commonly used, but some use 1Gb/sec Ethernet or Myrinet (about 2.5Gb/sec) where performance is worth the much higher cost


  • Small systems (< 24 nodes) have a simple topology -- a single switch

  • (If price outweighs performance, a hub may be used instead of a switch)



Connecting to the Outside World

  • If the switch is smart (read expensive) it may be connected directly to the LAN

  • Most often, however, one node of the cluster has a second (slower) network card connecting it to the LAN

Larger Systems

  • Beyond 24 nodes, suitable switches just do not exist for a single-switch solution

  • A two-level tree whose (non-leaf) nodes are 16-way Ethernet switches can handle over 200 processors

  • If locality can be exploited (itself a big problem), there is no major performance hit

  • For system wide random communication, the root node switch can be a severe bottleneck
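A back-of-the-envelope check of these numbers, under assumed conditions: every switch has k ports, each leaf switch spends one port on its uplink to the root, and all of the root's k ports go to leaf switches. The function names are hypothetical.

```python
def two_level_capacity(k):
    """Nodes reachable with one root switch and k leaf switches of k ports.

    Each leaf uses one port for its uplink, leaving k - 1 ports for nodes.
    """
    return k * (k - 1)


def fraction_through_root(k):
    """Fraction of uniformly random node-to-node messages that cross the root.

    A sender shares its leaf with k - 2 other nodes; every other destination
    hangs off a different leaf, so that traffic must pass through the root.
    """
    n = two_level_capacity(k)
    same_leaf = (k - 1) - 1
    return (n - 1 - same_leaf) / (n - 1)


if __name__ == "__main__":
    k = 16
    print(two_level_capacity(k))               # 240
    print(round(fraction_through_root(k), 3))  # 0.941
```

With k = 16 this gives 240 nodes (consistent with "over 200"), and about 94% of uniformly random messages cross the root while the 15 nodes on each leaf share a single uplink. That is why random all-to-all traffic saturates the root unless the application keeps most communication within a leaf.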

Example of a Larger System -- The Hive at NASA GSFC


Beowulf at SHSU -- Bubbawulf

  • Bubbawulf consists of 8 nodes

    • Pentium 350 with 64MB RAM

    • Main node has a 4GB disk

    • The other 7 nodes are headless and diskless

    • Interconnected through a Cisco 2900 switch (100Mb full-duplex switched Ethernet network)

    • The 7 diskless nodes (beowulf2 through beowulf8) mount their file systems from the main node via NFS

  • Bubbawulf cost about $15,000 and will eventually be upgraded to the maximum of 24 nodes

Cheaper Clusters

  • The WTAMU Beowulf Project, 1998

    • The "Buffalo" CHIP (Cluster of Hierarchical Integrated Processors)


    • Total cost < $2,500

Hardware

  • 16 port Fast Ethernet switch ($1000)

  • Node 0 (Scavenged -- my 1995 desktop)

    • Intel Pentium 90 processor

    • 16 MB RAM

    • 1.0 GB Hard drive

    • 3COM 3c905b 10Mbs Ethernet card (connection to outside world)

    • Linksys LNE100TX 100Mbs Ethernet card

    • 8x CD-ROM

Hardware

  • Nodes 1-3 ($500 each -- mail order)

    • Intel Pentium 200 processors

    • 32 MB RAM

    • 3.2 GB hard drives

    • Linksys LNE100TX 100Mbs Ethernet card

    • 40x CD-ROM

Software

  • Operating System

    • RedHat Linux 6.0

  • Message Passing Interface

    • LAM MPI version 6.3-3b

Free Clusters

  • The epitome of free clusters is the "Stone Soup-ercomputer" at Oak Ridge National Laboratory

  • A group of physicists with no budget built a Beowulf cluster from cast-off PCs and outdated network hardware

The Stone Soupercomputer

How to Build a "Free" Beowulf

  • Gather a bunch of machines that are considered "too slow" to run Microsoft software

    • Typically these might be older Pentiums in the 90 - 233 MHz range -- they need not be identical

    • 16 MB RAM is probably the minimal good amount, 32 MB is better

    • Hard drives are nice -- diskless stations are possible, but harder to set up -- 1GB is plenty big

    • A CD-ROM drive simplifies software installation

    • You'll need at least one monitor and keyboard

Gather Network Hardware

  • Fast Ethernet switches are nice, but not usually available at low cost

  • Ethernet hubs are inexpensive, and possibly free if they use older 10Mbs technology (10BASE-T)

  • 10BASE2 Ethernet is slow, but cheap because it doesn't require a hub or switch

    • You will take a big performance hit with the slower technology

What Next?

  • After you get the hardware set up, start installing the software

  • Usually Linux is the OS of choice

  • Set up each node as a stand-alone system

  • Let them know about each other by assigning IP addresses (192.168.0.x is a good choice) and listing them in /etc/hosts

  • Install communication software (MPI or PVM)
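For a small cluster, the /etc/hosts entries can be generated rather than typed by hand and then copied to every node. A minimal Python sketch: only the 192.168.0.x range comes from the slide; the node names, starting the numbering at .1, and the `hosts_entries` helper are assumptions.

```python
import ipaddress


def hosts_entries(names, network="192.168.0.0/24", first_host=1):
    """Return /etc/hosts lines giving each node a fixed private address."""
    net = ipaddress.ip_network(network)
    lines = []
    for offset, name in enumerate(names):
        addr = net.network_address + first_host + offset
        # Stop before running past the subnet or onto its broadcast address.
        if addr not in net or addr == net.broadcast_address:
            raise ValueError(f"{name}: ran out of addresses in {network}")
        lines.append(f"{addr}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names for a four-node cluster.
    print(hosts_entries(["node0", "node1", "node2", "node3"]), end="")
```

Because every node gets the identical file, each one can reach the others by name, which is what the message-passing libraries expect when they are given a list of hosts.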

How Does One Program a Beowulf?

  • The short answer is Message Passing, a technique originally developed for distributed computing

  • The Beowulf architecture means that message passing is more efficient -- it doesn't have to compete with other traffic on the net

  • Other techniques are being explored -- People are just now looking at Java
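To give a flavor of message passing without MPI or PVM installed, the sketch below uses threads as stand-ins for nodes and gives each node its own mailbox, addressed by rank. The `Comm` class with its `send(dest, msg)` and `recv(rank)` methods is hypothetical, loosely modeled on point-to-point messaging; it is not the MPI or PVM API. The example passes a token around a ring, a classic first message-passing exercise.

```python
import queue
import threading


class Comm:
    """Toy communicator: one mailbox per rank, addressed by rank number."""

    def __init__(self, size):
        self.size = size
        self._mailboxes = [queue.Queue() for _ in range(size)]

    def send(self, dest, msg):
        self._mailboxes[dest].put(msg)

    def recv(self, rank):
        return self._mailboxes[rank].get()  # blocks until a message arrives


def ring_node(comm, rank, log):
    """Pass a token around the ring; every other node adds its rank once."""
    right = (rank + 1) % comm.size
    if rank == 0:
        comm.send(right, 0)     # rank 0 starts the token at 0
        token = comm.recv(rank)  # ...and gets it back after a full lap
        log.append(token)
    else:
        token = comm.recv(rank)
        comm.send(right, token + rank)


def run_ring(size):
    comm = Comm(size)
    log = []
    threads = [threading.Thread(target=ring_node, args=(comm, r, log))
               for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log[0]


if __name__ == "__main__":
    print(run_ring(8))  # 0 + 1 + ... + 7 = 28
```

No node ever reads another node's variables; the only coupling is the messages, which is what lets the same structure run across separate machines on a Beowulf.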

Message Passing Software

  • PVM (parallel virtual machine) was first

    • Developed at Oak Ridge labs

    • Very widely used (free)

    • Berkeley NOW (network of workstations) project

More Recent Message Passing Work

  • MPI (Message-passing Interface)

    • Standard for message passing libraries

    • Defines routines but not implementation

    • Very comprehensive

    • Version 1 released in 1994 with 120+ routines defined

    • Version 2 now available

IEEE Task Force on Cluster Computing

  • Aims to foster the use and development of clusters

  • Obtained IEEE approval early 1999

  • Main home page:

Conclusions

  • Cluster computing offers a very attractive, cost-effective method of achieving high performance

  • Promising future

Quote: Gill wrote in 1958 (quoting papers back to 1953):

“ … There is therefore nothing new in the basic idea of parallel programming, but only its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation.”

Gill, S. (1958), “Parallel Programming,” The Computer Journal (British) Vol. 1, pp. 2-10.