CS147 Lecture 20

Parallel Computers

Prof. Sin-Min Lee

Department of Computer Science

Uniprocessor Systems

Improve performance:

  • Allow multiple, simultaneous memory accesses
    • Requires multiple address, data, and control buses (one set for each simultaneous memory access)
    • The memory chip must be able to handle multiple transfers simultaneously

Uniprocessor Systems

Multiport Memory:

  • Has two sets of address, data, and control pins to allow simultaneous data transfers to occur
  • CPU and DMA controller can transfer data concurrently
  • A system with more than one CPU could handle simultaneous requests from two different processors

Uniprocessor Systems

Multiport Memory (cont.):

  • Can
    • Handle two requests to read data from the same location at the same time
  • Cannot
    • Process two simultaneous requests to write data to the same memory location
    • Process simultaneous requests to read from and write to the same memory location
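
The can/cannot rules above can be sketched as a small conflict check. This is only an illustration: the `Request` type and `can_service_together` function are hypothetical names, and the sketch assumes that requests to different locations never conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One memory request arriving on one port (hypothetical model)."""
    port: int      # which of the two ports issued the request
    op: str        # "read" or "write"
    address: int   # memory location targeted


def can_service_together(a: Request, b: Request) -> bool:
    """Return True if a dual-port memory could service both requests at once."""
    if a.address != b.address:
        # Assumption: requests to different locations do not interfere.
        return True
    # Same location: only two simultaneous reads are allowed.
    return a.op == "read" and b.op == "read"
```

For example, a CPU read and a DMA read of the same word pass the check, while a read and a write of that word do not, so one of them would have to wait.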

Multiprocessors

[Figure: block diagram of CPUs, Memory, a Device Controller, a Device, and an I/O Port connected by a Bus]

Multiprocessors
  • Systems designed to have 2 to 8 CPUs
  • The CPUs all share the other parts of the computer
    • Memory
    • Disk
    • System Bus
    • etc
  • CPUs communicate via Memory and the System Bus

Multiprocessors
  • Each CPU shares memory, disks, etc
    • Cheaper than clusters
    • Not as good performance as clusters
  • Often used for
    • Small Servers
    • High-end Workstations

Multiprocessors
  • OS automatically shares work among available CPUs
    • On a workstation…
      • One CPU can be running an engineering design program
      • Another CPU can be doing complex graphics formatting

Major MIMD Styles
  • Centralized shared memory ("Uniform Memory Access" time or "Shared Memory Processor")
  • Decentralized memory (memory module with CPU)
    • Advantage: more memory bandwidth, lower memory latency
    • Drawback: Longer communication latency
    • Drawback: Software model more complex

Applications of Parallel Computers
  • Traditionally: government labs, numerically intensive applications
  • Research Institutions
  • Recent Growth in Industrial Applications
    • 236 of the top 500
    • Financial analysis, drug design and analysis, oil exploration, aerospace and automotive

Multiprocessor Systems: Flynn’s Classification

Single instruction multiple data (SIMD):

[Figure: a Control Unit and Main Memory driving several Processor/Memory pairs linked by a Communications Network]

  • Executes a single instruction on multiple data values simultaneously using many processors
  • Since only one instruction is processed at any given time, it is not necessary for each processor to fetch and decode the instruction
  • This task is handled by a single control unit that sends the control signals to each processor.
  • Example: Array processor

Why Multiprocessors?
  • Microprocessors as the fastest CPUs
    • Collecting several is much easier than redesigning one
  • Complexity of current microprocessors
    • Do we have enough ideas to sustain 1.5X/yr?
    • Can we deliver such complexity on schedule?
  • Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
  • Emergence of embedded and server markets driving microprocessors in addition to desktops
    • Embedded functional parallelism, producer/consumer model
    • Server figure of merit is tasks per hour vs. latency

Parallel Processing Intro
  • Long-term goal of the field: scale the number of processors to the size of the budget and the desired performance
  • Machines today: Sun Enterprise 10000 (8/00)
    • 64 × 400 MHz UltraSPARC® II CPUs, 64 GB SDRAM memory, 868 × 18 GB disks, tape
    • $4,720,800 total
    • 64 CPUs 15%, 64 GB DRAM 11%, disks 55%, cabinet 16% ($10,800 per processor, or ~0.2% of the total per processor)
    • Minimal E10K (1 CPU, 1 GB DRAM, 0 disks, tape): ~$286,700
    • $10,800 (4%) per CPU, plus $39,600 per board of 4 CPUs (~8% per CPU)
  • Machines today: Dell Workstation 220 (2/01)
    • 866 MHz Intel Pentium® III (in Minitower)
    • 0.125 GB RDRAM memory, one 10 GB disk, 12X CD, 17” monitor, nVIDIA GeForce 2 GTS 32 MB DDR graphics card, 1-year service
    • $1,600; for an extra processor, add $350 (~20%)
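
The percentages quoted above can be checked with a few lines of arithmetic. The prices are taken from the slide; the constant and function names are just labels chosen for this sketch.

```python
# Prices as quoted on the slide (US dollars).
E10K_TOTAL = 4_720_800
E10K_CPU_PRICE = 10_800
E10K_CPUS = 64
DELL_BASE = 1_600
DELL_EXTRA_CPU = 350


def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the full E10K price spent on all 64 CPUs (slide rounds to 15%).
e10k_all_cpus = percent(E10K_CPU_PRICE * E10K_CPUS, E10K_TOTAL)
# Share of the full E10K price for one CPU (slide says ~0.2%).
e10k_one_cpu = percent(E10K_CPU_PRICE, E10K_TOTAL)
# Cost of a second CPU relative to the base workstation (slide says ~20%).
dell_extra = percent(DELL_EXTRA_CPU, DELL_BASE)
```

The contrast is the point of the slide: an extra processor is a tiny fraction of a large server's price but a noticeable fraction of a workstation's.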

Organization of Multiprocessor Systems

Three different ways to organize/classify systems:

  • Flynn’s Classification
  • System Topologies
  • MIMD System Architectures

Multiprocessor Systems: Flynn’s Classification

Flynn’s Classification:

  • Based on the flow of instructions and data processing
  • A computer is classified by:
    • Whether it processes a single instruction at a time or multiple instructions simultaneously
    • Whether it operates on one or multiple data sets

Multiprocessor Systems: Flynn’s Classification

Four Categories of Flynn’s Classification:

  • SISD: Single instruction, single data
  • SIMD: Single instruction, multiple data
  • MISD: Multiple instruction, single data **
  • MIMD: Multiple instruction, multiple data

** The MISD classification is not practical to implement. In fact, no significant MISD computers have ever been built. It is included only for completeness.
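
As a rough illustration of the two-axis classification, the four category names can be derived from the number of instruction streams and data streams. The function name and its integer parameters are assumptions made for this sketch.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Map stream counts to one of SISD, SIMD, MISD, MIMD."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a computer needs at least one instruction and one data stream")
    instruction_part = "S" if instruction_streams == 1 else "M"
    data_part = "S" if data_streams == 1 else "M"
    return f"{instruction_part}I{data_part}D"
```

A uniprocessor maps to `flynn_category(1, 1)`, and an array processor with one control unit and many data values maps to `flynn_category(1, 64)`.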

Multiprocessor Systems: Flynn’s Classification

Single instruction single data (SISD):

  • Consists of a single CPU executing individual instructions on individual data values

Multiprocessor Systems: Flynn’s Classification

Multiple instruction multiple data (MIMD):

  • Executes different instructions simultaneously
  • Each processor must include its own control unit
  • The processors can be assigned to parts of the same task or to completely separate tasks
  • Example: Multiprocessors, multicomputers

Popular Flynn Categories
  • SISD (Single Instruction Single Data)
    • Uniprocessors
  • MISD (Multiple Instruction Single Data)
    • ???; multiple processors on a single data stream
  • SIMD (Single Instruction Multiple Data)
    • Examples: Illiac-IV, CM-2
      • Simple programming model
      • Low overhead
      • Flexibility
      • All custom integrated circuits
    • (Phrase reused by Intel marketing for media instructions ~ vector)
  • MIMD (Multiple Instruction Multiple Data)
    • Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
      • Flexible
      • Use off-the-shelf micros
  • MIMD is the current winner: the major design emphasis is on MIMD machines with <= 128 processors

Multiprocessor Systems

System Topologies:

  • The topology of a multiprocessor system refers to the pattern of connections between its processors
  • Quantified by standard metrics:
    • Diameter: the maximum distance between two processors in the computer system
    • Bandwidth: the capacity of a communications link multiplied by the number of such links in the system (best case)
    • Bisection bandwidth: the total bandwidth of the links connecting the two halves of the system when the processors are split so that the number of links between the two halves is minimized (worst case)
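
The three metrics can be computed directly for a small example. The sketch below uses a ring as the test topology and assumes every link has the same capacity; the function names are hypothetical, and the bisection search is brute force, so it is only practical for a handful of processors.

```python
from collections import deque
from itertools import combinations


def ring_links(n):
    """Links of an n-processor ring (n >= 3), each stored as a sorted pair."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def diameter(n, links):
    """Longest shortest path between any two processors (graph assumed connected)."""
    neighbors = {p: set() for p in range(n)}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    worst = 0
    for start in range(n):
        # Breadth-first search gives hop counts from `start` to every processor.
        dist = {start: 0}
        pending = deque([start])
        while pending:
            p = pending.popleft()
            for q in neighbors[p]:
                if q not in dist:
                    dist[q] = dist[p] + 1
                    pending.append(q)
        worst = max(worst, max(dist.values()))
    return worst


def total_bandwidth(links, link_capacity):
    """Best case: every link carries traffic at once."""
    return len(links) * link_capacity


def bisection_bandwidth(n, links, link_capacity):
    """Worst case: try every equal split (n even) and keep the fewest crossing links."""
    fewest = min(
        sum(1 for a, b in links if (a in half) != (b in half))
        for half in map(set, combinations(range(n), n // 2))
    )
    return fewest * link_capacity
```

For an 8-processor ring with links of capacity 100, this gives a diameter of 4, a total bandwidth of 800, and a bisection bandwidth of 200.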

Multiprocessor Systems: System Topologies

Six Categories of System Topologies:

  • Shared bus
  • Ring
  • Tree
  • Mesh
  • Hypercube
  • Completely Connected

Multiprocessor Systems: System Topologies

Shared bus:

  • The simplest topology
  • Processors communicate with each other exclusively via this bus
  • Can handle only one data transmission at a time
  • Can be easily expanded by connecting additional processors to the shared bus, along with the necessary bus arbitration circuitry

[Figure: processors (P), each with a local memory (M), attached to a Shared Bus together with Global Memory]

Multiprocessor Systems: System Topologies

Ring:

  • Uses direct dedicated connections between processors
  • Allows all communication links to be active simultaneously
  • A piece of data may have to travel through several processors to reach its final destination
  • All processors must have two communication links

[Figure: six processors (P) connected in a ring]

Multiprocessor Systems: System Topologies

Tree topology:

  • Uses direct connections between processors
  • Each processor has up to three connections (in a binary tree: one to its parent and two to its children)
  • Its primary advantage is its relatively low diameter
  • Example: DADO Computer

[Figure: processors (P) connected as a tree]

Multiprocessor Systems: System Topologies

Mesh topology:

  • Every processor connects to the processors above, below, left, and right
  • Left to right and top to bottom wraparound connections may or may not be present

[Figure: nine processors (P) arranged in a mesh]

Multiprocessor Systems: System Topologies

Hypercube:

  • Multidimensional mesh
  • Has n processors (n a power of two), each with log2 n connections
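
One common way to picture the hypercube wiring, sketched here under the assumption that processors are numbered 0 to n-1 in binary: two processors are linked exactly when their numbers differ in one bit. The function names are hypothetical.

```python
from typing import List


def hypercube_neighbors(node: int, dimensions: int) -> List[int]:
    """Neighbors of `node` in a hypercube with 2**dimensions processors."""
    # Flipping each address bit in turn yields one neighbor per dimension.
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hops(a: int, b: int) -> int:
    """Minimum number of links between a and b: the count of differing address bits."""
    return bin(a ^ b).count("1")
```

With 8 processors (3 dimensions), each processor has 3 neighbors, and the farthest pair, such as 0 and 7, is 3 hops apart.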

Multiprocessor Systems: System Topologies

Completely Connected:

  • Every processor has n-1 connections, one to each of the other processors
  • The complexity of the processors increases as the system grows
  • Offers maximum communication capabilities
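
To see how quickly the completely connected topology grows, the sketch below counts the links needed by three of the topologies using their standard closed-form expressions. The function name and the topology labels are chosen for this example only.

```python
def links_required(topology: str, n: int) -> int:
    """Number of point-to-point links for n processors in the given topology."""
    if topology == "ring":
        return n                              # one link per processor (n >= 3)
    if topology == "hypercube":
        if n < 1 or n & (n - 1):
            raise ValueError("hypercube size must be a power of two")
        dimensions = n.bit_length() - 1       # log2(n)
        return n * dimensions // 2            # each link is shared by two processors
    if topology == "complete":
        return n * (n - 1) // 2               # every pair of processors is linked
    raise ValueError(f"unknown topology: {topology}")
```

For 16 processors this gives 16 links for a ring, 32 for a hypercube, and 120 for a completely connected system; at 64 processors the completely connected system needs 2,016 links.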

Architecture Details

  • World’s simplest computer (processor/memory)
  • Standard computer (add cache, disk)
  • Computers → MPPs

[Figure: processors (P) with memory (M), cache (C), and disk (D), joined by a Network]

A Supercomputer at $5.2 Million

Virginia Tech's 1,100-node Macintosh G5 supercomputer

The Virginia Polytechnic Institute and State University has built a supercomputer composed of a cluster of 1,100 dual-processor Macintosh G5 computers. Based on preliminary benchmarks, Big Mac is capable of 8.1 teraflops. The Mac supercomputer is still being fine-tuned, and the full extent of its computing power will not be known until November. But the 8.1 teraflops figure would make the Big Mac the world's fourth-fastest supercomputer.

Big Mac's cost relative to similar machines is as noteworthy as its performance. The Apple supercomputer was constructed for just over US$5 million, and the cluster was assembled in about four weeks.

In contrast, the world's leading supercomputers cost well over $100 million to build and require several years to construct. The Earth Simulator, which clocked in at 38.5 teraflops in 2002, reportedly cost up to $250 million.

October 28, 2003
Time: 7:30 pm - 9:00 pm
Location: Santa Clara Ballroom

Srinidhi Varadarajan, Ph.D.

Dr. Srinidhi Varadarajan is an Assistant Professor of Computer Science at Virginia Tech. He was honored with the NSF Career Award in 2002 for "Weaving a Code Tapestry: A Compiler Directed Framework for Scalable Network Emulation." He has focused his research on building a distributed network emulation system that can scale to emulate hundreds of thousands of virtual nodes.

Parallel Computers
  • Two common types
    • Cluster
    • Multi-Processor

Cluster Computers

Clusters on the Rise

Using clusters of small machines to build a supercomputer is not a new concept.

Another of the world's top machines, housed at the Lawrence Livermore National Laboratory, was constructed from 2,304 Xeon processors. The machine was built by Utah-based Linux Networx.

Clustering technology has meant that traditional big-iron leaders like Cray (Nasdaq: CRAY) and IBM have new competition from makers of smaller machines. Dell (Nasdaq: DELL) , among other companies, has sold high-powered computing clusters to research institutions.

Cluster Computers
  • Each computer in a cluster is a complete computer by itself
    • CPU
    • Memory
    • Disk
    • etc
  • Computers communicate with each other via some interconnection bus

Cluster Computers
  • Typically used where one computer does not have enough capacity to do the expected work
    • Large Servers
  • Cheaper than building one GIANT computer

Although not new, supercomputing clustering technology is still impressive. It works by farming out chunks of data to individual machines, and it works better for some types of computing problems than others.

For example, a cluster would not be ideal to compete against IBM's Deep Blue supercomputer in a chess match; in this case, all the data must be available to one processor at the same moment, and the machine operates in much the same way as the human brain handles tasks.

However, a cluster would be ideal for the processing of seismic data for oil exploration, because that computing job can be divided into many smaller tasks.
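
The "farming out chunks of data" idea can be sketched as follows. Everything here is illustrative: `split_into_chunks` and `process_chunk` are hypothetical names, the sum of squares stands in for real per-node work such as processing seismic traces, and the chunks are processed in a plain loop rather than on separate machines.

```python
def split_into_chunks(data, workers):
    """Split data into `workers` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), workers)
    chunks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)   # the first `extra` chunks get one more item
        chunks.append(data[start:start + size])
        start += size
    return chunks


def process_chunk(chunk):
    """Stand-in for the work one cluster node does on its chunk."""
    return sum(x * x for x in chunk)


samples = list(range(10))
# Each chunk is independent, so each could be handled by a different machine.
partial_results = [process_chunk(c) for c in split_into_chunks(samples, 3)]
total = sum(partial_results)   # a final step combines the partial results
```

The job divides cleanly because no chunk needs data from another chunk, which is the property the chess example lacks.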

Cluster Computers
  • Need to break up work among the computers in the cluster
  • Example: Microsoft.com Search Engine
    • 6 computers running SQL Server
      • Each has a copy of the MS Knowledge Base
    • Search requests come to one computer
      • Sends request to one of the 6
      • Attempts to keep all 6 busy
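
A minimal sketch of the front-end computer's job in the example above, assuming a simple policy of sending each request to the server with the fewest outstanding requests. The slide does not say which policy is actually used, and the `Dispatcher` class is a hypothetical name.

```python
class Dispatcher:
    """Front-end that spreads requests over identical back-end servers."""

    def __init__(self, server_count=6):
        # Outstanding (not yet completed) requests per server.
        self.active = [0] * server_count

    def dispatch(self):
        """Pick the least-busy server (lowest index on ties) and record the request."""
        server = min(range(len(self.active)), key=self.active.__getitem__)
        self.active[server] += 1
        return server

    def complete(self, server):
        """Record that `server` finished one request."""
        self.active[server] -= 1
```

Because every server holds a full copy of the data, any server can answer any request, which is what makes this kind of dispatching possible.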

The Virginia Tech Mac supercomputer should be fully functional and in use by January 2004. It will be used for research into nanoscale electronics, quantum chemistry, computational chemistry, aerodynamics, molecular statics, computational acoustics and the molecular modeling of proteins.

Specialized Processors
  • Vector Processors
  • Massively Parallel Computers

Vector Processors

for (i = 0; i < n; i++) {
    array1[i] = array2[i] + array3[i];   /* add corresponding elements */
}

This is an array (vector) operation

Vector Processors

Special instructions to operate on vectors (arrays)

  • Vector instruction specifies
    • Starting addresses of all 3 arrays
    • Loop count
  • Saves for-loop overhead
  • Can access memory more efficiently
  • Also known as SIMD Computers
    • Single Instruction Multiple Data
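
The shape of such a vector instruction can be modeled in a few lines. This is a sketch only: memory is a flat Python list, and `vector_add` with its four operands is a hypothetical instruction, not taken from any real instruction set.

```python
def vector_add(memory, dest, src1, src2, count):
    """Model of one vector instruction: three starting addresses plus a loop count.

    Computes memory[dest + i] = memory[src1 + i] + memory[src2 + i] for i in range(count).
    """
    # Both source vectors are read in full before the destination is written.
    sums = [a + b for a, b in zip(memory[src1:src1 + count], memory[src2:src2 + count])]
    memory[dest:dest + count] = sums


# Addresses 0-3 hold array1, 4-7 hold array2, 8-11 hold array3.
memory = [0, 0, 0, 0, 1, 2, 3, 4, 10, 20, 30, 40]
vector_add(memory, dest=0, src1=4, src2=8, count=4)
```

The whole loop is expressed by one call with four operands, which is where the saving in loop overhead comes from.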

Vector Processors
  • Until the 1990s, the world’s fastest supercomputers were implemented as vector processors
  • Now, Vector Processors are typically special peripheral devices that can be installed on a “regular” computer

Massively Parallel Computers
  • IBM ASCI Purple
    • Cluster of 196 computers
    • Each computer has
      • 64 CPUs
      • 256 Gigabytes of RAM
      • 10,000 GB of Disk

Massively Parallel Computer
  • How will ASCI Purple be used?
    • Simulation of molecular dynamics
      • Research into repairing damaged DNA
    • Analysis of seismic waves
      • Earthquake research
    • Simulation of star evolution
    • Simulation of Weapons of Mass Destruction

According to the article, the supercomputer, powered by 2,200 IBM G5 processors, has been initially rated at 7.41 trillion operations per second. The final number could be much higher, according to school officials, but even if it is not, the machine would rank as the fourth-fastest supercomputing cluster in the world.

Other systems mentioned for comparison:

  • Japan's US$250M Earth Simulator, currently the world's fastest computer
  • Lawrence Livermore's US$10-15M cluster system, made up of 2,304 Intel Xeon processors
  • "Pacific Blue", which IBM recently installed at Lawrence Livermore National Laboratory for $94 million

"We are demonstrating that you can build a very high performance machine for a fifth to a tenth of the cost of what supercomputers now cost," said Hassan Aref, the dean of the School of Engineering at Virginia Tech in Blacksburg

In 1998, a group called distributed.net linked thousands of computers of all kinds around the world via the Internet and cracked a 56-bit DES-II code in 40 days. It had previously been thought that such heavyweight ciphers would take hundreds of years to crack even on fast computers. One version of the distributed.net program ran as a screen saver that kicked in and began cracking code whenever the machine was idle for more than a few minutes. Distributed.net bills itself as the "Fastest Computer on Earth", even though its hardware bill is effectively zero.

The idea is straightforward. You set up an arbitrary number of PCs, network them, typically using fast Ethernet, and then send them problems that can be divided up among the machines' processors. One machine acts as a server that syncs up all the rest, called clients.

Beowulf specifies software, such as the Message Passing Interface (MPI) running under the Linux operating system, that allows the machines to communicate while working on the problem.

And since Linux, the brainchild of computer science student Linus Torvalds, is free, it keeps the cost down.
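
The server/client arrangement described above can be imitated on a single machine. In this sketch, threads stand in for the networked PCs and `queue.Queue` objects stand in for the messages; it does not use MPI itself, and `client` and `run_cluster` are hypothetical names.

```python
import queue
import threading


def client(tasks, results):
    """A client repeatedly takes a chunk of work and sends back a partial result."""
    while True:
        chunk = tasks.get()
        if chunk is None:          # sentinel from the server: no more work
            break
        results.put(sum(chunk))


def run_cluster(data, clients=4, chunk_size=250):
    """The server divides the problem, hands out chunks, and combines the answers."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=client, args=(tasks, results)) for _ in range(clients)]
    for w in workers:
        w.start()
    chunk_count = 0
    for start in range(0, len(data), chunk_size):
        tasks.put(data[start:start + chunk_size])
        chunk_count += 1
    for _ in workers:
        tasks.put(None)            # one stop message per client
    for w in workers:
        w.join()
    # Exactly one partial result was produced per chunk.
    return sum(results.get() for _ in range(chunk_count))
```

Calling `run_cluster(list(range(1000)))` adds the numbers 0 through 999 in four chunks of 250 and returns the combined total.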

Modeling the trajectories of tens of millions of charged particles, each interacting with the others through electro-magnetic forces, requires heavy-duty number crunching. To harness supercomputing power at a desktop price, UCLA’s Dr. Viktor K. Decyk and his colleagues have created their own super-fast, parallel processing “supercomputer” using a cluster of Power Macintosh computers.

Apple's G4 Cubes used for cell mutation detection and genotyping analysis

SYDNEY - 22 January 2001

"World's fastest" Macintosh cluster

Tuesday, May 15, 2001 @ 8:45am

Researchers at the Grupo de Lasers e Plasmas (GoLP) in Portugal have created what they bill as the world's fastest Macintosh-based cluster. Consisting of 16 dual-processor Power Mac G4/450s, the cluster delivers more than 50 GigaFlops of peak power and took just one day to set up.

Apple Computer purchased a big Cray supercomputer in the mid-1980s. In fact, Steve Jobs was Cray's first and only walk-in customer. He arrived unannounced (so the story goes) at Cray headquarters in Mendota Heights, Minnesota and asked to speak to someone about buying a Cray. They nearly threw him out. It's only slightly less eccentric than someone walking into NASA Johnson Space Center and inquiring how to purchase a shuttle orbiter.

Later, Cray president John Rollwagen phoned Seymour Cray and told him that Apple had just purchased a Cray that would be used in designing the next Macintosh. Seymour thought for a bit, and replied that that seemed reasonable, since he was using a Macintosh to design the next Cray!

Parallel Computer Architectures

2002

  • MPP – Massively Parallel Processors
    • The top of the Top500 list consists mostly of MPPs, but clusters are “rising”
  • Clusters are there!
    • Earth Simulator: Old-old style making news again
    • ASCI Machines: Big companies, special purpose
    • Beowulf Clusters: Popping up everywhere
  • Software
    • Embarrassingly parallel, or sacrifice a grad student
    • MATLAB*p (our little homegrown project)

2003

Performance Trends

Extrapolations

Beowulf Clusters

Current Beowulfs (2)
