Overview of Parallel Architecture

Yulia Newton

CS 147, Fall 2009

SJSU

What is it?
  • “Parallel processing is the ability of an entity to carry out multiple operations or tasks simultaneously. The term is used in the contexts of both human cognition and machine computation.” - Wikipedia
Humans
  • The human body is a parallel processing architecture
  • The human body is the most complex machine known to man
  • We perform parallel processing all the time
Why?
  • We want to do MORE
  • We want to do it FASTER
Limitations of Single Processor
  • Physical
    • Heat
    • Electromagnetic interference
  • Economic
    • Cost
Parallel Computing
  • Divide bigger tasks into smaller tasks
    • Sound like the divide-and-conquer concept?
  • Perform the calculations for those tasks simultaneously (in parallel)
With a Single CPU
  • Parallel processing is possible even on single-CPU, single-core computers
  • It requires very sophisticated software called distributed processing software
  • Most parallel computing, however, is done with multiple CPUs
Speedup from Parallelization
  • More processors = faster computing?
  • Speed-up from parallelization is not linear
  • Ideally, one processor does a single task in a unit of time, two processors do twice as many tasks in the same amount of time, and four processors do four times as many

Speedup
  • Definition:

Speedup is how much faster a parallel algorithm is than the corresponding sequential algorithm.
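
In symbols, this is commonly written as S = T_sequential / T_parallel, where T_sequential is the running time of the sequential algorithm and T_parallel is the running time of its parallel counterpart.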

Gene Myron Amdahl
  • Norwegian-American computer architect and high-tech entrepreneur
Amdahl's Law
  • Formulated in 1967
  • Gives the potential speed-up of a parallel computing system

The small portion of components/elements that cannot be parallelized limits the speed-up attainable from the parallelizable components/elements.

Amdahl's Law (cont'd)

S = 1 / ((1 - P) + P / N)

where:

  • S is the overall speed-up of the system
  • P is the fraction of the work that is parallelizable
  • N is the number of processors
Amdahl's Law Alternative Form

S = 1 / ((1 - f) + f / s)

  • where
    • f – fraction of the program that is enhanced
    • s – speedup of the enhanced portion
Amdahl's Law Example
  • Program has four parts:
    • Part 1 (Code 1) = 11%
    • Part 2 (Loop 1) = 18%
    • Part 3 (Loop 2) = 23%
    • Part 4 (Code 2) = 48%

Amdahl's Law Example (cont’d)
  • Let’s say:
    • P1 is not sped up, so S1 = 1 or 100%
    • P2 is sped up 5×, so S2 = 500%
    • P3 is sped up 20×, so S3 = 2000%
    • P4 is sped up 1.6×, so S4 = 160%
Amdahl's Law Example (cont'd)
  • Using P1/S1 + P2/S2 + P3/S3 + P4/S4:
  • 0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6 = 0.11 + 0.036 + 0.0115 + 0.30 = 0.4575
Amdahl's Law Example (cont'd)
  • 0.4575 is a little less than half of the original running time, which is 1
  • The overall speed boost is 1 / 0.4575 = 2.186, or a little more than double the original speed
  • Notice how the 20× and 5× speedups have little effect on the overall result when 11% is not sped up at all and 48% is sped up by only 1.6×
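
The same arithmetic can be checked with a few lines of Python (a minimal sketch; the fractions and per-part speedups are the ones from the example above):

    # Fraction of the original running time taken by each part
    parts    = [0.11, 0.18, 0.23, 0.48]
    # Speed-up applied to each part
    speedups = [1.0, 5.0, 20.0, 1.6]

    # New running time relative to the original (original = 1)
    new_time = sum(p / s for p, s in zip(parts, speedups))
    print(new_time)      # 0.4575
    print(1 / new_time)  # 2.186 overall speed-up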
Amdahl's Law Limitations
  • Based on a fixed workload or fixed problem size (strong or "hard" scaling)
  • Implies that machine size is irrelevant
John L. Gustafson
  • American computer scientist and businessman
Gustafson's Law
  • Formulated in 1988
  • Closely related to Amdahl's Law
  • Addresses a shortcoming of Amdahl's Law: it does not scale the problem size to match the computing power that becomes available as the machine size increases
Gustafson's Law (cont'd)

S = P - α(P - 1)

  • where P is the number of processors, S is the speedup, and α the non-parallelizable fraction of the process
Gustafson's Law (cont'd)
  • Proposes a fixed-time concept, which leads to scaled speed-up for larger problem sizes
  • Uses weak or "soft" scaling
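
A small Python sketch contrasting the two laws (the 5% serial fraction is an assumed value for illustration only):

    def amdahl(p, n):
        # Fixed-size speed-up: S = 1 / ((1 - P) + P / N)
        return 1.0 / ((1.0 - p) + p / n)

    def gustafson(alpha, n):
        # Scaled speed-up: S = N - alpha * (N - 1)
        return n - alpha * (n - 1)

    for n in (2, 8, 64, 1024):
        print(n, round(amdahl(0.95, n), 2), round(gustafson(0.05, n), 2))
    # Amdahl's speed-up flattens out near 20; Gustafson's keeps growing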
Gustafson's Law Limitations
  • Some problems do not have fundamentally larger datasets
    • Example: a dataset of one data point per world citizen grows at only a few percent per year
  • Nonlinear algorithms may make it hard to take advantage of the parallelism "exposed" by Gustafson's law
Parallel Architectures
  • Superscalar
  • VLIW
  • Vector Processors
  • Interconnection Networks
  • Shared Memory Multiprocessors
  • Distributed Computing
  • Dataflow Computing
  • Neural Networks
  • Systolic Arrays
Superscalar Architecture
  • Implements instruction-level parallelism (multiple instructions are executed simultaneously in each cycle)
  • Net effect is the same as pipelining
  • Additional hardware is required
  • Contains a specialized instruction fetch unit that retrieves multiple instructions simultaneously from memory
  • Relies on both hardware and the compiler
Superscalar Architecture
  • Analogous to adding a lane to a highway – more physical resources are required but in the end more cars can pass through
Superscalar Architecture
  • Examples:
    • Pentium x86 processors
    • Intel i960CA
    • AMD 29000-series
VLIW Architecture
  • Stands for Very Long Instruction Word
  • Similar to the superscalar architecture
  • Relies entirely on the compiler
  • Packs independent instructions into one long instruction
  • Compiled code is larger
VLIW Architecture
  • Examples:
    • Intel’s Itanium, IA-64
    • Floating Point Systems' FPS164
Vector Processors
  • Many supercomputers are vector processor systems
  • Specialized, heavily pipelined processors
  • Operate efficiently on entire vectors and matrices
Vector Processors
  • Heavily pipelined so that arithmetic operations can be overlapped
Vector Processors (example)
  • Instructions on a traditional processor:

    for i = 0 to VectorLength - 1
        V3[i] = V1[i] + V2[i]

  • Instructions on a vector processor:

    LDV  V1, R1
    LDV  V2, R2
    ADDV R3, R1, R2
    STV  R3, V3
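
The same contrast shows up in software between an element-at-a-time loop and a whole-vector operation. A minimal Python/NumPy sketch (NumPy is used here purely to illustrate the vector style, not the hardware):

    import numpy as np

    v1 = np.arange(1000, dtype=np.float64)
    v2 = np.arange(1000, dtype=np.float64)

    # Traditional processor style: one element per iteration
    v3 = np.empty_like(v1)
    for i in range(len(v1)):
        v3[i] = v1[i] + v2[i]

    # Vector processor style: one operation on the entire vectors
    v3_vec = v1 + v2

    assert np.array_equal(v3, v3_vec)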

Vector Processors
  • Applications:
    • Weather forecasting
    • Medical diagnosis systems
    • Image processing
Vector Processors
  • Examples:
    • Cray series of supercomputers
    • Xtrillion 3.0
Interconnection Networks
  • MIMD (Multiple instruction stream, multiple data stream) type architecture
  • Each processor has its own memory
  • Processors are allowed to access other processors' memory via the network
Interconnection Networks
  • Can be:
    • Dynamic – allow paths between two components to change from one communication to another
    • Static – do not allow a change of path between communications
Interconnection Networks
  • Can be:
    • Non-blocking – allow new connections in the presence of other simultaneous connections
    • Blocking – do not allow simultaneous connections
Interconnection Networks
  • Sample interconnection networks:
Shared Memory Architecture
  • MIMD (Multiple instruction stream, multiple data stream) type architecture
  • Single memory pool accessed by all processors
  • Can be categorized by how memory access is performed:
    • UMA (Uniform Memory Access) – all processors have equal access to the entire memory
    • NUMA (Non-Uniform Memory Access) – each processor has faster access to its own nearby piece of memory than to memory attached to other processors
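
As a rough software analogy (processes here stand in for processors; this is an illustrative sketch, not the hardware itself), several workers updating a single shared memory location in Python:

    from multiprocessing import Process, Value

    def worker(counter, n):
        # Every process updates the same shared memory location
        for _ in range(n):
            with counter.get_lock():
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)  # one shared integer, like a single memory pool
        procs = [Process(target=worker, args=(counter, 10000)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)  # 40000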
Shared Memory Architecture
  • Diagram of a Shared Memory system:
Distributed Computing
  • Very loosely coupled multicomputer system, connected by buses or through a network
  • Main advantage is cost
  • Main disadvantage is slow communication due to the physical distance between computing units
Distributed Computing
  • Example:
    • The SETI (Search for Extra-Terrestrial Intelligence) group at UC Berkeley analyzes data from radio telescopes
    • To help this project, PC users can install the SETI screen saver on their home computers
    • Data is analyzed during the processor's idle time
    • Half a million years of CPU time accumulated in about 18 months!
Data Flow Computing
  • Von Neumann machines exhibit sequential control flow; data and instructions are segregated
  • In dataflow computing, control of the program is directly tied to the data
  • The order of the instructions does not matter
  • Each instruction is considered a separate process
  • Execution is driven by data tokens rather than by a sequential instruction stream
Data Flow Computing
  • Data Flow Graph for y=(c+3)*f
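
Since the graph image is not reproduced here, a minimal Python sketch of the same idea (node names and values are illustrative; each node fires as soon as all of its input tokens are present):

    def fire(op, *tokens):
        # A node fires only when all of its operand tokens have arrived
        if any(t is None for t in tokens):
            return None
        return op(*tokens)

    c, f = 4, 2                           # input tokens (illustrative values)
    t = fire(lambda a, b: a + b, c, 3)    # '+' node consumes tokens c and 3
    y = fire(lambda a, b: a * b, t, f)    # '*' node fires once t and f arrive
    print(y)                              # (4 + 3) * 2 = 14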
Data Flow Computing
  • Data flow system diagram:
Neural Networks
  • Good for massively parallel applications, fault tolerance, and adapting to changing circumstances
  • Based on the parallel architecture of the human brain
  • Composed of a large number of simple processing elements
  • Trainable
Neural Networks
  • Model the human brain
  • The perceptron is the simplest example
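
A minimal sketch of a perceptron in Python (the training data and learning rate are illustrative; here it learns the logical AND function):

    def perceptron(x, w, b):
        # Weighted sum of inputs followed by a threshold: the simplest neuron
        return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0

    # Train on AND with the classic perceptron learning rule
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b, lr = [0.0, 0.0], 0.0, 0.1
    for _ in range(20):
        for x, target in data:
            err = target - perceptron(x, w, b)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err

    print([perceptron(x, w, b) for x, _ in data])  # [0, 0, 0, 1]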
Neural Networks
  • Applications:
    • Quality control
    • Financial and economic forecasting
    • Oil and gas exploration
    • Speech and pattern recognition
    • Health care cost reduction
    • Bankruptcy prediction
Systolic Arrays
  • Networks of processing elements that rhythmically compute on data by circulating it through the system
  • A variation of the SIMD architecture
Systolic Arrays
  • Chain of processors in a systolic array system:
To Recap
  • We completed an overview of the following parallel processing architectures:
      • Superscalar
      • VLIW
      • Vector Processors
      • Interconnection Networks
      • Shared Memory Multiprocessors
      • Distributed Computing
      • Dataflow Computing
      • Neural Networks
      • Systolic Arrays
References and Sources
  • Null, Linda, and Julia Lobur. The Essentials of Computer Organization and Architecture
  • http://en.wikipedia.org
  • http://www.cs.berkeley.edu/~culler/cs258-s99
  • http://www.cs.iastate.edu/~prabhu
  • http://www.answers.com
  • http://www.gigaflop.demon.co.uk
Questions?

Thank you!