
Parallelism (Multi-threaded)


Outline

  • A touch of theory

    • Amdahl’s Law

    • Metrics (speedup, efficiency, redundancy, utilization)

  • Trade-offs

    • Tightly coupled vs. Loosely coupled

      • Shared Distributed vs Shared Centralized

    • CMP vs SMT (and while we are at it: SSMT)

    • Heterogeneous vs. Homogeneous

    • Interconnection networks

  • Other important issues

    • Cache Coherency

    • Memory Consistency


Metrics

Metrics

  • Speed-up: S(p) = T(1) / T(p)

  • Efficiency: E(p) = S(p) / p

  • Utilization: U(p) = R(p) × E(p)

  • Redundancy: R(p) = O(p) / O(1), where O(p) is the total operation count on p processors
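The formulas on this slide were rendered as images in the original; the bullets above use the standard definitions. A small calculator sketch (the example timings and operation counts are hypothetical, chosen only to exercise the formulas):

```python
# Standard parallel-performance metrics. T(1)/T(p) are serial/parallel
# times, O(1)/O(p) are serial/parallel total operation counts.

def speedup(t_serial, t_parallel):
    # S(p) = T(1) / T(p)
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    # E(p) = S(p) / p -- fraction of ideal linear speedup achieved
    return speedup(t_serial, t_parallel) / p

def redundancy(ops_parallel, ops_serial):
    # R(p) = O(p) / O(1) -- extra work done by the parallel version
    return ops_parallel / ops_serial

def utilization(ops_parallel, ops_serial, t_serial, t_parallel, p):
    # U(p) = R(p) * E(p)
    return redundancy(ops_parallel, ops_serial) * \
           efficiency(t_serial, t_parallel, p)

# Hypothetical example: p = 4 processors, T(1) = 8, T(4) = 3,
# O(1) = 8 operations, O(4) = 10 operations.
print(speedup(8, 3))                    # 2.666...
print(efficiency(8, 3, 4))              # 0.666...
print(redundancy(10, 8))                # 1.25
print(utilization(10, 8, 8, 3, 4))      # 1.0416...
```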


An example: a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
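The slide's worked example did not survive the transcript; presumably it contrasts serial and parallel evaluation of this polynomial. A sketch under that assumption: Horner's rule takes 8 strictly sequential operations, while a parallel machine can build the powers of x, form the five products simultaneously, and sum them in a log-depth tree, shortening the critical path:

```python
# Evaluating a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0 two ways.

def horner(coeffs, x):
    # coeffs = [a4, a3, a2, a1, a0]; 4 multiplies + 4 adds, all sequential
    result = 0
    for a in coeffs:
        result = result * x + a
    return result

def tree_sum(terms):
    # log-depth pairwise reduction, as a parallel machine would do it
    while len(terms) > 1:
        terms = [terms[i] + terms[i + 1]
                 for i in range(0, len(terms) - 1, 2)] \
                + ([terms[-1]] if len(terms) % 2 else [])
    return terms[0]

coeffs = [1, 2, 3, 4, 5]                  # a4 .. a0 (hypothetical values)
x = 2
powers = [x**4, x**3, x**2, x, 1]          # buildable in ~2 parallel steps
terms = [a * p for a, p in zip(coeffs, powers)]   # one parallel step
print(horner(coeffs, x), tree_sum(terms))  # both print 57
```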


Tightly-coupled vs Loosely-coupled

  • Tightly coupled (i.e., Multiprocessor)

    • Shared memory

    • Each processor capable of doing work on its own

    • Easier for the software

    • Hardware has to worry about cache coherency and memory contention

  • Loosely-coupled (i.e., Multicomputer Network)

    • Message passing

    • Easier for the hardware

    • Programmer’s job is tougher

  • An early distributed shared-memory example: Cm*


CMP vs SMT

  • CMP (chip multiprocessor)

    • aka Multi-core

  • SMT (simultaneous multithreading)

    • One core – a PC for each thread, in a pipelined fashion

    • Multithreading introduced by Burton Smith, HEP ~1975

    • “Simultaneous” first proposed by H. Hirata et al., ISCA 1992

    • Carried forward by Nemirovsky, HICSS 1994

    • Popularized by Tullsen, Eggers, ISCA 1995


Heterogeneous vs Homogeneous

[Figure: three chip layouts compared. The “Tile-Large” approach tiles the die with a few large cores; the “Niagara” approach tiles it with many small Niagara-like cores; the ACMP approach combines one large core with many Niagara-like cores.]


Large Core vs. Small Core

Large Core

  • Out-of-order
  • Wide fetch (e.g., 4-wide)
  • Deeper pipeline
  • Aggressive branch predictor (e.g., hybrid)
  • Many functional units
  • Trace cache
  • Memory dependence speculation

Small Core

  • In-order
  • Narrow fetch (e.g., 2-wide)
  • Shallow pipeline
  • Simple branch predictor (e.g., gshare)
  • Few functional units


Throughput vs. Serial Performance


Interconnection Networks

  • Basic Topologies

    • Bus

    • Crossbar

    • Omega Network

    • Hypercube

    • Tree

    • Mesh

    • Ring

  • Parameters: Cost, Latency, Contention

  • Example: Omega network, N=64, k=2
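For the slide's example (N = 64 inputs, k = 2, i.e., 2×2 switches), the stage and switch counts follow directly from the definitions: log₂64 = 6 stages of 64/2 = 32 switches each, wired by the perfect-shuffle permutation. The `route` helper below is our illustration of the standard destination-tag self-routing scheme, not something taken from the slide:

```python
import math

# Omega network sizing for N = 64, k = 2 (2x2 switches).
N, k = 64, 2
stages = round(math.log(N, k))        # log_k(N) = 6 stages
switches_per_stage = N // k           # 32 switches per stage
total_switches = stages * switches_per_stage   # 192 switches
print(stages, switches_per_stage, total_switches)  # 6 32 192

def shuffle(addr, n_bits):
    # Perfect shuffle: rotate the address bits left by one.
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & ((1 << n_bits) - 1)) | msb

def route(src, dst, n_bits):
    # Destination-tag self-routing: at stage i the switch looks at bit i
    # of the destination (MSB first); the exchange sets the low address
    # bit to that value. Returns the address after each stage.
    path = []
    cur = src
    for i in range(n_bits):
        cur = shuffle(cur, n_bits)
        bit = (dst >> (n_bits - 1 - i)) & 1
        cur = (cur & ~1) | bit
        path.append(cur)
    return path

print(route(5, 42, stages)[-1])  # after 6 stages, arrives at 42
```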


Three Issues

  • Interconnection networks

    • Cost

    • Latency

    • Contention

  • Cache Coherency

    • Snoopy

    • Directory

  • Memory Consistency

    • Sequential Consistency and Mutual Exclusion


Sequential Consistency

  • Definition:

    Memory sees each processor's loads/stores in program order

  • Examples

  • Why it is important: guarantees mutual exclusion


Two Processors and a Critical Section

Processor 1                   Processor 2

   L1 = 0                        L2 = 0
A: L1 = 1                     X: L2 = 1
B: if L2 == 0                 Y: if L1 == 0
     {critical section}            {critical section}
C: L1 = 0                     Z: L2 = 0

What can happen?


How many orders can memory see A, B, C, X, Y, Z? (Let's list them!)
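Any interleaving that keeps A, B, C and X, Y, Z each in their own program order is possible, which gives C(6,3) = 20 orders. A brute-force sketch that enumerates them:

```python
from itertools import permutations

# Sequential consistency lets memory interleave the two processors'
# references in any order that preserves each processor's program order.
p1, p2 = "ABC", "XYZ"

def preserves_program_order(order, prog):
    # the ops of `prog` must appear in `order` in the same relative order
    positions = [order.index(op) for op in prog]
    return positions == sorted(positions)

valid = sorted({"".join(o) for o in permutations(p1 + p2)
                if preserves_program_order(o, p1)
                and preserves_program_order(o, p2)})
print(len(valid))   # 20 = C(6,3)
```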


Cache Coherence

  • Definition:

    If a cache line is present in more than one cache, it has the same values in all caches

  • Why is it important?

  • Snoopy schemes

    • All caches snoop the common bus

  • Directory schemes

    • Each cache line has a directory entry in memory


A Snoopy Scheme
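The slide's diagram is not in the transcript. As an illustration only (MSI is assumed here as the simplest bus-based invalidate protocol; the slide may show a different one), a minimal bus-snooping state machine:

```python
# Minimal MSI snoopy-protocol sketch (an assumption: the slide's figure
# is missing). Each cache holds one line in state M (modified),
# S (shared), or I (invalid); all caches snoop every bus transaction.

M, S, I = "M", "S", "I"

class Cache:
    def __init__(self):
        self.state = I

    def read(self, caches):
        if self.state == I:
            self._snoop_bus_read(caches)   # BusRd on the shared bus
            self.state = S
        return self.state

    def write(self, caches):
        if self.state != M:
            self._snoop_bus_rdx(caches)    # BusRdX on the shared bus
            self.state = M
        return self.state

    def _snoop_bus_read(self, caches):
        # Other caches snoop BusRd: an M copy supplies data and drops to S.
        for c in caches:
            if c is not self and c.state == M:
                c.state = S

    def _snoop_bus_rdx(self, caches):
        # Other caches snoop BusRdX: every other copy is invalidated.
        for c in caches:
            if c is not self:
                c.state = I

caches = [Cache(), Cache()]
caches[0].read(caches)    # P0: I -> S
caches[1].write(caches)   # P1: I -> M; P0's copy invalidated
print(caches[0].state, caches[1].state)  # I M
```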


Directory Scheme (p=3)

