
Understanding Multi-threaded Parallelism: Theory, Metrics, and Architectures

This outline delves into the principles of parallelism in computing, focusing on multi-threaded architectures. It explores foundational theories such as Amdahl's Law and examines key performance metrics including speedup, efficiency, and utilization. The differences between tightly coupled and loosely coupled systems are dissected, including comparisons of shared distributed versus shared centralized architectures. Additionally, the outline discusses critical aspects of cache coherency, memory consistency, and various interconnection networks. Key examples and trade-offs related to CMP and SMT architectures are also presented.



Presentation Transcript


  1. Parallelism (Multi-threaded)

  2. Outline
  • A touch of theory
    • Amdahl's Law (statement sketched below)
    • Metrics (speedup, efficiency, redundancy, utilization)
  • Trade-offs
    • Tightly coupled vs. loosely coupled
    • Shared distributed vs. shared centralized
    • CMP vs. SMT (and while we are at it: SSMT)
    • Heterogeneous vs. homogeneous
    • Interconnection networks
  • Other important issues
    • Cache coherency
    • Memory consistency
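
No slide in this transcript spells out Amdahl's Law, so as a reminder here is the standard statement (my addition, not from the deck): f is the parallelizable fraction of the work and n the number of processors.

```latex
% Amdahl's Law: the serial fraction (1 - f) caps the achievable speedup
S(n) = \frac{1}{(1 - f) + f/n}, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - f}
```

With f = 0.9, for instance, speedup is bounded by 10 no matter how many processors are added.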

  3. Metrics
  • Speed-up
  • Efficiency
  • Utilization
  • Redundancy
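
The slide names the four metrics without their formulas. The standard definitions, which these names conventionally carry in the parallel-processing literature and which I assume the deck used, are below; T(n) is execution time on n processors and O(n) is the total number of operations performed.

```latex
S(n) = \frac{T(1)}{T(n)}      % speed-up
E(n) = \frac{S(n)}{n}         % efficiency: speed-up per processor
R(n) = \frac{O(n)}{O(1)}      % redundancy: extra work done by the parallel run
U(n) = R(n) \cdot E(n)        % utilization
```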

  4. An example: a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
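
The slide presumably uses this polynomial to count operations and expose parallelism. A minimal C sketch (my own illustration, not the deck's) contrasts the fully sequential Horner evaluation with an even/odd split that two processors could compute independently:

```c
#include <stdio.h>

/* Horner's rule: 4 multiplies + 4 adds, but every step depends on the
 * previous one, so there is no parallelism to exploit. */
double horner(const double a[5], double x) {
    double r = a[4];
    for (int i = 3; i >= 0; i--)
        r = r * x + a[i];
    return r;
}

/* An even/odd split two processors could evaluate independently:
 *   even terms: a4*x^4 + a2*x^2 + a0, a polynomial in x^2 (Horner)
 *   odd terms:  (a3*x^2 + a1) * x
 * One final add combines the two halves. */
double split(const double a[5], double x) {
    double x2 = x * x;                          /* shared subexpression */
    double even = (a[4] * x2 + a[2]) * x2 + a[0];
    double odd  = (a[3] * x2 + a[1]) * x;
    return even + odd;                          /* combine step */
}

int main(void) {
    double a[5] = {1, 2, 3, 4, 5};              /* a0 .. a4 */
    printf("horner: %f\n", horner(a, 2.0));     /* 129.000000 */
    printf("split : %f\n", split(a, 2.0));      /* 129.000000 */
    return 0;
}
```

The split does more total work (6 multiplies instead of 4) but shortens the critical path, a typical speedup-vs-redundancy trade-off in the metrics above.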

  5. Tightly-coupled vs. Loosely-coupled
  • Tightly coupled (i.e., multiprocessor)
    • Shared memory
    • Each processor capable of doing work on its own
    • Easier for the software
    • Hardware has to worry about cache coherency and memory contention
  • Loosely coupled (i.e., multicomputer network)
    • Message passing
    • Easier for the hardware
    • Programmer's job is tougher
  • An early distributed shared-memory example: Cm* (see the sketch after this slide)
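
To make the contrast concrete, a minimal C sketch (my addition; the structure is mine) communicates one value both ways: through a shared variable, as on a tightly coupled machine, and through an explicit send/receive, with a POSIX pipe standing in for a multicomputer's interconnect.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int shared_value;                  /* visible to both threads */

/* Tightly coupled style: the "message" is just a store that the
 * hardware's cache-coherence machinery makes visible to everyone. */
void *producer(void *arg) {
    (void)arg;
    shared_value = 42;             /* one plain store */
    return NULL;
}

int main(void) {
    /* Shared memory: communicate through an ordinary variable. */
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(&t, NULL);
    printf("shared memory: %d\n", shared_value);   /* one plain load */

    /* Message passing: explicit send/receive through a pipe. */
    int fd[2], msg = 42, got = 0;
    if (pipe(fd) != 0) return 1;
    write(fd[1], &msg, sizeof msg);                /* explicit send */
    read(fd[0], &got, sizeof got);                 /* explicit receive */
    printf("message passing: %d\n", got);
    return 0;
}
```

The software/hardware trade-off shows up directly: the shared-memory version is trivial to write but leans on coherence hardware, while the message-passing version makes every communication explicit in the program.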

  6. CMP vs. SMT
  • CMP (chip multiprocessor)
    • aka multi-core
  • SMT (simultaneous multithreading)
    • One core, with a PC for each thread, executing in a pipelined fashion
    • Multithreading introduced by Burton Smith, HEP, ~1975
    • "Simultaneous" first proposed by H. Hirata et al., ISCA 1992
    • Carried forward by Nemirovsky, HICSS 1994
    • Popularized by Tullsen, Eggers, ISCA 1995

  7. Heterogeneous vs. Homogeneous
  [Figure: three chip layouts compared. The "Tile-Large" approach tiles a few large cores; the "Niagara" approach tiles many small Niagara-like cores; the ACMP approach mixes one large core with many Niagara-like cores.]

  8. Large Core vs. Small Core
  Large core:
  • Out-of-order
  • Wide fetch (e.g., 4-wide)
  • Deeper pipeline
  • Aggressive branch predictor (e.g., hybrid)
  • Many functional units
  • Trace cache
  • Memory dependence speculation
  Small core:
  • In-order
  • Narrow fetch (e.g., 2-wide)
  • Shallow pipeline
  • Simple branch predictor (e.g., gshare)
  • Few functional units

  9. Throughput vs. Serial Performance

  10. Interconnection Networks
  • Basic topologies
    • Bus
    • Crossbar
    • Omega network
    • Hypercube
    • Tree
    • Mesh
    • Ring
  • Parameters: cost, latency, contention
  • Example: Omega network, N = 64, k = 2 (worked just below)
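
A plausible working of the slide's example, using the standard Omega-network arithmetic (the deck's own numbers are not in the transcript): with N = 64 inputs and k×k = 2×2 switches,

```latex
% stages and switch count of an N-input Omega network of 2x2 switches
\text{stages} = \log_2 N = \log_2 64 = 6, \qquad
\text{switches per stage} = \frac{N}{2} = 32
% total: 6 * 32 = 192 switches (768 crosspoints),
% versus N^2 = 4096 crosspoints for a full crossbar
```

This captures the slide's three parameters: the Omega network is far cheaper than a crossbar and has log N latency, but distinct source/destination pairs can contend for the same internal link, which a crossbar avoids.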

  11. Three Issues
  • Interconnection networks
    • Cost
    • Latency
    • Contention
  • Cache coherency
    • Snoopy
    • Directory
  • Memory consistency
    • Sequential consistency and mutual exclusion

  12. Sequential Consistency
  • Definition: memory sees loads/stores in program order, i.e., each processor's accesses appear in its own program order, and all processors observe a single global interleaving of them
  • Examples
  • Why it is important: it guarantees mutual exclusion (next two slides)

  13. Two Processors and a Critical Section
  Initially: L1 = 0, L2 = 0

  Processor 1                  Processor 2
  A: L1 = 1                    X: L2 = 1
  B: if (L2 == 0)              Y: if (L1 == 0)
       {critical section}           {critical section}
  C: L1 = 0                    Z: L2 = 0

  What can happen?
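
A runnable rendering of the slide's protocol as POSIX threads (my sketch; the function names are mine). Under sequential consistency the stores A and X cannot both be invisible to the other processor's load, so both threads may be kept out of the critical section but never admitted together; on real hardware with store buffers (e.g., x86), both loads can read 0 and both threads can enter unless memory fences are added.

```c
#include <pthread.h>
#include <stdio.h>

/* The slide's L1 and L2. 'volatile' stops the compiler from reordering
 * or caching these accesses; it does NOT stop the hardware from doing so. */
volatile int L1 = 0, L2 = 0;

void *p1(void *arg) {
    (void)arg;
    L1 = 1;                                /* A */
    if (L2 == 0)                           /* B */
        printf("P1 in critical section\n");
    L1 = 0;                                /* C */
    return NULL;
}

void *p2(void *arg) {
    (void)arg;
    L2 = 1;                                /* X */
    if (L1 == 0)                           /* Y */
        printf("P2 in critical section\n");
    L2 = 0;                                /* Z */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```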

  14. How many orders can memory see for A, B, C, X, Y, Z? (Let's list them!)
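
Assuming the question asks for interleavings consistent with each processor's program order (A before B before C, and X before Y before Z), the count is a standard binomial: choose which 3 of the 6 slots hold A, B, C, and each processor's internal order is then forced.

```latex
\binom{6}{3} = \frac{6!}{3!\,3!} = 20
```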

  15. Cache Coherence
  • Definition: if a cache line is present in more than one cache, it has the same values in all caches
  • Why is it important?
  • Snoopy schemes
    • All caches snoop the common bus
  • Directory schemes
    • Each cache line has a directory entry in memory (sketched below)
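
A minimal C sketch of a full-map directory entry and its two basic transitions (my illustration under simple assumptions: at most 64 processors, a three-state protocol, and the actual invalidation/writeback messages elided):

```c
#include <stdint.h>
#include <stdio.h>

/* One directory entry per memory line: a state plus a presence bit
 * per processor's cache (full-map directory, p <= 64 assumed). */
enum dir_state { UNCACHED, SHARED, MODIFIED };

struct dir_entry {
    enum dir_state state;
    uint64_t presence;            /* bit i set => cache i holds the line */
};

/* Read miss by processor i: if some cache held the line MODIFIED it
 * would be written back first (message omitted); i joins the sharers. */
void on_read_miss(struct dir_entry *e, int i) {
    e->state = SHARED;
    e->presence |= 1ull << i;
}

/* Write miss by processor i: every other sharer must be invalidated
 * (messages omitted); i becomes the exclusive owner. */
void on_write_miss(struct dir_entry *e, int i) {
    e->presence = 1ull << i;
    e->state = MODIFIED;
}

int main(void) {
    struct dir_entry e = { UNCACHED, 0 };
    on_read_miss(&e, 0);          /* P0 reads:  SHARED,   presence 0b001 */
    on_read_miss(&e, 2);          /* P2 reads:  SHARED,   presence 0b101 */
    on_write_miss(&e, 1);         /* P1 writes: MODIFIED, presence 0b010 */
    printf("state=%d presence=%#llx\n", e.state,
           (unsigned long long)e.presence);
    return 0;
}
```

The presence bits are what let the directory send invalidations point-to-point instead of broadcasting on a bus, which is why directories scale past the snoopy scheme of the previous bullet.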

  16. A Snoopy Scheme

  17. Directory Scheme (p=3)
