
Other Architectures & Examples


  1. Other Architectures & Examples
     • Multithreaded architectures
     • Dataflow architectures
     • Multiprocessor examples
     1st May, 2006
     Anshul Kumar, CSE IITD

  2. Context switching
     • Delays and poor resource utilization due to:
       • data/control hazards
       • cache misses
       • waiting for some event
     • Solution: context switch to another thread
     • Context switch mechanism:
       • operating system: slow
       • hardware: fast
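The gap between a slow (operating system) and a fast (hardware) switch can be made concrete with a rough analytical model. This is a sketch under assumptions that are not in the slides: every thread runs for `run` cycles, then stalls for `latency` cycles, and each switch costs `switch` cycles. The function name and parameters are hypothetical.

```python
def utilization(n_threads, run, latency, switch):
    """Approximate fraction of cycles spent on useful work.

    Assumed model: each thread executes `run` cycles, then waits `latency`
    cycles; switching to another thread costs `switch` cycles.
    """
    if n_threads == 1:
        # A single thread never switches; it simply waits out every stall.
        return run / (run + latency)
    # Too few threads to cover the stall: utilization grows with thread count.
    linear = n_threads * run / (run + switch + latency)
    # Enough threads to hide the stall completely: only switch cost is lost.
    saturated = run / (run + switch)
    return min(linear, saturated)


if __name__ == "__main__":
    for cost in (1, 100):  # cheap (hardware-like) vs expensive (OS-like) switch
        print(cost, [round(utilization(n, 10, 40, cost), 3) for n in (1, 4, 8)])
```

With these made-up numbers, a one-cycle switch lets eight threads hide most of the stall time, while a 100-cycle switch leaves the processor less utilized than a single thread that just waits.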

  3. Multithreaded architecture
     • Hardware context switching
     • Models: control flow or hybrid (control flow, data flow)
     • Granularity: fine grain or coarse grain
     • Memory organization: shared?, distributed?, cache coherent?
     • No. of threads: small, medium, large

  4. ILP and Multithreading
     [Figure: comparison of ILP, coarse MT, fine MT, and SMT. Source: Hennessy and Patterson]
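The difference between coarse and fine multithreading can be illustrated with a toy single-issue simulation. This is a sketch under assumptions of my own: a thread is a string in which `i` is a one-cycle instruction and `m` is an instruction followed by a `stall`-cycle miss, switching is free, and the function name `simulate` is hypothetical. SMT is not modelled because only one instruction issues per cycle.

```python
def simulate(threads, policy, stall=3):
    """Return a timeline string: the thread id issued each cycle, '-' if idle.

    policy "fine":   rotate to the next ready thread every cycle.
    policy "coarse": stay on the current thread until it stalls or finishes.
    """
    n = len(threads)
    pc = [0] * n        # next instruction index per thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    timeline = []
    current = 0
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        # Search for a ready thread, starting from the preferred one.
        order = [(current + k) % n for k in range(n)]
        chosen = next(
            (t for t in order if pc[t] < len(threads[t]) and ready_at[t] <= cycle),
            None,
        )
        if chosen is None:
            timeline.append("-")  # every unfinished thread is stalled
        else:
            op = threads[chosen][pc[chosen]]
            pc[chosen] += 1
            if op == "m":
                ready_at[chosen] = cycle + 1 + stall
            timeline.append(str(chosen))
            # Fine grain moves on every cycle; coarse grain sticks with the thread.
            current = (chosen + 1) % n if policy == "fine" else chosen
        cycle += 1
    return "".join(timeline)


if __name__ == "__main__":
    work = ["imii", "iiii"]
    print("single:", simulate(["imii"], "coarse"))
    print("coarse:", simulate(work, "coarse"))
    print("fine:  ", simulate(work, "fine"))
```

In this toy setup both policies hide the miss of thread 0 behind thread 1; they differ only in how the issue slots are interleaved.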

  5. Chip level multithreading
     Executing instructions from multiple threads within one processor chip at the same time.
     • Multithreading: interleaved issue of multiple instructions from different threads
     • Simultaneous multithreading (SMT): issue multiple instructions from multiple threads in one cycle
     • Chip-level multiprocessing (CMP or multicore): integrate two or more superscalar processors into one chip, each executing one thread independently
     • Any combination of multithreading/SMT/CMP
     Source: Wikipedia

  6. Historical Examples
     Machine             Granularity  Procs            Threads/proc        Memory               Year
     HEP from Denelcor   fine         max 16           8 active, 64 max    shared, centralized  1978
     Tera                fine         max 256          128                 distributed shared   1990
     Alewife (MIT)       coarse       max 512 Sparcle  1 active, 3 loaded  CC                   1990

  7. Modern examples
     • Pentium 4 Hyperthreading
     • MIPS MT 8 cores with 4 threads each
     • IBM Power 5 dual core, 2 threads each
     • UltraSPARC T1 fine grained multithreading

  8. HEP
     [Figure: HEP block diagram. Labels: control loop, 8 stage pipeline, scheduler function unit, PSW queue, program memory, matching unit, increment control, registers, operand fetch, SFU, FU1, FU2, FUn, to/from data memory]

  9. Control Flow & Data Flow models
     • Control Flow (von Neumann)
       • control flows through a sequence of instructions; branches can alter the flow
       • instructions get data from or put data in memory
       • explicit parallelism through control operators: fork/join
     • Data Flow
       • instructions are triggered by availability of data
       • data flows from instruction to instruction
       • explicit parallelism

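  10. Dataflow Model
      [Figure: dataflow graph for R = (A-B)*(B+1). Inputs A, B, and the constant 1 feed a "-" node and a "+" node, which produce A-B and B+1; a "*" node combines them into R.]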

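  11. Dataflow Program
      [Figure: dataflow program for R = (A-B)*(B+1). Instructions are labelled L1 to L4 and name their destinations as instruction/operand slot (L2/2, L3/1, L4/1, L4/2, L6/1). L1 computes B, the "-" and "+" instructions produce A-B and B+1, and L4 multiplies them.]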
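The firing rule for R = (A-B)*(B+1) can be sketched in a few lines of Python. This is an illustration only: the instruction names (`sub`, `add`, `mul`), the `(instruction, port)` destination encoding, and the `run` function are hypothetical stand-ins for the labelled instructions and destination fields in the figure. An instruction fires as soon as both of its operand slots hold a value, and its result travels as a token to each destination.

```python
import operator
from collections import deque

# Hypothetical program: name -> (operation, list of (destination, port)).
PROGRAM = {
    "sub": (operator.sub, [("mul", 0)]),  # A - B
    "add": (operator.add, [("mul", 1)]),  # B + 1
    "mul": (operator.mul, [("R", 0)]),    # (A - B) * (B + 1)
}


def run(a, b):
    """Execute the graph data-driven; return (R, order in which nodes fired)."""
    operands = {name: {} for name in PROGRAM}  # operand slots per instruction
    results = {}
    fired = []
    # Initial tokens: (destination, port, value).
    tokens = deque([("sub", 0, a), ("sub", 1, b), ("add", 0, b), ("add", 1, 1)])
    while tokens:
        dest, port, value = tokens.popleft()
        if dest not in PROGRAM:  # a program output such as R
            results[dest] = value
            continue
        slots = operands[dest]
        slots[port] = value
        if len(slots) == 2:  # both operands present: the instruction fires
            op, destinations = PROGRAM[dest]
            out = op(slots[0], slots[1])
            fired.append(dest)
            slots.clear()
            for next_dest, next_port in destinations:
                tokens.append((next_dest, next_port, out))
    return results["R"], fired


if __name__ == "__main__":
    print(run(7, 3))
```

Nothing in the loop imposes an order between `sub` and `add`; the order in which they fire depends only on when their tokens arrive.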

  12. Static Dataflow Architecture
      [Figure: block diagram of a processing element. Labels: activity store, fetch unit, FU1, FU2, FUn, instruction queue, update unit, to/from other PEs]

  13. Tagged-token dataflow architecture
      [Figure: block diagram of a processing element. Labels: matching unit, matching store, instruction/data memory, fetch unit, FU1, FU2, FUn, token queue, form token unit, to/from other PEs]
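The role of the matching unit and matching store can be sketched as follows. This is a simplified illustration, not the organization of any particular machine: tokens are assumed to be `(tag, destination, port, value)` tuples, every instruction takes two operands, and the graph reuses the R = (A-B)*(B+1) example with hypothetical instruction names. The tag lets several independent activations of the same graph share one token queue.

```python
import operator
from collections import deque

# Hypothetical graph: name -> (operation, list of (destination, port)).
GRAPH = {
    "sub": (operator.sub, [("mul", 0)]),
    "add": (operator.add, [("mul", 1)]),
    "mul": (operator.mul, [("R", 0)]),
}


def run_tagged(inputs):
    """inputs: {tag: (a, b)}. Returns {tag: (a - b) * (b + 1)}."""
    token_queue = deque()
    for tag, (a, b) in inputs.items():
        token_queue.extend(
            [(tag, "sub", 0, a), (tag, "add", 1, 1),
             (tag, "sub", 1, b), (tag, "add", 0, b)]
        )
    matching_store = {}  # (tag, instruction) -> (port, value) awaiting a partner
    results = {}
    while token_queue:
        tag, dest, port, value = token_queue.popleft()
        if dest == "R":
            results[tag] = value
            continue
        key = (tag, dest)
        if key not in matching_store:
            matching_store[key] = (port, value)  # first operand: wait
            continue
        # Partner with the same tag and destination found: fire.
        _, other_value = matching_store.pop(key)
        left, right = (value, other_value) if port == 0 else (other_value, value)
        op, destinations = GRAPH[dest]
        out = op(left, right)
        for next_dest, next_port in destinations:
            token_queue.append((tag, next_dest, next_port, out))  # form new tokens
    return results


if __name__ == "__main__":
    print(run_tagged({0: (7, 3), 1: (10, 4)}))
```

Tokens belonging to different tags never match each other, so the two activations proceed independently even though their tokens are mixed in the queue.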

  14. UMA Examples
      • Earlier approach: large number of processors (e.g. Denelcor HEP, NYU Ultracomputer)
      • Now realized: good only for a small number of processors (e.g. Encore Multimax, 1980s; SGI Power Challenge, 1990s)

  15. SGI Power Challenge
      • 18 MIPS R8000
      • 16 GB RAM, 8-way interleaved
      • 4 POWERchannel-2 I/O buses, each 320 MB/s
      • POWERpath-2: split transaction shared bus (256 bit data, 40 bit address)
      • Snoopy cache coherence protocol
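As a small illustration of what 8-way interleaving means, the sketch below maps addresses to memory banks. It assumes low-order interleaving at block granularity; the 128-byte block size and the function name `bank_of` are hypothetical and are not taken from the slide.

```python
def bank_of(address, num_banks=8, block_bytes=128):
    """Bank holding the block that contains `address` (low-order interleaving)."""
    return (address // block_bytes) % num_banks


if __name__ == "__main__":
    # Consecutive blocks land in consecutive banks, so a sequential stream
    # of accesses keeps all eight banks busy in turn.
    print([bank_of(block * 128) for block in range(10)])
```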

  16. NUMA Examples
      • BBN TC2000
      • IBM RP3
      • Hector
      • Cray T3D

  17. Hector
      • Hierarchical structure
        • global ring
        • local rings
        • stations
          • Proc module (P+C+M)
          • I/O module

  18. Hector
      [Figure: stations connected by local rings, which are joined by a global ring. Inside a station, a station bus links the station controller, processor modules, and an I/O module.]

  19. Cray T3D
      • Alpha 21064 processors
      • Cray Y-MP host
      • up to 128 GB memory
      • 4x4x4 3D torus, configurable up to 8x8x8
      • 2 PEs in each node
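The wraparound links of a 3D torus can be sketched as below. This is a generic illustration rather than the T3D's actual routing logic: nodes are assumed to be addressed by `(x, y, z)` coordinates, and the function names are hypothetical.

```python
def torus_neighbors(node, dims):
    """The six neighbours of `node` in a torus of size `dims`, with wraparound."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size
            neighbors.append(tuple(coords))
    return neighbors


def torus_distance(a, b, dims):
    """Minimum hop count: in each dimension take the shorter way round the ring."""
    return sum(min((x - y) % s, (y - x) % s) for x, y, s in zip(a, b, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_distance((0, 0, 0), (3, 3, 3), dims))
```

Because of the wraparound, the corner nodes (0, 0, 0) and (3, 3, 3) of a 4x4x4 torus are only three hops apart, and no pair of nodes is more than six hops apart.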

  20. CC-NUMA examples
      Machine             Nodes                         Mem             Cache               Net
      Wisconsin Multicube single proc                   per col bus     snoopy              bus grid
      Aquarius Multimulti single proc                   per node        snoopy + directory  bus grid
      Stanford Dash       cluster: 4 R3000 + FPU on bus per cluster     snoopy + directory  pair of meshes
      Stanford Flash      single proc: T5 + magic chip  per node        directory           2D mesh
      Convex Exemplar     hyper node: 8 PA-RISC         per hyper node  SCI                 X bar (hyper node), multi rings
      Magic chip: memory + I/O + network controller

  21. COMA examples
      • DDM (Data Diffusion Machine)
        • single bus (split transaction)
        • can be made hierarchical
      • KSR 1
        • hierarchical rings
        • distributed directory is a matrix: rows for pages, columns for caches

  22. Distr Mem Arch Examples
      Columns: Machine; Comp. proc; Comm. proc; Vec. proc; Switch; Topology
      • nCUBE2: custom; custom; hyper cube
      • iPSC2: i386; yes; yes; hyper cube
      • Intel Paragon: i860; i860; custom; 2D mesh
      • Genesis: i870; i870; custom; 2 level X bar
      • Manna: i860; i860; 16x16 X bar; hierarch.
      • Parsytec: P.PC601; T805; C004; 3D mesh
      • Transtech Paramid: i860; T805; C004; variable
      • IBM SP2: Power2; i860; custom; fat tree
      • Meiko C32: SPARC; custom; Fujitsu; custom; fat tree
      • Parsys SN9800: T900; T900; C104; hierarch sw
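For the hypercube topology listed for nCUBE2 and iPSC2, the sketch below shows the usual binary node numbering, where two nodes are neighbours when their numbers differ in exactly one bit. The dimension-order routing shown is a common textbook scheme and is an assumption here; the slides do not say which routing these machines use, and the function names are hypothetical.

```python
def hypercube_neighbors(node, dim):
    """Neighbours of `node` in a `dim`-dimensional hypercube: flip one bit each."""
    return [node ^ (1 << k) for k in range(dim)]


def dimension_order_route(src, dst, dim):
    """Path from src to dst that corrects differing bits from lowest to highest."""
    path = [src]
    current = src
    for k in range(dim):
        if (current ^ dst) & (1 << k):
            current ^= 1 << k
            path.append(current)
    return path


if __name__ == "__main__":
    print(hypercube_neighbors(0b000, 3))
    print(dimension_order_route(0b000, 0b101, 3))
```

The number of hops equals the number of bits in which source and destination differ, so the diameter of a `dim`-dimensional hypercube is `dim`.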

  23. References
      • D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
