ETM 555 Supplementary Lecture Notes, Version 5 / 2012

Presentation Transcript

  1. ETM 555 Supplementary Lecture Notes, Version 5 / 2012. Contents: Part 1: Hardware/Software Systems, Grid/Cloud Computing

  2. Part 1 Hardware/Software Systems, Grid Computing

  3. Hardware • Parallel/Distributed Processing • High Performance Computing • Top 500 list • Grid computing • [Slide image: Tianhe, the most powerful computer in the world in Nov 2010]

  4. Von Neumann Architecture • sequential computer • [Diagram: CPU and RAM connected to devices over a shared BUS]

  5. History of Computer Architecture • 4 Generations (identified by logic technology) • Tubes • Transistors • Integrated Circuits • VLSI (very large scale integration)

  6. PERFORMANCE TRENDS

  7. PERFORMANCE TRENDS • Traditional mainframe/supercomputer performance 25% increase per year • But … microprocessor performance 50% increase per year since mid 80’s.

  8. Moore’s Law • “Transistor density doubles every 18 months” • Moore is co-founder of Intel. • 60 % increase per year • Exponential growth • PC costs decline. • PCs are building bricks of all future systems.
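The two growth figures on this slide are consistent with each other; a quick check in Python (illustrative, not from the notes):

```python
# "Transistor density doubles every 18 months" implies an annual
# growth factor of 2**(12/18).
annual_factor = 2 ** (12 / 18)
print(f"{(annual_factor - 1) * 100:.0f}% per year")  # prints "59% per year"
```

So the 18-month doubling period and the "60% increase per year" figure are two statements of the same exponential trend.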

  9. VLSI Generation

  10. Bit-Level Parallelism (up to mid 80's) • 4-bit microprocessors replaced by 8-bit, 16-bit, 32-bit, etc. • doubling the width of the datapath reduces the number of cycles required to perform a full 32-bit operation • by the mid 80's the benefits of this kind of parallelism had largely been reaped (full 32-bit word operations combined with the use of caches)
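The cycle-count claim can be made concrete: on a 16-bit datapath, a 32-bit add takes two narrow adds plus carry propagation, where a 32-bit datapath needs only one. A minimal sketch (function name is mine):

```python
MASK16 = 0xFFFF

def add32_on_16bit(a, b):
    """A 32-bit add done as two 16-bit adds with carry,
    the way a 16-bit datapath would have to perform it."""
    lo = (a & MASK16) + (b & MASK16)         # first cycle: low halves
    carry = lo >> 16
    hi = ((a >> 16) + (b >> 16) + carry) & MASK16  # second cycle: high halves + carry
    return (hi << 16) | (lo & MASK16)

print(hex(add32_on_16bit(0x0001FFFF, 0x00000001)))  # prints 0x20000
```

Widening the datapath to 32 bits collapses the two dependent steps into one.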

  11. Instruction Level Parallelism (mid 80's to mid 90's) • Basic steps in instruction processing (instruction decode, integer arithmetic, address calculation) could each be performed in a single cycle • Pipelined instruction processing • Reduced instruction set (RISC) • Superscalar execution • Branch prediction

  12. Thread/Process Level Parallelism (mid 90's to present) • On average, control transfers occur roughly once in every five instructions, so instruction-level parallelism cannot be exploited at a much larger scale • Use multiple independent "threads" or processes • Concurrently running threads, processes

  13. Evolution of the Infrastructure • Electronic Accounting Machine Era: 1930-1950 • General Purpose Mainframe and Minicomputer Era: 1959-Present • Personal Computer Era: 1981 – Present • Client/Server Era: 1983 – Present • Enterprise Internet Computing Era: 1992- Present

  14. Memory Hierarchy (fast → slow) • Registers • Cache • Real Memory • Disk • CD

  15. Sequential vs Parallel Processing • Sequential: • physical limits reached • easy to program • expensive supercomputers • Parallel: • "raw" power unlimited • more memory, multiple caches • made up of COTS, so cheap • difficult to program

  16. Amdahl's Law • The serial percentage of a program is fixed, so the speed-up obtained by employing parallel processing is bounded: Speedup = 1 / (s + (1-s)/P) • In the limit (P → ∞): Speedup = 1/s • Led to pessimism in the parallel processing community and prevented development of parallel machines for a long time.
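Amdahl's bound is easy to evaluate numerically; a minimal sketch (function name is mine):

```python
def amdahl_speedup(s, p):
    """Speedup with serial fraction s on p processors: 1 / (s + (1 - s)/p)."""
    return 1.0 / (s + (1.0 - s) / p)

# A 5% serial fraction caps speedup at 1/s = 20, regardless of p.
for p in (10, 100, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
```

With s = 0.05 the speedup approaches, but never exceeds, 1/s = 20 no matter how many processors are used, which is the pessimistic conclusion the slide refers to.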

  17. Gustafson's Law • The serial percentage depends on the number of processors and the input size. • Gustafson demonstrated more than 1000-fold speedup using 1024 processors. • Justified parallel processing.
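Gustafson's scaled speedup is usually written S(P) = P - s·(P-1), where s is the serial fraction of the scaled (parallel) run. A minimal sketch (the 0.4% serial fraction is illustrative, chosen only to reproduce a >1000-fold figure):

```python
def gustafson_speedup(s, p):
    """Scaled speedup with serial fraction s (of the parallel run) on p processors."""
    return p - s * (p - 1)

print(gustafson_speedup(0.004, 1024))  # ≈ 1019.9 -- over 1000-fold on 1024 processors
```

Because s here shrinks as the problem is scaled up with the machine, the speedup grows almost linearly with P instead of saturating at 1/s.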

  18. Grand Challenge Applications • Important scientific & engineering problems identified by U.S. High Performance Computing & Communications Program (’92)

  19. Flynn's Taxonomy • classifies computer architectures according to: • Number of instruction streams it can process at a time • Number of data elements on which it can operate simultaneously

                                Data Streams
                                Single    Multiple
   Instruction      Single      SISD      SIMD
   Streams          Multiple    MISD      MIMD

  20. SPMD Model (Single Program Multiple Data) • Each processor executes the same program asynchronously • Synchronization takes place only when processors need to exchange data • SPMD is extension of SIMD (relax synchronized instruction execution) • SPMD is restriction of MIMD (use only one source/object)
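The SPMD style can be illustrated with Python's multiprocessing: every worker runs the same function on its own chunk of data, and synchronization happens only when results are exchanged (a minimal sketch; names are mine):

```python
from multiprocessing import Pool

def worker(chunk):
    # Every process runs this same program, asynchronously, on its own slice.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(8))
    chunks = [data[:4], data[4:]]            # one chunk per process
    with Pool(2) as pool:
        partials = pool.map(worker, chunks)  # synchronization happens only here
    print(sum(partials))  # 140, the sum of squares of 0..7
```

One source program, multiple data sets: the restriction of MIMD and the relaxation of SIMD described above.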

  21. Parallel Processing Terminology • Embarrassingly Parallel: • applications which are trivial to parallelize • large amounts of independent computation • little communication • Data Parallelism: • model of parallel computing in which a single operation can be applied to all data elements simultaneously • amenable to SIMD or SPMD style of computation • Control Parallelism: • many different operations may be executed concurrently • requires MIMD/SPMD style of computation

  22. Parallel Processing Terminology • Scalability: • If the size of the problem is increased, the number of processors that can be effectively used can be increased (i.e. there is no limit on parallelism). • Cost of a scalable algorithm grows slowly as input size and the number of processors are increased. • Data parallel algorithms are more scalable than control parallel algorithms • Granularity: • fine grain machines: employ a massive number of weak processors, each with small memory • coarse grain machines: smaller number of powerful processors, each with large amounts of memory

  23. Shared Memory Machines • Memory is globally shared, therefore processes (threads) see a single address space • Coordination of accesses to locations is done by use of locks provided by thread libraries • Example Machines: Sequent, Alliant, SUN Ultra, Dual/Quad Board Pentium PC • Example Thread Libraries: POSIX threads, Linux threads • [Diagram: several processes (threads) attached to one shared address space]
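Lock-based coordination on a shared address space can be sketched with Python threads (a minimal illustration of the idea, not a performance demo):

```python
import threading

counter = 0              # shared location in the single address space
lock = threading.Lock()  # lock provided by the thread library

def increment(n):
    global counter
    for _ in range(n):
        with lock:       # coordinate access to the shared location
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000; without the lock, concurrent updates could be lost
```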

  24. Shared Memory Machines • can be classified as: • UMA: uniform memory access • NUMA: nonuniform memory access • based on the amount of time a processor takes to access local and global memory. • [Diagrams: (a) processors and memories attached to a shared interconnection network/bus (UMA); (b)-(c) processor-memory pairs connected through an interconnection network (NUMA)]

  25. Distributed Memory Machines • Each processor has its own local memory (not directly accessible by others) • Processors communicate by passing messages to each other • Example Machines: IBM SP2, Intel Paragon, COWs (clusters of workstations) • Example Message Passing Libraries: PVM, MPI • [Diagram: processor-memory nodes connected by a network]
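Message passing between processes with disjoint memories can be sketched with Python's multiprocessing pipes (a minimal illustration; PVM/MPI programs follow the same send/receive pattern, and the names here are mine):

```python
from multiprocessing import Pipe, Process

def child_sum(conn):
    data = conn.recv()    # no shared memory: all data arrives as messages
    conn.send(sum(data))  # send the result back as another message
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=child_sum, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])
    print(parent_end.recv())  # 10
    p.join()
```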

  26. Beowulf Clusters • Built from COTS: ordinary PCs and networking equipment • Have the best price/performance ratio • [Image: a PC cluster]

  27. What is Multi-Core Programming ? • Answer: It is basically parallel programming on a single computer box (e.g. a desktop, a notebook, a blade)

  28. Important Benefit of Multi-Core: Reduced Energy Consumption • Compare a single core at 2 GHz with a dual core at 1 GHz: • Energy per cycle: E = C*Vdd^2, total Energy = E*Nc • At half the frequency the supply voltage can be halved: E' = C*(0.5*Vdd)^2 = 0.25*C*Vdd^2 • Energy' = 2*(E' * 0.5*Nc) = E'*Nc = 0.25*(E*Nc) = 0.25*Energy
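The 4x energy reduction follows directly from the formulas above; a quick numerical check (normalized, illustrative values):

```python
# Dynamic energy per cycle is E = C * Vdd**2 (values normalized to 1).
C, Vdd, Nc = 1.0, 1.0, 1.0

single = (C * Vdd ** 2) * Nc                    # one core runs all Nc cycles
dual = 2 * (C * (0.5 * Vdd) ** 2) * (0.5 * Nc)  # two cores, half voltage, half cycles each
print(dual / single)  # 0.25
```

Two cores at half the clock finish the same work while consuming a quarter of the energy, because energy scales with the square of the supply voltage.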

  29. Multi-Core Computing • A multi-core microprocessor is one which combines two or more independent processors into a single package, often a single integrated circuit. • A dual-core device contains only two independent microprocessors.

  30. Comparison of Different Architectures: Single Core Architecture • [Diagram: one CPU state, one execution unit, one cache]

  31. Comparison of Different Architectures: Multiprocessor • [Diagram: two complete processors, each with its own CPU state, execution unit, and cache]

  32. Comparison of Different Architectures: Hyper-Threading Technology • [Diagram: two CPU states sharing a single execution unit and cache]

  33. Comparison of Different Architectures: Multi-Core Architecture • [Diagram: two cores in one package, each with its own CPU state, execution unit, and cache]

  34. Comparison of Different Architectures: Multi-Core Architecture with Shared Cache • [Diagram: two cores with private CPU states and execution units sharing one cache]

  35. Comparison of Different Architectures: Multi-Core with Hyper-Threading Technology • [Diagram: two cores, each with two CPU states, each core with its own execution unit and cache]

  36. Graphics Processing Units (GPUs) • GPU devotes more transistors to data processing

  37. Hillis' Thesis '85 (back to the future!) • proposed "The Connection Machine" with a massive number of processors, each with small memory, operating in SIMD mode. • CM-1, CM-2 machines from Thinking Machines Corporation (TMC) were examples of this architecture, with 32K-128K processors. • [Diagram: the same piece of silicon organized as a sequential vs a parallel computer]

  38. Floating Point Operations for the CPU and the GPU

  39. Memory Bandwidth for the CPU and the GPU

  40. NVIDIA GPU Supports Various Languages or Application Programming Interfaces

  41. Automatic Scalability A multithreaded program is partitioned into blocks of threads that execute independently from each other, so that a GPU with more cores will automatically execute the program in less time than a GPU with fewer cores.

  42. Grid of Thread Blocks
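A thread in such a grid locates its data element from its block and thread coordinates. A minimal 1-D sketch of that index arithmetic (plain Python mirroring the names of CUDA's built-in variables; the function itself is mine):

```python
# CUDA-style 1-D global index: each thread finds its element from
# its block index, the block size, and its index within the block.
def global_index(block_idx, block_dim, thread_idx):
    return block_idx * block_dim + thread_idx

# A grid of 4 blocks of 256 threads covers indices 0..1023, regardless of
# how many physical cores the blocks are scheduled on -- which is what
# makes the scaling on the previous slide automatic.
print(global_index(3, 256, 255))  # 1023, the last element of the grid
```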

  43. Memory Hierarchy

  44. GPU Programming Model • Heterogeneous Programming • Serial code executes on the host while parallel code executes on the device.

  45. Top 500 Most Powerful Computers List • http://www.top500.org/list/2011/06

  46. Grid Computing • provide access to computing power and various resources just like accessing electrical power from electrical grid • Allows coupling of geographically distributed resources • Provide inexpensive access to resources irrespective of their physical location or access point • Internet & dedicated networks can be used to interconnect distributed computational resources and present them as a single unified resource • Resources: supercomputers, clusters, storage systems, data resources, special devices

  47. Grid Computing • the GRID is, in effect, a set of software tools which, when combined with hardware, would let users tap processing power off the Internet as easily as electrical power can be drawn from the electricity grid. • Examples of Grids: • TeraGrid (USA) • EGEE Grid (Europe) • TR-Grid (Turkey)

  48. GRID COMPUTING • [Images: the power grid vs the compute grid analogy]

  49. Archeology • Astronomy • Astrophysics • Civil Protection • Comp. Chemistry • Earth Sciences • Finance • Fusion • Geophysics • High Energy Physics • Life Sciences • Multimedia • Material Sciences • … >250 sites 48 countries >50,000 CPUs >20 PetaBytes >10,000 users >150 VOs >150,000 jobs/day