
Evolution of Computer Architecture and Parallel Processing

Explore the history and evolution of computer architecture, from Von Neumann architecture to multi-core processors, and learn about the benefits and terminology of parallel processing.



Presentation Transcript


  1. CMPE 478 Parallel Processing [Figure: Tianhe, the most powerful computer in the world as of Nov 2010]

  2. Von Neumann Architecture [Diagram: CPU, RAM, and devices connected by a single bus] • a sequential computer

  3. Memory Hierarchy (fastest to slowest): Registers → Cache → Real Memory → Disk → CD

  4. History of Computer Architecture • 4 Generations (identified by logic technology) • Tubes • Transistors • Integrated Circuits • VLSI (very large scale integration)

  5. PERFORMANCE TRENDS

  6. PERFORMANCE TRENDS • Traditional mainframe/supercomputer performance 25% increase per year • But … microprocessor performance 50% increase per year since mid 80’s.

  7. Moore’s Law • “Transistor density doubles every 18 months” • Moore is a co-founder of Intel. • 60% increase per year • Exponential growth • PC costs decline. • PCs are the building bricks of all future systems. • Example: Intel’s 62-core Xeon Phi (2012) contains about 5 billion transistors.

  8. VLSI Generation

  9. Bit Level Parallelism (up to mid 80’s) • 4-bit microprocessors replaced by 8-bit, 16-bit, 32-bit, etc. • doubling the width of the datapath reduces the number of cycles required to perform a full 32-bit operation • mid 80’s: reap the benefits of this kind of parallelism (full 32-bit word operations combined with the use of caches)

  10. Instruction Level Parallelism (mid 80’s to mid 90’s) • Basic steps in instruction processing (instruction decode, integer arithmetic, address calculation) could each be performed in a single cycle • Pipelined instruction processing • Reduced instruction set computers (RISC) • Superscalar execution • Branch prediction

  11. Thread/Process Level Parallelism(mid 90’s to present) • On average control transfers occur roughly once in five instructions, so exploiting instruction level parallelism at a larger scale is not possible • Use multiple independent “threads” or processes • Concurrently running threads, processes

  12. Evolution of the Infrastructure • Electronic Accounting Machine Era: 1930-1950 • General Purpose Mainframe and Minicomputer Era: 1959-Present • Personal Computer Era: 1981 – Present • Client/Server Era: 1983 – Present • Enterprise Internet Computing Era: 1992- Present

  13. Sequential vs Parallel Processing • Sequential: physical limits reached • easy to program • expensive supercomputers • Parallel: “raw” power unlimited • more memory, multiple caches • made up of COTS (commercial off-the-shelf) parts, so cheap • difficult to program

  14. What is Multi-Core Programming ? • Answer: It is basically parallel programming on a single computer box (e.g. a desktop, a notebook, a blade)

  15. Another Important Benefit of Multi-Core: Reduced Energy Consumption • Single core at 2 GHz executes the whole workload of N clock cycles: energy per cycle E = C·Vdd², total Energy = E·N • Dual core at 1 GHz per core: each core executes N/2 cycles; halving the frequency allows halving the supply voltage, so energy per cycle E' = C·(0.5·Vdd)² = 0.25·C·Vdd² • Total Energy' = 2·(E'·0.5·N) = E'·N = 0.25·(E·N) = 0.25·Energy
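The slide’s arithmetic can be checked numerically. This is an illustrative sketch of the slide’s simple energy model (dynamic energy per cycle E = C·Vdd², plus the slide’s assumption that halving clock frequency permits halving supply voltage); the constants are arbitrary.

```python
# Energy model from the slide: energy per cycle E = C * Vdd^2.
# Assumption (from the slide): halving frequency allows halving Vdd.
C = 1.0        # effective switched capacitance (arbitrary units)
Vdd = 1.0      # nominal supply voltage (arbitrary units)
N = 1_000_000  # workload, in clock cycles

# Single core at 2 GHz, full voltage, runs all N cycles.
E_single = C * Vdd ** 2
energy_single = E_single * N

# Dual core at 1 GHz each, half voltage, N/2 cycles per core.
E_dual = C * (0.5 * Vdd) ** 2         # = 0.25 * E_single
energy_dual = 2 * (E_dual * 0.5 * N)  # two cores, half the cycles each

print(energy_dual / energy_single)    # → 0.25
```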

  16. SPMD Model (Single Program Multiple Data) • Each processor executes the same program asynchronously • Synchronization takes place only when processors need to exchange data • SPMD is an extension of SIMD (it relaxes synchronized instruction execution) • SPMD is a restriction of MIMD (it uses only one source/object program)

  17. Parallel Processing Terminology • Embarrassingly Parallel: • applications which are trivial to parallelize • large amounts of independent computation • little communication • Data Parallelism: • model of parallel computing in which a single operation can be applied to all data elements simultaneously • amenable to the SIMD or SPMD style of computation • Control Parallelism: • many different operations may be executed concurrently • requires the MIMD/SPMD style of computation

  18. Parallel Processing Terminology • Scalability: • If the size of the problem is increased, the number of processors that can be effectively used can be increased (i.e. there is no limit on parallelism). • The cost of a scalable algorithm grows slowly as the input size and the number of processors are increased. • Data parallel algorithms are more scalable than control parallel algorithms • Granularity: • fine grain machines: employ a massive number of weak processors, each with a small memory • coarse grain machines: a smaller number of powerful processors, each with large amounts of memory

  19. Models of Parallel Computers 1. Message Passing Model • Distributed memory • Multicomputer 2. Shared Memory Model • Multiprocessor • Multi-core 3. Theoretical Model • PRAM • New architectures: combinations of 1 and 2.

  20. Theoretical PRAM Model • Used by parallel algorithm designers • Algorithm designers do not want to worry about low-level details: they want to concentrate on algorithmic details • Extends the classic RAM model • Consists of: • Control unit (common clock), synchronous • Global shared memory • Unbounded set of processors, each with its own private memory

  21. Theoretical PRAM Model • Some characteristics • Each processor has a unique identifier, mypid = 0, 1, 2, … • All processors operate synchronously under the control of a common clock • In each unit of time, each processor is allowed to execute an instruction or stay idle

  22. Various PRAM Models (ordered from weakest to strongest by how write conflicts to the same memory location are handled): • EREW (exclusive read / exclusive write), the weakest • CREW (concurrent read / exclusive write) • CRCW (concurrent read / concurrent write), the strongest, with three write-conflict policies: Common (all processors must write the same value), Arbitrary (one processor is chosen arbitrarily), Priority (the processor with the lowest index writes)

  23. Flynn’s Taxonomy • classifies computer architectures according to: • the number of instruction streams it can process at a time • the number of data elements on which it can operate simultaneously • Single instruction, single data: SISD • Single instruction, multiple data: SIMD • Multiple instruction, single data: MISD • Multiple instruction, multiple data: MIMD

  24. Shared Memory Machines [Diagram: several processes (threads) sharing one memory] • Memory is globally shared, therefore processes (threads) see a single address space • Coordination of accesses to shared locations is done by use of locks provided by thread libraries • Example machines: Sequent, Alliant, SUN Ultra, Dual/Quad-Board Pentium PC • Example thread libraries: POSIX threads, Linux threads
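The coordination-with-locks point can be sketched with Python’s threading module (a stand-in for the POSIX thread libraries named on the slide; same idea, different API):

```python
# Shared address space: all threads see the same 'counter' variable,
# so access to it must be coordinated with a lock.
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # without this, concurrent updates could be lost
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 40000
```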

  25. Shared Memory Machines • can be classified, based on the amount of time a processor takes to access local and global memory, as: • UMA: uniform memory access • NUMA: nonuniform memory access [Diagrams (a)-(c): processors and memories connected through an interconnection network or bus]

  26. Distributed Memory Machines [Diagram: processors with local memories connected by a network] • Each processor has its own local memory (not directly accessible by others) • Processors communicate by passing messages to each other • Example machines: IBM SP2, Intel Paragon, COWs (clusters of workstations) • Example message passing libraries: PVM, MPI

  27. Beowulf Clusters • Use COTS, ordinary PCs and networking equipment • Has the best price/performance ratio PC cluster

  28. Multi-Core Computing • A multi-core microprocessor is one which combines two or more independent processors into a single package, often a single integrated circuit. • A dual-core device contains only two independent microprocessors.

  29. Comparison of Different Architectures: Single Core Architecture [Diagram: one CPU state, one execution unit, one cache]

  30. Comparison of Different Architectures: Multiprocessor [Diagram: two CPU states, two execution units, two caches]

  31. Comparison of Different Architectures: Hyper-Threading Technology [Diagram: two CPU states sharing one execution unit and one cache]

  32. Comparison of Different Architectures: Multi-Core Architecture [Diagram: two CPU states, two execution units, two caches in one package]

  33. Comparison of Different Architectures: Multi-Core Architecture with Shared Cache [Diagram: two CPU states, two execution units, one shared cache]

  34. Comparison of Different Architectures: Multi-Core with Hyper-Threading Technology [Diagram: four CPU states, two execution units, two caches]

  35. Top 500 Most Powerful Supercomputer List • http://www.top500.org/

  36. Grid Computing • provides access to computing power and various resources just like accessing electrical power from the electrical grid • Allows coupling of geographically distributed resources • Provides inexpensive access to resources irrespective of their physical location or access point • The Internet & dedicated networks can be used to interconnect distributed computational resources and present them as a single unified resource • Resources: supercomputers, clusters, storage systems, data resources, special devices

  37. Grid Computing • the GRID is, in effect, a set of software tools which, when combined with hardware, lets users tap processing power off the Internet as easily as electrical power can be drawn from the electricity grid. • Examples of Grids: • TeraGrid (USA) • EGEE Grid (Europe) • TR-Grid (Turkey)

  38. Grid Computing [Diagram: analogy between the power grid and a compute grid]

  39. [Diagram: grid application domains: Archeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …] • >250 sites • 48 countries • >50,000 CPUs • >20 PetaBytes • >10,000 users • >150 VOs • >150,000 jobs/day

  40. Virtualization • Virtualization is the abstraction of computer resources. • It can make a single physical resource (such as a server, an operating system, an application, or a storage device) appear to function as multiple logical resources • It may also mean making multiple physical resources (such as storage devices or servers) appear as a single logical resource • Server virtualization enables companies to run more than one operating system at the same time on a single machine

  41. Advantages of Virtualization • Most servers run at just 10-15% capacity; virtualization can increase server utilization to 70% or higher. • Higher utilization means fewer computers are required to process the same amount of work. • Fewer machines means less power consumption. • Legacy applications can also be run on older versions of an operating system • Other advantages: easier administration, fault tolerance, security

  42. VMware Virtual Platform [Diagram: two virtual machines (Apps 1 on OS 1, Apps 2 on OS 2), each seeing a virtual x86 machine with its own motherboard, disks, display, and network, running on the VMware Virtual Platform over one real x86 machine] • VMware is now a company worth tens of billions of dollars!

  43. Cloud Computing • A style of computing in which IT-related capabilities are provided “as a service”, allowing users to access technology-enabled services from the Internet (“in the cloud”) without knowledge of, expertise with, or control over the technology infrastructure that supports them. • A general concept that incorporates software as a service (SaaS), Web 2.0, and other recent, well-known technology trends, in which the common theme is reliance on the Internet for satisfying the computing needs of users.

  44. Cloud Computing • Virtualisation provides separation between the infrastructure and the user runtime environment • Users specify virtual images as their deployment building blocks • Pay-as-you-go allows users to use the service when they want and only pay for what they use • Elasticity of the cloud allows users to start simple and explore more complex deployments over time • A simple interface allows easy integration with existing systems

  45. Cloud: Unique Features • Ease of use • REST and HTTP(S) • Runtime environment • Hardware virtualisation • Gives users full control • Elasticity • Pay-as-you-go • Cloud providers can buy hardware faster than you!

  46. Example Cloud: Amazon Web Services • EC2 (Elastic Compute Cloud) is the computing service of Amazon • Based on hardware virtualisation • Users request virtual machine instances, pointing to an image (public or private) stored in S3 • Users have full control over each instance (e.g. access as root, if required) • Requests can be issued via SOAP and REST

  47. Example Cloud: Amazon Web Services • Pricing information: http://aws.amazon.com/ec2/

  48. PARALLEL PERFORMANCE MODELS and ALGORITHMS

  49. Amdahl’s Law • The serial fraction s of a program is fixed, so the speedup obtained by employing parallel processing on P processors is bounded: Speedup = 1 / (s + (1 − s)/P) • In the limit as P → ∞: Speedup = 1/s • This led to pessimism in the parallel processing community and prevented the development of parallel machines for a long time.
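Amdahl’s law is easy to evaluate directly; the sketch below computes the speedup for a serial fraction s on P processors and shows the limiting behaviour:

```python
# Amdahl's law: speedup is bounded by the serial fraction s.
def speedup(s, P):
    """Speedup of a program with serial fraction s on P processors."""
    return 1.0 / (s + (1.0 - s) / P)

print(speedup(0.1, 10))     # 10% serial code, 10 processors
print(speedup(0.1, 10**9))  # with huge P, approaches 1/s = 10
```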
