1 / 39

CS 325: CS Hardware and Software Organization and Architecture

CS 325: CS Hardware and Software Organization and Architecture. Computer Evolution and Performance 2. Outline. Von Neumann Architecture Processor Hierarchy Registers ALU Processor Categories Processor Performance Amdahl’s Law Computer Benchmarks. Von Neumann Architecture.

Download Presentation

CS 325: CS Hardware and Software Organization and Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 325: CS Hardware and SoftwareOrganization and Architecture Computer Evolution and Performance 2

  2. Outline • Von Neumann Architecture • Processor Hierarchy • Registers • ALU • Processor Categories • Processor Performance • Amdahl’s Law • Computer Benchmarks

  3. Von Neumann Architecture • Characteristic of most modern processors. • Central idea is Stored Program. • Three basic components: • Processor • Memory • I/O Facilities

  4. Illustration of Von Neumann Architecture

  5. Processor • Digital Device. • Performs computation involving multiple steps. • Building blocks used to form computer system.

  6. Hierarchical Structure and Computational Engines • Most computer architecture follows a hierarchical approach. • Subparts of a large, central processor are sophisticated enough to meet our definition of a processor. • Some engineers use the term computational engine for sub-piece that is less powerful than the main processor.

  7. Illustration of Processor Hierarchy

  8. Major Components of a Conventional Processor • Controller • Computational Engine (ALU) • Local Data Storage • Internal Interconnections • External Interface

  9. Illustration of a Conventional Processor

  10. Parts of a Conventional Processor • Controller • Overall responsibility for execution • Moves through sequence of steps • Coordinates other units • Computational Engine • Operates as directed by controller • Typically provides arithmetic and Boolean operations (ALU) • Performs one operation at a time

  11. Parts of a Conventional Processor • Local Data Storage • Holds data values for operations • Must be loaded before operation can be performed • Typically implemented with registers • Internal Interconnections • Allows transfer of values among units of the processor • Sometimes called data path

  12. Parts of a Conventional Processor • External Interface • Handles communication between processor and rest of computer system • Provides connections to external memory as well as external I/O devices

  13. Another Illustration of Processor

  14. Parts of a Conventional Processor • ALU • Status Flags: • Neg, Zero, Carry, Overflow • Shifter: • Left  multiplication by 2 • Right  division by 2 • Complementer: • Logical NOT

  15. Example Register Organizations

  16. Processor Registers • Motorola CPU - MC68000 • 8 32-bit general purpose registers (D0 – D7) • 8 32-bit address registers (A0 – A7) • 1 32-bit program counter • 1 16 status register

  17. Processor Registers • Intel 8086 – 16-bit • General Purpose: • AX – Accumulator: Multiply, Divide, I/O • BX – Base: Pointer to base address (data) • CX – Count: Counter for loops, shifts • DX – Data: Multiply, Divide, I/O • Pointer and Index: • SP – Stack Pointer: pointer to top of stack • BP – Base Pointer: pointer to base address (stack) • SI – Source Index: source string/index pointer • DI – Destination Index: Destination string/index pointer • Segment Registers: • CS – Code Segment • DS – Data Segment • SS – Stack Segment • ES – Extra Segment • Program Status: • PC – Program Counter • SR – Status Register

  18. Processor Registers • Intel 80386 – Pentium 2 • Similar to 8086, but register width doubled to 32-bit

  19. Arithmetic Logic Unit (ALU) • Main computational engine in conventional processor. • Complex unit that can perform variety of tasks • Integer arithmetic (add, subtract, multiply, divide) • Shift (left, right, circular) • Boolean (AND, OR, NOT, XOR) • Typically CPU “bit size” refers to ALU and register size • 32-bit CPU  32-bit ALU and registers • 64-bit CPU  64-bit ALU and registers

  20. Processor Categories and Roles • Many possible roles for individual processors in: • Coprocessors • Microcontrollers • Microsequencers • Embedded system processors • General purpose processors

  21. Coprocessor • Operates in conjunction with and under the control of another processor. • Special purpose processor • Performs a single task • Operates at high speed • Example: • Math Coprocessor • Used for floating point mathematical operations

  22. Microcontroller • Programmable device • Dedicated to control of a physical system • Example: • ECU for automobile engine • Roadway intersection traffic lights

  23. Microsequencer • Similar to microcontroller • Controls coprocessors and other engines within a large processor • Example: • Move operands to floating point unit • Invoke an operation (divide) • Move result back to memory

  24. Embedded System Processor • Operates sophisticated electronic device • Usually more powerful than microcontroller • Example: • Controlling a DVD player, including commands from a remote control

  25. General Purpose Processor • Most powerful type of processor • Completely programmable • Full functionality • Example: • CPU in personal computer/laptop (CISC x86 architecture) • CPU in smartphone/tablet (RISC ARM architecture)

  26. Processor Performance

  27. Clock and Instruction Rate • Clock Cycle • Time interval in which all basic circuits (steps) inside a process must complete • Time at which gates are clocked (gate-signal propagation) • Clock Rate • 1/clock cycle (GHz – billion cycles per second) • Instruction Rate • Measure of time required to execute instructions • MIPS – million instructions per second • Varies since some instructions take more time (more clock cycles) than others • Shift left instruction vs. fetch from memory instruction

  28. Basic Performance Equation Define: N = Number of instructions executed in the program S = Average number of cycles for instructions in the program R = Clock rate T = Program execution time T = N * S R

  29. Improve Performance • To improve performance: • Decrease N and/or S • Increase R • Parameters are not independent: • Increasing R may increase S as well • N is primarily controlled by compiler • Processors with large R may not have the best performance • Due to larger S • Making logic circuits faster/smaller is a definite win • Increases R while S and N remain unchanged

  30. Amdahl’s Law • Potential speed up of program using multiple processors. • Concluded that: • Code needs to be parallelizable • Speed up is bound, giving diminishing returns for more processors • Task dependent • Servers gain by maintaining multiple connections on multiple processors • Databases can be split into parallel tasks

  31. Amdahl’s Law • Most important principle in computer design: • Make the common case fast • Optimize for the normal case • Enhancement: any change/modification in the design of a component • Speedup: how much faster a task will execute using an enhanced component versus using the original component. Speedup = Componentenhanced Componentoriginal

  32. Amdahl’s Law • The enhanced feature may not be used all the time. • Let the fraction of the computation time when the enhanced feature is used be F. • Let the speedup when the enhanced feature is used be Se. • Now the execution time with the enhancement is: Exnew = Exold * (1 – F) + Exold * (F/Se) This gives the overall speedup (So) as: So = Exold/Exnew = 1 / ((1 - F) + (F/Se))

  33. Amdahl’s Law – Example 1 • Suppose that we are considering an enhancement that runs 10 times faster than the original component but is usable only 40% of the time. What is the overall speedup gained by incorporating the enhancement? Se = 10 F = 40 / 100 = 0.4 So = 1 / ((1 – F) + (F / Se)) = 1 / (0.6 + (0.4 / 10)) = 1 / 0.64 = 1.56

  34. Amdahl’s Law – Example 2 • Suppose that we hired a guru programmer that made 70% of our program run 15x faster that the original program. What is the speedup of the enhanced program? Se = 15 F = 70 / 100 = 0.7 So = 1 / ((1 – F) + (F / Se)) = 1 / (0.3 + (0.7 / 15)) = 1 / 0.347 = 2.88

  35. Amdahl’s Law – Example 3 • Suppose that we hired two students to enhance our WKU web Server performance. The first student increased the performance of the server by 12% for 85% of the time. The second student increased the performance of the server by 2x for 25% of the time. Which student produced the overall highest speedup? Student1 Student2 Se = 1.12 Se = 2 F = 85 / 100 = 0.85 F = 25 / 100 = 0.25 So = 1 / ((1 – F) + (F / Se)) So = 1 / ((1 – F) + (F / Se)) = 1 / (0.15 + (0.85 / 1.12)) = 1 / (0.75 + (0.25 / 2)) = 1 / 0.909 = 1 / 0.875 = 1.1 = 1.14

  36. Benchmarks • LINPACK (Scientific Computing) • Speed in solving linear system of equations (matrix multiplications) • http://www.top500.org/list/2013/11/

  37. Top 10 Supercomputers

  38. Top 500 Performance Development

  39. Benchmarks - LINPACK • Current fastest supercomputer: • Tianhe-2 (MiklyWay-2) • 3.12 million cores @ 2.2Ghz • 33.86 Pflops/sec = 33,860,000,000,000,000 Floating point operations/sec • Current High End Desktop: • Intel I7 “Haswell” 4770k • 4 cores @ 3.5Ghz • 177 Gflops/sec = 177,000,000,000 Floating point operations/sec • Current Google Android Smartphone: • Google Nexus 5 • 4 cores @ 2.3Ghz ARM RISC Architecture • 393 Mflops/sec = 393,000,000 Floating point operations/sec

More Related