Machine Instruction Rulebook



  1. Machine Instruction Rulebook • CPU design must support machine instructions • Thus a CPU has an instruction set • Intel, for example, has the x86 instruction set • As an instruction set grows old, designers have a choice • One option: start over (e.g. IBM's S/360) • Typical choice: backward compatibility (x86) • Middle option: modularize the instruction set (e.g. extensions for Intel, ARM)
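The "modularize" option shows up in practice as optional extensions (SSE, AVX, NEON, and so on) that software probes for at run time. A minimal sketch in C, assuming GCC or Clang on an x86 machine and using the compiler's __builtin_cpu_supports helper; the particular features checked are just illustrative:

```c
/* Probe for optional x86 instruction-set extensions at run time.
   Build with GCC or Clang on an x86 target: gcc -O2 probe.c -o probe */
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();  /* populate the CPU feature table */

    /* Each extension is a module bolted onto the base x86 set. */
    printf("SSE2 : %s\n", __builtin_cpu_supports("sse2") ? "yes" : "no");
    printf("AVX  : %s\n", __builtin_cpu_supports("avx")  ? "yes" : "no");
    printf("AVX2 : %s\n", __builtin_cpu_supports("avx2") ? "yes" : "no");
    return 0;
}
```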

  2. CISC versus RISC • RISC = Reduced Instruction Set Computer • CISC = Complex Instruction Set Computer • RISC circuitry is simpler than that of CISC • This translates to speed, reliability, and lower cost • RISC omits many “high-level” instructions at the hardware level • RISC puts the burden on compilers • Even CISC chips are typically RISC on the inside • Intel is CISC, ARM is RISC
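The load/store split is easy to see with a one-line increment of a value in memory. The C below typically compiles to a single read-modify-write instruction on a CISC (x86) target but to a separate load, add, and store on a RISC (ARM) target; the assembly in the comments is illustrative, not the output of any particular compiler run:

```c
/* counter lives in memory; the increment below is one C statement */
static volatile int counter;

void bump(void)
{
    counter = counter + 1;
    /* Typical CISC (x86) output: one memory-to-memory instruction
     *     add dword ptr [counter], 1
     * Typical RISC (ARM) output: explicit load / operate / store
     *     ldr  r1, [r0]        ; fetch counter (r0 holds &counter)
     *     add  r1, r1, #1      ; add in a register
     *     str  r1, [r0]        ; write it back
     * The RISC version pushes the sequencing work onto the compiler. */
}
```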

  3. Pipelining • Push multiple instructions through the cycle at the same time • Trick: divide instruction processing into stages • For example, fetch the first instruction • Then do two things at the same time: • Decode first instruction • Fetch next instruction • Then do three things at the same time: • Execute first instruction • Decode second instruction • Fetch third instruction

  4. An Example Six-Stage Pipeline • FI: Fetch Instruction • DI: Decode Instruction • CO: Calculate Operands (figure out where operands are) • FO: Fetch Operands • EI: Execute Instruction • WO: Write Operand
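A small simulation makes the overlap concrete. The sketch below (an illustrative model, not any real CPU) pushes nine instructions through the six stages above and prints which instruction occupies each stage at every clock tick:

```c
/* Print stage occupancy for a 6-stage pipeline running 9 instructions. */
#include <stdio.h>

#define STAGES 6
#define INSNS  9

int main(void)
{
    const char *stage[STAGES] = { "FI", "DI", "CO", "FO", "EI", "WO" };

    /* Instruction i enters stage s at tick i + s; the pipeline drains
       after INSNS + STAGES - 1 ticks. */
    for (int tick = 0; tick < INSNS + STAGES - 1; tick++) {
        printf("tick %2d:", tick + 1);
        for (int s = 0; s < STAGES; s++) {
            int insn = tick - s;           /* which instruction is here? */
            if (insn >= 0 && insn < INSNS)
                printf("  %s=I%d", stage[s], insn + 1);
        }
        printf("\n");
    }
    return 0;
}
```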

  5. Changing the Meaning of Time

  6. Adding Stages, Saving Time • Each clock pulse is redefined • was: “go through all stages for one instruction” • now: “go through a stage for many instructions” • The “old” clock tick included all six stages • The “new” clock tick advances every stage at once • How do we quantify? The clock period must cover the delay of the longest stage • Modern pipelines can have 30 or more stages • So how do we save time? Marginal time per instruction • Consider the timing for the nine instructions in the last slide • Without pipeline: 9 “old” clock ticks • With pipeline?
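The arithmetic behind that question: with a k-stage pipeline the first instruction needs k ticks to emerge, and each later one finishes just one tick behind the previous, so n instructions take k + (n − 1) ticks. The sketch below works that out for the nine-instruction example, assuming the idealized six-stage pipeline with no stalls:

```c
/* Idealized pipeline timing: no stalls, every stage takes one tick. */
#include <stdio.h>

int main(void)
{
    int k = 6;                       /* pipeline stages                 */
    int n = 9;                       /* instructions to run             */

    int old_ticks = n;               /* unpipelined: one long tick each */
    int new_ticks = k + (n - 1);     /* pipelined: fill, then 1 per insn */

    printf("without pipeline: %d old ticks\n", old_ticks);
    printf("with pipeline   : %d new ticks\n", new_ticks);
    /* Each new tick only needs to cover the slowest stage, so it is
       roughly one-sixth of an old tick: about 14/6 = 2.3 old-tick
       equivalents instead of 9. */
    return 0;
}
```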

  7. Pentium 4 Pipeline

  8. Creating Stages with Memory • Need some way to separate stages • Need to “pipe” output of one stage to input of next • Regular memory uses flip-flops (edge-triggered) • The pipeline can remember using latches (level-triggered) • So establish latches to remember output of each stage • Latch also serves as input for the next stage
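In software terms the inter-stage latch is just a small record that one stage writes on each clock pulse and the next stage reads on the following pulse. A rough C sketch of that idea; the field names here are invented for illustration:

```c
/* One latch sits between Fetch and Decode: Fetch writes it this tick,
   Decode reads it next tick.  Field names are illustrative only. */
#include <stdint.h>

struct fi_di_latch {
    uint32_t instruction_word;   /* raw bits fetched from memory     */
    uint32_t next_pc;            /* address of the following insn    */
    int      valid;              /* 0 while the pipeline is filling  */
};

/* Two copies model the latch: stages read "current" for the whole
   tick, and whatever was written into "next" becomes visible only
   when the clock pulses. */
static struct fi_di_latch current_latch, next_latch;

void clock_pulse(void)
{
    current_latch = next_latch;  /* outputs of this tick become the
                                    inputs of the next one */
}
```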

  9. Pipelines Hate Branching

  10. Two Ways to Predict

  11. A Third Way: EPIC • Explicitly Parallel Instruction Computing • Co-developed by Intel and HP • Itanium was first implementation of EPIC • Innovative solution to branch prediction problem • don't try to predict at all! • execute all the paths in the code (up to a point) • keep up with register copies for each path • when branch decided, free up “extra” registers
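The same "compute both sides, keep one" idea can be mimicked in plain C with if-conversion: evaluate both arms of a branch and select the result, so there is nothing for the pipeline to mispredict. This is only a sketch of the idea, not Itanium code:

```c
/* Branchy version: the pipeline must guess which arm will run. */
int clamp_branchy(int x, int limit)
{
    if (x > limit)
        return limit;
    return x;
}

/* If-converted version: both "paths" are computed, then one result is
   selected.  On EPIC/Itanium the selection is done with predicated
   instructions; here it is just arithmetic on a condition flag.      */
int clamp_predicated(int x, int limit)
{
    int take_limit = (x > limit);             /* 0 or 1, no branch */
    return take_limit * limit + (1 - take_limit) * x;
}
```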

  12. Hazard: Resource Conflict

  13. Hazard: Data Conflict • RAW: read after write (an operand is fetched before an earlier instruction has written the new value)
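A two-line C fragment is enough to create a RAW hazard: the second statement reads a value the first is still producing, so a naive pipeline would fetch the operand before the write has happened. The variable names are just for illustration:

```c
/* RAW (read-after-write) dependence between two adjacent statements. */
int raw_example(int a, int b)
{
    int sum   = a + b;       /* instruction 1 writes sum              */
    int twice = sum * 2;     /* instruction 2 reads sum: it must not
                                fetch its operand until instruction 1
                                has written it (or forwarded it)      */
    return twice;
}
```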

  14. Superscalar Architecture • Use multiple function units • multiple instructions can execute in parallel • each uses its own circuitry (e.g. multiple ALUs) • Issues • some instructions shouldn’t execute in parallel • difficult to design a CPU that decides • put the burden on the compiler (e.g. a Pentium-optimized compiler versus a generic compiler)
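Whether a superscalar core (or its compiler) can issue instructions side by side comes down to whether they touch the same data. An illustrative comparison:

```c
/* Independent work: the four additions use disjoint values, so a
   superscalar CPU with several ALUs can execute them in parallel.  */
int independent(int a, int b, int c, int d)
{
    int s0 = a + 1;
    int s1 = b + 1;
    int s2 = c + 1;
    int s3 = d + 1;
    return s0 + s1 + s2 + s3;
}

/* Dependent chain: each addition needs the previous result, so the
   extra function units sit idle no matter how many there are.      */
int dependent(int a)
{
    int s = a + 1;
    s = s + 1;
    s = s + 1;
    s = s + 1;
    return s;
}
```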

  15. Multiple Function Units (diagram: Fetch and Decode feed several parallel Execute units, then Store)

  16. Also Multiple Types of Units

  17. Pipeline versus Superscalar

  18. Decoding and Dispatching

  19. Doing Things Out of Order • Dispatcher can grab any instruction • while waiting on a fetch, do something useful • Look out for data hazards • RAW: read after write • WAR: write after read • WAW: write after write • Dispatcher must be aware of dependencies • Renaming registers: pros and cons
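WAR and WAW are "name" hazards: they exist only because two instructions happen to reuse the same register, and renaming gives each write its own destination. A rough C analogy, reusing versus renaming a variable:

```c
/* WAR/WAW hazards come from reusing one name; renaming removes them. */
int reused_name(int a, int b)
{
    int t;
    t = a * 2;            /* write t                                   */
    int first = t + 1;    /* read t (RAW - a real dependence)          */
    t = b * 3;            /* write t again: WAW with the first write,
                             WAR with the read above - both block
                             out-of-order execution needlessly         */
    return first + t;
}

int renamed(int a, int b)
{
    int t1 = a * 2;       /* "t" renamed to t1                         */
    int first = t1 + 1;
    int t2 = b * 3;       /* second write gets its own name t2, so it
                             can run out of order with the lines above;
                             only the true RAW ordering remains        */
    return first + t2;
}
```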

  20. Dealing with Dependencies

  21. Pentium II Dispatch/Execute Unit

  22. Pentium 4 does Superscalar

  23. UltraSPARC II Pipeline

  24. Improving Performance • Rev up the clock speed • Redesign circuits to reduce delay, e.g. • Replace the ripple adder (e.g. with carry-lookahead): decrease worst-case ALU propagation delay • Reorder some microinstructions • Add an incrementer to the PC register • Add pre-fetcher, pipeline, superscalar • Add lots of registers (RISC) • Branch prediction, speculative execution • Note the tradeoff among speed, cost, and space • Find things to do in parallel
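As an example of the "redesign circuits to reduce delay" bullet, a carry-lookahead adder computes all carries directly from generate/propagate signals instead of waiting for each bit's carry to ripple through. A 4-bit sketch in C, modeling the logic rather than real hardware:

```c
/* 4-bit carry-lookahead: carries computed directly from g/p signals,
   so the delay does not grow one stage per bit as in a ripple adder. */
#include <stdio.h>

unsigned add4_lookahead(unsigned a, unsigned b, unsigned cin)
{
    unsigned g[4], p[4], c[5];

    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
        g[i] = ai & bi;      /* bit i generates a carry             */
        p[i] = ai ^ bi;      /* bit i propagates an incoming carry  */
    }

    c[0] = cin & 1;
    /* Every carry is a flat expression of g, p, and cin only. */
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c[0]);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c[0]);

    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum |= (p[i] ^ c[i]) << i;       /* sum bit = p XOR carry-in */
    return sum | (c[4] << 4);            /* carry-out as bit 4       */
}

int main(void)
{
    printf("9 + 7 = %u\n", add4_lookahead(9, 7, 0));   /* prints 16 */
    return 0;
}
```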

  25. Working Smarter and Harder
