14 Course Review Kai Bu kaibu@zju.edu.cn http://list.zju.edu.cn/kaibu/comparch2017
Email, LinkedIn, Twitter, Weibo... Don't hesitate to keep in touch :)
Lectures 02-03 Fundamentals of Computer Design
Classes of Parallel Architectures • Classified according to the parallelism in the instruction and data streams called for by the instructions: SISD, SIMD, MISD, MIMD
SISD • Single instruction stream, single data stream • Uniprocessor • Can exploit instruction-level parallelism
SIMD • Single instruction stream, multiple data streams • The same instruction is executed by multiple processors using different data streams • Exploits data-level parallelism • Each processor has its own data memory, whereas there is a single instruction memory and control processor
MISD • Multiple instruction streams, single data stream • No commercial multiprocessor of this type has been built yet
MIMD • Multiple instruction streams, multiple data streams • Each processor fetches its own instructions and operates on its own data • Exploits task-level parallelism
Instruction Set Architecture (ISA) • The actual programmer-visible instruction set • The boundary between software and hardware
ISA: Class • Most ISAs are general-purpose register architectures, whose operands are either registers or memory locations • Two popular versions: register-memory ISAs (e.g., 80x86), in which many instructions can access memory; load-store ISAs (e.g., ARM, MIPS), in which only load and store instructions can access memory
ISA: Memory Addressing • Byte addressing supports accessing individual bytes of data rather than only larger units called words • Aligned access: an object of width s bytes at byte address A is aligned if A mod s = 0
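The alignment rule can be checked with a single modulo. This is a minimal sketch that treats byte addresses as plain non-negative integers; `is_aligned` is a hypothetical helper name, not part of any ISA.

```python
def is_aligned(address: int, width: int) -> bool:
    """Return True if an object of `width` bytes at byte `address` is aligned (A mod s == 0)."""
    if width <= 0:
        raise ValueError("object width must be positive")
    return address % width == 0


# A 4-byte word: only byte addresses that are multiples of 4 are aligned.
for a in (0, 2, 4, 6, 8):
    print(a, is_aligned(a, 4))
```

Byte-sized objects (s = 1) are always aligned, which is why byte addressing never raises alignment issues for single bytes.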
ISA: Addressing Modes • Specify the address of a memory object • Register: Add R2, R1; R2 ← R2 + R1 • Immediate: Add R2, #3; R2 ← R2 + 3 • Displacement: Add R2, 100(R1); R2 ← R2 + Mem[100 + R1]
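To make the three modes concrete, here is a toy sketch of how each `Add` form obtains its second operand. The register file and memory are hypothetical Python dicts (memory maps an address to a whole word, which is a simplification), and the function names are illustrative only.

```python
# Toy machine state (hypothetical): a register file and a sparse word memory.
regs = {"R1": 8, "R2": 5}
mem = {108: 42}  # the word at Mem[100 + R1] when R1 = 8


def add_register(rd: str, rs: str) -> None:
    """Register mode: Add rd, rs  ->  rd <- rd + rs"""
    regs[rd] = regs[rd] + regs[rs]


def add_immediate(rd: str, imm: int) -> None:
    """Immediate mode: Add rd, #imm  ->  rd <- rd + imm"""
    regs[rd] = regs[rd] + imm


def add_displacement(rd: str, disp: int, rbase: str) -> None:
    """Displacement mode: Add rd, disp(rbase)  ->  rd <- rd + Mem[disp + rbase]"""
    effective_address = disp + regs[rbase]
    regs[rd] = regs[rd] + mem[effective_address]


add_register("R2", "R1")           # R2 = 5 + 8 = 13
add_immediate("R2", 3)             # R2 = 13 + 3 = 16
add_displacement("R2", 100, "R1")  # R2 = 16 + Mem[108] = 16 + 42 = 58
```

Only the displacement form computes an effective address and touches memory; the other two read their operand from a register or from the instruction itself.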
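Trends in Cost • Cost of an Integrated Circuit: $\text{Cost of IC} = \dfrac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$ • The wafer is tested, then chopped into dies for packaging
Trends in Cost • Cost of an Integrated Circuit • Final test yield: the percentage of manufactured devices that survive the testing procedure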
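Trends in Cost • Cost of an Integrated Circuit • $\text{Cost of die} = \dfrac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$
Trends in Cost • Cost of an Integrated Circuit • $\text{Dies per wafer} = \dfrac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \dfrac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$
Trends in Cost • Cost of an Integrated Circuit • $\text{Die yield} = \text{Wafer yield} \times \dfrac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^N}$ • N: process-complexity factor, a measure of manufacturing difficulty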
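The IC cost model chains together: dies per wafer, then die yield, then cost per good die, then cost of the packaged and tested IC. The sketch below assumes the textbook forms restated in the comments (edge-loss term for dies per wafer, and die yield = wafer yield / (1 + defects per unit area × die area)^N). All input values in the usage line are hypothetical, chosen only for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    # Wafer area / die area, minus an estimate of the partial dies lost around the round edge.
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(wafer_yield: float, defects_per_cm2: float, die_area_cm2: float, n: float) -> float:
    # Wafer yield x 1 / (1 + defects per unit area x die area)^N, N = process-complexity factor.
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, wafer_yield, defects_per_cm2, n):
    # Cost of wafer / (dies per wafer x die yield): the good dies carry the whole wafer cost.
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        wafer_yield, defects_per_cm2, die_area_cm2, n
    )
    return wafer_cost / good_dies


def ic_cost(die, testing, packaging, final_test_yield):
    # (Cost of die + cost of testing die + cost of packaging and final test) / final test yield
    return (die + testing + packaging) / final_test_yield


# Hypothetical inputs: 30 cm wafer costing 5000, 2.25 cm^2 die, wafer yield 1.0,
# 0.031 defects per cm^2, N = 13.5.
d = die_cost(5000, 30, 2.25, 1.0, 0.031, 13.5)
print(f"cost per good die: {d:.2f}")
```

Because die area appears both in the dies-per-wafer denominator and inside the yield exponent term, a larger die raises cost faster than linearly.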
Dependability • Two measures of dependability: module reliability and module availability
Dependability • Module reliability: a measure of continuous service accomplishment from a reference initial instant • MTTF: mean time to failure • MTTR: mean time to repair • MTBF: mean time between failures • MTBF = MTTF + MTTR
Dependability • Module reliability • FIT (failures in time): failures per billion hours • An MTTF of 1,000,000 hours corresponds to $10^9 / 10^6 = 1000$ FIT
Dependability • Module availability: $\text{Module availability} = \dfrac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$
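The dependability measures reduce to a few lines of arithmetic. This sketch assumes all times are in hours and restates the relations in the comments: MTBF = MTTF + MTTR, FIT = 10^9 / MTTF, and availability = MTTF / (MTTF + MTTR). Function names are illustrative.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    # MTBF = MTTF + MTTR
    return mttf_hours + mttr_hours


def fit_from_mttf(mttf_hours: float) -> float:
    # FIT = failures per billion (10^9) hours = 10^9 / MTTF
    return 1e9 / mttf_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    # Module availability = MTTF / (MTTF + MTTR): fraction of time the module delivers service
    return mttf_hours / (mttf_hours + mttr_hours)


print(fit_from_mttf(1_000_000))  # the slide's example: MTTF of 1,000,000 hours
print(availability(90, 10))      # hypothetical module: up 90 h, then 10 h of repair
```

The first print reproduces the 1000 FIT figure above; the second shows that availability depends only on the ratio of MTTF to MTBF.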
Measuring Performance • Execution time: the time between the start and the completion of an event • Throughput: the total amount of work done in a given time
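Measuring Performance • Computer X and Computer Y • X is n times faster than Y: $n = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = \dfrac{\text{Performance}_X}{\text{Performance}_Y}$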
Quantitative Principles • Parallelism • Locality • Temporal locality: recently accessed items are likely to be accessed again in the near future • Spatial locality: items whose addresses are near one another tend to be referenced close together in time
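Spatial locality can be seen by counting how often a traversal moves to a new cache line. This is a toy model, not a cache simulator: it assumes a hypothetical 64-byte line, 8-byte elements, and a 16 × 16 array stored row by row starting at address 0, and it only counts line changes between consecutive accesses.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
ELEM_SIZE = 8   # bytes per array element (assumed)
N = 16          # the array is N x N, laid out row by row (row-major)


def address(i: int, j: int) -> int:
    # Byte address of element [i][j] in a row-major layout starting at address 0.
    return (i * N + j) * ELEM_SIZE


def line_changes(addresses) -> int:
    # Count accesses that land on a different cache line than the previous access.
    changes, prev_line = 0, None
    for a in addresses:
        line = a // LINE_SIZE
        if line != prev_line:
            changes += 1
            prev_line = line
    return changes


row_order = [address(i, j) for i in range(N) for j in range(N)]     # walks memory sequentially
column_order = [address(i, j) for j in range(N) for i in range(N)]  # jumps N * ELEM_SIZE bytes per step

print("row-major traversal:", line_changes(row_order))
print("column-major traversal:", line_changes(column_order))
```

Visiting elements in storage order reuses each line for 8 consecutive accesses (32 line changes for 256 accesses), while the column-wise walk lands on a new line every time (256 changes), so it gains nothing from spatial locality.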
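Quantitative Principles • Amdahl’s Law: $\text{Speedup}_\text{overall} = \dfrac{\text{Execution time}_\text{old}}{\text{Execution time}_\text{new}} = \dfrac{1}{(1 - \text{Fraction}_\text{enhanced}) + \dfrac{\text{Fraction}_\text{enhanced}}{\text{Speedup}_\text{enhanced}}}$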
Quantitative Principles • Amdahl’s Law: two factors 1. $\text{Fraction}_\text{enhanced}$: e.g., 20/60 if 20 seconds of a 60-second program can use the enhancement 2. $\text{Speedup}_\text{enhanced}$: e.g., 5/2 if a part that originally took 5 seconds takes 2 seconds after enhancement
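Plugging the slide's two factors into Amdahl's Law takes one line of arithmetic. A minimal sketch, with the formula restated in the comment; `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    # Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The slide's numbers: 20 of 60 seconds can be enhanced, and that part runs 5/2 times faster.
overall = amdahl_speedup(20 / 60, 5 / 2)  # about 1.25: 60 s becomes 40 s + 20 s / 2.5 = 48 s
```

Even an infinite $\text{Speedup}_\text{enhanced}$ could only reach 1 / (1 − 1/3) = 1.5 here, because the unenhanced 40 seconds remain.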
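Quantitative Principles • The Processor Performance Equation: $\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$, where $\text{CPU clock cycles} = \sum_i \text{IC}_i \times \text{CPI}_i$
$\text{IC}_i$: the number of times instruction i is executed in a program • $\text{CPI}_i$: the average number of clocks per instruction for instruction i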
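The processor performance equation can be evaluated from an instruction mix. This sketch assumes a hypothetical mix (instruction class mapped to $\text{IC}_i$ and $\text{CPI}_i$) and a hypothetical 1 GHz clock; none of these numbers come from the lecture.

```python
def cpu_clock_cycles(mix: dict) -> int:
    # CPU clock cycles = sum over i of IC_i x CPI_i
    return sum(ic * cpi for ic, cpi in mix.values())


def cpu_time(mix: dict, clock_rate_hz: float) -> float:
    # CPU time = CPU clock cycles x clock cycle time = CPU clock cycles / clock rate
    return cpu_clock_cycles(mix) / clock_rate_hz


# Hypothetical instruction mix: class -> (IC_i, CPI_i)
mix = {"alu": (500, 1), "load": (200, 2), "store": (100, 2), "branch": (200, 3)}
instruction_count = sum(ic for ic, _ in mix.values())
average_cpi = cpu_clock_cycles(mix) / instruction_count  # overall CPI weights each CPI_i by IC_i
seconds = cpu_time(mix, 1e9)                              # assumed 1 GHz clock
```

Here 500 + 400 + 200 + 600 = 1700 cycles over 1000 instructions give an average CPI of 1.7.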
Lecture 04 Instruction Set Principles
ISA Classification • Classification basis: the type of internal storage (stack, accumulator, register) • ISA classes: stack architecture, accumulator architecture, general-purpose register architecture (GPR)
ISA Classes: Stack Architecture • Operands are implicit, on the top of the stack (TOS) • C = A + B: Push A; Push B; Add; Pop C • Add removes the first operand from the stack and replaces the second operand with the result
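A toy interpreter makes the implicit operands visible: `Add` names no operands at all and works only on the top of the stack. The program format (tuples of opcode and optional memory name) and the function name are assumptions of this sketch.

```python
def run_stack(program, memory):
    """Execute a tiny stack-architecture program; Add's operands are implicit (top of stack)."""
    stack = []
    for op, *arg in program:
        if op == "Push":    # copy a memory operand onto the top of the stack
            stack.append(memory[arg[0]])
        elif op == "Add":   # net effect: first operand removed, second replaced by the sum
            second = stack.pop()
            first = stack.pop()
            stack.append(first + second)
        elif op == "Pop":   # remove the top of the stack and store it to memory
            memory[arg[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# C = A + B with hypothetical values A = 2, B = 3
memory = {"A": 2, "B": 3, "C": 0}
run_stack([("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")], memory)
```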
ISA Classes: Accumulator Architecture • One implicit operand: the accumulator; one explicit operand: a memory location • C = A + B: Load A; Add B; Store C • The accumulator is both an implicit input operand and the result
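The same C = A + B on a toy accumulator machine: every instruction names exactly one memory location, and the accumulator is the other (implicit) operand and the destination. The program format and function name are assumptions of this sketch.

```python
def run_accumulator(program, memory):
    """Execute a tiny accumulator-architecture program of (opcode, memory name) pairs."""
    acc = 0  # the single implicit register: input operand and result
    for op, addr in program:
        if op == "Load":     # acc <- Mem[addr]
            acc = memory[addr]
        elif op == "Add":    # acc <- acc + Mem[addr]
            acc = acc + memory[addr]
        elif op == "Store":  # Mem[addr] <- acc
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# C = A + B with hypothetical values A = 2, B = 3
memory = {"A": 2, "B": 3, "C": 0}
run_accumulator([("Load", "A"), ("Add", "B"), ("Store", "C")], memory)
```

Compared with the stack version, the program is one instruction shorter because Add reads B directly from memory.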
ISA Classes: General-Purpose Register Arch • Only explicit operands: registers or memory locations • Operand access: either directly from memory, or loaded into temporary storage (registers) first
ISA Classes: General-Purpose Register Arch • Two classes: • Register-memory architecture: any instruction can access memory • Load-store architecture: only load and store instructions can access memory
ISA Classes: General-Purpose Register Arch • Two classes: • Register-memory architecture: any instruction can access memory • C = A + B: Load R1, A; Add R3, R1, B; Store R3, C
ISA Classes: General-Purpose Register Arch • Two classes: • Load-store architecture: only load and store instructions can access memory • C = A + B: Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C
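The two GPR classes differ only in whether an ALU instruction may name a memory operand. This toy interpreter runs both C = A + B sequences and rejects a memory operand in `Add` when `load_store=True`. The "names starting with R are registers" convention, the tuple program format, and all function names are assumptions of this sketch.

```python
def is_register(name: str) -> bool:
    return name.startswith("R")  # naming convention assumed for this sketch


def run_gpr(program, memory, load_store: bool):
    """Execute (opcode, register, operand...) tuples on a toy GPR machine."""
    regs = {}
    for op, reg, *rest in program:
        if op == "Load":     # reg <- Mem[rest[0]]
            regs[reg] = memory[rest[0]]
        elif op == "Store":  # Mem[rest[0]] <- reg
            memory[rest[0]] = regs[reg]
        elif op == "Add":    # reg <- src1 + src2
            values = []
            for src in rest:
                if is_register(src):
                    values.append(regs[src])
                elif load_store:
                    raise ValueError(f"load-store ISA: Add cannot read memory operand {src}")
                else:
                    values.append(memory[src])  # register-memory ISA: ALU op reads memory directly
            regs[reg] = values[0] + values[1]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


REG_MEM_PROGRAM = [("Load", "R1", "A"), ("Add", "R3", "R1", "B"), ("Store", "R3", "C")]
LOAD_STORE_PROGRAM = [("Load", "R1", "A"), ("Load", "R2", "B"),
                      ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]

print(run_gpr(REG_MEM_PROGRAM, {"A": 2, "B": 3, "C": 0}, load_store=False))
print(run_gpr(LOAD_STORE_PROGRAM, {"A": 2, "B": 3, "C": 0}, load_store=True))
```

The load-store version needs one extra instruction, but every ALU instruction then works only on registers.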
GPR Classification • Does an ALU instruction have 2 or 3 operands? 2 operands = 1 result-and-source operand + 1 source operand; 3 operands = 1 result operand + 2 source operands • How many of an ALU instruction's operands (0, 1, 2, or 3) may be memory addresses?
Addressing Modes • How instructions specify the addresses of objects to access • Types: constant, register, memory location (effective address)
Lectures 05-07 Pipelining
Pipelining • Start executing one instruction before completing the previous one
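Pipelined Laundry • Four tasks (loads A, B, C, D) each pass through three stages taking 30, 40, and 20 minutes; pipelined, all four finish in 30 + 4 × 40 + 20 = 210 minutes = 3.5 hours • Observations: no speedup for an individual task, e.g., A still takes 30 + 40 + 20 = 90 minutes; but the average time per task drops, e.g., 3.5 × 60 / 4 = 52.5 < 90 minutes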
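The laundry timeline can be reproduced with a small recurrence: a task starts a stage once it has finished the previous stage and the stage's single resource (washer, dryer, folder) is free. This sketch assumes one resource per stage, tasks in fixed order, and that a finished load may wait between stages; the function name is illustrative.

```python
def pipelined_finish_time(n_tasks: int, stage_times: list) -> int:
    """Time at which the last task leaves the last stage of an in-order pipeline."""
    stage_free_at = [0] * len(stage_times)  # when each stage's resource becomes free
    for _ in range(n_tasks):
        ready = 0  # when this task finished its previous stage (0 before the first stage)
        for s, t in enumerate(stage_times):
            start = max(ready, stage_free_at[s])  # wait for the task and for the stage
            stage_free_at[s] = start + t
            ready = stage_free_at[s]
    return stage_free_at[-1]


stages = [30, 40, 20]  # minutes per stage, as in the laundry example
pipelined = pipelined_finish_time(4, stages)
sequential = 4 * sum(stages)
print(pipelined, sequential, pipelined / 4)
```

The 40-minute stage is the bottleneck: after the first load, one load completes every 40 minutes, which is why the total is 30 + 4 × 40 + 20 rather than 4 × 90.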
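MIPS Instruction • At most 5 clock cycles per instruction • Stages: IF (instruction fetch), ID (instruction decode/register fetch), EX (execution/effective address), MEM (memory access), WB (write-back)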
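With five stages, an ideal pipeline (no hazards or stalls, one instruction issued per cycle) needs 5 cycles to fill and then completes one instruction per cycle. This sketch compares that with running the instructions one after another at up to 5 cycles each, and prints which stage each instruction occupies; hazards are deliberately ignored, and the helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def unpipelined_cycles(n: int) -> int:
    # Upper bound without pipelining: up to 5 cycles per instruction, one instruction at a time.
    return len(STAGES) * n


def ideal_pipelined_cycles(n: int) -> int:
    # Ideal pipeline: 5 cycles to fill, then one instruction completes every cycle.
    return len(STAGES) + (n - 1) if n > 0 else 0


def stage_of(instr: int, cycle: int):
    # Stage occupied by instruction `instr` (0-based, one issued per cycle) in clock `cycle`.
    k = cycle - instr
    return STAGES[k] if 0 <= k < len(STAGES) else None


for i in range(3):
    row = [stage_of(i, c) or "." for c in range(ideal_pipelined_cycles(3))]
    print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

For 100 instructions this gives 104 cycles instead of up to 500; real pipelines fall short of the ideal once hazards force stalls.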
MIPS Instruction • IF ID EX MEM WB • IF: IR ← Mem[PC]; NPC ← PC + 4
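The IF step performs exactly two register transfers. A minimal sketch, assuming 32-bit instructions in a byte-addressed instruction memory modeled as a dict from word-aligned address to instruction word; the memory contents are placeholder words and the function name is illustrative.

```python
def instruction_fetch(pc: int, instruction_memory: dict):
    """IF stage: IR <- Mem[PC]; NPC <- PC + 4 (each instruction is 4 bytes)."""
    ir = instruction_memory[pc]  # IR holds the fetched instruction word
    npc = pc + 4                 # NPC: address of the next sequential instruction
    return ir, npc


# Placeholder instruction words at consecutive word-aligned addresses.
imem = {0: 0x00221820, 4: 0x8C220064}

pc = 0
while pc in imem:
    ir, pc = instruction_fetch(pc, imem)
    print(f"IR = {ir:#010x}, next PC = {pc}")
```

Incrementing by 4 rather than 1 reflects byte addressing: each 32-bit instruction occupies four byte addresses.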