CPUs
This presentation is the property of its rightful owner.
Sponsored Links
1 / 30

CPUs PowerPoint PPT Presentation


  • 108 Views
  • Uploaded on
  • Presentation posted in: General

CPUs. CPU performance CPU power consumption. Elements of CPU performance. Cycle time. CPU pipeline. Memory system. Pipelining. Several instructions are executed simultaneously at different stages of completion. Various conditions can cause pipeline bubbles that reduce utilization:

Download Presentation

CPUs

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Cpus

CPUs

  • CPU performance

  • CPU power consumption.

Overheads for Computers as Components 2nd ed.


Elements of cpu performance

Elements of CPU performance

  • Cycle time.

  • CPU pipeline.

  • Memory system.

Overheads for Computers as Components 2nd ed.


Pipelining

Pipelining

  • Several instructions are executed simultaneously at different stages of completion.

  • Various conditions can cause pipeline bubbles that reduce utilization:

    • branches;

    • memory system delays;

    • etc.

Overheads for Computers as Components 2nd ed.


Performance measures

Performance measures

  • Latency: time it takes for an instruction to get through the pipeline.

  • Throughput: number of instructions executed per time period.

  • Pipelining increases throughput without reducing latency.

Overheads for Computers as Components 2nd ed.


Arm7 pipeline

ARM7 pipeline

  • ARM 7 has 3-stage pipe:

    • fetch instruction from memory;

    • decode opcode and operands;

    • execute.

Overheads for Computers as Components 2nd ed.


Arm pipeline execution

decode

execute

fetch

decode

execute

fetch

decode

ARM pipeline execution

add r0,r1,#5

fetch

sub r2,r3,r6

execute

cmp r2,#3

time

1

2

3

Overheads for Computers as Components 2nd ed.


Pipeline stalls

Pipeline stalls

  • If every step cannot be completed in the same amount of time, pipeline stalls.

  • Bubbles introduced by stall increase latency, reduce throughput.

Overheads for Computers as Components 2nd ed.


Arm multi cycle ldmia instruction

ARM multi-cycle LDMIA instruction

ldmia

r0,{r2,r3}

fetch

decode

ex ld r2

ex ld r3

sub

r2,r3,r6

fetch

decode

ex sub

cmp

r2,#3

fetch

decode

ex cmp

time

Overheads for Computers as Components 2nd ed.


Control stalls

Control stalls

  • Branches often introduce stalls (branch penalty).

    • Stall time may depend on whether branch is taken.

  • May have to squash instructions that already started executing.

  • Don’t know what to fetch until condition is evaluated.

Overheads for Computers as Components 2nd ed.


Arm pipelined branch

bne foo

fetch

decode

ex bne

ex bne

ex bne

sub

r2,r3,r6

fetch

decode

foo add

r0,r1,r2

fetch

decode

ex add

ARM pipelined branch

time

Overheads for Computers as Components 2nd ed.


Delayed branch

Delayed branch

  • To increase pipeline efficiency, delayed branch mechanism requires n instructions after branch always executed whether branch is executed or not.

  • SHARC supports delayed and non-delayed branches.

    • Specified by bit in branch instruction.

    • 2 instruction branch delay slot.

Overheads for Computers as Components 2nd ed.


Example arm execution time

Example: ARM execution time

  • Determine execution time of FIR filter:

    for (i=0; i<N; i++)

    f = f + c[i]*x[i];

  • Only branch in loop test may take more than one cycle.

    • BLT loop takes 1 cycle best case, 3 worst case.

Overheads for Computers as Components 2nd ed.


Fir filter arm code

FIR filter ARM code

; loop initiation code

MOV r0,#0 ; use r0 for i, set to 0

MOV r8,#0 ; use a separate index for arrays

ADR r2,N ; get address for N

LDR r1,[r2] ; get value of N

MOV r2,#0 ; use r2 for f, set to 0

ADR r3,c ; load r3 with address of base of c

ADR r5,x ; load r5 with address of base of x

; loop body

loop LDR r4,[r3,r8] ; get value of c[i]

LDR r6,[r5,r8] ; get value of x[i]

MUL r4,r4,r6 ; compute c[i]*x[i]

ADD r2,r2,r4 ; add into running sum

; update loop counter and array index

ADD r8,r8,#4 ; add one to array index

ADD r0,r0,#1 ; add 1 to i

; test for exit

CMP r0,r1

BLT loop ; if i < N, continue loop

loopend...

Overheads for Computers as Components 2nd ed.


Fir filter performance by block

FIR filter performance by block

tloop = tinit+ N(tbody + tupdate) + (N-1) ttest,worst + ttest,best

Loop test succeeds is worst case

Loop test fails is best case

Overheads for Computers as Components 2nd ed.


C55x pipeline

C55x pipeline

  • C55x has 7-stage pipe:

    • fetch;

    • decode;

    • address: computes data/branch addresses;

    • access 1: reads data;

    • access 2: finishes data read;

    • Read stage: puts operands on internal busses;

    • execute.

Overheads for Computers as Components 2nd ed.


C55x organization

B bus

C, D busses

D bus

Instruction

fetch

Dual-multiply

coefficient

Dual operand

read

Single operand

read

Data read

from memory

Writes

C55x organization

3 data read busses

16

3 data read address busses

24

program address bus

24

program

read bus

Instruction

unit

Program

flow

unit

Address

unit

Data

unit

32

2 data write busses

16

2 data write address busses

24

Overheads for Computers as Components 2nd ed.


C55x pipeline hazards

C55x pipeline hazards

  • Processor structure:

    • Three computation units.

    • 14 operators.

  • Can perform two operations per instruction.

  • Some combinations of operators are not legal.

Overheads for Computers as Components 2nd ed.


C55x hazards

C55x hazards

  • A-unit ALU/A-unit ALU.

  • A-unit swap/A-unit swap.

  • D-unit ALU,shifter,MAC/D-unit ALU,shifter,MAC

  • D-unit shifter/D-unit shift, store

  • D-unit shift, store/D-unit shift, store

  • D-unit swap/D-unit swap

  • P-unit control/P-unit control

Overheads for Computers as Components 2nd ed.


Memory system performance

Memory system performance

  • Caches introduce indeterminacy in execution time.

    • Depends on order of execution.

  • Cache miss penalty: added time due to a cache miss.

Overheads for Computers as Components 2nd ed.


Types of cache misses

Types of cache misses

  • Compulsory miss: location has not been referenced before.

  • Conflict miss: two locations are fighting for the same block.

  • Capacity miss: working set is too large.

Overheads for Computers as Components 2nd ed.


Cpu power consumption

CPU power consumption

  • Most modern CPUs are designed with power consumption in mind to some degree.

  • Power vs. energy:

    • heat depends on power consumption;

    • battery life depends on energy consumption.

Overheads for Computers as Components 2nd ed.


Cmos power consumption

CMOS power consumption

  • Voltage drops: power consumption proportional to V2.

  • Toggling: more activity means more power.

  • Leakage: basic circuit characteristics; can be eliminated by disconnecting power.

Overheads for Computers as Components 2nd ed.


Cpu power saving strategies

CPU power-saving strategies

  • Reduce power supply voltage.

  • Run at lower clock frequency.

  • Disable function units with control signals when not in use.

  • Disconnect parts from power supply when not in use.

Overheads for Computers as Components 2nd ed.


C55x low power features

C55x low power features

  • Parallel execution units---longer idle shutdown times.

  • Multiple data widths:

    • 16-bit ALU vs. 40-bit ALU.

  • Instruction caches minimizes main memory accesses.

  • Power management:

    • Function unit idle detection.

    • Memory idle detection.

    • User-configurable IDLE domains allow programmer control of what hardware is shut down.

Overheads for Computers as Components 2nd ed.


Power management styles

Power management styles

  • Static power management: does not depend on CPU activity.

    • Example: user-activated power-down mode.

  • Dynamic power management: based on CPU activity.

    • Example: disabling off function units.

Overheads for Computers as Components 2nd ed.


Application powerpc 603 energy features

Application: PowerPC 603 energy features

  • Provides doze, nap, sleep modes.

  • Dynamic power management features:

    • Uses static logic.

    • Can shut down unused execution units.

    • Cache organized into subarrays to minimize amount of active circuitry.

Overheads for Computers as Components 2nd ed.


Powerpc 603 activity

PowerPC 603 activity

  • Percentage of time units are idle for SPEC integer/floating-point:

    unitSpecint92Specfp92

    D cache29%28%

    I cache29%17%

    load/store35%17%

    fixed-point38%76%

    floating-point99%30%

    system register89%97%

Overheads for Computers as Components 2nd ed.


Power down costs

Power-down costs

  • Going into a power-down mode costs:

    • time;

    • energy.

  • Must determine if going into mode is worthwhile.

  • Can model CPU power states with power state machine.

Overheads for Computers as Components 2nd ed.


Application strongarm sa 1100 power saving

Application: StrongARM SA-1100 power saving

  • Processor takes two supplies:

    • VDD is main 3.3V supply.

    • VDDX is 1.5V.

  • Three power modes:

    • Run: normal operation.

    • Idle: stops CPU clock, with logic still powered.

    • Sleep: shuts off most of chip activity; 3 steps, each about 30 ms; wakeup takes > 10 ms.

Overheads for Computers as Components 2nd ed.


Sa 1100 power state machine

SA-1100 power state machine

Prun = 400 mW

run

10 ms

160 ms

90 ms

10 ms

90 ms

idle

sleep

Pidle = 50 mW

Psleep = 0.16 mW

Overheads for Computers as Components 2nd ed.


  • Login