6 alu blocks and control
6 alu blocks and control

Contents

6 ALU Blocks and Control

1. Adder

2. Multiplier

3. Datapath Generation


1. Adder

  • Full Adder

  • Boolean equation

SUM (odd parity) = A ⊕ B ⊕ C

CARRY = A·B + B·C + C·A


Which is better?

  • Boolean Equation 1 :

  • Boolean Equation 2 :

  • CARRY evaluation is more urgent, since CARRY is on the critical path.

[ Figure: a chain of ADDER stages. Stage i takes Ai, Bi and the incoming carry and produces Si and the carry into the next stage; C0 enters the first stage and Cn leaves the last. ]

[ Ripple Carry Adder ]
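A bit-level sketch of the ripple carry adder may help here. It assumes the usual full-adder equations (SUM is the XOR of the three inputs, CARRY is their majority); the helper names `full_adder`, `ripple_carry_add`, `to_bits` and `from_bits` are illustrative and do not come from the slides.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c                        # odd parity of the three inputs
    carry = (a & b) | (b & c) | (c & a)  # majority of the three inputs
    return s, carry


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage must wait for the carry of the previous stage,
        # which is why CARRY sits on the critical path.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x, n):
    """Non-negative integer -> list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(9, 4), to_bits(8, 4))` yields the sum bits of 1 together with a carry-out of 1, since 9 + 8 = 17 overflows four bits.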


Alternating Complementary Form

[ Figure: gate-level CARRY and SUM circuits with inputs A, B and C, drawn separately for the odd stages and for the even stages. ]

Alternating Complementary Form

[ Figure: continuation of the previous slide. ]


Dynamic Serial Adder

[ Figure: serial adder. Labels: inputs A, B, C; outputs SUM (S) and CARRY; a D flip-flop (D, Q, R/S) driven by CLOCK. ]


Dynamic Configuration

[ Figure: CARRY GATE and SUM GATE clocked by CK, with inputs A, B and C, optional precharge devices, and a Set/Reset circuit (S, R) on the C (CARRY) node. ]


Full Adder Truth Table

minterm   A  B  C   CARRY  SUM
   0      0  0  0     0     0
   1      0  0  1     0     1
   2      0  1  0     0     1
   3      0  1  1     1     0
   4      1  0  0     0     1
   5      1  0  1     1     0
   6      1  1  0     1     0
   7      1  1  1     1     1

  • Conjugate Symmetry: rows 0, 1, 2, 3 and rows 7, 6, 5, 4 are mutually complementary.

[ Figure: minterms 0, 1, 2, 3 set against 7, 6, 5, 4; labels "FC - on terms" and "FS - on terms". ]
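The conjugate symmetry of the truth table can be checked exhaustively. The sketch below assumes the standard full-adder equations and the minterm numbering m = 4A + 2B + C; the function names are illustrative. It verifies that complementing all three inputs complements both outputs, and lists the on-terms of the carry function (FC) and the sum function (FS).

```python
from itertools import product


def full_adder(a, b, c):
    """Returns (sum, carry) of a one-bit full adder."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)


def is_self_dual():
    """True if inverting A, B and C always inverts both SUM and CARRY."""
    for a, b, c in product((0, 1), repeat=3):
        s, cy = full_adder(a, b, c)
        s_inv, cy_inv = full_adder(1 - a, 1 - b, 1 - c)
        if (s_inv, cy_inv) != (1 - s, 1 - cy):
            return False
    return True


def on_terms():
    """Minterm numbers (m = 4A + 2B + C) where CARRY, resp. SUM, equals 1."""
    fc, fs = [], []
    for m in range(8):
        s, cy = full_adder((m >> 2) & 1, (m >> 1) & 1, m & 1)
        if cy:
            fc.append(m)
        if s:
            fs.append(m)
    return fc, fs
```

This property is what lets alternate stages work on complemented carries without extra inverters in the carry path.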


Another Configuration of Carry & Sum Logic

[ Figure: a CARRY STAGE and a SUM STAGE with inputs A, B and C; transistor branches labelled "1 PROPAGATE" and "1 GENERATE"; outputs CARRY and SUM. ]


Dynamic full adder using np cmos logic style

Dynamic full adder using np CMOS logic style


Layout of the dynamic full adder

Layout of the dynamic full adder


Looking at the FA Truth Table

A  B  C   CARRY  SUM
0  0  0     0     0
0  0  1     0     1
0  1  0     0     1
0  1  1     1     0
1  0  0     0     1
1  0  1     1     0
1  1  0     1     0
1  1  1     1     1


Transmission Gate Implementation

[ Figure: transmission-gate full adder with inputs A, B and C; an internal signal A ⊕ B and its complement steer the transmission gates that produce SUM and CARRY. ]


CLA (Carry Lookahead Adder)

[ Figure: each bit forms Pn and Gn from An and Bn; four stages with P1 to P4 and G1 to G4 produce the carries C1 to C4 from C0. ]


Carry bypass structure - basic concept

(N=16)-bit carry bypass adder (each stage: M bits)

  • t_p = t_setup + M * t_carry + (N/M - 1) * t_bypass + M * t_carry + t_sum

    • t_setup : time to create the G and P signals

    • t_carry : propagation delay through a single bit

    • t_bypass : propagation delay through the MUX

    • t_sum : time to generate the sum
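The delay expression above can be evaluated directly. This is a minimal sketch that takes the formula exactly as stated on the slide; the function name and the unit delays used in the example are assumptions for illustration only.

```python
def carry_bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of an n-bit carry bypass adder built from m-bit stages.

    Implements t_p = t_setup + M*t_carry + (N/M - 1)*t_bypass + M*t_carry + t_sum
    as written on the slide.
    """
    if n % m != 0:
        raise ValueError("n must be a multiple of m")
    stages = n // m
    return (t_setup
            + m * t_carry               # ripple through the first stage
            + (stages - 1) * t_bypass   # skip over the remaining stages
            + m * t_carry               # ripple inside the last stage
            + t_sum)
```

With hypothetical unit delays, `carry_bypass_delay(16, 4, 1, 1, 1, 1)` evaluates to 13.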


Combining 4 Domino Carry Lookahead Blocks

  • Manchester Carry Chain (4-bit)

[ Figure: clocked (CK) 4-bit Manchester carry chain with inputs G1 to G4, P1 to P4 and C0 and carries C1 to C4; also drawn as a MANCHESTER CARRY CHAIN block from C0 to C4. ]

  • Limit @ 4 stages

    • In the worst case, 6 transistors are in series to ground.


Improving Worst Case Carry Prop. Time

[ Figure: MANCHESTER CARRY CHAIN block from C0 to C4 with an added clocked (CK) path controlled by P1, P2, P3 and P4. ]


Manchester CC Adder Floorplan

  • Dual CC Scheme

    • One for Carry Prop.

    • The other for off-loading the 1st CC from the SUM-block.

[ Figure: floorplan of four bit slices (BIT 1 to BIT 4). Each slice takes Ai and Bi, contains a GP block, two MANCHESTER CARRY CHAIN cells and a SUM block (labelled SUM GENERATE in BIT 1), and outputs Si; C0 enters at BIT 1 and C4 leaves at BIT 4. ]


CSA (Carry Select Adder)

[ Figure: the first block adds A0 ~ A3 and B0 ~ B3 with C0 and produces S0 ~ S3 and C4. The next block adds A4 ~ A7 and B4 ~ B7 twice, once with carry-in 0 (S4^0 ~ S7^0, C8^0) and once with carry-in 1 (S4^1 ~ S7^1, C8^1); C4 selects S4 ~ S7 and C8. C12 is selected from C12^0 and C12^1 in the same way. ]

  • Carry selection: realization of MUX with restoring logic

  • Note) Realization of MUX with pass-transistor gates: threshold voltage loss per stage (Vdd, Vdd - Vt, Vdd - 2Vt)


CSA (Carry Select Adder)

  • For carry propagation, use restoring logic in an alternating pattern

[ Figure: first block with A0 ~ A3, B0 ~ B3, C0, S0 ~ S3 and C4, followed by the select chain C8^0, C8^1, C12^0, C12^1 and C8. ]

Number of bits for each stage

ex1) 32-bit case : 4, 4, 5, 6, 7, 6 ( or 4, 4, 5, 6, 6, 7)

ex2) 64-bit case : 4, 4, 5, 6, 7, 8, 9, 10, 11


Minimization of Carry Propagation Path Delay

  • Carry Select Scheme (prepare the result for each case, Cin=1 and Cin=0)

  • Simplify the carry selection using the relationship between Ci^0 and Ci^1

  • Take complemented carries, alternating between the even and odd stages

  • Adjust each block size to account for the delay of the carry select logic

    • carry propagation delay of each block = carry propagation delay to the block

e.g. adjusted block sizes for a 32-bit path: 4, 4, 5, 6, 6, 7
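A functional sketch of the carry select scheme, assuming unsigned operands and a list of block widths such as the 32-bit split 4, 4, 5, 6, 6, 7. It models only the selection behaviour (both results are prepared, then the incoming carry picks one), not the timing; `add_block` and `carry_select_add` are illustrative names.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values plus a carry-in; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, block_sizes, c0=0):
    """Add a and b block by block, LSB block first; returns (sum, carry_out)."""
    result, carry, shift = 0, c0, 0
    for width in block_sizes:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Prepare both outcomes before the real carry-in is known.
        s0, c_out0 = add_block(a_blk, b_blk, 0, width)
        s1, c_out1 = add_block(a_blk, b_blk, 1, width)
        # The incoming carry acts as the MUX select for sum and carry-out.
        s, carry = (s1, c_out1) if carry else (s0, c_out0)
        result |= s << shift
        shift += width
    return result, carry
```

In hardware the two block additions run in parallel, so only the MUX delay per block is added to the carry path; that is why later blocks can be made wider.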


16-bit Linear CSA (Carry Select Adder)

  • t_add = t_setup + M * t_carry + (N/M) * t_mux + t_sum

M : # of bits/stage

N : total # of bits

Square Root CSA

  • t_add = t_setup + M * t_carry + √(2N) * t_mux + t_sum

    • N = M + (M+1) + ….. + (M+P-1) = MP + P(P-1)/2 = P²/2 + P(M - 1/2)
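The two carry select delay models can be compared numerically. The sketch assumes the linear form with (N/M) MUX delays and the square-root form with √(2N) MUX delays, which is where the square-root adder gets its name; function names and the unit delays in the example are hypothetical.

```python
import math


def linear_csa_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """Linear carry select: N/M equal stages of M bits each."""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum


def sqrt_csa_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """Square-root carry select: stage sizes grow by one bit per stage."""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum


def sqrt_csa_bits(m, p):
    """Total width N of P stages sized M, M+1, ..., M+P-1."""
    return sum(m + k for k in range(p))
```

With unit delays, a 32-bit adder with M = 2 gives 1 + 2 + 16 + 1 = 20 for the linear form and 1 + 2 + 8 + 1 = 12 for the square-root form, so the gap widens as N grows.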

9 stage


Propagation delay of linear and square root csa and linear rca

Propagation Delay of Linear and Square Root CSA and linear RCA


Carry Skip Adder

  • A compromise between the Ripple Carry Adder and the CLA Adder

[ Figure: 16-bit carry skip adder in 4-bit blocks with inputs a0 ~ a15 and b0 ~ b15; block signals G4,7 / P4,7, G8,11 / P8,11 and G12,15 / P12,15; carries c0, c4, c8, c12 and c16. ]


  • The pi's and gi's are computed from pi = ai ⊕ bi and gi = ai·bi.

  • Initially, c4, c8 and c12 are cleared.

  • After 4 clock cycles (at T0+4Tc), the G-values are calculated as cout assuming ci=0 (the P-values are also calculated by then).

  • At this time (at T0+4Tc), the true cout of the first stage, c4, is obtained.

  • After one, two and three more clock cycles respectively, assuming the delay of each AOI gate is Tc, the true values of c8, c12 and c16 are obtained.

  • The sum and cout of the last block are obtained at (T0+4Tc+2Tc+4Tc).


Comparison of Carry Select & Carry Skip Adder

  • A 32-bit Carry Select Adder

Stage #       1  2  3  4  5  6
bits/stage    4  4  5  6  7  6
inc. delay    4  1  1  1  1  1

Total delay: 9k2 (k2 = delay due to 1-bit addition or MUX)

  • A 32-bit Carry Skip Adder

Stage #       1  2  3  4  5  6
bits/stage    4  5  6  7  8  2
inc. delay    4  1  1  1  1  2

Total delay: 10k2


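Conditional Sum Adder

[ Figure: 3-bit example with inputs A0 ~ A2 and B0 ~ B2. Each bit i forms both conditional results: Si^0 and Ci+1^0 for carry-in 0, Si^1 and Ci+1^1 for carry-in 1. Multiplexers (MPX) merge them into S1, S2 and C3 for the cases C1=0 and C1=1, and a triple 2-input MUX controlled by C1 selects the final S1, S2 and C3; S0 and C1 come from bit 0 and C0. ]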


Carry Lookahead Tree Adder

  • The previous CLA implementation is not very adequate because of its fan-in and fan-out problems and its irregularity, despite the small number (5) of logic levels.

    • Make it regular, using log2(n) logic levels.

[ Figure: 4-bit example. Each bit forms gi and pi from ai and bi; pairs are merged into G0,1 / P0,1 and G2,3 / P2,3, and these into G0,3 / P0,3. The generic cell merges (Gi,j, Pi,j) and (Gj+1,k, Pj+1,k) into (Gi,k, Pi,k). ]

[ 1st Part ]


Carry Lookahead Tree Adder

[ Figure: carries are distributed back down the tree. The generic cell takes Ci, Gi,j and Pi,j and produces Cj+1 while passing Ci on; in the 4-bit example C0 with G0,1 / P0,1 gives C2, C0 with g0 / p0 gives C1, and C2 with g2 / p2 gives C3. ]

[ 2nd Part ]

[ Figure: both parts combined for a3 ~ a0, b3 ~ b0 and C0, producing S3 ~ S0. ]

[ Complete CLA Tree Adder ]
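A sketch of carry computation with the (G, P) merge cell in log2(n) levels. It uses the merge rule suggested by the cell labels, Gi,k = Gj+1,k + Pj+1,k·Gi,j and Pi,k = Pi,j·Pj+1,k, with pi taken as ai XOR bi. As a simplifying assumption it uses recursive doubling (a Kogge-Stone style arrangement) instead of the exact two-part tree drawn on the slides; the number of merge levels is still log2(n). The names `combine` and `cla_tree_add` are illustrative.

```python
def combine(low, high):
    """Merge (G, P) of bits i..j (low) with (G, P) of bits j+1..k (high)."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return g_hi | (p_hi & g_lo), p_lo & p_hi


def cla_tree_add(a, b, n, c0=0):
    """Add two unsigned n-bit numbers; returns (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate
    prefix = list(zip(g, p))          # prefix[i] covers bit i only
    d = 1
    while d < n:                      # log2(n) merge levels
        prefix = [prefix[i] if i < d else combine(prefix[i - d], prefix[i])
                  for i in range(n)]
        d *= 2
    # prefix[i] now covers bits 0..i, so C(i+1) = G0,i + P0,i * C0.
    carries = [c0] + [G | (P & c0) for G, P in prefix]
    s = 0
    for i in range(n):
        s |= (p[i] ^ carries[i]) << i
    return s, carries[n]
```

Each level only merges pairs of already computed (G, P) values, which keeps the fan-in at two regardless of the word width.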


Carry save adder

Carry Save Adder

  • Ripple Carry Adder

  • Carry Lookahead Adder

  • CSA (Conditional Sum Adder)

  • CSA (Carry Select Adder)

  • CSA (Carry Skip Adder)

  • CSA (Carry Save Adder)

Carry Propagate Adder


Carry Save Adder

  • A Carry Save Adder is used wherever a large number of operands have to be added.

[ Figure: rows of full adders (F.A) form the CSA stages. Each F.A takes an operand bit, the previous cycle's sum and the previous cycle's carry (ai, bi, ci); Sum and Carry flip-flops (F/F) hold the intermediate results, and a final row of full adders forms the CPA. ]
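The idea can be sketched on whole integers: each carry-save step compresses three operands into a sum word and a carry word without propagating carries, and a single carry-propagate addition (CPA) finishes the job. The sketch assumes non-negative operands; the function names are illustrative.

```python
def carry_save_step(x, y, z):
    """3:2 compression: x + y + z == s + c, computed bitwise with no carry ripple."""
    s = x ^ y ^ z                            # per-bit sum
    c = ((x & y) | (y & z) | (z & x)) << 1   # per-bit carry, moved up one position
    return s, c


def add_many(operands):
    """Accumulate operands in carry-save form, then do one final CPA."""
    s, c = 0, 0
    for op in operands:
        s, c = carry_save_step(s, c, op)
    return s + c                             # the only carry-propagate addition
```

For example, `add_many([3, 5, 7, 9, 11])` returns 35, with carry propagation happening only in the last step.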


2. Multiplier

  • Add-and-Shift Algorithm

[ Figure: one worked binary example (multiplicand × multiplier) shown twice: the multiplication procedure by the pencil-and-paper method, and the multiplication procedure by the add-and-shift algorithm. ]
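A minimal sketch of add-and-shift multiplication for unsigned n-bit operands. It assumes the common register arrangement in which one 2n-bit accumulator holds the partial product in its upper half and the remaining multiplier bits in its lower half; the function name is illustrative.

```python
def add_and_shift_multiply(multiplicand, multiplier, n):
    """Multiply two unsigned n-bit numbers with n add-and-shift steps."""
    acc = multiplier                  # lower half: multiplier bits still to examine
    for _ in range(n):
        if acc & 1:                   # current multiplier bit is 1
            acc += multiplicand << n  # add the multiplicand into the upper half
        acc >>= 1                     # shift partial product and multiplier right
    return acc
```

Unlike the pencil-and-paper method, which shifts each new partial product left, this version keeps the adder fixed and shifts the accumulated result right once per step.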


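The Serial-Parallel Multiplier

[ Figure: operand A is applied in parallel (a3 ~ a0) and operand B serially (b0, b1, b2, ...) through D flip-flops; a row of full adders (F.A) with D flip-flops accumulates the partial products, starting from 0, and the result leaves serially at Output. ]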


4x4 array multiplier

[ Figure: multiplier array with its dimensions labelled N (4) and M (3). ]

  • t_mult = [(M-1) + (N-1)] * t_carry + (N-1) * t_sum + t_and

    • both t_carry and t_sum are important

    • Sum and Carry generation times need to be similar.


6 alu blocks and control

Carry-save Multiplier(CSM)

Rectangular floorplan of CSM


The Modified Booth Algorithm (cont’)

Booth Encoder Table

b2k+1  b2k  b2k-1   multiplied by
  0     0     0         0
  0     0     1        + x
  0     1     0        + x
  0     1     1        + 2x
  1     0     0        - 2x
  1     0     1        - x
  1     1     0        - x
  1     1     1         0

Booth Encoder

[ Figure: encoder with inputs b2k-1, b2k and b2k+1 and outputs A = b2k ⊕ b2k-1, 2A, and negative = b2k+1. ]


Booth Multiplication Example

A = 17, X = -9, A × X = -153

Operation sequence: Initial 0, Add -A, 2-bit Shift, Add 2A, 2-bit Shift, Add -A

[ Figure: bit-level contents of the partial product after each operation. ]


The Modified Booth Algorithm

  • Consider a number B = (bn-1, bn-2, ... , b1, b0) written in 2’s complement.

  • B may be rewritten as follows :

    • Example

  • In this equation, each term in brackets is in the set {-2, -1, 0, 1, 2}.

  • An n-bit multiplier generates exactly n/2 partial products.
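A sketch of the recoding, assuming the standard radix-4 form in which digit k equals b(2k-1) + b(2k) - 2·b(2k+1) with b(-1) = 0 and carries weight 4^k. The function names are illustrative, and n is assumed to be even.

```python
def booth_digits(b, n):
    """Radix-4 Booth digits of the n-bit two's-complement value b, lowest digit first."""
    bits = [(b >> i) & 1 for i in range(n)]  # also correct for negative Python ints
    digits = []
    prev = 0                                 # b(-1) = 0
    for k in range(n // 2):
        digits.append(prev + bits[2 * k] - 2 * bits[2 * k + 1])
        prev = bits[2 * k + 1]
    return digits


def booth_multiply(a, b, n):
    """Multiply a by the n-bit two's-complement value b using n/2 partial products."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_digits(b, n)))
```

For the multiplier -9 in 6 bits the digits come out as -1, +2, -1 (lowest first), which corresponds to the sequence add -A, add 2A, add -A for 17 × (-9) = -153.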


Parallel multiplier

Parallel Multiplier

  • Multiplier has two basic operations

    • The generation of partial products

    • The summation of partial products

  • Parallel multiplier avoids the overhead that is due to the separate controls of these two operations

  • We speed up the multiplication

  • The gain in speed is obtained at the expense of extra hardware

  • Parallel multiplier can be implemented so as to support a high rate of pipelining


The Braun Multiplier

A straightforward implementation

Operands: a3 a2 a1 a0 and b3 b2 b1 b0. Partial product bits summed in each product column:

P0 : a0b0
P1 : a1b0, a0b1
P2 : a2b0, a1b1, a0b2
P3 : a3b0, a2b1, a1b2, a0b3
P4 : a3b1, a2b2, a1b3
P5 : a3b2, a2b3
P6 : a3b3

Inputs of each adder cell:

  • One bit of the new partial product ( ai . bj )

  • One bit of the previous partial product

  • Carry in

In the first four rows there is no horizontal carry propagation (using carry-save adders).


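The Braun Multiplier (cont’)

[ Figure: 4x4 array with inputs a3 ~ a0 and b0 ~ b3. The rows for b1, b2 and b3 each contain three full adders (F.A) and deliver p1, p2 and p3; the b0 row feeds 0s and delivers p0; a final row of three full adders delivers p4 ~ p7. ]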


Baugh-Wooley Multiplier

  • Modified in order to allow multiplication of signed numbers

  • Consider two numbers A and B (2’s-complement numbers)

  • The product A.B is


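Baugh-Wooley Multiplier (cont’)

[ Figure: 4x4 array with inputs a3 ~ a0 and b0 ~ b3. The b1 and b2 rows contain three full adders (F.A) each, the b3 row four; a final row of five full adders, fed with a3, b3 and a constant 1, delivers p3 ~ p7; p0, p1 and p2 leave from the upper rows. ]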


Wallace Tree Multipliers

  • Full adder vs Wallace tree

[ Figure: a full adder takes three inputs of weight 2^0 and produces outputs of weight 2^1 and 2^0; a Wallace tree of width n takes inputs of weight 2^0 and produces outputs of weight 2^n, ..., 2^1, 2^0. ]

  • Useful whenever a large number of operands are to be added.

  • Completion time in a Braun or Baugh-Wooley multiplier

    • Using a Ripple Carry Adder:

      Proportional to twice the number of bits, 2n

    • Using Wallace trees:

      Proportional to log2 (n)


Recursive Decomposition of the Multiplication

  • Partitioning the two operands

  • Four terms (AH.BH, AH.BL, AL.BH, AL.BL) are computed using four p-bit multipliers

  • The results are collected through a Wallace tree
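The decomposition can be sketched for unsigned operands, assuming each 2p-bit operand is split into a high and a low p-bit half. The plain `+` below stands in for the Wallace tree and final adder that collect the aligned terms; `multiply_by_parts` is an illustrative name.

```python
def multiply_by_parts(a, b, p):
    """Multiply two unsigned 2p-bit numbers using four p-bit by p-bit products."""
    mask = (1 << p) - 1
    ah, al = a >> p, a & mask         # A = AH * 2^p + AL
    bh, bl = b >> p, b & mask         # B = BH * 2^p + BL
    hh = ah * bh                      # weight 2^(2p)
    hl = ah * bl                      # weight 2^p
    lh = al * bh                      # weight 2^p
    ll = al * bl                      # weight 2^0
    # Align the four partial products by weight and sum them.
    return (hh << (2 * p)) + ((hl + lh) << p) + ll
```

Applying the same split to each p-bit product gives the recursive form named in the slide title.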


Recursive Decomposition of the Multiplication

[ Figure: the operands are split into AH, AL and BH, BL; the four partial products AL×BL, AL×BH, AH×BL and AH×BH are aligned by weight, reduced by Wallace trees (labelled 4 × W3) and summed in a final Adder. ]

Aligning the four partial products


Booth’s Algorithm Array Multiplication

  • Another approach to the design of a parallel multiplier for two’s-complement operands

  • The basic cell in row i performs an add, a subtract or a transfer-only operation

  • CASS (Controlled Add/Subtract/Shift) Cell

[ Figure: CASS cell with operand input a, partial product input Pin, control inputs H and D, carry input cin and carry output cout. ]


Booth’s Algorithm Array Multiplication (cont’)

Xi  Xi-1   H  D   Operation
0    0     0  d   Shift
0    1     1  0   Add
1    0     1  1   Subtract
1    1     0  d   Shift

[ Figure: 4-bit array with inputs a3 ~ a0. Each row has a CTRL block that derives H and D from one multiplier bit (x3, x2, x1, x0 from top to bottom) and its neighbour; the rows contain 4, 5, 6 and 7 CASS cells, and the outputs are P6 ~ P0. ]


Generalized block diagram of an array multiplier

Q. Why use an array multiplier if it requires as many addition steps?

A1) An array multiplier is a combinational circuit, in which the signals flow without being clocked.

Multi-pass Array Multiplier : normally uses a clock, but the cycle time for passing through k arrays is < kTc

A2) Some speed-up schemes are possible.

e.g. E/O array, Wallace-tree

  • Even-Odd Array

  • Wallace-tree Multiplier

  • 6 x 6 Wallace-tree Multiplier Example

  • (n : width of the Wallace tree)

    e.g. For 32-bit, number of adders necessary for each stage is

    32 - 22 - 16 - 12 - 8 - 6 - 4 - 3 - 2

    Total delay = 9 x adder delay


3. Datapath Generation

Datapath and its elements in bit-slice organization

[ Figure: the DATAPATH shown together with MEMORY, CONTROL and INPUT-OUTPUT. ]

Two layout strategies for bit-slice datapath

Layout of 4-bit DP using layout strategy II (feedthrough)

1-D placement vs. 2-D placement

1-D placement vs. 2-D placement (cont’)


Datapath Layout Flow

  • circuit design

    • floorplan : block ordering, bus track assignment

    • schematic drawing : transistor sizing

  • layout

    • cell drawing : leaf cell layout

    • layout assembly : leaf cell integration (routing)

    • DRC / LVS : design rule check, layout vs. schematic

  • back-annotation

    • simulation with the exact capacitance

[ Figure: flow chart. RTL description, Floorplan, Schematic Drawing, Cell Drawing, Layout Assemble, DRC / LVS, Back-Annotation, Datapath Layout. ]


Datapath design case accent hk386

Datapath Design Case (ACCENT HK386)

  • real mode support of x86 instruction set

  • enhanced (pipelined) datapath

  • problems & practices of general DP layout


Datapath structure

  • 3 major blocks

    • ALU, register file (32-bit)

    • barrel shifter (40-bit)

    • segment/effective address (32-bit)

[ Figure: block arrangement of Segment,EA, Barrel Shifter, ALU and Register File. ]


Track capacity

[ Figure: cell cross-section with N-well and P-well, VDD and VSS power lines, TRACK(6), control and clock lines, and the layers metal1 and metal2. ]

  • 6 vertical wires/track in metal 1

    • metal3 reserved for P & G routing

Power Grid

[ Figure: floorplan with the blocks Segment,EA, BSH, ALU and RF. ]

  • From bottom & left (chip edges)

  • Considering IR drop


Cell Structure

  • Initial cell template decision

    • N-well on the left

    • P-well on the right

    • data flow vertical

    • control flow horizontal

    • Similar cell structure as VTI

    • Cell width

      • 80 for PMOS

      • 70 for NMOS

[ Figure: cell template with N-well and P-well, widths 80 and 70; dimension labels 25, 10, 35, 45, 10, 25. ]

Cell Structure

  • A power line passes through every cell

  • power line width: 10 (2 contacts)

  • power line location: 25 inside from the cell boundary


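Accent Cell Layout Flow

[ Figure: flow from Block Spec. to Schematic to SPICE. ]

  • Simulate first with assumed capacitances

  • Transistor sizing is finished quickly

  • Because the capacitances are not accurate, the thinking was that optimization is unnecessary and meeting the spec is enough

  • Accurate capacitances are available only after the whole datapath is assembled, so the work sits idle for a long time

  • If the layout is changed after assembly, everything must be re-assembled, which is an enormous amount of manual work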


Cell Design(I)

  • Using 45 degree lines for cell design

[ Figure: cell layout with control flow and data flow directions marked. ]

Cell Design(II)

  • needless effort to reduce cell size

    • ugly poly; current crowding

[ Figure: cell layout with the data flow direction marked. ]

  • The critical path is used for transistor sizing in the relevant datapath element

Assemble

[ Figure: assembled cells with the data flow direction marked. ]

  • Track assignment needs to be done before the cell layout (not after).


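The Value of Grades

University grades and success in society

show little correlation,

and this is not really surprising.

The factors behind success in society and the criteria

for university grades are very different.


The Crossroads of Starting a Business

To start a business, you must first make two things clear.

First, either target only the domestic market

or target only the world market;

do only one of the two.

Taking on the world market is hard,

but if you succeed, the domestic market follows by itself.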

