
# VLSI Arithmetic Adders & Multipliers


Prof. Vojin G. Oklobdzija

University of California

http://www.ece.ucdavis.edu/acsel

Digital Computer Arithmetic belongs to Computer Architecture; however, it is also an aspect of logic design.

The objective of Computer Arithmetic is to develop algorithms that use the available hardware as efficiently as possible.

Speed, power and chip area are the most commonly used measures, which ties the algorithms closely to the implementation technology.

### Introduction

Computer Arithmetic

Multiplication

Division

Evaluation of Functions

Multi-Media

### Basic Operations

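Full Adder. The full adder is the fundamental building block of most arithmetic circuits.

The sum and carry outputs are described as:

$$s_i = a_i \oplus b_i \oplus c_i, \qquad c_{i+1} = a_i b_i + a_i c_i + b_i c_i$$

[Figure: full-adder block with inputs $a_i$, $b_i$, $C_{in}$ and outputs $s_i$, $C_{out}$]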

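Inputs: $c_i$, $a_i$, $b_i$. Outputs: $s_i$, $c_{i+1}$.

| $c_i$ | $a_i$ | $b_i$ | $s_i$ | $c_{i+1}$ | Carry |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | |
| 0 | 0 | 1 | 1 | 0 | Propagate |
| 0 | 1 | 0 | 1 | 0 | Propagate |
| 0 | 1 | 1 | 0 | 1 | Generate |
| 1 | 0 | 0 | 1 | 0 | |
| 1 | 0 | 1 | 0 | 1 | Propagate |
| 1 | 1 | 0 | 0 | 1 | Propagate |
| 1 | 1 | 1 | 1 | 1 | Generate |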

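Full adder operation is defined by the equations:

$$s_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$$

$$c_{i+1} = g_i + p_i c_i$$

Carry-Propagate: $p_i = a_i \oplus b_i$

Carry-Generate: $g_i = a_i b_i$

A one-bit adder can be implemented directly from these equations. [Figure: one-bit adder]

A one-bit adder can be implemented more efficiently by selecting the carry with a multiplexer, because the MUX is faster. [Figure: MUX-based one-bit adder]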
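The two one-bit adder forms can be compared with a small behavioural sketch. It assumes the usual definitions $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$; the function names are illustrative and do not come from the slides. When $p_i = 0$ the two operands are equal, so the MUX can forward $a_i$ as the carry.

```python
from itertools import product


def full_adder_gp(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit adder from the generate/propagate equations."""
    p = a ^ b                      # carry-propagate
    g = a & b                      # carry-generate
    return p ^ c_in, g | (p & c_in)


def full_adder_mux(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit adder whose carry is chosen by a 2:1 MUX controlled by p."""
    p = a ^ b
    c_out = c_in if p else a       # p = 1: pass c_in; p = 0: a equals b, so forward a
    return p ^ c_in, c_out


# Both forms agree with ordinary arithmetic on all eight input combinations.
for c_in, a, b in product((0, 1), repeat=3):
    total = a + b + c_in
    expected = (total & 1, total >> 1)
    assert full_adder_gp(a, b, c_in) == expected
    assert full_adder_mux(a, b, c_in) == expected
```

This checks logical equivalence only; the speed claim depends on the circuit realization of the MUX.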


### Inversion Property

From Rabaey

### Minimize Critical Path by Reducing Inverting Stages

From Rabaey

Carry chain of a ripple-carry adder (RCA) implemented using multiplexers from the standard cell library:

[Figure: RCA carry chain with the critical path marked]

Oklobdzija, ISCAS’88

### Manchester Carry-Chain Realization of the Carry Path

• A simple and very popular scheme for implementing the carry signal path

### Original Design

T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast 'Carry' Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

### Manchester Carry Chain (CMOS)

• Implement P with pass-transistors

• Implement G with pull-up, kill (delete) with pull-down

• Use dynamic logic to reduce complexity and increase speed

Kilburn, et al, IEE Proc, 1959.
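The propagate / generate / kill behaviour of the chain can be sketched at the logic level. This is a behavioural model only: it ignores precharge, signal polarity and transistor sizing, and the helper names are hypothetical.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Little-endian bit list of x, n bits wide."""
    return [(x >> i) & 1 for i in range(n)]


def manchester_carries(a_bits: list[int], b_bits: list[int], c_in: int = 0) -> list[int]:
    """Carries [c_0, ..., c_n] along the chain, one node per bit position."""
    carries = [c_in]
    for a, b in zip(a_bits, b_bits):
        if a & b:                  # generate: the carry node is forced to 1
            carries.append(1)
        elif not (a | b):          # kill: the carry node is forced to 0
            carries.append(0)
        else:                      # propagate: the pass device forwards the incoming carry
            carries.append(carries[-1])
    return carries


def chain_add(x: int, y: int, n: int = 8) -> tuple[int, int]:
    """n-bit sum and carry-out using the chain model above."""
    a, b = to_bits(x, n), to_bits(y, n)
    c = manchester_carries(a, b)
    s = sum((ai ^ bi ^ ci) << i for i, (ai, bi, ci) in enumerate(zip(a, b, c)))
    return s, c[n]
```

For example, `chain_add(255, 1)` propagates a carry through every stage and returns `(0, 1)`.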

### Pass-Transistor Realization in DPL

MacSorley, Proc IRE 1/61

Lehman, Burla, IRE Trans on Comp, 12/61

[Figure: carry-bypass adder, from Rabaey]

### Carry-Skip Adder: N bits, k bits/group, r = N/k groups

### Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

### Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

[Figure: carry chain of the 32-bit adder with block sizes labeled 6, 5, 5, 4, 4, 3, 3, 1 and 1 bits]

Any-point-to-any-point delay = 9 D, as compared to 12 D for the carry-skip adder (CSKA).
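The 9 D and 12 D figures can be reproduced with a simple unit-delay model. The model and the block ordering below are assumptions, because the transcript lists the block sizes but keeps neither their order nor the lecture's delay model. The sketch assumes a carry generated at the bottom of block i ripples through the remaining size − 1 stages of that block, skips each intermediate block in one unit, and ripples size − 1 stages into block j.

```python
def worst_case_delay(blocks: list[int]) -> int:
    """Worst any-point-to-any-point carry delay, in unit delays, under the assumed model."""
    worst = max(size - 1 for size in blocks)      # carry starts and ends inside one block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            ripple_out = blocks[i] - 1            # leave block i
            skips = j - i - 1                     # one unit per skipped block
            ripple_in = blocks[j] - 1             # reach the top bit of block j
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst


cska = [4] * 8                                    # fixed 4-bit groups
vba = [1, 3, 4, 5, 6, 5, 4, 3, 1]                 # listed sizes, symmetric order assumed
assert sum(cska) == sum(vba) == 32

print(worst_case_delay(cska))                     # 12
print(worst_case_delay(vba))                      # 9
```

Under this model, small blocks at the ends shorten the ripple segments of the longest paths, while the large blocks in the middle are reached only by paths with few skips.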

### Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

### Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay model:

### Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Variable Group Length

Oklobdzija, Barnes, Arith’85

### Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Variable Block Lengths

• There is no closed-form solution for the delay

• Choosing the block lengths is a dynamic programming problem

### Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

### Delay Comparison: Variable Block Adder

[Plot: delay curves for VBA, CLA and multi-level VBA]

## VLSI Arithmetic Lecture 4


## Review

Lecture 3


### Delay Comparison: Variable Block Adder

[Plot labels: Square Root Dependency (VBA); Log Dependency (CLA, VBA Multi-Level)]

### Circuit Issues

• Adder speed cannot be estimated from:

  • the number of logic gates in the critical path

  • the number of transistors in the path

  • the number of logic levels in the path

• Estimating adder speed is much more complex, and many of the “fast” schemes can be misleading.

### Fan-Out Dependency

### Fan-In Dependency

This looks like “Logical Effort” (1985)

### Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

ARITH-13: Presenting the Achievement Award to Arnold Weinberger of IBM, who invented the carry-lookahead adder (CLA) in 1958.

Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p.3-12, 1958.

[Figure: carry-lookahead block with group signals $G_j$ and $P_j$]

One gate delay D to calculate p and g.

One D to calculate P and two for G.

Three gate delays to calculate $C_{4(j+1)}$.

Compare that to 8 D in an RCA!

$C_{16}$ will take a total of 5 D vs. 32 D for an RCA!
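A behavioural sketch of a two-level 16-bit carry-lookahead adder shows where the group signals enter. It assumes 4-bit groups with flat AND-OR carry expressions; the function names are illustrative, and the model says nothing about gate delays.

```python
def lookahead4(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries c1..c4 of a 4-bit block, each written as a flat AND-OR expression."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla16(x: int, y: int, c_in: int = 0) -> tuple[int, int]:
    """16-bit sum and carry-out C16 using group generate G_j and propagate P_j."""
    a = [(x >> i) & 1 for i in range(16)]
    b = [(y >> i) & 1 for i in range(16)]
    g = [ai & bi for ai, bi in zip(a, b)]
    p = [ai ^ bi for ai, bi in zip(a, b)]
    # Group signals: G_j is the block carry-out with a zero carry-in, P_j is the AND of p.
    G = [lookahead4(g[k:k + 4], p[k:k + 4], 0)[3] for k in range(0, 16, 4)]
    P = [p[k] & p[k + 1] & p[k + 2] & p[k + 3] for k in range(0, 16, 4)]
    # Second level: C4, C8, C12, C16 from the group signals.
    group_carries = lookahead4(G, P, c_in)
    group_c_in = [c_in] + group_carries[:3]
    # Carries into every bit position, group by group.
    c = []
    for j, k in enumerate(range(0, 16, 4)):
        c.append(group_c_in[j])
        c.extend(lookahead4(g[k:k + 4], p[k:k + 4], group_c_in[j])[:3])
    total = sum((p[i] ^ c[i]) << i for i in range(16))
    return total, group_carries[3]
```

For example, `cla16(0xFFFF, 0x0001)` returns `(0, 1)`: every group propagates, so C16 is produced by the second-level block directly from the group signals.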

## Motorola: CLA Implementation Example

A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

### Critical path in Motorola's 64-bit CLA

[Figure: critical path annotated with times 4.8, 1.05, 1.7, 3.75, 2.7, 2.0 and 2.35 ns]

### Motorola's 64-bit CLA: Conventional PG Block

• No better situation here: the carry ripples locally, with 5 transistors in the path

Basically, this is Manchester carry chain (MCC) performance with carry-skip. One should not expect any better results than with the VBA.

### Motorola's 64-bit CLA: Modified PG Block

Intermediate propagate signals $P_{i:0}$ are generated to speed up $C_3$.

The critical path still resembles the MCC.

### Motorola's 64-bit CLA

[Figure: critical path annotated with times 3.9, 1.8, 2.2, 3.55, 2.9 and 3.2 ns]

[Figure: the same path annotated with both sets of times: 3.9, 1.8, 2.2, 3.55, 2.9 and 3.2 ns; 4.8, 1.05, 1.7, 3.75, 2.7, 2.0 and 2.35 ns]

## Delay Optimized CLA

B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol. 3, No. 4, October 1991

### Delay Optimized CLA: Lee-Oklobdzija ’91

(a) fixed groups and levels

(b) variable-sized groups, fixed levels

(c) variable-sized groups and fixed levels

(d) variable-sized groups and levels

### Two-Levels of Logic Implementation of the Carry Block

### Two-Levels of Logic Implementation of the Carry-Lookahead Block

### Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)

### Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)

### Delay Optimized CLA: Lee-Oklobdzija ’91

Delay: Three-level BCLA

Delay: Two-level BCLA

### Delay Optimized CLA: Lee-Oklobdzija ’91

(a) 2-level BCLA: D = 8.5 ns; (b) 3-level BCLA: D = 8.9 ns

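H. Ling, “High-Speed Binary Adder”, IBM Journal of Research and Development, Vol. 25, No. 3, 1981.

Used in: IBM 3033, IBM S370/168, Amdahl V6, HP etc.

[Figure: adder bit cell with inputs $a_i$, $b_i$, $c_i$ and outputs $s_i$, $c_{i+1}$]

### Ling’s Derivations

define: $H_{i+1} = C_{i+1} + C_i$

$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$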

### Ling’s Derivations

From:

and

because:

fundamental expansion

Now we need to derive Sum equation


Ling’s equations:

Variation of CLA:

Ling, IBM J. Res. Dev, 5/81
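Ling's pseudo-carry can be checked exhaustively on a small adder. The equations did not survive in the transcript, so the sketch below assumes the standard formulation: $t_i = a_i + b_i$, $g_i = a_i b_i$, $H_{i+1} = c_{i+1} + c_i$, which gives $H_{i+1} = g_i + t_{i-1} H_i$ and $c_{i+1} = t_i H_{i+1}$. The function name and the 8-bit width are illustrative.

```python
def ling_add(x: int, y: int, n: int = 8) -> tuple[int, int]:
    """n-bit sum and carry-out computed through Ling's pseudo-carry H (no carry-in)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    g = [ai & bi for ai, bi in zip(a, b)]         # generate
    t = [ai | bi for ai, bi in zip(a, b)]         # transfer
    H = [0]                                       # H_0 = 0 because there is no carry-in
    c = [0]                                       # c_0 = 0
    for i in range(n):
        t_prev = t[i - 1] if i > 0 else 0         # value is irrelevant at i = 0 since H_0 = 0
        H.append(g[i] | (t_prev & H[i]))          # H_{i+1} = g_i + t_{i-1} H_i
        c.append(t[i] & H[i + 1])                 # c_{i+1} = t_i H_{i+1}
        assert H[i + 1] == (c[i + 1] | c[i])      # matches the definition of H
    s = sum((a[i] ^ b[i] ^ c[i]) << i for i in range(n))
    return s, c[n]


# Exhaustive check against integer addition for all 8-bit operand pairs.
for x in range(256):
    for y in range(256):
        assert ling_add(x, y) == ((x + y) & 0xFF, (x + y) >> 8)
```

The loop here is serial; the point is only that the recurrence on $H$ drops $t_i$ from the carry path, which is what reduces the fan-in in the lookahead expressions.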

Ling’s equation:

Variation of CLA:

Ling uses a different transfer function. Four such functions have the desired properties (Ling’s is one of them); see Doran, IEEE Trans. on Comp., Vol. 37, No. 9, Sept. 1988.

Conventional: fan-in of 5

Ling: fan-in of 4

• H16 contains 8 terms, compared to 15 in G16.

• H16 can be implemented with one level of logic (in ECL), while G16 cannot.

(Ling’s adder takes full advantage of wired-OR, which is of special importance when ECL technology is used.)

## VLSI Arithmetic Lecture 5


## Review

Lecture 4


• H16 contains 8 terms, compared to 15 in G16.

• H16 can be implemented with one level of logic (in ECL), while G16 cannot with an 8-way wire-OR.

(Ling’s adder takes full advantage of wired-OR, which is of special importance when ECL technology is used; his limitation at IBM was a fan-in of 4 and a wire-OR of 8.)

### Ling: Weinberger Notes


• 32-bit adder used in: IBM 3033, IBM S370 Model 168, Amdahl V6.

• Implements 32-bit addition in 3 levels of logic

• Implements 32-bit address generation (AGEN), B + Index + Disp, in 4 levels of logic (rather than 6)

• 5 levels of logic for the 64-bit adder used in an HP processor

### Implementation of Ling’s Adder in CMOS (S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96)

[Figures: S. Naffziger, ISSCC’96]

### LCS4 – Critical G Path

### LCS4 – Logical Effort Delay

### Results:

• 0.5 µm technology

• Speed: 0.930 ns

• Nominal process, 80 °C, V = 3.3 V

See: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96

from: Ercegovac-Lang

The following recurrence operation is defined:

$$(g, p) \circ (g', p') = (g + p\,g',\; p\,p')$$

such that:

$$(G_i, P_i) = \begin{cases} (g_0, p_0) & i = 0 \\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) & 1 \le i \le n \end{cases}$$

$$c_{i+1} = G_i \quad \text{for } i = 0, 1, \ldots, n$$

With a carry-in, $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$, so that $c_1 = g_0 + p_0 c_{in}$.

This operation is associative, but not commutative.

It can also span a range of bits (overlapping and adjacent).
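The recurrence can be sketched directly. The snippet assumes $g_i = a_i b_i$ and $p_i = a_i \oplus b_i$ and takes the left operand of the operator as the more significant one; `op` and `prefix_carries` are illustrative names.

```python
from itertools import product


def op(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """(g, p) o (g', p') = (g + p g', p p'), with left the more significant operand."""
    g, p = left
    g2, p2 = right
    return g | (p & g2), p & p2


def prefix_carries(a_bits: list[int], b_bits: list[int], c_in: int = 0) -> list[int]:
    """Carries c_1..c_n (little-endian inputs) obtained by applying the recurrence serially."""
    acc = (c_in, c_in)                            # (g_-1, p_-1) = (c_in, c_in)
    carries = []
    for a, b in zip(a_bits, b_bits):
        acc = op((a & b, a ^ b), acc)             # (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1})
        carries.append(acc[0])                    # c_{i+1} = G_i
    return carries


pairs = list(product((0, 1), repeat=2))

# Associative: the grouping of three operands does not matter.
for x, y, z in product(pairs, repeat=3):
    assert op(op(x, y), z) == op(x, op(y, z))

# Not commutative: swapping the operands can change the result.
assert op((1, 1), (0, 0)) != op((0, 0), (1, 1))
```

Associativity is what lets the serial loop be regrouped into a tree, which is the basis of the parallel prefix adders that follow.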

### Parallel Prefix Adders: variety of possibilities

from: Ercegovac-Lang

Pyramid Adder: M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units”, IFIP Congress, Munich, Germany, 1962.

### Parallel Prefix Adders: S. Knowles 1999

operation is associative: h>i≥j≥k

operation is idempotent: h>i≥j≥k

produces carry: cin=0

Ladner-Fischer (L-F):

Exploits associativity, but not idempotency.

Produces minimal logical depth.

Two wires at each level. Uniform fan-in of two.

Large fan-out (of 16; n/2); large capacitive loading combined with long wires (in the last stages).

Kogge-Stone (K-S):

Exploits idempotency to limit the fan-out to 1.

Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.

Buffers needed in both cases: K-S, L-F
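A minimal sketch of a Kogge-Stone style prefix network follows, assuming $c_{in} = 0$, $g_i = a_i b_i$ and $p_i = a_i \oplus b_i$. At each level every node combines with the node a fixed distance below it, and the distance doubles from level to level. Function names are illustrative.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Little-endian bit list of x, n bits wide."""
    return [(x >> i) & 1 for i in range(n)]


def kogge_stone_carries(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Carries c_1..c_n from a Kogge-Stone style prefix network (carry-in of 0)."""
    n = len(a_bits)
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    dist = 1
    while dist < n:                               # ceil(log2 n) levels
        nxt = list(gp)
        for i in range(dist, n):
            g, p = gp[i]
            g_lo, p_lo = gp[i - dist]             # partner node dist positions below
            nxt[i] = (g | (p & g_lo), p & p_lo)
        gp = nxt
        dist *= 2
    return [g for g, _ in gp]                     # c_{i+1} = G_i


def reference_carries(x: int, y: int, n: int) -> list[int]:
    """Carry out of the low i+1 bits, computed with integer arithmetic."""
    return [((x % 2 ** (i + 1)) + (y % 2 ** (i + 1))) >> (i + 1) for i in range(n)]
```

Each node output feeds at most two nodes in the next level (its own column and one partner), which is the price paid in wiring for the low fan-out.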

Brent-Kung (B-K):

• Sets the fan-out to one

• Avoids the explosion of wires (as in K-S)

• Makes no sense in CMOS:

  • the fan-out = 1 limit is arbitrary and extreme

  • much of the capacitive load is due to wire anyway

• It is more efficient to insert buffers in L-F than to use the B-K scheme

• Is a hybrid synthesis of L-F and K-S

• Trades an increase in logic depth for a reduction in fan-out:

• effectively a higher-radix variant of K-S.

• others do it similarly by serializing the prefix computation at the higher fan-out nodes.

• Others similarly trade logical depth for a reduction in fan-out and wiring.

### Parallel Prefix Adders: variety of possibilities

from: Knowles

bounded by L-F and K-S at ends

### Parallel Prefix Adders: variety of possibilities (Knowles 1999)

The following rules are used:

• Lateral wires at the $j$th level span $2^j$ bits

• Lateral fan-out at the $j$th level is a power of 2 up to $2^j$

• Lateral fan-out at the $j$th level cannot exceed that at the $(j+1)$th level.

### Parallel Prefix Adders: variety of possibilities (Knowles 1999)

• The number of minimal depth graphs of this type is given in:

• At 4 bits there are only K-S and L-F; afterwards there are several new possibilities.
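The fan-out rules can be enumerated mechanically. The sketch below assumes levels are numbered from $j = 0$ and that a family member is written as its lateral fan-outs from the top level down, so that [4,4,2,2,1] describes a 32-bit adder. The function name is illustrative, and the adder width is assumed to be a power of two.

```python
def knowles_family(n_bits: int) -> list[tuple[int, ...]]:
    """All fan-out vectors (top level first) allowed by the rules for an n-bit adder."""
    levels = n_bits.bit_length() - 1              # log2(n_bits) for a power of two
    results = []

    def extend(prefix: list[int]) -> None:
        j = len(prefix)                           # next level to assign
        if j == levels:
            results.append(tuple(reversed(prefix)))
            return
        fan_out = prefix[-1] if prefix else 1     # may not drop below the level underneath
        while fan_out <= 2 ** j:                  # power of 2, at most 2^j
            extend(prefix + [fan_out])
            fan_out *= 2

    extend([])
    return results


print(knowles_family(4))                          # [(1, 1), (2, 1)]
print(len(knowles_family(32)))                    # 42
```

At 4 bits the two vectors are the all-ones vector (fan-out 1 everywhere, as in K-S) and the maximum fan-out vector (as in L-F), matching the statement above. At 32 bits the same two extremes, (1,1,1,1,1) and (16,8,4,2,1), bound the family.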

### Parallel Prefix Adders: variety of possibilities

Example of a new 32-bit adder [4,4,2,2,1]

Knowles 1999

### Parallel Prefix Adders: variety of possibilities (Knowles 1999)

• Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster)

• K-S is the fastest

• K-S adders are wire limited (requiring 80% more area)

• The difference between the examined schemes is less than 15%

### Parallel Prefix Adders: variety of possibilities (Knowles 1999)

Conclusion

• Irregular, hybrid schemes are possible

• The speed-up of 15% is achieved at the cost of large wiring, hence area and power

• Circuits close in speed to K-S are available at significantly lower wiring cost

## VLSI Arithmetic Lecture 6


## Review

Lecture 5


### Two Parallel Prefix Adder Structures

Han-Carlson

• log(bits) + 1 carry stages

• Reduced wiring and gates

Kogge-Stone

• log(bits) carry stages

• Extra wiring


### Possibilities for Further Research

• The logical depth is important (Knowles was right)

• The fan-out is less important than fan-in (Knowles was wrong):

• It is possible to examine a variety of topologies with restricted and varied fan-in.

• Driving strength and Logical Effort rules were overlooked, or at least neglected:

• It is possible to create a number of topologies that take LE rules into account.

• It is further possible to combine the rules with a compound domino implementation, taking advantage of the two different rules governing “dynamic” and “static” logic.

• It is still possible to produce a better adder!

J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.

from: Ercegovac-Lang

O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June 1962, p.340-34

from: Ercegovac-Lang

Addition under the assumptions $C_{in} = 0$ and $C_{in} = 1$.

### Carry Select Adder: combining two 32-b VBAs in select mode

$$\text{Delay} = D_{VBA32} + D_{MUX}$$
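The select principle can be sketched behaviourally for a 64-bit addition. Each 32-bit block below stands in for a VBA and is modelled with plain integer addition; how the lecture arranges the blocks is not recoverable from the transcript, so the split into one low block and two speculative high blocks is an assumption, as are the function names.

```python
MASK32 = (1 << 32) - 1


def add_block(x: int, y: int, c_in: int, width: int = 32) -> tuple[int, int]:
    """Stand-in for one adder block: returns (sum, carry-out)."""
    total = x + y + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add64(x: int, y: int) -> tuple[int, int]:
    """64-bit sum and carry-out with the upper half precomputed for both carry values."""
    lo_sum, lo_carry = add_block(x & MASK32, y & MASK32, 0)
    # Both candidates for the upper half are computed in parallel with the lower half.
    hi_if_0 = add_block(x >> 32, y >> 32, 0)
    hi_if_1 = add_block(x >> 32, y >> 32, 1)
    # The lower carry-out only drives a multiplexer, hence one block delay plus one MUX delay.
    hi_sum, c_out = hi_if_1 if lo_carry else hi_if_0
    return (hi_sum << 32) | lo_sum, c_out
```

For example, `carry_select_add64(2**64 - 1, 1)` returns `(0, 1)`: the low block produces a carry, so the result precomputed for a carry-in of 1 is selected.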