CSE477 VLSI Digital Circuits Fall 2002 Lecture 21: Multiplier Design


Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477

[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]

Review: Basic Building Blocks
  • Datapath
    • Execution units
      • Adder, multiplier, divider, shifter, etc.
    • Register file and pipeline registers
    • Multiplexers, decoders
  • Control
    • Finite state machines (PLA, ROM, random logic)
  • Interconnect
    • Switches, arbiters, buses
  • Memory
    • Caches (SRAMs), TLBs, DRAMs, buffers
Review: Binary Adder Landscape
  • [Figure: taxonomy of synchronous word parallel adders]
    • Ripple carry adders (RCA)
    • Carry prop min adders
      • Signed-digit adders
      • Fast carry prop adders: Manchester carry chain, carry select, parallel prefix, conditional sum, carry skip
      • Residue adders
  • Time (T) and area (A) annotations in the figure: T = O(N), A = O(N); T = O(1), A = O(N); T = O(log N), A = O(N log N)

Multiply Operation
  • Multiplication as repeated additions
  • [Figure: an N-bit multiplicand and an N-bit multiplier form a partial product array of N rows, which can be formed in parallel; summing the rows gives the 2N-bit double precision product]
Shift & Add Multiplication
  • Right shift and add
    • Partial product array rows are accumulated from top to bottom on an N-bit adder
    • After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
    • Time for N bits: T_serial_mult = O(N · T_adder) = O(N²) for an RCA
  • Making it faster
    • Use a faster adder
    • Use higher radix (e.g., base 4) multiplication
      • Use multiplier recoding to simplify multiple formation
    • Form partial product array in parallel and add it in parallel
  • Making it smaller (i.e., slower)
    • Use an array multiplier
      • Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI
      • Can be easily and efficiently pipelined
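The right-shift-and-add scheme described above can be sketched in a few lines of Python. This is only an illustrative model for unsigned operands: the function name `shift_add_multiply` and the split of the product register into `high` and `low` halves are assumptions, not something taken from the slides.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned n-bit right-shift-and-add multiplication (illustrative model)."""
    mask = (1 << n) - 1
    high = 0                   # upper half of the accumulated partial product
    low = multiplier & mask    # multiplier bits, consumed LSB first
    for _ in range(n):
        if low & 1:
            # One N-bit addition per row; the carry out is kept as bit n of high.
            high += multiplicand & mask
        # Right shift the combined (high, low) register by one bit.
        low = (low >> 1) | ((high & 1) << (n - 1))
        high >>= 1
    return (high << n) | low


print(bin(shift_add_multiply(0b0110, 0b1001, 4)))  # 6 * 9 = 54 -> 0b110110
```

Each loop iteration costs one adder delay, which is where the O(N · T_adder) time comes from.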
Tree Multiplier Structure
  • [Figure: multiple forming circuits (muxes selecting 0 or the multiplicand D (‘icand), controlled by the multiplier Q (‘ier)) generate the partial product array; a reduction tree (log N) compresses it; a fast carry propagate adder (CPA, log N) produces P (product)]

(4,2) Counter
  • Built out of two (3,2) counters (just FA’s!)
    • all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
    • the internal output is carried to the next higher weight position (indicated by a marker in the original figure)
  • Note: Two carry outs - one “internal” and one “external”
  • [Figure: two stacked (3,2) counters forming one (4,2) counter]
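A minimal bit-level sketch of a (4,2) counter built from two full adders is given below. The names `full_adder` and `counter_4_2`, and the choice of which three inputs feed the first full adder, are assumptions made for illustration.

```python
def full_adder(a, b, c):
    """(3,2) counter: three bits of equal weight in, sum and carry out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def counter_4_2(a, b, c, d, cin):
    """(4,2) counter from two full adders.

    Returns (sum, carry, cout_internal). The sum has the weight of the
    inputs; both carries have twice that weight.
    """
    s1, cout_internal = full_adder(a, b, c)   # independent of cin
    total, carry = full_adder(s1, d, cin)
    return total, carry, cout_internal


print(counter_4_2(1, 1, 1, 1, 1))  # five ones in -> (1, 1, 1), i.e. 1 + 2*(1 + 1) = 5
```

Because `cout_internal` is computed from `a`, `b`, `c` only, a carry entering a counter never ripples through to the counter's own internal carry out.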

Tiling (4,2) Counters
  • Reduces columns four high to columns only two high
    • Tiles with neighboring (4,2) counters
    • Internal carry in at same “level” (i.e., bit position weight) as the internal carry out

  • [Figure: a row of (4,2) counters, each drawn as two (3,2) counters, tiled side by side]
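The tiling can be modeled at the word level: one (4,2) counter per bit position, with each internal carry out wired to the neighboring counter's internal carry in. The sketch below uses Python integers as rows of bits; the function names are hypothetical and the operands are assumed to be non-negative.

```python
def majority(a, b, c):
    """Bitwise carry function of a full adder, applied to whole words."""
    return (a & b) | (b & c) | (a & c)


def reduce_four_rows(a, b, c, d):
    """Reduce four rows to two with a tiled row of (4,2) counters."""
    s1 = a ^ b ^ c                     # first (3,2) layer: sums
    cin = majority(a, b, c) << 1       # internal carries, moved one column left
    row_sum = s1 ^ d ^ cin             # second (3,2) layer: sums
    row_carry = majority(s1, d, cin) << 1
    return row_sum, row_carry


s, c = reduce_four_rows(0b1011, 0b0110, 0b1111, 0b0101)
print(s + c)  # 11 + 6 + 15 + 5 = 37
```

Neither layer contains a carry chain along the row, so the delay of the reduction does not depend on the word width.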

4x4 Partial Product Array Reduction
  • Fast 4x4 multiplication using (4,2) counters

  • [Figure: multiplicand and multiplier form the 4x4 partial product array; (4,2) counters produce the reduced pp array (to CPA), which gives the double precision product]

8x8 Partial Product Array Reduction

  • [Figure: ‘icand and ‘ier form the 8x8 partial product array, which is reduced to the reduced partial product array]
  • How many (4,2) counters, at minimum, are needed to reduce it to 2 rows?
    • Answer: 24

Alternate 8x8 Partial Product Array Reduction

  • [Figure: ‘icand and ‘ier form the 8x8 partial product array, reduced to the reduced partial product array with an alternate arrangement of (4,2) counters]
  • More (4,2) counters, so what is the advantage?

Array Reduction Layout Approach
  • [Figure: the multiplicand feeds rows of multiple generators, each controlled by multiple selection signals (labeled 2) from the multiplier (‘ier); the generated multiples feed stacked (4,2) counter slices, followed by the CPA]
Next Lecture and Reminders
  • Next lecture
    • Shifters, decoders, and multiplexers
      • Reading assignment – Rabaey, et al, 11.5-11.6
  • Reminders
    • Project final reports due December 5th
    • HW5 (last one!) due November 19th
    • Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
    • Final exam scheduled
      • Monday, December 16th from 10:10 to noon in 118 and 121 Thomas
Topics
  • Adders and ALUs (§6.4, §6.5)
  • Multipliers (§6.6)
    • Array multiplier
    • Baugh-Wooley multiplier
    • Booth encoding
    • Wallace tree multiplier
  • Subsystem design principles (§6.2)
Elementary School Algorithm

          0 1 1 0    multiplicand
×         1 0 0 1    multiplier
          0 1 1 0
+       0 0 0 0
        0 0 1 1 0
+     0 0 0 0
      0 0 0 1 1 0
+   0 1 1 0
    0 1 1 0 1 1 0

  • The rows added at each step (0110, 0000, 0000, 0110) are the partial products

Combinational Multiplier
  • Each bit of the multiplier controls whether the addition occurs

Array Multiplier
  • Regular layout
    • An n × m cell layout
    • Easy to pipeline
    • Frequently used in FPGAs and ASICs
  • Critical path
    • Less than (n+m-1) bit-adder delays
  • Handles unsigned multiplication ONLY
A 4 × 4 Unsigned Array Multiplier
  • [Figure: unsigned array multiplier. Partial product bits x3y0 ... x0y3 feed rows of adder cells (+), each with inputs a, b, Cin and outputs Sum, Cout; unused inputs in the first row are tied to 0; product bits P0 to P3 leave along one edge and P4 to P7 from the last row. The array is skewed for a rectangular layout.]
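A behavioral sketch of an unsigned array multiplier, built only from AND gates and full-adder cells, might look like the following. It assumes each row is a ripple-carry row whose carry out feeds the most significant cell of the next row; the exact cell wiring in the slide's figure may differ, and the names `full_adder` and `array_multiply` are hypothetical.

```python
def full_adder(a, b, cin):
    """One adder cell: returns (Sum, Cout)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def array_multiply(x, y, n=4):
    """Unsigned n x n array multiplier modeled cell by cell."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    row = [xb[i] & yb[0] for i in range(n)]   # row 0: AND gates only
    product = [row[0]]                        # P0
    upper = row[1:] + [0]                     # bits still to be added to
    for j in range(1, n):
        carry = 0
        new = []
        for i in range(n):
            s, carry = full_adder(xb[i] & yb[j], upper[i], carry)
            new.append(s)
        product.append(new[0])                # Pj is final after row j
        upper = new[1:] + [carry]             # shift by one position
    product.extend(upper)                     # Pn .. P(2n-1)
    return sum(bit << k for k, bit in enumerate(product))


print(array_multiply(0b0110, 0b1001))  # 6 * 9 = 54
```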
Signed Multiplication
  • Signed number representation
  • Signed n×n multiplication
    • (1110)₂ × (0011)₂ = (1010)₂, i.e., (-2) × 3 = (-6)
    • No difference from unsigned multiplication if the result has the same bit-width as the inputs
  • But what if we want the result to be 2n bits?
    • Use sign-bit extension
    • Needs a 2n × 2n array multiplier
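The two cases above can be checked with a short sketch. The helper names `sign_extend`, `to_signed`, and `signed_multiply_2n` are assumptions for illustration; Python's built-in `*` stands in for the unsigned array multiplier.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend a two's-complement bit pattern from from_bits to to_bits."""
    mask = (1 << from_bits) - 1
    value &= mask
    if (value >> (from_bits - 1)) & 1:
        value |= ((1 << to_bits) - 1) & ~mask
    return value


def to_signed(value, bits):
    """Interpret a bit pattern as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value


def signed_multiply_2n(x, y, n):
    """2n-bit signed product via sign extension and a 2n x 2n unsigned multiply."""
    wide = 2 * n
    product = sign_extend(x, n, wide) * sign_extend(y, n, wide)
    return product & ((1 << wide) - 1)   # keep the low 2n bits


# n-bit result: the low 4 bits of the unsigned product are already correct.
print(bin((0b1110 * 0b0011) & 0b1111))               # 0b1010, i.e. -6 in 4 bits
# 2n-bit result: operands must be sign-extended first.
print(bin(signed_multiply_2n(0b1110, 0b0011, 4)))    # 0b11111010, i.e. -6 in 8 bits
```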
Baugh-Wooley Multiplier: Structure
  • [Figure: 4 × 4 Baugh-Wooley array. Partial product bits x3y0 ... x0y3 feed rows of adder cells (+) with inputs a, b, Cin and outputs Sum, Cout; additional x3, y3, and constant 1 inputs enter the lower rows to handle the sign bits; outputs are P0 to P7.]
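As a rough illustration of how a signed product can be obtained from an n × n array without a 2n × 2n multiplier, the sketch below uses the commonly taught modified Baugh-Wooley formulation: partial product bits that involve exactly one sign bit are inverted, and constant 1s are added at columns n and 2n-1. This is an assumption about the variant; the slide's figure, with its explicit x3 and y3 inputs, may show the original formulation instead. The function names are hypothetical.

```python
def baugh_wooley_multiply(x, y, n):
    """Signed n x n multiply (modified Baugh-Wooley); returns a 2n-bit pattern."""
    mask = (1 << n) - 1
    x &= mask
    y &= mask
    total = 0
    for i in range(n):
        for j in range(n):
            bit = ((x >> i) & 1) & ((y >> j) & 1)
            # Invert terms that involve exactly one of the two sign bits.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    # Correction constants at columns n and 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))
    return total & ((1 << (2 * n)) - 1)


def to_signed(value, bits):
    """Interpret a bit pattern as a two's-complement number."""
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value


print(to_signed(baugh_wooley_multiply(0b1110, 0b0011, 4), 8))  # (-2) * 3 = -6
```

All partial product bits are added as if unsigned, so the array keeps the regular structure of the unsigned multiplier.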
Booth Multiplier
  • Utilizes the Booth encoding scheme
  • Booth encoding scheme
    • Handles signed multiplication
    • Reduces the number of partial products by half
    • Small area and fast
    • Encoding scheme cannot be applied hierarchically
      • Often used as the first stage of partial product reduction
Booth Encoding: Principle
  • Two’s-complement form of multiplier y
  • Consider first two terms
    • By looking at three bits of y, we can determine whether to add 0, ±x, or ±2x to the partial product.
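The equations for this slide did not survive extraction. As a hedged reconstruction of the standard identity behind radix-4 Booth encoding, assume an n-bit two's-complement multiplier y with bits y_{n-1} ... y_0, n even, and y_{-1} = 0 (this indexing is an assumption and may differ from the original slide):

```latex
\begin{aligned}
y &= -y_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} y_i\,2^{i} \\
  &= \sum_{j=0}^{n/2-1} \bigl(y_{2j-1} + y_{2j} - 2\,y_{2j+1}\bigr)\,4^{j},
  \qquad y_{-1} = 0 .
\end{aligned}
```

Each bracketed digit depends on three adjacent bits of y and takes a value in {-2, -1, 0, 1, 2}, so x·y is a sum of n/2 terms, each of which is 0, ±x, or ±2x shifted by 2j positions.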
Booth Actions

  yi   yi-1   yi-2   increment
  0    0      0      0
  0    0      1      X
  0    1      0      X
  0    1      1      2X
  1    0      0      -2X
  1    0      1      -X
  1    1      0      -X
  1    1      1      0

Booth Example
  • Don’t forget the sign extension of the encoded values when adding them together
    • Only have to extend 2 bits though
  • x = 011001 (25 decimal), y = 101110 (-18 decimal).
  • y1 y0 y-1 = 100, P1 = P0 - (10 × 011001) = 11111001110.
  • y3 y2 y1 = 111, P2 = P1 + 0 = 11111001110.
  • y5 y4 y3 = 101, P3 = P2 - 0110010000 = 11000111110.
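The example can be replayed with a small sketch that applies the Booth action table to successive bit triplets. The names `BOOTH_ACTION` and `booth_multiply` are hypothetical, the multiplier width n is assumed to be even, and Python's unbounded signed integers stand in for the sign-extended partial products.

```python
# (yi, yi-1, yi-2) -> multiple of X to add, per the Booth action table.
BOOTH_ACTION = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_multiply(x, y, n):
    """Radix-4 Booth multiply of x by an n-bit two's-complement y (n even).

    Returns the product and the running partial products P1, P2, ...
    """
    ybits = y & ((1 << n) - 1)

    def bit(k):
        return 0 if k < 0 else (ybits >> k) & 1   # y(-1) is defined as 0

    product = 0
    partials = []
    for i in range(1, n, 2):                      # triplets end at y1, y3, y5, ...
        action = BOOTH_ACTION[(bit(i), bit(i - 1), bit(i - 2))]
        product += (action * x) << (i - 1)        # each step has weight 4x the last
        partials.append(product)
    return product, partials


product, partials = booth_multiply(25, -18, 6)
print(product, partials)  # -450 [-50, -50, -450]
```

In 11-bit two's complement, -50 is 11111001110 and -450 is 11000111110, matching P1, P2, and P3 in the example.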
Wallace Tree
  • Reduces the number of partial products
  • Built from carry-save adders:
    • Three inputs: a, b, c
    • Two outputs: y, z such that y + z = a + b + c
  • Carry-save equations:
    • yi = ai bi ci
    • zi+1 = aibi + bici + ciai
    • What’s the difference from carry-ripple adder?
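The carry-save equations translate directly into bitwise operations on whole words. In this sketch the function name `carry_save_add` is an assumption, and the operands are non-negative Python integers.

```python
def carry_save_add(a, b, c):
    """Carry-save adder: three words in, two words out, with y + z == a + b + c."""
    y = a ^ b ^ c                               # yi = ai xor bi xor ci
    z = ((a & b) | (b & c) | (c & a)) << 1      # zi+1 = ai.bi + bi.ci + ci.ai
    return y, z


y, z = carry_save_add(0b101, 0b011, 0b110)
print(bin(y), bin(z), y + z)  # 0b0 0b1110 14
```

Every bit position is computed independently, so the delay is one full adder regardless of word width. A carry-ripple adder instead feeds each stage's carry into the next stage, so its delay grows with the number of bits.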
Wallace Tree Structure
  • [Figure: a carry-ripple adder, drawn as three full adders (FA) labeled with a0..a2, b0..b2, c0..c2 and outputs s0..s2, compared with a carry-save adder, drawn as three FAs with inputs a0..a2, b0..b2, c0..c2 and outputs y0..y2 and z1..z3]
Wallace Tree Operation
  • n additions are reduced to (2n/3) additions after each level
    • Sum of inputs = Sum of outputs
    • Can apply the reduction hierarchically
    • More efficient design uses 4-2 adders to reduce n additions to (n/2) additions after each level
  • Needs a final adder to add the last two numbers
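The level-by-level reduction can be sketched as a loop that groups the remaining operands in threes, replaces each group with two words, and passes leftovers through. This is an illustrative model with hypothetical names (`csa`, `wallace_sum`), assuming non-negative operands and 3-2 compressors only; it does not model the 4-2 variant.

```python
def csa(a, b, c):
    """3-2 carry-save adder on whole words."""
    return a ^ b ^ c, ((a & b) | (b & c) | (c & a)) << 1


def wallace_sum(operands):
    """Sum operands with a tree of 3-2 adders; returns (sum, number of levels)."""
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3        # rows that fit into groups of 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(csa(*rows[i:i + 3]))
        reduced.extend(rows[full:])             # leftovers wait for the next level
        rows = reduced
        levels += 1
    return sum(rows), levels                    # final carry-propagate addition


print(wallace_sum(range(1, 9)))  # (36, 4): 8 rows -> 6 -> 4 -> 3 -> 2
```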
A Booth-Wallace Tree Multiplier
  • Most commonly used high-performance multiplier
  • [Figure: a row of 17 Booth encoders (B) feeds Wallace tree level 1 (four 4-2 adder arrays), level 2 (two 4-2 adder arrays), level 3 (one 4-2 adder array), and level 4 (one 3-2 adder array), with flip-flops (FF) after levels 1, 2, and 3; the final adder is a 64-bit adder (not part of pipeline)]

Topics
  • Adders and ALUs (§6.4, §6.5)
  • Multipliers (§6.6)
  • Subsystem design principles (§6.2)
Pipelining
  • Pipelining can be used to reduce clock period at the expense of latency:
  • [Figure: combinational logic 1 followed by combinational logic 2]

Cycle Time and Latency
  • [Figure: two plots, cycle time versus # stages and latency versus # stages]
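The trade-off can be illustrated with a deliberately simple timing model. It assumes the combinational logic splits into perfectly balanced stages and that every pipeline register adds a fixed overhead; both assumptions, and the name `pipeline_timing`, are illustrative and not taken from the slides.

```python
def pipeline_timing(t_logic, stages, t_register):
    """Return (cycle_time, latency) for an idealized, balanced pipeline."""
    cycle_time = t_logic / stages + t_register   # one stage of logic plus a register
    latency = stages * cycle_time                # time for one result to emerge
    return cycle_time, latency


for stages in (1, 2, 4, 8):
    print(stages, pipeline_timing(8.0, stages, 1.0))
# 1 (9.0, 9.0)
# 2 (5.0, 10.0)
# 4 (3.0, 12.0)
# 8 (2.0, 16.0)
```

Under this model the cycle time falls toward the register overhead as stages are added, while latency grows by one register overhead per added stage.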

Data Paths
  • A data path is a logical and physical structure:
    • bit-wise logical organization
    • bit-wise physical structure
  • Data paths generally use busses to pass data between function units.
Bit Slice Organization
  • [Figure: bit slices from bit n-1 down to bit 0; each slice contains registers, shifter, and ALU connected by a bus, with control running across the slices]

Data Path Cell Design
  • Connections may be made by:
    • abutment, requiring stretching cells;
    • river routing, requiring a routing channel between function units.
Project
  • Due 10/26
    • Schematic
    • Verilog/Spectre simulation results
    • 10/27 presentation (10-15 PowerPoint slides)
  • Important (efficiency-related)
    • How to add array of instances