Multiplication and sum of products circuits




Multiplication and Sum-of-Products Circuits:

Giving Up Simplicity To Gain Speed

Steve Nuchia



In The Beginning

???

???



With Log Table



Strength In Numbers



Partial Products



Partial Products



Partial Products



Accumulation



Pairwise Summation

[Figure: worked example showing the values 13701, 095041, 091340, 0561741, and 0456700.]



Column Counting



Binary Multiplication

  • The multiplication table is trivial (AND gate).

  • No multi-digit entries in the table, so the partial products are well-formed numbers.

  • Addition of binary numbers is hard:

    • O(1+n/10) with linear hardware

    • O(log n) with O(n² log n) hardware.

  • Column counting is the accepted solution.

    • Wallace Trees circa 1964.
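The bullets above can be illustrated with a minimal software sketch of column counting. It assumes unsigned operands and uses only full adders (3:2 counters), applied greedily column by column; a real Wallace tree also uses half adders and a tuned wiring order, so this is an illustration of the idea rather than the circuit from the talk. The function name `multiply_by_column_counting` is hypothetical.

```python
def multiply_by_column_counting(x: int, y: int, n: int) -> int:
    """Multiply two n-bit unsigned integers by column compression."""
    # columns[k] holds every bit of weight 2**k.
    columns = [[] for _ in range(2 * n + 1)]

    # The binary "multiplication table" is a single AND gate per bit pair.
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Apply ranks of full adders until every column holds at most two bits.
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            i = 0
            while len(col) - i >= 3:
                a, b, c = col[i:i + 3]
                nxt[k].append(a ^ b ^ c)                          # sum stays in column k
                nxt[k + 1].append((a & b) | (a & c) | (b & c))    # carry moves up
                i += 3
            nxt[k].extend(col[i:])  # leftover bits pass through unchanged
        columns = nxt

    # Two rows remain; a single carry-propagate addition finishes the job.
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row0 + row1
```

Each full adder turns three bits into two while preserving the arithmetic value, so the loop always terminates and only one slow addition is needed at the end.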



Oklobdzija & Stelling 1998

Continuing research in the area has led to steady improvement in the designs for Partial Product Reduction Trees (PPRTs) for parallel multipliers designs, as evidenced in the progression of work in [18], [2], [12], [10], [11], [6]. However, almost all of this prior work focused on finding good basic building blocks (compressors) that could be connected in a regular pattern to build a PPRT. ...



A compressor operates in a single column of the PPRT […] These compressors are made up of full adders that are interconnected in a way to minimize the compressor’s delay. In contrast, our approach is to design a faster PPRT by finding a globally optimal way of interconnecting the low-level components (adders).



The DSP Filter Setting

  • Infinite series of data values arriving at a fixed rate.

  • Compute the convolution with a specified vector, fast enough to keep up.

  • Economic considerations often favor an FPGA (programmable gate array) solution.

  • Linear algebra sum-of-products problems are more likely to a) be floating point and b) favor a software-intensive solution.
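As a software reference for the computation described above, here is a minimal sketch of a streaming convolution (an FIR filter). It assumes integer samples, a zero-filled history before the first sample, and coefficients ordered from newest sample to oldest; the name `fir_stream` is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def fir_stream(samples: Iterable[int], coeffs: Sequence[int]) -> Iterator[int]:
    """Yield one sum of products per incoming sample."""
    # The window holds the most recent len(coeffs) samples, newest first.
    window = deque([0] * len(coeffs), maxlen=len(coeffs))
    for a in samples:
        window.appendleft(a)  # the oldest sample falls off the far end
        yield sum(c * s for c, s in zip(coeffs, window))
```

For example, `list(fir_stream([1, 2, 3, 4], [1, 10, 100]))` yields `[1, 12, 123, 234]`. A hardware implementation has to produce each of these sums within one sample period, which is what makes the multiplier and accumulator structure matter.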



[Figure: data samples A(t+2), A(t+1), A(t), A(t-1), A(t-2), A(t-3) and coefficients C(-1), C(0), C(1).]



Improving the Standard Circuit

  • The final accumulator has to be fast enough. What if it isn’t?

  • Idea: distribute the feedback through the PPRT.

  • OK, How?

  • Opportunistic feedback: whenever a full adder has fewer than three inputs, give it feedback.

  • Problem: The Supermarket Separator.

  • Solution starts with the generalized full adder.



Generalized Full Adder

  • Inputs represent data and control information.

  • Outputs represent the number of “effective” one bits among the inputs.

  • Maps directly into a Xilinx FPGA logic cell (with maximum of four inputs).

[Figure: a cell with inputs a, b, c, d and outputs C and S.]
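One way to picture such a cell is as a pair of 16-entry lookup tables, one per output. The sketch below assumes a particular split between data and control: a, b, and c are data bits, and d is a control bit that decides whether c counts. That split is an assumption for illustration only; the slides do not specify it, and the names `effective_ones` and `build_luts` are hypothetical.

```python
from itertools import product


def effective_ones(a: int, b: int, c: int, d: int) -> int:
    """Count the 'effective' one bits (assumed rule: c counts only when d is 1)."""
    return a + b + (c & d)


def build_luts():
    """Return (sum_lut, carry_lut), each mapping a 4-bit input tuple to one bit."""
    sum_lut, carry_lut = {}, {}
    for bits in product((0, 1), repeat=4):
        n = effective_ones(*bits)       # n is in the range 0..3
        sum_lut[bits] = n & 1           # S: low bit of the count
        carry_lut[bits] = (n >> 1) & 1  # C: high bit of the count
    return sum_lut, carry_lut
```

Because each output depends on at most four inputs, each table fits in one four-input lookup element, which is the property the last bullet above relies on.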



Supermarket Separator Problem

[Figure: bit values labeled by clock-period index (k=1; k=0, t=1; k=q-1, t=0; k=q-2) entering a cell with inputs a, b, c, d and outputs C and S.]



Time Signatures

  • To allow for feedback, we need to be able to do the bookkeeping.

  • Zeros may appear on some wires as columns are reduced. To exploit this sparseness, we need to detect and manipulate it.

  • Time signature algebra: associate a vector with each wire (or bus) giving the maximum arithmetic value carried on the wire in each clock period.
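A time signature can be sketched in software as a tuple with one entry per clock period. The helper below assumes that the signature of a bus is the element-wise sum of the signatures of its wires, which matches the "maximum arithmetic value" wording above but is not spelled out in the slides; `bus_signature` is a hypothetical name.

```python
def bus_signature(*wires):
    """Combine per-wire time signatures into the signature of the bus."""
    if len({len(w) for w in wires}) != 1:
        raise ValueError("all wires must cover the same number of clock periods")
    # Entry i is the largest value the bus can carry in clock period i.
    return tuple(sum(slot) for slot in zip(*wires))
```

For instance, three wires with signatures `(1, 1, 1, 0)`, `(1, 1, 0, 0)`, and `(1, 0, 0, 0)` give a bus signature of `(3, 2, 1, 0)`. The zeros are the sparseness that feedback can exploit.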



Time Signature Constraints

  • The arithmetic contribution of a signal must be conserved.

  • No non-zero contribution can cross over a supermarket barrier.

  • Remark: Delaying a signal by one clock should be an identity in the algebra.



Signal Origination

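At the top of the tree, the input data are assumed to have time signature 1111.

[Figure: a generalized full adder with inputs a, b, c, d and outputs C and S. The inputs are shown with the bit patterns 0 0 0, 1 0 0, 1 1 0, and 1 1 1, one input is marked "Control or N/C", and the resulting bus has time signature 3,2,1,0.]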



Pair Splitting

[Figure: a cell with inputs a, b, c, d and outputs C and S. The bus signature 3,2,1,0 splits into the signatures 1,1,1,0 and 1,1,0,0.]

A wire can carry a contribution of no more than 1. The sum bit may be a one if the bus carries more than zero. The carry bit may be a one if the bus carries more than one.

Note: the carry bit belongs to the next higher file.
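The rule just stated can be written down directly. This is a minimal sketch that treats a signature as a tuple of per-period maxima; the function name `pair_split` and the range check are assumptions added for illustration.

```python
def pair_split(bus):
    """Split a bus signature into (sum_signature, carry_signature)."""
    if any(v < 0 or v > 3 for v in bus):
        raise ValueError("a sum/carry pair can represent at most 3")
    # The sum bit may be one whenever the bus carries more than zero.
    sum_sig = tuple(1 if v > 0 else 0 for v in bus)
    # The carry bit may be one whenever the bus carries more than one.
    carry_sig = tuple(1 if v > 1 else 0 for v in bus)
    return sum_sig, carry_sig
```

Applied to the bus signature `(3, 2, 1, 0)`, this returns `(1, 1, 1, 0)` for the sum and `(1, 1, 0, 0)` for the carry, with the carry signature belonging to the next higher file.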



Wire Splitting

[Figure: a wire with signature 1,0,1,1 is split into two wires with signatures 1,0,G,1 and G,0,1,G.]

G is for Garbage. The information content (contribution) of the wire is split but the electrical signal is not altered.
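A minimal sketch of this bookkeeping follows. It assumes the caller chooses which clock periods the first copy keeps, and it marks every slot whose contribution went to the other copy with G; the names `wire_split` and `keep_first` are hypothetical.

```python
G = "G"  # marker for a slot that carries garbage on this copy


def wire_split(sig, keep_first):
    """Split one wire's signature between two copies of the same signal.

    keep_first is the set of clock-period indices whose contribution stays
    with the first copy; every other nonzero slot goes to the second copy.
    """
    first, second = [], []
    for i, v in enumerate(sig):
        if v == 0:
            first.append(0)   # nothing to split in this period
            second.append(0)
        elif i in keep_first:
            first.append(v)
            second.append(G)  # same electrical value, but not counted here
        else:
            first.append(G)
            second.append(v)
    return tuple(first), tuple(second)
```

With `sig = (1, 0, 1, 1)` and `keep_first = {0, 3}`, the result is `(1, 0, G, 1)` and `(G, 0, 1, G)`: each nonzero contribution is counted exactly once across the two copies.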



Right-Shift Rule

A signal may be re-assigned to the next-lower file if it is doubled. This is occasionally useful when a cell would otherwise be underutilized.
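The arithmetic behind this rule is that one bit of weight 2^k is worth two bits of weight 2^(k-1). The sketch below checks that on a list of per-file bit counts; reading "doubled" as "fed to two inputs in the lower file" is an assumption, and both function names are hypothetical.

```python
def column_value(heights):
    """Arithmetic value when heights[k] one-bits sit in file k (weight 2**k)."""
    return sum(h << k for k, h in enumerate(heights))


def right_shift(heights, k):
    """Move one bit from file k to file k-1, where it counts twice."""
    if k < 1 or k >= len(heights) or heights[k] < 1:
        raise ValueError("need a bit in file k, and a file k-1 to move it to")
    shifted = list(heights)
    shifted[k] -= 1
    shifted[k - 1] += 2
    return shifted
```

For example, `right_shift([1, 3, 2], 2)` returns `[1, 5, 1]`, and both lists have a `column_value` of 15.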



Diagonal Shift Rule

As long as no contribution slides across a mod q barrier, signals can be reassigned to neighbors on the positive-slope diagonal. The TS is given relative to the rank r, so the TS vector must be “rotated” by the shift length s.

t0 t1 t2 t3 becomes t3 t0 t1 t2, provided t3 = G or 0.
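A minimal sketch of the rotation with its barrier check follows. It assumes the signature length equals q and generalizes the one-step example to a shift of s by requiring every wrapped-around slot to be G or 0; `diagonal_shift` is a hypothetical name.

```python
G = "G"  # marker for a garbage slot


def diagonal_shift(sig, s=1):
    """Rotate a time signature right by s slots if nothing crosses the barrier."""
    q = len(sig)
    if not 0 <= s <= q:
        raise ValueError("shift length must be between 0 and q")
    wrapped = tuple(sig[q - s:])  # slots that would wrap around the mod q barrier
    if any(v not in (0, G) for v in wrapped):
        raise ValueError("a nonzero contribution would cross the mod q barrier")
    return wrapped + tuple(sig[:q - s])
```

With `sig = (1, 1, 0, G)` the shift is allowed and gives `(G, 1, 1, 0)`; with `sig = (1, 0, 1, 1)` it is rejected because the last slot carries a real contribution.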



Sink Rule

  • When a signal contains only one active timeslot and that timeslot contains the sole representative of the lowest remaining column, that signal is sunk and is removed from consideration.

  • Sunk signals may be stored for parallel output or may be consumed as soon as they are produced, depending on the application.



Gate Rule

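Clock-period indicator signals are used to gate out garbage in the generalized full adder.

[Figure: a generalized full adder with inputs a, b, c, d and outputs C and S, gated by clock-period indicators k=1 and k=2. Input slots carry 1, 0, or G; the signatures shown are 1,0,1,1 and 0,0,0,0.]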



Design Generation

  • Currently, I have a Prolog program with constraint propagation extensions that knows the algebra. It does not yet successfully generate designs.

  • The general strategy is to generate designs rank by rank, under iterative deepening, until a successful (valid and complete) design is found.



Generation, Continued

  • Once a valid design is found, its cost will be used as an upper bound for an exhaustive search for better designs.

  • Efficiently generating candidate designs with feedback is a chicken-and-egg problem. I am using a “suspense list” of inputs not yet connected to outputs to handle this problem.



Generation, Continued

  • The routines that implement the TS algebra have to “wire up” the TS rules without knowing the TS of the feedback inputs. Tricky coding problem, but under control.

  • The end game is not yet well understood. That needs more study.

  • I hope to be generating real designs soon, and to have some idea what an optimal design might look like in January.



Sign Handling

  • We haven’t talked about signed numbers. Signed data can be handled rather easily by this circuit, but signed coefficients require some thought.

  • The standard circuit sign-extends the partial product terms in the feedback path. To do that, you have to know the sign bit’s value!

  • I have a solution: next seminar.



Conclusions

  • Inventing an appropriate algebra helped me to formulate the optimization problem for software solution and gives me confidence that the resulting designs are correct.

  • Optimality, of course, is a different problem.

  • The range of applicability of this circuit is not very broad: it is best suited for FPGA realization near the maximum clock speed of the logic family.

