Decimal Multiplication with Efficient Partial Product Generation

Mike Schulte

Dept. of Electrical & Computer Engineering

University of Wisconsin at Madison

Mark Erle, Eric Schwarz

Server & Technology Group

IBM

- Introduction and motivation
- Decimal multiplication challenges
- Novel aspects of algorithm
- Algorithm components
  - Operand recode
  - Digit-by-digit multiplication
  - Partial product generation
  - Overlap removal & encoding
  - Partial product accumulation
  - Final product correction
- Summary

- Preponderance of business data in decimal form
- Inexact mapping between decimal and binary
- Decimal arithmetic used (required) in banking, finance, insurance, accounting
- Increasing support in arithmetic community (revising IEEE 754/854)
- Significant speedup achievable in hardware
- Multiplication a key function

By the way, we’re about 20% through the talk: $0.2_{10} = 0.00110011\ldots_2$
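The aside can be checked with exact rational arithmetic. This sketch (the name `binary_fraction_digits` is illustrative) expands one fifth in binary and confirms that the binary double nearest to 0.2 is not exactly 1/5, which is the inexact decimal-to-binary mapping listed in the motivation.

```python
from fractions import Fraction


def binary_fraction_digits(value, n_bits):
    """Return the first n_bits binary digits after the point of a value in [0, 1)."""
    x = Fraction(value)  # exact rational arithmetic, no rounding
    bits = []
    for _ in range(n_bits):
        x *= 2
        bit = 1 if x >= 1 else 0  # next binary digit
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


print(binary_fraction_digits(Fraction(1, 5), 16))  # 0011001100110011
# The binary double nearest to 0.2 is not exactly one fifth:
print(Fraction(0.2) == Fraction(1, 5))  # False
```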

- Greater number of multiplicand tuples (0× through 9× the multiplicand, versus 0× and 1× in binary)
- Complicates partial product generation

- Representing decimal values with two-state devices
- Complicates partial product generation
- Complicates partial product accumulation

- Inability to use binary arithmetic techniques directly
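One concrete case of the last point: adding two BCD digits with a plain 4-bit binary adder gives a wrong digit whenever the sum exceeds 9, so the result needs the textbook +6 correction, and the carry still ripples from digit to digit. A minimal sketch of that standard correction follows (it is not the method of this talk; `bcd_add_digit` and `bcd_add` are hypothetical names, and digits are plain integers).

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits using a 4-bit binary sum plus the classic +6 correction."""
    s = a + b + carry_in  # plain binary sum, 0..19
    if s > 9:             # invalid BCD code (1010..1111) or binary carry out
        s += 6            # skip the six unused 4-bit codes
        return s & 0xF, 1  # low nibble is the decimal digit; carry out is 1
    return s, 0


def bcd_add(x_digits, y_digits):
    """Add two equal-length BCD numbers given as digit lists, most significant first."""
    result, carry = [], 0
    for a, b in zip(reversed(x_digits), reversed(y_digits)):
        d, carry = bcd_add_digit(a, b, carry)  # carry ripples digit to digit
        result.append(d)
    if carry:
        result.append(carry)
    return list(reversed(result))


print(bcd_add([0, 4, 7], [0, 5, 8]))  # [1, 0, 5]
```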

- Recode operands
  - Simplify partial product generation
  - Improve latency of partial product generation

- Restrict magnitude range of partial product digits
  - Simplify partial product accumulation
  - Improve latency of partial product accumulation

- Generate partial products as needed, not a priori
  - Benefits:
    - Reduces cycles to generate tuples
    - Reduces wiring to distribute tuples
    - Eliminates registers needed to store tuples

- Cost can be added delay during the iterative portion of the algorithm
  - Reduce cost via pipelining
    - Generate partial product in cycle i
    - Accumulate partial product in cycle i+1
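A behavioral sketch of the on-the-fly, pipelined schedule, assuming a simple two-stage model: the partial product for multiplier digit i is formed in cycle i and added in cycle i+1. Plain integer arithmetic stands in for the decimal hardware, and `pipelined_multiply` is a hypothetical name. The sketch only shows that no multiples are precomputed and that, in this model, n multiplier digits take n + 1 cycles.

```python
def pipelined_multiply(a, multiplier_digits):
    """Two-stage model: generate PP_i in cycle i, accumulate it in cycle i + 1.

    multiplier_digits are decimal digits, least significant first. Each partial
    product is formed only when its multiplier digit is reached (no table of
    precomputed multiples). Returns (product, cycles).
    """
    n = len(multiplier_digits)
    acc = 0
    pp_reg = None  # pipeline register: (digit position, partial product)
    cycles = 0
    for cycle in range(n + 1):
        # Stage 2: accumulate what stage 1 produced in the previous cycle.
        if pp_reg is not None:
            position, pp = pp_reg
            acc += pp * 10 ** position
        # Stage 1: generate this cycle's partial product on demand.
        pp_reg = (cycle, a * multiplier_digits[cycle]) if cycle < n else None
        cycles += 1
    return acc, cycles


print(pipelined_multiply(256, [3, 2, 1]))  # (31488, 4)
```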

- Need signed digits to restrict range
  - E.g., 2 5 6 is recoded into 3 -4 -4
  - $a_i^S \in \{-5, \ldots, +5\}$
- Recode all digits ≥ 5 in parallel
- Four cases: $a_i \ge 5$?, $a_{i-1} \ge 5$?
- Need three operations (besides "do nothing"):
  - Increment
  - Radix complement
  - Diminished radix complement
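The recoding rule can be written out directly. In this sketch, position i becomes $a_i$ when neither $a_i$ nor $a_{i-1}$ is at least 5, $a_i + 1$ (increment) when only $a_{i-1}$ is, $-(10 - a_i)$ (radix complement) when only $a_i$ is, and $-(9 - a_i)$ (diminished radix complement) when both are. Digits are listed most significant first; `recode` and `signed_value` are illustrative names, and the extra leading digit is an assumption about how the top transfer is handled.

```python
def recode(digits):
    """Recode decimal digits (most significant first) into signed digits in -5..+5.

    Output digit i depends only on a_i and a_(i-1), so every position can be
    recoded in parallel. One extra leading digit absorbs the top transfer.
    """
    a = list(reversed(digits))  # a[0] is the least significant digit
    out = []
    for i in range(len(a) + 1):
        ai = a[i] if i < len(a) else 0      # implicit leading zero
        prev_ge5 = i > 0 and a[i - 1] >= 5  # does position i-1 pass a one up?
        if ai >= 5:
            # Negative digit: radix complement -(10 - ai), or diminished radix
            # complement -(9 - ai) when the lower digit also passes a one up.
            s = -(9 - ai) if prev_ge5 else -(10 - ai)
        else:
            s = ai + 1 if prev_ge5 else ai  # increment, or "do nothing"
        out.append(s)
    return list(reversed(out))


def signed_value(digits):
    """Value of a most-significant-first list of signed decimal digits."""
    v = 0
    for d in digits:
        v = 10 * v + d
    return v


print(recode([2, 5, 6]))  # [0, 3, -4, -4]
```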

- Recode entire multiplicand, recode multiplier digit by digit
- Fig. a: single digit
- Fig. b: n-digit

- Restrict digits to yield only 16 combinations
  - Magnitude: {0, …, 9} → {-5, …, +5} (100)
  - Absolute value: {-5, …, +5} → {0, …, 5} (36)
  - Zero & identity: {0, …, 5} → {2, …, 5} (16)

- Lookup table or combinational logic
- Product characteristics
  - Absolute value → sign correction
  - {0, …, 25}, i.e., two digits → overlap removal
  - Restrict LSD magnitude to at most 5 → signed-digit addition

- LSD magnitude restriction eases
  - Overlap removal
  - Partial product accumulation

- LSD mux selects:
  - Zero when $a_0^S = 0$ or $b_i^S = 0$
  - $b_i^S$ when $a_0^S = 1$
  - $a_0^S$ when $b_i^S = 1$
  - The table output when $a_0^S > 1$ and $b_i^S > 1$

- MSD mux selects:
  - Zero when $a_0^S < 2$ or $b_i^S < 2$
  - The table output when $a_0^S > 1$ and $b_i^S > 1$
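A sketch of the digit-by-digit multiplier: take absolute values, handle the zero and identity cases, use a 16-entry table for magnitudes 2 to 5, then apply sign correction. The slides give the constraints but not the table contents, so splitting each product p into 10·msd + lsd with msd = ⌊(p + 4)/10⌋ is an assumption, chosen to keep the LSD in {-5, …, +5} and the MSD in {0, 1, 2}. `build_table`, `TABLE`, and `digit_product` are hypothetical names.

```python
def build_table():
    """Hypothetical 16-entry table for magnitudes 2..5 of both operand digits."""
    table = {}
    for a in range(2, 6):
        for b in range(2, 6):
            p = a * b               # 4..25
            msd = (p + 4) // 10     # assumed split: keeps the LSD in -5..+5
            table[(a, b)] = (msd, p - 10 * msd)
    return table


TABLE = build_table()


def digit_product(a, b):
    """Product of recoded digits a, b in -5..+5 as (msd, lsd) with value 10*msd + lsd."""
    ma, mb = abs(a), abs(b)         # absolute values: digits now in 0..5
    if ma == 0 or mb == 0:          # zero: both outputs are 0
        msd, lsd = 0, 0
    elif ma == 1:                   # identity: LSD is the other digit
        msd, lsd = 0, mb
    elif mb == 1:
        msd, lsd = 0, ma
    else:                           # both > 1: one of the 16 table entries
        msd, lsd = TABLE[(ma, mb)]
    if (a < 0) != (b < 0):          # sign correction
        msd, lsd = -msd, -lsd
    return msd, lsd


print(digit_product(-4, 4))  # (-2, 4)
```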

- Fig. a: single digit
- Fig. b: (n+1)-digit

- Partial products are sign-corrected, signed-magnitude digits in overlapped form
- In each digit position:
  - Four-bit, signed-magnitude digit {-5, …, +5} (LSD of one digit product)
  - Three-bit, signed-magnitude digit {-2, …, +2} (MSD of the next-lower digit product)
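A sketch of an n-digit partial product in overlapped form, assuming each digit product p is split as 10·msd + lsd with msd = ⌊(p + 4)/10⌋ (an assumption; the slides give only the digit ranges). Position i pairs the LSD of $a_i \times b$ with the MSD of $a_{i-1} \times b$, giving n + 1 positions. `digit_product` here computes the split arithmetically instead of through a table, and all names are illustrative; multiplicand digits are listed least significant first.

```python
def digit_product(a, b):
    """Signed-digit product of a, b in -5..+5 as (msd, lsd), |msd| <= 2, |lsd| <= 5."""
    p = abs(a) * abs(b)          # magnitude product, 0..25
    msd = (p + 4) // 10          # assumed split that keeps the LSD in -5..+5
    lsd = p - 10 * msd
    if (a < 0) != (b < 0):       # sign correction
        msd, lsd = -msd, -lsd
    return msd, lsd


def partial_product(a_digits, b):
    """Overlapped partial product of recoded multiplicand digits and one multiplier digit.

    a_digits is least significant first. Position i holds (lsd of a_i*b,
    msd of a_(i-1)*b); the result has len(a_digits) + 1 positions.
    """
    prods = [digit_product(a, b) for a in a_digits]
    n = len(prods)
    return [
        (prods[i][1] if i < n else 0, prods[i - 1][0] if i > 0 else 0)
        for i in range(n + 1)
    ]


def overlapped_value(positions):
    """Value of an overlapped partial product (least significant position first)."""
    return sum((lsd + msd) * 10 ** i for i, (lsd, msd) in enumerate(positions))


# 256 recoded as 3 -4 -4, listed least significant digit first, times -3:
print(overlapped_value(partial_product([-4, -4, 3], -3)))  # -768
```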

- Prepare for partial product accumulation via a Svoboda signed-digit adder
- Use a combinational circuit to
  - remove the overlap
  - produce Svoboda-encoded signed digits
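The slides do not spell out the Svoboda encoding, so this sketch covers only the overlap-removal half: it adds the two overlapped digits in each position and moves a transfer of -1, 0, or +1 one position up, yielding one signed digit per position. The target digit set {-6, …, +6} and the transfer thresholds are assumptions borrowed from generic radix-10 signed-digit addition, and `remove_overlap` and `digits_value` are hypothetical names.

```python
def remove_overlap(positions):
    """Collapse overlapped positions into one signed digit per position (-6..+6).

    positions[i] = (lsd, msd) for position i, least significant first, with lsd in
    -5..+5 and msd in -2..+2, so each position sum lies in -7..+7. A transfer of
    -1, 0 or +1 moves one position up; each result digit depends only on
    positions i and i-1, so nothing ripples.
    """
    sums = [lsd + msd for lsd, msd in positions]
    transfers = [1 if s >= 6 else -1 if s <= -6 else 0 for s in sums]
    interims = [s - 10 * t for s, t in zip(sums, transfers)]  # each in -5..+5
    digits = [w + (transfers[i - 1] if i > 0 else 0) for i, w in enumerate(interims)]
    if transfers and transfers[-1] != 0:
        digits.append(transfers[-1])  # top transfer becomes a new digit
    return digits


def digits_value(digits):
    """Value of a least-significant-first list of signed decimal digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


print(remove_overlap([(5, 2), (5, 2)]))  # [-3, -2, 1]
```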

- Signed-digit addition eliminates carry propagation
- Use a Svoboda signed-digit adder to accumulate
  - Partial product in encoded form
  - Shifted intermediate product (previous iteration)
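Why signed digits remove carry propagation can be shown with a generic radix-10 signed-digit adder in the style of Avizienis (digits in {-6, …, +6}), used here as a stand-in because the slides do not give the Svoboda adder's internals. Each output digit depends only on its own position and the one below. The `accumulate` loop then mimics the iteration described on the slide: add the partial product to the shifted intermediate product and retire the lowest digit. All names are illustrative; digit lists are least significant first.

```python
def sd_add(x, y):
    """Carry-free sum of two signed-digit numbers (digits -6..+6, least significant first)."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    sums = [a + b for a, b in zip(x, y)]                         # each in -12..+12
    transfers = [1 if s >= 6 else -1 if s <= -6 else 0 for s in sums]
    interims = [s - 10 * t for s, t in zip(sums, transfers)]      # each in -5..+5
    out = [w + (transfers[i - 1] if i > 0 else 0) for i, w in enumerate(interims)]
    return out + [transfers[-1] if transfers else 0]              # top transfer digit


def accumulate(partial_products):
    """Add partial products (one per multiplier digit, lowest first) to a shifted
    intermediate product, retiring one least significant digit per iteration."""
    ip, retired = [], []
    for pp in partial_products:
        ip = sd_add(ip, pp)    # partial product + shifted intermediate product
        retired.append(ip[0])  # lowest digit is final and leaves the accumulator
        ip = ip[1:]            # shift right one digit for the next iteration
    return retired + ip


def sd_value(digits):
    """Value of a least-significant-first list of signed decimal digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


print(sd_add([6, 6], [6, 6]))  # [2, 3, 1], i.e. 66 + 66 = 132
```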

- One final product digit converted to BCD each cycle
- Four cases: $IP_i[0] \ge 0$?, $IP_{i-1}[0] \ge 0$?
- Need four operations
  - Convert to BCD
  - Convert to BCD and decrement
  - Convert additive inverse to BCD and radix complement
  - Convert additive inverse to BCD, radix complement, and decrement
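A sketch of the per-cycle conversion, assuming the retired digits arrive least significant first and the full product is nonnegative. It tracks a borrow from the previously converted position: a negative value is replaced by its radix complement and sets the borrow, and a pending borrow decrements the next digit. Because the borrow comes from the converted result, a zero digit that receives a borrow passes it on; whether the hardware evaluates its second condition the same way is an assumption here. `to_bcd` is a hypothetical name.

```python
def to_bcd(signed_digits):
    """Convert signed decimal digits (LSD first, nonnegative total) to BCD digits, LSD first."""
    out, borrow = [], 0
    for d in signed_digits:
        v = d - borrow          # apply the borrow from the lower position ("decrement")
        if v < 0:
            out.append(v + 10)  # radix complement of the additive inverse: 10 - |v|
            borrow = 1
        else:
            out.append(v)       # already a valid BCD digit
            borrow = 0
    if borrow:
        raise ValueError("negative value cannot be represented in BCD")
    return out


print(to_bcd([-3, 0, 1]))  # [7, 9, 0], i.e. 100 + 0 - 3 = 97
```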

- Algorithm utilizes restricted-range, signed digits throughout
- Original aspects include:
  - Recoding operands into restricted-range, signed digits
  - Generating non-overlapping, sign-corrected partial products from recoded operands
  - Recoding partial products for entry into signed-digit adder

- Algorithm achieves n+5 latency
- Extendable to floating-point multiplication