
Coping With the Carry Problem



Presentation Transcript


  1. Coping With the Carry Problem • Limit Carry to Small Number of Bits • Hybrid Redundant • Residue Number Systems • Detect the End of Propagation Rather Than Wait for Worst-case Time • Asynchronous (Self-Timed) Design • Speed-up Propagation Using Carry Lookahead and Other Methods • Lookahead • Carry-skip • Ling Adder • Carry-select • Prefix Adders • Conditional Sum • Eliminate Carry Propagation Altogether • Redundant Number Systems • Signed-Digit Representations

  2. Residue Number Systems (RNS) • Convert Arithmetic on Large Numbers to Arithmetic on Small Numbers • Significant Speedup in Some Signal Processing Algorithms • Valuable Tool for Theoretical Studies of the Limits of Fast Arithmetic

  3. Residue Number Systems (RNS) • Integer System • Addition, Subtraction, Multiplication • Carry Free !!! • Division, Comparison, Sign Detection • Complex and Slow • Inconvenient For Fractional Representations • Generally Used For Special Purpose Applications • such as DSP Filters

  4. Residue Number Systems (RNS) • Radix is an n-tuple of Integers $(m_n, m_{n-1}, \ldots, m_1)$ • Not a Single Base Value • Integer X is Represented by the n-tuple $(x_n, x_{n-1}, \ldots, x_1)$ • $q_i$ is the Largest Integer Such That: $0 \le x_i = X - q_i m_i < m_i$ • $x_i$ is the Residue of X mod $m_i$

  5. RNS Example Problem • Chinese Scholar Sun Tzu Wrote (About 1500 Years Ago): What number has the remainders of 2, 3 and 2 when divided by the values 7, 5 and 3 respectively? • In RNS Notation, Sun Tzu’s Problem is: Find X Such That $X = (2|3|2)_{RNS(7|5|3)}$
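
Sun Tzu's problem is small enough to check by exhaustive search. The sketch below is only an illustration; the helper name `solve_remainders` is hypothetical, and brute force is used here for clarity rather than because the slides prescribe it.

```python
def solve_remainders(moduli, remainders):
    """Return the smallest non-negative X with X mod m == r for every (m, r) pair."""
    limit = 1
    for m in moduli:
        limit *= m  # with pairwise coprime moduli, exactly one solution lies in [0, limit)
    for x in range(limit):
        if all(x % m == r for m, r in zip(moduli, remainders)):
            return x
    return None  # only reached if the moduli are not pairwise coprime


print(solve_remainders((7, 5, 3), (2, 3, 2)))  # 23
```

Every other solution differs from this one by a multiple of 7 × 5 × 3 = 105.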

  6. Residue (Modulo) of a Number Many Examples in Chapter 4 of Text Use:

  7. Moduli Selection • Dynamic Range – Product of k Relatively Prime Moduli • Product, M, is Number of Different Representable Values in the RNS • DEFINITION • $m_i$ and $m_j$ are Relatively Prime if $\gcd(m_i, m_j) = 1$ • EXAMPLE • $m_i = 4$ and $m_j = 9$, $\gcd(4, 9) = 1$ • Although Neither 4 Nor 9 is Prime, They are Relatively Prime

  8. RNS Representation • Consider RNS(8|7|5|3) (our default RNS in this class) • 840 Distinct Representable Values • Since $8 \times 7 \times 5 \times 3 = 840$ • Can Represent Any Interval of 840 Consecutive Values, e.g., [0, 839] or [-420, 419]

  9. Example RNS Values • RNS=(8|7|5|3)
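
The table of example values from this slide did not survive in the transcript. As a stand-in, the sketch below prints a few encodings in RNS(8|7|5|3); the function name `to_rns` and the sample values are illustrative choices, not taken from the slide.

```python
MODULI = (8, 7, 5, 3)  # listed most significant first, as on the slides


def to_rns(x, moduli=MODULI):
    """Encode integer x as its tuple of residues, one per modulus."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so negative x is mapped to its RNS complement automatically.
    return tuple(x % m for m in moduli)


for x in (0, 1, 2, 8, 48, 839, -1):
    print(x, to_rns(x))
```

Note that 839 and -1 receive the same digits, which is why the same RNS can stand for either [0, 839] or a signed interval of 840 values.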

  10. RNS Example170110RNS=(8|7|5|3)

  11. RNS Complementation • Given RNS Representation of X, -X is Obtained by Complementing Each Digit With Respect to its Modulus ($x_i \to m_i - x_i$). Zero Digits are Unchanged. • EXAMPLE • CHECK
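
The slide's own example and check were lost in extraction. A minimal sketch of digit-wise complementation follows, using 21 as an arbitrary illustrative value; `rns_negate` is a hypothetical helper name.

```python
MODULI = (8, 7, 5, 3)


def rns_negate(digits, moduli=MODULI):
    """Complement each digit with respect to its modulus; zero digits stay zero."""
    return tuple((m - d) % m for d, m in zip(digits, moduli))


x = tuple(21 % m for m in MODULI)   # (5, 0, 1, 0)
neg_x = rns_negate(x)               # (3, 0, 4, 0)

# Check: adding X and -X digit by digit gives the all-zero tuple.
check = tuple((a + b) % m for a, b, m in zip(x, neg_x, MODULI))
print(x, neg_x, check)
```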

  12. Chinese Remainder Theorem • RNS can be viewed as a weighted system. EXAMPLE

  13. RNS Encoding Efficiency • Example Requires 3 + 3 + 3 + 2 = 11 Bits (mod 8, mod 7, mod 5, mod 3) • 840 Different Values Represented • $2^{11} = 2048$ • $\log_2(840) = 9.714$ • $11 - 9.714 \approx 1.3$ Bits Wasted

  14. RNS Arithmetic • Addition, Subtraction, Multiplication Can be Performed with Independent Operations on Each Digit • Following Examples Show This Process • For Subtraction, Can Complement the Number and Add Also
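
The digit-wise independence described on this slide can be sketched as follows. The operands 5 and 7 are arbitrary illustrative values, and `rns_op` is a hypothetical helper, not a name from the slides.

```python
import operator

MODULI = (8, 7, 5, 3)


def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)


def rns_op(op, a, b, moduli=MODULI):
    """Apply op to each pair of digits independently; no carries cross digit positions."""
    return tuple(op(x, y) % m for x, y, m in zip(a, b, moduli))


a, b = to_rns(5), to_rns(7)
print(rns_op(operator.add, a, b))  # same digits as to_rns(12)
print(rns_op(operator.sub, a, b))  # same digits as to_rns(-2)
print(rns_op(operator.mul, a, b))  # same digits as to_rns(35)
```

Each digit position corresponds to one of the independent mod-$m_i$ units in the circuit structure.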

  15. RNS Circuit Structure • [Figure: four independent units operating in parallel (mod-8, mod-7, mod-5, mod-3), each producing one residue digit]

  16. Choosing RNS Moduli • Assume we wish to represent $100{,}000_{10}$ Values • Standard Binary: $\log_2(100{,}000) = 16.6096$, so 17 bits • RNS(13|11|7|5|3|2), Dynamic Range $M = 30{,}030_{10}$ • Insufficient Dynamic Range • Maximum Digit Width = 4 bits, Total = 17 bits • RNS(17|13|11|7|5|3|2), Dynamic Range $M = 510{,}510_{10}$ • Dynamic Range 5.1 Times Too Large • Maximum Digit Width = 5 bits, Total = 22 bits • Adding More Prime Moduli is Inefficient

  17. Choosing RNS Moduli • Remove $m_i = 5$ From RNS(17|13|11|7|5|3|2) • RNS(17|13|11|7|3|2), Dynamic Range $M = 102{,}102_{10}$ • Still Have Relatively Prime Moduli • Maximum Digit Width = 5 bits, Total = 19 bits • One 5-bit, Two 4-bit, One 3-bit, One 2-bit and One 1-bit Modulo Units Required • Maximum Delay is That of 5-bit Carry Propagation • Can Combine (3,7) and (2,13) Moduli With no Speed Penalty • RNS(26|21|17|11), Dynamic Range $M = 102{,}102_{10}$ • Maximum Digit Width = 5 bits, Total = 19 bits • Three 5-bit and One 4-bit Modulo Units Required

  18. Relatively Prime Values • Powers of Distinct Smaller Primes are Relatively Prime • Example • $\gcd(3^2, 2^2) = 1$ But $\gcd(3^2, 3) = 3$ • Can REPLACE a Modulus With its Power • Try to Use a Sequence of the SMALLEST Valued Moduli • RNS($2^2$|3), Dynamic Range $M = 12_{10}$ • RNS($3^2$|$2^3$|7|5), Dynamic Range $M = 2{,}520_{10}$ • RNS(11|$3^2$|$2^3$|7|5), Dynamic Range $M = 27{,}720_{10}$ • RNS(13|11|$3^2$|$2^3$|7|5), Dynamic Range $M = 360{,}360_{10}$ • Maximum Digit Width = 4 bits, Total = 21 bits • Dynamic Range 3.6 times that Needed

  19. Relatively Prime Values • RNS(13|11|$3^2$|$2^3$|7|5), Dynamic Range $M = 360{,}360_{10}$ • Maximum Digit Width = 4 bits, Total = 21 bits • Dynamic Range 3.6 times that Needed • Reduce the Above by Factor of 3 • Replace $3^2$ with 3 and Combine 3 and 5 to Get 15 • RNS(15|13|11|$2^3$|7), Dynamic Range $M = 120{,}120_{10}$ • Maximum Digit Width = 4 bits, Total = 18 bits • Dynamic Range 1.2 times that Needed • Using This Strategy Can Generally Find the “Best” Moduli in Terms of Speed and Representation Efficiency
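
The dynamic range and bit counts quoted for the candidate moduli sets on the last few slides can be recomputed with a short script. This is a sketch only; `rns_stats` is a hypothetical helper, and it assumes each digit is stored in the minimum number of bits for the range [0, m-1].

```python
from itertools import combinations
from math import gcd, prod


def rns_stats(moduli):
    """Return (dynamic range M, widest digit in bits, total bits) for a moduli set."""
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli must be pairwise relatively prime")
    widths = [(m - 1).bit_length() for m in moduli]  # bits needed for digits 0..m-1
    return prod(moduli), max(widths), sum(widths)


for moduli in [(13, 11, 7, 5, 3, 2), (17, 13, 11, 7, 5, 3, 2), (17, 13, 11, 7, 3, 2),
               (26, 21, 17, 11), (13, 11, 9, 8, 7, 5), (15, 13, 11, 8, 7)]:
    print(moduli, rns_stats(moduli))
```

The widest digit bounds the longest carry chain, and the total is the storage cost, which are the two figures the slides trade off.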

  20. Moduli Choice for Simple Arithmetic Unit Design • Simple Units Also Lead to Speed and Cost Benefits • Modulo-ADD, SUBTRACT, MULTIPLY Units are Simple to Design if $m_i = 2^{a_i}$ or $2^{a_i} - 1$ • Power-of-2 Moduli Lead to Simple Design • Standard a-bit Binary Adder • Example: Use 16 Instead of 13 • Exception in Case of Lookup Table Implementation • Moduli of the Form $2^a - 1$ Lead to Simple Design • Standard a-bit Binary Adder with End-around Carry • Referred to as “Low-cost” Moduli
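
A software model of the end-around-carry adder for a modulus of the form $2^a - 1$ is sketched below. The function name and the convention that the all-ones pattern is normalized to 0 are assumptions made for this illustration; inputs are assumed to be ordinary residues in [0, $2^a - 2$].

```python
def add_mod_2a_minus_1(x, y, a):
    """Add two residues modulo 2**a - 1 using an a-bit add plus end-around carry."""
    mask = (1 << a) - 1          # 2**a - 1, also the all-ones a-bit pattern
    s = x + y                    # a-bit addition; bit a holds the carry-out
    s = (s & mask) + (s >> a)    # feed the carry-out back into the least significant bit
    return 0 if s == mask else s # all ones is a second encoding of zero


print(add_mod_2a_minus_1(9, 11, 4))  # 5, since 20 mod 15 = 5
print(add_mod_2a_minus_1(7, 8, 4))   # 0, since 15 mod 15 = 0
```

The wrap-around works because $2^a \equiv 1 \pmod{2^a - 1}$, so a carry out of bit a is worth exactly 1.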

  21. RNS Low-Cost Moduli • Theorem: • A sufficient condition for $2^a - 1$ and $2^b - 1$ to be a relatively prime pair is that a and b are relatively prime. • Any List of Pairwise Relatively Prime Numbers $a_{k-2} > \cdots > a_1 > a_0$ • Can be Used as a BASIS of a k-modulus RNS: RNS($2^{a_{k-2}}$ | $2^{a_{k-2}} - 1$ | ... | $2^{a_1} - 1$ | $2^{a_0} - 1$) • Widest Residues (Longest Carry-chain) are $a_{k-2}$-bit Values

  22. Low-Cost Moduli Example • Consider the Example From Earlier, X in [0, 100,000] • Choosing the Moduli From Smallest to Largest: • RNS($2^3$ | $2^3 - 1$ | $2^2 - 1$), Basis: 3, 2, $M = 168_{10}$ • RNS($2^4$ | $2^4 - 1$ | $2^3 - 1$), Basis: 4, 3, $M = 1{,}680_{10}$ • RNS($2^5$ | $2^5 - 1$ | $2^3 - 1$ | $2^2 - 1$), Basis: 5, 3, 2, $M = 20{,}832_{10}$ • RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$), Basis: 5, 4, 3, $M = 104{,}160_{10}$ • Can’t Include 2 and 4 in Same Basis Set, gcd(2,4)=2

  23. Low-Cost Moduli Example • RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$) = RNS(32|31|15|7), Basis: 5, 4, 3, $M = 104{,}160_{10}$ • Requires 5+5+4+3=17 bits • Requires Two 5-bit, One 4-bit and One 3-bit Modules • 4 RNS Digits • Efficiency = 100,001/104,160 ≈ 0.96, About 96% • Comparing With Unrestricted Moduli: • RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$): 17 bits, $M = 104{,}160_{10}$, 5-bit Carry-ripple but Simpler Circuit, Fewer Digits • RNS(15|13|11|$2^3$|7): 18 bits, $M = 120{,}120_{10}$, 4-bit Carry-ripple, 1 Extra Digit
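
The basis-to-moduli construction used in the low-cost examples can be sketched as below. `low_cost_moduli` is a hypothetical helper; it follows the slides' pattern of one power-of-2 modulus for the largest basis element plus one $2^a - 1$ modulus per basis element.

```python
from itertools import combinations
from math import gcd, prod


def low_cost_moduli(basis):
    """Build RNS(2^a_max | 2^a_max - 1 | ... | 2^a_0 - 1) from a pairwise coprime basis."""
    basis = sorted(basis, reverse=True)
    if any(gcd(a, b) != 1 for a, b in combinations(basis, 2)):
        raise ValueError("basis exponents must be pairwise relatively prime")
    return (2 ** basis[0],) + tuple(2 ** a - 1 for a in basis)


for basis in [(3, 2), (4, 3), (5, 3, 2), (5, 4, 3)]:
    moduli = low_cost_moduli(basis)
    print(basis, moduli, prod(moduli))
```

A basis such as (4, 2) is rejected because $2^4 - 1 = 15$ and $2^2 - 1 = 3$ share the factor 3.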

  24. Encoding and Decoding • Advantages of Alternative Number Systems Must Not be Outweighed By Conversions to/from the System • Encoding From a Fixed Positional System to RNS is Easily Accomplished Using Table Lookup and Modulo Addition Circuits

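  25. Encoding with Lookup Table • Conversion of Signed-Magnitude or 2’s Complement Accomplished by Converting Magnitude and Taking RNS Complement if Negative • Consider the Following Identity for an n-bit Magnitude With Bits $b_j$: $\langle (b_{n-1} \cdots b_1 b_0)_{two} \rangle_{m_i} = \langle \langle 2^{n-1} b_{n-1} \rangle_{m_i} + \cdots + \langle 2 b_1 \rangle_{m_i} + \langle b_0 \rangle_{m_i} \rangle_{m_i}$ • Idea is to Precompute the Terms $\langle 2^j \rangle_{m_i}$ for all i, j, Store Them in a Table, Then Add (Modulo $m_i$) the Entries Selected by the 1 Bits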

  26. Example Lookup Table • Use Default RNS=(8|7|5|3) • For $m_i = 8$ We Can Use the 3 LSbs of the Value

  27. Example Encoding
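
The worked encoding from this slide is missing from the transcript. The sketch below shows table-lookup encoding under stated assumptions: a 10-bit unsigned input (enough for [0, 839]), a table of powers of 2 reduced modulo each $m_i$, and 48 as the illustrative value. The names `POWER_TABLE` and `encode` are hypothetical.

```python
MODULI = (8, 7, 5, 3)
NBITS = 10  # assumed input width; 2**10 = 1024 covers the 840 representable values

# POWER_TABLE[i][j] holds 2**j mod m_i, precomputed once.
POWER_TABLE = [[pow(2, j, m) for j in range(NBITS)] for m in MODULI]


def encode(x):
    """Convert an unsigned NBITS-bit integer to RNS by adding table entries for its 1 bits."""
    digits = []
    for m, row in zip(MODULI, POWER_TABLE):
        acc = 0
        for j in range(NBITS):
            if (x >> j) & 1:
                acc = (acc + row[j]) % m  # modulo-m_i addition of the looked-up term
        digits.append(acc)
    return tuple(digits)


print(encode(48))  # (0, 6, 3, 0)
```

For the mod-8 digit the table is unnecessary in hardware, since the result is simply the three least significant bits of the input.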

  28. RNS to Mixed-Radix Form • CRT States That a Mixed-Radix Number System (MRS) is Associated with any RNS • Solves comparison, sign detection, and overflow problems • MRS is a k-digit Weighted Positional Number System MRS($m_{k-1}$|$m_{k-2}$|...|$m_2$|$m_1$|$m_0$) • MRS Weights are Products: $(m_{k-2} \cdots m_2 m_1 m_0, \ldots, m_2 m_1 m_0, m_1 m_0, m_0, 1)$ • MRS Digit Sets in Each of k Positions: $[0, m_{k-1}-1], \ldots, [0, m_2-1], [0, m_1-1], [0, m_0-1]$ • MRS Digits in Same Range as RNS Digits

  29. RNS to MRS Example • Example Position Weights for MRS(8|7|5|3): (7)(5)(3)=105, (5)(3)=15, 3, 1 • $(0|3|1|0)_{MRS(8|7|5|3)} = (0)(105)+(3)(15)+(1)(3)+(0)(1) = 48_{10}$ • RNS to MRS Conversion Requires Finding the $z_i$ that Correspond to the $y_i$ in: $(y_{k-1}|\ldots|y_2|y_1|y_0)_{RNS} = (z_{k-1}|\ldots|z_2|z_1|z_0)_{MRS}$

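  30. RNS to MRS Conversion • From MRS Definition we Have: $Y = z_{k-1}(m_{k-2} \cdots m_1 m_0) + \cdots + z_2 (m_1 m_0) + z_1 m_0 + z_0$ • Easy to See that $z_0 = y_0$, Since Every Other Term is a Multiple of $m_0$. Subtracting This Value From RNS and MRS Values Results in: $Y - y_0 = (y'_{k-1}|\ldots|y'_2|y'_1|0)_{RNS} = (z_{k-1}|\ldots|z_2|z_1|0)_{MRS}$, Where $y'_j = \langle y_j - y_0 \rangle_{m_j}$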

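  31. RNS to MRS Conversion (cont) • Next, Divide Both Representations by $m_0$: $(Y - y_0)/m_0 = z_{k-1}(m_{k-2} \cdots m_1) + \cdots + z_2 m_1 + z_1$, so $z_1$ is the Residue of the Quotient mod $m_1$ • Thus, if We Can Divide by $m_0$, We Have an Iterative Approach for Conversion • Dividing y' (a Multiple of $m_0$) by $m_0$ is SCALING, Which is Easier Than Normal RNS Division • Accomplished by Multiplying by the Multiplicative Inverse of $m_0$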

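  32. Multiplicative Inverses • A Multiplicative Inverse of a Given Quantity is a Value That, When Multiplied by the Quantity, Yields a Product of 1 (Modulo $m_i$) • Example: Multiplicative Inverses of 3 Relative to $m_i$ = 8, 7, 5: $\langle 3 \times 3 \rangle_8 = 1$, $\langle 3 \times 5 \rangle_7 = 1$, $\langle 3 \times 2 \rangle_5 = 1$ • Thus, Multiplicative Inverses are 3, 5 and 2 • Can Build a Lookup Table Circuit to Store Inverses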
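
The three inverses can be checked directly. The sketch relies on Python's built-in three-argument `pow` with exponent -1, which computes modular inverses in Python 3.8 and later; the wrapper name `mod_inverse` is just for readability.

```python
def mod_inverse(a, m):
    """Multiplicative inverse of a modulo m (requires gcd(a, m) == 1, Python 3.8+)."""
    return pow(a, -1, m)


inverses = [mod_inverse(3, m) for m in (8, 7, 5)]
print(inverses)  # [3, 5, 2]
```

No inverse of 3 exists modulo 3 itself, which is why that position is shown as "-" when dividing by 3.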

  33. CRT LUT

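  34. Multiplicative Inverses Example • Divide the Number $Y' = (0|6|3|0)_{RNS}$ by 3 • Accomplish Through Multiplication by $(3|5|2|-)_{RNS}$: $(\langle 0 \times 3 \rangle_8 | \langle 6 \times 5 \rangle_7 | \langle 3 \times 2 \rangle_5 | -) = (0|2|1|-)_{RNS}$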

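  35. RNS/MRS Conversion Example • Convert $Y = (0|6|3|0)_{RNS}$ to MRS • $z_0 = y_0 = 0$ • Divide by 3 to Get $(0|2|1|-)_{RNS}$ • Now We Have $z_1 = 1$; Subtract 1 and Divide by 5 to Get $(3|3|-|-)_{RNS}$ • This Gives $z_2 = 3$; Subtract 3 and Divide by 7 to Get $(0|-|-|-)_{RNS}$, so $z_3 = 0$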

  36. RNS/MRS Conversion Example • Thus $Y = (0|6|3|0)_{RNS}$ is $(0|3|1|0)_{MRS}$ • Position Weights for MRS(8|7|5|3): (7)(5)(3)=105, (5)(3)=15, 3, 1 • So, $Y = (0|6|3|0)_{RNS} = (0|3|1|0)_{MRS} = 48_{10}$
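
The subtract-then-scale iteration from the conversion example can be sketched as follows. Digit tuples are written most significant first, matching the slides; `rns_to_mrs` and `mrs_value` are hypothetical helper names, and the modular inverses are computed on the fly rather than read from a lookup table.

```python
MODULI = (8, 7, 5, 3)  # most significant modulus first


def rns_to_mrs(digits, moduli=MODULI):
    """Convert RNS digits to mixed-radix digits by repeated subtract and scale."""
    y, m, z = list(digits), list(moduli), []
    while y:
        z0, m0 = y.pop(), m.pop()  # the last remaining residue is the next MRS digit
        z.append(z0)
        # Subtract z0 from every remaining digit, then divide by m0 by
        # multiplying with the inverse of m0 modulo each remaining modulus.
        y = [((yi - z0) * pow(m0, -1, mi)) % mi for yi, mi in zip(y, m)]
    return tuple(reversed(z))


def mrs_value(z, moduli=MODULI):
    """Evaluate mixed-radix digits using position weights 1, m0, m0*m1, ..."""
    value, weight = 0, 1
    for zi, mi in zip(reversed(z), reversed(moduli)):
        value += zi * weight
        weight *= mi
    return value


z = rns_to_mrs((0, 6, 3, 0))
print(z, mrs_value(z))  # (0, 3, 1, 0) 48
```

Because MRS is a weighted positional system, two numbers can be compared digit by digit from the most significant position once both are in this form.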

  37. RNS/MRS Conversion • Consider Conversion of $(3|2|4|2)_{RNS}$ from RNS(8|7|5|3) to Decimal • Need to Determine Values of $(1|0|0|0)_{RNS}$, $(0|1|0|0)_{RNS}$, $(0|0|1|0)_{RNS}$ and $(0|0|0|1)_{RNS}$

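  38. RNS/MRS Conversion • From the Definition of RNS, Each of These Weights is a Multiple of Every Modulus Whose Position Holds 0, and Has Residue $\langle w \rangle_{m_i} = 1$ in the Position Holding 1 • For RNS(8|7|5|3): $(1|0|0|0)_{RNS} = 105$, $(0|1|0|0)_{RNS} = 120$, $(0|0|1|0)_{RNS} = 336$, $(0|0|0|1)_{RNS} = 280$ • So $(3|2|4|2)_{RNS} = \langle 3(105) + 2(120) + 4(336) + 2(280) \rangle_{840} = \langle 2459 \rangle_{840} = 779_{10}$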

  39. Chinese Remainder Theorem • How Did We Find $w_3 = (1|0|0|0)_{RNS} = 105$? • Since Digits in 7, 5, 3 Places are 0, $w_3$ Must be a Multiple of (7)(5)(3)=105 • Must Pick the Multiple of 105 Such That its Residue With Respect to 8 is 1 • Accomplished by Multiplying 105 by its Multiplicative Inverse with Respect to 8 • This Process is Formalized in the Chinese Remainder Theorem

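  40. Chinese Remainder Theorem • THEOREM: Chinese Remainder Theorem (CRT) • The magnitude of an RNS number can be obtained from the CRT formula: $Y = (y_{k-1}|\ldots|y_1|y_0)_{RNS} = \langle \sum_{i=0}^{k-1} M_i \langle \alpha_i y_i \rangle_{m_i} \rangle_M$ where, by definition, $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ is the multiplicative inverse of $M_i$ with respect to $m_i$.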

  41. Chinese Remainder Theorem • Can Avoid Multiplications in Conversion Process by Storing $\langle M_i \langle \alpha_i y_i \rangle_{m_i} \rangle_M$ in a Table • Example Table Given on page 64 of Textbook (and also in slide 33)
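
A table-based CRT decoder along these lines might look like the sketch below. The table is regenerated from the definitions of $M_i$ and $\alpha_i$ rather than copied from the textbook; `CRT_TABLE` and `crt_decode` are hypothetical names.

```python
from math import prod

MODULI = (8, 7, 5, 3)
M = prod(MODULI)  # 840

# CRT_TABLE[i][y] = < M_i * <alpha_i * y>_{m_i} >_M for every digit value y of modulus m_i
CRT_TABLE = []
for m in MODULI:
    M_i = M // m
    alpha_i = pow(M_i, -1, m)  # multiplicative inverse of M_i modulo m_i (Python 3.8+)
    CRT_TABLE.append([(M_i * ((alpha_i * y) % m)) % M for y in range(m)])


def crt_decode(digits):
    """Look up one table entry per digit and add the entries modulo M."""
    return sum(row[y] for row, y in zip(CRT_TABLE, digits)) % M


print([row[1] for row in CRT_TABLE])  # weights of the unit digits: [105, 120, 336, 280]
print(crt_decode((3, 2, 4, 2)))       # 779
```

Decoding then costs k table lookups and a k-operand modulo-M addition, with no multiplications at run time.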

  42. Difficult RNS Operations • Sign Test • Magnitude Comparison • Overflow Detection • Generalized Division • Suffices to discuss the first three in the context of magnitude comparison, since they are essentially the same if M is such that M = N + P + 1, where the values represented are in the interval [-N, P].

  43. Difficult RNS Operations • Sign Test same as Comparison with P • Overflow Detection accomplished using Signs of Operands and Results • Focus On: • Magnitude Comparison • Generalized Division

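  44. Magnitude Comparison • Could Convert to Weighted Representation Using CRT • Too Complicated – too much Overhead • Use Approximate CRT Instead • Divide CRT Equality by M; Since $M_i / M = m_i^{-1}$ by Definition: $Y/M = \langle \sum_{i=0}^{k-1} m_i^{-1} \langle \alpha_i y_i \rangle_{m_i} \rangle_1$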

  45. Approximate CRT • Addition of Terms is Modulo-1 • All $m_i^{-1} \langle \alpha_i y_i \rangle_{m_i}$ Are in [0,1) • Whole Part of Result Discarded and Fractional Part Kept • Much Easier than CRT Modulo-M Addition • $m_i^{-1} \langle \alpha_i y_i \rangle_{m_i}$ Can be Precomputed for all $y_i$ and i • Use Table Lookup Circuit and Fractional Adder (ignore carry-outs)

  46. Approximate CRT LUT

  47. Magnitude Comparison Example Use approximate CRT decoding to determine the larger of the two numbers. Reading the Values from the Tables: Thus, we conclude that:

  48. Approximate CRT Error • If Maximum Error in Each Approximate CRT Table Entry is $\varepsilon$, then Approximate CRT Decoding Yields the Scaled Value of an RNS Number with Error No Greater than $k\varepsilon$ • Previous Example: Table Entries Rounded to 4 Digits • Maximum Error in Each Entry is $\varepsilon = 0.00005$ • k = 4 Digits • Error is $4\varepsilon = 0.0002$ • $0.0571 - 0.0536 = 0.0035 > 4\varepsilon = 0.0002$, so X > Y is Safe
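
The approximate CRT comparison can be sketched with integer arithmetic by scaling every table entry by $10^4$, which mirrors rounding to 4 decimal digits. The operands of the original example were lost in extraction, so the values used here, X = (0|6|3|0) and Y = (5|3|0|0), are assumptions chosen because they reproduce the 0.0571 and 0.0536 quoted above. All names are hypothetical.

```python
from math import prod

MODULI = (8, 7, 5, 3)
M = prod(MODULI)
SCALE = 10 ** 4  # entries are kept to 4 decimal digits, stored as integers


def rounded_fraction(num, den):
    """Return num/den scaled by SCALE and rounded half up, using integers only."""
    return (2 * num * SCALE + den) // (2 * den)


# APPROX_TABLE[i][y] approximates <alpha_i * y>_{m_i} / m_i
APPROX_TABLE = [
    [rounded_fraction((pow(M // m, -1, m) * y) % m, m) for y in range(m)]
    for m in MODULI
]


def approx_scaled(digits):
    """Approximate Y/M (times SCALE): add table entries and drop the whole part."""
    return sum(row[y] for row, y in zip(APPROX_TABLE, digits)) % SCALE


x_scaled = approx_scaled((0, 6, 3, 0))  # 571, i.e. about 0.0571
y_scaled = approx_scaled((5, 3, 0, 0))  # 536, i.e. about 0.0536
print(x_scaled, y_scaled, x_scaled - y_scaled)
```

In these units $4\varepsilon$ corresponds to 2, so the difference of 35 is well outside the error margin stated on the slide.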

  49. Redundant RNS Representations • Do Not Have to Restrict Digits in RNS to the Set $[0, m_i - 1]$ • If the Digit Set is $[0, \beta_i]$ Where $\beta_i \ge m_i$, Then the RNS is Redundant • Redundant RNS Simplifies the Modular Reduction Step for Each Arithmetic Operation

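  50. Redundant RNS Example • Consider mod-13 with Digit Set [0,15] • Redundant since: 0 = 13, 1 = 14, 2 = 15 (mod 13) • Addition Using Pseudo-redundancies Can be Done with Two 4-bit Adders • [Figure: two 4-bit adders in cascade with inputs X and Y; the first adder's Cout feeds the second adder, whose own carry-out is ignored; output SUM]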
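
One way to model the two-adder scheme in software is sketched below. Since the figure is lost, the details are assumptions: the first operand is taken to be an ordinary residue in [0, 12], the second a pseudo-residue in [0, 15], and a carry-out from the first adder is corrected by adding 3, because a dropped carry is worth 16 and 16 mod 13 = 3. The function name is hypothetical.

```python
def add_mod13_pseudo(x, y):
    """Add ordinary residue x (0..12) and pseudo-residue y (0..15) modulo 13.

    Returns a 4-bit pseudo-residue in [0, 15] congruent to x + y modulo 13.
    """
    s = x + y                    # first 4-bit adder
    cout = s >> 4                # carry-out of the first adder
    s &= 0xF                     # keep the 4-bit sum
    correction = 3 if cout else 0
    return (s + correction) & 0xF  # second 4-bit adder; its carry-out is ignored


print(add_mod13_pseudo(12, 15))  # 14, and 14 mod 13 == 27 mod 13 == 1
```

Under the stated operand ranges the second addition never overflows 4 bits, so ignoring its carry-out loses nothing; no comparison against 13 is needed after each addition.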
