Design Tradeoffs of Approximate Analog Neural Accelerators

Neural-Inspired Accelerators for Computing, January 22, 2013
Renée St. Amant, Hadi Esmaeilzadeh, Adrian Sampson, Arjang Hassibi, Luis Ceze, Doug Burger

Technology Trends • Shrinking transistors are less reliable • Leakage, variation, noise, faults • Precise computation is becoming more expensive • Motivates research in approximate computing • End of Dennard scaling • Dark silicon motivates research in acceleration

Opportunity • Approximate computing: precise results are not required • Trade accuracy for energy efficiency • Analog circuits trade accuracy for efficiency • Emerging applications are error-tolerant • Machine learning, gaming, sensor data processing, augmented reality, etc.

Outline • Context / Background • Translates general-purpose, approximation-tolerant code segments to neural networks • Analog Neural Acceleration • Opportunity, tradeoffs, and challenges unique to analog! • Related Work • Conclusion

Context / Background [Esmaeilzadeh et al., MICRO’12] • Learning approach to accelerating approximate programs • Goal: accelerate error-tolerant portions of general-purpose code • Code transformation to neural network • Accelerated execution on Neural Processing Unit (NPU)
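The following is a minimal sketch of the observe, train, and substitute flow behind that code transformation. It rests on heavy simplifying assumptions. The function `approximable_region`, the fixed hidden-layer parameters in `HIDDEN`, and every helper name are hypothetical. The output layer is fit by ridge-regularized least squares over a fixed tanh hidden layer, rather than by training every layer as the MICRO’12 work does. The sketch only illustrates how a code region can be replaced by a network that mimics the region's input/output behavior.

```python
import math

def approximable_region(x):
    # Hypothetical error-tolerant code region chosen for acceleration.
    return x * x

# Fixed (untrained) hidden layer: one (weight, bias) pair per tanh neuron.
HIDDEN = [(2.0, -2.0), (2.0, -1.0), (2.0, 0.0), (3.0, -3.0)]

def hidden_features(x):
    # Constant term for the output bias, then one tanh activation per neuron.
    return [1.0] + [math.tanh(w * x + b) for w, b in HIDDEN]

def solve(a, y):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(a)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def train_output_layer(samples, ridge=1e-9):
    # Ridge-regularized least squares: (H^T H + ridge * I) w = H^T y.
    feats = [hidden_features(x) for x, _ in samples]
    k = len(feats[0])
    hth = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    hty = [sum(f[i] * y for f, (_, y) in zip(feats, samples)) for i in range(k)]
    return solve(hth, hty)

def neural_region(x, out_weights):
    # Replacement for the original region: evaluate the trained network.
    return sum(w * f for w, f in zip(out_weights, hidden_features(x)))

# 1) Observe the region on representative inputs, 2) train, 3) substitute.
training_inputs = [i / 50 for i in range(51)]
samples = [(x, approximable_region(x)) for x in training_inputs]
weights = train_output_layer(samples)
max_err = max(abs(neural_region(x, weights) - approximable_region(x))
              for x in training_inputs)
print(f"max abs error on the training inputs: {max_err:.4f}")
```

Fixing the hidden layer keeps the training step a linear solve, so the sketch stays short and deterministic. A real toolchain would also choose the topology and validate the fit on inputs it did not train on.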

Neural Processing Unit (NPU) • Computes outputs for various network topologies • [Figure: NPU built from a configuration of processing elements (PEs), storage, and control, with a detail view of one Processing Element (PE)]
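This sketch makes "compute outputs for various network topologies" concrete by showing what an NPU evaluates: a fully connected feed-forward network described by a topology list. The sigmoid activation, the weight layout (bias stored last), and the names `evaluate` and `mac_count` are assumptions chosen for illustration. On the NPU, each neuron's weighted sum would be scheduled onto a PE.

```python
import math

def sigmoid(v):
    # Logistic activation (assumed here; other activations are possible).
    return 1.0 / (1.0 + math.exp(-v))

def evaluate(topology, weights, inputs):
    """Evaluate a fully connected feed-forward network layer by layer.

    topology: neurons per layer, e.g. [2, 3, 1] for 2 inputs, 3 hidden, 1 output.
    weights[l][j]: incoming weights of neuron j in layer l + 1, bias last.
    """
    if len(inputs) != topology[0]:
        raise ValueError("input count does not match the topology")
    activations = list(inputs)
    for layer, layer_weights in enumerate(weights):
        if len(layer_weights) != topology[layer + 1]:
            raise ValueError(f"layer {layer + 1} has the wrong number of neurons")
        for neuron in layer_weights:
            if len(neuron) != len(activations) + 1:
                raise ValueError("each neuron needs one weight per input plus a bias")
        # Each neuron is one PE job: a weighted sum followed by the activation.
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activations)) + neuron[-1])
            for neuron in layer_weights
        ]
    return activations

def mac_count(topology):
    # Multiply-accumulate operations per network evaluation (biases excluded).
    return sum(topology[l] * topology[l + 1] for l in range(len(topology) - 1))

# Usage: a 2-3-1 network whose hidden neurons all compute x0 - x1.
topology = [2, 3, 1]
weights = [
    [[1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [1.0, -1.0, 0.0]],
    [[1.0, 1.0, 1.0, -1.5]],
]
print(evaluate(topology, weights, [0.5, 0.5]), mac_count(topology))
```

Changing `topology` and `weights` is all it takes to run a different network. That flexibility is what the hardware has to provide by reconfiguring its PEs.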

Digital NPU (left), digital PE (right) • Time-multiplexed [Esmaeilzadeh et al., MICRO’12]

Results [Esmaeilzadeh et al., MICRO’12] • 2.3x application speedup, 3x energy reduction on average • Ideal NPU (potential for analog): 3.4x speedup, 3.7x energy improvement on average

Outline • Context / Background • Analog neural acceleration • Relevant design components • Tradeoffs and challenges • Preliminary design • Preliminary results • Related Work • Conclusion

Design Space of Neural Processing Units • Analog presents an opportunity for increased energy savings • [Figure: design space spanned by flexibility, accuracy, and efficiency]

Design Components to Balance Flexibility, Accuracy, and Efficiency • Potential!

Analog Neural Processing Unit (ANPU) • Efficiently and accurately computes outputs for various network topologies • [Figure: ANPU built from a configuration of analog processing elements (APEs), storage, and control, with a detail view of one Analog Processing Element (APE)]

Analog/Digital Boundary • [Figure: alternative placements of the analog/digital boundary around the APE] • Analog computation is cheap! • Conversions are expensive! • The boundary affects flexibility, robustness to noise, and fan-out • Opportunity: analog storage

APE Configuration: Map various topologies to one substrate • [Figure: rows of APEs, with analog outputs of one row fed to the next layer] • Time-multiplexed vs. geometric layout • Analog efficiency with simultaneous computation • Fixed computation width • Challenge: range! • A larger range decreases circuit accuracy • Maximize efficient simultaneous computation while maintaining accuracy • Row width (connections): hardware / software accuracy tradeoff
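The time-multiplexed vs. geometric tradeoff and the fixed computation width can be made concrete with a rough step count. This is a hypothetical cost model, not the ANPU's actual scheduler. It assumes an APE width of 8, matching the 8-wide APE evaluated later. Neurons whose fan-in exceeds that width are split into partial sums, and the cost of combining those partial sums is ignored.

```python
import math

def ape_passes(fan_in, width):
    # Hypothetical: a neuron whose fan-in exceeds the fixed APE width is split
    # into ceil(fan_in / width) partial weighted sums.
    return math.ceil(fan_in / width)

def time_multiplexed_steps(topology, width):
    # A single APE evaluates every (partial) neuron one after another.
    return sum(topology[l + 1] * ape_passes(topology[l], width)
               for l in range(len(topology) - 1))

def geometric_steps(topology, width, apes_per_row):
    # A row of APEs evaluates one layer's (partial) neurons simultaneously and
    # feeds its analog outputs to the next row. A layer that needs more APEs
    # than a row provides takes several waves. Combining partial sums is ignored.
    return sum(math.ceil(topology[l + 1] * ape_passes(topology[l], width) / apes_per_row)
               for l in range(len(topology) - 1))

for topo in ([2, 3, 1], [16, 8, 2]):
    print(topo, time_multiplexed_steps(topo, width=8),
          geometric_steps(topo, width=8, apes_per_row=8))
```

In this model the geometric layout wins on step count but commits hardware to a fixed row width. A neuron with fan-in above that width costs extra passes, which is where the row-width tradeoff appears.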

Value Representation • Represent values: inputs, weights, intermediates • Current? Voltage? Some combination? One or more wires? • Analog computation circuits favor particular • Signal types • Signal ranges • These choices affect accuracy and efficiency • Cost of conversion and scaling vs. computation accuracy and efficiency

Value Representation: Bit Width • Number of bits for inputs, weights, and outputs • Implications for power • Hardware / software accuracy • More bits, more accuracy? • Challenge: range!
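The sketch below shows how bit width and range interact for a single value, assuming uniform quantization over a fixed range [lo, hi]. The function name and interface are illustrative. Within the range, the error is at most half a quantization step, and each extra bit roughly halves that step. Values outside the range saturate, which is one way "Challenge: Range!" shows up.

```python
def quantize(x, bits, lo, hi):
    """Map x to the nearest of 2**bits evenly spaced levels spanning [lo, hi].

    Returns (code, reconstructed value). Inputs outside [lo, hi] saturate,
    so the chosen range bounds accuracy as much as the bit count does.
    """
    step = (hi - lo) / (2 ** bits - 1)
    clamped = min(max(x, lo), hi)
    code = round((clamped - lo) / step)
    return code, lo + code * step

# Inside the range the error is at most half a step; outside it, error grows.
for bits in (4, 5, 8):
    step = 1.0 / (2 ** bits - 1)
    worst = max(abs(quantize(i / 1000, bits, 0.0, 1.0)[1] - i / 1000)
                for i in range(1001))
    print(bits, round(step, 4), round(worst, 4))
print(quantize(1.5, 4, 0.0, 1.0))  # saturates at the top code
```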

Outline • Context / Background • Analog neural acceleration • Overview • Relevant design components • Preliminary APE design • Preliminary results • Related Work • Conclusion

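Analog Processing Element Design • [Figure: APE schematic. Eight weight/input pairs (Weight 0/Input 0 through Weight 7/Input 7) feed a MUL stage built from a current-steering DAC (bias current Ibias, output currents I+ and I−) and a resistor-ladder DAC (V+ and V−), with branch currents Ibias/2 + ΔI and Ibias/2 − ΔI; an ADD stage sums the products, and a clocked ADC produces the output.]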
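The following is a behavioral sketch of the APE's data path (MUL, ADD, then ADC). It assumes ideal, noise-free analog arithmetic, unsigned 5-bit inputs, signed 4-bit weights in two's-complement range, and an ADC with a symmetric full-scale range. These encodings and the `ape` interface are assumptions, not the circuit's actual signal representation. The `full_scale` parameter exposes the range tradeoff. A full scale that covers every possible sum never clips but yields a coarse output step. A narrow full scale gives finer steps but saturates large sums.

```python
def ape(inputs, weights, in_bits=5, w_bits=4, out_bits=8, full_scale=None):
    """Idealized, noise-free model of one 8-wide APE evaluation.

    inputs: unsigned in_bits-bit integers; weights: signed w_bits-bit integers
    (two's-complement range assumed). Returns (exact sum, ADC reconstruction).
    """
    if len(inputs) != 8 or len(weights) != 8:
        raise ValueError("the modeled APE is 8 wide")
    if not all(0 <= x < 2 ** in_bits for x in inputs):
        raise ValueError("input out of range")
    w_max = 2 ** (w_bits - 1)
    if not all(-w_max <= w < w_max for w in weights):
        raise ValueError("weight out of range")
    acc = sum(x * w for x, w in zip(inputs, weights))  # MUL, then ADD
    if full_scale is None:
        # Cover every possible sum: no clipping, but a coarse output step.
        full_scale = 8 * (2 ** in_bits - 1) * w_max
    step = 2 * full_scale / (2 ** out_bits - 1)
    clipped = min(max(acc, -full_scale), full_scale)  # ADC saturates
    code = round((clipped + full_scale) / step)
    return acc, -full_scale + code * step

exact, approx = ape([31] * 8, [7] * 8)                # wide range: small relative error
_, saturated = ape([31] * 8, [7] * 8, full_scale=248)  # narrow range: clips at 248
print(exact, round(approx, 2), round(saturated, 2))
```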

Methodology

Circuit Power, Accuracy, and Delay • 8-wide APE, 5-bit inputs, 4-bit weights • Power = 23 mW • Error below one quantization step at 1.67 GHz
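A back-of-the-envelope energy estimate follows from the numbers above. It assumes the APE completes one 8-wide evaluation per clock cycle at 1.67 GHz. That is an assumption: the slide gives power and frequency, not throughput.

```python
POWER_W = 23e-3   # APE power from the slide (23 mW)
FREQ_HZ = 1.67e9  # clock frequency from the slide (1.67 GHz)
WIDTH = 8         # 8-wide APE: multiply-accumulates per evaluation

# Assumes one evaluation per clock cycle (not stated on the slide).
energy_per_eval = POWER_W / FREQ_HZ  # joules per APE evaluation
energy_per_mac = energy_per_eval / WIDTH
print(f"{energy_per_eval * 1e12:.2f} pJ per evaluation, "
      f"{energy_per_mac * 1e12:.2f} pJ per MAC")
```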

Input Bit Width and Energy • [Figure: energy consumption vs. APE input bit width*] • APE input bit width has an exponential effect on energy consumption
* S. Galal and M. Horowitz. Energy-efficient floating-point unit design. IEEE Trans. Comput., 60(7):913–922, 2011.

Bit Width and Potential Accuracy • APE input bit width and weight bit width affect achievable accuracy

Range • [Figure: circuit with bias current Ibias, inputs V+ and V−, and output currents I+ and I−]
Ibias | Output bits
1 uA | 6 bits
10 uA | 6 bits, 7 bits?
100 uA | 8 bits
• Strategy: increase the “linear” range • Hardware / software accuracy tradeoff • Answers at the application level • Exponential increase in computation power for a linear increase in output bits
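One rough way to read the table is to assume that output bits grow linearly with log10 of the bias current, and that power scales with bias current at a fixed supply. Under those assumptions, fit a line through the 1 uA and 100 uA points, leaving out the ambiguous 10 uA entry. This two-point fit is illustrative only. Extrapolating beyond 100 uA is a guess, not a measurement.

```python
import math

# (bias current in amperes, output bits) from the slide's table. The 10 uA
# entry ("6 bits, 7 bits?") is ambiguous and is left out of the fit.
MEASURED = [(1e-6, 6), (100e-6, 8)]

def fit_log_linear(points):
    # Fit bits = a + b * log10(I) through two points.
    (i0, n0), (i1, n1) = points
    b = (n1 - n0) / (math.log10(i1) - math.log10(i0))
    a = n0 - b * math.log10(i0)
    return a, b

def bias_for_bits(bits, a, b):
    # Invert the fit: each extra output bit multiplies the bias current
    # (and, at a fixed supply voltage, the power) by 10 ** (1 / b).
    return 10 ** ((bits - a) / b)

a, b = fit_log_linear(MEASURED)
for bits in range(6, 10):
    print(f"{bits} bits: ~{bias_for_bits(bits, a, b) * 1e6:.0f} uA")
```

With these two points, the fit gives one extra output bit per 10x increase in bias current. That is the exponential-power, linear-bits relationship stated on the slide.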

Outline • Context / Background • Analog neural acceleration • Related Work • Conclusion

Related Work – Approximate Computing • Digital hardware techniques [PCMOS] • Limited benefit • Analog hardware techniques • Lack successful integration with high-performance CPUs • Approximate programming models [EnerJ] • The ANPU is an implementation of approximate computing

Related Work – Hardware Neural Networks • Most prior work on analog neural networks • Small networks, not designed for speed, older process technology, narrowly targeted applications • More recent work • SpiNNaker, IBM’s Cognitive Chip, ByMoore, FACETS, neuFlow • Goal? • Could be considered for NPU implementations

Conclusion • Analog, fine-grained knobs balance flexibility, accuracy, and efficiency • Hardware / software accuracy tradeoff • Challenge: watch your range! • Work in progress • Circuit-level accuracy → application-level accuracy • Digital / analog boundaries and the opportunity for analog storage • Open questions: noise? • Important with the rise of error-tolerant applications

Questions? • Suggestions? Feedback? • stamant@cs.utexas.edu • Thank you!
