Design Tradeoffs of Approximate Analog Neural Accelerators

Neural-Inspired Accelerators for Computing - January 22, 2013

Renée St. Amant, Hadi Esmaeilzadeh, Adrian Sampson, Arjang Hassibi, Luis Ceze, Doug Burger

- Shrinking transistors are less reliable
- Leakage, variation, noise, faults

- Precise computation more expensive
- Motivates research in approximate computing

- End of Dennard scaling
- Dark silicon motivates research in acceleration

- Approximate computing – precise results not required
- trade accuracy for energy efficiency

- Analog circuits trade accuracy for efficiency
- Emerging applications are error-tolerant
- Machine learning, gaming, sensor data processing, augmented reality, etc.

- Context / Background
- Translates general-purpose, approximation-tolerant code segments to neural networks

- Analog Neural Acceleration
- Opportunity, tradeoffs, and challenges unique to analog!

- Related Work
- Conclusion

- Learning approach to accelerating approximate programs
- Goal: accelerate error-tolerant portions of general-purpose code
- Code transformation to neural network
- Accelerated execution on Neural Processing Unit (NPU)

[Figure: NPU organization. Time-multiplexed processing elements (PEs) compute outputs for various network topologies; the NPU holds the configuration of PEs, storage, and control.]

[Esmaeilzadeh et al., MICRO’12]
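The time-multiplexed execution described above can be illustrated with a minimal sketch. It assumes each PE computes a weighted sum followed by a sigmoid, and that neurons of a layer are scheduled onto the available PEs in consecutive time slots. The 2 -> 3 -> 1 topology, the weights, and all function names are hypothetical placeholders, not values from the talk.

```python
import math

def sigmoid(x):
    """Logistic activation applied after the weighted sum."""
    return 1.0 / (1.0 + math.exp(-x))

def pe_neuron(weights, bias, inputs):
    """Work of one PE for one neuron: weighted sum, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def run_layer(layer, inputs, num_pes):
    """Evaluate one layer on num_pes PEs; returns (outputs, time slots used)."""
    outputs, slots = [], 0
    for start in range(0, len(layer), num_pes):
        batch = layer[start:start + num_pes]  # neurons sharing this time slot
        outputs.extend(pe_neuron(w, b, inputs) for w, b in batch)
        slots += 1
    return outputs, slots

def run_network(layers, inputs, num_pes):
    """Evaluate the layers in order; returns (final outputs, total time slots)."""
    values, total = list(inputs), 0
    for layer in layers:
        values, slots = run_layer(layer, values, num_pes)
        total += slots
    return values, total

# Hypothetical 2 -> 3 -> 1 network; weights are placeholders, not trained values.
HIDDEN = [([0.5, -0.25], 0.0), ([1.0, 1.0], -0.5), ([-0.75, 0.5], 0.25)]
OUTPUT = [([1.0, -1.0, 0.5], 0.0)]

if __name__ == "__main__":
    out, slots = run_network([HIDDEN, OUTPUT], [0.2, 0.8], num_pes=2)
    print(out, slots)
```

With two PEs the three hidden neurons need two time slots and the output neuron one more; with as many PEs as the widest layer, each layer takes a single slot.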

- 2.3x application speedup, 3x energy reduction on average
- Ideal NPU (potential for analog): 3.4x speedup, 3.7x energy improvement on average

- Context / Background
- Analog neural acceleration
- Relevant design components
- Tradeoffs and challenges
- Preliminary design
- Preliminary results

- Analog presents opportunity for increased energy savings

[Figure: design space spanning flexibility, accuracy, and efficiency. Potential!]

[Figure: analog NPU organization. Analog processing elements (APEs) efficiently and accurately compute outputs for various network topologies; configuration of APEs, storage, and control; digital and analog domains marked.]

- Analog computation is cheap!
- Conversions are expensive!
- Boundary affects flexibility
- Robustness to noise
- Fan out

[Figure: placements of the analog / digital boundary]

Opportunity: Analog Storage

Map various topologies to one substrate

[Figure: two rows of four APEs; analog outputs of one row are fed to the next layer; labels 2, 3, 1]

- Time-multiplexed vs. geometrical layout
- Analog efficiency with simultaneous computation
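One way to see why the digital / analog boundary matters is to count conversions. The sketch below compares two hypothetical placements: converting at every layer boundary, and keeping values analog between layers so that only network inputs and final outputs are converted. It assumes one converter use per value (a converted input is shared by every neuron that reads it) and ignores weight conversions; the function names and the 2 -> 3 -> 1 example topology are illustrative only.

```python
def per_layer_conversions(topology):
    """Digital boundary around every layer.

    Each value entering a layer of neurons is converted D->A and each
    neuron output is converted A->D. Returns (dac_uses, adc_uses).
    """
    dac_uses = sum(topology[:-1])  # inputs to every layer of neurons
    adc_uses = sum(topology[1:])   # outputs of every layer of neurons
    return dac_uses, adc_uses

def analog_forwarding_conversions(topology):
    """Analog outputs fed directly to the next layer.

    Only the network inputs are converted D->A and only the final
    outputs A->D. Returns (dac_uses, adc_uses).
    """
    return topology[0], topology[-1]

if __name__ == "__main__":
    topology = [2, 3, 1]  # hypothetical: 2 inputs, 3 hidden neurons, 1 output
    print(per_layer_conversions(topology))          # (5, 4)
    print(analog_forwarding_conversions(topology))  # (2, 1)
```

The count says nothing about noise accumulation or fan-out limits, which are the costs of keeping more of the computation in the analog domain.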

- Fixed computation width
- Challenge: Range!
- Larger range decreases circuit accuracy
- Maximize efficient simultaneous computation, maintaining accuracy
- Row width (connections) – hardware / software accuracy tradeoff
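The range problem can be made concrete with a small calculation. The sketch below assumes unsigned integer inputs and weights, which is a simplification (real weights are likely signed), and computes how many bits the exact sum of products would need for a given row width. It also shows one hypothetical software-side response to a fixed width: splitting a wide neuron into chunks and accumulating the partial sums digitally. The example values 8, 5, and 4 match the preliminary design reported later in the talk; the function names are made up for illustration.

```python
def full_scale_bits(width, input_bits, weight_bits):
    """Bits needed to represent the largest possible sum of products exactly."""
    max_sum = width * (2 ** input_bits - 1) * (2 ** weight_bits - 1)
    return max_sum.bit_length()

def chunked_dot(weights, inputs, width):
    """Split one neuron's connections into fixed-width rows.

    Returns (total, number_of_rows). Each partial sum stands for one
    fixed-width analog evaluation; the totals are accumulated digitally.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    partials = []
    for start in range(0, len(weights), width):
        ws = weights[start:start + width]
        xs = inputs[start:start + width]
        partials.append(sum(w * x for w, x in zip(ws, xs)))
    return sum(partials), len(partials)

if __name__ == "__main__":
    for width in (1, 8, 16):
        print(width, full_scale_bits(width, input_bits=5, weight_bits=4))
    print(chunked_dot(list(range(20)), [1] * 20, width=8))
```

Under these assumptions an 8-wide row with 5-bit inputs and 4-bit weights spans 12 bits of exact range, so an output with fewer bits has to discard or compress part of that range.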

- Represent values – inputs, weights, intermediates
- Current? voltage? Some combination? One or more wires?

- Analog computation circuits have favorites
- Signal type
- Signal range
- Affect accuracy, efficiency

- Cost of conversion and scaling vs. computation accuracy and efficiency

- Number of bits of inputs, weights, outputs
- Implications on power
- Hardware / software accuracy
- More bits, more accuracy?
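The question of how bit width affects accuracy can be explored in software before committing to a circuit. The following sketch assumes values normalized to [0, 1] and uniform quantization; it measures the error of a quantized dot product against the exact one and compares it with a worst-case bound. The quantizer, the random test vectors, and the function names are assumptions made for illustration and do not model analog noise.

```python
import random

def quantize(x, bits):
    """Uniformly quantize x in [0, 1] to the nearest of 2**bits levels."""
    levels = 2 ** bits - 1
    return round(x * levels) / levels

def dot(ws, xs):
    return sum(w * x for w, x in zip(ws, xs))

def dot_error(weights, inputs, input_bits, weight_bits):
    """Absolute error introduced by quantizing inputs and weights."""
    exact = dot(weights, inputs)
    approx = dot([quantize(w, weight_bits) for w in weights],
                 [quantize(x, input_bits) for x in inputs])
    return abs(approx - exact)

def error_bound(width, input_bits, weight_bits):
    """Worst-case error for `width` products of values in [0, 1]."""
    ex = 0.5 / (2 ** input_bits - 1)   # max input rounding error
    ew = 0.5 / (2 ** weight_bits - 1)  # max weight rounding error
    return width * (ex + ew + ex * ew)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    weights = [rng.random() for _ in range(8)]
    inputs = [rng.random() for _ in range(8)]
    for input_bits, weight_bits in [(3, 3), (5, 4), (8, 8)]:
        err = dot_error(weights, inputs, input_bits, weight_bits)
        print(input_bits, weight_bits, err, error_bound(8, input_bits, weight_bits))
```

The bound shrinks roughly by half for each added bit, but whether the extra precision improves application-level results still depends on the network and on the accuracy the circuit can actually deliver.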

- Challenge: Range!


- Context / Background
- Analog neural acceleration
- Overview
- Relevant design components
- Preliminary APE design
- Preliminary results

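[Figure: preliminary APE circuit. Labels: Weight 0 / Input 0 through Weight 7 / Input 7; current-steering DAC with bias current Ibias and differential currents I+ and I- (Ibias/2 + ΔI and Ibias/2 - ΔI); resistor ladder (DAC) with V+ and V-; MUL; ADD; ADC; Output; Clock.]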
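The current labels in the circuit diagram suggest a differential representation. Assuming that I+ carries Ibias/2 + ΔI and I- carries Ibias/2 - ΔI (the pairing is inferred from the labels, not stated in the slides), the relations below follow directly:

```latex
\begin{aligned}
I^{+} &= \tfrac{1}{2} I_{\text{bias}} + \Delta I, &
I^{-} &= \tfrac{1}{2} I_{\text{bias}} - \Delta I, \\
I^{+} + I^{-} &= I_{\text{bias}}, &
I^{+} - I^{-} &= 2\,\Delta I .
\end{aligned}
```

The total current stays fixed at the bias current while the signal is carried in the difference, so the sign of ΔI can encode the sign of the value. Both currents remain non-negative only while |ΔI| ≤ Ibias/2, which bounds the usable signal range for a given bias current.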

- 8-wide APE, 5-bit inputs, 4-bit weights
- Power = 23 mW
- Error below one quantization step at 1.67 GHz
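The figures above allow a rough energy estimate. The sketch below assumes the APE completes one 8-wide evaluation per clock cycle at 1.67 GHz and that 23 mW is its total power at that rate; the slides do not state the throughput explicitly, so treat the result as an order-of-magnitude estimate.

```python
POWER_W = 23e-3    # reported APE power
CLOCK_HZ = 1.67e9  # reported clock rate
WIDTH = 8          # multiply-accumulate operations per evaluation

# Assumption: one complete 8-wide evaluation per clock cycle.
energy_per_eval_j = POWER_W / CLOCK_HZ
energy_per_mac_j = energy_per_eval_j / WIDTH

if __name__ == "__main__":
    print(f"{energy_per_eval_j * 1e12:.2f} pJ per 8-wide evaluation")
    print(f"{energy_per_mac_j * 1e12:.2f} pJ per multiply-accumulate")
```

Under that assumption the estimate is about 13.8 pJ per evaluation, or about 1.7 pJ per multiply-accumulate.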

* S. Galal and M. Horowitz. Energy-efficient floating-point unit design. IEEE Trans. Comput., 60(7):913–922, 2011.

APE input bit-width has exponential effect on energy consumption

APE input bit-width and weight bit-width affect achievable accuracy

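[Figure: differential circuit with bias current Ibias, voltages V+ / V-, and currents I+ / I-]

Bias current (Ibias) vs. output bits:
- 1 µA: 6 bits
- 10 µA: 6 bits, 7 bits?
- 100 µA: 8 bits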

- Strategy: increase “linear” range
- Hardware / software accuracy tradeoff
- Answers at the application level

- Exponential increase in computation power for linear increase in output bits
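The exponential relationship can be estimated from the two unambiguous data points reported above (1 µA for 6 output bits, 100 µA for 8 output bits). The sketch below assumes a pure exponential between those points and treats bias current as a stand-in for computation power at a fixed supply voltage; both are simplifying assumptions, and the function names are hypothetical.

```python
# (bias current in microamps, output bits), taken from the reported results.
POINTS_UA_BITS = [(1.0, 6), (100.0, 8)]

def current_factor_per_bit(points=POINTS_UA_BITS):
    """Multiplicative increase in bias current per additional output bit."""
    (i_lo, b_lo), (i_hi, b_hi) = points
    return (i_hi / i_lo) ** (1.0 / (b_hi - b_lo))

def bias_current_ua(bits, points=POINTS_UA_BITS):
    """Bias current predicted by the exponential fit for a given output width."""
    (i_lo, b_lo), _ = points
    return i_lo * current_factor_per_bit(points) ** (bits - b_lo)

if __name__ == "__main__":
    print(current_factor_per_bit())
    for bits in (6, 7, 8, 9):
        print(bits, bias_current_ua(bits))
```

The fit gives roughly a tenfold increase in bias current per added output bit, and it places 7 bits at about 10 µA, which is consistent with the tentative "7 bits?" entry for that current.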

- Context / Background
- Analog neural acceleration
- Related Work
- Conclusion

- Digital hardware techniques [PCMOS]
- Limited benefit

- Analog hardware techniques
- Lack successful integration with high-performance CPU

- Approximate programming models [EnerJ]
- The analog NPU (ANPU) is an implementation of approximate computing

- Most prior work on analog neural networks
- Small networks, not designed for speed, built in old technology, targeting very specific applications

- More recent work
- SpiNNaker, IBM’s Cognitive Chip, ByMoore, FACETS, neuFlow
- Goal?
- Could be considered for NPU implementations

- Analog, fine-grained knobs balance flexibility, accuracy, and efficiency
- Hardware / software accuracy tradeoff
- Challenge: Watch your range!
- Work in Progress
- Relating circuit-level accuracy to application-level accuracy
- Digital / analog boundaries and the opportunity of analog storage

- Open questions: Noise?
- Important with the rise of error-tolerant applications

- Suggestions? Feedback?
- [email protected]
- Thank you!