
Design Tradeoffs of Approximate Analog Neural Accelerators

Neural-Inspired Accelerators for Computing - January 22, 2013

Renée St. Amant, Hadi Esmaeilzadeh, Adrian Sampson, Arjang Hassibi, Luis Ceze, Doug Burger



Technology Trends

  • Shrinking transistors are less reliable

    • Leakage, variation, noise, faults

  • Precise computation is becoming more expensive

    • Motivates research in approximate computing

  • End of Dennard scaling

    • Dark silicon motivates research in acceleration



Opportunity

  • Approximate computing – precise results not required

    • Trade accuracy for energy efficiency

  • Analog circuits trade accuracy for efficiency

  • Emerging applications are error-tolerant

    • Machine learning, gaming, sensor data processing, augmented reality, etc.



Outline

  • Context / Background

    • Translates general-purpose, approximation-tolerant code segments to neural networks

  • Analog Neural Acceleration

    • Opportunity, tradeoffs, and challenges unique to analog!

  • Related Work

  • Conclusion



Context / Background [Esmaeilzadeh et al., MICRO’12]

  • Learning approach to accelerating approximate programs

    • Goal: accelerate error-tolerant portions of general-purpose code

    • Code transformation to a neural network

    • Accelerated execution on a Neural Processing Unit (NPU); a rough sketch of this flow follows below
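A minimal sketch of that flow in Python, using scikit-learn as a stand-in for the compiler-side training step; the target function, network size, and training setup are illustrative assumptions, not the MICRO'12 toolchain:

    # Illustrative only: approximate an error-tolerant code region with a small
    # multilayer perceptron, which the NPU then evaluates in hardware.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def code_region(x, y):
        # stand-in for an approximable hot function (e.g., a filter kernel)
        return np.sin(x) * np.cos(y)

    # 1. Record input/output pairs while running the original code
    rng = np.random.default_rng(0)
    X = rng.uniform(-np.pi, np.pi, size=(10_000, 2))
    t = code_region(X[:, 0], X[:, 1])

    # 2. Train a small neural network that mimics the region
    net = MLPRegressor(hidden_layer_sizes=(8, 8), max_iter=2000).fit(X, t)

    # 3. At run time, the trained network is invoked instead of the original
    #    code; on an NPU this evaluation runs on the accelerator.
    print("mean |error| of the neural stand-in:", np.mean(np.abs(net.predict(X) - t)))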



Neural Processing Unit (NPU)

[Figure: NPU block diagram: Processing Elements (PEs), storage, and control, configured to compute outputs for various network topologies]



Digital NPU (left), Digital PE (right)

[Figure: block diagrams of the time-multiplexed digital NPU and PE]

[Esmaeilzadeh et al., MICRO’12]
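A rough behavioral sketch of what time multiplexing means here: a layer with more neurons than physical PEs is evaluated in several rounds on the same hardware. The PE count, activation function, and round-robin schedule below are illustrative assumptions:

    # Illustrative only: time-multiplex a layer's neurons onto a fixed PE pool.
    import numpy as np

    NUM_PES = 8  # assumed number of physical processing elements

    def layer_time_multiplexed(weights, inputs):
        """weights: (num_neurons, num_inputs); returns one output per neuron."""
        outputs = np.empty(weights.shape[0])
        # each round occupies the PEs with up to NUM_PES neurons
        for start in range(0, weights.shape[0], NUM_PES):
            block = weights[start:start + NUM_PES]
            outputs[start:start + NUM_PES] = np.tanh(block @ inputs)
        return outputs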



Results [Esmaeilzadeh et al., MICRO’12]

  • 2.3x application speedup, 3x energy reduction on average

  • Ideal NPU (potential for analog): 3.4x speedup, 3.7x energy improvement on average



Outline

  • Context / Background

  • Analog neural acceleration

    • Relevant design components

    • Tradeoffs and challenges

    • Preliminary design

    • Preliminary results

  • Related Work

  • Conclusion



    Design Space of Neural Processing Units

    • Analog presents opportunity for increased energy savings

    [Figure: design space spanning flexibility, accuracy, and efficiency]



    Design Components to Balance Flexibility, Accuracy, and Efficiency




    Analog Neural Processing Unit (ANPU)

    [Figure: ANPU block diagram: Analog Processing Elements (APEs), storage, and control, configured to efficiently and accurately compute outputs for various network topologies]



    Analog/Digital Boundary

    [Figure: an APE with an analog core inside a digital boundary]

    • Analog computation is cheap!

    • Conversions are expensive!

    • Boundary affects flexibility

    • Robustness to noise

    • Fan-out

    [Figure: alternative placements of the analog/digital boundary]

    Opportunity: Analog Storage



    APE Configuration

    Map various topologies to one substrate

    [Figure: network layers mapped onto a row of APEs]

    • Time-multiplexed vs. geometric approach

      • Analog efficiency with simultaneous computation



    APE Configuration

    Map various topologies to one substrate

    [Figure: two rows of APEs; the analog outputs of one layer are fed to the next layer]

    • Time-multiplexed vs. geometrical layout

      • Analog efficiency with simultaneous computation

    • Fixed computation width

      • Challenge: Range!

      • A larger range decreases circuit accuracy

      • Maximize efficient simultaneous computation while maintaining accuracy

      • Row width (connections) – a hardware / software accuracy tradeoff (see the sketch below)
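A sketch of the row-width tension: a neuron with more connections than one fixed-width APE row must be split across rows (or rounds) and its partial sums combined. The 8-wide row and the splitting/accumulation scheme are assumptions for illustration:

    # Illustrative only: evaluate a neuron whose fan-in exceeds the APE row width.
    import numpy as np

    ROW_WIDTH = 8  # assumed fixed number of simultaneous connections per row

    def neuron(weights, inputs):
        partial = 0.0
        for start in range(0, len(weights), ROW_WIDTH):
            w = weights[start:start + ROW_WIDTH]
            x = inputs[start:start + ROW_WIDTH]
            partial += np.dot(w, x)      # one APE row evaluation
        # a wider row would do this in one shot, but over a larger analog
        # range, which is the accuracy/efficiency tension described above
        return np.tanh(partial)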



    Value Representation

    • Represent values – inputs, weights, intermediates

      • Current? Voltage? Some combination? One or more wires?

    • Analog computation circuits have favorites

      • Signal type

      • Signal range

      • These choices affect accuracy and efficiency

    • Cost of conversion and scaling vs. computation accuracy and efficiency



    Value Representation: Bit Width

    • Number of bits for inputs, weights, and outputs

    • Implications for power

    • Hardware / software accuracy

      • More bits, more accuracy? (see the sketch below)

    • Challenge: Range!

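A small sketch of the bit-width knob: quantize a signal to b bits over a fixed range and look at the worst-case error. The value range is an assumption; the point is only that each extra bit roughly halves the quantization step:

    # Illustrative only: uniform quantization error vs. bit width.
    import numpy as np

    def quantize(x, bits, lo=-1.0, hi=1.0):
        step = (hi - lo) / (2 ** bits - 1)
        return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

    x = np.linspace(-1.0, 1.0, 10_001)
    for b in (3, 4, 5, 8):
        err = np.max(np.abs(quantize(x, b) - x))
        print(f"{b}-bit representation: worst-case error ~ {err:.4f}")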



    Outline

    • Context / Background

    • Analog neural acceleration

      • Overview

      • Relevant design components

      • Preliminary APE design

      • Preliminary results

  • Related Work

  • Conclusion



    Analog Processing Element Design



    Analog Processing Element Design

    [Figure: APE circuit: eight weight/input pairs (Weight 0 through Weight 7, Input 0 through Input 7); a current-steering DAC with bias Ibias producing differential currents Ibias/2 + ΔI and Ibias/2 - ΔI; a resistor-ladder DAC producing differential voltages V+ and V-; analog multiply (MUL) and add (ADD) stages; and a clocked ADC that digitizes the result to produce the Output]
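A behavioral sketch of the datapath above, ignoring noise and circuit nonlinearity: weights and inputs are quantized by their DACs, multiplied and summed in the analog domain, and the sum is digitized by the output ADC. The 8-wide, 5-bit-input, 4-bit-weight configuration matches the design point on the next slides; the output resolution and value ranges are assumptions:

    # Illustrative only: idealized APE dot product with DAC/ADC quantization.
    import numpy as np

    def quantize(x, bits, lo=-1.0, hi=1.0):
        step = (hi - lo) / (2 ** bits - 1)
        return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

    def ape(weights, inputs, w_bits=4, in_bits=5, out_bits=8, width=8):
        w = quantize(np.asarray(weights[:width]), w_bits)   # weight DACs
        x = quantize(np.asarray(inputs[:width]), in_bits)   # input DACs
        acc = np.dot(w, x)                                   # analog MUL + ADD
        # the ADC sees a bounded signal: with |w|, |x| <= 1 the sum lies in [-width, width]
        return quantize(acc, out_bits, lo=-float(width), hi=float(width))

    print(ape(np.full(8, 0.5), np.linspace(-1.0, 1.0, 8)))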



    Methodology



    Circuit Power, Accuracy, and Delay

    • 8-wide APE, 5-bit inputs, 4-bit weights

    • Power = 23 mW

    • Error below one quantization step at 1.67 GHz



    Input Bit Width and Energy

    [Figure: energy comparison across APE input bit widths; starred (*) points correspond to the reference below]

    * S. Galal and M. Horowitz. Energy-efficient floating-point unit design. IEEE Trans. Comput., 60(7):913–922, 2011.

    APE input bit width has an exponential effect on energy consumption



    Bit Width and Potential Accuracy

    APE input bit-width and weight bit-width affect achievable accuracy



    Range

    [Figure: circuit signals Ibias, V+, V-, I+, I-]

    Bias current vs. achievable output bits:

    1 µA: 6 bits
    10 µA: 6 bits, 7 bits?
    100 µA: 8 bits

    • Strategy: increase “linear” range

      • Hardware / software accuracy tradeoff

      • Answers at the application level

    • Exponential increase in computation power for a linear increase in output bits (see the arithmetic below)
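As a back-of-the-envelope reading of the table above (treating computation power as roughly proportional to Ibias, which is an assumption): going from 6 output bits to 8 output bits raises the bias current from 1 µA to 100 µA, i.e., a 100x increase for 2 extra bits, or roughly 10x per additional bit. That is the exponential-power-for-linear-bits relationship stated in the last bullet.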



    Outline

    • Context / Background

    • Analog neural acceleration

    • Related Work

    • Conclusion



    Related Work – Approximate Computing

    • Digital hardware techniques [PCMOS]

      • Limited benefit

    • Analog hardware techniques

      • Lack successful integration with high-performance CPUs

    • Approximate programming models [EnerJ]

      • ANPU is an implementation of approximate computing



    Related Work – Hardware Neural Networks

    • Most prior work on analog neural networks

      • Small networks, not designed to be fast, old technology, targeting very specific applications

    • More recent work

      • SpiNNaker, IBM’s Cognitive Chip, ByMoore, FACETS, neuFlow

      • Goal?

      • Could be considered for NPU implementations



    Conclusion

    • Analog, fine-grained knobs balance flexibility, accuracy, and efficiency

      • Hardware / software accuracy tradeoff

      • Challenge: Watch your range!

      • Work in Progress

        • Circuit-level accuracy → application-level accuracy

        • Digital / analog boundaries and the opportunity for analog storage

      • Open questions: Noise?

      • Important with the rise of error-tolerant applications



    Questions?

    • Suggestions? Feedback?

    • [email protected]

    • Thank you!

