
### Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms

Kyungtae Han

Ph.D. Defense

Committee Members:

Prof. Ross Baldick (Dept. of ECE)

Prof. Brian L. Evans (Dept. of ECE), advisor

Prof. Margarida F. Jacome (Dept. of ECE)

Prof. Earl E. Swartzlander (Dept. of ECE)

Prof. Robert A. van de Geijn (Dept. of CS)

Computer Engineering Curriculum Track

Dept. of Electrical and Computer Engineering

The University of Texas at Austin

May 9th, 2006

Outline

- Introduction
- Background
- Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
- Conclusion

Introduction

Implementing Digital Signal Processing Algorithms

[Figure: digital signal processing algorithms are written as a floating-point program, taken by code conversion to fixed point with uniform wordlength, and then by wordlength optimization to fixed point with optimized wordlength. The stages are shown alongside three hardware targets (floating-point processor, fixed-point processor, fixed-point ASIC), each rated for price ($) and power* on an L-to-H scale.]

ASIC: Application Specific Integrated Circuit

* Power consumption

Introduction

Transformations to Fixed Point

- Advantages
  - Lower hardware complexity
  - Lower power consumption
  - Faster processing
- Disadvantages
  - Introduces distortion due to quantization error
  - Searching for the optimum wordlength by trial and error is time-consuming
- Research goals
  - Automate transformations to fixed point
  - Control distortion vs. complexity tradeoffs

[Figure: floating-point program → code conversion → wordlength optimization → fixed point (optimized wordlength); code conversion and wordlength optimization together are labeled "Transformation".]

Outline

- Introduction
- Background
- Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
- Conclusion

Background

Fixed-Point Data Format

- Integer wordlength (IWL)
  - Number of bits assigned to integer representation
- Fractional wordlength (FWL)
  - Number of bits assigned to fraction
- Wordlength (WL)
  - WL = IWL + FWL

[Figure: a fixed-point word drawn as a sign bit S followed by data bits X; the binary point separates the integer wordlength from the fractional wordlength.]

SystemC format (www.systemc.org)

- $\pi = 3.14159\ldots_{10}$ [floating point]
- $3.140625_{10} = 011.001001_{2}$ [WL = 9; IWL = 3; FWL = 6]
- $3.141479492_{10} = 011.0010010000111_{2}$ [WL = 16; IWL = 3; FWL = 13]
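The two π examples above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the presentation: the helper name `quantize` is hypothetical, truncation (rounding toward minus infinity) is assumed because it matches both example values, and the integer wordlength is assumed to include the sign bit, as the leading 0 in `011` suggests.

```python
import math


def quantize(value, iwl, fwl):
    """Truncate `value` to a signed fixed-point grid with `iwl` integer bits
    (sign bit included, an assumption taken from the examples) and `fwl`
    fractional bits. Returns (quantized_value, bit_string)."""
    wl = iwl + fwl
    scaled = math.floor(value * (1 << fwl))          # truncate toward -infinity
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1   # two's-complement range
    scaled = max(lo, min(hi, scaled))                # saturate on overflow
    bits = format(scaled & ((1 << wl) - 1), "0{}b".format(wl))
    return scaled / (1 << fwl), bits[:iwl] + "." + bits[iwl:]


print(quantize(math.pi, 3, 6))    # (3.140625, '011.001001')
print(quantize(math.pi, 3, 13))   # (3.1414794921875, '011.0010010000111')
```

Going from 6 to 13 fractional bits shrinks the error against π from about 1e-3 to about 1e-4, which is the distortion side of the tradeoff discussed in the following slides.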

Background

Distortion vs. Complexity Tradeoffs

- Shorter wordlength may increase application distortion and decrease implementation complexity
- Minimize implementation cost
- Minimize application distortion

[Figure: application distortion d(w) plotted against implementation complexity c(w), showing the feasible region and the optimal tradeoff curve.]

Background

Wordlength Optimization Constraints

[Figure: two plots of application-specific distortion d(w) against implementation complexity c(w). One shows the distortion constraint Dmax, the other the complexity constraint Cmax.]

Enforcing both constraints bounds the search to a finite region.

Background

Wordlength Optimization

- Wordlengths of signals (variables) in a digital system are collected as a vector w
- Multiple objective optimization
- Single objective optimization

Background

Genetic Algorithm

- Evolutionary algorithm
  - Inspired by Holland 1975
  - Mimics processes of plant and animal evolution
  - Finds optimum of a complex function

[Figure: genetic algorithm cycle with the stages function evaluation, selection, mating, and mutation, operating on a gene pool, genes with measure, parental genes, child genes, and a new gene pool. From Greg Rohling's Ph.D. defense, 2004.]
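The cycle of evaluation, selection, mating, and mutation can be sketched in a few lines. The code below is an illustrative toy, not the algorithm from the cited defense: the function name `genetic_search`, the population size, the keep-the-better-half selection rule, the one-point crossover, the mutation rate, and the target vector are all assumptions made for the example.

```python
import random


def genetic_search(cost, n_genes, low, high, pop_size=20, generations=40, seed=0):
    """Minimize cost(genes) over integer vectors with a basic genetic algorithm."""
    rng = random.Random(seed)
    pool = [[rng.randint(low, high) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pool.sort(key=cost)                       # function evaluation
        parents = pool[: pop_size // 2]           # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)     # mating: one-point crossover
            cut = rng.randint(1, n_genes - 1)
            child = mom[:cut] + dad[cut:]
            if rng.random() < 0.3:                # mutation: perturb one gene
                i = rng.randrange(n_genes)
                child[i] = min(high, max(low, child[i] + rng.choice((-1, 1))))
            children.append(child)
        pool = parents + children                 # new gene pool
    return min(pool, key=cost)


# Toy cost: squared distance from a hypothetical target wordlength vector.
target = [8, 12, 10, 6]
best = genetic_search(lambda w: sum((a - b) ** 2 for a, b in zip(w, target)),
                      n_genes=4, low=2, high=16)
print(best)
```

Because the better half of each generation survives unchanged, the best cost in the pool never gets worse from one generation to the next.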

Background

Pareto Optimality

- Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Allan Schick 1970]
- Pareto optimal set is the set of nondominated solutions
  - E is dominated by C, as all objectives for C are less than the corresponding objectives for E
  - Solutions A, B, C, D are nondominated (not dominated by any solution)
- Pareto front is the boundary (tradeoff curve) that connects the solutions in the Pareto optimal set

[Figure: solutions A through I plotted on Objective 1 vs. Objective 2 axes, with separate markers for nondominated and dominated solutions and the Pareto front drawn through the nondominated ones.]
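A nondominated filter makes the definition concrete. In this sketch both objectives are minimized, and the coordinates of A through I are hypothetical: the figure's values are not recoverable, so the points were chosen only to reproduce the stated outcome (A, B, C, D nondominated; E dominated by C). The code uses the common dominance test (no worse in every objective and strictly better in at least one), which is slightly weaker than the strict wording on the slide.

```python
def dominates(p, q):
    """True if p dominates q when every objective is minimized: p is no worse
    in all objectives and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_set(points):
    """Return the sorted names of the nondominated points."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


# Hypothetical (objective 1, objective 2) coordinates, chosen only so that the
# outcome matches the slide: A, B, C, D nondominated and E dominated by C.
points = {
    "A": (1, 9), "B": (2, 6), "C": (4, 4), "D": (7, 2),
    "E": (5, 5), "F": (8, 3), "G": (3, 8), "H": (4, 7), "I": (2, 10),
}
print(pareto_set(points))   # ['A', 'B', 'C', 'D']
```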

Outline

- Introduction
- Background
- Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
- Conclusion

Contribution #1

Search for Optimum Wordlength

- Complete search
  - Search whole space
  - Impractical in systems with many variables
- Gradient-based search
  - Utilizes gradient information to determine next candidates
  - Complexity measure (CM) [Sung and Kum, 1995]
  - Distortion measure (DM) [Han et al., 2001]
  - Complexity-and-distortion measure (CDM) [Han and Evans, 2004] (proposed)
- Guided random search
  - Genetic algorithm for single objective [Leban and Tasic, 2000]
  - Multiple objective genetic algorithm (proposed)

Contribution #1

Complexity-and-Distortion Measure

- Weighted combination of measures
  - Single objective function
- Gradient-based search
  - Initialization
  - Iterative greedy search based on complexity and distortion gradient information
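A greedy search over a weighted objective can be sketched as follows. This is an illustration under stated assumptions, not the published CDM algorithm: the objective form `alpha * c_n(w) + (1 - alpha) * d_n(w)` with measures normalized to their initial values, the one-bit-at-a-time update rule, the starting point, and the toy cost and distortion models are all hypothetical.

```python
import math


def cdm_search(complexity, distortion, w_init, d_max, alpha=0.5, w_max=32):
    """Greedy search: repeatedly add one bit to the variable whose increment
    gives the smallest weighted objective, until distortion(w) <= d_max."""
    w = list(w_init)
    c_ref, d_ref = complexity(w), distortion(w)      # normalization references

    def objective(v):
        return alpha * complexity(v) / c_ref + (1 - alpha) * distortion(v) / d_ref

    simulations = 0
    while distortion(w) > d_max:
        candidates = []
        for i in range(len(w)):
            if w[i] < w_max:
                trial = w[:i] + [w[i] + 1] + w[i + 1:]
                candidates.append((objective(trial), i))
                simulations += 1                     # one evaluation per trial
        if not candidates:
            raise ValueError("distortion target not reachable within w_max")
        _, best = min(candidates)
        w[best] += 1
    return w, simulations


# Toy models (assumptions): cost grows with total bits, and each variable adds
# quantization noise that halves per extra bit, scaled by a gain.
gains = [1.0, 4.0, 0.5]
cost = lambda w: float(sum(w))
rms = lambda w: math.sqrt(sum((g * 2.0 ** -b) ** 2 for g, b in zip(gains, w)))

w_opt, n_sim = cdm_search(cost, rms, w_init=[2, 2, 2], d_max=0.01)
print(w_opt, n_sim, rms(w_opt))
```

In this toy setup the variable with the largest gain ends up with the most bits, and the number of evaluations grows linearly with the number of bits added rather than exponentially with the number of variables.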

Contribution #1

Case Study: Filter Design

- Infinite impulse response (IIR) filter
- Complexity measure: area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
- Distortion measure: root mean square (RMS) error
- Seven fixed-point variables (indicated by slashes)

[Figure: IIR filter block diagram with input x[n], output y[n], a delay element, feedforward coefficients b0 and b1, and feedback coefficient -a1.]
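The RMS distortion measure can be illustrated by running a floating-point filter and a quantized copy on the same input. The sketch assumes a first-order structure y[n] = b0·x[n] + b1·x[n-1] - a1·y[n-1], which is read off the block-diagram labels, and it applies one shared fractional wordlength to every quantized signal rather than seven independent ones. The coefficients, the test input, and the helper names are hypothetical.

```python
import math


def q(x, fwl):
    """Round x to fwl fractional bits (no overflow handling in this sketch)."""
    return round(x * (1 << fwl)) / (1 << fwl)


def iir(x, b0, b1, a1, fwl=None):
    """y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]; when fwl is given, the input,
    the coefficients, the products, and the sum are all quantized."""
    f = (lambda v: q(v, fwl)) if fwl is not None else (lambda v: v)
    b0, b1, a1 = f(b0), f(b1), f(a1)
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        xn = f(xn)
        yn = f(f(b0 * xn) + f(b1 * x_prev) - f(a1 * y_prev))
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def rms_error(ref, test):
    """Root mean square difference between two equally long sequences."""
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref))


x = [math.sin(0.1 * n) for n in range(200)]          # hypothetical test input
coeffs = dict(b0=0.1, b1=0.1, a1=-0.8)               # hypothetical coefficients
y_ref = iir(x, **coeffs)                             # floating-point reference
for fwl in (4, 8, 12):
    print(fwl, rms_error(y_ref, iir(x, fwl=fwl, **coeffs)))
```

A wordlength search would call a function like `rms_error` once per candidate wordlength vector, which is why the number of simulations is the cost that the search methods try to reduce.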

Contribution #1

Case Study: Gradient-Based Search

- CDM could lead to lower complexity and fewer simulations compared to DM and CM

* Maximum distortion measured by root mean square (RMS) error is 0.1

** $16^7$ = 268,435,456 (8.5 years at 1 second per simulation)

Contribution #1

Case Study: Genetic Algorithm

- Search Pareto optimal set (nondominated)
- Handles multiple objectives: error and area

[Figure: Pareto fronts at the 100th generation (9,000 simulations), the 250th generation (22,500 simulations), and the 500th generation (45,000 simulations).]

* Population for one generation: 90

LUT: Lookup table

Contribution #1

Case Study: Comparison

- Superpose gradient-based search (GS) results on GA results

[Figure: GA results at the 50th generation (4,500 simulations) and the 500th generation (45,000 simulations), with GS results overlaid.]

* Maximum RMS error constraints used for the gradient-based search: Dmax = {0.12, 0.1, 0.08}

- GS methods can get stuck in a local minimum
- GS methods reduce running time (CDM: 145 simulations)

Contribution #1

Comparison of Proposed Methods

Outline

- Introduction
- Background
- Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
- Conclusion

Contribution #2

Lower Power Consumption in DSP

- Minimize power dissipation because battery power and cooling are limited
- Multipliers are often a major source of dynamic power consumption in typical DSP applications
- Multi-precision multipliers can select smaller multipliers (8, 16, or 24 bits) to reduce power consumption
- Wordlength reduction to select any word size [Han, Evans, and Swartzlander 2004] (proposed)

Contribution #2

Wordlength Reduction in Multiplication

- Input data wordlength reduction
  - Fewer bits can be enough to represent the data, e.g. π × π ≈ 9
- Truncation
- Signed right shift
  - Move toward the least significant bit (LSB)
  - Sign bit extended for arithmetic right shift

[Figure: bit-level illustration of the reduced operand, with the sign bit marked.]
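Both reductions are simple bit operations on a two's-complement word. The sketch below assumes that truncation keeps the word width and zeroes the removed low-order bits, and that the signed right shift is an arithmetic shift with sign extension; the function names and the sample value are hypothetical.

```python
def to_signed(pattern, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern


def truncate(pattern, bits, n):
    """Zero the n least significant bits of a `bits`-wide word."""
    return pattern & ~((1 << n) - 1) & ((1 << bits) - 1)


def signed_right_shift(pattern, bits, n):
    """Arithmetic right shift by n: the sign bit fills the vacated positions."""
    return (to_signed(pattern, bits) >> n) & ((1 << bits) - 1)


x = 0b1011001101011101                     # a negative 16-bit sample
print(format(truncate(x, 16, 8), "016b"))            # 1011001100000000
print(format(signed_right_shift(x, 16, 8), "016b"))  # 1111111110110011
```

Truncation leaves the surviving bits in place and forces the rest to zero, while the shift moves the same bits toward the LSB and fills the upper half with copies of the sign bit.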

Contribution #2

Power Reduction via Wordlength Reduction

- Power dissipation
  - Switching power consumption
  - Static power consumption
- Switching power consumption
  - Switching activity parameter, α
  - Reduce α by wordlength reduction

What is the relationship between reduced wordlength and the switching activity parameter α in power consumption?

Contribution #2

Analytical Method

- Consider stream of data for one of the multiplicands
- Compare two adjacent numbers in stream after reduction
- Expectation of bit switching, x, with probability Px
- L-bit input data
- Truncate input data to M bits (N bits are removed)
- N-bit signed right shift in L-bit input (Y is sign bit)

[Figure: bit layouts of the L-bit input word, split into M retained bits and N removed bits, for truncation and for signed right shift with sign extension.]
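The expected number of switching bits between two adjacent words can be checked by brute force for a small wordlength. This sketch assumes adjacent inputs are independent and uniformly distributed, which is a simplification made for the example and not necessarily the model used in the presentation. With L = 8 and N = 4 it enumerates every ordered pair of input words.

```python
from itertools import product


def expected_switching(transform, bits):
    """Average number of bit positions that differ between two adjacent words,
    taken over every ordered pair of `bits`-wide input words."""
    words = [transform(p) for p in range(1 << bits)]
    total = sum(bin(a ^ b).count("1") for a, b in product(words, repeat=2))
    return total / len(words) ** 2


L, N = 8, 4                      # small sizes so full enumeration is cheap
mask = (1 << L) - 1
identity = lambda p: p
truncate = lambda p: p & ~((1 << N) - 1) & mask
shift = lambda p: ((p - (1 << L) if p >> (L - 1) else p) >> N) & mask

print(expected_switching(identity, L))   # 4.0
print(expected_switching(truncate, L))   # 2.0
print(expected_switching(shift, L))      # 4.0
```

Under this model truncation halves the expected switching (only the M = 4 retained bits can toggle), while the signed right shift leaves it unchanged because the sign-extended bits toggle together with the sign. That is in line with the summary's observation that the signed right shift gives no estimated power reduction in the Wallace multiplier.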

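Contribution #2

Analytical Method

[Figure: analytical result comparing no reduction with reduction for wordlength L = 16; the bit layout of the L-bit input word (M retained bits, N removed bits) is repeated from the previous slide.]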

Contribution #2

Dynamic Power Consumption for Wallace Multiplier (1 MHz)

[Figure: dynamic power of a 16-bit × 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA) with curves for truncating the first argument and truncating the second argument (recode, nonrecode); annotated reduction of 56%.]

Wallace multiplier used in TI 320C64 DSP

Contribution #2

Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)

[Figure: dynamic power of a 16-bit × 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA) with curves for truncating the first argument and truncating the second argument (recode, nonrecode); annotated reduction of 31% and sensitivity difference of 13%.]

Swapping could have benefit

Radix-4 modified Booth multiplier used in TI 320C62 DSP

Contribution #2

Summary of Contribution #2

- Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers
- Signed right shift exhibits no estimated power reduction in the Wallace multiplier (for any shift) and a 25% reduction in Booth multipliers (for 8-bit shift)
- Power consumption in tree-based multiplier
  - Highly depends on input data
  - Simulation of all switching activity matches analysis of switching activity in reduced multiplicands in the Wallace multiplier
- Operand swapping can reduce power consumption
  - In Booth multiplier, the non-recoded operand is 13% more sensitive in power consumption

Outline

- Introduction
- Background
- Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
- Conclusion

Contribution #3

Automating Transformations from Floating Point to Fixed Point

- Existing fixed-point tools
  - Support fixed-point simulation
  - Convert floating-point code to raw fixed-point code
  - Optimum wordlength is found manually by trial and error
- Automating transformations
  - Fully automate conversion and wordlength optimization process (proposed)

Fixed-point tools

- SNU gFix, Autoscaler
- CoWare SPW HDS
- Synopsys CoCentric
- MATLAB Fixed-point toolbox
- MATLAB Fixed-point blockset
- AccelChip DSP synthesis
- Catalytic RMS, MCS

[Figure: floating-point program → code conversion → wordlength optimization → wordlength-optimized fixed-point program.]

Contribution #3

Automatic Transformation Flow

- Code generation
  - Parse floating-point program
  - Generate a raw fixed-point program and auxiliary programs (top, objective, cost, etc.)
- Range estimation
  - Estimate range to avoid overflow (analytical/simulation)
  - Determine integer wordlength (IWL)
- Wordlength optimization
  - Optimize wordlength according to given input and error specification (analytical/simulation)
  - Determine fractional wordlength (FWL)

[Figure: code generation → range estimation → wordlength optimization.]
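Simulation-based range estimation can be sketched as: record the largest magnitude a signal takes, then pick an integer wordlength that covers it. The function name and the rule below are assumptions for illustration (signed format with the sign bit counted in the IWL, as in the π example with IWL = 3); they are not taken from the released tool.

```python
import math


def integer_wordlength(samples):
    """A signed IWL (sign bit included) large enough that the range
    [-2**(IWL-1), 2**(IWL-1)) contains every value in `samples`."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 1
    _, exponent = math.frexp(peak)     # peak = m * 2**exponent, 0.5 <= m < 1
    return max(1, exponent + 1)


print(integer_wordlength([0.5, -3.14159, 2.0]))   # 3
print(integer_wordlength([100.0, -20.0]))         # 8
```

Once the IWL is fixed this way, overflow is avoided for the observed inputs and the remaining search only has to choose the fractional wordlength.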

Contribution #3

Code Generation for Fixed-Point Program

- Adder function in MATLAB

(a) Floating-point program for adder

    function [c] = adder(a, b)
    c = 0;
    c = a + b;

(b) Raw fixed-point program (wordlengths determined by designers with trial and error)

    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);   % signed, WL = 32, FWL = 16
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;           % assign into c so it keeps its fixed-point type

(c) Converted fixed-point program for automating optimization (proposed)

    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);   % type of each variable is passed in as data
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;

fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the Fixed-Point Toolbox [S: signed, WL: wordlength, FWL: fraction length].
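The point of listing (c) is that the numeric types become an argument, so a search engine can change wordlengths without editing the program. The same idea in plain Python, as a sketch only: `NumType` is a hypothetical stand-in for a fixed-point type, and rounding to nearest with saturation is a choice made for this example rather than a statement about the toolbox's behavior.

```python
from dataclasses import dataclass


@dataclass
class NumType:
    """Stand-in for a signed fixed-point type: wordlength and fraction length."""
    wl: int
    fwl: int

    def cast(self, value):
        scaled = round(value * (1 << self.fwl))
        lo, hi = -(1 << (self.wl - 1)), (1 << (self.wl - 1)) - 1
        return max(lo, min(hi, scaled)) / (1 << self.fwl)   # saturate


def adder_fx(a, b, numtype):
    """Fixed-point adder whose types arrive as data, as in listing (c)."""
    a = numtype["a"].cast(a)
    b = numtype["b"].cast(b)
    return numtype["c"].cast(a + b)


wide = {k: NumType(32, 16) for k in "abc"}
narrow = {"a": NumType(8, 4), "b": NumType(8, 4), "c": NumType(8, 4)}
print(adder_fx(1.2345, 2.3456, wide))
print(adder_fx(1.2345, 2.3456, narrow))   # 3.625
```

The same function body serves both calls; only the `numtype` dictionary changes, which is what lets an optimizer evaluate many candidate wordlengths automatically.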

Contribution #3

Automating Transformation Environment for Wordlength Optimization

[Figure: block diagram with input data, a top program, the floating-point program, the fixed-point program, an evaluation program (objectives) with range estimation, complexity estimation, and error estimation, and a search engine (gradient-based or genetic algorithm) that produces the optimum wordlength.]

- Given floating-point program and options, auxiliary programs are automatically generated
- Given input data, optimum wordlength is searched

Contribution #3

Demo of Released Software

Conclusion

- Search for optimum wordlength
  - Gradient-based search with the complexity-and-distortion measure reduces execution time, but solutions can be trapped in a local optimum
  - Genetic algorithm can find the distortion vs. complexity tradeoff curve, but it requires longer execution time
- Reduce power consumption by data wordlength reduction of multiplicands
- Automate transformations from floating-point programs to fixed-point programs
  - Free software release is available at www.ece.utexas.edu/~bevans/projects/wordlength/converter/

End

Thank you!
