Wireless Networking and Communications Group. Reducing Complexity in Signal Processing Algorithms for Communication Receiver and Image Display Software. Prof. Brian L. Evans. Seminar at the American University of Beirut. 27 July 2010. Outline. Embedded digital systems
1200M cell phones
300M PCs
100M digital cameras
100M DVD players
70M DSL modems
55M cars/light trucks
30M gaming consoles (2007)
Inexpensive with small area and volume
Predictable off-chip input/output (I/O) rates
“Low” power (TI C5504 45 mW @ 300 MHz)
Limited on-chip memory
Fixed-point arithmetic
External I/O: block data transfers to/from on-chip memory
Internal I/O: on-chip memory to CPU registers using data buses (e.g. TI C6000 processor has two 32-bit data buses)
64-bit floating-point for desktop computing (e.g. Matlab)
32-bit floating-point for pro-audio and sonar beamforming
16-bit fixed-point for speech, consumer audio, image proc.
Handles many special cases (e.g. +∞, −∞ and not a number)
Add, multiply, divide have comparable hardware complexity
Multiplication based on addition operations
Division takes 1–2 instructions per bit of accuracy
Multiplication can consume much dynamic power
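As a rough illustration of multiplication built from addition operations, the sketch below multiplies two non-negative integers with shifts and adds only. The function name is hypothetical, and this textbook scheme is an assumption for illustration, not the multiplier design of any particular processor.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers using only shifts and additions."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    product = 0
    while b:
        if b & 1:          # current multiplier bit is set
            product += a   # accumulate the shifted multiplicand
        a <<= 1            # the next multiplier bit weighs twice as much
        b >>= 1
    return product
```

Each set bit of the multiplier costs one addition, which hints at why truncating multiplicand constants can reduce switching activity.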
56%
Multiplier used in TI C64 processors
[Han, Evans & Swartzlander, 2005]
Discrete-time fixed frequency ω0 = 2π f0 / fs
Example: f0 = 1200 Hz and fs = 8000 Hz, ω0 = 3π/10
Discrete-time realization drops fs term in front of cosine
Uses double-precision floating-point arithmetic
No standard in C for internal implementation
Generally meant for high-accuracy desktop calculations
20 multiply, 30 add, 2 divide, 2 power calculations/output
y[n] = (2 cos ω0) y[n−1] − y[n−2] + x[n] − (cos ω0) x[n−1]
From inverse z-transform of z-transform of cos(ω0 n) u[n]
Impulse response gives cos(ω0 n) u[n]
2 multiplications and 3 adds per output value
Buildup in error as n increases due to feedback
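The difference equation can be checked numerically. The sketch below drives it with a unit impulse and zero initial conditions, so the output should follow cos(ω0 n). The function name is hypothetical, and double-precision floats are assumed rather than a fixed-point format.

```python
import math

def cosine_recurrence(w0: float, num_samples: int) -> list:
    """Generate cos(w0*n) from the second-order difference equation."""
    c = math.cos(w0)          # computed once, offline
    y1 = y2 = 0.0             # y[n-1] and y[n-2], zero initial conditions
    x1 = 0.0                  # x[n-1]
    out = []
    for n in range(num_samples):
        x0 = 1.0 if n == 0 else 0.0             # unit impulse input
        y0 = 2.0 * c * y1 - y2 + x0 - c * x1    # 2 multiplies, 3 adds
        out.append(y0)
        y2, y1, x1 = y1, y0, x0
    return out
```

For ω0 = 3π/10 the first outputs are 1, cos(ω0), cos(2ω0), and so on; comparing against `math.cos` over a long run shows how the feedback error accumulates.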
Discrete-time frequency ω0 = 2π f0 / fs = 2π N / L
All common factors between integers N and L removed
ω0 n = 2π k = 2π (N / L) n → n = L → store L samples
Entries in either floating-point or fixed-point format
Table would contain N periods of the cosine
Initial conditions are all zero
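A minimal sketch of the lookup-table approach, assuming integer f0 and fs so that N / L can be reduced exactly. The function names are hypothetical, and the table holds floating-point entries for simplicity.

```python
import math
from fractions import Fraction

def build_cosine_table(f0: int, fs: int) -> list:
    """Store L samples of cos(2*pi*(N/L)*n), with N/L = f0/fs in lowest terms."""
    ratio = Fraction(f0, fs)                  # removes common factors
    N, L = ratio.numerator, ratio.denominator
    return [math.cos(2.0 * math.pi * N * n / L) for n in range(L)]

def table_cosine(table: list, n: int) -> float:
    """Look up cos(w0*n) by wrapping the index modulo the table length L."""
    return table[n % len(table)]
```

For f0 = 1200 Hz and fs = 8000 Hz this gives N = 3 and L = 20, so the table has 20 entries spanning 3 periods of the cosine, and no error builds up as n grows.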
MAC: Multiplication-accumulation
RAM: Random Access Memory (writeable)
ROM: Read-Only Memory
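[Figure: tapped delay line. Input x[k] passes through unit delays z^−1 to give x[k−1], …; the taps are weighted by h[0], h[1], h[2], …, h[M−1] and summed (Σ) to give y[k]]
Discrete-time convolution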
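The tapped delay line computes y[k] = h[0] x[k] + h[1] x[k−1] + … + h[M−1] x[k−(M−1)]. A minimal sketch, assuming the input is zero before k = 0; the function name is hypothetical.

```python
def fir_filter(h, x):
    """Discrete-time convolution of coefficients h with input x (zero initial state)."""
    delay_line = [0.0] * len(h)       # holds x[k], x[k-1], ..., x[k-(M-1)]
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift in the newest sample
        acc = 0.0
        for coeff, tap in zip(h, delay_line):
            acc += coeff * tap                    # one multiply-accumulate per tap
        y.append(acc)
    return y
```

Feeding in a unit impulse returns the coefficients themselves, and a two-tap averaging filter `[0.5, 0.5]` smooths `[2, 4, 6]` into `[1.0, 3.0, 5.0]`.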
Discrete-Time Filters
Biquad building block: 2 poles and 0–2 zeros
[Figure: biquad block diagram with input x[k], output y[k], two unit delays holding v[k−1] and v[k−2], feedback coefficients a1, a2 and feedforward coefficients b0, b1, b2]
Generally, coefficients a1, a2, b0, b1, b2 are real-valued
Biquad is short for biquadratic: the transfer function is a ratio of two quadratic polynomials
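A minimal sketch of one biquad section using the intermediate signal v[k] and two unit delays. The sign convention is an assumption (feedback terms are added, so the denominator is 1 − a1 z^−1 − a2 z^−2); other texts negate a1 and a2. The function name is hypothetical.

```python
def biquad(b, a, x):
    """Filter x through one biquad with b = (b0, b1, b2) and a = (a1, a2).

    Assumed convention:
        v[k] = x[k] + a1*v[k-1] + a2*v[k-2]
        y[k] = b0*v[k] + b1*v[k-1] + b2*v[k-2]
    """
    b0, b1, b2 = b
    a1, a2 = a
    v1 = v2 = 0.0                     # v[k-1] and v[k-2], zero initial state
    y = []
    for xk in x:
        v0 = xk + a1 * v1 + a2 * v2   # feedback (pole) section
        y.append(b0 * v0 + b1 * v1 + b2 * v2)   # feedforward (zero) section
        v2, v1 = v1, v0
    return y
```

Under this convention, b = (1, −cos ω0, 0) and a = (2 cos ω0, −1) give an impulse response of cos(ω0 n), so a cosine generator is one instance of the building block.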
(1) For same piecewise constant magnitude specification
(2) Algorithm to estimate minimum order for Parks-McClellan algorithm by Kaiser may be off by 10%. Search for minimum order is often needed.
(3) Algorithms can tune design to implementation target to minimize risk
Polynomial deflation (rooting) reliable in floating-point
Polynomial inflation (expansion) may degrade roots
Direct form IIR structures expand zeros and poles, and may become unstable for large order filters (order > 12)
Cascade of biquads expands zeros and poles in each biquad
Efficiency depends on target implementation
Consider power-of-two coefficient design
Efficient designs may require search of an infinite design space
Nonlinear distortion, e.g. amplitude nonlinearities
Linear distortion, e.g. convolution by channel impulse response
Additive noise, e.g. thermal (Gaussian) and impulsive
Spreading/attenuation in time
Magnitude/phase distortion in frequency
[Figure: message bit stream → transmitter → channel → receiver (with equalizer) → received bit stream]
Baseband transmission based on fast Fourier transform (FFT)
Each subchannel carries single-carrier transmission
Standardized for digital subscriber line (DSL) communication
[Figure: magnitude versus frequency, with the channel divided into subchannels, each with its own carrier]
Subchannels are 4.3 kHz wide in DSL systems
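A rough sketch of FFT-based baseband multicarrier transmission over an ideal channel, assuming one complex symbol per subchannel and conjugate-symmetric loading so that the time-domain block is real-valued. A direct O(N²) DFT stands in for the FFT to keep the example self-contained; all function names are hypothetical, and practical details such as the cyclic prefix are omitted.

```python
import cmath

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling); stands in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def dmt_modulate(symbols):
    """Map one complex symbol per subchannel to a real-valued time-domain block."""
    half = len(symbols) + 1
    spectrum = [0j] * (2 * half)               # DC and Nyquist bins stay empty
    for k, s in enumerate(symbols, start=1):
        spectrum[k] = s
        spectrum[2 * half - k] = s.conjugate() # mirror image makes the block real
    return [v.real for v in dft(spectrum, inverse=True)]

def dmt_demodulate(block, num_subchannels):
    """Recover the subchannel symbols from a received time-domain block."""
    return dft(block)[1:num_subchannels + 1]
```

With three subchannels the block has eight real samples, and demodulating an undistorted block returns the transmitted symbols; a real channel would add the distortion that the equalizer has to undo.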
[Figure: channel with additive noise nk followed by equalizer]
Equalizer
Shortens channel impulse response (time domain eq.)
Compensates phase/magnitude distortion (freq. domain eq.)
FIR filter w performs time and frequency domain equalization
Time domain equalizer (w) then FFT & freq. domain equalizer
[Figure: Discretized Baseband System. Labels: training signal xk, channel h, additive noise, received signal yk, equalizer w, equalizer output rk, ideal channel (z, g), error ek. Receiver generates xk]
Equalization in DSL receivers increases bit rate by 10x
Minimize energy leakage outside shortened channel length
For each position of window [Melsa, Younce & Rohrs, 1996]
Computationally intensive: O(Lw³)
Floating-point multiplications/divisions
Restricts TEQ length to be less than ν+1
[Figure: channel impulse response and effective channel impulse response, with a window of ν+1 samples]
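To make the objective concrete, the sketch below evaluates energy leakage for a given equalizer rather than designing one: it convolves a channel h with equalizer taps w and slides a window over the effective channel to find the position with the least energy outside it. The function names and the `window_len` parameter (the shortened channel length in samples) are hypothetical.

```python
def convolve(h, w):
    """Effective channel impulse response: h convolved with the equalizer w."""
    out = [0.0] * (len(h) + len(w) - 1)
    for i, hi in enumerate(h):
        for j, wj in enumerate(w):
            out[i + j] += hi * wj
    return out

def least_leakage_window(h, w, window_len):
    """Return (delay, outside_energy) for the window position with least leakage."""
    c = convolve(h, w)
    total = sum(v * v for v in c)
    best = None
    for delay in range(len(c) - window_len + 1):
        inside = sum(v * v for v in c[delay:delay + window_len])
        outside = total - inside          # energy leaking outside the window
        if best is None or outside < best[1]:
            best = (delay, outside)
    return best
```

A design method would search over w as well as over the window position; this sketch only scores one candidate w, which is the inner step of such a search.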
[Figure: bit rate (Mbps) for a TEQ length of 17, plotted against training complexity in log10(multiply-add operations)]
Data rates averaged over eight standard DSL test lines
Most efficient floating-point versions of algorithms used
[Martin et al., 2006]
A and B are square (Lw × Lw) and depend on choice of delay Δ
Constraint prevents trivial non-practical solution w = 0
Formulation
Iterative methods (division-free): power method, alternating Lagrangian
20 iterations to converge for 17-tap MSSNR TEQ design
Digital Image Halftoning
Grayscale: 8-bit image to 1-bit image
Color: 24-bit RGB image to 12-bit RGB display
[Figure: threshold at mid-gray, input x(m), output b(m)]
Each pixel in original image is 8-bit unsigned intensity in [0, 255]
For display, 0 is black and 255 is white
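The simplest halftoning method, thresholding at mid-gray, can be sketched as follows. The image is assumed to be a list of rows of 8-bit intensities, and treating 128 itself as white is an assumption; the function name is hypothetical.

```python
def threshold_halftone(image, threshold=128):
    """Map each 8-bit pixel to black (0) or white (255) by a fixed threshold."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]
```

Every pixel is decided independently, so smooth gradients collapse into solid black and white regions; the feedback-based methods aim to avoid this.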
Feedback quantization error
For constant input 1001 = 9
Average output value
¼ (10 + 10 + 10 + 11) = 10.01, i.e. 1001 on the 4-bit scale
4-bit resolution at DC!
Noise shaping
Truncating from 4 to 2 bits increases noise by ~12 dB
Feedback removes noise at DC & increases HF noise
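[Figure: 4-bit input signal words enter an adder; the upper 2 bits of the sum go to the display device, and the lower 2 bits feed back to the adder through a 1-sample delay]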
Quantization with Feedback
Time   Adder input (upper)   Adder input (lower)   Sum    Output to display
1      1001                  00                    1001   10
2      1001                  01                    1010   10
3      1001                  10                    1011   10
4      1001                  11                    1100   11
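The table can be reproduced with a few lines of integer arithmetic. This is a minimal sketch under the stated wordlengths (4 bits in, 2 bits out); the function name is hypothetical, and overflow of the sum for large inputs is ignored.

```python
def feedback_quantize(value, num_steps, in_bits=4, out_bits=2):
    """Truncate to the upper bits and feed the discarded lower bits back."""
    drop = in_bits - out_bits
    mask = (1 << drop) - 1
    lower = 0                          # delayed feedback term, initially zero
    outputs = []
    for _ in range(num_steps):
        total = value + lower          # adder: input word plus fed-back error
        outputs.append(total >> drop)  # upper bits go to the display
        lower = total & mask           # lower bits are delayed by one sample
    return outputs
```

For the constant input 9 (binary 1001), four steps give outputs 2, 2, 2, 3 (binary 10, 10, 10, 11), whose average 2.25 equals 9/4, so the 2-bit output stream carries the full 4-bit value at DC.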
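[Figure: noise-shaped spectrum versus frequency f (periodic), with 12 dB (2 bits) of added noise]
Error Diffusion Halftoning
[Figure: error diffusion block diagram with input x(m), quantizer (threshold) input u(m), output b(m) and error e(m) formed as a difference; panels show the halftone and the halftone spectrum]
Compute error, then shape error with error filter weights
Error filter weights relative to current pixel: 7/16, 3/16, 5/16, 1/16
[Floyd & Steinberg, 1976]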
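A minimal sketch of error diffusion with the Floyd-Steinberg weights, assuming a raster scan and the usual placement of the weights (7/16 to the right, then 3/16, 5/16 and 1/16 to the below-left, below and below-right neighbors). Error that would fall outside the image is discarded, and the function name is hypothetical.

```python
def floyd_steinberg(image):
    """Halftone an 8-bit grayscale image (list of rows) to values 0 or 255."""
    rows, cols = len(image), len(image[0])
    u = [[float(p) for p in row] for row in image]   # quantizer input u(m)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = 255 if u[r][c] >= 128 else 0   # threshold at mid-gray
            err = u[r][c] - out[r][c]                  # quantization error
            # Diffuse the error onto pixels that have not been visited yet.
            if c + 1 < cols:
                u[r][c + 1] += err * 7 / 16
            if r + 1 < rows:
                if c > 0:
                    u[r + 1][c - 1] += err * 3 / 16
                u[r + 1][c] += err * 5 / 16
                if c + 1 < cols:
                    u[r + 1][c + 1] += err * 1 / 16
    return out
```

Here the error is taken as input minus output and added to the neighbors, which is equivalent to subtracting the filtered output-minus-input error. Because the weights sum to one, the local average of the halftone tracks the local average of the original.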
Thresholds input to black (0) or white (255)
Flip quantized value about mid-gray (128)
Reduces false textures in mid-grays
Implemented with two comparisons
[Figure: quantizer characteristic DBF(x) versus x, with output level 255 and breakpoints x1, 128, x2]
Signal transfer function models sharpening
Ks ≈ 2 for Floyd-Steinberg
Pass low and enhance high frequencies
Noise transfer function models noise-shaping
Kn = 1
Pass high frequency noise
[Figure: plots versus frequency ω for ideal lowpass H(ω) with cutoff ω1: signal transfer function (levels 1 and 2, Ks = 2) and noise transfer function (Kn = 1)]
Scale image by gain L and add it to quantizer input
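[Figure: modified error diffusion block diagram with gain L applied to input x(m) and added to the quantizer input u(m); output b(m), error e(m)]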
Decrease data sizes to reduce on-chip memory usage and increase data bus efficiency
Truncate multiplicand constants to reduce power
Keep offline design results in full precision until end
Order of calculations matters in implementation
Exploit problem structure in developing fixed-point algorithms
Linearize nonlinear systems to leverage linear system methods
Tomorrow (Wednesday) 1:30 – 2:30 pm in this room (RCR)
Panelists: Prof. Zaher Dawy (AUB), Prof. Imad ElHajj (AUB) and Prof. Brian Evans (UT Austin)
Early October 2011
Short walk from the AUB campus
Organizers include Prof. Magdy Bayoumi (Univ. of Louisiana at Lafayette), Prof. Brian Evans (UT Austin), Dean Ibrahim Hajj (AUB) and Prof. Mohammad Mansour (AUB)
Digital Signal Processors
DSP Processor Market
~1/3 of $25B embedded digital signal processing market
2007 sales of Pfizer's cholesterol-lowering Lipitor: $13B
Source: Forward Concepts
Periodic application leads to aliasing (gridding effect)
Clustered dot screening is more resistant to ink spread
Dispersed dot screening has higher spatial resolution
Blue noise masks are larger (e.g. 1” by 1”)
[Figure: clustered dot mask and dispersed dot mask; threshold lookup table addressed by index]
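Screening compares each pixel against a threshold read from a small mask that is tiled periodically over the image. The sketch below assumes the classic 4×4 Bayer matrix as an example of a dispersed dot mask; that choice, the threshold scaling and the function name are assumptions for illustration, not the masks shown in the talk.

```python
# Classic 4x4 Bayer ordering, used here as an assumed dispersed dot mask.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def screen_halftone(image, mask=BAYER_4X4):
    """Halftone by comparing each pixel with a periodically tiled threshold mask."""
    size = len(mask)
    levels = size * size
    out = []
    for r, row in enumerate(image):
        out_row = []
        for c, pixel in enumerate(row):
            # Threshold lookup: index the mask by pixel position modulo its size.
            threshold = (mask[r % size][c % size] + 0.5) * 256 / levels
            out_row.append(255 if pixel >= threshold else 0)
        out.append(out_row)
    return out
```

A constant mid-gray tile of value 128 turns on 8 of the 16 pixels, and value 64 turns on 4, so the fraction of white dots tracks the gray level. The periodic tiling is also the source of the gridding effect.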
Linear gain model for quantizer in 1D [Ardalan and Paulos, 1988]
Linear gain model for grayscale image [Kite, Evans, Bovik, 1997]
Signal transfer function (STF): quantizer acts as scalar gain
Noise transfer function (NTF): quantizer acts as additive noise
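Under the linear gain model, and assuming the error diffusion loop u = x − h∗e with e = b − u, replacing the quantizer by a gain Ks gives a signal transfer function Ks / (1 + (Ks − 1) H), and replacing it by additive noise (gain 1) gives a noise transfer function 1 − H. The sketch below evaluates both at a single frequency; the function names are hypothetical, and H is the error filter response at that frequency.

```python
def signal_transfer(Ks, H):
    """STF of error diffusion when the quantizer is modeled as a gain Ks."""
    return Ks / (1 + (Ks - 1) * H)

def noise_transfer(H):
    """NTF of error diffusion when the quantizer adds noise with unit gain."""
    return 1 - H
```

For an ideal lowpass error filter, H = 1 in the passband gives STF = 1 and NTF = 0, while H = 0 in the stopband gives STF = Ks and NTF = 1. With Ks = 2 this means high frequencies of the image are boosted by a factor of two, which models the sharpening, and the quantization noise passes only at high frequencies.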
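[Figure: linear gain model of the quantizer with input u(m) and output b(m). Signal path: input us(m), output Ks us(m). Noise path: input un(m), additive noise n(m), output un(m) + n(m)]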
[Figures: Spatial Domain and Magnitude Spectra. Panels: original image, threshold at mid-gray, dispersed dot screening, clustered dot screening, Stucki error diffusion, Floyd-Steinberg error diffusion]
Bandpass: non-dim backgrounds [Mannos & Sakrison, 1974; 1978]
Lowpass: high-luminance office settings with low-contrast images [Georgeson & G. Sullivan, 1975]
Exponential decay [Näsänen, 1984]
Modified lowpass version [e.g. J. Sullivan, Ray & Miller, 1990]
Angular dependence: cosine function [Sullivan, Miller & Pios, 1993]
Analysis and Modeling
Image      Floyd   Stucki   Jarvis
barbara    2.01    3.62     3.76
boats      1.98    4.28     4.93
lena       2.09    4.49     5.32
mandrill   2.03    3.38     3.45
Average    2.03    3.94     4.37
Linear Gain Model for Quantizer
Stable for Floyd-Steinberg
Can use average value to estimate Ks from only error filter
Value of Ks: Floyd-Steinberg < Stucki < Jarvis
Floyd-Steinberg > Stucki > Jarvis at all viewing distances