Introduction to Digital Signal Processors
Prof. Brian L. Evans, in collaboration with Niranjan Damera-Venkata and Magesh Valliappan
Embedded Signal Processing Laboratory, The University of Texas at Austin, Austin, TX 78712-1084
Accumulator architecture
Memory-register architecture
http://anchovy.ece.utexas.edu/
Load-store architecture
Conventional DSP Architecture (cont'd)
Data shifting
Time | Buffer contents | Next sample
n=N | x[N-K+1] x[N-K+2] ... x[N-1] x[N] | x[N+1]
n=N+1 | x[N-K+2] x[N-K+3] ... x[N] x[N+1] | x[N+2]
n=N+2 | x[N-K+3] x[N-K+4] ... x[N+1] x[N+2] | x[N+3]
Modulo addressing
Time | Buffer contents | Next sample
n=N | x[N-2] x[N-1] x[N] x[N-K+1] x[N-K+2] ... | x[N+1]
n=N+1 | x[N-2] x[N-1] x[N] x[N+1] x[N-K+2] x[N-K+3] ... | x[N+2]
n=N+2 | x[N-2] x[N-1] x[N] x[N+1] x[N+2] x[N-K+3] x[N-K+4] ... | x[N+3]
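The difference between the two buffering schemes can be sketched in a few lines of Python. This is only an illustrative model, not code for any particular processor: the names `shift_update` and `ModuloBuffer` are hypothetical, and the four-sample buffer is an arbitrary choice. Data shifting moves every stored sample on each update, whereas modulo addressing writes one location and wraps an index.

```python
def shift_update(buffer, sample):
    """Data shifting: move every element down one slot, then store the new sample."""
    for i in range(len(buffer) - 1):
        buffer[i] = buffer[i + 1]  # K-1 memory moves per new sample
    buffer[-1] = sample


class ModuloBuffer:
    """Modulo addressing: overwrite the oldest sample and advance a wrapped index."""

    def __init__(self, initial):
        self.data = list(initial)  # assumed to start with the oldest sample first
        self.oldest = 0            # index of the oldest sample

    def update(self, sample):
        self.data[self.oldest] = sample                    # a single memory write
        self.oldest = (self.oldest + 1) % len(self.data)   # wrap the pointer

    def in_time_order(self):
        """Return the contents from oldest to newest."""
        return self.data[self.oldest:] + self.data[:self.oldest]


shifted = [1, 2, 3, 4]
circular = ModuloBuffer([1, 2, 3, 4])
for new_sample in (5, 6):
    shift_update(shifted, new_sample)
    circular.update(new_sample)

print(shifted)                   # [3, 4, 5, 6]
print(circular.data)             # [5, 6, 3, 4]
print(circular.in_time_order())  # [3, 4, 5, 6]
```

Both buffers hold the same samples after each update. Only the physical layout differs, which is why the modulo scheme needs address arithmetic that wraps around the buffer length.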
Sequential (Motorola 56000): Fetch, Decode, Read, Execute
Pipelined (most conventional DSP processors): Fetch, Decode, Read, Execute
Superscalar (Pentium, MIPS): Fetch, Decode, Read, Execute
Superpipelined (CDC7600): Fetch, Decode, Read, Execute
[Figure: pipeline occupancy over time. Rows are the Fetch (F), Decode (D), Read (R), and Execute (E) stages; the entries are instructions labeled A through L.]
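As a rough illustration of why pipelining matters, the sketch below counts cycles for the four-stage Fetch/Decode/Read/Execute sequence under strong simplifying assumptions that are not taken from the slides: every stage takes exactly one cycle, there are no stalls, and the sequential machine finishes one instruction before fetching the next. The function names are hypothetical.

```python
STAGES = ("Fetch", "Decode", "Read", "Execute")


def sequential_cycles(num_instructions, num_stages=len(STAGES)):
    """Each instruction passes through all stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """A new instruction enters the pipeline every cycle once it is filled."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def occupancy(instructions, num_stages=len(STAGES)):
    """Return one row per cycle; column s holds the instruction in stage s."""
    total = pipelined_cycles(len(instructions), num_stages)
    table = []
    for cycle in range(total):
        row = []
        for stage in range(num_stages):
            index = cycle - stage  # instruction that entered `stage` cycles ago
            row.append(instructions[index] if 0 <= index < len(instructions) else "-")
        table.append(row)
    return table


program = "ABCDEFGHIJKL"
print(sequential_cycles(len(program)))  # 48
print(pipelined_cycles(len(program)))   # 15
for row in occupancy(program)[:4]:
    print(" ".join(row))
```

Under these assumptions the twelve instructions take 48 cycles sequentially and 15 cycles pipelined. Real processors fall short of the ideal figure whenever branches or data dependencies interrupt the flow.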
MAC X0,Y0,A X:(R0)+,X0 Y:(R4),Y0
MPYF *++AR0(1),*++AR1(IR0),R0
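[Figure: pipeline occupancy around a branch instruction (br). Rows are the Fetch (F), Decode (D), Read (R), and Execute (E) stages; instructions A through F precede br, instructions X, Y, and Z follow it, and some slots after br are empty.]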
TMS320C5x example
LAC #064h
SAMM AR2
NOP
LACC *
LAR AR2, DATA
LACC *
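A key factor in the numeric performance of DSPs is the provision of special hardware to perform looping.
[Figure: pipeline occupancy around a repeat instruction (rpt). Rows are the Fetch (F), Decode (D), Read (R), and Execute (E) stages; instructions A through F precede rpt, and the repeated instruction X then occupies successive slots.]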
RPT COUNT
TBLR *+
[Figure: processor block diagrams. Labels: reorder unit, load/store units, FP unit, integer unit, address unit, ALU, multiplier, registers, I/D cache, I cache, TLB, out-of-order execution, physical memory, internal memories, external memories, DMA controller.]
TLB: Translation Lookaside Buffer
DMA: Direct Memory Access
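Simplified Architecture
[Figure: block diagram. CPU: functional units .L1, .S1, .M1, .D1 with Regs (A0-A15); functional units .L2, .S2, .M2, .D2 with Regs (B0-B15); Control Regs. Memory: Program RAM, Data RAM or Cache, External Memory (Sync, Async), connected by internal address and data buses. Peripherals: DMA, Serial Port, Host Port, Boot Load, Timers, Pwr Down.]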
Addressing mode | TMS320C5x | TMS320C6x
Immediate | ADD #0FFh | add .L1 13,A1,A6
Register | (implied) | add .L1 A7,A6,A7
Direct | ADD 010h | not supported
Indirect | ADD * | ldw .L1 *A5++[8],A1
BDTImarks: Berkeley Design Technology Inc. DSP benchmark results (larger means better): http://www.bdti.com/bdtimark/results.htm
http://www.ece.utexas.edu/~bevans/courses/ee382c/lectures/processors.html
[Figure: tapped delay line built from z^-1 unit delays]
Application: FIR Filter on a TMS320C5x
Coefficients
Data
COEFFP .set 02000h ; Program mem address
X .set 037Fh ; Newest data sample
LASTAP .set 037FH ; Oldest data sample
…
LAR AR3, #LASTAP ; Point to oldest sample
RPT #127
MACD COEFFP, *- ; Multiply, accumulate, and move data
APAC ; Add the final product to the accumulator
SACH Y,1 ; Store result (note the shift)
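The arithmetic performed by the RPT/MACD loop above can be modeled in Python. This is a hedged behavioral sketch, not a cycle-accurate or bit-accurate model: it ignores fixed-point scaling and the pipelined product register, the function name `fir_step` is hypothetical, and pairing `coefficients[k]` with the sample delayed by `k` is an assumption about how the coefficients are laid out in memory.

```python
def fir_step(coefficients, delay_line, new_sample):
    """Compute one FIR output and age the delay line by one sample.

    delay_line[0] holds the newest sample and delay_line[-1] the oldest.
    """
    delay_line[0] = new_sample
    acc = 0
    # Walk from the oldest sample to the newest, as the assembly loop
    # starts at the oldest sample (LASTAP).
    for k in range(len(delay_line) - 1, -1, -1):
        acc += coefficients[k] * delay_line[k]  # multiply-accumulate
        if k + 1 < len(delay_line):
            delay_line[k + 1] = delay_line[k]   # data move to the next-older slot
    return acc


coefficients = [0.5, 0.25, 0.25]
delay_line = [0, 0, 0]
outputs = [fir_step(coefficients, delay_line, x) for x in (4, 8, 0)]
print(outputs)     # [2.0, 5.0, 3.0]
print(delay_line)  # [0, 0, 8]
```

Combining the multiply-accumulate with the data move in one pass is what lets a single repeated instruction both compute the output and prepare the delay line for the next input sample.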
Application: FIR Filter on a TMS320C62x
Coefficients
Data
Single-Cycle Loop
...
c7: ldh .D1 *A1++, A2 ; Read coefficient
 ldh .D2 *B1++, B2 ; Read data
 [B0] sub .L2 B0, 1, B0 ; Decrement counter
 [B0] B .S2 c7 ; Branch if not zero
 mpy .M1x A2, B2, A3 ; Form product
 add .L1 A4, A3, A4 ; Accumulate result
...
Ordered Dithering on a TMS320C62x
Array of thresholds:
1/8 5/8
7/8 3/8
Single-Cycle Loop
...
c7: ldb .D1 *A1++, A2 ; Read pixel
 ldb .D2 *B1++, B2 ; Read threshold
 [B0] sub .L2 B0, 1, B0 ; Decrement counter
 [B0] B .S2 c7 ; Branch if not zero
 cmpgtu .L1x A2, B2, A3 ; Compare pixel with threshold
 stb .D1 A3, *A5++ ; Store result
...
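The per-pixel operation in the loop above is a compare against a threshold. A minimal Python sketch follows, assuming pixel values normalized to the range 0 to 1 and assuming the 2x2 threshold array is tiled periodically over the image; the tiling and the function name `ordered_dither` are illustrative assumptions, since the assembly simply reads thresholds sequentially from an array.

```python
THRESHOLDS = [[1 / 8, 5 / 8],
              [7 / 8, 3 / 8]]


def ordered_dither(image):
    """Return a 0/1 halftone: 1 where the pixel exceeds its threshold."""
    period_rows = len(THRESHOLDS)
    period_cols = len(THRESHOLDS[0])
    return [
        [1 if pixel > THRESHOLDS[r % period_rows][c % period_cols] else 0
         for c, pixel in enumerate(row)]
        for r, row in enumerate(image)
    ]


flat_gray = [[0.5] * 4 for _ in range(2)]
print(ordered_dither(flat_gray))  # [[1, 0, 1, 0], [0, 1, 0, 1]]
```

A mid-gray input turns on half of the output pixels, so the average output intensity approximates the input intensity.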