INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs) Accumulator architecture Memoryregister architecture Prof. Brian L. Evans Contributions by Niranjan DameraVenkata and Magesh Valliappan Embedded Signal Processing Laboratory The University of Texas at Austin Austin, TX 787121084
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Accumulator architecture
Memoryregister architecture
Prof. Brian L. Evans
Contributions byNiranjan DameraVenkata andMagesh Valliappan
Embedded Signal Processing LaboratoryThe University of Texas at AustinAustin, TX 787121084
http://signal.ece.utexas.edu/
Loadstore architecture
Single DSP
Multiple DSPs
Data Shifting Using a Linear Buffer
Time
Buffer contents
Next sample
xNK+1
xN1
xN+1
xNK+2
xN
n=N
xNK+2
xN
xNK+3
xN+1
n=N+1
xN+2
xNK+3
xN+1
xNK+4
xN+2
n=N+2
xN+3
Conventional DSP Architecture (con’t)
Modulo Addressing Using a Circular Buffer
Time
Next sample
Buffer contents
n=N
xN2
xNK+1
xN
xN+1
xN1
xNK+2
xN+2
xN2
xN+1
xN
xN
xNK+3
xN1
xNK+2
n=N+1
xN2
xN+1
xN
xN1
xN
xN+2
xNK+3
xNK+4
xNK+4
n=N+2
xN+3
DSP Market
Fixedpoint 95%Floatingpoint 5%
Sequential(Motorola 56000)
Fetch
Decode
Read
Execute
Pipelined(Most conventional DSPs)
Fetch
Decode
Read
Execute
Superscalar(Pentium, MIPS)
Fetch
Decode
Read
Execute
Superpipelined(TMS320C6000)
Fetch
Decode
Read
Execute
Decode
Read
Execute
F
D
R
E
C
D
E
F
G
H
I
J
K

L
D
E
F
G
H
I
J
K
L
L
B
C
D
E
F
G
H
I
J
K

L
A
B
C
D
E
F
G
H
I
J
K

L
Pipelining: OperationMAC X0,Y0,A X:(R0)+,X0 Y:(R4),Y0
MPYF *++AR0(1),*++AR1(IR0),R0
MAC means multiplicationaccumulation.
Decode
Read
Execute
F
D
R
E
D
E
F
br
G


X
Y
Y
Z
CD
E
F
br



X

Y
Z
BCD
E
F
br



X

Y
Z
ABCD
E
F
br



X

Y
Z
Pipelining: HazardsLAR AR2, ADDR ; load address reg.
LACC * ; load accumulator w/; contents of AR2
LAR: 2 cycles to update AR2 & ARP; need NOP after it
Fetch
Decode
Read
High throughput performance of DSPs is helped by onchip dedicated logic for looping (downcounters/looping registers)
Execute
F
D
R
E
D
E
F
rpt
X
X
X
X
X
X
X
X
C
D
E
F
rpt


X
X
X
X
X
B
CD
E
F
rpt


X
X
X
X
ABCD
E
F
rpt


X
X
X
; repeat TBLR inst. COUNT1 times
RPT COUNT
TBLR *+
Reorder
Load/store
Memory
FloatingPoint Unit
Integer Unit
Load/store
Load/store
Memory
Address
ALU
Multiplier
Registers
I/DCache
Physical
memory
Outof
order
TLB
TLB: Translation Lookaside Buffer
Internal
memories
I Cache
Registers
External
memories
DMA Controller
DMA: Direct Memory Access
Simplified Architecture
Program RAM
Data RAM
or Cache
Addr
Internal Buses
DMA
Serial Port
Host Port
Boot Load
Timers
Pwr Down
Data
.D1
.D2
.M1
.M2
External
Memory
Sync
Async
Regs (A0A15)
Regs (B0B15)
.L1
.L2
.S1
.S2
Control Regs
CPU
C6200 fixedpt. 150 300 MHz ADSL, printers
C6400 fixed pt. 3001,000 MHz video, wireless basestations
C6700 floating 100 225 MHz medical imaging, proaudio
300 million multiplyaccumulates/s, 1200 RISC MIPS
Onchip memory: 16 kwords program, 16 kwords data
334 million multiplyaccumulates/s, 1336 RISC MIPS
Onchip memory: 16 kwords program, 16 kwords data
External: one 133MHz 64kword, two 100MHz 1Mword
C6000 Instruction Set by Functional Unit
.S Unit
ADD NEGADDK NOTADD2 ORAND SETB SHLCLR SHREXT SSHLMV SUBMVC SUB2MVK XORMVKH ZERO
.L Unit
ABS NOTADD ORAND SADDCMPEQ SATCMPGT SSUBCMPLT SUBLMBD SUBCMV XORNEG ZERONORM
.D Unit
ADD STADDA SUBLD SUBAMV ZERONEG
.M Unit
MPY SMPYMPYH SMPYH
Other
NOP IDLE
Six of the eight functional units can perform integer add, subtract, and move operations
ArithmeticABSADDADDAADDKADD2MPYMPYHNEGSMPYSMPYHSADDSATSSUBSUBSUBASUBCSUB2ZERO
LogicalANDCMPEQCMPGTCMPLTNOTORSHLSHRSSHLXOR
DataManagementLDMVMVCMVKMVKHST
ProgramControlBIDLENOP
BitManagementCLREXTLMBDNORMSET
C6000 InstructionSet by Category
(un)signed multiplicationsaturation/packed arithmetic
C6000 vs. C5000 Addressing Modes
TI C5000
TI C6000
ADD #0FFh add .L1 13,A1,A6
(implied) add .L1 A7,A6,A7
ADD 010h not supported
ADD * ldw .L1 *A5++[8],A1
C6700 Floating Point Extensions by Unit
.S Unit
ABSDP CMPLTSP ABSSP RCPDPCMPEQDP RCPSP CMPEQSP RSARDP CMPGTDP RSQRSP CMPGTSP SPDPCMPLTDP
.L Unit
ADDDP INTSPADDSP SPINTDPINT SPTRUNCDPSP SUBDPDPTRUNC SUBSPINTDP
.M Unit
MPYDP MPYIDMPYI MPYSP
.D Unit
ADDAD LDDW
Four functional units can perform IEEE singleprecision (SP) and doubleprecision (DP) floatingpoint add, subtract, move.
Operations beginning with R are reciprocal calculations.
150 MHz clock,900 MFLOPS
4 kB/4kB of L1 program/data memory
64 kB of L2 cache
0 kB of L2 SRAM
1200 MB/s onchip data bus bandwidth
$13.50 each in volume
C6713
225 MHz clock,1350 MFLOPS
4 kB/4kB of L1 program/data memory
256 kB of L2 cache
192 kB of L2 SRAM
1800 MB/s onchip data bus bandwidth
$26.85 each in volume
C6712 vs. C6713Information as of December 14, 2003
Digital Signal Processor Cores
40% annual growth 19902000: #1 in semiconductor market
Revenue: $4.4B ‘99, $6.1B ‘00, $4.5B ‘01, $4.9B ‘02, $6B ‘03
2000: 44% TI, 23% Agere, 13% Motorola, 10% Analog Dev.
2001: 40% TI, 16% Agere, 12% Motorola, 8% Analog Dev.
2002: 43% TI, 14% Motorola, 14% Agere, 9% Analog Dev.