Introduction to Digital Signal Processing for Real-Time Applications

Lecture 1 Introduction

Learning Objectives • Why process signals digitally? • Definition of a real-time application. • Why use Digital Signal Processing processors? • What are the typical DSP algorithms? • Parameters to consider when choosing a DSP processor. • Programmable vs ASIC DSP. • Texas Instruments’ TMS320 family.

Why go digital? • Digital signal processing techniques are now so powerful that sometimes it is extremely difficult, if not impossible, for analogue signal processing to achieve similar performance. • Examples: • FIR filter with linear phase. • Adaptive filters.

Why go digital? • Analogue signal processing is achieved by using analogue components such as: • Resistors. • Capacitors. • Inductors. • The inherent tolerances of these components, together with temperature, voltage changes and mechanical vibrations, can dramatically affect the effectiveness of the analogue circuitry.

Why go digital? • With DSP it is easy to: • Change applications. • Correct applications. • Update applications. • Additionally DSP reduces: • Noise susceptibility. • Chip count. • Development time. • Cost. • Power consumption.

Why NOT go digital? • Some high frequency signals cannot be processed digitally, for two reasons: • Analogue to Digital Converters (ADCs) cannot work fast enough. • The application can be too complex to be performed in real-time.

Real-time processing • DSP processors have to perform tasks in real-time, so how do we define real-time? • The definition of real-time depends on the application. • Example: a 100-tap FIR filter is performed in real-time if the DSP can start and complete the following operation between two samples: $y(n) = \sum_{k=0}^{99} a_k \, x(n-k)$

Real-time processing • We can say that we have a real-time application if: • Waiting Time ≥ 0, where Sample Time = Processing Time + Waiting Time and Sample Time is the interval between samples n and n+1.
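As a rough illustration of this condition, the sketch below computes the waiting time for a 100-tap FIR filter. The function names, the 48 kHz sampling rate and the cycles-per-tap figures are assumptions chosen for the example, not values from the lecture.

```python
def waiting_time(sample_rate_hz, taps, cycles_per_tap, clock_hz):
    """Return the idle time (in seconds) left between two samples."""
    sample_time = 1.0 / sample_rate_hz                 # time between samples n and n+1
    processing_time = taps * cycles_per_tap / clock_hz  # time spent on the filter
    return sample_time - processing_time


def is_real_time(sample_rate_hz, taps, cycles_per_tap, clock_hz):
    """Real-time condition from the slide: waiting time must not be negative."""
    return waiting_time(sample_rate_hz, taps, cycles_per_tap, clock_hz) >= 0


if __name__ == "__main__":
    # Assumed figures: 48 kHz audio, 100 taps, 150 MHz clock.
    print(is_real_time(48_000, 100, 1, 150e6))    # one cycle per tap (hardware multiply)
    print(is_real_time(48_000, 100, 100, 150e6))  # a much slower, hypothetical multiply
```

With these assumed numbers the one-cycle-per-tap case leaves most of the sample period idle, while the slow case overruns it.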

Why do we need DSP processors? • Why not use a General Purpose Processor (GPP) such as a Pentium instead of a DSP processor? • What is the power consumption of a Pentium and a DSP processor? • What is the cost of a Pentium and a DSP processor?

Why do we need DSP processors? • Use a DSP processor when the following are required: • Cost saving. • Smaller size. • Low power consumption. • Processing of many “high” frequency signals in real-time. • Use a GPP processor when the following are required: • Large memory. • Advanced operating systems.

What are the typical DSP algorithms? • The Sum of Products (SOP) is the key element in most DSP algorithms:

Hardware vs. Microcode multiplication • DSP processors are optimised to perform multiplication and addition operations. • Multiplication and addition are done in hardware and in one cycle. • Example: 4-bit multiply (unsigned), 1011 x 1110.
Hardware:
    Cycle 1:  1011 x 1110 = 10011010
Microcode (shift and add):
    Cycle 1:     0000
    Cycle 2:    1011.
    Cycle 3:   1011..
    Cycle 4:  1011...
    Cycle 5:  10011010 (sum of the partial products)
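The microcode column can be reproduced with a short shift-and-add sketch. The function name and the returned list of partial products are illustrative choices, not part of the lecture.

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Unsigned multiply built from one shifted partial product per cycle."""
    partials = []
    product = 0
    for cycle in range(bits):
        bit = (multiplier >> cycle) & 1           # examine one multiplier bit per cycle
        partial = (multiplicand << cycle) if bit else 0
        partials.append(partial)
        product += partial                        # final cycle on the slide: the sum
    return product, partials


if __name__ == "__main__":
    product, partials = shift_add_multiply(0b1011, 0b1110)
    for cycle, partial in enumerate(partials, start=1):
        print(f"Cycle {cycle}: {partial:08b}")
    print(f"Result:  {product:08b}")
```

A hardware multiplier produces the same result in a single cycle, which is the point of the comparison.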

Parameters to consider when choosing a DSP processor
Parameter | TMS320C6211 (@150MHz) | TMS320C6711 (@150MHz)
Arithmetic format | 32-bit | 32-bit
Extended floating point | N/A | 64-bit
Extended Arithmetic | 40-bit | 40-bit
Performance (peak) | 1200MIPS | 1200MFLOPS
Number of hardware multipliers | 2 (16 x 16-bit) with 32-bit result | 2 (32 x 32-bit) with 32 or 64-bit result
Number of registers | 32 | 32
Internal L1 program memory cache | 32K | 32K
Internal L1 data memory cache | 32K | 32K
Internal L2 cache | 512K | 512K
• C6711 Datasheet: \Links\TMS320C6711.pdf • C6211 Datasheet: \Links\TMS320C6211.pdf

Parameters to consider when choosing a DSP processor
Parameter | TMS320C6211 (@150MHz) | TMS320C6711 (@150MHz)
I/O bandwidth: Serial Ports (number/speed) | 2 x 75Mbps | 2 x 75Mbps
DMA channels | 16 | 16
Multiprocessor support | Not inherent | Not inherent
Supply voltage | 3.3V I/O, 1.8V Core | 3.3V I/O, 1.8V Core
Power management | Yes | Yes
On-chip timers (number/width) | 2 x 32-bit | 2 x 32-bit
Cost | US$ 21.54 | US$ 21.54
Package | 256 Pin BGA | 256 Pin BGA
External memory interface controller | Yes | Yes
JTAG | Yes | Yes

Floating vs. Fixed point processors • Applications that require the following need a floating point processor: • High precision. • Wide dynamic range. • High signal-to-noise ratio. • Ease of use. • Drawbacks of floating point processors: • Higher power consumption. • Can be more expensive. • Can be slower than fixed-point counterparts and larger in size.

Floating vs. Fixed point processors • It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost. • For educational purposes, use the floating-point device (C6711) as it can support both fixed and floating point operations.

General Purpose DSP vs. DSP in ASIC • Application Specific Integrated Circuits (ASICs) are semiconductors designed for dedicated functions. • The advantages and disadvantages of using ASICs are listed below:
Advantages: • High throughput • Lower silicon area • Lower power consumption • Improved reliability • Reduction in system noise • Low overall system cost
Disadvantages: • High investment cost • Less flexibility • Long time from design to market

Texas Instruments’ TMS320 family • Different families and sub-families exist to support different markets.
C2000 (Lowest Cost, Control Systems): • Motor Control • Storage • Digital Ctrl Systems
C5000 (Efficiency, Best MIPS per Watt / Dollar / Size): • Wireless phones • Internet audio players • Digital still cameras • Modems • Telephony • VoIP
C6000 (Performance & Best Ease-of-Use): • Multi Channel and Multi Function App's • Comm Infrastructure • Wireless Base-stations • DSL • Imaging • Multi-media Servers • Video

TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $19.95. In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.
TMS320C62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipment and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems.
TMS320C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive.

C6000 Roadmap (figure: performance against time, with object code software compatibility across generations) • 1st generation: C6201, C6202, C6203, C6204, C6205, C6211, C6701, C6711, C6712, C6713. • 2nd generation (C64x™ DSP, 1.1 GHz, highest performance): C6411, C6412, C6414, C6415, C6416, DM642. • Other labels in the figure: Floating Point, Multi-core. • C62x/C64x/DM642: fixed point. C67x: floating point.

Useful Links • Selection Guide: \Links\DSP Selection Guide.pdf • Selection Guide (3Q 2004): \Links\DSP Selection Guide.pdf • Selection Guide (4Q 2004): \Links\DSP Selection Guide.pdf

Learning Objectives • Describe C6000 CPU architecture. • Introduce some basic instructions. • Describe the C6000 memory map. • Provide an overview of the peripherals.

General DSP System Block Diagram (figure) • Central Processing Unit • Internal Memory • External Memory • Peripherals • Internal buses connecting these blocks.

Implementation of Sum of Products (SOP) • It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms. So let’s write the code for this algorithm and at the same time discover the C6000 architecture. • $Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$ • Two basic operations are required for this algorithm: (1) Multiplication (2) Addition. Therefore two basic instructions are required.

Implementation of Sum of Products (SOP) • $Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$ • Two basic operations are required for this algorithm: (1) Multiplication (2) Addition. Therefore two basic instructions are required. • So let’s implement the SOP algorithm! The implementation in this module will be done in assembly.
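Before moving to assembly, the two operations can be seen in a high-level sketch. This is only a reference model in Python; the function name and the sample values are made up for the example.

```python
def sum_of_products(a, x):
    """Reference model of Y = a1*x1 + a2*x2 + ... + aN*xN."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same number of elements")
    y = 0
    for a_n, x_n in zip(a, x):
        prod = a_n * x_n   # operation (1): multiplication
        y = y + prod       # operation (2): addition
    return y


if __name__ == "__main__":
    print(sum_of_products([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6
```

Each pass through the loop body corresponds to one multiply followed by one add, which is why the assembly version needs exactly those two instruction types.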

Multiply (MPY) • $Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$ • The multiplication of a1 by x1 is done in assembly by the following instruction:
    MPY a1, x1, Y
• This instruction is performed by a multiplier unit that is called “.M”.

Multiply (.M unit) • $Y = \sum_{n=1}^{40} a_n x_n$ • The .M unit performs multiplications in hardware:
    MPY .M a1, x1, Y
• Note: a 16-bit by 16-bit multiplier provides a 32-bit result. A 32-bit by 32-bit multiplier provides a 64-bit result.
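The note on result widths can be checked numerically. The sketch below assumes signed two's complement operands (an assumption for the example) and uses helper names that are purely illustrative.

```python
def signed_range(bits):
    """Smallest and largest values of a signed two's complement number."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def product_extremes(bits):
    """Smallest and largest products of two signed operands of the given width."""
    lo, hi = signed_range(bits)
    candidates = [lo * lo, lo * hi, hi * hi]
    return min(candidates), max(candidates)


def fits(value_range, bits):
    """True if every value in value_range is representable in a signed field."""
    lo, hi = signed_range(bits)
    return lo <= value_range[0] and value_range[1] <= hi


if __name__ == "__main__":
    print(fits(product_extremes(16), 32))  # 16 x 16 product in a 32-bit result
    print(fits(product_extremes(16), 31))  # one bit fewer is not enough
    print(fits(product_extremes(32), 64))  # 32 x 32 product in a 64-bit result
```

The worst case is the most negative operand multiplied by itself, which is what forces the result to be twice the operand width.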

Addition (.?) • $Y = \sum_{n=1}^{40} a_n x_n$ • The .M unit performs the multiplication. Which unit performs the addition?
    MPY .M a1, x1, prod
    ADD .? Y, prod, Y

Add (.L unit) • $Y = \sum_{n=1}^{40} a_n x_n$ • The addition is performed by the .L unit:
    MPY .M a1, x1, prod
    ADD .L Y, prod, Y
• RISC processors such as the C6000 use registers to hold the operands, so let's change this code.

Register File - A • $Y = \sum_{n=1}^{40} a_n x_n$ • (Figure: Register File A, registers A0 to A31, each 32 bits wide, holding a1, x1, prod and Y and connected to the .M and .L units.)
    MPY .M a1, x1, prod
    ADD .L Y, prod, Y
• Let us correct this by replacing a1, x1, prod and Y with registers from Register File A.

Specifying Register Names • $Y = \sum_{n=1}^{40} a_n x_n$
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
• The registers A0, A1, A3 and A4 contain the values to be used by the instructions (A0 holds a1, A1 holds x1, A3 holds prod and A4 holds Y).

Specifying Register Names • $Y = \sum_{n=1}^{40} a_n x_n$
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
• Register File A contains 32 registers (A0 to A31), each 32 bits wide.

Data loading • Q: How do we load the operands into the registers?

Load Unit “.D” • Q: How do we load the operands into the registers? • A: The operands are loaded from data memory into the registers using the .D unit.

Load Unit “.D” • It is worth noting at this stage that the only way to access data memory is through the .D unit.

Load Instruction • Q: Which instruction(s) can be used for loading operands from the memory to the registers?

Load Instructions (LDB, LDH, LDW, LDDW) • Q: Which instruction(s) can be used for loading operands from the memory to the registers? • A: The load instructions.

Using the Load Instructions • Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. • The addresses are 32 bits wide. • (Figure: data memory drawn 16 bits wide, with row addresses 00000000, 00000002, 00000004, 00000006, 00000008, ..., FFFFFFFF.)

Using the Load Instructions • The syntax for the load instruction is:
    LD *Rn,Rm
• Where: Rn is a register that contains the address of the operand to be loaded and Rm is the destination register. • (Figure: 16-bit-wide data memory, addresses 00000000 to FFFFFFFF, holding a1, x1, prod and Y.)

Using the Load Instructions • The syntax for the load instruction is:
    LD *Rn,Rm
• The question now is how many bytes are going to be loaded into the destination register?

Using the Load Instructions • The syntax for the load instruction is:
    LD *Rn,Rm
• The answer is that it depends on the instruction you choose: • LDB: loads one byte (8-bit) • LDH: loads half word (16-bit) • LDW: loads a word (32-bit) • LDDW: loads a double word (64-bit) • Note: LD on its own does not exist.

Using the Load Instructions • The syntax for the load instruction is:
    LD *Rn,Rm
Data memory (16 bits wide):
Byte 1 | Byte 0 | Address
0xA | 0xB | 00000000
0xC | 0xD | 00000002
0x2 | 0x1 | 00000004
0x4 | 0x3 | 00000006
0x6 | 0x5 | 00000008
0x8 | 0x7 | (address not shown)
... | ... | FFFFFFFF
Example: If we assume that A5 = 0x4 then:
    (1) LDB  *A5, A7    ; gives A7 = 0x00000001
    (2) LDH  *A5, A7    ; gives A7 = 0x00000201
    (3) LDW  *A5, A7    ; gives A7 = 0x04030201
    (4) LDDW *A5, A7:A6 ; gives A7:A6 = 0x0807060504030201
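The four results in this example follow from reading 1, 2, 4 or 8 bytes starting at address 0x4 with the lowest-addressed byte as the least significant. The sketch below models that in Python. The helper names are illustrative, the last two bytes are placed at addresses 0xA and 0xB as implied by the LDDW result, and the sign extension that the real LDB and LDH perform is left out because every value here is positive.

```python
# Byte-addressable memory image taken from the slide (address 0 first).
MEMORY = bytes([0x0B, 0x0A, 0x0D, 0x0C,
                0x01, 0x02, 0x03, 0x04,
                0x05, 0x06, 0x07, 0x08])


def load(address, size):
    """Read `size` bytes starting at `address`, least significant byte first."""
    raw = MEMORY[address:address + size]
    return int.from_bytes(raw, byteorder="little")


def ldb(address):
    return load(address, 1)   # one byte


def ldh(address):
    return load(address, 2)   # half word


def ldw(address):
    return load(address, 4)   # word


def lddw(address):
    return load(address, 8)   # double word (register pair A7:A6 on the slide)


if __name__ == "__main__":
    a5 = 0x4
    print(f"LDB  -> 0x{ldb(a5):08X}")
    print(f"LDH  -> 0x{ldh(a5):08X}")
    print(f"LDW  -> 0x{ldw(a5):08X}")
    print(f"LDDW -> 0x{lddw(a5):016X}")
```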

Using the Load Instructions • The syntax for the load instruction is:
    LD *Rn,Rm
• Question: If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

Loading the Pointer Rn • The instruction MVKL will allow a move of a 16-bit constant into a register as shown below:
    MVKL .? a, A5    (‘a’ is a constant or label)
• How many bits represent a full address? 32 bits. • So why does the instruction not allow a 32-bit move? All instructions are 32 bits wide (see instruction opcode), so a 32-bit constant cannot fit in an instruction alongside the opcode and register fields.

Loading the Pointer Rn • To solve this problem another instruction is available, MVKH, which moves the upper 16 bits of a (ah) into the upper half of the register and leaves the lower half unchanged, e.g.:
    MVKH .? a, A5    (‘a’ is a constant or label)
• Finally, to move the 32-bit address to a register we can use:
    MVKL a, A5
    MVKH a, A5

Loading the Pointer Rn • Always use MVKL then MVKH. Look at the following examples, both starting from A5 = 0x87654321:
Example 1:
    MVKL 0x1234FABC, A5    ; A5 = 0xFFFFFABC (sign extension)
    MVKH 0x1234FABC, A5    ; A5 = 0x1234FABC ; OK
Example 2:
    MVKH 0x1234FABC, A5    ; A5 = 0x12344321
    MVKL 0x1234FABC, A5    ; A5 = 0xFFFFFABC ; Wrong
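The two orderings can be modelled with a few lines of Python to see why the order matters. The functions below are a behavioural sketch based only on the results given in the examples; their names and signatures are illustrative.

```python
def mvkl(constant):
    """Model of MVKL: low 16 bits of the constant, sign-extended to 32 bits."""
    low = constant & 0xFFFF
    if low & 0x8000:               # bit 15 set, so the upper half fills with ones
        return 0xFFFF0000 | low
    return low


def mvkh(constant, register):
    """Model of MVKH: replace the upper 16 bits, keep the lower 16 bits."""
    return (constant & 0xFFFF0000) | (register & 0x0000FFFF)


if __name__ == "__main__":
    a5 = 0x87654321

    # Example 1: MVKL then MVKH
    r = mvkl(0x1234FABC)
    r = mvkh(0x1234FABC, r)
    print(f"MVKL then MVKH: 0x{r:08X}")

    # Example 2: MVKH then MVKL (the sign extension destroys the upper half)
    r = mvkh(0x1234FABC, a5)
    r = mvkl(0x1234FABC)
    print(f"MVKH then MVKL: 0x{r:08X}")
```

Because MVKL rewrites all 32 bits of the register, anything MVKH wrote beforehand is lost, which is the failure seen in Example 2.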

LDH, MVKL and MVKH
    MVKL    pt1, A5
    MVKH    pt1, A5
    MVKL    pt2, A6
    MVKH    pt2, A6
    LDH .D  *A5, A0
    LDH .D  *A6, A1
    MPY .M  A0, A1, A3
    ADD .L  A4, A3, A4
• pt1 and pt2 point to some locations in the data memory. • (Figure: Register File A, with A0 holding a, A1 holding x, A3 holding prod and A4 holding Y, connected to the .M, .L and .D units and, through .D, to Data Memory.)

Creating a loop
    MVKL    pt1, A5
    MVKH    pt1, A5
    MVKL    pt2, A6
    MVKH    pt2, A6
    LDH .D  *A5, A0
    LDH .D  *A6, A1
    MPY .M  A0, A1, A3
    ADD .L  A4, A3, A4
• So far we have implemented the SOP for one tap only, i.e. Y = a1 * x1. • So let’s create a loop so that we can implement the SOP for N taps.

Creating a loop • So far we have implemented the SOP for one tap only, i.e. Y = a1 * x1. • So let’s create a loop so that we can implement the SOP for N taps. • With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.

What are the steps for creating a loop? 1. Create a label to branch to. 2. Add a branch instruction, B. 3. Create a loop counter. 4. Add an instruction to decrement the loop counter. 5. Make the branch conditional based on the value in the loop counter.
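The five steps can be mirrored in a high-level sketch before they are written in assembly. Python has no labels or branch instruction, so the `while True` / `continue` pair below only stands in for a label and a conditional B; the function and variable names are illustrative.

```python
def sop_with_counter(a, x):
    """Sum of products using an explicit down-counter, as in the five steps."""
    count = len(a)          # step 3: create a loop counter
    if count == 0:
        return 0
    index = 0               # plays the role of the two pointers
    y = 0
    while True:             # step 1: the label to branch to
        y += a[index] * x[index]   # load, multiply and add for one tap
        index += 1
        count -= 1          # step 4: decrement the loop counter
        if count != 0:      # step 5: branch only while the counter is non-zero
            continue        # step 2: the branch back to the label
        break
    return y


if __name__ == "__main__":
    print(sop_with_counter([1, 2, 3], [4, 5, 6]))
```

Counting down to zero is a natural fit for a conditional branch that tests whether a register is zero or non-zero.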
