
Hardware accelerator for PPC microprocessor

Final presentation

By: Kopitman Reem, Stolberg Dmitri

Instructor: Fiksman Evgeny

Agenda
  • Ways to implement an algorithm
  • Starting with ASC
  • HW architecture
  • SW architecture
  • System optimization
  • Generic module (iDCT)
  • Timing results
Abstract
  • Problem
    • Some complex functions (e.g. FFT) take a lot of CPU resources
    • Consider the possible ways of implementing such functions and choose the best solution for the specified constraints
  • Solutions
    • Pure SW implementation
    • Pure HW implementation
    • Combined HW + SW (ASC technology)

Abstract
  • SW
    • Low cost
    • Low performance
  • HW
    • High cost
    • High performance
  • Combined (HW + SW)
Project Goals
  • Study of ASC (A Stream Compiler)
  • Study of the functions in the PamDC library
  • Implementation of an interface between a generic module and the CPU using ASC
  • Implementation of a specific module to test the interface
  • Implementation of the same module in SW and comparison of performance
ASC - A Stream Compiler
  • Combined (SW/HW) code
  • Familiar C++ syntax
  • Generates flexible HW
  • Standard netlist output (EDIF)
  • Supported by standard CAD tools
  • Provides HW optimization
  • UNIX oriented
ASC – code example

#include "asc.h"

int main(int argc, char **argv) {
    printf("Hello World\n");

    STREAM_START;              // ASC code start

    // Hardware variable declarations
    HWint in(IN);
    HWint out(OUT);
    HWint tmp(TMP);

    STREAM_LOOP(16);
    tmp = (in << 1) + 55;
    out = tmp;

    STREAM_END;                // ASC code end
}

Software output:

Hello World

Hardware output:

55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87
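The arithmetic performed by the stream body can be checked in plain software. The following is a minimal sketch, not ASC code: `emulate_stream` is a hypothetical helper, and it assumes the input stream holds small non-negative integers (0, 1, 2, ...) fed in order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Software model of the stream body from the ASC example:
//   tmp = (in << 1) + 55;  out = tmp;
// Assumption (not from the slides): inputs are non-negative integers.
std::vector<int32_t> emulate_stream(const std::vector<int32_t>& in) {
    std::vector<int32_t> out;
    out.reserve(in.size());
    for (int32_t x : in) {
        assert(x >= 0);                    // keep the left shift well defined
        const int32_t tmp = (x << 1) + 55; // same expression as the HW stream
        out.push_back(tmp);
    }
    return out;
}
```

Under that assumption, an input of 0 maps to 55 and an input of 16 maps to 87, the first and last values in the hardware output above.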

System components
  • Memec evaluation board
    • Xilinx Virtex-II Pro FPGA with PPC405
    • JTAG
    • LCD, serial port for debug
  • SW tools
    • Xilinx EDK
    • Xilinx Platform Studio
    • ChipScope
Design Approach - general
  • FPGA module

[Block diagram. Blocks shown around the System Bus (PLB): PPC405 processor, memory (EDAC, DRAM), peripheral modules, ASC, monitor module, other peripheral]

ASC interface (General view)

[Block diagram. Blocks: Generic Module, PLB bus, interrupt controller, DMA engine, DMA buffer, SerDes, FIFO_in, FIFO_out. Signals: Data_in, Data_out, Data, Addr, CTRL, Fifo_full]

SW review – main algorithm

[Flowchart. Nodes: Start/reset; System blocks initialization (FIFO, DMA, GPIO, LCD); decision "DMA busy?" (Yes/No); Read data packets from ASC application; Write data packets to ASC application; decision "Calculation complete?" (No/Yes)]
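The control flow of the main algorithm can be sketched in C++ as a polling loop. This is only an illustration under stated assumptions: `MockAccelerator`, `run_transfer`, and the busy-poll behaviour are hypothetical stand-ins for the real DMA/FIFO drivers, the write-then-read ordering is assumed, and the mock simply reuses the `(in << 1) + 55` expression from the ASC example as its "calculation".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the DMA/FIFO path to the ASC application.
struct MockAccelerator {
    std::vector<uint32_t> fifo_out;  // results waiting to be read back
    int busy_polls = 0;              // polls that still report "busy"

    bool busy() {
        if (busy_polls > 0) { --busy_polls; return true; }
        return false;
    }
    void write_packet(const uint32_t* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            fifo_out.push_back((data[i] << 1) + 55);  // placeholder computation
        busy_polls = 2;  // pretend the transfer takes two polls
    }
    std::size_t read_packet(uint32_t* dst, std::size_t n) {
        const std::size_t got = std::min(n, fifo_out.size());
        const auto last = fifo_out.begin() + static_cast<std::ptrdiff_t>(got);
        std::copy(fifo_out.begin(), last, dst);
        fifo_out.erase(fifo_out.begin(), last);
        return got;
    }
};

// Main loop: wait while the DMA is busy, write a packet, read the result,
// and repeat until every input word has been processed.
std::vector<uint32_t> run_transfer(MockAccelerator& acc,
                                   const std::vector<uint32_t>& input,
                                   std::size_t packet_words) {
    assert(packet_words > 0);
    std::vector<uint32_t> result;
    std::size_t sent = 0;
    while (result.size() < input.size()) {        // "Calculation complete?"
        while (acc.busy()) {}                     // "DMA busy?"
        const std::size_t n = std::min(packet_words, input.size() - sent);
        acc.write_packet(input.data() + sent, n); // write data packet
        sent += n;
        while (acc.busy()) {}                     // wait for the transfer
        std::vector<uint32_t> buf(n);
        const std::size_t got = acc.read_packet(buf.data(), n);
        result.insert(result.end(), buf.begin(),
                      buf.begin() + static_cast<std::ptrdiff_t>(got));
    }
    return result;
}
```

On real hardware the busy check would read a DMA status register rather than a counter; the loop structure is the part this sketch is meant to show.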

SW review – C code fundamentals
  • DMA – control and data TX/RX functions
  • LCD – setup and data TX functions
  • Data size manipulation
  • Timer control functions
  • MASK definitions – for user-friendly, readable code
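As an illustration of how MASK definitions keep register handling readable, here is a minimal sketch. The register layout is hypothetical: the mask names and bit positions below are invented for the example and are not taken from the project or from any Xilinx header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical DMA register bits; positions are illustrative only.
constexpr uint32_t DMA_CTRL_START_MASK  = 1u << 0;   // start a transfer
constexpr uint32_t DMA_CTRL_DIR_TX_MASK = 1u << 1;   // 1 = transmit, 0 = receive
constexpr uint32_t DMA_STATUS_BUSY_MASK = 1u << 31;  // transfer in progress

// Small helpers so call sites read as intent rather than bit arithmetic.
constexpr uint32_t set_bits(uint32_t reg, uint32_t mask)   { return reg | mask; }
constexpr uint32_t clear_bits(uint32_t reg, uint32_t mask) { return reg & ~mask; }
constexpr bool     is_set(uint32_t reg, uint32_t mask)     { return (reg & mask) != 0; }
```

A call such as `is_set(status, DMA_STATUS_BUSY_MASK)` then documents itself, compared with a bare `status & 0x80000000`.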
iDCT abstract
  • Reconstructs an image or audio block from its discrete cosine transform
  • Why iDCT?
    • Complex iterative algorithm which takes a lot of CPU resources

ASC design – IDCT module
  • Discrete Cosine Transform
  • This transform is utilized in the current standards for still images (JPEG) and video compression (MPEG).
  • The principle:

Xm - matrix of discrete samples (iDCT samples)

Tm - cosine coefficient matrix

Fm - DCT matrix
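The formula relating the three matrices is not preserved in this transcript. A common orthonormal formulation, assumed here, is Fm = Tm · Xm · Tm^T for the forward transform and Xm = Tm^T · Fm · Tm for the inverse. The sketch below implements that assumed form for 8x8 blocks in plain C++; the function names are hypothetical, and this is a straightforward reference version rather than the project's SW or HW implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int N = 8;  // block size, as used for JPEG/MPEG blocks
using Block = std::array<std::array<double, N>, N>;

// Tm: orthonormal DCT-II cosine coefficient matrix (assumed form).
Block cosine_matrix() {
    const double pi = std::acos(-1.0);
    Block t{};
    for (int k = 0; k < N; ++k) {
        const double c = (k == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
        for (int n = 0; n < N; ++n)
            t[k][n] = c * std::cos((2 * n + 1) * k * pi / (2.0 * N));
    }
    return t;
}

// Forward transform: Fm = Tm * Xm * Tm^T.
Block dct(const Block& x) {
    const Block t = cosine_matrix();
    Block f{};
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v)
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    f[u][v] += t[u][i] * x[i][j] * t[v][j];
    return f;
}

// Inverse transform: Xm = Tm^T * Fm * Tm.
Block idct(const Block& f) {
    const Block t = cosine_matrix();
    Block x{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int u = 0; u < N; ++u)
                for (int v = 0; v < N; ++v)
                    x[i][j] += t[u][i] * f[u][v] * t[v][j];
    return x;
}
```

The four nested loops make the cost per block explicit, which is the kind of repeated multiply-accumulate work that motivates moving the iDCT into hardware.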

ASC design – Optimization (1)
  • ASC supports optimization for:
    • Latency
    • Throughput
    • Area
  • For large amounts of data:
    • Throughput – optimizes the calculation time

ASC design – Optimization (2)
  • Optimization…

Throughput, Area, Latency?

ASC design – Optimization (3)
  • Optimization – Area consumption
  • Absolute values refer to Xilinx Virtex II Pro XC2VP7 FPGA
ASC design – Optimization (4)
  • Optimization – Area consumption
  • Optimization for latency is the choice: best throughput and latency, with average area consumption
Clock calculations
  • Get time 1
  • Set DMA control
  • Tx / Rx data packet
  • Complete? (No / Yes)
  • Get time 2
  • Calc_time = time2 – time1
  • LCD write: data + calculation time
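The measurement scheme (take a timestamp, run the transfer to completion, take a second timestamp, subtract) can be sketched on a host machine with `std::chrono`. This is an assumption-laden stand-in: the project presumably read a hardware timer on the PPC405, and `measure_us` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Returns the elapsed time of `transfer` in microseconds:
//   time1 -> run transfer until complete -> time2 -> time2 - time1.
template <typename Transfer>
int64_t measure_us(Transfer&& transfer) {
    const auto t1 = std::chrono::steady_clock::now();  // Get time 1
    transfer();                                        // DMA setup + Tx/Rx until complete
    const auto t2 = std::chrono::steady_clock::now();  // Get time 2
    return std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
}
```

`steady_clock` is used because it is monotonic, so the difference cannot go negative if the wall clock is adjusted during the measurement.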

idct running results sw 1

SW performance

1400

1200

1000

800

Calculation Time [us]

600

400

200

0

1

2

4

8

16

17

20

24

28

30

32

38

40

44

46

48

50

54

60

64

68

72

74

78

80

84

88

90

94

96

98

104

100

108

112

116

120

Packet length (x*32)

SW performance

iDCT running results – SW (1)
  • Linear calculation time growth vs. data packet length as expected
  • in iDCT
  • Basic packet size is 32 bytes. Packet length scale is in num. of
  • basic packets
iDCT running results – SW (2)

[Chart: Exponential data increase. Y axis: log (Calculation time [us]), 1 to 100000000. X axis: log (Packet length) (x*32), 1 to 1000000 basic packets]

  • Calculation time grows exponentially as the data length increases exponentially
iDCT running results – HW (1)
  • FIFO size influence (512 bytes)
  • High calculation time vs. writing new data to FIFO
iDCT running results – HW (2)
  • FIFO size influence (512 bytes)
  • High calculation time vs. writing new data to FIFO
  • Basic packet size is 32 bytes. The packet length scale is in number of basic packets
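With a 512-byte FIFO and 32-byte basic packets, one FIFO fill holds 16 basic packets, so longer transfers need the FIFO to be refilled. The arithmetic can be sketched as follows; the two sizes come from the slides, while `fifo_fills` is a hypothetical helper that assumes the FIFO is always filled completely before the next refill.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBasicPacketBytes = 32;   // basic packet size (from the slides)
constexpr std::size_t kFifoBytes        = 512;  // FIFO size (from the slides)

// Number of times the FIFO must be filled to pass `basic_packets` packets,
// i.e. ceil(basic_packets * 32 / 512).
constexpr std::size_t fifo_fills(std::size_t basic_packets) {
    return (basic_packets * kBasicPacketBytes + kFifoBytes - 1) / kFifoBytes;
}
```

For example, 16 basic packets fit in a single fill, while 17 need a second one.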
Innovations
  • Make this generic interface hard coded and include it as part of the FPGA (IP) development package.
    • Development is reduced to C++ coding only
    • Interconnection between the PPC and the Generic Module becomes transparent
  • Make the current design faster by using separate DMA channels for read and write