
CPRE 583 Reconfigurable Computing, Lecture 10: Wed 9/24/2010 (High-level Acceleration Approaches)

Instructor: Dr. Phillip Jones

Reconfigurable Computing Laboratory

Iowa State University

Ames, Iowa, USA

http://class.ee.iastate.edu/cpre583/

Announcements/Reminders

- HW2: Due Wed 10/6
- Problem 2 will have a separate deadline (to be announced)

- MP2: Due Fri 10/1 (you can work in pairs)
- Make sure to read the README file in the MP2 distribution
- It contains info on how to fix a Gigabit core licensing issue in ISE

- Start thinking of class projects and forming teams
- Submit teams and project ideas: Mon 10/11 midnight
- Project proposal presentations: Wed 10/20

Projects

- Expectations
- Working system
- Write-up that can potentially be submitted to a conference
- Will use the DAC format as a write-up guideline

- 15-20 minute PowerPoint presentation

- DAC (Design Automation Conference)
- http://www2.dac.com/
- Conference papers
- Due Date: 5pm (MT) Thur 11/18/2010

- Student Design Contest
- Due Date: 5pm (MT) Wed 11/24/2010, Cash Prizes!

Projects Ideas: Relevant conferences

- Micro
- Super Computing
- HPCA
- IPDPS

- FPL
- FPT
- FCCM
- FPGA
- DAC
- ICCAD
- Reconfig
- RTSS
- RTAS
- ISCA

Initial Project Proposal Slides (5-10 slides)

- High-level plan
- Research papers related to your project idea

- Project team list: Name, Responsibility (who is project leader)
- Project idea
- Motivation (why is this interesting, useful)
- What will be the end result
- High-level picture of final product

- Break project into milestones
- Provide initial schedule: I would initially schedule aggressively so the project is complete by Thanksgiving. Issues will pop up and cause the schedule to slip.

- System block diagrams
- High-level algorithms (if any)
- Concerns
- Implementation
- Conceptual

Weekly Project Updates

- The current state of your project write-up
- Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation sections

- The current state of your final presentation
- Your initial project proposal presentation (due Wed 10/20) should make a good starting point for your final presentation

- What things are working & not working
- What roadblocks you are running into

Projects: Target Timeline

- Teams formed and idea: Mon 10/11
- Project idea in PowerPoint, 3-5 slides
- Motivation (why is this interesting, useful)
- What will be the end result
- High-level picture of final product
- Project team list: Name, Responsibility

- High-level plan/proposal: Wed 10/20
- PowerPoint, 5-10 slides
- System block diagrams
- High-level algorithms (if any)
- Concerns
- Implementation
- Conceptual
- Related research papers (if any)

Projects: Target Timeline

- Work on projects: 10/22 - 12/8
- Weekly update reports
- More information on updates will be given

- Presentations: last Wed/Fri of class
- Present / demo what is done at this point
- 15-20 minutes (depends on number of projects)

- Final write-up and software/hardware turned in: day of final (TBD)

Overview

- First 15 minutes of Google FPGA lecture
- How to run gprof
- Discuss some high-level approaches for accelerating applications.

What you should learn

- Start to get a feel for approaches for accelerating applications.

Why Use Customized Hardware?

- Great talk about the benefits of Heterogeneous Computing
- http://video.google.com/videoplay?docid=-4969729965240981475#

Profiling Applications

- Finding bottlenecks
- Profiling tools
- gprof: http://www.cs.nyu.edu/~argyle/tutorial.html
- Valgrind

Pipelining

How many ns to process 100 input vectors, assuming each LUT has a 1 ns delay?

[Figure: input vector <A,B,C,D> feeding a chain of four 4-LUTs (with DFFs) to produce the output.]

How many ns to process 100 input vectors, assuming a 1 ns clock and 1 DFF delay per output?

[Figure: the same four 4-LUT chain, now pipelined with DFFs between stages.]

Pipelining (Systolic Arrays)

Dynamic Programming

1. Start with the base case (lower-left corner)
2. Apply the formula for computing neighboring cells
3. Final result appears in the upper-right corner

[Animation across several slides: a 3x3 grid fills in step by step, bottom row 1 1 1, middle row 1 2 3, top row 1 3 6, until the final value (6) reaches the upper-right corner.]

How many ns to process if CPU can process one cell per clock (1 ns clock)?


How many ns to process if the FPGA can obtain maximum parallelism each clock (1 ns clock)?


What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix.)

Dr. James Moscola (Example)

[Figure: example RNA model for the sequence g a c c a g, drawn as a tree of nodes ROOT0 (states S0, IL1, IR2), MATP1 (MP3, ML4, MR5, D6, IL7, IR8), MATL2 (ML9, D10, IL11), and END3 (E12), with numbered sequence positions 1, 2, 3.]

Example RNA Model

[Figure: the same RNA model diagram, repeated for this slide.]

Baseline Architecture Pipeline

[Figure: baseline pipeline with one processing element per model state (S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12), grouped by node (ROOT0, MATP1, MATL2, END3); the input residues u g g c g a c a c c c stream through the pipeline.]

Processing Elements

[Figure: detail of the ML4 processing element. Scores from child states IL7, IR8, ML9, and D10 at (j=3, d=2) are each added to the corresponding transition score ML4_t(7)..ML4_t(10) and combined; an emission score ML4_e(A), ML4_e(C), ML4_e(G), or ML4_e(U), selected by the input residue xi, also contributes, giving ML4,3,3 = .22 in the example. A small table of scores (-INF, .40, .44, .30, .72, .22) is indexed by d = 0..3 and j.]

Baseline Results for Example Model

- Comparison to Infernal software
- Infernal run on Intel Xeon 2.8GHz
- Baseline architecture run on Xilinx Virtex-II 4000
- occupied 88% of logic resources
- run at 100 MHz

- Input database of 100 Million residues

- Bulk of time spent on I/O (41.434s)

Expected Speedup on Larger Models

- Speedup estimated ...
- using 100 MHz clock
- for processing database of 100 Million residues

- Speedups range from 500x to over 13,000x
- larger models with more parallelism exhibit greater speedups

Next Class

- Models of Computation (Design Patterns)

Questions/Comments/Concerns

- Write down:
- Main point of lecture
- One thing that's still not quite clear, OR
- If everything is clear, give an example of how to apply something from the lecture
