
CPRE 583 Reconfigurable Computing Lecture 10: Wed 9/24/2010 (High-level Acceleration Approaches)




Instructor: Dr. Phillip Jones

([email protected])

Reconfigurable Computing Laboratory

Iowa State University

Ames, Iowa, USA

http://class.ee.iastate.edu/cpre583/


Announcements/Reminders

  • HW2: Due Wed 10/6

    • Problem 2 will have a separate deadline (to be announced)

  • MP2: Due Fri 10/1 (you can work in pairs)

    • Make sure to read the README file in the MP2 distribution

      • Contains info on how to fix a Gigabit core licensing issue ISE has

  • Start thinking of class projects and forming teams

    • Submit teams and project ideas: Mon 10/11 midnight

    • Project proposal presentations: Wed 10/20


Projects

  • Expectations

    • Working system

    • Write-up that can potentially be submitted to a conference

      • Will use the DAC format as the write-up guideline

    • 15-20 minute PowerPoint presentation

  • DAC (Design Automation Conference)

    • http://www2.dac.com/

    • Conference papers

      • Due Date: 5pm (MT) Thur 11/18/2010

    • Student Design Contest

      • Due Date: 5pm (MT) Wed 11/24/2010, cash prizes!


Project Ideas: Relevant Conferences

  • Micro

  • Supercomputing (SC)

  • HPCA

  • IPDPS

  • FPL

  • FPT

  • FCCM

  • FPGA

  • DAC

  • ICCAD

  • Reconfig

  • RTSS

  • RTAS

  • ISCA


Initial Project Proposal Slides (5-10 slides)

  • Project team list: Name, Responsibility (who is project leader)

  • Project idea

    • Motivation (why is this interesting, useful)

    • What will be the end result

    • High-level picture of final product

  • High-level Plan

    • Break project into milestones

      • Provide an initial schedule. I would schedule aggressively at first, aiming to have the project complete by Thanksgiving, because issues will pop up and cause the schedule to slip.

    • System block diagrams

    • High-level algorithms (if any)

    • Concerns

      • Implementation

      • Conceptual

  • Research papers related to your project idea


    Weekly Project Updates

    • The current state of your project write up

      • Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section

    • The current state of your Final Presentation

      • Your initial project proposal presentation (due Wed 10/20) should make a good starting point for your final presentation

    • What is working and what is not

    • What roadblocks are you running into


    Projects: Target Timeline

    • Teams Formed and Idea: Mon 10/11

      • Project idea in PowerPoint (3-5 slides)

        • Motivation (why is this interesting, useful)

        • What will be the end result

        • High-level picture of final product

      • Project team list: Name, Responsibility

    • High-level Plan/Proposal: Wed 10/20

      • PowerPoint (5-10 slides)

        • System block diagrams

        • High-level algorithms (if any)

        • Concerns

          • Implementation

          • Conceptual

        • Related research papers (if any)


    Projects: Target Timeline

    • Work on projects: 10/22 - 12/8

      • Weekly update reports

        • More information on updates will be given

    • Presentations: Last Wed/Fri of class

      • Present / Demo what is done at this point

      • 15-20 minutes (depends on number of projects)

    • Final write up and Software/Hardware turned in: Day of final (TBD)



    Overview

    • First 15 minutes of Google FPGA lecture

    • How to run Gprof

    • Discuss some high-level approaches for accelerating applications.


    What you should learn

    • Start to get a feel for approaches for accelerating applications.


    Why Use Customized Hardware?

    • Great talk about the benefits of Heterogeneous Computing

      • http://video.google.com/videoplay?docid=-4969729965240981475#


    Profiling Applications

    • Finding bottlenecks

    • Profiling tools

      • gprof: http://www.cs.nyu.edu/~argyle/tutorial.html

      • Valgrind
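gprof and Valgrind target compiled programs. The same find-the-bottleneck workflow can be sketched in Python with the standard-library cProfile module, used here only as a stand-in (it is not one of the tools named on the slide). The function names below are hypothetical; the point is that the report, sorted by self time, should single out the function worth accelerating.

```python
import cProfile
import io
import pstats


def slow_kernel(n):
    """Deliberately expensive inner loop: the intended bottleneck."""
    total = 0
    for i in range(n):
        for j in range(200):
            total += i * j
    return total


def cheap_setup(n):
    """Inexpensive setup work that should not dominate the profile."""
    return list(range(n))


def application(n=2000):
    """Hypothetical application: a little setup, then one hot kernel."""
    data = cheap_setup(n)
    return slow_kernel(len(data))


def profile_report(func, top=3):
    """Run func under cProfile and return the top entries sorted by self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(application))
```

In a run like this, slow_kernel should appear at the top of the listing, which marks it as the candidate for hardware acceleration.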


    Pipelining

    How many ns does it take to process 100 input vectors, assuming each LUT has a 1 ns delay?

    [Figure: input vector <A,B,C,D> feeding a circuit of four 4-LUTs and four DFFs that produces the output.]

    How many ns does it take to process 100 input vectors, assuming a 1 ns clock?

    [Figure: the same circuit, annotated "1 DFF delay per output".]
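The two questions above can be worked through with a small calculation. This is a sketch that assumes the four 4-LUTs form a chain of four stages in series, that the pipelined version places a DFF after each stage, and that DFF setup and clock-to-output delays are ignored; none of these details is stated explicitly on the slide. Under those assumptions the unpipelined circuit needs 4 ns per vector (400 ns in total), while the pipelined circuit needs 4 ns to fill and then delivers one result per 1 ns clock (103 ns in total).

```python
def unpipelined_ns(num_vectors, num_stages, stage_delay_ns=1):
    """Each vector must pass through every stage before the next one starts."""
    return num_vectors * num_stages * stage_delay_ns


def pipelined_ns(num_vectors, num_stages, clock_ns=1):
    """The first result appears after num_stages clocks, then one result per clock."""
    return (num_stages + num_vectors - 1) * clock_ns


if __name__ == "__main__":
    vectors, stages = 100, 4  # stage count is an assumption read off the figure
    print("unpipelined:", unpipelined_ns(vectors, stages), "ns")
    print("pipelined:  ", pipelined_ns(vectors, stages), "ns")
```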


    Pipelining (Systolic Arrays)

    Dynamic Programming

    • 1. Start with base case

      • Lower left corner

    • 2. Formula for computing/numbering cells

    • 3. Final result in upper right corner

    [Figure: 3x3 grid filled in over successive slides. Final values, top row to bottom row:]

    1  3  6
    1  2  3
    1  1  1

    • How many ns to process if a CPU can process one cell per clock (1 ns clock)?

    • How many ns to process if an FPGA can obtain maximum parallelism each clock (1 ns clock)?

    • What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix.)
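The grid values above are consistent with each cell being the sum of its left and lower neighbours, starting from 1 in the lower left corner. That recurrence is inferred from the numbers and is not stated on the slide. Under that assumption, a CPU that computes one cell per clock needs N*N clocks, while hardware that computes every cell on an anti-diagonal in the same clock needs 2N-1 clocks. A minimal sketch (function names are hypothetical):

```python
def fill_grid(n):
    """Fill an n x n grid one cell per step (CPU model); row 0 is the bottom row."""
    grid = [[0] * n for _ in range(n)]
    steps = 0
    for r in range(n):
        for c in range(n):
            below = grid[r - 1][c] if r > 0 else 0
            left = grid[r][c - 1] if c > 0 else 0
            # Assumed recurrence: base case 1 in the lower left, else left + below.
            grid[r][c] = 1 if (r == 0 and c == 0) else below + left
            steps += 1
    return grid, steps


def wavefront_steps(n):
    """Clocks needed when all cells with the same r + c are computed together."""
    return len({r + c for r in range(n) for c in range(n)})  # equals 2n - 1


def speedup(n):
    """Ratio of sequential clocks to wavefront clocks for an n x n grid."""
    return (n * n) / wavefront_steps(n)


if __name__ == "__main__":
    grid, cpu_steps = fill_grid(3)
    print("upper right value:", grid[2][2])
    print("CPU clocks:", cpu_steps, "wavefront clocks:", wavefront_steps(3))
    print("speedup for 100x100:", round(speedup(100), 2))
```

For the 3x3 example this gives 9 ns against 5 ns at a 1 ns clock. For a 100x100 matrix it gives 10,000 ns against 199 ns, a speedup of roughly 50x, and the general formula is N*N / (2N - 1).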


    Dr. James Moscola (Example)

    [Figure: example RNA sequence (residues g, a, c, c, a, g) alongside a model with nodes ROOT0, MATP1, MATL2, END3 and states S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12.]


    Example RNA Model

    [Figure: example RNA sequence (residues g, a, c, c, a, g) alongside a model with nodes ROOT0, MATP1, MATL2, END3 and states S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12.]


    Baseline Architecture Pipeline

    [Figure: the nodes END3, MATL2, MATP1, ROOT0 and their states E12, IL11, D10, ML9, IR8, IL7, D6, MR5, ML4, MP3, IR2, IL1, S0 laid out as a pipeline, fed by a residue pipeline (residues shown: u, g, g, c, g, a, c, a, c, c, c).]


    Processing Elements

    [Figure: processing element for state ML4. A score table indexed by j and d holds the values -INF, -INF, .40, -INF, .44, .30, -INF, .30, .72, .22. The element forms the sums IL7,3,2 + ML4_t(7), IR8,3,2 + ML4_t(8), ML9,3,2 + ML4_t(9) and D10,3,2 + ML4_t(10), adds one of ML4_e(A), ML4_e(C), ML4_e(G), ML4_e(U) selected by the input residue xi, and produces ML4,3,3 = .22.]
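The figure shows the adders of a single processing element but not how the four sums are combined. The following is a minimal sketch, assuming a CYK-style recurrence in which the element takes the maximum of the (child score + transition score) sums and then adds the emission score for the input residue. The function name and every numeric value are hypothetical and are not taken from the slide.

```python
NEG_INF = float("-inf")


def pe_score(child_scores, transition_scores, emission_scores, residue):
    """Score for one table cell of one state.

    Assumption: the four sums are combined with max, as in CYK-style scoring.
    """
    best = max(c + t for c, t in zip(child_scores, transition_scores))
    return best + emission_scores[residue]


if __name__ == "__main__":
    # Hypothetical inputs standing in for the four child states of ML4.
    children = [NEG_INF, 1.0, 2.0, 0.5]       # e.g. IL7, IR8, ML9, D10 at (j=3, d=2)
    transitions = [0.5, -0.5, -1.0, 0.25]     # e.g. ML4_t(7) .. ML4_t(10)
    emissions = {"A": 0.0, "C": 0.25, "G": -0.5, "U": -1.0}  # e.g. ML4_e(x)
    print(pe_score(children, transitions, emissions, "C"))
```

Because each child term comes from a different state, all four additions can run in parallel in hardware, with the comparison and the emission add following in later pipeline stages.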


    Baseline Results for Example Model

    • Comparison to Infernal software

      • Infernal run on Intel Xeon 2.8GHz

      • Baseline architecture run on Xilinx Virtex-II 4000

        • occupied 88% of logic resources

        • run at 100 MHz

      • Input database of 100 Million residues

    • Bulk of time spent on I/O (41.434s)


    Expected Speedup on Larger Models

    • Speedup estimated ...

      • using 100 MHz clock

      • for processing database of 100 Million residues

    • Speedups range from 500x to over 13,000x

      • larger models with more parallelism exhibit greater speedups


    Distributed Memory

    [Figure: an ALU with a single cache, and a PE connected to four BRAMs.]


    Next Class

    • Models of Computation (Design Patterns)


    Questions/Comments/Concerns

    • Write down

      • Main point of lecture

      • One thing that’s still not quite clear

      • OR, if everything is clear, an example of how to apply something from lecture