Liquid architecture r esults of optimization of the leon
Download
1 / 32

Liquid Architecture – R esults of Optimization of the LEON - PowerPoint PPT Presentation


  • 36 Views
  • Uploaded on

John W Lockwood Visiting Associate Professor Stanford University [email protected] http://stanford.edu/~jwlockwd Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Liquid Architecture – R esults of Optimization of the LEON ' - ulysses-mcdaniel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Liquid architecture r esults of optimization of the leon

John W LockwoodVisiting Associate Professor

Stanford [email protected]

http://stanford.edu/~jwlockwd

Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al.

In Collaboration with: Ron Cytron, Roger Chamberlain, Jason Fritts David Schuehler, Phillip Jones, Scott Friedman, Ben Brodie, Huakai Zhang

Funded by : NSF 03-13203

Washington University in St. Louis

www.arl.wustl.edu/arl/projects/fpx/projects/liquid_arch/

Liquid Architecture –Results of Optimization of the LEON


Soft processors
Soft processors

App performance

  • Parameterized general purpose processors

FPGA resources

Power

Number of registers

Set size

Set Associativity

  • Customization is performance-cost tradeoff

  • More “knobs” more options for customization



Soft processor customization
Soft processor customization

  • LEON: 10 reconfigurable subsystems

    • Instruction cache

      • Parameters: sets, set size, line size, replacement policy

        • 4 * 7 * 2 * 3 = 168 configurations (4 parameters; 16 values)

    • Data cache

      • sets, set size, line size, replacement, fast read, fast write, local RAM, local RAM size

        • 168 * 2 * 2 * 2 * 7 = 9,408 configns (8 params; 29 values)

    • Integer unit

      • multiplier, registers, fast jump, fast decode, ICC, load delay, FPU enable, co-processor enable, hardware watchpoints

        • 119,040 configurations (10 parameters; 56 values)

    • & Floating-point unit, memory controller, peripherals,…

  • In total

    • 190 parameter values; 5*(1024) configurations!!



Highlights of optimization technique
Highlights of optimization technique

  • Optimize and Bound Parameterize

    • Search space still includes all 5*(1024) configurations

    • Build only 100’s instead of 5*(1024) of configurations

  • Formulate as binary integer nonlinear optimization program

  • Measure actual cost and performance of resulting application performance andprocessor area


  • Cost measurement
    Cost measurement

    • Application runtime

      • Measured in cycles from direct execution

        • Hardware-based profiler providesnon-intrusive, cycle-accurate, near-real-time measurement

    • FPGA resources

      • Measured In terms as number of LUTs and BRAM, from actual build

    • Power / Energy : Work in Progress

      • FPL, FPT, VLSI conference ..


    Optimization technique
    Optimization technique

    Start with soft processor:

    base configuration

    Perturb parameter values,

    build configuration, track resource cost

    Run application on each configuration,

    trackruntime cost

    Formulate costs as

    Binary Integer NonlinearProgram

    Solve using TOMLAB/MatLab



    Processor ICache reconfiguration

    xi = 0 or 1

    (off or on)


    Processor ICache reconfiguration

    xi = 0 or 1

    (off or on)


    Processor ICache reconfiguration

    xi = 0 or 1

    (off or on)

    No constraint needed


    Fpga resource constraints
    FPGA resource constraints

    • LUTs

    • BRAM

    xi = 0 or 1

    (off or on)

    ri,li,bi: delta costs from base configuration

    n is number of

    configurations


    Optimization
    Optimization

    • Optimize application runtime

    • Optimize resource utilization also

    xi = 0 or 1

    (off or on)

    ri,li,bi: delta costs from base configuration

    n is number of

    configurations


    Problem formulation
    Problem formulation

    • Minimize

    • Subject to

    Parameter validity constraints

    FPGA resource constraints

    Binary variables constraint

    xi = 0 or 1

    (off or on)

    ri,li,bi: delta costs from base configuration

    n is number of

    configurations


    Evaluation
    Evaluation

    Our technique selects the same configuration

    Despite parameter independence assumption, near-optimal configuration


    FPX Module(Stacked below Enet card)

    Virtex w/LEON core

    Gigabit

    Ethernet

    Serial

    port

    Front-panelNetworkPort


    Method

    Time /

    Cycles

    Liquid architecture: cycle-accurate profiling for free

    .text

    main

    addQuery

    findMatch

    computeKey

    computeBase

    coreLoop

    fillQuery

    Rnd


    Method

    Address

    Range

    Liquid architecture: cycle-accurate profiling for free

    .text

    main

    Lo

    addQuery

    0x4000027C

    0x400003EF

    Hi

    findMatch

    computeKey

    computeBase

    coreLoop

    fillQuery

    Rnd


    Liquid architecture: cycle-accurate profiling for free

    Method

    Event Monitor Bus

    PC

    CLK

    .text

    Stats Module

    main

    0x4000035A

    Lo

    addQuery

    0x4000027C

    0x400003EF

    Hi

    findMatch

    computeKey

    computeBase

    coreLoop

    fillQuery

    Rnd


    Liquid architecture: cycle-accurate profiling for free

    Function

    Event Monitor Bus

    PC

    CLK

    .text

    Stats Module

    Lo

    main

    0x4000027C

    0x4000035A

    0x400003EF

    Hi

    addQuery

    Counter

    findMatch

    INCR

    computeKey

    computeBase

    coreLoop

    fillQuery

    Rnd


    Liquid architecture: cycle-accurate profiling for free

    Function

    Event Monitor Bus

    PC

    CLK

    .text

    Stats Module

    Lo

    main

    0x4000027C

    0x4000035A

    0x400003EF

    Hi

    addQuery

    Counter

    findMatch

    INCR

    computeKey

    computeBase

    Lo

    0x400005D8

    0x4000035A

    0x4000061F

    Hi

    coreLoop

    fillQuery

    Counter

    INCR

    Rnd


    Cache behavior

    Hits and misses in LEON


    Cache behavior

    These signals are fed into the Event Monitoring Bus


    Cache behavior

    Statistics

    Module


    Cache behavior

    Statistics

    Module

    Statistics Module counts events


    Liquid architecture enables fast accurate results
    Liquid architecture enables fast, accurate results

    Seconds: fast, but no cache performance data available


    Liquid architecture enables fast, accurate results

    Days: so slow you wouldn’t do this on the whole program


    Liquid architecture enables fast, accurate results

    ½ hour: Practical, reasonably fast, totally accurate




    Port of liquid architecture to ramp
    Port of Liquid Architecture to RAMP

    • Determine optimal architecture for applications

    • Determine effect on performance withadditional parameters of

      • Interconnect bandwidth

      • Interconnect latency

    • Automate process to vary RDL parametersfor Multi-Processor system


    ad