liquid architecture r esults of optimization of the leon
Download
Skip this Video
Download Presentation
Liquid Architecture – R esults of Optimization of the LEON

Loading in 2 Seconds...

play fullscreen
1 / 32

Liquid Architecture – R esults of Optimization of the LEON - PowerPoint PPT Presentation


  • 38 Views
  • Uploaded on

John W Lockwood Visiting Associate Professor Stanford University [email protected] http://stanford.edu/~jwlockwd Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Liquid Architecture – R esults of Optimization of the LEON ' - ulysses-mcdaniel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
liquid architecture r esults of optimization of the leon

John W LockwoodVisiting Associate Professor

Stanford [email protected]

http://stanford.edu/~jwlockwd

Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al.

In Collaboration with: Ron Cytron, Roger Chamberlain, Jason Fritts David Schuehler, Phillip Jones, Scott Friedman, Ben Brodie, Huakai Zhang

Funded by : NSF 03-13203

Washington University in St. Louis

www.arl.wustl.edu/arl/projects/fpx/projects/liquid_arch/

Liquid Architecture –Results of Optimization of the LEON

soft processors
Soft processors

App performance

  • Parameterized general purpose processors

FPGA resources

Power

Number of registers

Set size

Set Associativity

  • Customization is performance-cost tradeoff
  • More “knobs” more options for customization
soft processor customization
Soft processor customization
  • LEON: 10 reconfigurable subsystems
    • Instruction cache
      • Parameters: sets, set size, line size, replacement policy
        • 4 * 7 * 2 * 3 = 168 configurations (4 parameters; 16 values)
    • Data cache
      • sets, set size, line size, replacement, fast read, fast write, local RAM, local RAM size
        • 168 * 2 * 2 * 2 * 7 = 9,408 configns (8 params; 29 values)
    • Integer unit
      • multiplier, registers, fast jump, fast decode, ICC, load delay, FPU enable, co-processor enable, hardware watchpoints
        • 119,040 configurations (10 parameters; 56 values)
    • & Floating-point unit, memory controller, peripherals,…
  • In total
    • 190 parameter values; 5*(1024) configurations!!
highlights of optimization technique
Highlights of optimization technique
  • Optimize and Bound Parameterize
      • Search space still includes all 5*(1024) configurations
      • Build only 100’s instead of 5*(1024) of configurations
  • Formulate as binary integer nonlinear optimization program
  • Measure actual cost and performance of resulting application performance andprocessor area
cost measurement
Cost measurement
  • Application runtime
    • Measured in cycles from direct execution
      • Hardware-based profiler providesnon-intrusive, cycle-accurate, near-real-time measurement
  • FPGA resources
    • Measured In terms as number of LUTs and BRAM, from actual build
  • Power / Energy : Work in Progress
    • FPL, FPT, VLSI conference ..
optimization technique
Optimization technique

Start with soft processor:

base configuration

Perturb parameter values,

build configuration, track resource cost

Run application on each configuration,

trackruntime cost

Formulate costs as

Binary Integer NonlinearProgram

Solve using TOMLAB/MatLab

slide10

Processor ICache reconfiguration

xi = 0 or 1

(off or on)

slide11

Processor ICache reconfiguration

xi = 0 or 1

(off or on)

slide12

Processor ICache reconfiguration

xi = 0 or 1

(off or on)

No constraint needed

fpga resource constraints
FPGA resource constraints
  • LUTs
  • BRAM

xi = 0 or 1

(off or on)

ri,li,bi: delta costs from base configuration

n is number of

configurations

optimization
Optimization
  • Optimize application runtime
  • Optimize resource utilization also

xi = 0 or 1

(off or on)

ri,li,bi: delta costs from base configuration

n is number of

configurations

problem formulation
Problem formulation
  • Minimize
  • Subject to

Parameter validity constraints

FPGA resource constraints

Binary variables constraint

xi = 0 or 1

(off or on)

ri,li,bi: delta costs from base configuration

n is number of

configurations

evaluation
Evaluation

Our technique selects the same configuration

Despite parameter independence assumption, near-optimal configuration

slide17

FPX Module(Stacked below Enet card)

Virtex w/LEON core

Gigabit

Ethernet

Serial

port

Front-panelNetworkPort

slide18

Choose methods to profile from the user interface

Method

Time /

Cycles

Liquid architecture: cycle-accurate profiling for free

.text

main

addQuery

findMatch

computeKey

computeBase

coreLoop

fillQuery

Rnd

slide19

Method

Address

Range

Liquid architecture: cycle-accurate profiling for free

.text

main

Lo

addQuery

0x4000027C

0x400003EF

Hi

findMatch

computeKey

computeBase

coreLoop

fillQuery

Rnd

slide20

Liquid architecture: cycle-accurate profiling for free

Method

Event Monitor Bus

PC

CLK

.text

Stats Module

main

0x4000035A

Lo

addQuery

0x4000027C

0x400003EF

Hi

findMatch

computeKey

computeBase

coreLoop

fillQuery

Rnd

slide21

Liquid architecture: cycle-accurate profiling for free

Function

Event Monitor Bus

PC

CLK

.text

Stats Module

Lo

main

0x4000027C

0x4000035A

0x400003EF

Hi

addQuery

Counter

findMatch

INCR

computeKey

computeBase

coreLoop

fillQuery

Rnd

slide22

Liquid architecture: cycle-accurate profiling for free

Function

Event Monitor Bus

PC

CLK

.text

Stats Module

Lo

main

0x4000027C

0x4000035A

0x400003EF

Hi

addQuery

Counter

findMatch

INCR

computeKey

computeBase

Lo

0x400005D8

0x4000035A

0x4000061F

Hi

coreLoop

fillQuery

Counter

INCR

Rnd

slide23

Cache behavior

Hits and misses in LEON

slide24

Cache behavior

These signals are fed into the Event Monitoring Bus

slide25

Cache behavior

Statistics

Module

slide26

Cache behavior

Statistics

Module

Statistics Module counts events

liquid architecture enables fast accurate results
Liquid architecture enables fast, accurate results

Seconds: fast, but no cache performance data available

slide28

Liquid architecture enables fast, accurate results

Days: so slow you wouldn’t do this on the whole program

slide29

Liquid architecture enables fast, accurate results

½ hour: Practical, reasonably fast, totally accurate

port of liquid architecture to ramp
Port of Liquid Architecture to RAMP
  • Determine optimal architecture for applications
  • Determine effect on performance withadditional parameters of
    • Interconnect bandwidth
    • Interconnect latency
  • Automate process to vary RDL parametersfor Multi-Processor system
ad