1 / 32

Liquid Architecture – R esults of Optimization of the LEON

John W Lockwood Visiting Associate Professor Stanford University jwlockwd@stanford.edu http://stanford.edu/~jwlockwd Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al.

Download Presentation

Liquid Architecture – R esults of Optimization of the LEON

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. John W LockwoodVisiting Associate Professor Stanford Universityjwlockwd@stanford.edu http://stanford.edu/~jwlockwd Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al. In Collaboration with: Ron Cytron, Roger Chamberlain, Jason Fritts David Schuehler, Phillip Jones, Scott Friedman, Ben Brodie, Huakai Zhang Funded by : NSF 03-13203 Washington University in St. Louis www.arl.wustl.edu/arl/projects/fpx/projects/liquid_arch/ Liquid Architecture –Results of Optimization of the LEON

  2. Soft processors App performance • Parameterized general purpose processors FPGA resources Power Number of registers Set size Set Associativity • Customization is performance-cost tradeoff • More “knobs” more options for customization

  3. Actual System Configuration

  4. Soft processor customization • LEON: 10 reconfigurable subsystems • Instruction cache • Parameters: sets, set size, line size, replacement policy • 4 * 7 * 2 * 3 = 168 configurations (4 parameters; 16 values) • Data cache • sets, set size, line size, replacement, fast read, fast write, local RAM, local RAM size • 168 * 2 * 2 * 2 * 7 = 9,408 configns (8 params; 29 values) • Integer unit • multiplier, registers, fast jump, fast decode, ICC, load delay, FPU enable, co-processor enable, hardware watchpoints • 119,040 configurations (10 parameters; 56 values) • & Floating-point unit, memory controller, peripherals,… • In total • 190 parameter values; 5*(1024) configurations!!

  5. Manual Adjustment of LEON Parameters

  6. Highlights of optimization technique • Optimize and Bound Parameterize • Search space still includes all 5*(1024) configurations • Build only 100’s instead of 5*(1024) of configurations • Formulate as binary integer nonlinear optimization program • Measure actual cost and performance of resulting application performance andprocessor area

  7. Cost measurement • Application runtime • Measured in cycles from direct execution • Hardware-based profiler providesnon-intrusive, cycle-accurate, near-real-time measurement • FPGA resources • Measured In terms as number of LUTs and BRAM, from actual build • Power / Energy : Work in Progress • FPL, FPT, VLSI conference ..

  8. Optimization technique Start with soft processor: base configuration Perturb parameter values, build configuration, track resource cost Run application on each configuration, trackruntime cost Formulate costs as Binary Integer NonlinearProgram Solve using TOMLAB/MatLab

  9. Processor ICache reconfiguration

  10. Processor ICache reconfiguration xi = 0 or 1 (off or on)

  11. Processor ICache reconfiguration xi = 0 or 1 (off or on)

  12. Processor ICache reconfiguration xi = 0 or 1 (off or on) No constraint needed

  13. FPGA resource constraints • LUTs • BRAM xi = 0 or 1 (off or on) ri,li,bi: delta costs from base configuration n is number of configurations

  14. Optimization • Optimize application runtime • Optimize resource utilization also xi = 0 or 1 (off or on) ri,li,bi: delta costs from base configuration n is number of configurations

  15. Problem formulation • Minimize • Subject to … … Parameter validity constraints FPGA resource constraints Binary variables constraint xi = 0 or 1 (off or on) ri,li,bi: delta costs from base configuration n is number of configurations

  16. Evaluation Our technique selects the same configuration Despite parameter independence assumption, near-optimal configuration

  17. FPX Module(Stacked below Enet card) Virtex w/LEON core Gigabit Ethernet Serial port Front-panelNetworkPort

  18. Choose methods to profile from the user interface Method Time / Cycles Liquid architecture: cycle-accurate profiling for free .text main addQuery findMatch computeKey computeBase coreLoop fillQuery Rnd

  19. Method Address Range Liquid architecture: cycle-accurate profiling for free .text main Lo addQuery 0x4000027C 0x400003EF Hi findMatch computeKey computeBase coreLoop fillQuery Rnd

  20. Liquid architecture: cycle-accurate profiling for free Method Event Monitor Bus PC CLK .text Stats Module main 0x4000035A Lo addQuery 0x4000027C 0x400003EF Hi findMatch computeKey computeBase coreLoop fillQuery Rnd

  21. Liquid architecture: cycle-accurate profiling for free Function Event Monitor Bus PC CLK .text Stats Module Lo main 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi addQuery Counter findMatch INCR computeKey computeBase coreLoop fillQuery Rnd

  22. Liquid architecture: cycle-accurate profiling for free Function Event Monitor Bus PC CLK .text Stats Module Lo main 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi addQuery Counter findMatch INCR computeKey computeBase Lo 0x400005D8 0x4000035A 0x4000061F ≤ ≤ Hi coreLoop fillQuery Counter INCR Rnd

  23. Cache behavior Hits and misses in LEON

  24. Cache behavior These signals are fed into the Event Monitoring Bus

  25. Cache behavior Statistics Module

  26. Cache behavior Statistics Module Statistics Module counts events

  27. Liquid architecture enables fast, accurate results Seconds: fast, but no cache performance data available

  28. Liquid architecture enables fast, accurate results Days: so slow you wouldn’t do this on the whole program

  29. Liquid architecture enables fast, accurate results ½ hour: Practical, reasonably fast, totally accurate

  30. Illustration of all configurations being searched

  31. Distribution of generated configurations

  32. Port of Liquid Architecture to RAMP • Determine optimal architecture for applications • Determine effect on performance withadditional parameters of • Interconnect bandwidth • Interconnect latency • Automate process to vary RDL parametersfor Multi-Processor system

More Related