
Microblaze Performance Monitoring Engine

Alex Burns, ECE 631, April 25, 2005


Presentation Transcript


  1. Microblaze Performance Monitoring Engine Alex Burns ECE 631 April 25, 2005

  2. Outline • Project description • Microblaze performance • Xilinx tools overview • Microblaze implementation • Performance monitoring • Stream processing engine • Implementation results

  3. Description • Implement a Microblaze soft processor • Use a Virtex 2 Pro evaluation board • Connect a performance monitoring engine to relevant signals of the processor

  4. Microblaze Core Diagram (figure labels: Processor Core, Memory Interface, Memory Interface)

  5. Pipeline • 3-stage pipeline • “Most” instructions execute in 1 clock cycle • ISA supports a branch delay slot • Decreases the branch penalty from 2 clock cycles to 1 • Instruction pre-fetch buffer continues to fetch during pipeline stalls

  6. Pipeline Latency • Branch instructions: 1 cycle (if branch is not taken), 2 cycles (taken with delay slot), 3 cycles (taken without delay slot) • Division: 2 cycles (if register A = 0), 34 cycles otherwise • Multiply: 3 cycles • Any Load/Store or FSL transaction: 2 cycles • Instruction Break or Subroutine return: 2 cycles
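The latency figures on this slide can be turned into a rough cycle estimate for a micro-benchmark. The sketch below is only illustrative: the enum names and the `estimate_cycles` function are hypothetical, the pairing of each instruction class with its cycle count follows the order in which they appear on the slide, and stalls from memory or the pre-fetch buffer are ignored.

```c
#include <assert.h>

/* Instruction classes taken from the latency slide; the names are illustrative. */
typedef enum {
    OP_OTHER,                      /* "most" instructions */
    OP_BRANCH_NOT_TAKEN,
    OP_BRANCH_TAKEN_DELAY_SLOT,
    OP_BRANCH_TAKEN_NO_DELAY_SLOT,
    OP_DIVIDE_RA_ZERO,             /* division with register A = 0 */
    OP_DIVIDE,
    OP_MULTIPLY,
    OP_LOAD_STORE_FSL,
    OP_BREAK_OR_RETURN,
    OP_CLASS_COUNT
} op_class;

/* Cycles per class, as read off the slide. */
static const unsigned latency_cycles[OP_CLASS_COUNT] = {
    [OP_OTHER]                      = 1,
    [OP_BRANCH_NOT_TAKEN]           = 1,
    [OP_BRANCH_TAKEN_DELAY_SLOT]    = 2,
    [OP_BRANCH_TAKEN_NO_DELAY_SLOT] = 3,
    [OP_DIVIDE_RA_ZERO]             = 2,
    [OP_DIVIDE]                     = 34,
    [OP_MULTIPLY]                   = 3,
    [OP_LOAD_STORE_FSL]             = 2,
    [OP_BREAK_OR_RETURN]            = 2
};

/* Sum count * latency over all classes; a lower bound that ignores stalls. */
unsigned long estimate_cycles(const unsigned long counts[OP_CLASS_COUNT])
{
    assert(counts != 0);
    unsigned long total = 0;
    for (int i = 0; i < OP_CLASS_COUNT; i++) {
        total += counts[i] * latency_cycles[i];
    }
    return total;
}
```

Filling `counts` from a profile of the benchmark and calling `estimate_cycles(counts)` gives a first-order figure to compare against measured cycle counts; a large gap would point at stalls rather than instruction mix.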

  7. Performance • Soft architecture gains configurability at the cost of limited performance • Because performance is limited, software optimization offers great potential • Software algorithms must be monitored for efficiency to get the most performance from a given logic area • Logic could be added to improve performance • Designer must decide if this is necessary

  8. Xilinx Platform Studio • System Setup • System Description: create system description files

  9. System Creation 1 • Use the System Description (*.mhs) to generate a netlist • Set up the project hierarchy so that the Microblaze is a submodule of a Xilinx Project Navigator entity • Create Board Support Libraries for the given system description • Create HDL using Project Navigator to attach logic to the system description • Perform Synthesis, Mapping, Place and Route, and Bit File creation using Project Navigator

  10. System Creation 2 • Return to Platform Studio to attach compiled software to bitstream • Load bitstream to FPGA • Attach Xilinx debugger (XMD) through JTAG port and attach process to configured Microblaze within FPGA • Load compiled software into instruction memory using XMD • Cross fingers • Run code

  11. Project Goal 1 • Implement a Microblaze • Synthesized all logic • Mapped to Memec evaluation board containing a Virtex 2 Pro (XC2VP4) • Successfully ran a test program that exercises all IO and memory and displays output through the UART

  12. Performance Monitoring Plan • Detect cache parameters for a given algorithm • Monitor the instruction-side memory bus for accesses • Count accesses in a VHDL counter • Read the counter upon completion of the micro-benchmark

  13. Cache Operation • Detect if address is cacheable • If cacheable, look up the address in tag memory • If the tag matches and the valid bit is set, drive the ready signal (Cache Hit) • On a cache miss, the cache does not assert the ready signal while it waits for the OPB to fetch the data from memory
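The hit/miss decision described on this slide can be modelled in software to predict what the hardware counters should report. This is a minimal sketch under stated assumptions: a direct-mapped cache with one 32-bit word per line and 256 lines, none of which is given in the slides, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINES 256u  /* hypothetical size; the slides do not give one */

typedef struct {
    uint32_t base, high;          /* cacheable address range, inclusive */
    uint32_t tag[CACHE_LINES];    /* tag memory */
    uint8_t  valid[CACHE_LINES];  /* valid bits */
    unsigned long hits, misses, uncached;
} icache_model;

void icache_init(icache_model *c, uint32_t base, uint32_t high)
{
    assert(c != 0 && base <= high);
    memset(c, 0, sizeof *c);
    c->base = base;
    c->high = high;
}

/* Returns 1 when the cache would drive ready (hit), 0 otherwise. */
int icache_access(icache_model *c, uint32_t addr)
{
    assert(c != 0);
    if (addr < c->base || addr > c->high) {  /* not cacheable */
        c->uncached++;
        return 0;
    }
    uint32_t word  = addr >> 2;              /* 32-bit word address */
    uint32_t index = word % CACHE_LINES;
    uint32_t tag   = word / CACHE_LINES;
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    c->misses++;                             /* ready stays low during the bus fetch */
    c->tag[index]   = tag;                   /* line is refilled once the fetch completes */
    c->valid[index] = 1;
    return 0;
}
```

Replaying an instruction address trace through `icache_access` and comparing `hits` and `misses` with the hardware counter values is one way to check that the monitoring logic counts what it is meant to count.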

  14. Trace Interface The geniuses at Xilinx already thought of the need to monitor performance of the configurable Microblaze

  15. New Project Goal 2 • Use the provided performance monitoring signals to connect to a Stream Processing Engine

  16. What is a Stream • A stream is a block of instructions stored consecutively in memory and executed without branches • Example: for (i=0; i<30; i++) a += c[i]; (figure labels: Functional Blocks: Initialize, Execute, Terminate; Streams: Stream 1, Stream 2, Stream 3)
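The stream definition above suggests a simple way to detect streams from a program counter trace: a stream ends whenever the next PC is not the previous PC plus one instruction. The following is a software sketch of that idea, not the VHDL used in the project; it assumes fixed 4-byte instructions, and the names `stream_detector`, `stream_record` and `stream_detector_step` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t start;    /* address of the first instruction in the stream */
    uint32_t length;   /* number of instructions in the stream */
} stream_record;

typedef struct {
    int      active;   /* 0 until the first PC has been seen */
    uint32_t start;
    uint32_t length;
    uint32_t last_pc;
} stream_detector;

/*
 * Feed one executed PC. Returns 1 and fills *out when the previous stream
 * has just ended (the new PC is not sequential), 0 otherwise.
 */
int stream_detector_step(stream_detector *d, uint32_t pc, stream_record *out)
{
    assert(d != 0 && out != 0);
    if (!d->active) {                       /* first instruction seen */
        d->active = 1;
        d->start = pc;
        d->length = 1;
        d->last_pc = pc;
        return 0;
    }
    if (pc == d->last_pc + 4u) {            /* sequential: stream continues */
        d->length++;
        d->last_pc = pc;
        return 0;
    }
    out->start = d->start;                  /* discontinuity: report the finished stream */
    out->length = d->length;
    d->start = pc;                          /* and start a new one at the target */
    d->length = 1;
    d->last_pc = pc;
    return 1;
}
```

A zero-initialized `stream_detector` is a valid starting state. Because a delay-slot instruction is fetched sequentially after its branch, this rule counts it as part of the stream that the branch ends.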

  17. Stream Processor (block diagram; labels: Clock, Reset, PC, Detector, NewStreamValid, Start Address, Length, Stream Collector, Hash Table, FIFO, ID, IDValid)

  18. System Outline

  19. Design Steps • Develop UART • Modified an existing design from Opencores.org • Runs at 115K baud • Transmitter only • Connected UART to Stream Processor output

  20. Status • Working Microblaze running board test software • Software could be easily changed to any microbenchmark • Working Stream Processor • Working UART

  21. Design Problems • Microblaze requires 26 block rams with selected configuration • Stream Processor requires 39 block rams due to enormous hash table (4K x 49) and other FIFOs (8 x 48) • Available V2Pro FPGA contains 28 block rams and no external ram
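To see why a 4K x 49 table is expensive, it helps to estimate how many block RAMs a memory of a given depth and width needs. The sketch below assumes 18 Kbit block RAMs with the aspect ratios 16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18 and 512 x 36; those shapes and the `bram_count` function are assumptions for illustration, not taken from the slides. It does not try to reproduce the 39 block RAMs reported for the whole Stream Processor, since the slides do not break that total down.

```c
#include <assert.h>

typedef struct {
    unsigned depth;   /* words per block RAM in this configuration */
    unsigned width;   /* bits per word in this configuration */
} bram_shape;

/* Assumed aspect ratios of one 18 Kbit block RAM. */
static const bram_shape bram_shapes[] = {
    {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}, {512, 36}
};

static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Smallest number of block RAMs that covers depth x width in any single shape. */
unsigned bram_count(unsigned depth, unsigned width)
{
    assert(depth > 0 && width > 0);
    unsigned best = 0;
    for (unsigned i = 0; i < sizeof bram_shapes / sizeof bram_shapes[0]; i++) {
        unsigned n = ceil_div(depth, bram_shapes[i].depth)
                   * ceil_div(width, bram_shapes[i].width);
        if (best == 0 || n < best) {
            best = n;
        }
    }
    return best;
}
```

Under these assumptions `bram_count(4096, 49)` gives 12 and `bram_count(8, 48)` gives 2, so a single 4K x 49 table alone would take a large share of a 28-block-RAM device. The synthesis tools may choose a different mapping, so the real count can differ.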

  22. Design Mitigation • Decrease hash table within FPGA by not storing entire PC • Decrease hash table by shortening maximum stream length to be detected • Removed cache in Microblaze • 28 required Block Rams now fit into FPGA

  23. More Problems • A branch occurs every 8 instructions on average • Microblaze runs at 100MHz • UART runs at 115K baud with ASCII data requiring 10 bits per character (with start and stop bits)
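The mismatch between these numbers can be made concrete with a little arithmetic. The sketch below assumes roughly one instruction per clock cycle (an upper bound on the stream rate), reads "115K" as 115200 baud, and treats the number of characters sent per stream record as a free parameter, since the slides do not state the output format. All function names are hypothetical.

```c
#include <assert.h>

/* Streams ending per second if one instruction completes every cycle. */
unsigned long streams_per_second(unsigned long clock_hz,
                                 unsigned long instr_per_stream)
{
    assert(instr_per_stream > 0);
    return clock_hz / instr_per_stream;
}

/* Characters per second the UART can send. */
unsigned long uart_chars_per_second(unsigned long baud,
                                    unsigned long bits_per_char)
{
    assert(bits_per_char > 0);
    return baud / bits_per_char;
}

/* Highest core clock at which the UART keeps up with one record per stream. */
unsigned long max_core_clock_hz(unsigned long baud,
                                unsigned long bits_per_char,
                                unsigned long chars_per_record,
                                unsigned long instr_per_stream)
{
    assert(chars_per_record > 0);
    unsigned long records_per_second =
        uart_chars_per_second(baud, bits_per_char) / chars_per_record;
    return records_per_second * instr_per_stream;
}
```

With the slide's figures, the processor can end about 12,500,000 streams per second while the UART moves 11,520 characters per second. Even at one character per record the core clock would have to drop to roughly 92 kHz for the UART to keep up.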

  24. Problem Mitigation 2 • Use the FPGA Digital Clock Manager to decrease the clock speed • Processor clock slows while the UART stays at maximum speed, which decreases the bandwidth required out of the Stream Processor • Didn’t work: the JTAG connection required to debug the Microblaze no longer connected

  25. Future Work • Use a much higher bandwidth communication link to send data out of the stream processor • IDT FIFO, external memory, USB, Ethernet • Use an FPGA with more available BRAMs to avoid performance hit in stream hash table

  26. Questions
