Presenter: Shao-Jay Hou

Real-Time Address Trace Compression for Emulated and Real System-on-Chip Processor Core Debugging. Bojan Mihajlović, Željko Žilić, McGill University, Dept. of Electrical and Computer Engineering, Montreal, Quebec, Canada. GLSVLSI’11, May 2–4, 2011. Presenter: Shao-Jay Hou.

Presentation Transcript


  1. Real-Time Address Trace Compression for Emulated and Real System-on-Chip Processor Core Debugging • Bojan Mihajlović, Željko Žilić • McGill University, Dept. of Electrical and Computer Engineering, Montreal, Quebec, Canada • GLSVLSI’11, May 2–4, 2011 • Presenter: Shao-Jay Hou

  2. Abstract • In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. We consider on-chip trace compression performed in hardware to reduce data volume, using techniques that exploit inherent higher-order redundancy in address trace data. While hardware trace compression is often restricted to poor or moderate performance due to area and memory constraints, we present a parameterizable scheme that leverages the resources already found on existing platforms. Harnessing resources such as existing trace buffers on CPUs, and unused embedded memory on FPGA emulation platforms, our trace compression scheme requires only a small additional hardware area to achieve superior compression ratios.

  3. What’s the problem? • MPSoCs run multi-threaded programs • Traditional debug methods can’t be used • Non-invasive methods are a good approach (on-chip emulation) • Tracing produces an immense amount of data that must be either stored on-chip or transferred off-chip in real time • Tracing a 32-bit processor at 1 clock per instruction and 100 MHz produces 400 MB/s of data (4 bytes × 100 M instructions/s) • The data needs to be compressed
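The 400 MB/s figure follows directly from the slide's assumptions: one 4-byte address per instruction, one instruction per clock, 100 MHz. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def trace_bandwidth_bytes_per_s(address_bits: int, clock_hz: float,
                                cycles_per_instr: float = 1.0) -> float:
    """Raw (uncompressed) address-trace bandwidth: one full address per executed instruction."""
    bytes_per_address = address_bits / 8
    instructions_per_s = clock_hz / cycles_per_instr
    return bytes_per_address * instructions_per_s

rate = trace_bandwidth_bytes_per_s(32, 100e6)
print(f"{rate / 1e6:.0f} MB/s")  # 400 MB/s
```

Any off-chip link slower than this rate can only keep up if the trace is compressed by at least the ratio between the two bandwidths.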

  4. Related work • Compression algorithms [5] • Lempel-Ziv (LZ) [18] • Multi-stage compression [11] • DMTF [17] • Combined MTF and LZ [1] • Some example tools: ARM ETM [2], MCDS [12] • Compared at the levels of compression algorithms, compression methods, trace compression schemes, example tools, and this paper

  5. Proposed method

  6. Compression flow

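  7. Consecutive Address Elimination • Why? • Instructions execute consecutively until a branch is reached • So only the branch target address, where each new run begins, needs to be recorded • How? • Each run is divided into two parts: • address (start of the run) • length (number of consecutive instructions) • Example: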
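A minimal sketch of consecutive address elimination, assuming 4-byte fixed-length instructions, so "consecutive" means the next address is exactly 4 higher. The function names are hypothetical, and the paper's hardware encoding of the (address, length) pair is not given on the slide.

```python
from typing import List, Tuple

INSTR_BYTES = 4  # assumption: 32-bit fixed-length instructions

def consecutive_address_elimination(trace: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of sequential addresses into (start_address, length) pairs."""
    runs: List[Tuple[int, int]] = []
    if not trace:
        return runs
    start, length = trace[0], 1
    for prev, addr in zip(trace, trace[1:]):
        if addr == prev + INSTR_BYTES:
            length += 1                   # still executing sequentially
        else:
            runs.append((start, length))  # a taken branch ended the run
            start, length = addr, 1       # the branch target starts a new run
    runs.append((start, length))
    return runs

def expand(runs: List[Tuple[int, int]]) -> List[int]:
    """Inverse operation: regenerate the full address trace from the runs."""
    return [start + i * INSTR_BYTES for start, length in runs for i in range(length)]

trace = [0x100, 0x104, 0x108, 0x200, 0x204, 0x100, 0x104, 0x108]
runs = consecutive_address_elimination(trace)
print([(hex(a), n) for a, n in runs])  # [('0x100', 3), ('0x200', 2), ('0x100', 3)]
```

Eight addresses shrink to three pairs; the longer the straight-line code between branches, the larger the saving.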

  8. Compression flow

  9. Finite Context Method • Why? • Each branch is either taken or not taken, so the same paths through the code tend to recur • Sequential locality • How? • Works similarly to a cache • Misses the first time a set of instructions is encountered • Hits on every subsequent encounter that matches the prediction
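A minimal sketch of an order-k finite context method (FCM) predictor, assuming it runs over the stream of run start addresses. The table size, the hash, and the (hit, value) output format are illustrative assumptions, not the paper's parameters. Like a direct-mapped cache, the table misses the first time a context appears and hits once it recurs; the decoder keeps an identical table, so a hit needs no address bits.

```python
from typing import List, Optional, Tuple

class FCMPredictor:
    """Order-k FCM with a direct-mapped prediction table (hypothetical sizes)."""
    def __init__(self, order: int = 1, table_bits: int = 10):
        self.order = order
        self.size = 1 << table_bits
        self.table: List[Optional[int]] = [None] * self.size
        self.history: List[int] = []        # the last `order` values (the context)

    def _index(self) -> int:
        idx = 0
        for v in self.history:              # simple illustrative hash of the context
            idx = (idx * 31 + v) % self.size
        return idx

    def predict(self) -> Optional[int]:
        return self.table[self._index()]

    def update(self, actual: int) -> None:
        self.table[self._index()] = actual  # remember what followed this context
        self.history = (self.history + [actual])[-self.order:]

def fcm_encode(values: List[int], order: int = 1,
               table_bits: int = 10) -> List[Tuple[bool, Optional[int]]]:
    p = FCMPredictor(order, table_bits)
    out: List[Tuple[bool, Optional[int]]] = []
    for v in values:
        if p.predict() == v:
            out.append((True, None))        # hit: only a flag is needed
        else:
            out.append((False, v))          # miss: send the value itself
        p.update(v)
    return out

def fcm_decode(symbols: List[Tuple[bool, Optional[int]]], order: int = 1,
               table_bits: int = 10) -> List[int]:
    p = FCMPredictor(order, table_bits)     # mirrors the encoder's table
    out: List[int] = []
    for hit, v in symbols:
        value = p.predict() if hit else v
        out.append(value)
        p.update(value)
    return out

targets = [0x100, 0x200, 0x100, 0x200, 0x100, 0x200]   # e.g. a loop with a call
encoded = fcm_encode(targets)
print([hit for hit, _ in encoded])  # [False, False, False, True, True, True]
```

The first three values miss because each context is new; from then on every prediction hits.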

  10. Compression flow

  11. Move-to-Front & Address Encoding • Why? • MTF: increases the relevance of recently seen addresses by keeping them at the front of a list • Prefix: assists differential compression • How? • Inputs are the actual address and the predicted address • Differential compression between the two
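A sketch of one possible MTF plus differential address encoder. Every concrete choice here is an assumption for illustration: an 8-entry MTF list, XOR as the differential operation, and a prefix byte that either flags an MTF hit (0x80 | index) or gives the number of differing low-order bytes (0 to 4). With 32-bit addresses this produces 1 to 5 bytes per input.

```python
from typing import List

MTF_SIZE = 8  # hypothetical number of recent addresses kept

class MoveToFront:
    def __init__(self, size: int = MTF_SIZE):
        self.size = size
        self.entries: List[int] = []

    def lookup(self, value: int) -> int:
        """Return the value's index (moving it to the front), or -1 after inserting it."""
        if value in self.entries:
            i = self.entries.index(value)
            self.entries.insert(0, self.entries.pop(i))
            return i
        self.entries.insert(0, value)
        del self.entries[self.size:]        # evict the least recently used entry
        return -1

def encode_address(addr: int, predicted: int, mtf: MoveToFront) -> bytes:
    """Emit a prefix byte, optionally followed by the differing low-order bytes."""
    i = mtf.lookup(addr)
    if i >= 0:
        return bytes([0x80 | i])            # prefix alone: MTF hit at index i
    diff = addr ^ predicted                 # keep only the bits that differ
    n = (diff.bit_length() + 7) // 8        # significant bytes (0-4 for 32-bit addresses)
    return bytes([n]) + diff.to_bytes(n, "little")

def decode_address(code: bytes, predicted: int, mtf: MoveToFront) -> int:
    prefix = code[0]
    if prefix & 0x80:
        addr = mtf.entries[prefix & 0x7F]
    else:
        addr = predicted ^ int.from_bytes(code[1:1 + prefix], "little")
    mtf.lookup(addr)                        # keep the decoder's MTF list in sync
    return addr

enc, dec = MoveToFront(), MoveToFront()
addrs = [0x10000200, 0x10000100, 0x10000200]
preds = [0x10000100, 0x10000200, 0x00000000]
codes = [encode_address(a, p, enc) for a, p in zip(addrs, preds)]
print([c.hex() for c in codes])  # ['020003', '020003', '81']
```

A near-miss prediction costs 3 bytes instead of 4, and a recently seen address costs only the prefix byte.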

  12. Compression flow

  13. Run-length and Prefix Encoding • Why? • Compress the prefix bytes • Exploit the probability distribution of prefix values • How? • Huffman encoding
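A sketch of the two encodings named on this slide, applied to a stream of prefix bytes: run-length encoding of repeated prefixes, and a Huffman code built from prefix frequencies so that common prefixes get short codewords. The slide does not show how the paper combines them in hardware, so they appear separately here, and the sample prefix stream is made up.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple

def run_length(symbols: List[int]) -> List[Tuple[int, int]]:
    """Collapse repeated prefix bytes into (symbol, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs

def huffman_code(freqs: Dict[int, int]) -> Dict[int, str]:
    """Build a prefix-free code in which frequent symbols get shorter codewords."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)     # the two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

prefixes = [0x81, 0x81, 0x81, 0x00, 0x02, 0x81, 0x81, 0x00]
print(run_length(prefixes))  # [(129, 3), (0, 1), (2, 1), (129, 2), (0, 1)]
code = huffman_code(Counter(prefixes))
bits = sum(len(code[p]) for p in prefixes)
print(bits)  # 11 (versus 64 bits for the raw prefix bytes)
```

Because the prefix alphabet is small, a Huffman table for it stays small enough to be practical in hardware.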

  14. Compression flow

  15. Data Stream Serializer • Why? • The data input from the MTF/AE stage is 5 bytes wide • But the output to the LZ stage is 1 byte wide • How? • A small buffer holds bytes until the LZ stage can consume them
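A cycle-level sketch of the serializer as a FIFO, under stated assumptions: each cycle the upstream stage pushes one chunk of 1 to 5 bytes if it fits (otherwise it would stall), and the LZ stage pops one byte. The class name, the `simulate` helper, and the buffer depth of 16 are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

class Serializer:
    """FIFO between a wide producer (up to 5 bytes per input) and a 1-byte-per-cycle consumer."""
    def __init__(self, capacity: int = 16):     # hypothetical buffer depth
        self.capacity = capacity
        self.fifo: Deque[int] = deque()

    def can_accept(self, n: int) -> bool:
        return len(self.fifo) + n <= self.capacity

    def push(self, chunk: bytes) -> None:
        if not self.can_accept(len(chunk)):
            raise OverflowError("serializer buffer full; upstream must stall")
        self.fifo.extend(chunk)

    def pop(self) -> Optional[int]:
        """One byte per cycle to the LZ stage, or None if the buffer is empty."""
        return self.fifo.popleft() if self.fifo else None

def simulate(chunks: List[bytes], capacity: int = 16) -> Tuple[bytes, int]:
    """Push at most one chunk per cycle (if it fits) and pop one byte per cycle."""
    if any(len(c) > capacity for c in chunks):
        raise ValueError("chunk larger than the buffer")
    ser = Serializer(capacity)
    pending: Deque[bytes] = deque(chunks)
    out: List[int] = []
    cycles = 0
    while pending or ser.fifo:
        if pending and ser.can_accept(len(pending[0])):
            ser.push(pending.popleft())
        byte = ser.pop()
        if byte is not None:
            out.append(byte)
        cycles += 1
    return bytes(out), cycles

chunks = [bytes.fromhex("020003"), bytes.fromhex("81"), bytes.fromhex("0401020304")]
data, cycles = simulate(chunks)
print(data.hex(), cycles)  # 020003810401020304 9
```

The buffer only needs to absorb bursts: on average, the compressed output per input stays well below one byte per cycle.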

  16. Compression flow

  17. Lempel-Ziv Encoding of Data Stream • Why? • The input data is highly repetitive • How? • Use LZ compression • Build a dictionary that stores the repeated sequences • The dictionary itself is not output • During decompression, an identical dictionary is rebuilt • Output is not produced every cycle
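The slide does not say which LZ variant the paper uses, so this sketch uses LZW as a stand-in to show the properties listed: the dictionary is never transmitted because the decoder rebuilds it from the codes alone, and a code is emitted only when a match ends, not on every input byte.

```python
from typing import Dict, List

def lzw_compress(data: bytes) -> List[int]:
    """LZW: the dictionary is built on the fly and never transmitted."""
    dictionary: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
    w = b""
    out: List[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                            # keep extending the current match
        else:
            out.append(dictionary[w])         # emit a code only when the match ends
            dictionary[wc] = len(dictionary)  # learn the new, longer sequence
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary from the code stream alone."""
    if not codes:
        return b""
    dictionary: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]                 # code defined by this very step
        else:
            raise ValueError(f"bad code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)

data = bytes([0x81, 0x00] * 4)   # a repetitive serialized stream
codes = lzw_compress(data)
print(codes)  # [129, 0, 256, 258, 0]
```

Eight input bytes produce five codes, and the saving grows as the repeated sequences get longer.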

  18. Experimental Results • Benchmark: MiBench • CPU: Apple PowerMac G4 with a 1.25 GHz PowerPC 7455 (a 32-bit processor with fixed-length instructions), Linux SMP kernel 2.6.32-24 • Simulation software: ModelSim SE-64 v6.5c

  19. Experimental Results (cont.) • Logic utilization • Usage scenario • JTAG • Software fault 10⁻³

  20. Conclusion • This paper presented a parameterizable microarchitecture for address trace compression, suited to implementation on ASICs and modern FPGAs. • It achieves better compression ratios than other schemes

  21. My comment • The paper uses a dictionary-based, multi-stage compression method that could be used to improve our tracer. • The paper gives inspiration for future work on our tracer • (Figure: block diagram labeled CPU, GPU, P.T., P.T., Bus, T.M., B.T.)

More Related