Compiler Supports and Optimizations for PAC VLIW DSP Processors

This presentation describes compiler support and optimizations for PAC VLIW DSP processors, covering optimization issues, preliminary compiler support, and experimental results. It also introduces the PAC DSP architecture and its distinctive features, such as its innovative register file structure.

Presentation Transcript


  1. Compiler Supports and Optimizations for PAC VLIW DSP Processors Y.-C. Lin C.-L. Tang C.-J. Wu M.-Y. Hung Y.-P. You Y.-C. Moo S.-Y. Chen and J.-K. Lee National Tsing-Hua University Taiwan

  2. Outline • PAC VLIW DSP Architectures • Optimization Issues • Preliminary Compiler Supports • Experimental Results • Conclusion LCPC2005

  3. Introduction • Parallel Architecture Core (PAC) was designed by the SoC Technology Center, ITRI, Taiwan. • 32-bit, fixed-point, 5-way issue VLIW DSP • Scalable architecture • Instruction set optimized for audio/video/image processing • Innovative register file structure • Two generations developed • TSMC’s 0.13 μm technology (taped out in Aug. 2005) • High performance, low power

  4. Key Issues • Deploy a general-purpose, high-performance open-source compiler for DSP processors • ORC → PAC DSP • Address the issues posed by the fragmented register banks of DSP processors • Methods for irregular register constraints and instruction scheduling

  5. PAC DSP Overview [Figure: PAC DSP block diagram with clusters of I-Units and M-Units, A, AC, R, and D register files, a B-Unit, and room to extend more clusters] • Cluster design: • Scalability • Explicit inter-cluster data transfer instructions • Five-way issue: • 1 Scalar/Control Unit (B) • 2 Arithmetic Units (I) • 2 Load/Store Units (M) • Distributed register files: • 5 local register files (A, AC, R) • 2 global register files (D) • Other features: • 8-bit/16-bit SIMD operations • Variable instruction word/bundle length • Dynamic power management • Standard AMBA interface
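
The five-way issue limits above can be read as per-bundle slot capacities. A minimal sketch, assuming a bundle is described only by the unit kind of each instruction (it ignores variable-length encoding, hazards, and cluster assignment); `SLOTS` and `fits_in_bundle` are hypothetical names, not part of the PAC toolchain:

```python
from collections import Counter

# Issue slots listed on the slide: 1 Scalar/Control (B), 2 Arithmetic (I), 2 Load/Store (M).
SLOTS = {"B": 1, "I": 2, "M": 2}

def fits_in_bundle(units):
    """Return True if the unit kinds requested by a candidate bundle fit the issue slots."""
    demand = Counter(units)
    return all(u in SLOTS and n <= SLOTS[u] for u, n in demand.items())

print(fits_in_bundle(["B", "I", "I", "M", "M"]))  # True: all five slots used
print(fits_in_bundle(["I", "I", "I"]))            # False: only two I-Units
```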

  6. Ping-pong Register File Structure [Figure: while the M-Unit loads or stores one data stream, the I-Unit computes on another, and the roles alternate; hence the name "ping-pong"] • Used by the global register files (D) • Concept: • Overlap the processing of different data streams within a cluster; the M-Unit and I-Unit operate on different data streams at the same time • Benefit: • Fewer register-file ports, for lower power and smaller area

  7. Ping-pong Register Access [Figure: M-Units and I-Units connected to Bank 1 and Bank 2 of a D register file through an instructional switcher, which holds only 1 state per cycle] • Each D register file contains 2 banks. • Rules: • In a cycle, a unit can access only one of the 2 banks (accesses to the two banks are mutually exclusive). • In a cycle, the M-Unit and the I-Unit must access different banks.
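
The two access rules can be checked mechanically for one cycle. This sketch assumes a hypothetical numbering in which D0 to D7 live in bank 1 and D8 to D15 in bank 2 (consistent with the Lw/Add examples in these slides, but not stated by the source), and that each D register file is shared by one M-Unit and one I-Unit as drawn in the figure; `d_bank`, `banks_used`, and `obeys_ping_pong` are illustrative names:

```python
import re

def d_bank(reg):
    """Hypothetical mapping: D0-D7 -> bank 1, D8-D15 -> bank 2.
    Returns None for non-D registers (A, AC, R), which are not ping-pong banked."""
    m = re.fullmatch(r"D(\d+)", reg)
    if m is None:
        return None
    return 1 if int(m.group(1)) < 8 else 2

def banks_used(regs):
    """Set of D banks touched by a list of register operands."""
    return {b for b in map(d_bank, regs) if b is not None}

def obeys_ping_pong(m_regs, i_regs):
    """Check one cycle of one D register file.
    m_regs / i_regs: registers the M-Unit and the I-Unit access in that cycle."""
    m_banks, i_banks = banks_used(m_regs), banks_used(i_regs)
    # Rule 1: a unit may touch only one of the two banks per cycle.
    if len(m_banks) > 1 or len(i_banks) > 1:
        return False
    # Rule 2: the M-Unit and the I-Unit must use different banks.
    return m_banks.isdisjoint(i_banks)

print(obeys_ping_pong(["D8", "A0"], ["D1", "D0", "AC0"]))  # True
print(obeys_ping_pong(["D2", "A0"], ["D1", "D0", "AC0"]))  # False: both in bank 1
```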

  8. Issues for Ping-pong Registers (1) • Example of ping-pong usage: • Able to form a bundle: {Lw D8, A0; Add D1,D0,AC0} • Unable to form a bundle: {Lw D2, A0; Add D1,D0,AC0}; the two instructions use the same bank, so they must be scheduled into 2 bundles • For compiler optimizations: • Better register (file/bank) allocation → better schedule in fewer bundles
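
The effect of bank allocation on bundle count can be sketched with a greedy packer for independent instructions. Assumptions, none of them from the source: a hypothetical bank split (D0 to D7 in bank 1, D8 to D15 in bank 2), at most one M-Unit and one I-Unit instruction per bundle, no data dependences, and in-order packing; `count_bundles` is a hypothetical helper:

```python
import re

def d_bank(reg):
    # Hypothetical mapping: D0-D7 -> bank 1, D8-D15 -> bank 2; other files are unbanked.
    m = re.fullmatch(r"D(\d+)", reg)
    return None if m is None else (1 if int(m.group(1)) < 8 else 2)

def bank_of(regs):
    """Banks touched by one instruction's D-register operands."""
    return {b for b in map(d_bank, regs) if b is not None}

def count_bundles(instrs):
    """Greedily pack independent (unit, regs) instructions, in order, into bundles.
    A bundle holds at most one instruction per unit kind, and instructions in the
    same bundle must touch disjoint D banks (the ping-pong rules)."""
    bundles = []
    for unit, regs in instrs:
        banks = bank_of(regs)
        if len(banks) > 1:
            raise ValueError(f"{unit} instruction touches both banks; needs a copy first")
        cur = bundles[-1] if bundles else None
        if cur is not None and unit not in cur and all(banks.isdisjoint(b) for b in cur.values()):
            cur[unit] = banks          # fits into the current bundle
        else:
            bundles.append({unit: banks})  # open a new bundle
    return len(bundles)

good = [("M", ["D8", "A0"]), ("I", ["D1", "D0", "AC0"])]  # Lw D8, A0 ; Add D1,D0,AC0
bad = [("M", ["D2", "A0"]), ("I", ["D1", "D0", "AC0"])]   # Lw D2, A0 ; Add D1,D0,AC0
print(count_bundles(good), count_bundles(bad))  # 1 2
```

Only the register choice differs between the two sequences, which is why the slide ties register (file/bank) allocation directly to schedule length.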

  9. Issues for Ping-pong Registers (2) • Data transfer between ping-pong banks: • After {Lw D8, A0; Add D1,D0,AC0}, the bundle {Sw D1, A0; Sub D9,D8,D1} is an invalid operation, because Sub would access both banks (D8 and D1) in one cycle. • Cross ping-pong communication is needed, which requires an additional copy operation: Mov AC1, D1, then {Sw D1, A0; Sub D9,D8,AC1} • For compiler optimizations: • Handle data communication between ping-pong banks correctly under any code transformation • Generate as few additional copy operations as possible
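
The copy insertion in this example can be sketched as a rewrite of a single instruction. The sketch assumes a hypothetical bank split (D0 to D7 in bank 1, D8 to D15 in bank 2), aligns every source operand with the destination's bank, and takes spare AC registers from a caller-supplied list; `insert_bank_copies` is a hypothetical helper. A real compiler would also have to schedule the Mov into an earlier bundle and choose which side to copy based on cost:

```python
import re

def d_bank(reg):
    # Hypothetical mapping: D0-D7 -> bank 1, D8-D15 -> bank 2; other files are unbanked.
    m = re.fullmatch(r"D(\d+)", reg)
    return None if m is None else (1 if int(m.group(1)) < 8 else 2)

def insert_bank_copies(op, dst, srcs, free_ac):
    """Rewrite `op dst,srcs...` so that no D operand is in the other bank.
    Each source in the other bank is first copied into a spare AC register
    (popped from `free_ac`, which is mutated) with a Mov.
    Returns the emitted instructions in order."""
    target = d_bank(dst)
    out, new_srcs = [], []
    for s in srcs:
        b = d_bank(s)
        if target is not None and b is not None and b != target:
            ac = free_ac.pop(0)            # raises IndexError if no AC register is free
            out.append(f"Mov {ac}, {s}")   # cross ping-pong copy
            new_srcs.append(ac)
        else:
            new_srcs.append(s)
    out.append(f"{op} {dst},{','.join(new_srcs)}")
    return out

print(insert_bank_copies("Sub", "D9", ["D8", "D1"], ["AC1"]))
# ['Mov AC1, D1', 'Sub D9,D8,AC1']
```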

  10. Issues for Inter-cluster Communication [Figure: a data-flow graph with nodes A to G split between Cluster1 and Cluster2; values crossing the split need an additional cross-cluster copy] • To exploit cluster parallelism: • PAC needs an explicit instruction to be issued for inter-cluster communication! • Optimize code partitioning: • Less communication • Better scheduling
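
Partitioning quality can be scored by counting the explicit copies a partition forces. A minimal sketch with a hypothetical data-flow graph over nodes A to G (the figure's actual edges are not recoverable), assuming one copy per value per remote cluster that uses it; `cross_cluster_copies` is a hypothetical helper:

```python
def cross_cluster_copies(edges, cluster_of):
    """Count explicit inter-cluster copies a partition needs.
    Assumes one copy per (value, remote consumer cluster) pair: a value used
    several times in another cluster is transferred only once."""
    transfers = {(src, cluster_of[dst]) for src, dst in edges
                 if cluster_of[src] != cluster_of[dst]}
    return len(transfers)

# Hypothetical data-flow graph over the slide's nodes A-G (edges are illustrative).
edges = [("A", "C"), ("B", "C"), ("C", "E"), ("C", "F"), ("C", "G"),
         ("D", "F"), ("E", "G"), ("F", "G")]

p1 = {"A": 1, "B": 1, "C": 1, "E": 1, "D": 2, "F": 2, "G": 2}
p2 = {"A": 1, "B": 2, "C": 1, "E": 1, "D": 2, "F": 2, "G": 1}
print(cross_cluster_copies(edges, p1), cross_cluster_copies(edges, p2))  # 2 3
```

Minimizing this count alone would put every node in one cluster and give up cluster parallelism, so a practical partitioner weighs it against the resulting schedule length.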

  11. More Considerations • Two optimized versions of the code with the same performance: • Upper version → smaller code size • Lower version → lower power consumption

  12. Compiler Supports for PAC DSP • Essential support (IA-64 ORC → PAC): • New Target_Info: • PAC architecture and ISA descriptions • Complicated hazard descriptions • PAC application binary interface (ABI): • Data type mapping • Memory usage layout • Register usage conventions • Calling conventions • PAC code generation: • 32-bit WHIRL code generation • PAC WHIRL-to-CGIR procedures • PAC assembly code emission

  13. Simulated-Annealing (SA) Based Register Allocation Approach • Motivation: • Complex interference among register allocation, instruction scheduling, and code insertion for distributed register communication • We adopt a machine-learning method to obtain near-optimal results. • The results serve as a baseline reference for developing heuristic methods.

  14. To Determine: Virtual Register → Register File (Bank) • Input: unscheduled instructions • Output: a schedule of the instructions and a register file assignment (RFA) map • RFA map = {(v1, f1), (v2, f2), ...}, where vi is a virtual register and fi is a register file (bank) • PAC_Scheduler: • Graph-coloring-based register allocation according to the RFA map • Instruction scheduling and code insertion for register file communication • SA setup: • An initial random RFA map • schedule_len = PAC_Scheduler(initial RFA map) • SA control variables: • threshold • p_test: a probability test value (0 < p_test < 1) • energy: initial value > threshold

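  15. To Optimize: Scheduling Result [Figure: flowchart of the SA loop] • SA stop test: repeat while energy > threshold: • Randomly change one mapping (vi, fi) to obtain a new RFA map • Re-run: new_schedule_len = PAC_Scheduler(new RFA map) • Better result test: if new_schedule_len < schedule_len, keep the new RFA map, set schedule_len = new_schedule_len, and decrement energy (energy--) • Otherwise, random test: if a random number > p_test, keep the new RFA map anyway; if not, restore the old RFA map and increment energy (energy++) • When the stop test fails, output the final RFA map and schedule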
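
The SA loop can be sketched in Python as one reading of the flowchart; the exact energy updates were reconstructed from the slide and may differ from the authors' implementation. `pac_scheduler` stands in for PAC_Scheduler, while `toy_scheduler` and the register-file labels are hypothetical. The `max_iters` bound and best-so-far tracking are additions: energy can grow when moves are rejected, so the flowchart alone does not guarantee termination.

```python
import random
from collections import Counter

def sa_register_file_assignment(vregs, files, pac_scheduler, threshold=0,
                                energy=10, p_test=0.8, max_iters=1000, seed=0):
    """Simulated-annealing search over RFA maps (virtual register -> file/bank).

    `pac_scheduler(rfa)` stands in for PAC_Scheduler and returns a schedule
    length (smaller is better). `max_iters` and best-so-far tracking are
    additions not shown on the slide."""
    rng = random.Random(seed)
    rfa = {v: rng.choice(files) for v in vregs}          # initial random RFA map
    schedule_len = pac_scheduler(rfa)
    best_rfa, best_len = dict(rfa), schedule_len
    for _ in range(max_iters):
        if energy <= threshold:                          # SA stop test failed: finish
            break
        new_rfa = dict(rfa)
        new_rfa[rng.choice(vregs)] = rng.choice(files)   # randomly change one (vi, fi)
        new_len = pac_scheduler(new_rfa)                 # re-run the scheduler
        if new_len < schedule_len:                       # better result test
            rfa, schedule_len = new_rfa, new_len
            energy -= 1
        elif rng.random() > p_test:                      # random test: accept a worse map
            rfa, schedule_len = new_rfa, new_len         # (assumed: length is updated too)
        else:                                            # keep the old RFA map
            energy += 1
        if schedule_len < best_len:
            best_rfa, best_len = dict(rfa), schedule_len
    return best_rfa, best_len

def toy_scheduler(rfa):
    # Hypothetical stand-in for PAC_Scheduler: the busiest file dominates the length.
    return max(Counter(rfa.values()).values())

vregs = [f"v{i}" for i in range(8)]
files = ["D.bank1", "D.bank2", "AC", "A"]   # hypothetical file/bank labels
rfa, length = sa_register_file_assignment(vregs, files, toy_scheduler)
print(length, rfa)
```

Returning the best map seen, rather than the last accepted one, keeps occasional uphill moves from the random test from worsening the final answer.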

  16. Preliminary Experimental Results (DSPStone benchmarks)

  17. Related Works • Register Allocation • R. Leupers: Instruction scheduling for clustered VLIW DSPs. In Proc. Int’l Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 291–300, Oct. 2000 • Register File Organizations • S. Rixner, W. J. Dally, B. Khailany, P. Mattson, U. J. Kapasi, and J. D. Owens: Register organization for media processing. In Proc. International Symposium on High-Performance Computer Architecture (HPCA), pp. 375–386, 2000 • T.-J. Lin, C.-C. Chang, C.-C. Lee, and C.-W. Jen: An efficient VLIW DSP architecture for baseband processing. In Proc. 21st International Conference on Computer Design (ICCD), 2003

  18. Conclusion • We developed a compiler prototype for a new VLIW DSP architecture called PAC. • Based on ORC • New optimization issues raised by the irregular hardware design: • Highly distributed register files • Port-access-restricted ping-pong structures • An SA approach was employed to obtain preliminary results for register allocation on PAC. • We will extend our work to the next version of the PAC DSP.