
  1. Dynamic Optimization using ADORE Framework 10/22/2003 Wei Hsu, Computer Science and Engineering Department, University of Minnesota

  2. Background • Compiler Optimization: The phases of compilation that generate good code to use the target machine as efficiently as possible. • Static Optimization: Compile-time optimization – a one-time, fixed optimization that will not change after distribution. • Dynamic Optimization: Optimization performed at program execution time – adaptive to the execution environment.

  3. Examples of Compiler Optimizations
  • Instruction scheduling:
    Before: Ld R1,(R2); Add R3,R1,R4; Ld R5,(R6); Add R7,R5,R4
    After:  Ld R1,(R2); Ld R5,(R6); Add R3,R1,R4; Add R7,R5,R4
  • Cache prefetching (frequent data cache misses!!):
    Before: Ld R1,(R2); Addi R2,R2,64; Add R3,R1,R4
    After:  Ld R1,(R2); prefetch 256(R2); Addi R2,R2,64; Add R3,R1,R4
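
  A minimal C sketch of the cache-prefetching idea above, assuming GCC's __builtin_prefetch; the loop, the array, and the 256-byte prefetch distance are illustrative assumptions chosen to mirror the "prefetch 256(R2)" on the slide, not code from the talk:

    #include <stddef.h>

    /* Sum an array while prefetching ~256 bytes (64 ints) ahead, so the
     * cache line for a[i + 64] is arriving while earlier elements are
     * summed. The distance would normally be tuned to the machine's
     * load-miss latency. */
    long sum_with_prefetch(const int *a, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            __builtin_prefetch(&a[i + 64], 0, 1); /* 0 = read, 1 = low temporal locality */
            sum += a[i];
        }
        return sum;
    }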

  4. Is Compiler Optimization Important? • In the last 15 years, computer performance has increased by ~1000X. • Clock rate increased by ~100X • Micro-architecture contributed ~5X (the number of transistors doubles every 18 months) • Compiler optimization added ~2-3X for single processors (some overlap between clock rate and micro-architecture, and some overlap between micro-architecture and compiler optimizations)

  5. Speedup from Compiler Optimization

  6. Speedup from Compiler Optimization

  7. Excellent Benchmark Performance

  8. Mediocre Application Performance • Many application binaries are not optimized by compilers. • An ISV releases one binary for all machines in the same architecture (e.g. P5), but the binary may not run efficiently on the user's machine (e.g. P6). • The ISV might have optimized the code with profiles exercising different parts of the application than what is actually executed. • An application is built from many shared libraries, but there are no cross-library optimizations. Performance is not effectively delivered to end-users!!

  9. Examples of Compiler Optimizations (revisited)
  • Instruction scheduling: what if the load latency is 4 clocks instead of 2?
    Before: Ld R1,(R2); Add R3,R1,R4; Ld R5,(R6); Add R7,R5,R4
    After:  Ld R1,(R2); Ld R5,(R6); Add R3,R1,R4; Add R7,R5,R4
  • Cache prefetching: does the compiler know where the data cache misses are?
    Before: Ld R1,(R2); Addi R2,R2,64; Add R3,R1,R4
    After:  Ld R1,(R2); prefetch 256(R2); Addi R2,R2,64; Add R3,R1,R4

  10. A Case for Dynamic Optimization • The execution environment can be quite different from the assumptions made at compile time. • Code should be optimized for the machine it runs on • Code should be optimized by how the code is used • Code should be optimized when all executables are available • Code should be optimized only where it matters

  11. ADORE: ADaptive Object code RE-optimization • The goal of ADORE is to create a system that transparently finds and optimizes performance-critical code at runtime. • Adapting to new micro-architectures • Adapting to different user environments • Adapting to dynamic program behavior • Optimizing shared library calls • A prototype of ADORE has been implemented on the Itanium/Linux platform.

  12. Framework of ADORE (diagram). Shown: the Main Thread running the Main Program; the DynOpt Thread containing the Phase Detector, Trace Selector, Optimizer, and Patcher, plus the Optimized Trace Pool; and, in Kernel Space, the System Sample Buffer (SSB) feeding the User Event Buffer (UEB).
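
  A hypothetical C sketch of the two-thread structure in the diagram: the main program runs normally while a DynOpt thread drains samples into the UEB, detects phases, and selects, optimizes, and patches hot traces. Every type and function name below is an illustrative stub, not ADORE's actual API:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Placeholder sample record; the real system consumes Itanium PMU samples. */
    typedef struct { void *pc; long latency; } sample_t;

    /* Illustrative stubs for the boxes in the diagram. */
    static int   drain_ssb(sample_t *ueb, int cap) { (void)ueb; (void)cap; return 0; }
    static bool  phase_detected(const sample_t *s, int n) { (void)s; return n > 0; }
    static void *select_hot_trace(const sample_t *s, int n) { (void)s; (void)n; return NULL; }
    static void *optimize_trace(void *t) { return t; } /* e.g., insert prefetches */
    static void  patch_trace(void *orig, void *opt) { (void)orig; (void)opt; }

    /* The dynamic-optimizer thread running alongside the main program. */
    static void *dynopt_thread(void *arg)
    {
        (void)arg;
        static sample_t ueb[4096];               /* User Event Buffer */
        for (;;) {
            int n = drain_ssb(ueb, 4096);        /* copy kernel SSB into the UEB */
            if (phase_detected(ueb, n)) {        /* Phase Detector */
                void *t = select_hot_trace(ueb, n);  /* Trace Selector */
                patch_trace(t, optimize_trace(t));   /* Optimizer + Patcher */
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, dynopt_thread, NULL); /* spawn DynOpt thread */
        /* ... the main program runs here while dynopt_thread patches it ... */
        pthread_join(tid, NULL);
        return 0;
    }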

  13. Current Optimizations in ADORE • We have implemented • Data cache prefetching • Trace selection and layout • We are investigating and testing the following optimizations • Instruction scheduling with control and data speculation • Instruction cache prefetching • Partial dead code elimination
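
  As a rough illustration of the trace selection step, here is a greedy sketch that grows a trace from a hot block by following the more frequently sampled successor at each branch. The CFG representation, counts, and threshold are assumptions made for illustration, not ADORE's implementation:

    #include <stddef.h>

    #define MAX_TRACE 32

    typedef struct block {
        struct block *taken, *fallthru;    /* successor blocks in the CFG */
        long taken_count, fallthru_count;  /* sampled branch frequencies */
    } block_t;

    /* Grow a trace from `seed`, following the hotter successor each step;
     * the returned blocks would then be laid out contiguously in the
     * optimized trace pool. */
    size_t select_trace(block_t *seed, block_t **trace, long threshold)
    {
        size_t n = 0;
        for (block_t *b = seed; b && n < MAX_TRACE; ) {
            trace[n++] = b;
            long hot = b->taken_count > b->fallthru_count
                     ? b->taken_count : b->fallthru_count;
            if (hot < threshold)
                break;               /* successors too cold to extend the trace */
            b = (b->taken_count > b->fallthru_count) ? b->taken : b->fallthru;
        }
        return n;
    }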

  14. Performance Impact of O2/O3 Binary

  15. Optimizing BLAST with ADORE • BLAST is one of the most popular tools in bioinformatics. Several faculty members and research colleagues use it. • It is used as a benchmark by companies to test their latest systems and processors. • The performance of BLAST matters.

  16. Speedup from BLAST queries

  17. Observations from BLAST • ADORE is robust: it can handle real, large application code. • ADORE does not speed up all queries, since the code already runs quite efficiently on Itanium systems; it adds about 1-2% profiling and optimization overhead. • ADORE does speed up one long query by 30%. • It is difficult for static compilers to further improve the performance of BLAST.

  18. Future Direction of ADORE • Demonstrate more performance gains on more real applications • Make ADORE more transparent • Compiler independence • Exception handling • Study the impact of compiler annotations • Study architectural/micro-architectural support for ADORE

  19. ADORE Group • Professors • Prof. Wei-Chung Hsu • Prof. Pen-Chung Yew • Dr. Bobbie Othmer • Graduate Students • Abhinav Das • Dwarakanath Rajagopal • Ananth Lingamneni • Vijayakrishna Griddaluru • Amruta Inamdar • Aditya Saxena • Howard Chen • Jiwei Lu • Jinpyo Kim • Sagar Dalvi • Rao Fu • WeiChuan Dong

  20. Summary • Dynamic binary optimization customizes performance delivery. • The ADORE project at U. of Minnesota is a research dynamic binary optimizer. It demonstrates good performance potential. • With architecture/micro-architecture and static compiler support, a future dynamic optimizer could be more effective, more adaptive, and more applicable.

  21. Conclusion Be Adaptive !! Be Dynamic !!

  22. Dynamic Translation • Fast Simulation • SimOS (Stanford), SHADE (SUN) • Migration • DAISY, BOA (IBM), Virtual PC, ARIES (HP), Crusoe (Transmeta) • Internet applications • Java HotSpot, MS .NET • Performance Tools (dynamic instrumentation) • Paradyn and EEL (UW), Caliper (HP) • Optimization • Dynamo (HP), Tinker (NCSU), Morph (Harvard), DyC (UW)
