CGO 2006: Conference Review Presented by: Ivan Matosevic

CGO 2006: The Fourth International Symposium on Code Generation and Optimization New York, March 26-29, 2006. Conference Review Presented by: Ivan Matosevic. Outline. Conference overview Brief summaries of sessions Keynote speeches Best paper. Conference Overview.


CGO 2006: The Fourth International Symposium on Code Generation and Optimization, New York, March 26-29, 2006

Conference Review

Presented by: Ivan Matosevic


Outline

  • Conference overview

  • Brief summaries of sessions

  • Keynote speeches

  • Best paper


Conference Overview

  • Primary focus: back-end compilation techniques

    • Static analysis and optimization

    • Profiling

    • Run-time techniques

  • 8 sessions, 29 papers

  • Dominating topics: multicores, dynamic compilation


Overview of Sessions

  • Dynamic Optimization

  • Object-Oriented Code Generation and Optimization

  • Phase Detection and Profiling

  • Tiled and Multicore Compilation

  • Static Code Generation and Optimization Issues

  • SIMD Compilation

  • Optimization Space Exploration

  • Security and Reliability


Session 1: Dynamic Optimization

  • Kim Hazelwood (University of Virginia), Robert Cohn (Intel), A Cross-Architectural Interface for Code Cache Manipulation

    • Pin dynamic instrumentation system with code cache

    • The paper describes an API for various operations with the code cache (callbacks, lookups, statistics, etc.)

  • Derek Bruening, Vladimir Kiriansky, Tim Garnett, Sanjeev Banerji (Determina Corporation), Thread-Shared Software Code Caches

    • Problem: sharing a code cache across multiple threads

    • Authors propose a fine-grained locking scheme

    • Evaluation using DynamoRIO


Session 1: Dynamic Optimization

  • Keith Cooper, Anshuman Dasgupta (Rice Univ.), Tailoring Graph-coloring Register Allocation For Runtime Compilation

    • Problem: register allocation in JIT compilers

    • Authors propose a novel lightweight graph-colouring technique

  • Weifeng Zhang, Brad Calder, Dean Tullsen (UC San Diego), A Self Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework

    • Extension of the Trident event-driven dynamic optimization framework (previously proposed by the same authors)

    • Dynamic insertion of prefetching instructions based on run-time analysis


Session 2: Object-Oriented Code Generation and Optimization

  • Suresh Srinivas, Yun Wang, Miaobo Chen, Qi Zhang, Eric Lin, Valery Ushakov, Yoav Zach, Shalom Goldenberg (Intel Corporation), Java JNI Bridge: An MRTE Framework for Mixed Native ISA Execution

    • Uses a dynamic translator to run native (JNI) code compiled for one ISA on a Java platform running on a different ISA

  • Kris Venstermans, Lieven Eeckhout, Koen De Bosschere (Ghent University), Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing

    • Use address bits on a 64-bit architecture to encode object type in order to save memory

    • Objects of the same type allocated in a contiguous (virtual) region
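
The typed-addressing idea above can be illustrated with a minimal sketch. It assumes a hypothetical layout in which the top 16 bits of a 64-bit virtual address hold a type ID and each type owns one contiguous region; the field width and all names (`TYPE_BITS`, `TypedAllocator`, `type_of`) are illustrative assumptions, not the scheme from the paper.

```python
TYPE_BITS = 16                  # assumed width of the type field (hypothetical)
OFFSET_BITS = 64 - TYPE_BITS    # remaining bits address within a type's region


def region_base(type_id: int) -> int:
    """Base virtual address of the contiguous region reserved for a type."""
    return type_id << OFFSET_BITS


class TypedAllocator:
    """Bump allocator that places every object inside its type's region."""

    def __init__(self):
        self.next_offset = {}   # type_id -> next free offset in that region

    def alloc(self, type_id: int, size: int) -> int:
        if not 0 <= type_id < (1 << TYPE_BITS):
            raise ValueError("type_id does not fit in the type field")
        offset = self.next_offset.get(type_id, 0)
        if offset + size > (1 << OFFSET_BITS):
            raise MemoryError("region for this type is exhausted")
        self.next_offset[type_id] = offset + size
        return region_base(type_id) | offset


def type_of(address: int) -> int:
    """Recover the type from the address alone, with no per-object header."""
    return address >> OFFSET_BITS
```

Because `type_of` needs only the address, an object in this sketch would not have to store a type pointer in its header, which is where the memory saving would come from.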


Session 2: Object-Oriented Code Generation and Optimization

  • Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan (IBM Canada), Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

    • The IBM TestaRossa JIT compiler

    • This paper focuses on code patching and profiling in a multi-threaded environment with a lot of class loading/unloading

  • Lixin Su, Mikko H Lipasti (University of Wisconsin Madison), Dynamic Class Hierarchy Mutation

    • Run-time reassignment of objects from one derived class to another, changing their virtual tables

    • Offers opportunity for optimizations based on specialization


Session 3: Phase Detection and Profiling

  • Priya Nagpurkar (UCSB), Michael Hind (IBM), Chandra Krintz (UCSB), Peter Sweeney, V.T. Rajan (IBM), Online Phase Detection Algorithms

    • Detecting phase behaviour in virtual machines

    • Track dynamic program parameters (methods invoked, branch directions…) over time and apply a similarity model

  • Jeremy Lau, Erez Perelman, Brad Calder (UC San Diego), Selecting Software Phase Markers with Code Structure Analysis

    • Portions of code whose execution correlates with phase changes

    • Procedure calls and returns, loop boundaries

    • Profile-based hierarchical loop-call graph
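
The general recipe from the Online Phase Detection Algorithms summary (track dynamic program parameters over time and apply a similarity model) might be sketched as follows. This is an illustrative simplification, not the authors' algorithm: the fixed window size, the weighted-overlap similarity measure, the threshold, and all function names are assumptions.

```python
from collections import Counter


def similarity(a: Counter, b: Counter) -> float:
    """Weighted overlap of two profiles, in the range [0, 1]."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0
    shared = sum(min(a[key], b[key]) for key in a.keys() & b.keys())
    return 2.0 * shared / total


def detect_phase_changes(events, window=4, threshold=0.5):
    """Return indices of windows whose profile differs from the previous window.

    `events` is a sequence of observed items, for example method names or
    branch identifiers, in execution order.
    """
    changes = []
    previous = None
    for start in range(0, len(events) - window + 1, window):
        current = Counter(events[start:start + window])
        if previous is not None and similarity(previous, current) < threshold:
            changes.append(start // window)
        previous = current
    return changes
```

For example, a trace that alternates between methods `a` and `b` for two windows and then switches to `x` and `y` would report a single phase change at the third window.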


Session 3: Phase Detection and Profiling

  • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara), Profiling over Adaptive Ranges

    • Voted best paper – details later

  • Hyesoon Kim, Muhammad Aater Suleman, Onur Mutlu, Yale N. Patt (UT-Austin), 2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set

    • Predicts whether the prediction accuracy of each branch will vary across input sets

    • Heuristic approach used to derive representative profiling results from a single input set


Session 4: Tiled and Multicore Compilation

  • David Wentzlaff, Anant Agarwal (MIT), Constructing Virtual Architectures on a Tiled Processor

    • Map components of a superscalar architecture (Pentium III) onto a parallel tiled architecture (Raw) using dynamic translation

    • In a way, uses Raw as a coarse-grain FPGA

  • Aaron Smith (UT-Austin), J. Burrill (UMass at Amherst), J. Gibson, B. Maher, N. Nethercote, B. Yoder, D. Burger, K. S. McKinley (UT-Austin), Compiling for EDGE Architectures

    • TRIPS EDGE (Explicit Data Graph Execution) architecture

    • This paper focuses on compilation of standard C and FORTRAN benchmarks


Session 4: Tiled and Multicore Compilation

  • Shih-wei Liao, Zhaohui Du, Gansha Wu, Guei-Yuan Lueh (Intel), Data and Computation Transformations for Brook Streaming Applications on Multiprocessors

    • Parallel compiler for the Brook streaming language

    • An extension of C that enables specifying data parallelism

  • Michael L. Chu, Scott A. Mahlke (University of Michigan), Compiler-directed Object Partitioning for Multicluster Processors

    • Partitioning of data in clustered architectures such as Raw

    • I didn't really understand what programming model these authors have in mind.


Session 5: Static Code Generation and Optimization Issues

  • Two papers about the HP-UX Itanium compiler:

    • Dhruva R. Chakrabarti, Shin-Ming Liu (Hewlett-Packard), Inline Analysis: Beyond Selection Heuristics

      • Cross-module techniques for selection of inlined call sites and the choice of specialized function versions

    • Robert Hundt, Dhruva R. Chakrabarti, Sandya S. Mannarswamy (Hewlett-Packard), Practical Structure Layout Optimization and Advice

      • Data layout and placement on the heap to improve locality

      • Structure splitting, structure peeling, dead field removal, and field reordering


Session 5: Static Code Generation and Optimization Issues

  • Chris Lupo, Kent Wilken (University of California, Davis), Post Register Allocation Spill Code Optimization

    • Authors propose a profile-based algorithm for placement of save/restore instructions handling spilled variables in function calls

    • Implemented as a part of GCC

  • Seung Woo Son, Guangyu Chen, Mahmut Kandemir (Pennsylvania State University), A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality

    • Goal: restructure code so that disk idle periods are lengthened

    • The approach targets array-based programs: disk layout of array data exposed to the compiler


Session 6: SIMD Compilation

  • Jianhui Li, Qi Zhang, Shu Xu, Bo Huang (Intel China Software Center), Optimizing Dynamic Binary Translation for SIMD Instructions

    • Algorithms for dynamic binary translation of SIMD instructions in general-purpose architectures (such as MMX in x86)

    • Evaluation using IA-32 binaries on Itanium 2

  • Dorit Nuzman (IBM), Richard Henderson (Red Hat), Multi-Platform Auto-Vectorization

    • Implementation of automatic vectorizer for GCC 4.0


Session 7: Optimization-space Exploration

  • Felix Agakov, Edwin Bonilla, John Cavazos, Bjoern Franke, Grigori Fursin, Michael O'Boyle, Marc Toussaint, John Thomson, Chris Williams (U. of Edinburgh), Using Machine Learning to Focus Iterative Optimization

    • Predictive modelling used to search the optimization space

    • Targets embedded platforms – AMD Au1500 and Texas Instruments C6713

  • Prasad Kulkarni, David Whalley, Gary Tyson (Florida State University), Jack Davidson (University of Virginia), Exhaustive Optimization Phase Order Space Exploration

    • Exhaustive search of the phase order space (15 phases) using aggressive pruning; takes time on the order of minutes to hours

    • Targets StrongARM SA-100


Session 7: Optimization-space Exploration

  • Zhelong Pan, Rudolf Eigenmann (Purdue University), Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning

    • Problem: find the optimal combination of 38 GCC -O3 options, targeting Pentium IV and SPARC II

    • The proposed heuristic algorithm provides a quality solution in time on the order of several hours


Session 8: Security and Reliability

  • Edson Borin (UNICAMP), Cheng Wang, Youfeng Wu (Intel), Guido Araujo (UNICAMP), Software-Based Transparent and Comprehensive Control-Flow Error Detection

    • Addresses the problem of soft (transient) errors that cause branches to incorrect instructions

    • Implemented in SW as a part of a dynamic binary translator

  • Tao Zhang, Xiaotong Zhuang, Santosh Pande (Georgia Tech), Compiler Optimizations to Reduce Security Overheads

    • Optimizations that specifically target techniques that implement software protection with minimal HW support


Session 8: Security and Reliability

  • Susanta Nanda, Wei Li, Tzi-cker Chiueh (State University of NY at Stony Brook), BIRD: Binary Interpretation using Runtime Disassembly

    • Goal: framework for automatic detection of vulnerabilities such as buffer overflows when the source code is not available

    • Static and dynamic disassembly and instrumentation – targets Windows x86 applications


Keynote Speeches

  • Wei Li, Principal Engineer, Intel: "Parallel Programming 2.0"

  • Kevin Stoodley, Fellow and CTO of Compilation Technology, IBM: "Productivity and Performance: Future Directions in Compilers"


Wei Li: Parallel Programming 2.0

  • Major technological change:

    • Moore’s Law continues to increase transistor counts

    • However: power, memory latency, limits to ILP are setting an effective performance ceiling

  • General trend towards thread-level on-chip parallelism

    • SMT

    • Chip multiprocessors


Wei Li: Parallel Programming 2.0

  • “Parallel Programming 2.0” refers to the advent of multicores

  • A very optimistic future vision:


Wei Li: Parallel Programming 2.0

  • Key issue – where will the parallelism come from?

  • Parallel programming needs to become more mainstream

    • Consumer vs. HPC/server/database

    • Inclusion into education at more elementary level

    • New tools for greater ease of programming

  • Intel’s parallel programming tools

    • http://www.intel.com/software


K. Stoodley: "Productivity and Performance: Future Directions in Compilers"

  • Limits to traditional static compilation

  • Overview of IBM compiler technology

    • Testarossa JIT compiler, Toronto Portable Optimizer, Tobey backend

  • Challenges at present and near future

    • Software abstraction complexity – forces the scope of compilation to higher levels

    • Maintaining high performance backwards compatibility increasingly difficult


K. Stoodley: "Productivity and Performance: Future Directions in Compilers"

[Figure: IBM compiler technology overview. Recoverable labels: front ends (xlc, xlC, xlf); class and jar inputs; W-Code; J9 Execution Engine (Java + Others); CPO; Toronto Portable Optimizer (TPO); TOBEY Backend; Testarossa JIT; Binary Translation; Profile-Directed Feedback (PDF); Static Machine Code; Dynamic Machine Code]

  • Future: convergence/combination of dynamic and static compilation technologies


Best Paper

  • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara): Profiling over Adaptive Ranges


Profiling over Adaptive Ranges

  • Problem: how to count specific events efficiently and accurately?

    • Code segments executed

    • Memory regions accessed

    • IP addresses of routed packets

  • In all cases, impossible to maintain separate counters for the entire range of values

    • Each basic block, memory address, IP address…


Trade-off: Precision vs. Efficiency

  • Profiling with uniform ranges fails to distinguish hot code

[Figure: profiles obtained with uniform ranges vs. unlimited counters]


Higher Precision for Hot Regions

  • Good trade-off with limited resources:

    • High precision for hot regions

    • Low precision for colder ones, but this affects the accuracy less

  • Challenge: how to determine what exactly to count with what precision?


Solution: Adaptive Profiling

  • Start with one counter; split counters as they become hot:
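
A minimal sketch of the splitting step, assuming a binary tree over an integer range and a fixed split threshold. Both are simplifying assumptions for illustration (the paper has its own splitting heuristics), and the names `RangeNode`, `record`, and `split_threshold` are hypothetical.

```python
class RangeNode:
    """Counter for the half-open value range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0
        self.children = []      # empty for a leaf, two halves otherwise

    def record(self, value, split_threshold):
        # Descend to the smallest existing range that covers the value.
        node = self
        while node.children:
            mid = (node.lo + node.hi) // 2
            node = node.children[0] if value < mid else node.children[1]
        node.count += 1
        # A hot leaf is split so that later events are counted more precisely.
        # Events counted before the split stay with the (now internal) node.
        if node.count >= split_threshold and node.hi - node.lo > 1:
            mid = (node.lo + node.hi) // 2
            node.children = [RangeNode(node.lo, mid), RangeNode(mid, node.hi)]

    def leaves(self):
        """Yield (lo, hi, count) for every leaf, in ascending range order."""
        if not self.children:
            yield (self.lo, self.hi, self.count)
        for child in self.children:
            yield from child.leaves()
```

Feeding this tree a stream dominated by one value narrows the ranges around that value down to width one, while the rest of the range stays covered by a few coarse counters.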


Counter Merging

  • Problem: what if program behaviour changes after the initialization phase?

  • Solution: perform counter merging along with splitting

  • Counters of merged child nodes are added to the parent

  • Problem: how to identify nodes for merging?

    • They are by definition those ones that are not updated frequently

  • Solution: periodic batched merge operations

    • Tree depth grows at a logarithmic rate → merging can be done at exponentially increasing intervals
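
The batched merge might look like the following sketch. It assumes a generic counter tree, a fixed merge threshold, and a schedule whose interval doubles after each merge; the `Node` class, `merge_cold`, `merge_points`, and the threshold rule are illustrative assumptions rather than the heuristics from the paper.

```python
class Node:
    """Counter tree node; a node without children is a leaf."""

    def __init__(self, count=0, children=None):
        self.count = count
        self.children = children or []


def merge_cold(node, merge_threshold):
    """Fold cold leaf children into their parent, bottom-up.

    Returns the number of nodes removed from the tree.
    """
    removed = sum(merge_cold(child, merge_threshold) for child in node.children)
    if node.children and all(not child.children for child in node.children):
        child_total = sum(child.count for child in node.children)
        if child_total < merge_threshold:
            node.count += child_total   # counts of merged children go to the parent
            removed += len(node.children)
            node.children = []
    return removed


def merge_points(first_interval, total_events):
    """Event counts at which a batched merge runs; the gap doubles each time."""
    points, at = [], first_interval
    while at <= total_events:
        points.append(at)
        at *= 2
    return points
```

With a first interval of 1000 events, merges in this sketch would run after 1000, 2000, 4000, and 8000 events, so the merge cost is spread over ever longer stretches of profiling.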


Additional Contributions

  • Heuristics for splitting and merging

  • Theoretical analysis of accuracy guarantees

  • Proposal for hardware implementation

  • Experimental evaluation

    • Memory requirements

    • Average and worst-case errors on benchmarks

    • Performance of HW implementation

    • Accuracies on the order of 98.0-99.8% with only 8-64K of memory


Conclusions

  • Highly interesting program

    • My short presentation certainly doesn’t do justice to most of the mentioned works!

  • Readings to perhaps consider for future CARG:

    • D. Wentzlaff, A. Agarwal, Constructing Virtual Architectures on a Tiled Processor

    • A. Smith et al., Compiling for EDGE Architectures

    • F. Agakov et al., Using Machine Learning to Focus Iterative Optimization

    • (Highly subjective!)

