
CGO 2006: The Fourth International Symposium on Code Generation and Optimization
New York, March 26-29, 2006

Conference Review

Presented by: Ivan Matosevic

Outline
  • Conference overview
  • Brief summaries of sessions
  • Keynote speeches
  • Best paper
Conference Overview
  • Primary focus: back-end compilation techniques
    • Static analysis and optimization
    • Profiling
    • Run-time techniques
  • 8 sessions, 29 papers
  • Dominating topics: multicores, dynamic compilation
Overview of Session
  • Dynamic Optimization
  • Object-Oriented Code Generation and Optimization
  • Phase Detection and Profiling
  • Tiled and Multicore Compilation
  • Static Code Generation and Optimization Issues
  • SIMD Compilation
  • Optimization Space Exploration
  • Security and Reliability
Session 1: Dynamic Optimization
  • Kim Hazelwood (University of Virginia), Robert Cohn (Intel), A Cross-Architectural Interface for Code Cache Manipulation
    • Pin dynamic instrumentation system with code cache
    • The paper describes an API for various operations on the code cache (callbacks, lookups, statistics, etc.)
  • Derek Bruening, Vladimir Kiriansky, Tim Garnett, Sanjeev Banerji (Determina Corporation), Thread-Shared Software Code Caches
    • Problem: sharing a code cache across multiple threads
    • Authors propose a fine-grained locking scheme
    • Evaluation using DynamoRIO
Session 1: Dynamic Optimization
  • Keith Cooper, Anshuman Dasgupta (Rice Univ.), Tailoring Graph-coloring Register Allocation For Runtime Compilation
    • Problem: register allocation in JIT compilers
    • Authors propose a novel lightweight graph-colouring technique
  • Weifeng Zhang, Brad Calder, Dean Tullsen (UC San Diego), A Self Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
    • Extension of the Trident event-driven dynamic optimization framework (previously proposed by the same authors)
    • Dynamic insertion of prefetching instructions based on run-time analysis
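Graph-colouring allocation is the baseline that a lightweight JIT allocator has to simplify. As a rough illustration only (this is the classic Chaitin/Briggs simplify-select scheme, not the Cooper/Dasgupta technique), here is a minimal Python sketch, assuming the interference graph is given as a symmetric dict of neighbour sets; `color_registers` is a hypothetical name.

```python
def color_registers(interference, k):
    """Simplify-select graph colouring (Chaitin/Briggs style).

    interference: dict mapping each virtual register to the set of registers
                  it interferes with (must be symmetric).
    k:            number of physical registers.
    Returns (assignment, spilled): register -> colour in range(k), and the
    registers that could not be coloured.
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    stack = []
    # Simplify: repeatedly remove a node, preferring one with degree < k.
    while graph:
        node = next((v for v in graph if len(graph[v]) < k), None)
        if node is None:
            # No trivially colourable node: push the highest-degree node
            # optimistically as a potential spill.
            node = max(graph, key=lambda v: len(graph[v]))
        stack.append(node)
        for n in graph.pop(node):
            graph[n].discard(node)
    # Select: pop nodes and give each the lowest colour unused by its neighbours.
    assignment, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [c for c in range(k) if c not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled
```

With three mutually interfering values and only two registers, exactly one of them ends up spilled.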
Session 2: Object-Oriented Code Generation and Optimization
  • Suresh Srinivas, Yun Wang, Miaobo Chen, Qi Zhang, Eric Lin, Valery Ushakov, Yoav Zach, Shalom Goldenberg (Intel Corporation), Java JNI Bridge: An MRTE Framework for Mixed Native ISA Execution
    • Uses a dynamic translator to execute native calls compiled for one ISA on a Java platform running on a different ISA
  • Kris Venstermans, Lieven Eeckhout, Koen De Bosschere (Ghent University), Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing
    • Use address bits on a 64-bit architecture to encode object type in order to save memory
    • Objects of the same type allocated in a contiguous (virtual) region
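The idea behind typed virtual addressing can be sketched as follows: if each type owns its own virtual region, the type can be recovered from the address alone, so a per-object type field can in principle be dropped. The bit split (`TYPE_SHIFT = 32`), the `TypedHeap` class and its bump allocator are hypothetical simplifications, not the authors' design.

```python
TYPE_SHIFT = 32                   # hypothetical: bits 32 and up hold the type ID
REGION_SIZE = 1 << TYPE_SHIFT     # bytes of virtual space reserved per type


class TypedHeap:
    """Bump allocator that gives every type its own contiguous virtual region."""

    def __init__(self):
        self.type_ids = {}        # type name -> type ID (IDs start at 1)
        self.type_names = {}      # type ID -> type name
        self.next_offset = {}     # type ID -> next free offset in its region

    def allocate(self, type_name, size):
        if type_name not in self.type_ids:
            tid = len(self.type_ids) + 1
            self.type_ids[type_name] = tid
            self.type_names[tid] = type_name
        tid = self.type_ids[type_name]
        offset = self.next_offset.get(tid, 0)
        if offset + size > REGION_SIZE:
            raise MemoryError(f"region for {type_name} exhausted")
        self.next_offset[tid] = offset + size
        # The returned address itself encodes the object's type.
        return (tid << TYPE_SHIFT) | offset

    def type_of(self, address):
        # No header lookup needed: the type ID is the high bits of the address.
        return self.type_names[address >> TYPE_SHIFT]
```

Objects of one type are laid out contiguously inside their region, matching the slide's description.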
Session 2: Object-Oriented Code Generation and Optimization
  • Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan (IBM Canada), Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler
    • The IBM Testarossa JIT compiler
    • This paper focuses on code patching and profiling in a multi-threaded environment with a lot of class loading/unloading
  • Lixin Su, Mikko H Lipasti (University of Wisconsin Madison), Dynamic Class Hierarchy Mutation
    • Run-time reassignment of objects from one derived class to another, changing their virtual tables
    • Offers opportunity for optimizations based on specialization
Session 3: Phase Detection and Profiling
  • Priya Nagpurkar (UCSB), Michael Hind (IBM), Chandra Krintz (UCSB), Peter Sweeney, V.T. Rajan (IBM), Online Phase Detection Algorithms
    • Detecting phase behaviour in virtual machines
    • Track dynamic program parameters (methods invoked, branch directions…) over time and apply a similarity model
  • Jeremy Lau, Erez Perelman, Brad Calder (UC San Diego), Selecting Software Phase Markers with Code Structure Analysis
    • Portions of code whose execution correlates with phase changes
    • Procedure calls and returns, loop boundaries
    • Profile-based hierarchical loop-call graph
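One simple way to instantiate "track events over time and apply a similarity model" is to compare consecutive fixed-size windows of profile events (e.g. invoked method names) and flag a phase change when their similarity drops below a threshold. The non-overlapping windows, the L1-based similarity and the threshold are assumptions for illustration, not necessarily any of the algorithms evaluated in the paper.

```python
from collections import Counter


def similarity(a: Counter, b: Counter) -> float:
    """1 minus half the L1 distance between the two normalized event
    distributions: 1.0 means identical mixes, 0.0 means disjoint ones."""
    ta, tb = sum(a.values()), sum(b.values())
    if ta == 0 or tb == 0:
        return 0.0
    keys = set(a) | set(b)
    return 1.0 - 0.5 * sum(abs(a[k] / ta - b[k] / tb) for k in keys)


def detect_phases(events, window, threshold):
    """Return the start indices of windows that begin a new phase,
    comparing each window of `window` events with the previous one."""
    changes = []
    prev = None
    for start in range(0, len(events) - window + 1, window):
        cur = Counter(events[start:start + window])
        if prev is not None and similarity(prev, cur) < threshold:
            changes.append(start)
        prev = cur
    return changes
```

For example, a trace of eight calls to `f` followed by eight calls to `g`, with a window of four, yields a single phase change at index 8.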
Session 3: Phase Detection and Profiling
  • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara), Profiling over Adaptive Ranges
    • Voted best paper – details later
  • Hyesoon Kim, Muhammad Aater Suleman, Onur Mutlu, Yale N. Patt (UT-Austin), 2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set
    • Predicts whether the prediction accuracy of each branch will vary across input sets
    • Heuristic approach used to derive representative profiling results from a single input set
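A rough sketch of the single-input idea: record each branch's prediction accuracy per time slice of one profiling run, and flag branches whose accuracy varies strongly across slices as likely input-dependent. The use of the population standard deviation and the 0.05 threshold are assumptions, not the paper's heuristic.

```python
from statistics import pstdev


def input_dependent_branches(slices, std_threshold=0.05):
    """Flag branches whose prediction accuracy varies across time slices.

    slices: one dict per time slice of a single profiling run, mapping
            branch -> (correctly_predicted, executed)
    Returns the flagged branches, sorted.
    """
    accuracy = {}
    for s in slices:
        for branch, (correct, executed) in s.items():
            if executed > 0:
                accuracy.setdefault(branch, []).append(correct / executed)
    # A branch needs at least two slices before its variation can be judged.
    return sorted(b for b, accs in accuracy.items()
                  if len(accs) >= 2 and pstdev(accs) > std_threshold)
```

A branch predicted at roughly 99% in every slice stays unflagged, while one that swings between 50% and 90% is reported.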
Session 4: Tiled and Multicore Compilation
  • David Wentzlaff, Anant Agarwal (MIT), Constructing Virtual Architectures on a Tiled Processor
    • Map components of a superscalar architecture (Pentium III) onto a parallel tiled architecture (Raw) using dynamic translation
    • In a way, uses Raw as a coarse-grain FPGA
  • Aaron Smith (UT-Austin), J. Burrill (UMass at Amherst), J. Gibson, B. Maher, N. Nethercote, B. Yoder, D. Burger, K. S. McKinley (UT-Austin), Compiling for EDGE Architectures
    • TRIPS EDGE (Explicit Data Graph Execution) architecture
    • This paper focuses on compilation of standard C and FORTRAN benchmarks
Session 4: Tiled and Multicore Compilation
  • Shih-wei Liao, Zhaohui Du, Gansha Wu, Guei-Yuan Lueh (Intel), Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
    • Parallel compiler for the Brook streaming language
    • An extension of C that enables specifying data parallelism
  • Michael L. Chu, Scott A. Mahlke (University of Michigan), Compiler-directed Object Partitioning for Multicluster Processors
    • Partitioning of data in clustered architectures such as Raw
    • I didn’t quite understand which programming model the authors have in mind
Session 5: Static Code Generation and Optimization Issues
  • Two papers about the HP-UX Itanium compiler:
    • Dhruva R. Chakrabarti, Shin-Ming Liu (Hewlett-Packard), Inline Analysis: Beyond Selection Heuristics
      • Cross-module techniques for selection of inlined call sites and the choice of specialized function versions
    • Robert Hundt, Dhruva R. Chakrabarti, Sandya S. Mannarswamy (Hewlett-Packard), Practical Structure Layout Optimization and Advice
      • Data layout and placement on the heap to improve locality
      • Structure splitting, structure peeling, dead field removal, and field reordering
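Several of these transformations can be driven by a field-access profile. A minimal sketch, assuming per-field access counts are available: fields that are never accessed are treated as dead, the rest are ordered by hotness, and the fields covering most accesses form the hot part of a split structure. The 90% cut-off and the name `plan_layout` are hypothetical.

```python
def plan_layout(fields, access_counts, hot_fraction=0.9):
    """Return (hot_fields, cold_fields) for a split structure.

    fields:        field names in declaration order
    access_counts: field name -> number of profiled accesses
    """
    # Dead field removal: drop fields the profile never touched.
    live = [f for f in fields if access_counts.get(f, 0) > 0]
    # Field reordering: hottest first (stable, so ties keep declaration order).
    live.sort(key=lambda f: access_counts[f], reverse=True)
    total = sum(access_counts[f] for f in live)
    hot, cold, covered = [], [], 0
    for f in live:
        # Structure splitting: the hottest fields covering `hot_fraction`
        # of all accesses stay in the hot part; the rest move out.
        if covered < hot_fraction * total:
            hot.append(f)
            covered += access_counts[f]
        else:
            cold.append(f)
    return hot, cold
```

The design choice here is to split by cumulative access share rather than by a per-field cut-off, so a structure dominated by one or two fields keeps a very small hot part.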
Session 5: Static Code Generation and Optimization Issues
  • Chris Lupo, Kent Wilken (University of California, Davis), Post Register Allocation Spill Code Optimization
    • Authors propose a profile-based algorithm for placement of save/restore instructions handling spilled variables in function calls
    • Implemented as a part of GCC
  • Seung Woo Son, Guangyu Chen, Mahmut Kandemir (Pennsylvania State University), A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality
    • Goal: restructure code so that disk idle periods are lengthened
    • The approach targets array-based programs: disk layout of array data exposed to the compiler
Session 6: SIMD Compilation
  • Jianhui Li, Qi Zhang, Shu Xu, Bo Huang (Intel China Software Center), Optimizing Dynamic Binary Translation for SIMD Instructions
    • Algorithms for dynamic binary translation of SIMD instructions in general-purpose architectures (such as MMX in x86)
    • Evaluation using IA-32 binaries on Itanium 2
  • Dorit Nuzman (IBM), Richard Henderson (Red Hat), Multi-Platform Auto-Vectorization
    • Implementation of the automatic vectorizer in GCC 4.0
Session 7: Optimization-space Exploration
  • Felix Agakov, Edwin Bonilla, John Cavazos, Bjoern Franke, Grigori Fursin, Michael O'Boyle, Marc Toussaint, John Thomson, Chris Williams (U. of Edinburgh), Using Machine Learning to Focus Iterative Optimization
    • Predictive modelling used to search the optimization space
    • Targets embedded platforms – AMD Au1500 and Texas Instruments TI C6713
  • Prasad Kulkarni, David Whalley, Gary Tyson (Florida State University), Jack Davidson (University of Virginia), Exhaustive Optimization Phase Order Space Exploration
    • Exhaustive search of the phase order space (15 phases) using aggressive pruning; takes time on the order of minutes to hours
    • Targets StrongARM SA-100
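The pruning that makes an exhaustive search feasible can be illustrated by exploring program versions rather than phase sequences: orderings that produce identical code collapse into one state, and phases that change nothing are skipped. The toy "program" (a tuple of instruction names) and the two toy phases are hypothetical stand-ins for real compiler phases.

```python
from collections import deque


def remove_nops(code):
    """Toy phase: delete 'nop' instructions."""
    return tuple(ins for ins in code if ins != "nop")


def dedup_adjacent(code):
    """Toy phase: collapse runs of identical adjacent instructions."""
    out = []
    for ins in code:
        if not out or out[-1] != ins:
            out.append(ins)
    return tuple(out)


def explore(start, phases):
    """Breadth-first enumeration of every distinct code version reachable by
    applying the phases in any order. Orderings that yield identical code
    share one state, so the search covers a DAG of unique versions instead
    of the full tree of phase sequences."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        code = frontier.popleft()
        for phase in phases:
            new = phase(code)
            if new == code:        # dormant phase: nothing changed, prune
                continue
            if new not in seen:    # version already explored: prune
                seen.add(new)
                frontier.append(new)
    return seen
```

Starting from `("a", "nop", "a", "b")`, only three distinct versions exist, however many phase orderings are tried.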
Session 7: Optimization-space Exploration
  • Zhelong Pan, Rudolf Eigenmann (Purdue University), Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning
    • Problem: find the optimal combination of 38 GCC -O3 options, targeting Pentium IV and SPARC II
    • Proposed heuristic algorithm that provides a high-quality solution in time on the order of several hours
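For comparison, a simple greedy baseline for this kind of option search is iterative elimination: start with all options enabled and repeatedly switch off the option whose removal helps most, stopping when no removal helps. This is a sketch, not the authors' heuristic; `measure` is a hypothetical callback that would compile and time the program for a given option set.

```python
def iterative_elimination(options, measure):
    """Greedy option search.

    options: iterable of option names, all initially enabled
    measure: callable taking a frozenset of enabled options and returning
             a run time (lower is better)
    Returns (enabled_options, best_time).
    """
    enabled = set(options)
    best = measure(frozenset(enabled))
    while True:
        # Try switching off each remaining option in turn.
        candidates = []
        for opt in sorted(enabled):
            t = measure(frozenset(enabled - {opt}))
            if t < best:
                candidates.append((t, opt))
        if not candidates:
            return enabled, best
        # Remove the option whose removal gives the largest improvement.
        t, opt = min(candidates)
        enabled.remove(opt)
        best = t
```

Each round costs one measurement per remaining option, which is why such searches over dozens of options take hours when every measurement is a full compile-and-run.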
Session 8: Security and Reliability
  • Edson Borin (UNICAMP), Cheng Wang, Youfeng Wu (Intel), Guido Araujo (UNICAMP), Software-Based Transparent and Comprehensive Control-Flow Error Detection
    • Addresses the problem of soft (transient) errors that cause branches to incorrect instructions
    • Implemented in SW as a part of a dynamic binary translator
  • Tao Zhang, Xiaotong Zhuang, Santosh Pande (Georgia Tech), Compiler Optimizations to Reduce Security Overheads
    • Optimizations that specifically target techniques that implement software protection with minimal HW support
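To make the control-flow error model concrete, here is a deliberately simplified software check: the monitor remembers the last block executed and verifies that each transition is a legal edge of the static CFG. The dict-based CFG and the function name are assumptions; the technique from the Borin et al. paper is not reproduced here.

```python
def first_illegal_transition(cfg, trace):
    """Return the index in `trace` of the first block reached through an edge
    that does not exist in the static CFG, or None if every transition is legal.

    cfg:   dict mapping each basic-block label to the set of its legal successors
    trace: sequence of block labels in the order they executed
    """
    for i in range(1, len(trace)):
        prev, cur = trace[i - 1], trace[i]
        if cur not in cfg.get(prev, set()):
            # E.g. a transient fault corrupted a branch target.
            return i
    return None
```

A real implementation would perform this check inline in the translated code rather than over a recorded trace.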
Session 8: Security and Reliability
  • Susanta Nanda, Wei Li, Tzi-cker Chiueh (State University of NY at Stony Brook), BIRD: Binary Interpretation using Runtime Disassembly
    • Goal: framework for automatic detection of vulnerabilities such as buffer overflows when the source code is not available
    • Static and dynamic disassembly and instrumentation – targets Windows x86 applications
Keynote Speeches
  • Wei Li, Principal Engineer, Intel: "Parallel Programming 2.0"
  • Kevin Stoodley, Fellow and CTO of Compilation Technology, IBM: "Productivity and Performance: Future Directions in Compilers"
Wei Li: Parallel Programming 2.0
  • Major technological change:
    • Moore’s Law continues to increase transistor counts
    • However, power, memory latency, and limits to ILP are setting an effective performance ceiling
  • General trend towards thread-level on-chip parallelism
    • SMT
    • Chip multiprocessors
Wei Li: Parallel Programming 2.0
  • “Parallel Programming 2.0” refers to the advent of multicores
  • A very optimistic future vision:
Wei Li: Parallel Programming 2.0
  • Key issue – where will the parallelism come from?
  • Parallel programming needs to become more mainstream
    • Consumer vs. HPC/server/database
    • Inclusion in education at a more elementary level
    • New tools for greater ease of programming
  • Intel’s parallel programming tools
    • http://www.intel.com/software
K. Stoodley: "Productivity and Performance: Future Directions in Compilers"
  • Limits to traditional static compilation
  • Overview of IBM compiler technology
    • Testarossa JIT compiler, Toronto Portable Optimizer, Tobey backend
  • Challenges at present and near future
    • Software abstraction complexity – forces the scope of compilation to higher levels
    • Maintaining high performance together with backward compatibility is increasingly difficult
K. Stoodley: "Productivity and Performance: Future Directions in Compilers"
[Figure: IBM compiler technology: xlc, xlC and xlf front ends; class/jar inputs; W-Code; J9 Execution Engine (Java + others); CPO; Toronto Portable Optimizer (TPO); TOBEY backend; Testarossa JIT; binary translation; profile-directed feedback (PDF); static and dynamic machine code]
  • Future: convergence/combination of dynamic and static compilation technologies
Best Paper
  • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara): Profiling over Adaptive Ranges
Profiling over Adaptive Ranges
  • Problem: how to count specific events efficiently and accurately?
    • Code segments executed
    • Memory regions accessed
    • IP addresses of routed packets
  • In all cases, impossible to maintain separate counters for the entire range of values
    • Each basic block, memory address, IP address…
Trade-off: Precision vs. Efficiency
  • Profiling with uniform ranges fails to distinguish hot code

[Figure: profiles with uniform ranges vs. unlimited counters]

Higher Precision for Hot Regions
  • Good trade-off with limited resources:
    • High precision for hot regions
    • Low precision for colder ones, which has less effect on overall accuracy
  • Challenge: how to determine which ranges to count, and with what precision?
Solution: Adaptive Profiling
  • Start with one counter; split counters as they become hot:
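A minimal sketch of counter splitting, assuming a binary range tree over integer values and a fixed split threshold (the paper derives its own splitting heuristics and accuracy guarantees): each event increments the leaf covering its value, and a leaf that becomes hot is split into two zero-count children, so precision grows only where events concentrate. `Node` and `AdaptiveProfiler` are hypothetical names.

```python
class Node:
    """Counter for the half-open value range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0
        self.children = None      # None for a leaf, else (left, right)


class AdaptiveProfiler:
    def __init__(self, lo, hi, split_threshold):
        self.root = Node(lo, hi)
        self.split_threshold = split_threshold

    def record(self, value):
        # Walk down to the leaf whose range contains the value.
        node = self.root
        while node.children is not None:
            left, right = node.children
            node = left if value < left.hi else right
        node.count += 1
        # A hot leaf is split in half. Events seen so far stay in its count;
        # only future events are attributed to the finer sub-ranges.
        if node.count >= self.split_threshold and node.hi - node.lo > 1:
            mid = (node.lo + node.hi) // 2
            node.children = (Node(node.lo, mid), Node(mid, node.hi))

    def nodes(self):
        stack = [self.root]
        while stack:
            node = stack.pop()
            yield node
            if node.children is not None:
                stack.extend(node.children)

    def leaves(self):
        return sorted((n.lo, n.hi, n.count) for n in self.nodes() if n.children is None)
```

Recording the value 5 repeatedly over the range [0, 16) with a threshold of 4 drives the tree down to a single-value leaf [5, 6), while the cold parts of the range stay covered by a few wide counters.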
Counter Merging
  • Problem: what if program behaviour changes after the initialization phase?
Counter Merging
  • Solution: perform counter merging along with splitting
Counter Merging
  • Counters of merged child nodes added to the parent
Counter Merging
  • Problem: how to identify nodes for merging?
    • By definition, they are the nodes that are not updated frequently
  • Solution: periodic batched merge operations
    • Tree depth grows at a logarithmic rate, so merging can be done at exponentially increasing intervals
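Batched merging can be sketched in the same spirit, again assuming a binary range tree and an illustrative merge threshold: a post-order pass folds sibling leaves whose combined count is still small back into their parent, and merge passes are scheduled at exponentially growing event counts. `Node`, `merge_cold` and `merge_points` are hypothetical names, not the paper's.

```python
class Node:
    """Range-tree counter for [lo, hi); children is None for a leaf."""

    def __init__(self, lo, hi, count=0, children=None):
        self.lo, self.hi, self.count, self.children = lo, hi, count, children


def merge_cold(node, merge_threshold):
    """Post-order batch merge: if both children are leaves whose combined
    count is below merge_threshold, add their counts to the parent and drop
    them. Post-order lets merges cascade upwards in a single pass."""
    if node.children is None:
        return
    for child in node.children:
        merge_cold(child, merge_threshold)
    left, right = node.children
    if (left.children is None and right.children is None
            and left.count + right.count < merge_threshold):
        node.count += left.count + right.count
        node.children = None


def merge_points(first, limit):
    """Event counts at which a batch merge runs: first, 2*first, 4*first, ..."""
    points, n = [], first
    while n <= limit:
        points.append(n)
        n *= 2
    return points
```

Because counts are only moved upwards, the total number of recorded events is preserved; only the resolution of cold ranges is lost.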
Additional Contributions
  • Heuristics for splitting and merging
  • Theoretical analysis of accuracy guarantees
  • Proposal for hardware implementation
  • Experimental evaluation
    • Memory requirements
    • Average and worst-case errors on benchmarks
    • Performance of HW implementation
    • Accuracies on the order of 98.0-99.8% with only 8-64K of memory
Conclusions
  • A highly interesting program
    • My short presentation certainly doesn’t do justice to most of the mentioned works!
  • Readings to perhaps consider for future CARG:
    • D. Wentzlaff, A. Agarwal, Constructing Virtual Architectures on a Tiled Processor
    • A. Smith et al., Compiling for EDGE Architectures
    • F. Agakov et al., Using Machine Learning to Focus Iterative Optimization
    • (Highly subjective!)