

An IMplicitly PArallel Compiler Technology Based on Phoenix

For thousand-core microprocessors

Wen-mei Hwu

with

Ryoo, Ueng, Rodrigues, Lathara, Kelm, Gelado, Stone, Yi, Kidd, Barghsorkhi, Mahesri, Tsao, Stratton, Navarro, Lumetta, Frank, Patel

University of Illinois, Urbana-Champaign

Background
  • Academic compiler research infrastructure is a tough business
    • IMPACT, Trimaran, and ORC for VLIW and Itanium processors
    • Polaris and SUIF for multiprocessors
    • LLVM for portability and safety
  • In 2001, IMPACT team moved into many-core compilation with MARCO FCRC funding
    • A new implicitly parallel programming model that balances the burden on programmers and the compiler in parallel programming
    • Infrastructure work has slowed down ground-breaking work
  • Timely visit by the Phoenix team in January 2007
    • Rapid progress has since been taking place
    • Future IMPACT research will be built on Phoenix

Big picture

The Next Software Challenge
  • Today, multi-core chips make more effective use of area and power than large ILP CPUs
    • Scaling from 4-core to 1000-core chips could happen in the next 15 years
  • All semiconductor market domains converging to concurrent system platforms
    • PCs, game consoles, mobile handsets, servers, supercomputers, networking, etc.

We need to make these systems effectively execute valuable, demanding apps.

The Compiler Challenge

To meet this challenge, the compiler must
  • Allow simple, effective control by programmers
  • Discover and verify parallelism
  • Eliminate tedious efforts in performance tuning
  • Reduce testing and support cost of parallel programs

“Compilers and tools must extend the human’s ability to manage parallelism by doing the heavy lifting.”

An Initial Experimental Platform

[Figure: peak GFLOPS over time for NVIDIA GPUs: NV30 (GeForce FX 5800), NV35 (GeForce FX 5950 Ultra), NV40 (GeForce 6800 Ultra), G70 (GeForce 7800 GTX), G71 (GeForce 7900 GTX), G80 (GeForce 8800 GTX)]
  • A quiet revolution and potential build-up
    • Calculation: 450 GFLOPS vs. 32 GFLOPS
    • Memory Bandwidth: 86.4 GB/s vs. 8.4 GB/s
    • Until last year, programmed through graphics API
    • GPU in every PC and workstation – massive volume and potential impact
GeForce 8800

[Diagram: GeForce 8800 architecture. Host and input assembler feed a thread execution manager; streaming multiprocessors with parallel data caches and texture units connect through load/store units to global memory]

16 highly threaded SMs, >128 FPUs, 450 GFLOPS, 768 MB DRAM, 86.4 GB/s memory BW, 4 GB/s BW to CPU
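As a back-of-envelope check (the derived ratios are ours, not from the slides), the quoted peak rate and bandwidths imply how many floating-point operations the chip can issue per byte fetched from DRAM, and how much faster on-board DRAM is than the host link:

```python
# Ratios derived from the GeForce 8800 figures above
# (illustrative arithmetic, not from the talk).
peak_gflops = 450.0   # peak throughput, GFLOPS
mem_bw_gbs = 86.4     # DRAM bandwidth, GB/s
host_bw_gbs = 4.0     # CPU<->GPU link bandwidth, GB/s

# FLOPs the chip can issue per byte fetched from DRAM
flops_per_byte = peak_gflops / mem_bw_gbs
print(f"{flops_per_byte:.2f} FLOPs per DRAM byte")

# Ratio of DRAM bandwidth to the host link
dram_vs_host = mem_bw_gbs / host_bw_gbs
print(f"DRAM bandwidth is {dram_vs_host:.1f}x the CPU link")
```

Kernels must therefore perform several FLOPs per DRAM byte, and far more per byte transferred from the CPU, to approach peak, which is one reason tuning focuses on data movement.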

Some Hand-Coded Results

[HKR HotChips-2007]

Computing Q: Performance

  • CPU (V6): 230 MFLOPS
  • GPU (V8): 96 GFLOPS
  • Speedup: 446x

Lessons Learned
  • Parallelism extraction requires global understanding
    • Most programmers only understand parts of an application
  • Algorithms need to be re-designed
    • Programmers benefit from clear view of the algorithmic effect on parallelism
  • Real but rare dependences often need to be ignored
    • Error-checking code, etc.; parallel code is often not equivalent to sequential code
  • Getting more than a small speedup over sequential code is very tricky
    • ~20 versions were typically tried per application to work around architecture bottlenecks
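The "~20 versions" point is essentially a manual search over a code-generation space. A minimal sketch of that idea (the variants, unroll factors, and timing harness here are illustrative, not the actual IMPACT/CUDA tuning process): generate several unrolled variants of a reduction, time each, and keep the fastest.

```python
import time

def make_sum(unroll):
    """Build a reduction variant with a given unroll factor
    (illustrative stand-in for hand-tuned kernel versions)."""
    def run(data):
        total = 0
        n = len(data) - len(data) % unroll
        i = 0
        while i < n:
            for j in range(unroll):   # manually unrolled body
                total += data[i + j]
            i += unroll
        for k in range(n, len(data)):  # remainder loop
            total += data[k]
        return total
    return run

def explore(data, factors=(1, 2, 4, 8)):
    """Time each variant and return (best_factor, result)."""
    best = None
    for f in factors:
        variant = make_sum(f)
        t0 = time.perf_counter()
        result = variant(data)
        elapsed = time.perf_counter() - t0
        if best is None or elapsed < best[0]:
            best = (elapsed, f, result)
    return best[1], best[2]

factor, total = explore(list(range(10000)))
print("picked unroll factor", factor, "sum =", total)
```

A real exploration also varies tiling, thread counts, and memory layout, which is why a managed search (later slides) beats doing this by hand.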
Implicitly Parallel Programming Flow

  • Human writes stylized C/C++ or a DSL with assertions
  • Deep analysis with feedback assistance performs concurrency discovery, yielding a visualizable concurrent form (for increased composability)
  • Code-gen space exploration runs a systematic search for best/correct code generation, yielding visualizable sequential assembly code with parallel annotations (for increased scalability)
  • Parallel execution with sequential semantics: parallel HW with sequential state generation feeds the debugger (for increased supportability)
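The concurrency-discovery step decides whether loop iterations may run in parallel. A toy model of that check (illustrative only; the real analysis works on the compiler IR with pointer information): treat each iteration as read and write sets of memory locations and verify that no iteration's writes overlap another iteration's reads or writes.

```python
def iterations_independent(writes, reads):
    """Conservative independence test for loop iterations:
    parallel execution is safe only if no iteration writes a
    location that another iteration reads or writes.
    writes[i] / reads[i] are the address sets touched by iteration i.
    (Toy model of concurrency discovery, not the IMPACT analysis.)"""
    n = len(writes)
    for i in range(n):
        for j in range(n):
            if i != j and writes[i] & (writes[j] | reads[j]):
                return False   # cross-iteration dependence found
    return True

# a[i] = b[i] + 1 : disjoint writes, independent reads -> parallel
ok = iterations_independent([{i} for i in range(4)],
                            [{100 + i} for i in range(4)])
print(ok)

# a[i] = a[i-1] + 1 : iteration i reads what i-1 wrote -> sequential
bad = iterations_independent([{i} for i in range(4)],
                             [{i - 1} for i in range(4)])
print(bad)
```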

Key Ideas
  • Deep program analyses that extend programmer and DSE knowledge for parallelism discovery
    • Key to reduced programmer parallelization efforts
  • Exclusion of infrequent but real dependences using HW STU (Speculative Threading with Undo) support
    • Key to successful parallelization of many real applications
  • Rich program information maintained in IR for access by tools and HW
    • Key to integrating multiple programming models and tools
  • Intuitive, visual presentation to programmers
    • Key to good programmer understanding of algorithm effects
  • Managed parallel execution arrangement search space
    • Key to reduced programmer performance tuning efforts
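The STU idea above can be modeled in software (a sketch under our own naming; real STU is hardware checkpoint/rollback support): run iterations assuming the rare dependence never fires, validate afterwards, and on a violation undo everything and re-execute sequentially with the dependence honored.

```python
import copy

def run_speculative(state, iterations, violated):
    """Software model of Speculative Threading with Undo (STU).
    Iterations run assuming the rare dependence (e.g. an error-handling
    path) never fires; `violated` checks that assumption afterwards.
    On violation, restore the checkpoint and redo the work sequentially."""
    checkpoint = copy.deepcopy(state)     # HW checkpoint analogue
    for it in iterations:                 # speculative "parallel" pass
        it(state, speculate=True)
    if violated(state):
        state = copy.deepcopy(checkpoint) # undo all speculative updates
        for it in iterations:             # safe sequential re-execution
            it(state, speculate=False)
    return state

# Toy iteration: accumulate values; a negative input trips the rare
# error path that normally serializes the loop.
def make_iter(x):
    def step(state, speculate):
        if not speculate and state["error"]:
            return                    # dependence honored when sequential
        if x < 0:
            state["error"] = True     # the rare case
        else:
            state["total"] += x
    return step

result = run_speculative({"total": 0, "error": False},
                         [make_iter(x) for x in [1, 2, 3]],
                         lambda s: s["error"])
print(result)
```

In the common case (no error) the speculative pass commits directly; only the rare violating run pays the sequential re-execution cost.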
Getting Started with Phoenix
  • Meetings with Phoenix team in January 2007
    • Determined the set of Phoenix API routines necessary to support IMPACT analyses and transformations
  • Received custom build of Phoenix that supports full type information
Fulcra to Phoenix – Action!
  • Four-step process:
    • Convert IMPACT’s data structures to Phoenix’s equivalents, and from C to C++/CLI.
    • Create the initial constraint graph using Phoenix’s IR instead of IMPACT’s IR.
    • Convert the solver (pointer analysis).
      • Consists of porting from C to C++/CLI and handling any changes to the ported Fulcra data structures.
    • Annotate the points-to information back into Phoenix’s alias representation.
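The "constraint graph" in step 2 refers to inclusion constraints for pointer analysis. A tiny Andersen-style solver sketch (only address-of and copy constraints; Fulcra also handles loads, stores, and calls, and none of these names come from the actual code):

```python
def solve_points_to(addr_of, copies):
    """Minimal Andersen-style inclusion-based points-to solver.
    addr_of: list of (p, x) for `p = &x`
    copies:  list of (p, q) for `p = q`, meaning pts(p) >= pts(q)
    Propagates sets along copy edges to a fixed point."""
    pts = {}
    for p, x in addr_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:
        changed = False
        for p, q in copies:
            src = pts.get(q, set())
            dst = pts.setdefault(p, set())
            if not src <= dst:        # new targets flow into pts(p)
                dst |= src
                changed = True
    return pts

# p = &a; q = &b; r = p; r = q  =>  r may point to {a, b}
pts = solve_points_to([("p", "a"), ("q", "b")],
                      [("r", "p"), ("r", "q")])
print(sorted(pts["r"]))
```

The annotation step then maps such points-to sets back onto the alias representation of the target IR.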
Phoenix Support Wish List
  • Access to code across file boundaries
    • LTCG
  • Access to multiple files within a pass
  • Full (Source code level) type information
  • Feed results from Fulcra back to Phoenix
    • Need more information on Phoenix alias representation
  • In the long run, we need a highly extensible IR and API for Phoenix
Conclusion
  • Compiler research for many-cores will require a very high-quality infrastructure with strong engineering support
    • New language extensions, new user models, new functionalities, new analyses, new transformations
  • We chose Phoenix based on its robustness, features and engineering support
    • Our current industry partners are also moving to Phoenix
    • We also plan to share our advanced extensions with other academic Phoenix users