
Open64: A Framework for High performance Compiler

Open64: A Framework for High performance Compiler. March 2007. Outline: Open64 History, Osprey Project, Research Activities, Retargetability. Open64 Based Research Activities at University of Delaware. Open64 Code Porting for Large-Scale Multi-Core Architectures.





Presentation Transcript


  1. Open64: A Framework for High performance Compiler March 2007

  2. Outline • Open64 History • Osprey Project • Research Activities • Retargetability

  3. Open64 Based Research Activities at University of Delaware • Open64 Code Porting for Large-Scale Multi-Core Architectures • Code Optimization for Large-Scale Multi-Core Architectures • Research on a points-to alias analysis under an SSA framework • Landing Software Pipelining on Large-Scale Multi-Core Architectures

  4. Port Open64 to Cyclops64, based on PathScale 2.2.1/x86-64 • Front End: starting from the gcc 3.2.1/MIPS C FE, we changed the MD so that it generates an AST compatible with Cyclops64's ABI • IPA • Machine model (ISA, uArch, ABI, etc.): rewritten from scratch for C64 • LNO: only the dependence tests are enabled; the loop transformations are disabled because the original loop transformations are not readily applicable to an architecture without a cache • WOPT • CG: changed heavily (CGIR lowering, scheduling, EBO, etc.) • Tool chains (as, ld, simulator, etc.): provided by ETI

  5. Some research on Open64/C64 • Scratch pad utilization • Divide scratch pad memory into 3 areas: 2nd-level general-purpose registers (L2 GPR), software rotating registers (SRR), and a free area • L2 GPR: further divided into caller-save and callee-save; live ranges are colored with L2 GPRs when the register allocator runs out of real registers • SRR: prefetching, improved temporal locality • E.g. 1, prefetch 5 iterations ahead: for (…) { = x; } => for (…) { rrx = x; = rrx+5 } • E.g. 2, improve temporal locality: for (i=0; i<10000; i++) { a[i] = b[i] + a[i-5]; } => for (i=0; i<10000; i++) { rrx = b[i] + rrx-5; a[i] = rrx; } • Use LDM (load multiple words) to reduce the bandwidth requirement
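The second SRR example on slide 5 can be imitated in portable C. The sketch below is only an illustration, not the Open64/C64 implementation: a small circular buffer stands in for the software rotating registers, and the names `recur_plain`, `recur_rotating`, and `DIST` are hypothetical. It also assumes the loop starts at index 5 so that `a[i-5]` is always in bounds, which the slide's loop does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define DIST 5  /* recurrence distance taken from the slide's example */

/* Reference loop: every iteration reloads a[i - DIST] from memory. */
void recur_plain(int *a, const int *b, size_t n) {
    assert(a != NULL && b != NULL);
    for (size_t i = DIST; i < n; i++)
        a[i] = b[i] + a[i - DIST];
}

/* Same recurrence, but the last DIST results live in a circular buffer
 * (a stand-in for rotating registers), so a[] is written and never re-read
 * inside the loop. */
void recur_rotating(int *a, const int *b, size_t n) {
    assert(a != NULL && b != NULL);
    int rr[DIST];

    /* Prime the buffer with the first DIST elements of a[]. */
    for (size_t k = 0; k < DIST && k < n; k++)
        rr[k] = a[k];

    for (size_t i = DIST; i < n; i++) {
        size_t slot = i % DIST;   /* slot currently holding a[i - DIST] */
        int v = b[i] + rr[slot];  /* reuse the buffered value */
        rr[slot] = v;             /* slot now holds a[i] for iteration i + DIST */
        a[i] = v;
    }
}
```

Both functions produce the same array; the second one reads `a[]` only while priming the buffer, which is the temporal-locality benefit the slide attributes to SRR.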

  6. Unification-based points-to analysis using SSA • Motivations • Incremental change to the existing Steensgaard points-to (PT) analysis, with better precision • Retain almost-linear time • Limited flow sensitivity: improve the precision of the analysis of *p and *q where p and q are global variables/pointers, or may be modified by callees • Reduce the imprecision due to unification • Limited flow sensitivity through SSA form: • Build a (preliminary) SSA form for all variables (including global variables and local variables whose address is taken), without taking aliasing into account • Perform the points-to analysis on the preliminary SSA form, and update the SSA form during the PT analysis • Example: p3 initially points to n; after analyzing statement 4, p3 points to both n and z

  7. Unification-based points-to analysis using SSA (cont.) • Differentiate flat unification from updating unification • Flat unification: let s1 = points_to(p1) and s2 = points_to(p2); the statement p = cond ? p1 : p2 unifies s1 and s2 simply because p may point to either set. s1 and s2 themselves do not need to be updated at the moment the unification happens. • Incremental update: given points_to(p1) => {a, b}, the statement "*q1 = some_ptr" may change p1's value, hence points_to(p1) should be updated to {a, b} U points_to(some_ptr). • The final unified set encodes the type of unification of its smaller subsets. Flat-unified subsets remain disjoint.
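As a point of reference for slides 6 and 7, here is a minimal sketch of plain Steensgaard-style unification on a union-find forest. It shows only the baseline merging step (what happens to s1 and s2 when p = cond ? p1 : p2 is processed), not the SSA-based flow sensitivity or the flat/updating distinction the slides propose. All names (`pt_new`, `pt_unify`, `pt_address_of`, `pt_copy`, `pt_same_target`) are hypothetical and not taken from Open64.

```c
#include <assert.h>

#define MAX_NODES 64

static int parent[MAX_NODES];  /* union-find forest over abstract locations */
static int pts[MAX_NODES];     /* pointee class of each class root, or -1 */
static int n_nodes;

void pt_init(void) { n_nodes = 0; }

/* Allocate a fresh abstract location with no pointee yet. */
int pt_new(void) {
    assert(n_nodes < MAX_NODES);
    int id = n_nodes++;
    parent[id] = id;
    pts[id] = -1;
    return id;
}

/* Find the class representative, halving paths on the way up. */
int pt_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Merge two classes; if both have pointees, merge those as well. */
void pt_unify(int a, int b) {
    a = pt_find(a);
    b = pt_find(b);
    if (a == b) return;
    int pa = pts[a], pb = pts[b];
    parent[b] = a;
    if (pa == -1) pts[a] = pb;
    else if (pb != -1) pt_unify(pa, pb);
}

/* Pointee class of x, created lazily as a placeholder if absent. */
int pt_pointee(int x) {
    x = pt_find(x);
    if (pts[x] == -1) pts[x] = pt_new();
    return pt_find(pts[x]);
}

/* p = &x */
void pt_address_of(int p, int x) { pt_unify(pt_pointee(p), x); }

/* p = q (a conditional p = c ? p1 : p2 is modeled as two copies) */
void pt_copy(int p, int q) { pt_unify(pt_pointee(p), pt_pointee(q)); }

/* Nonzero if p and q may point into the same class. */
int pt_same_target(int p, int q) {
    int a = pts[pt_find(p)], b = pts[pt_find(q)];
    return a != -1 && b != -1 && pt_find(a) == pt_find(b);
}
```

Calling `pt_address_of(p1, a)`, `pt_address_of(p2, b)`, then `pt_copy(p, p1)` and `pt_copy(p, p2)` leaves a and b in one class. That merge is the precision loss the slides aim to limit by keeping flat-unified subsets distinguishable.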

  8. Software Pipelining of Multi-Core Architectures – A Brief Introduction • Problem description • Software toolchain • Where Open64 helped • Some results.

  9. Problem Description • Software-pipelining on multi-threaded architectures • Single-dimension Software-Pipelining (SSP) • Workload distribution • Data communication • Data synchronization

  10. Software Pipelining Toolchain Based on Open64

  11. Implementation

  12. What Open64 features are used in multi-core software pipelining • Multi-dimensional dependence analysis • WHIRL clean interface • Machine model • Reservation tables • Register allocation • Modulo-scheduler • Code generator • No need to implement everything to test • Clean code despite lack of documentation!

  13. Cyclops64 architecture

  14. Some Results
