
Future mass apps reflect a concurrent world

Future mass apps reflect a concurrent world. The exciting applications in the future mass computing market represent and model the physical world; traditionally these have been considered "supercomputing apps" or super-apps.





Presentation Transcript


1. Future mass apps reflect a concurrent world
  • Exciting applications in the future mass computing market represent and model the physical world.
  • Traditionally considered "supercomputing apps" or super-apps.
  • Examples: physiological simulation, molecular dynamics simulation, video and audio manipulation, medical imaging, consumer game and virtual reality products.
  • Attempts to grow current architectures "out" or domain-specific architectures "in" have not succeeded; a broader approach that covers more domains is promising.

2. MPEG Encoding Parallelism
  • Independent IPPP sequences.
  • Within a frame: independent 16x16-pel macroblocks.
  • P-frame macroblocks depend only on a localized region of the previous frame.
  • The steps of macroblock processing exhibit finer-grained parallelism; each spans function boundaries.
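
To make the macroblock-level parallelism concrete, here is a minimal C sketch, assuming an OpenMP parallel loop; the names (Frame, encode_p_macroblock, encode_p_frame) are illustrative and not taken from any particular encoder. Because each 16x16 macroblock of a P-frame depends only on a bounded search window in the previous frame, the macroblocks of the current frame can be encoded independently.

/* Minimal sketch of macroblock-level parallelism in an MPEG-style encoder.
 * All names are illustrative; assumes OpenMP for the parallel loop. */
#include <stdint.h>
#include <stdlib.h>

#define MB 16                        /* macroblock edge length in pels */

typedef struct {
    int w, h;                        /* dimensions in macroblocks */
    uint8_t *pels;                   /* (w*MB) x (h*MB) luma samples */
} Frame;

/* Encode one P-frame macroblock.  Its only cross-frame dependence is a
 * bounded motion-search window in the previous frame, so macroblocks of
 * the current frame are mutually independent. */
static void encode_p_macroblock(const Frame *prev, const Frame *cur,
                                int mbx, int mby)
{
    (void)prev; (void)cur; (void)mbx; (void)mby;
    /* motion search against prev, DCT, quantize, entropy-code ... */
}

static void encode_p_frame(const Frame *prev, const Frame *cur)
{
    /* Every (mbx, mby) iteration is independent: one task per macroblock
     * (or per row of macroblocks), with no synchronization until the
     * bitstream is assembled. */
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int mby = 0; mby < cur->h; mby++)
        for (int mbx = 0; mbx < cur->w; mbx++)
            encode_p_macroblock(prev, cur, mbx, mby);
}

int main(void)
{
    Frame prev = { 45, 36, calloc(45 * MB * 36 * MB, 1) };   /* 720x576 pels */
    Frame cur  = { 45, 36, calloc(45 * MB * 36 * MB, 1) };
    encode_p_frame(&prev, &cur);
    free(prev.pels);
    free(cur.pels);
    return 0;
}

A similar outer loop over whole IPPP sequences would exploit the coarser, sequence-level independence mentioned in the first bullet.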

  3. Alternative Forms of MPEG-4 Threading

4. Building on HPF Compilation: what's new?
  • Applicability to the mass software base requires pointer analysis, control-flow analysis, and data-structure and object analysis, beyond traditional dependence analysis.
  • Domain-specific, application model languages
    • More intuitive than C for inherently parallel problems: increased productivity, increased portability.
    • C will still likely serve as the implementation language.
    • There is room for a new application language, or a family of languages.
  • Role for the compiler in model-language environments
    • The model can provide structured semantics for the compiler, beyond what can be derived from analysis of low-level code.
    • The compiler can magnify the usefulness of model information with its low-level analysis.

5. Pointer analysis: sensitivity, stability and safety
  • Fulcra in OpenIMPACT [SAS2004, PASTE2004] and others.
  • Improved efficiency increases the scope over which unique, heap-allocated objects can be discovered.
  • Improved analysis algorithms provide more accurate call graphs, rather than a blurred view, for use by program transformation tools.
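
The call-graph point can be illustrated with a small C fragment; the code and names (Stage, encode_luma, encode_chroma) are invented here, not taken from OpenIMPACT or Fulcra. A coarse pointer analysis reports every address-taken function as a possible target of the indirect call, giving a "blurred" call graph; a more precise analysis resolves each table entry to its single actual callee.

/* Why call-graph precision depends on pointer analysis (illustrative only). */
#include <stdio.h>

typedef struct { const char *name; void (*encode)(void); } Stage;

static void encode_luma(void)   { puts("luma"); }
static void encode_chroma(void) { puts("chroma"); }

static Stage stage[2] = {
    { "luma",   encode_luma   },
    { "chroma", encode_chroma },
};

int main(void)
{
    for (int i = 0; i < 2; i++) {
        Stage *s = &stage[i];
        /* Indirect call: an imprecise analysis lists every address-taken
         * function as a callee; a precise one proves the exact targets. */
        (*s->encode)();
    }
    return 0;
}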

6. Thoughts from the VLIW/EPIC Experience
  • Any significant compiler work for a new computing platform takes 10-15 years to mature.
    • 1989-1998: initial academic results from IMPACT.
    • 1995-2005: technology collaboration with Intel/HP.
    • 2000-2005: SPEC 2000, Itanium 1 and 2, open-source apps.
    • This was built on significant work from the Multiflow, Cydrome, RISC, and HPC teams.
  • Real work in compiler development begins when hardware arrives.
    • IMPACT output code performance improved by more than 20% since the arrival of Itanium hardware, and became much more stable.
    • Most apps were brought up with IMPACT after Itanium systems arrived: debugging!
    • Real performance effects can only be measured on hardware.
    • Early access to hardware for academic compiler teams is crucial and must be a priority for industry development teams.
  • Quantitative methodology driven by large apps is key.
    • Innovations evaluated in whole-system context.

7. Heavyweight loops: How the next-generation compiler will do it (1)
  • To-do list: identify acceleration opportunities; localize memory; stream data and overlap computation.
  • Acceleration opportunities:
    • Heavyweight loops are identified for acceleration.
    • However, they are isolated in separate functions called through pointers.
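
The following C sketch, with invented names (kernel_t, upsample_2x, process_block), illustrates the situation described above: the heavyweight loop sits in its own function and is reached only through a function pointer, so the compiler must first resolve the indirect call before it can even consider the loop as an acceleration candidate.

#include <stdlib.h>

typedef void (*kernel_t)(const float *in, float *out, int n);

/* Heavyweight loop: dominates execution time for large n. */
static void upsample_2x(const float *in, float *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = 0.5f * (in[i] + in[i + 1 < n ? i + 1 : i]);
    }
}

static void process_block(const float *in, float *out, int n, kernel_t k)
{
    k(in, out, n);          /* the only call site: indirect, through k */
}

int main(void)
{
    enum { N = 1 << 16 };
    float *in  = calloc(N, sizeof *in);
    float *out = calloc(2 * N, sizeof *out);
    process_block(in, out, N, upsample_2x);   /* k always equals upsample_2x */
    free(in);
    free(out);
    return 0;
}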

8. How the next-generation compiler will do it (2)
  • (Figure callouts: initialization code identified; large constant lookup tables identified.)
  • To-do list: identify acceleration opportunities; localize memory; stream data and overlap computation.
  • Localize memory:
    • Pointer analysis identifies indirect callees.
    • Pointer analysis identifies localizable memory objects.
    • Private tables inside the accelerator are initialized once, saving traffic.
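
A hedged C sketch of the localizable-table pattern, with invented names (gamma_lut, init_gamma_lut, gamma_correct): the table is written once by initialization code and only read afterwards, so a compiler that proves this through pointer analysis could keep a private copy inside the accelerator instead of re-fetching the table from shared memory for every block.

#include <math.h>
#include <stdint.h>

static uint8_t gamma_lut[256];
static int     lut_ready = 0;

/* One-time initialization: the only writes to gamma_lut in the program. */
static void init_gamma_lut(void)
{
    for (int i = 0; i < 256; i++)
        gamma_lut[i] = (uint8_t)(255.0 * pow(i / 255.0, 1.0 / 2.2) + 0.5);
    lut_ready = 1;
}

/* After initialization the table is read-only: a candidate for a private
 * copy in accelerator-local memory. */
static void gamma_correct(const uint8_t *in, uint8_t *out, int n)
{
    if (!lut_ready)
        init_gamma_lut();
    for (int i = 0; i < n; i++)
        out[i] = gamma_lut[in[i]];
}

int main(void)
{
    uint8_t in[8] = { 0, 16, 32, 64, 96, 128, 192, 255 }, out[8];
    gamma_correct(in, out, 8);
    return 0;
}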

9. How the next-generation compiler will do it (3)
  • (Figure callouts: constant table privatized; input and output access patterns summarized.)
  • To-do list: identify acceleration opportunities; localize memory; stream data and overlap computation.
  • Streaming and computation overlap:
    • Memory dataflow summarizes array/pointer access patterns.
    • Opportunities for streaming are automatically identified.
    • Unnecessary memory operations are replaced with streaming.
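
The kind of summary memory dataflow produces can be shown on a toy loop (invented names; nearest-neighbour upsampling stands in for the real kernel): both the input and the output are touched exactly once, in stride-1 order, so the explicit loads and stores are candidates for replacement by streaming the arrays through the accelerator.

#include <stddef.h>
#include <stdio.h>

/* Memory-dataflow summary the compiler would derive for this loop:
 *   reads : in[0 .. n-1],   each element read once,    stride 1
 *   writes: out[0 .. 2n-1], each element written once, stride 1
 * Dense sequential streams with no cross-iteration reuse. */
static void upsample_stream(const short *in, short *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = in[i];      /* nearest-neighbour upsample */
    }
}

int main(void)
{
    short in[4] = { 10, 20, 30, 40 }, out[8];
    upsample_stream(in, out, 4);
    for (int i = 0; i < 8; i++)
        printf("%d ", out[i]);
    printf("\n");
    return 0;
}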

10. How the next-generation compiler will do it (4)
  • To-do list: identify acceleration opportunities; localize memory; stream data and overlap computation; achieve macropipelining of parallelizable accelerators.
  • Upsampling and color conversion can stream to each other.
  • These optimizations can have a substantial effect on both efficiency and performance.
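
A rough sketch of macropipelining in software, assuming POSIX threads; the stage names (upsample_stage, color_convert_stage) and the one-slot mailbox are inventions to show the overlap structure, not the compiler's actual output. While the consumer color-converts chunk k, the producer upsamples chunk k+1, which is the overlap a hardware stream between the two accelerators would provide.

#include <pthread.h>
#include <stdio.h>

#define CHUNKS 8
#define CHUNK  1024

static float mailbox[CHUNK];                 /* one-slot stream between stages */
static int   mailbox_full = 0, done = 0;
static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static void *upsample_stage(void *arg)       /* producer */
{
    (void)arg;
    for (int c = 0; c < CHUNKS; c++) {
        float local[CHUNK];
        for (int i = 0; i < CHUNK; i++)      /* stand-in for upsampling work */
            local[i] = (float)(c + i);
        pthread_mutex_lock(&m);
        while (mailbox_full)
            pthread_cond_wait(&cv, &m);
        for (int i = 0; i < CHUNK; i++)
            mailbox[i] = local[i];
        mailbox_full = 1;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *color_convert_stage(void *arg)  /* consumer */
{
    (void)arg;
    double sum = 0.0;
    for (;;) {
        pthread_mutex_lock(&m);
        while (!mailbox_full && !done)
            pthread_cond_wait(&cv, &m);
        if (!mailbox_full && done) {
            pthread_mutex_unlock(&m);
            break;
        }
        for (int i = 0; i < CHUNK; i++)      /* stand-in for color conversion */
            sum += 0.299 * mailbox[i];
        mailbox_full = 0;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&m);
    }
    printf("checksum %f\n", sum);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, upsample_stage, NULL);
    pthread_create(&c, NULL, color_convert_stage, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}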

11. Memory dataflow in the pointer world
  • (Figure callouts: array of constant pointers; row arrays never overlap.)
  • Arrays are not true 3D arrays (unlike in Fortran); the actual implementation is an array of pointers to arrays of samples.
  • A new type of dataflow problem: understanding the semantics of memory structures instead of true arrays.
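
A small C illustration (with an invented Plane type) of the layout described above: the image plane is an array of row pointers, each addressing a separately allocated row buffer. The memory-dataflow question is whether those rows can alias; here each row comes from its own malloc call, so they never overlap, and the compiler can recover a dense, non-aliasing access pattern from p->row[y][x].

#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int width, height;
    uint8_t **row;          /* array of constant pointers to per-row buffers */
} Plane;

static Plane *plane_alloc(int width, int height)
{
    Plane *p  = malloc(sizeof *p);
    p->width  = width;
    p->height = height;
    p->row    = malloc((size_t)height * sizeof *p->row);
    for (int y = 0; y < height; y++)
        p->row[y] = malloc((size_t)width);   /* distinct, non-overlapping rows */
    return p;
}

int main(void)
{
    Plane *p = plane_alloc(16, 16);
    /* Accesses like p->row[y][x] require reasoning about the pointer
     * structure to recover what a Fortran compiler gets for free from a
     * true multidimensional array. */
    for (int y = 0; y < p->height; y++)
        for (int x = 0; x < p->width; x++)
            p->row[y][x] = (uint8_t)(x + y);
    for (int y = 0; y < p->height; y++)
        free(p->row[y]);
    free(p->row);
    free(p);
    return 0;
}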
