Exploiting Domain-Specific High-level Runtime Support for Parallel Code Generation

Presentation Transcript


  1. Exploiting Domain-Specific High-level Runtime Support for Parallel Code Generation. Xiaogang Li, Ruoming Jin, Gagan Agrawal. Department of Computer and Information Sciences, Ohio State University

  2. Motivation • Languages, compilers, and runtime systems for high-end computing • Typically focus on scientific applications • Can commercial applications benefit? • A majority of Top 500 parallel configurations are used as database servers • Is there a role for parallel systems research? • Parallel relational databases – probably not • Data mining, OLAP, decision support – quite likely

  3. Data Mining • Extracting useful models or patterns from large datasets • Includes a variety of tasks (mining associations and sequences, clustering data, building decision trees and predictive models), with several algorithms proposed for each • Both compute- and data-intensive • Algorithms are well suited for parallel execution • High-level interfaces can be useful for application development

  4. Project Overview

  5. Project Components • A middleware system called FREERIDE (Framework for Rapid Implementation of Datamining Engines) (SDM 01, SDM 02) • Performance modeling and prediction for parallelization strategy selection (SIGMETRICS 2002) • Runtime and compiler support for shared memory parallelization (LCPC 02) • Translation from mining operators (not yet) • This talk focuses on language and compiler support for distributed memory parallelization

  6. Common Processing Structure • Structure of common data mining algorithms:
      {* Outer sequential loop *}
      While () {
        {* Reduction loop *}
        Foreach (element e) {
          (i, val) = process(e);
          Reduc(i) = Reduc(i) op val;
        }
      }
    • Applies to the major association mining, clustering, and decision tree construction algorithms • Parallelization approach: • Compute local copies of the reduction object • Perform a global reduction
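
The structure above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not code from the talk: process(e) is assumed to map an integer to a bucket index with value 1, op is +, and the class GeneralizedReduction with its methods localReduce, globalReduce, and parallelRun are hypothetical names. "Nodes" are simulated by splitting one in-memory array into partitions, each of which builds its own local copy of the reduction object before a global reduction combines them.

```java
import java.util.Arrays;

/** Illustrative sketch of the common reduction structure (all names are hypothetical). */
class GeneralizedReduction {
    static final int BUCKETS = 4;

    /** Assumed process(e): maps an element to (i, val) with i = e mod BUCKETS and val = 1. */
    static int[] process(int e) {
        return new int[] {Math.floorMod(e, BUCKETS), 1};
    }

    /** Local reduction: one pass over a partition, updating a private copy of the object. */
    static long[] localReduce(int[] partition) {
        long[] reduc = new long[BUCKETS];
        for (int e : partition) {
            int[] iv = process(e);
            reduc[iv[0]] = reduc[iv[0]] + iv[1]; // Reduc(i) = Reduc(i) op val, with op = +
        }
        return reduc;
    }

    /** Global reduction: combine the local copies with the same associative, commutative op. */
    static long[] globalReduce(long[][] locals) {
        long[] result = new long[BUCKETS];
        for (long[] local : locals) {
            for (int i = 0; i < BUCKETS; i++) {
                result[i] += local[i];
            }
        }
        return result;
    }

    /** Splits the data into `nodes` contiguous partitions to simulate distributed execution. */
    static long[] parallelRun(int[] data, int nodes) {
        long[][] locals = new long[nodes][];
        int chunk = (data.length + nodes - 1) / nodes;
        for (int n = 0; n < nodes; n++) {
            int from = Math.min(n * chunk, data.length);
            int to = Math.min(from + chunk, data.length);
            locals[n] = localReduce(Arrays.copyOfRange(data, from, to));
        }
        return globalReduce(locals);
    }

    public static void main(String[] args) {
        int[] data = {3, 7, 1, 8, 2, 5, 6, 4, 0, 9};
        System.out.println(Arrays.toString(parallelRun(data, 3))); // partitioned run
        System.out.println(Arrays.toString(localReduce(data)));    // single sequential pass
    }
}
```

Because + is associative and commutative, the partitioned run returns the same counts as a single sequential pass (both lines print [3, 3, 2, 2]); the restrictions on reduction variables in the language exist to preserve exactly this property.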

  7. Middleware Support for Distributed Memory Parallelization • Interface requires: • Specification of an iterator and termination condition • Local reduction for each parallel loop • Global reduction for each loop • Functionality: • Fetch data elements chunk by chunk and apply the local reduction • Broadcast the reduction object after finishing one pass over the data • Perform the global reduction and broadcast the results • Check the termination condition and move to the next iteration
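
One way to picture this interface is the Java sketch below. It is not the FREERIDE API: the interface ReductionTask, its four methods, and the Driver class that simulates nodes inside a single process are hypothetical. The sample task is one-dimensional k-means with two clusters, chosen because it exercises the outer sequential loop and the termination test; its reduction object holds a sum and a count per cluster.

```java
import java.util.Arrays;

/** Hypothetical middleware interface: the application supplies these pieces. */
interface ReductionTask<R> {
    R newReductionObject();              // empty reduction object for one pass
    void localReduce(R obj, double e);   // fold one data element into a local copy
    void globalReduce(R into, R from);   // merge another node's copy into `into`
    boolean finished(R global);          // termination test; may update task state
}

/** Illustrative sample task: 1-D k-means with k = 2. */
class KMeans1D implements ReductionTask<double[]> {
    double[] centers = {0.0, 1.0};

    // Reduction object layout: {sum0, count0, sum1, count1}
    public double[] newReductionObject() { return new double[4]; }

    public void localReduce(double[] obj, double e) {
        int c = Math.abs(e - centers[0]) <= Math.abs(e - centers[1]) ? 0 : 1;
        obj[2 * c] += e;
        obj[2 * c + 1] += 1;
    }

    public void globalReduce(double[] into, double[] from) {
        for (int i = 0; i < into.length; i++) into[i] += from[i];
    }

    public boolean finished(double[] g) {
        double[] next = centers.clone();
        for (int c = 0; c < 2; c++) {
            if (g[2 * c + 1] > 0) next[c] = g[2 * c] / g[2 * c + 1];
        }
        boolean converged = Arrays.equals(next, centers);
        centers = next; // new centers used by the next pass
        return converged;
    }
}

/** Simulated driver: chunked local reduction per node, global reduction, termination check. */
class Driver {
    static <R> R run(ReductionTask<R> task, double[] data, int nodes, int maxIters) {
        R global = null;
        for (int iter = 0; iter < maxIters; iter++) {
            global = task.newReductionObject();
            int chunk = (data.length + nodes - 1) / nodes;
            for (int n = 0; n < nodes; n++) {
                R local = task.newReductionObject();   // each node's private copy
                for (int j = n * chunk; j < Math.min((n + 1) * chunk, data.length); j++) {
                    task.localReduce(local, data[j]);
                }
                task.globalReduce(global, local);      // stands in for the message exchange
            }
            if (task.finished(global)) break;
        }
        return global;
    }

    public static void main(String[] args) {
        KMeans1D task = new KMeans1D();
        run(task, new double[] {1, 2, 3, 10, 11, 12}, 3, 100);
        System.out.println(Arrays.toString(task.centers));
    }
}
```

On the sample data the centers settle at [2.0, 11.0] after three passes. In this sketch finished() also computes the new centers, standing in for the step where the middleware broadcasts the global result before the next pass; a distributed implementation would replace the in-process globalReduce call with message passing.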

  8. Compilation Approach • Support a general high-level language • Use middleware functionality in compilation • Exploit the domain-specific common structure • Reduction loop with associative and commutative operations • Disk-resident input datasets, smaller output

  9. Language Support • A data parallel dialect of Java, giving the compiler information about independent collections of objects, parallel loops, and reduction operations: • domain and rectdomain • foreach loop • reduction variables: - can only be updated inside a foreach loop, by operations that are associative and commutative - the intermediate value of a reduction variable may not be used within the loop, except for self-updates

  10. Example code
      public class kNN {
        static buffer kbuffer;
        public static void main(String[] args) {
          double dis;
          Point<3> lowend = …;
          Point<3> hiend = …;
          Point<3> p;
          RectDomain<3> InputDomain = [lowend:hiend];
          kPoint [3d] Input = new kPoint[InputDomain];
          foreach (p in InputDomain) {
            if (Input[p].inRange(R)) {
              dis = Input[p].distance(W);
              kbuffer.insert(Input[p], dis);
            }
          }
        }
      }
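
For readers without the Java dialect at hand, the loop can be approximated in standard Java. This is a sketch under assumptions: the buffer class KBuffer, its insert method, the box-shaped inRange test with bound r, and the query point w are hypothetical stand-ins for definitions the slide elides. insert keeps the k smallest distances in sorted order, which plays the role of the local reduction the compiler extracts from the foreach body.

```java
import java.util.Arrays;

/** Hypothetical k-nearest buffer: keeps the k closest points seen so far, sorted by distance. */
class KBuffer {
    final double[][] pts;
    final double[] dis;
    int size = 0;

    KBuffer(int k) {
        pts = new double[k][];
        dis = new double[k];
    }

    /** Insertion-sort step: shift farther entries down one slot, dropping the last if full. */
    void insert(double[] p, double d) {
        int k = dis.length;
        if (size == k && d >= dis[k - 1]) return;   // not closer than the current k-th entry
        int i = Math.min(size, k - 1);
        while (i > 0 && dis[i - 1] > d) {
            dis[i] = dis[i - 1];
            pts[i] = pts[i - 1];
            i--;
        }
        dis[i] = d;
        pts[i] = p;
        if (size < k) size++;
    }
}

class KNN {
    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(s);
    }

    /** Local reduction over one node's points; inRange is assumed to mean |x_j| <= r on every axis. */
    static KBuffer localReduce(double[][] input, double[] w, double r, int k) {
        KBuffer kbuffer = new KBuffer(k);
        for (double[] p : input) {
            boolean inRange = Arrays.stream(p).allMatch(x -> Math.abs(x) <= r);
            if (inRange) kbuffer.insert(p, distance(p, w));
        }
        return kbuffer;
    }

    public static void main(String[] args) {
        double[][] input = {{0, 0, 0}, {1, 1, 1}, {5, 5, 5}, {2, 0, 0}, {9, 9, 9}};
        KBuffer b = localReduce(input, new double[] {0, 0, 0}, 6, 2);
        System.out.println(Arrays.toString(b.dis));
    }
}
```

With k = 2 and r = 6, the point (9, 9, 9) is filtered out by the range test and the buffer ends up holding the distances 0 and √3 (about 1.732).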

  11. Compilation Task • Extract the local reduction function • Simple: taken from the body of the data parallel loop • Extract an iterator and termination condition • Simple: taken from the overall code • Extract a global reduction function • Can be quite challenging in the presence of complex control flow and data structures • A new algorithm was developed for this step

  12. Extracting Global Reduction from Local Reduction: Motivating Example
      for (j = 0; j < k; j++) {
        I = k - 1;
        while (buf.dis[j] < distance && I >= 0) {
          if (I > 0) { x1[I] = x1[I-1]; x2[I] = x2[I-1]; … }
          I = I - 1;
        }
        if (I < k-1) { x1[I+1] = buf.x1[j]; x2[I+1] = buf.x2[j]; … }
      }

      I = k - 1;
      while (newdis < distance && I >= 0) {
        if (I > 0) { x1[I] = x1[I-1]; x2[I] = x2[I-1]; … }
        I = I - 1;
      }
      if (I < k-1) { x1[I+1] = kpoint.x1; x2[I+1] = kpoint.x2; … }

      I = k - 1;
      while (kpoint.dis < distance && I >= 0) {
        if (I > 0) { x1[I] = x1[I-1]; x2[I] = x2[I-1]; … }
        I = I - 1;
      }
      if (I < k-1) { x1[I+1] = kpoint.x1; x2[I+1] = kpoint.x2; … }
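
The top loop in this example appears to derive the global reduction by replaying the local insertion once for every entry j of another node's buffer. A compact Java sketch of that idea follows; the class KnnMerge and its methods are hypothetical, and each buffer is simplified to a sorted array of distances (the x1, x2, … coordinate fields are omitted) to keep it short.

```java
import java.util.Arrays;

/** Hypothetical sketch: global reduction for k-nearest buffers by reusing the local insertion. */
class KnnMerge {
    /** Local reduction step: insert distance d into sorted buf holding `size` valid entries. */
    static int insert(double[] buf, int size, double d) {
        int k = buf.length;
        if (size == k && d >= buf[k - 1]) return size;  // farther than every kept neighbor
        int i = Math.min(size, k - 1);
        while (i > 0 && buf[i - 1] > d) {   // shift farther entries one slot down
            buf[i] = buf[i - 1];
            i--;
        }
        buf[i] = d;
        return Math.min(size + 1, k);
    }

    /** Global reduction: treat every entry of the other node's buffer as a new input element. */
    static int globalReduce(double[] buf, int size, double[] other, int otherSize) {
        for (int j = 0; j < otherSize; j++) {
            size = insert(buf, size, other[j]);
        }
        return size;
    }

    public static void main(String[] args) {
        int k = 3;
        double[] a = new double[k];
        double[] b = new double[k];
        int sa = 0, sb = 0;
        for (double d : new double[] {4.0, 9.0, 1.0}) sa = insert(a, sa, d);  // node A's data
        for (double d : new double[] {2.0, 7.0, 0.5}) sb = insert(b, sb, d);  // node B's data
        globalReduce(a, sa, b, sb);
        System.out.println(Arrays.toString(a));
    }
}
```

Merging node B's buffer into node A's leaves the three smallest of the six distances, [0.5, 1.0, 2.0]. Because keeping the k smallest values is associative and commutative, the merged distances do not depend on the order in which buffers are combined.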

  13. Overall Approach • Classify each assignment to a data member of the reduction object into one of the following types: • O.x = g(e), where e is the input element • O.x = O.x op g(e), where op is an associative and commutative operator • An expression involving loop constants and other members of the reduction object • Classify the control dependence of each of these assignment statements as: • Loop constant • Non-loop constant

  14. Code Generation: Handling Different Types of Assignment Statements • Three types of assignment statements: • O.x = g(e) (Type a): if x can represent many fields, iterate over all of them • O.x = O.x op g(e) (Type b): replace by O.x = O.x op O1.x, where O1 is another node's copy of the reduction object; if x can represent many fields, iterate over all of them • An expression involving loop constants and other data members (Type c): keep as is
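
As an illustration of the Type (b) rule, assume a reduction object with one support counter per candidate itemset, as in the counting phase of Apriori. The local reduction performs support[c] = support[c] + 1 for each candidate c contained in a transaction. Since x here stands for every field support[0..n-1], the generated global reduction iterates over all of them, applying O.x = O.x op O1.x. The class SupportCounts and its methods are hypothetical; the containment test is dropped in the global version because it depends on the input element, which follows the control-flow rule for Type (b) predicates on the next slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical reduction object: one support counter per candidate itemset. */
class SupportCounts {
    final List<Set<Integer>> candidates;
    final int[] support;

    SupportCounts(List<Set<Integer>> candidates) {
        this.candidates = candidates;
        this.support = new int[candidates.size()];
    }

    /** Local reduction as the programmer writes it: Type (b) update O.x = O.x op g(e). */
    void localReduce(Set<Integer> transaction) {
        for (int c = 0; c < support.length; c++) {
            if (transaction.containsAll(candidates.get(c))) {  // depends on e: not loop constant
                support[c] = support[c] + 1;
            }
        }
    }

    /** Global reduction derived by the Type (b) rule: O.x = O.x op O1.x for every field x. */
    void globalReduce(SupportCounts o1) {
        for (int c = 0; c < support.length; c++) {
            support[c] = support[c] + o1.support[c];   // predicate on e removed
        }
    }

    public static void main(String[] args) {
        List<Set<Integer>> cands = List.of(Set.of(1), Set.of(2), Set.of(1, 2));
        SupportCounts node0 = new SupportCounts(cands);
        SupportCounts node1 = new SupportCounts(cands);
        node0.localReduce(Set.of(1, 2, 3));
        node0.localReduce(Set.of(1));
        node1.localReduce(Set.of(2, 4));
        node1.localReduce(Set.of(1, 2));
        node0.globalReduce(node1);
        System.out.println(Arrays.toString(node0.support));
    }
}
```

With the four sample transactions split across two nodes, the merged counts are [3, 3, 2], the same as counting all four transactions on a single node.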

  15. Handling Control Flow • Control predicates for Type (b) assignments: • Remove non-loop-constant control predicates • Keep loop-constant control predicates • Control predicates for Type (a) and Type (c) statements: • Keep loop-constant control predicates • Classify non-loop-constant predicates into two types: • Predicates involving a value that is assigned to a data member: replace that value by the data member • Other predicates: simply remove them
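
The Type (a) rules can be illustrated with a hypothetical reduction object that tracks the largest value seen and the index of the element that produced it (the class MaxTracker and its fields are illustrative, not from the talk). Both assignments are Type (a) and are guarded by the non-loop-constant predicate value > max; since value is what gets assigned to max, the predicate is rewritten to use the data member o1.max. The sketch also assumes that a Type (a) assignment O.x = g(e) becomes O.x = O1.x in the generated code, which slide 14 does not spell out. The loop-constant flag trackMax is kept as is.

```java
/** Hypothetical reduction object illustrating predicate rewriting for Type (a) assignments. */
class MaxTracker {
    final boolean trackMax;          // loop constant: set before the loop, never changed in it
    double max = Double.NEGATIVE_INFINITY;
    int argmax = -1;

    MaxTracker(boolean trackMax) { this.trackMax = trackMax; }

    /** Local reduction as written: Type (a) assignments under a predicate on the input. */
    void localReduce(int index, double value) {
        if (trackMax) {                   // loop-constant predicate: kept
            if (value > max) {            // non-loop-constant; `value` is assigned to max
                max = value;              // Type (a): O.x = g(e)
                argmax = index;           // Type (a): O.x = g(e)
            }
        }
    }

    /** Generated global reduction (assumed form): `value` in the predicate becomes o1.max. */
    void globalReduce(MaxTracker o1) {
        if (trackMax) {
            if (o1.max > max) {
                max = o1.max;             // assumed Type (a) rewrite: O.x = O1.x
                argmax = o1.argmax;
            }
        }
    }

    public static void main(String[] args) {
        double[] data = {3.0, 8.5, 1.0, 7.0, 9.25, 2.0};
        MaxTracker a = new MaxTracker(true);
        MaxTracker b = new MaxTracker(true);
        for (int i = 0; i < 3; i++) a.localReduce(i, data[i]);            // node A: first half
        for (int i = 3; i < data.length; i++) b.localReduce(i, data[i]);  // node B: second half
        a.globalReduce(b);
        System.out.println(a.max + " at index " + a.argmax);
    }
}
```

The merged result is 9.25 at index 4 regardless of which object is merged into the other.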

  16. Experimental Platform • Cluster of workstations: • Sun Ultra Enterprise 450 • 250 MHz Ultra-II processors • 1 GB of 4-way interleaved main memory • Myrinet as the interconnect

  17. Results from k-means Clustering • 1 GB dataset with 3-dimensional points, k = 3

  18. Results from Apriori Association Mining • 3 GB dataset

  19. Results from k-Nearest Neighbors • 1 GB dataset, 3-dimensional points, k = 100

  20. Summary • Focus on a new class of applications • Exploit the common structure within the class • Develop a runtime system supporting this structure • Use it as a compiler target • Very simple compiler implementation (< 1000 lines of code) • A new algorithm for synthesizing global reduction functions • Performance of compiler-generated code is very competitive
