
Domain-Specific Languages for Ubiquitous Parallelism





Presentation Transcript


  1. Domain-Specific Languages for Ubiquitous Parallelism Calvin Lin University of Texas at Austin September 6, 2007

  2. Domain-Specific Languages
  • Sam Midkiff: “Domain-specific languages are good”
  • Jim Larus: “We don’t want to rely on people like Sam to build domain-specific languages”
  • How can we simplify the creation of domain-specific languages?

  3. Libraries are Domain-Specific Languages
  • Two issues: no new syntax, no compiler support
  • Libraries encapsulate domain-specific semantics
  • This semantic information provides many opportunities for analysis and optimization
  • This information is unavailable to conventional compilers

      c = a * b;            /* language primitive */
      bnMultiply(c, a, b);  /* library call       */

  4. Our Solution
  • The Broadway Compiler extends the power of compilers to library operations
  • Inputs: the integrated application and library, plus annotations carrying domain-specific information

  5. Separation of Concerns
  • Mortal programmers write the integrated application and library; a compiler writer builds the Broadway Compiler; a domain expert writes the annotations
  • One compiler for all libraries; one set of annotations per library
  • The hard parts are reused many times and hidden from the mortals

  6. Outline
  • Motivation
  • Our Solution
  • Example: Optimizing PLAPACK applications
  • Results
  • Looking to the Future

  7. Bird’s-Eye View of PLAPACK
  PLAPACK: dense parallel linear algebra library
  • Developed by van de Geijn, et al. [van de Geijn 1997]
  • Designed for high performance
  • ~40,000 lines of C code
  Layered design (top to bottom): Applications (LU, QR, Cholesky, ...); Parallel BLAS 1/2/3; local BLAS 1/2/3; MPI; utilities

  8. Typical PLAPACK Application

      while (1) {
        PLA_Obj_global_length(ABR, &length);
        if (length == 0) break;
        PLA_Obj_split_4(ABR, nb, nb, &A11, &A12,
                                     &A21, &A22);
        Cholesky(A11);
        PLA_Trsm(PLA_SIDE_RIGHT, PLA_LOW_TRIAN, PLA_TRANS,
                 PLA_NONUNIT_DIAG, one, A11, A21);
        PLA_Syrk(PLA_LOW_TRIAN, PLA_NO_TRANS, minus_one,
                 A21, one, ABR);
      }

  The objects A11, A12, A21, A22, and ABR are “views” of the data.

  9. Views in PLAPACK
  • The notion of views can be used to perform optimizations
  • Views can have special properties (e.g., local vs. distributed)
  • These properties can be reasoned about by programmers
  • These properties can be exploited by using special algorithms
  • These properties are an example of domain-specific information

      PLA_Trsm_local(PLA_SIDE_RIGHT, PLA_LOW_TRIAN, PLA_TRANS,
                     PLA_NONUNIT_DIAG, one, A11, A21);

  10. View-Based Optimizations
  Given the original program:

      PLA_Obj_view_all(A, &ABR);
      while (1) {
        PLA_Obj_length(ABR, &b);
        b = min(b, nb);
        if (b == 0) break;
        PLA_Obj_split_4(ABR, b, b, &A11, .., &A21, &ABR);
        Cholesky(A11);
        PLA_Trsm(PLA_SIDE_RIGHT, ...);
        PLA_Syrk(PLA_LOW_TRIAN, ...);
      }

  • PLA_Trsm() and PLA_Syrk() are overly general: they work for any distribution
  • The compiler analyzes the flow of view information through the program
  • The compiler determines when specialized routines can be used

  11. What Information Is Needed?
  • Define special properties: “Views can be local or distributed”
  • Specify how the library routines affect these properties: “Which routines create views, shrink views, etc.”
  • Specify when special routines can be used: “How can view information be used to invoke specialized routines?”

  12. How Do We Convey This Information?
  • Define special properties
  • Specify how the library routines affect these properties
  • Specify when special routines can be used

      property Distribution = {Local, Distributed, Empty};

      procedure PLA_Obj_split_4(obj, length, ...) {
        analyze Distribution {
          (view == Distributed) ==> view11 = Local;
        }
      }

      procedure PLA_Trsm(...) {
        specialize {
          (view == Local) ==> replace "PLA_Trsm_Local";
        }
      }

  13. Other Annotations
  • Basic annotations convey dependence information: defs and uses of procedure parameters, pointer relationships
  • These annotations are not domain-specific

      modify {};
      access {view};
      on_entry {obj --> view};
      on_exit  {A11 --> view11, A12 --> view12,
                A21 --> view21, A22 --> view22};

  14. Does It Work?
  • Comparison against a guru-optimized version written by the PLAPACK development team [Baker, et al. 1998]
  [Chart: MFLOPS (0–3000) vs. processors (0–40) for Cholesky (3072×3072), comparing Broadway, baseline, and guru-optimized versions] [Guyer and Lin 1999]

  15. A Closer Look at Performance (Cray T3E)
  • Improvement over clean, high-quality PLAPACK programs
  • PLA_Trsm() and PLA_Gemm() are specialized for their specific calling contexts
  [Chart: % improvement (0–400) vs. problem size (250–2500) for Cholesky, Lyapunov, Trsm, and Gemm]

  16. A Closer Look at Gemm (Cray T3E)
  • The Gemm algorithm is specialized (rank-k algorithm)
  • The broadcast is specialized (pipelined broadcast)
  • MPI_Send is specialized (asynchronous send)
  • Scalability is improved
  [Chart: MFLOPS (0–4000) vs. number of processors (4–36) for PLA_Gemm with and without the rank-k algorithm]

  17. Optimizing at Multiple Levels
  Levels of abstraction in PLAPACK (programmers should program at the top level):
  • Global: explicitly parallel, global matrix operations — e.g., PLA_Gemm(), optimized by Broadway (rank-k algorithm)
  • Local: matrices + high-level communication, MPI + local BLAS — e.g., MPI_Send()
  • C language: C primitives, optimized by the C compiler
  There is great benefit to optimizing at multiple levels of abstraction.

  18. What About Other Domains? • Is our approach general?

  19. Security Analysis as Data Flow Analysis
  • Example:

      int sock;
      char buffer[100];
      sock = socket(AF_INET, SOCK_STREAM, 0);
      read(sock, buffer, 100);
      execl(buffer);

  • Vulnerability: executes any remote command — what if this program runs as root?
  • Requirement: data from an Internet socket should not specify a program to execute
  • This is a domain-specific analysis
  • We can use the same annotation language and compiler

  20. Generating Tainted Data
  • Any external input is tainted
  • Examples: read(), fscanf(), readdir()
  • Taintedness is a property of the buffer, not the surface variable

      procedure read(fd, buffer_ptr, size) {
        on_entry { buffer_ptr --> buffer }
        analyze Taint { buffer <- Tainted }
      }

  21. Transmitting Taintedness
  • String manipulation can transmit taintedness
  • Examples: strcpy(), strdup(), strcat(), sprintf()

      procedure strcpy(dest, src) {
        on_entry { src  --> src_string,
                   dest --> dest_string }
        analyze Taint { dest_string <- src_string }
      }

  22. Reporting Vulnerabilities
  • Test the flow values
  • Tainted strings should not be passed to execl()
  • Reports the exact location of the problem

      procedure execl(buffer_ptr) {
        on_entry { buffer_ptr --> buffer }
        report if (Taint : buffer is-exactly Tainted)
          "Vulnerability at " ++ @context ++ ": Argument " ++
          [ buffer_ptr ] ++ " is tainted.\n";
      }

  23. Security Analysis with Broadway
  • Tested on actual programs that were distributed with the bug
  [Table: results for the format string vulnerability]

  24. Beyond Static Analysis
  • The compiler can dynamically enforce security policies (“data from an Internet socket should not specify a program to execute”)

  Original code:

      int sock;
      char buffer[100];
      sock = socket(AF_INET, SOCK_STREAM, 0);
      read(sock, buffer, 100);
      execl(buffer);

  Inserted code:

      int vs, vb;
      vs = Tainted;
      vb = Tainted;
      if (vb != Tainted) {
        execl(buffer);
      }

  25. Looking to the Future
  From a sequential app and library to a parallel app and library: the Broadway Compiler, guided by annotations, transforms a sequential library into a parallel library
  • Key questions:
  • What information is needed?
  • How do we express this information?
  • How should libraries be structured?

  26. What Information Is Needed?
  • Dependence information (e.g., FLAME pointers are well-behaved; which operations commute)
  • Hints on what to parallelize
  • Hints on granularity of parallelism
  • Machine information
  • Guidance for empirical tuning
  • ...

  27. Possible Test Cases
  • Parallelize van de Geijn’s FLAME library
  • Parallelize a graphics rendering engine
  • Support Galois-style optimistic dynamic parallelization [Pingali, et al.]

  28. Conclusions
  • Long-term vision: create tools and frameworks that help domain experts create parallel DSLs
  • Use Broadway to move us toward this goal
  • The Broadway approach works with existing libraries, new libraries, program generators, ...
  • Ignore issues of syntax; focus on parallelization issues
