Memory Efficient Software Synthesis from Dataflow Graph

Wonyong Sung, Junedong Kim, Soonhoi Ha
Codesign and Parallel Processing Lab., Seoul National University

Contents
• Introduction
• Code Generation from Block Diagram Specification
• Synchronous Data Flow and Single Appearance Schedule
• Proposed Strategies
  • Optimization 1: code sharing optimization
  • Optimization 2: minimize buffer requirement
• Experiments
• Conclusions

Introduction
• Motivations
  • Embedded systems have a limited amount of memory
    • large program = memory cost, performance penalty, power consumption
  • New trend in software development: high-level design methodology
    • growing complexity, fast design turn-around time, limited budget, etc.
• Goal of Research
  • Reduce the code and data size of automatically generated software
  • In an automatic software synthesis environment
  • Specification = dataflow graph with SDF (Synchronous DataFlow) semantics

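Software Synthesis from SDF graph
[Figure: SDF graph with four nodes A, B, C and D; the edge rates (values 1, 2 and 3) cannot be assigned to edges from the transcript.]
Possible schedules:
  = AABCABACDABABCD
  = (6A)(4B)(3C)(2D)
  = (2(3A2B))(3C)(2D)
The last two are Single Appearance Schedules (SAS): each node appears exactly once.

Code for (6A)(4B)(3C)(2D), where {A} stands for the code block of node A:
    main(){
      for(i=0;i<6;i++){A}
      for(i=0;i<4;i++){B}
      for(i=0;i<3;i++){C}
      for(i=0;i<2;i++){D}
    }

Code for (2(3A2B))(3C)(2D):
    main(){
      for(i=0;i<2;i++){
        for(j=0;j<3;j++){A}
        for(j=0;j<2;j++){B}
      }
      for(i=0;i<3;i++){C}
      for(i=0;i<2;i++){D}
    }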
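The schedules on this slide can be checked mechanically. The sketch below is only an illustration: the assignment of rates to edges (A->B 2:3, A->C 1:2, B->D 1:2, C->D 2:3) is an assumption, chosen because it is consistent with the repetition counts 6, 4, 3, 2 and with the rate values visible in the figure; the names `Edge`, `is_balanced` and `simulate_flat` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_NODES = 4, NUM_EDGES = 4 };   /* nodes A, B, C, D are indexed 0..3 */

/* One SDF edge: src produces `prod` tokens per firing, dst consumes `cons`. */
typedef struct { int src, dst, prod, cons; } Edge;

/* ASSUMED rates: the slide's figure does not survive in the transcript. */
static const Edge EDGES[NUM_EDGES] = {
    {0, 1, 2, 3},   /* A -> B */
    {0, 2, 1, 2},   /* A -> C */
    {1, 3, 1, 2},   /* B -> D */
    {2, 3, 2, 3},   /* C -> D */
};

/* Balance equations: prod * q[src] == cons * q[dst] on every edge. */
int is_balanced(const int q[NUM_NODES]) {
    for (int e = 0; e < NUM_EDGES; e++) {
        const Edge *ed = &EDGES[e];
        if (ed->prod * q[ed->src] != ed->cons * q[ed->dst]) return 0;
    }
    return 1;
}

/* Fires a flat sequence such as "AABC...". Returns the sum of the per-edge
 * peak token counts, or -1 if a firing lacks input tokens, the sequence
 * names an unknown node, or tokens are left over at the end. */
int simulate_flat(const char *seq) {
    int tokens[NUM_EDGES] = {0};
    int peak[NUM_EDGES] = {0};
    assert(seq != NULL);
    for (const char *p = seq; *p != '\0'; p++) {
        int node = *p - 'A';
        if (node < 0 || node >= NUM_NODES) return -1;
        for (int e = 0; e < NUM_EDGES; e++) {          /* consume inputs */
            if (EDGES[e].dst != node) continue;
            if (tokens[e] < EDGES[e].cons) return -1;
            tokens[e] -= EDGES[e].cons;
        }
        for (int e = 0; e < NUM_EDGES; e++) {          /* produce outputs */
            if (EDGES[e].src != node) continue;
            tokens[e] += EDGES[e].prod;
            if (tokens[e] > peak[e]) peak[e] = tokens[e];
        }
    }
    int total = 0;
    for (int e = 0; e < NUM_EDGES; e++) {
        if (tokens[e] != 0) return -1;
        total += peak[e];
    }
    return total;
}
```

Under these assumed rates, the non-SAS schedule AABCABACDABABCD needs 12 tokens of buffer, the nested SAS (2(3A2B))(3C)(2D) needs 22, and the flat SAS (6A)(4B)(3C)(2D) needs 28, which illustrates the code size versus data size trade-off the talk addresses.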

Previous Efforts
• Single Appearance Schedule (SAS): APGAN, RPMC
  • [by Bhattacharyya et al.] in the Ptolemy group
  • SAS guarantees the minimum code size (without code sharing)
  • APGAN, RPMC: heuristics to find a data-minimized SAS
• ILP formulation for data memory minimization
  • [by Ritz et al.] in the Meyr group
  • flat single appearance schedule + sharing of data buffers
• Rate-optimal compile-time schedule
  • [by Govindarajan et al.] in the Gao group
  • tried to minimize the buffer requirement using linear programming
• An algorithm to compute the smallest data buffer size
  • [by Ade et al.] in the GRAPE group

Proposed Strategies
• Coding style
  • not tied to one coding style; hybrid approach
  • generated code is a mixture of inlined code and function calls
• Optimization 1: Code Sharing
  • Multiple instances of the same kernel are treated as different nodes in a SAS
  • Code sharing optimization has a gain (block size) and a cost (context size)
• Optimization 2: Schedule Adjustment
  • give up the single appearance schedule to reduce the data size
  • (1) represent schedule information with the BTLC data structure
  • (2) find possible locations for adjustment
  • (3) adjust the schedule

Flowchart of Optimization Procedure
Get SAS schedule [RPMC, APGAN] -> Code sharing optimization (inputs: code-block size, context size) -> Schedule Adjustment (BTLC) -> C code generation

Example of Code Sharing (CD2DAT)
[Figure: CD2DAT graph with blocks ramp, sine, fir1, fir2, fir3, fir4 and xgraph; a second group ramp', sine' and xgraph also appears.]

Code before sharing
    for(int i=0;i<2;i++) {
      { /* code for fir1 */
        ………………
        out = tap*input[i];
        ………………
      }
    }
    /* code for fir2 */
    ……………..

Code after sharing
    for(int i=0;i<2;i++) fir(1);
    for(int i=0;i<3;i++) fir(2);
    ……………
    void fir(int context){
      ………………
      context_FIR[context].out...
      ………………
    }

context definition
    typedef struct{
      double *out;
      int output_ofs;
      int output_bs;
      int output_nx;
      ………….
      double decimation;
      double tap;
    }context_FIR;
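The slide's fragments are elided, so the following is a minimal, self-contained sketch of the same idea rather than the generated CD2DAT code. The kernel `scale`, the struct `ScaleContext` and the array `context_scale` are hypothetical names; the point is that one function body serves several instances and reaches all per-instance state through a context selected by index.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_INSTANCES = 2 };

/* Hypothetical per-instance context: everything that differs between
 * instances of the kernel lives here instead of in the code. */
typedef struct {
    const double *input;   /* input buffer of this instance  */
    double *out;           /* output buffer of this instance */
    int count;             /* samples processed per firing   */
    double tap;            /* per-instance coefficient       */
} ScaleContext;

static ScaleContext context_scale[NUM_INSTANCES];

/* Shared version of the kernel: every reference goes through the context,
 * so a single copy of the code can stand in for all instances. */
void scale(int context) {
    assert(context >= 0 && context < NUM_INSTANCES);
    const ScaleContext *c = &context_scale[context];
    for (int i = 0; i < c->count; i++) {
        c->out[i] = c->tap * c->input[i];
    }
}
```

A caller fills in one context per instance (buffers and coefficient) and then invokes `scale(0)`, `scale(1)` from the schedule loops, in the same way the slide calls `fir(1)` and `fir(2)`. Each access now costs an extra indirection, which is the overhead the next slide measures.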

Code Size Overhead (in Sparc/Solaris)

Without context:  ….. = value;
    ldd [%fp + -336],%o0
4 bytes

With context:  ….. = *(context_CGCRamp[context].value);
    sethi %hi(0x20800),%o1
    ld [%o1+0x3c8], %o0
    mov %o0, %o2
    sll %o2, 2, %o1
    add %o1, %o0, %o1
    sll %o1, 3, %o0
    add %fp, -424, %o1
    add %o1, %o0, %o2
    ld [%o2 + 0x1c], %o0
    ldd [%o0], %o2
40 bytes (10 instructions of 4 bytes each)

Reference overhead = 40 - 4 = 36 bytes!

Optimization 1 : Code sharing
• Multiple instances of the same kernel have their own contexts
• Kernel code should be transformed into a shared-version function
• Shared version
  • references are made only through the context variable
• Gain and cost of sharing
  • Gain = (#instances - 1) × (code block size)
  • Cost = (#instances) × (context variable size) + (code block overhead)
• Code sharing is performed only when the gain is larger than the cost

 1 > (-1)  (-1)  >    >  +    > context + reference +    > context + shared Decision Formula (1)  = code sharing overhead = context + reference (2) context = pi(pi), pi  ports where, (x) = 3*sizeof(int) + sizeof(pointer) (3) reference = t {S,C,AS,AP}((t)(t)) (t) = reference count (t) = unit overhead t = type of reference (4)  = code block size (5)  = number of instances

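Optimization 2 : Adjusting SAS
[Figure: chain graph A -> B -> C. A produces 3 tokens and B consumes 7 per firing; B produces 5 and C consumes 6.]
• Adjusting the Single Appearance Schedule
  • 2(7A3B)5C ==> buffer requirement 51
  • 2(7A3B2C)C ==> buffer requirement 39
  • give up the single appearance schedule
• BTLC (Binary Tree with Leaf Chain)
[Figure: BTLC G of the schedule 2(7A3B)5C with loop counts 2, 7, 3, 5. Each node carries a buffer triple [input, inside, output]: leaves A [0,0,3], B [7,0,5], C [6,0,0]; other triples shown: [0,0,21], [21,0,15].]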
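The two buffer figures on this slide can be reproduced by simulating each looped schedule on the chain A -> B -> C and summing the peak token count of each edge. The sketch below assumes that this per-edge peak sum is the buffer measure intended on the slide (it yields the same numbers); the `Sched` tree and the function names are hypothetical and unrelated to the BTLC structure itself.

```c
#include <assert.h>
#include <stddef.h>

enum { ACTOR_A, ACTOR_B, ACTOR_C, NUM_ACTORS };
enum { NUM_EDGES = NUM_ACTORS - 1 };       /* edge e connects actor e to e+1 */

static const int PROD[NUM_EDGES] = {3, 5}; /* A produces 3, B produces 5 */
static const int CONS[NUM_EDGES] = {7, 6}; /* B consumes 7, C consumes 6 */

/* Looped schedule node: a leaf fires `actor` `count` times; a loop
 * (actor == -1) runs its `body` of `body_len` nodes `count` times. */
typedef struct Sched {
    int count;
    int actor;
    const struct Sched *body;
    int body_len;
} Sched;

typedef struct { int tokens[NUM_EDGES]; int peak[NUM_EDGES]; int ok; } State;

static void fire(State *s, int actor) {
    if (actor > 0) {                        /* consume from the input edge */
        int e = actor - 1;
        if (s->tokens[e] < CONS[e]) s->ok = 0;
        s->tokens[e] -= CONS[e];
    }
    if (actor < NUM_ACTORS - 1) {           /* produce on the output edge */
        int e = actor;
        s->tokens[e] += PROD[e];
        if (s->tokens[e] > s->peak[e]) s->peak[e] = s->tokens[e];
    }
}

static void run(State *s, const Sched *n) {
    for (int r = 0; r < n->count; r++) {
        if (n->actor >= 0) fire(s, n->actor);
        else for (int k = 0; k < n->body_len; k++) run(s, &n->body[k]);
    }
}

/* Sum of per-edge peaks, or -1 if some firing lacked input tokens. */
int buffer_requirement(const Sched *top, int len) {
    State s = {{0}, {0}, 1};
    for (int k = 0; k < len; k++) run(&s, &top[k]);
    if (!s.ok) return -1;
    int total = 0;
    for (int e = 0; e < NUM_EDGES; e++) total += s.peak[e];
    return total;
}

/* 2(7A3B)5C */
static const Sched INNER_SAS[] = {{7, ACTOR_A, NULL, 0}, {3, ACTOR_B, NULL, 0}};
static const Sched SAS[] = {{2, -1, INNER_SAS, 2}, {5, ACTOR_C, NULL, 0}};

/* 2(7A3B2C)C */
static const Sched INNER_ADJ[] = {{7, ACTOR_A, NULL, 0}, {3, ACTOR_B, NULL, 0},
                                  {2, ACTOR_C, NULL, 0}};
static const Sched ADJUSTED[] = {{2, -1, INNER_ADJ, 3}, {1, ACTOR_C, NULL, 0}};
```

`buffer_requirement(SAS, 2)` evaluates to 51 (21 on A->B plus 30 on B->C) and `buffer_requirement(ADJUSTED, 2)` to 39 (21 plus 18): firing C inside the loop drains the B->C edge early, at the price of C appearing twice in the schedule.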

Flowchart of Schedule Adjustment (BTLC)
SAS schedule -> Construct BTLC -> Compute buffer requirement -> Find candidate for adjustment
• if found: Adjust schedule (split a chain)
• if not found: Done -> code generation

Splitting A Chain
Schedule = 2(7A3B)5C
[Figure: BTLC G of the schedule (loop counts 2, 5, 7, 3 over A, B, C) with the split point marked; chain sizes 30 and 21; node triples [0,30,0], [30,0,0], [0,21,30], [21,0,15], [0,0,21], [6,0,0], [0,0,3], [7,0,5].]
• Finding the split candidate
  • the chain with the largest number (here 30 for BC against 21 for AB)
  • in this example BC is selected
• Schedule after splitting
  • 2(7A3B2C)C
• In general, for a schedule with two clusters (a C_a)(b C_b), where a and b are loop counts, the new schedule is defined as
  • a(C_a (b/a)C_b) (b%a)C_b, if a < b
  • (a%b)C_a b((a/b)C_a C_b), otherwise
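The general splitting rule reduces to integer division and remainder on the two loop counts. The sketch below computes the loop counts of the new schedule; the struct `Split` and the function `split_chain` are hypothetical names. For the a >= b branch it uses a/b as the inner count of C_a, an assumption made so that the total number of C_a firings stays equal to a.

```c
#include <assert.h>

/* Loop counts of the schedule obtained by splitting (a Ca)(b Cb):
 *   rest_a Ca   outer( inner_a Ca  inner_b Cb )   rest_b Cb          */
typedef struct {
    int rest_a;    /* leftover firings of Ca before the merged loop */
    int outer;     /* loop count of the merged loop                 */
    int inner_a;   /* firings of Ca per merged iteration            */
    int inner_b;   /* firings of Cb per merged iteration            */
    int rest_b;    /* leftover firings of Cb after the merged loop  */
} Split;

Split split_chain(int a, int b) {
    assert(a > 0 && b > 0);
    if (a < b) {
        /* a(Ca (b/a)Cb) (b%a)Cb */
        return (Split){0, a, 1, b / a, b % a};
    }
    /* (a%b)Ca b((a/b)Ca Cb); a/b is an assumption, see the text above */
    return (Split){a % b, b, a / b, 1, 0};
}
```

With a = 2 and b = 5, as in 2(7A3B)5C, the result is outer = 2, inner_a = 1, inner_b = 2, rest_b = 1, which is the schedule 2(7A3B2C)C from the slide. In both branches rest_a + outer * inner_a equals a and rest_b + outer * inner_b equals b, so the firing counts of both clusters are preserved.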

Decision Formula
• |Cluster| = |W| value of the cluster
• New Schedule
  • 2(7A3B2C)C
  • Gain = 12 (51 - 39)
[Figure: BTLC G after splitting, over A, B and C; node triples [0,6,0], [0,12,6], [6,0,0], [12,0,0], [0,21,15], [21,0,15], [0,0,21], [7,0,5], [0,0,3]; chain sizes 6, 12 and 21.]

Experiment : CD2DAT
[Figure: two BTLC trees for the CD2DAT example, with roots annotated [0,280,0] and [0,35,0]. Nodes include F1, F2, F3, F4, X1, X2, fork, M, S1, S2, R1 and R2, each annotated with an [input, inside, output] buffer triple; the individual values are not recoverable as a layout.]

Experimental Result

Program size after each optimization
                        CD2DAT    Filter Bank
  SAS                   13672     28512
  Code Sharing          12768     22024
  Schedule Adjustment   12296     22024

Memory behavior of CD2DAT in ARM7
                        Fetches     Miss
  SAS                   17098177    57189
  Code Sharing          17573923    52867
  Schedule Adjustment   17499386    54331

Conclusion
• Our Environment
  • PeaCE : Ptolemy extension as Codesign Environment
• Optimization Techniques in Software Synthesis
  • For automatic code generation from dataflow graphs
  • Joint minimization of code and data size
  • Selective application of code sharing and schedule adjustment to SAS
• Future work
  • Clustering : merge multiple fine-grain nodes into a large one
    • increases the chance of code sharing
  • Buffer sharing
    • further reduces the buffer size and improves cache behavior

Thank You !
