1 / 20

Memory Efficient Software Synthesis from Dataflow Graph

Memory Efficient Software Synthesis from Dataflow Graph. Wonyong Sung, Junedong Kim, Soonhoi Ha Codesign and Parallel Processing Lab. Seoul National University. Contents. Introduction Code Generation from Block Diagram Specification Synchronous Data Flow and Single Appearance Schedule

rue
Download Presentation

Memory Efficient Software Synthesis from Dataflow Graph

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Memory Efficient Software Synthesis from Dataflow Graph Wonyong Sung, Junedong Kim, Soonhoi Ha Codesign and Parallel Processing Lab. Seoul National University

  2. Contents • Introduction • Code Generation from Block Diagram Specification • Synchronous Data Flow and Single Appearance Schedule • Proposed Strategies • Optimization 1 : code sharing optimization • Optimization 2 : minimize buffer requirement • Experiments • Conclusions

  3. Introduction • Motivations • Embedded system has limited amount of memory • large program = memory cost, performance penalty, power consumption • New trend of software development : high level design methodology • growing complexity, fast design turn-around time, limited budget, etc. • Goal of Research • Reduce the code and data size of automatically generated software • In an automatic software synthesis environment • Specification = Dataflow graph with SDF(Synchronous DataFlow) semantics

  4. Software Synthesis from SDF graph main(){ for(i=0;i<6;i++){A} for(i=0;i<4;i++){B} for(i=0;i<3;i++){C} for(i=0;i<2;i++){D} } main(){ for(i=0;i<2;i++){ for(j=0;j<3;j++){A} for(j=0;j<2;j++){B} } for(i=0;i<3;i++){C} for(i=0;i<2;i++){D} } B 3 1 2 2 A D 1 3 2 2 C Possible Schedules : = AABCABACDABABCD = (6A)(4B)(3C)(2D) = (2(3A2B))(3C)(2D) Single Appearance Schedule (SAS)

  5. Previous Efforts • Single Appearance Schedule (SAS): APGAN,RPMC • [by Battacharyya et. al.] in Ptolemy Group • SAS guarantees the minimum code size (without code sharing) • APGAN,RPMC : heuristics to find data minimized SAS schedule • ILP formulation for data memory minimization • [by Ritz et. al.] in Meyr Group • flat single appearance schedule + sharing of data buffer • Rate optimal compile time schedule • [by Govindarajan et. al.] in Gao Group • tried to minimize the buffer requirement using linear programming • An algorithm to compute the smallest data buffer size • [by Ade et. al.] in GRAPE group

  6. Proposed Strategies • Coding style • not stuck to one coding style, hybrid approach • generated code is a mixture of inlines and functions • Optimization 1: Code Sharing • Multiple instances of a same kernel treated as different node in SAS • Code sharing optimization has gain(block size) and cost(context size) • Optimization 2: Schedule Adjustment • give up single appearance schedule to reduce the data size • (1) represents schedule information with BTLC data structure • (2) find possible location for adjustment • (3) schedule adjustment

  7. BTLC Flowchart of Optimization Procedure Get SAS schedule [RPMC,APGAN] code-block size context size Code sharing optimization Schedule Adjustment C code generation

  8. Example of Code Sharing (CD2DAT) ramp sine  fir1 fir2 fir3 fir4 xgraph ramp’ sine’ xgraph Code before sharing for(int i=0;i<2;i++) { { /* code for fir1 */ ……………… out = tap*input[i]’ ……………… } } /* code for fir 2 */ …………….. Code after sharing for(int i=0;i<2;i++) fir(1); for(int i=0;i<3;i++) fir(2); …………… void fir(int context){ ……………… context_FIR[context].out... ……………… } context definition typedef struct{ double *out; int output_ofs; int output_bs; int output_nx; …………. double decimation; double tap; }context_FIR;

  9. Code Size Overhead (in Sparc/Solaris) without context with context ….. = value; ….. = *(context_CGCRamp[context].value); ldd [%fp + -336],%o0 sethi %hi(0x20800),%o1 ld [%o1+0x3c8], %o0 mov %o0, %o2 sll %o2, 2, %o1 add %o1, %o0, %o1 sll %01, 3, %o0 add %fp, -424, %o1 add %o1, %o0, %o2 ld [%o2 + 0x1c], %o0 ldd [%o0], %o2 4 bytes 40 bytes Reference Overhead = 36 bytes!

  10. Optimization 1 : Code sharing • Multiple instances of a same kernel have their own contexts • Kernel code should be transformed into shared version function • Shared Version • references are only through context variable • Gain and cost of sharing • Gain = (# instances -1)  (code block size) • Cost = (#instances)  (context variable size) + (code block overhead) • Code sharing is performed only when the gain is larger than the cost

  11. 1 > (-1)  (-1)  >    >  +    > context + reference +    > context + shared Decision Formula (1)  = code sharing overhead = context + reference (2) context = pi(pi), pi  ports where, (x) = 3*sizeof(int) + sizeof(pointer) (3) reference = t {S,C,AS,AP}((t)(t)) (t) = reference count (t) = unit overhead t = type of reference (4)  = code block size (5)  = number of instances

  12. C A B 3 7 5 6 A B C Optimization 2 : Adjusting SAS • Adjusting Single Appearance Schedule • 2(7A3B)5C ==> 51 • 2(7A3B2C)C ==> 39 • give up single appearance schedule • BTLC (Binary Tree with Leaf Chain) G 5 2 [6,0,0] = [input, inside, output] 3 [0,0,21] 7 [21,0,15] [7,0,5] [0,0,3]

  13. G [0,30,0] 5 2 [30,0,0] [0,21,30] 3 [0,0,21] 7 [21,0,15] C In general [7,0,5] [0,0,3] 30 21 W  A B W = |OL  IR| I =  | IL  IR - W | O =  | OL  OR -W | L R [I,W,O] Computation of Buffer Requirements 2 7 3 7 5 3 A B 21 30

  14. BTLC Flowchart of Schedule Adjustment SAS schedule Construct BTLC Compute buffer requirement Find candidate for adjustment no found yes Adjust schedule (split a chain) Done code generation

  15. G 2 5 7 3 C A B Splitting A Chain • Finding split candidate • a chain which has the largest number • in this example BC is selected • Schedule after splitting • 2(7A3B2C)C • In general, for a schedule that has two clusters aCabCb(a and b are loop counts) new schedule is defined as • a(Ca(b/a)Cb)(b%a)Cb) , if a<b • (a%b)Ca b((b/a)CaCb ), otherwise [0,30,0] [30,0,0] [0,21,30] [21,0,15] [0,0,21] [6,0,0] 30 21 Split point [0,0,3] [7,0,5] Schedule = 2(7A3B)5C

  16. Decision Formula G [0,6,0] [0,12,6] [6,0,0] 2 1 [12,0,0] [0,21,15] 1 2 C |Cluster| = |W| value of the cluster [6,0,0] [21,0,15] 6 • New Schedule • 2(7A3B2C)C • Gain = 12 [0,0,21] 7 3 C [6,0,0] 12 21 A B [7,0,5] [0,0,3]

  17. [0,280,0] G [0,56,280] [280,4,0] 7 40 [56,0,40] [0,6,56] [7,0,4] [4,0,0] F4 4 7 8 [6,0,8] [0,1,6] 280 4 3 2 F3 X1 [1,0,0] [1,0,2] [7,0,5] [0,1,1] 56 1 F1 [0,35,0] 6 1 [0,1,2] G [3,0,4] F2 1 X2 [1,0,0] [0,35,35] 1 [0,2,1] [35,4,0] 7 5 1 fork [1,0,2] [0,56,40] [35,4,0] 1 [0,1,2] 1 5 [7,0,4] [4,0,0] F4 4 1 M [2,0,1] [4,0,0] [56,35,40] 2 [0,6,56] [0,0,2] 4 [7,0,4] F4 4 7 8 1 S2 [1,0,1] X1 [1,0,0] [6,0,8] 35 1 35 [0,1,1] 4 2 1 R2 [0,0,1] F3 X1 [7,0,5] [1,0,0] 0 [0,0,1] 56 1 R1 S1 [1,0,1] [3,0,4] F2 Experiment : CD2DAT

  18. Experimental Result Program size after each optimization CD2DAT Filter Bank SAS 13672 28512 Code Sharing 12768 22024 Schedule Adjustment 12296 22024 Memory behavior of CD2DAT in ARM7 Fetches Miss SAS 17098177 57189 Code Sharing 17573923 52867 Schedule Adjustment 17499386 54331

  19. Conclusion • Our Environment • PeaCE : Ptolemy extension as Codesign Environment • Optimization Techniques in Software Synthesis • For automatic code generation from dataflow graph • Joint minimization of code and data size • Selective application code sharing and schedule adjustment to SAS • Future works • Clustering : multiple fine grain nodes into a large one • increase chance of code sharing • Buffer sharing • further reduce the buffer size and increase the cache effect

  20. Thank You !

More Related