Muhammad Noman Ashraf

Presentation Transcript


  1. Optimization of Data-Flow Computations Using Canonical TED Representation. Based on M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. ECE 667 Synthesis and Verification of Digital Systems, Spring 2011. Muhammad Noman Ashraf. Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations,” DATE (2009).

  2. Overview • Motivation • TED Review • Related Work • TED Decomposition System • TED Linearization • Product-Term Extraction • Sum-Term Extraction • Reordering • DFG Generation • Replacing Constant Multipliers by Shifters • Conclusion • References

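  3. Motivation • Original expression: $F = a \cdot f \cdot g + a \cdot f \cdot d \cdot c + a \cdot c \cdot e \cdot g$ requires 8 MPY, 2 ADD • Factored form with the minimum number of operations: $F = a \cdot (f \cdot (g + d \cdot c) + c \cdot e \cdot g)$ requires 5 MPY, 2 ADD; latency L = 3 MPY + 2 ADD (5 control steps) with resources 2 MPY, 1 ADD • Alternative form: $F = (a \cdot f)(g + d \cdot c) + (a \cdot c) \cdot e \cdot g$ requires 6 MPY, 2 ADD; latency L = 3 MPY + 1 ADD (4 control steps) with resources 2 MPY, 1 ADD • The form with the fewest operations is not the one with the lowest latency. Slide adapted from M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations,” DATE (2009)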
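
The operation counts and latencies above can be checked by writing each form as an expression tree. This is a minimal Python sketch, assuming every operation takes one control step and resources are unlimited (so latency is simply tree depth); the names `Op`, `mul`, `add`, `count_ops` and `steps` are illustrative and not part of TDS.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Op:
    kind: str            # "*" (MPY) or "+" (ADD)
    left: "Expr"
    right: "Expr"


Expr = Union[str, Op]    # a variable name or a binary operation


def mul(*xs: Expr) -> Expr:
    """Left-associative product x0 * x1 * ... * xn."""
    e = xs[0]
    for x in xs[1:]:
        e = Op("*", e, x)
    return e


def add(*xs: Expr) -> Expr:
    """Left-associative sum x0 + x1 + ... + xn."""
    e = xs[0]
    for x in xs[1:]:
        e = Op("+", e, x)
    return e


def count_ops(e: Expr) -> Tuple[int, int]:
    """(number of MPY, number of ADD), counting every tree node as one operator."""
    if isinstance(e, str):
        return (0, 0)
    lm, la = count_ops(e.left)
    rm, ra = count_ops(e.right)
    is_mpy = 1 if e.kind == "*" else 0
    return (lm + rm + is_mpy, la + ra + 1 - is_mpy)


def steps(e: Expr) -> int:
    """Latency in control steps: one step per operation, unlimited resources."""
    if isinstance(e, str):
        return 0
    return 1 + max(steps(e.left), steps(e.right))


def evaluate(e: Expr, env: Dict[str, int]) -> int:
    if isinstance(e, str):
        return env[e]
    lv, rv = evaluate(e.left, env), evaluate(e.right, env)
    return lv * rv if e.kind == "*" else lv + rv


flat = add(mul("a", "f", "g"), mul("a", "f", "d", "c"), mul("a", "c", "e", "g"))
form1 = mul("a", add(mul("f", add("g", mul("d", "c"))), mul("c", "e", "g")))
form2 = add(mul(mul("a", "f"), add("g", mul("d", "c"))), mul(mul("a", "c"), "e", "g"))

for name, e in [("original", flat), ("form 1", form1), ("form 2", form2)]:
    print(name, count_ops(e), steps(e))
# original (8, 2) 5
# form 1 (5, 2) 5
# form 2 (6, 2) 4
```

Under these assumptions, form 1 needs fewer multiplications but one more control step than form 2, which is the trade-off the slide illustrates.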

  4. TED Review [Construction] • Figure: step-by-step TED construction; intermediate labels include $x(zu + qw)$, $zu + qw$ and $zu$ • The TED is canonical for the given variable order x, z, u, q, p, y, w • Non-linear terms use higher-order edges; notation: an edge labeled ^2 out of node w stands for $w^2$ (figure terms: $qw$, $pw^2$, $yw$)
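
For polynomials, the construction can be sketched as a recursive decomposition: at each level the polynomial is split by the power of the current variable, and each power k becomes an edge ^k to the corresponding cofactor. This is a simplified sketch under stated assumptions: integer coefficients, no edge-weight normalization (real TEDs normalize edge weights to share more structure), and a hash table keyed by (variable, edges) so that identical sub-graphs are built only once. `SimpleTED` and its methods are hypothetical names.

```python
from typing import Dict, List, Tuple

Poly = Dict[Tuple[int, ...], int]  # exponent vector -> integer coefficient


class SimpleTED:
    """Hypothetical, simplified TED: no edge-weight normalization, shared sub-graphs."""

    def __init__(self, order: List[str]):
        self.order = order
        self.unique: Dict[tuple, int] = {}  # hash table: node key -> node id
        self.nodes: List[tuple] = []        # node id -> ("const", c) or (var, edges)

    def _mk(self, key: tuple) -> int:
        if key not in self.unique:          # build each distinct sub-graph only once
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, poly: Poly, level: int = 0) -> int:
        poly = {e: c for e, c in poly.items() if c != 0}
        if level == len(self.order):        # all variables consumed: constant terminal
            return self._mk(("const", sum(poly.values())))
        groups: Dict[int, Poly] = {}        # power of order[level] -> cofactor polynomial
        for exps, c in poly.items():
            groups.setdefault(exps[0], {})[exps[1:]] = c
        edges = tuple((k, self.build(groups[k], level + 1)) for k in sorted(groups))
        if not edges:                       # the zero function
            return self._mk(("const", 0))
        if len(edges) == 1 and edges[0][0] == 0:
            return edges[0][1]              # reduction: the variable does not occur
        return self._mk((self.order[level], edges))

    def evaluate(self, node: int, env: Dict[str, int]) -> int:
        key = self.nodes[node]
        if key[0] == "const":
            return key[1]
        var, edges = key
        return sum(env[var] ** k * self.evaluate(child, env) for k, child in edges)


# F = xzu + qw + pw^2 + yw over the order x, z, u, q, p, y, w
ted = SimpleTED(["x", "z", "u", "q", "p", "y", "w"])
F = {(1, 1, 1, 0, 0, 0, 0): 1, (0, 0, 0, 1, 0, 0, 1): 1,
     (0, 0, 0, 0, 1, 0, 2): 1, (0, 0, 0, 0, 0, 1, 1): 1}
root = ted.build(F)
print(len(ted.nodes))  # 9 (including the terminal 1)
print(ted.evaluate(root, dict(x=2, z=3, u=5, q=7, p=11, y=13, w=17)))  # 3549
```

The example polynomial is the expansion of the final form on slide 24. The node for w (a ^1 edge to the terminal) is built once and shared by the q·w and y·w paths, while $p w^2$ gets its own w node with a ^2 edge.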

  5. RELATED WORK • HDL compilers • High-level synthesis systems (Cyber, Spark, Catapult C): lack local optimality • Kernel-based decomposition [Hosangadi et al., “Optimizing Polynomial Expressions by Algebraic Factorization and CSE,” IEEE Transactions, 2005]: lacks canonicity • Cut-based decomposition (TED-based) [Askar et al., “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007]: limited to TEDs with the disjoint decomposition property

  6-7. Cut-based decomposition (Related Work) • Top-down approach • Apply a series of cuts (additive and multiplicative) to the edges so that the graph separates into two disjoint subgraphs • Different sequences of cuts result in different DFGs, e.g., sequence A3, A1, M1, A2 versus sequence A1, A3, M1, A2

  8. TED decomposition [TDS] • The cut-based decomposition described earlier only works for TEDs with the disjoint decomposition property • Many TEDs do not have this property • New approach: bottom-up • Identify algebraic operations and extract them from the graph • Also works for TEDs without the disjoint decomposition property • TED-based factorization, CSE, and decomposition are jointly referred to as TED decomposition • The process systematically involves: • Linearization • Product-term extraction • Sum-term extraction • Reordering • DFG generation

  9. TDS System Overview • Figure: the TDS flow (behavioral transformations) feeding the HLS flow • Inputs: C, behavioral HDL, matrix transforms, polynomials; translated into a functional TED and structural elements (original DFG) • TED-based transformations: variable ordering, TED linearization, TED factorization & decomposition, common subexpression elimination (CSE), DFG extraction • DFG-based transformations: constant multiplication & shifter generation • Other blocks: static timing analysis, latency optimization • Output: the TDS netlist (optimized, structural DFG), passed to High Level Synthesis (GAUT) together with resource constraints, design constraints and design objectives (HLS flow), which produces RTL VHDL

  10. TED Linearization • A TED naturally represents a polynomial in its factored form • This efficiency is lost for non-linear expressions: in $F = a^2c + abc$ the variable a could be factored out, but the TED keeps $a^2$ and $a$ on separate edges of node a • Splitting $a^2$ into $a_1 a_2$ gives $F = a_1(a_2 + b)c$
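
At the expression level, the splitting step is just a renaming of repeated variables. A minimal sketch, assuming monomials are written as tuples of variable names (so $a^2c$ is `("a", "a", "c")`); `linearize` and `evaluate` are hypothetical helpers, and the sketch does not model the TED itself.

```python
from collections import Counter
from math import prod
from typing import Dict, List, Tuple

Monomial = Tuple[str, ...]  # a*a*c is written ("a", "a", "c")


def linearize(poly: List[Monomial]) -> List[Monomial]:
    """Split every variable that appears with power > 1 into copies v1, v2, ..."""
    max_power: Counter = Counter()
    for m in poly:
        for v, k in Counter(m).items():
            max_power[v] = max(max_power[v], k)
    result = []
    for m in poly:
        copy_index: Counter = Counter()
        new = []
        for v in m:
            if max_power[v] > 1:            # non-linear variable: rename the i-th copy to v{i}
                copy_index[v] += 1
                new.append(f"{v}{copy_index[v]}")
            else:                           # already linear: keep the name
                new.append(v)
        result.append(tuple(sorted(new)))
    return result


def evaluate(poly: List[Monomial], env: Dict[str, int]) -> int:
    return sum(prod(env[v] for v in m) for m in poly)


F = [("a", "a", "c"), ("a", "b", "c")]   # F = a^2*c + a*b*c
G = linearize(F)
print(G)   # [('a1', 'a2', 'c'), ('a1', 'b', 'c')]
env = {"a": 3, "b": 5, "c": 7}
print(evaluate(F, env) == evaluate(G, {**env, "a1": 3, "a2": 3}))   # True
```

Numbering copies in order of appearance makes $a_1$ occur in every monomial that contains a, so both monomials share $a_1 c$ and $G = a_1 c (a_2 + b)$; substituting $a_1 = a_2 = a$ recovers F.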

  11. TED Linearization [back to previous example] • Split $w^2$ into $w_1$ and $w_2$ • Continue with TED decomposition

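  12. TED Linearization [Concept] • Split $x^k = x_1 x_2 \cdots x_k$, where $x_i = x_j$ for all $i, j$ • Iteratively perform this splitting on high-order nodes • The substitution yields the Horner form, which contains the minimum number of multiplications: $F = F_0 + x F_1 + \cdots + x^n F_n = F_0 + x_1(F_1 + x_2(F_2 + \cdots + x_n F_n))$ • Figure: a node x with edges ^0, ^1, …, ^n to $F_0, F_1, \ldots, F_n$ becomes a chain of linear nodes $x_1, x_2, \ldots, x_n$; node $x_i$ has a ^0 edge to $F_{i-1}$ and a ^1 edge to $x_{i+1}$, and the last node $x_n$ has a ^1 edge to $F_n$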
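
The saving from the Horner form can be seen by counting multiplications. A small sketch, assuming the cofactors $F_0, \ldots, F_n$ are plain integers (in a TED they are sub-graphs); `horner` and `term_by_term` are illustrative names.

```python
from typing import List, Tuple


def horner(coeffs: List[int], x: int) -> Tuple[int, int]:
    """Evaluate F0 + x*F1 + ... + x^n*Fn as F0 + x*(F1 + x*(F2 + ... + x*Fn)).
    Returns (value, number of multiplications)."""
    acc, mults = coeffs[-1], 0
    for c in reversed(coeffs[:-1]):
        acc = c + x * acc
        mults += 1
    return acc, mults


def term_by_term(coeffs: List[int], x: int) -> Tuple[int, int]:
    """Evaluate each term x^k * Fk separately (k multiplications for term k)."""
    total, mults = coeffs[0], 0
    for k in range(1, len(coeffs)):
        term = coeffs[k]
        for _ in range(k):
            term *= x
            mults += 1
        total += term
    return total, mults


print(horner([1, 2, 3, 4], 2))        # (49, 3)
print(term_by_term([1, 2, 3, 4], 2))  # (49, 6)
```

For degree n, the Horner form uses n multiplications, while the term-by-term evaluation sketched here uses n(n+1)/2.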

  13. Product Term Extraction • Extractable product term: a product of variables, each of which appears in the expression only once • It can be extracted from the TED without duplicating any of its variables • It is a set of nodes connected by a series of multiplicative edges only • The starting and ending nodes can have incident additive edges • The starting and ending nodes can have more than one incoming or outgoing multiplicative edge • The ending node can be the terminal node 1 • [TDS] recursively identifies such terms by traversing the graph bottom-up • For each node, a depth-first search decides which nodes to include in the product term
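
The two checks used in the example trace (does a node have only one * parent, and only one * child path?) can be sketched as chain detection over the multiplicative edges. This is a simplified, hypothetical model, not the TDS algorithm: it sees only multiplicative edges (additive edges and edge weights are ignored), omits the terminal 1 from the product, and grows each chain top-down instead of performing TDS's bottom-up depth-first traversal.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def product_chains(mult_children: Dict[str, List[str]]) -> List[List[str]]:
    """Maximal chains u1 -> u2 -> ... of multiplicative edges in which every link u -> v
    has u with exactly one * child and v with exactly one * parent.
    The terminal node "1" is never added (multiplying by 1 is a no-op)."""
    parents: Dict[str, List[str]] = defaultdict(list)
    for u, vs in mult_children.items():
        for v in vs:
            parents[v].append(u)

    def successor(u: str) -> Optional[str]:
        """Return u's successor in a chain, or None if the chain must stop at u."""
        kids = mult_children.get(u, [])
        if len(kids) == 1 and len(parents[kids[0]]) == 1 and kids[0] != "1":
            return kids[0]
        return None

    chains = []
    for u in mult_children:
        ps = parents[u]
        if len(ps) == 1 and successor(ps[0]) == u:
            continue                        # u continues a chain that starts higher up
        chain = [u]
        nxt = successor(u)
        while nxt is not None:
            chain.append(nxt)
            nxt = successor(nxt)
        if len(chain) > 1:                  # a single variable is not a product term
            chains.append(chain)
    return chains


# Multiplicative edges of the TED for F = a*b*c + d*c with order a, b, d, c
# (node a's additive ^0 edge to d is not modeled here).
edges = {"a": ["b"], "b": ["c"], "d": ["c"], "c": ["1"]}
print(product_chains(edges))   # [['a', 'b']]
```

Node c has two * parents (b and d), so the chain stops before it; extracting $P = a \cdot b$ gives $F = (P + d) \cdot c$ without duplicating any variable.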

  14. Product-Term Extraction [back to example] • Figure trace, from the marked start node: • z has only one * parent? YES • z has only one * child path? NO • u has only one * parent? YES • u has only one * child path? YES → CONTINUE • BACKTRACK • Extracted product terms are labeled P1 and P2 (the figure also labels zu)

  15. Sum Term Extraction • Extractable sum term: a sum of variables, each of which appears in the expression only once • It can be extracted from the TED without duplicating any of its variables • “Set of nodes incident to multiplicative edges joined at a single common node, such that nodes in question are connected by a chain of additive edges only” • [TDS] recursively identifies such terms by traversing the graph bottom-up • For each node, make a list of incident nodes and extract those nodes from the list that are connected by additive edges only • [TDS] uses the associativity of addition

  16. Sum-Term Extraction [back to example] • Figure: starting from the marked node, the sum term S1 is extracted • Keep support (irreducible)

  18. Example to illustrate associativity* • Figure: sum terms $S_1 = b + d$ and $S_2 = a + c$

  19. Sum-Term Extraction [cont., back to example] • If sum-term extraction exposes new product terms, go back to product-term extraction • Stop when the TED is irreducible • Then generate the DFG (explained later)

  20. Reordering [back to previous example: iteration 2 extraction] • Stop when the TED is irreducible • Extracted terms (figure): P3, P4, P5, S2, S3

  21. DFG Generation and Optimization • Transform each irreducible TED into a simple DFG • Additive edge -> addition operation • Multiplicative edge -> multiplication operation • Break multi-operand operations into chains of two-operand operations • [TDS] maintains a hash table of DFG nodes keyed by the corresponding function • This allows a node to be reused when the same function/expression is found again • It captures redundancy caused by a poor variable order during factorization • The DFG is not unique • It can be restructured and balanced to minimize cost
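
The node-reuse idea can be sketched with a hash table of DFG nodes. One labeled assumption: here nodes are keyed by structure (operation plus operand ids, with operands sorted because + and * commute), which is a stand-in for TDS's keying by function; structural keys catch only syntactically identical subexpressions. `DFG`, `build` and `count` are hypothetical names.

```python
from typing import Dict, List, Tuple, Union

# Expression: a variable name, or (op, [operands...]) with op in {"+", "*"}.
Expr = Union[str, Tuple[str, list]]


class DFG:
    def __init__(self) -> None:
        self.table: Dict[tuple, int] = {}   # hash table: node key -> node id
        self.nodes: List[tuple] = []        # node id -> key

    def _node(self, key: tuple) -> int:
        if key not in self.table:           # reuse the node if the same key appears again
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def build(self, e: Expr) -> int:
        if isinstance(e, str):
            return self._node(("in", e))    # primary input
        op, args = e
        ids = [self.build(a) for a in args]
        acc = ids[0]
        for i in ids[1:]:                   # n-operand operation -> chain of 2-operand ones
            a, b = sorted((acc, i))         # + and * commute: normalize operand order
            acc = self._node((op, a, b))
        return acc

    def count(self, op: str) -> int:
        return sum(1 for n in self.nodes if n[0] == op)


g = DFG()
root = g.build(("+", [("*", [("+", ["a", "b"]), "c"]),
                      ("*", [("+", ["a", "b"]), "d"])]))   # (a+b)*c + (a+b)*d
print(g.count("+"), g.count("*"))   # 2 2: the node for a+b is built once and reused
```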

  22. Data Flow Graph • Total: 5 MPY, 3 ADD • Latency L = 2 MPY + 2 ADD (4 control steps) • Required resources: 3 MPY, 2 ADD • Figure: DFG scheduled over control steps 1 to 4, annotated with the reordering cost

  23. Reordering [-> iteration 3 extraction] • The cost of each iteration involves reordering of variables, extraction, DFG generation, and annotating latency and resource requirements • Extracted terms (figure): S2, S3, P3, P4 • Current cost: L = 2 MPY + 2 ADD, requiring 3 MPY, 2 ADD

  24. Generating and evaluating the new Data Flow Graph [Iteration 3] • $F = S_3 = P_4 + P_3 = w \cdot S_2 + x \cdot P_1 = w \cdot (q + S_1) + x \cdot (z \cdot u) = w \cdot (q + P_2 + y) + x \cdot z \cdot u = w \cdot (q + p \cdot w + y) + x \cdot z \cdot u$ • Total: 4 MPY, 3 ADD • Reordering cost: L = 2 MPY + 3 ADD; previous cost: L = 2 MPY + 2 ADD, requiring 3 MPY, 2 ADD • Figure: DFG over control steps 1 to 5, with resource annotations Req 1 MPY, 1 ADD and Req 2 MPY, 1 ADD
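
The iteration-3 decomposition can be checked by writing it as straight-line three-address code, evaluating it against the expanded polynomial, and computing an ASAP schedule. A minimal sketch, assuming one control step per operation and unlimited resources; `PROGRAM`, `run`, `asap_steps` and `reference` are illustrative names.

```python
# Iteration-3 decomposition from slide 24 as three-address code: (dest, op, src1, src2).
PROGRAM = [
    ("P1", "*", "z", "u"),
    ("P2", "*", "p", "w"),
    ("S1", "+", "P2", "y"),
    ("S2", "+", "q", "S1"),
    ("P3", "*", "x", "P1"),
    ("P4", "*", "w", "S2"),
    ("S3", "+", "P4", "P3"),
]


def run(program, env):
    """Execute the straight-line program; returns all computed values."""
    vals = dict(env)
    for dest, op, a, b in program:
        vals[dest] = vals[a] * vals[b] if op == "*" else vals[a] + vals[b]
    return vals


def asap_steps(program):
    """ASAP control step of each operation: one step per operation, unlimited resources."""
    step = {}
    for dest, _, a, b in program:
        step[dest] = 1 + max(step.get(a, 0), step.get(b, 0))
    return step


def reference(env):
    """Expanded form of slide 24's result: x*z*u + q*w + p*w^2 + y*w."""
    x, z, u, q, p, y, w = (env[k] for k in "xzuqpyw")
    return x * z * u + q * w + p * w * w + y * w


env = dict(x=2, z=3, u=5, q=7, p=11, y=13, w=17)
print(run(PROGRAM, env)["S3"] == reference(env))    # True
steps = asap_steps(PROGRAM)
print(max(steps.values()))                          # 5, via P2 -> S1 -> S2 -> P4 -> S3
print(sum(op == "*" for _, op, _, _ in PROGRAM),
      sum(op == "+" for _, op, _, _ in PROGRAM))    # 4 3
```

The critical path P2, S1, S2, P4, S3 contains 2 MPY and 3 ADD, matching L = 2 MPY + 3 ADD for 4 MPY and 3 ADD in total.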

  25. Design Space Exploration • Reordering [-> iteration 4 extraction, DFG generation] • Through reordering, all cases can be obtained • Figure: DFG over control steps 1 to 4

  26. Replacing constant multipliers* • By shifters • Transform constant multiplications into shifters, while considering factorization involving shifters • Steps: • Represent each constant in CSD (canonical signed digit) format, using a shift variable $L^i$ instead of $2^i$ for a shift by i bits • Generate a TED with the shift variables, linearize it and perform decomposition • Replace terms involving shift variables ($L^i$) by i-bit shifters • Example: $7a + 6b = (L^3 - 1)a + (L^3 - L)b = L^3(a + b) - L b - a$, implemented as ((a+b)<<3) - (a+(b<<1))
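
The CSD step and the slide's shifter expression can be sketched as follows. This uses the standard non-adjacent-form recoding as the CSD algorithm (an assumption about how the constants are recoded); a digit at position i corresponds to the shift variable $L^i$, i.e. `<< i`. `to_csd`, `shift_add_multiply` and `slide_example` are illustrative names.

```python
from typing import List


def to_csd(n: int) -> List[int]:
    """Canonical signed-digit (non-adjacent form) digits of n, least significant first.
    Each digit is -1, 0 or +1 and no two adjacent digits are non-zero."""
    digits = []
    while n != 0:
        if n % 2:                 # odd: pick +1 or -1 so that (n - d) is divisible by 4
            d = 2 - (n % 4)       # n % 4 == 1 -> +1, n % 4 == 3 -> -1
        else:
            d = 0
        digits.append(d)
        n = (n - d) // 2
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """x * c using only shifts and additions/subtractions, one per non-zero CSD digit of c."""
    acc = 0
    for i, d in enumerate(to_csd(c)):
        if d == 1:
            acc += x << i         # + L^i * x
        elif d == -1:
            acc -= x << i         # - L^i * x
    return acc


def slide_example(a: int, b: int) -> int:
    # 7a + 6b = (L^3 - 1)a + (L^3 - L)b = L^3(a + b) - L*b - a
    return ((a + b) << 3) - (a + (b << 1))


print(to_csd(7), to_csd(6))   # [-1, 0, 0, 1] [0, -1, 0, 1], i.e. 7 = 8 - 1 and 6 = 8 - 2
print(slide_example(3, 5) == 7 * 3 + 6 * 5)   # True
```

Multiplying separately would take four shift/add/subtract operations (7a = (a<<3) - a, 6b = (b<<3) - (b<<1)) plus the final addition; factoring out the common $L^3$ term shares one shifter between a and b.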

  27. RECAP: TDS (TED Decomposition System) for data-flow and computation-intensive designs (DSP), supporting design space exploration • Read in the CDFG file (cdfg), a polynomial expression (poly), or pre-coded DSP transforms (tr) • Translate into a functional TED (dfg2ted) and structural elements (comparators, etc.) • Linearize its data path (linearize) • Iterate product-term extraction and sum-term extraction; iterate reordering to minimize latency (reorder); the result is a set of irreducible TEDs • Produce the final DFG (ted2dfg) and annotate back the CDFG file (write)

  28. Conclusion • Results in the paper show a 15% latency improvement and a 7% area reduction when using DFGs generated by TDS instead of KBD (kernel-based decomposition) • Far better results compared to the original DFG • TDS serves as a front end to GAUT • Fundamental limitation: the decomposition depends on variable ordering, and reordering is an expensive operation

  29. REFERENCES • M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and E. Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. • M. Ciesielski, S. Askar, D. Gomez-Prado, J. Guillot and E. Boutillon, “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007, pp. 455–460. • TDS: TED-Based Dataflow Decomposition System, Univ. Massachusetts, Amherst, MA. [Online]. Available: http://www.ecs.umass.edu/ece/labs/vlsicad/tds.html

  30. QUESTIONS?

  31. Experiment Setup* • Figure: the TDS system flow of slide 9, with additional KBD and ORIGINAL DFG paths into High Level Synthesis (GAUT) alongside the TDS netlist, for comparison

  32. Results* • Figure: results compared against KBD

  33. Results: Quintic Spline* • Figure: comparison against KBD

  34. Results: Quartic Spline* • Figure: comparison against KBD

  35. Improvement over KBD and Original* • Figure: improvement of TDS over KBD and the original DFG
