
Optimizing Address Assignment for Scheduling Embedded DSPs

Optimizing Address Assignment for Scheduling Embedded DSPs. Chun Xue, Zili Shao, Dr. Edwin H. M. Sha Dept. of Computer Science University of Texas at Dallas Dr. Bin Xiao Dept. of Computing Hong Kong Polytechnic University. Outline. Introduction Motivating Examples The Algorithms


Presentation Transcript


  1. Optimizing Address Assignment for Scheduling Embedded DSPs Chun Xue, Zili Shao, Dr. Edwin H. M. Sha Dept. of Computer Science University of Texas at Dallas Dr. Bin Xiao Dept. of Computing Hong Kong Polytechnic University

  2. Outline • Introduction • Motivating Examples • The Algorithms • Experimental Results • Conclusion

  3. Motivation • DSP processors provide dedicated Address Generation Units (AGUs). • An AGU can eliminate address arithmetic instructions by modifying the address register in parallel with the current instruction. • Three addressing modes: auto-increment, auto-decrement, and modify register. • Subsuming the address arithmetic instructions into indirect addressing modes improves code size and performance.

  4. AGU Example • To calculate: C = A + B • Memory layout (low to high address): A, B, C, with AR0 pointing at A • Assembly code without AGU: (1) Load *(AR0) (2) ADAR AR0, 1 (3) Add *(AR0) (4) ADAR AR0, 1 (5) Stor *(AR0) • Assembly code with AGU: (1) Load *(AR0)+ (2) Add *(AR0)+ (3) Stor *(AR0) • The AGU removes the address arithmetic instructions by modifying the address register in parallel with the current instruction.

  5. Address Assignment Optimization • With a careful placement of variables in memory, the total number of address instructions can be reduced. • Both code size and timing performance are improved. • Address assignment: the optimization of the memory layout of program variables. • For single-functional-unit processors, this problem has been studied extensively. • However, little research has been done for multiple-functional-unit architectures such as the TI C6x VLIW processors.
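
The effect of variable placement can be illustrated with a minimal sketch. It assumes a single address register whose auto-increment/auto-decrement range is 1, so an explicit address instruction is needed only when two consecutive accesses are more than one memory word apart. The function name `count_address_instructions` and the second layout are illustrative and do not come from the presentation.

```python
def count_address_instructions(layout, access_sequence):
    """Count explicit address-arithmetic instructions for one address register.

    Assumption: consecutive accesses at distance 0 or 1 are free (handled by
    auto-increment/auto-decrement); any larger jump costs one instruction.
    """
    address = {var: i for i, var in enumerate(layout)}
    cost = 0
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if abs(address[cur] - address[prev]) > 1:
            cost += 1
    return cost


# C = A + B accesses A, then B, then C.
sequence = ["A", "B", "C"]
good = count_address_instructions(["A", "B", "C"], sequence)  # layout from the AGU example
bad = count_address_instructions(["A", "C", "B"], sequence)   # hypothetical worse layout
```

Under these assumptions the layout A, B, C needs no explicit address instruction, while the hypothetical layout A, C, B needs one for the jump from A to B.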

  6. The Previous Work – Single FU Processor • Address assignment was first studied by Bartley and Liao. • They modeled it as a graph-theoretic optimization problem. • The problem was proved to be NP-hard. • An efficient heuristic is used to find a Maximum Weight Path Cover.

  7. The Previous Work – Single FU Processor • Leupers and Marwedel proposed a tie-breaking heuristic and a variable partitioning method. • Gebotys modeled the problem as a network flow problem. • All of this work targets single-functional-unit processors with a fixed schedule.

  8. The Previous Work – Single FU Processor • Some work has been done on combining scheduling and address assignment. • Rao et al. suggested modifying the variable access sequence using expression tree transformations. • Choi and Kim proposed an algorithm that tightly couples address assignment and scheduling. • All of these algorithms target single-FU processors and cannot be directly applied to multiple-FU processors.

  9. Example for Multiple-FU Processor • (a) An input DAG [figure] • (b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g

  10. Fixed Schedule with Variables Placed in Alphabetical Order

  11. Fixed Schedule with the Solve-SOA Address Assignment Optimization

  12. Schedule Length Cannot Be Reduced • After the Solve-SOA algorithm (by Liao et al.) is applied, the variables are rearranged in memory. • The number of address instructions is reduced. • However, the total schedule length is not reduced by as much. • Why? Because of the dependency constraints in the fixed schedule.

  13. Address Assignment + Scheduling

  14. Address Assignment with Scheduling • In this example, we first obtain a good address assignment, then we schedule based on the obtained address assignment. • Therefore, both the schedule length and the number of address instructions can be reduced.

  15. Our Basic Idea • Address assignment with scheduling for multiple-functional-unit architectures. • Construct a good address assignment first. • Perform scheduling based on the obtained address assignment. • The experimental results show: 14-18% improvement over list scheduling; 7-10% improvement over Solve-SOA.

  16. MFSchSOA Algorithm • Get an address assignment by mSOA( ), a modified Solve-SOA, which takes a partial access sequence as input and generates an address assignment. • Perform multi-FU list scheduling that minimizes both schedule length and address operations. • Assign each node a priority equal to the longest path from that node to a leaf node. • Schedule based on a weighted bipartite matching graph.
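
The priority rule above can be sketched as follows. This is only an illustration: the slides do not say whether path length counts nodes or edges, so the sketch assumes it counts nodes, and the DAG used here is a hypothetical four-node graph, not the one from the presentation.

```python
from functools import lru_cache


def longest_path_priorities(successors):
    """Return {node: number of nodes on the longest path from node to a leaf}.

    Assumption: path length is measured in nodes, so every leaf has priority 1.
    """
    @lru_cache(maxsize=None)
    def priority(node):
        children = successors.get(node, ())
        return 1 + max((priority(child) for child in children), default=0)

    return {node: priority(node) for node in successors}


# Hypothetical DAG: X -> Y -> Z and X -> W.
dag = {"X": ("Y", "W"), "Y": ("Z",), "Z": (), "W": ()}
priorities = longest_path_priorities(dag)
```

A list scheduler would then pick ready nodes in decreasing priority order, so nodes on the critical path are issued first.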

  17. Partial Access Sequence • (a) An input DAG [figure] • (b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g • (c) Partial access sequence (variables accessed by each node): A: d h a | B: a h e | C: e f d | D: b f b | E: b e f | F: a b g | G: f a h | H: b g b
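
The partial access sequence on this slide lists, for each node, the two operands followed by the result. A minimal sketch that derives it from the node computations is shown here; the dictionary layout and the name `partial_access_sequence` are illustrative choices, not part of the presentation.

```python
# Node computations from the slide, stored as (result, left operand, right operand).
computations = {
    "A": ("a", "d", "h"),
    "B": ("e", "a", "h"),
    "C": ("d", "e", "f"),
    "D": ("b", "b", "f"),
    "E": ("f", "b", "e"),
    "F": ("g", "a", "b"),
    "G": ("h", "f", "a"),
    "H": ("b", "b", "g"),
}


def partial_access_sequence(computations):
    """Per node, list the variables in access order: operands first, then result."""
    return {
        node: [left, right, result]
        for node, (result, left, right) in computations.items()
    }


sequences = partial_access_sequence(computations)
```

The sequence is "partial" in the sense that the order of variables is fixed only inside each node; the order of the nodes themselves is left to the scheduler.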

  18. Address Assignment by mSOA Algorithm • (a) Partial access sequence: d h a | a h e | e f d | b f b | b e f | a b g | f a h | b g b • (b) Access graph: an edge e(u,v) with weight w(e) denotes that u and v are adjacent to each other w(e) times in the partial access sequence. Edge weights: a-h 1+1+1=3; b-g 1+1+1=3; e-f 1+1=2; b-f 1+1=2; d-h, h-e, d-f, b-e, a-b, and a-f 1 each. • (c) The address assignment given by the Maximum Weight Path Cover from Solve-SOA [figure]
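
The access graph and a path cover can be sketched as below. The slides do not give the internals of mSOA, so this sketch substitutes a Kruskal-style greedy heuristic in the spirit of Solve-SOA: take edges in decreasing weight, skipping any edge that would give a variable more than two neighbours or close a cycle. The alphabetical tie-breaking and all function names are assumptions made for illustration.

```python
from collections import Counter

# Partial access sequence from the slide, one list per node (A..H).
partial = [
    ["d", "h", "a"], ["a", "h", "e"], ["e", "f", "d"], ["b", "f", "b"],
    ["b", "e", "f"], ["a", "b", "g"], ["f", "a", "h"], ["b", "g", "b"],
]


def access_graph(partial_sequences):
    """Weight of {u, v} = number of times u and v are adjacent inside a node."""
    weights = Counter()
    for seq in partial_sequences:
        for u, v in zip(seq, seq[1:]):
            if u != v:
                weights[frozenset((u, v))] += 1
    return weights


def greedy_path_cover(weights):
    """Greedy path cover: heaviest edges first, degree <= 2, no cycles."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = Counter()
    chosen = []
    # Assumed tie-break: alphabetical order of the edge's endpoints.
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, w in ordered:
        u, v = sorted(edge)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            chosen.append((u, v, w))
    return chosen


weights = access_graph(partial)
cover = greedy_path_cover(weights)
total_weight = sum(w for _, _, w in cover)
```

On this input the chosen edges form the two paths d-h-a and g-b-f-e, which appears consistent with the variable order d, h, a, g, b, f, e shown on the scheduling slides.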

  19. Scheduling on 2-FU Processor • (a) An input DAG [figure] • (b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g • (c) The access sequence (variables shown in the order d, h, a, g, b, f, e) [figure] • (d) The priority of each node [figure] • (e) Ready list [figure] • (f) The schedule in the first step [figure]

  20. Scheduling on 2-FU Processor • (a) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g • (b) The access sequence (variables shown in the order d, h, a, g, b, f, e) [figure] • (c) Ready list [figure] • (d) The access sequence: weighted graph between FU1, FU2 and the nodes G: f a h; D: b f b; F: a b g; C: e f d; A: d h a [figure] • (e) The schedule in the second step [figure]

  21. Weighted Bipartite Graph • The weight W between FUi and the ready node u is calculated as follows: W = Z - 2 if the distance between X and Y is 0; W = Z - 1 if the distance between X and Y is 1; W = Z otherwise. • Where: Z = the priority of u; X = the last variable accessed in FUi; Y = the first variable accessed in u.
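
The weight rule on this slide can be written directly as a small function. The sketch assumes that "distance" means the difference between the memory addresses of X and Y under the chosen layout, and it uses the variable order d, h, a, g, b, f, e from the scheduling slides as that layout; the priority value 3 in the usage lines is hypothetical. The slide does not state how the matching consumes these weights, so only the weight computation is shown.

```python
def matching_weight(priority, last_var, first_var, address):
    """Weight between a functional unit and a ready node, per the slide's rule.

    priority  -- Z, the priority of the ready node u
    last_var  -- X, the last variable accessed on the functional unit
    first_var -- Y, the first variable accessed by u
    address   -- assumed mapping from variable name to memory address
    """
    distance = abs(address[last_var] - address[first_var])
    if distance == 0:
        return priority - 2
    if distance == 1:
        return priority - 1
    return priority


layout = ["d", "h", "a", "g", "b", "f", "e"]
address = {var: i for i, var in enumerate(layout)}

same = matching_weight(3, "b", "b", address)      # distance 0
adjacent = matching_weight(3, "h", "a", address)  # distance 1
far = matching_weight(3, "d", "e", address)       # distance 6
```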

  22. Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 2 functional units. [figure]

  23. Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 3 functional units. [figure]

  24. Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 4 functional units. [figure]

  25. Conclusions • In this paper, we propose an approach that optimizes address operations on multi-FU architectures by considering address assignment and scheduling together. • In our approach, we construct a good address assignment first and then perform scheduling based on the obtained address assignment. • The experimental results show that our approach reduces code size and schedule length compared with previous work: 14-18% improvement over list scheduling; 7-10% improvement over directly using Solve-SOA.
