CS 380C • Last Time: Interactions of scheduling and register usage • Today: Interactions of scheduling and instruction level parallelism
Shape of Expressions • Proebsting & Fischer assume a fixed expression tree • Hunt et al. reorganize commutative and associative operations in expression trees to • Increase ILP • Decrease critical path length • Group constants
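To make the effect of tree shape concrete, here is a minimal Python sketch. It assumes an expression tree is a nested tuple `("+", left, right)` with strings as leaves; the helper names `chain`, `balanced`, and `height` are illustrative and not taken from the papers. It compares the left-deep tree a parser would typically build with a balanced rearrangement of the same operands.

```python
from functools import reduce

def chain(leaves):
    """Left-deep tree ((a+b)+c)+d, the shape a simple parser would produce."""
    return reduce(lambda left, right: ("+", left, right), leaves)

def balanced(leaves):
    """Rebuild the same sum by splitting the operand list in half recursively."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return ("+", balanced(leaves[:mid]), balanced(leaves[mid:]))

def height(tree):
    """Number of operations on the longest root-to-leaf path (the critical path
    when every operation has unit latency)."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))

leaves = list("abcdefgh")
print(height(chain(leaves)))     # 7
print(height(balanced(leaves)))  # 3
```

Both trees perform seven additions, but in the balanced form up to four of them are independent and can issue in parallel.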
Motivation • Long pipelines and fine-grained parallel processors (e.g., superscalar RISC, VLIW & EDGE) benefit from instruction level parallelism. • Decreasing critical path length improves loop performance. • Grouping constants improves constant propagation.
Example • Let M denote intermediate values we need to preserve. • Let I denote associative operations whose intermediate values we do not need to preserve.
Example • What should we do to balance this tree?
Baer & Bovet: Balance Subtree Approach • Given a tree of associative and commutative operators, and other operators • Rearrange the tree to make it more balanced • Caveats • Preserve intermediate values in the expression tree that are used elsewhere • Preserve subtrees rooted by non-associative operations
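The two caveats can be illustrated with a small Python sketch of the first step of any rebalancing: collecting the operands that are free to be rearranged. This is only an illustration under stated assumptions, not Baer and Bovet's algorithm itself. Trees are nested tuples `(op, left, right)`, the set `ASSOCIATIVE` and the function `collect_operands` are hypothetical names, and preserved nodes are identified by structural equality for simplicity.

```python
ASSOCIATIVE = {"+", "*"}  # assumed set of associative and commutative operators

def collect_operands(tree, op, preserved=frozenset()):
    """Flatten a tree of `op` into a list of operands that may be reordered.

    Flattening stops at leaves, at subtrees rooted by a different operator,
    and at preserved nodes, so those subtrees are kept intact.
    """
    is_same_op = isinstance(tree, tuple) and tree[0] == op and op in ASSOCIATIVE
    if is_same_op and tree not in preserved:
        return (collect_operands(tree[1], op, preserved)
                + collect_operands(tree[2], op, preserved))
    return [tree]

tree = ("+", ("+", ("+", "a", "b"), ("-", "c", "d")), "e")

# Nothing preserved: a + b is dissolved, c - d stays whole.
print(collect_operands(tree, "+"))
# a + b is used elsewhere, so it is kept as a single operand.
print(collect_operands(tree, "+", preserved={("+", "a", "b")}))
```

The resulting operand list is what a balancing step would then recombine into a new tree.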
Problem - unbalanced • Although each preserved node has a balanced sub-tree, the whole tree isn’t very balanced. • Note that preserved nodes with many leaves can be closer to the root.
Solution – Huffman Coding • Give constants weight 0 • Give other leaves weight 1 • Weigh each preserved node by summing the weights of the leaves in its subtree • Put all of these operands in a worklist sorted by weight • Until the worklist is a singleton: • Take the two lowest-weight nodes out of the worklist • Combine them in a subtree • Weigh this new interior node by summing its leaves, and insert it in the worklist • Guarantees a tree that is optimally balanced with respect to these weights
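The worklist steps above can be sketched in Python with the standard `heapq` module. This is a minimal sketch under assumptions, not the implementation from the paper: trees are nested tuples `(op, left, right)`, the function name `huffman_balance` is hypothetical, and the caller supplies each operand with its weight (0 for constants, 1 for other leaves, the leaf count for a preserved subtree).

```python
import heapq
from itertools import count

def huffman_balance(op, operands):
    """Combine (weight, subtree) operands Huffman-style into a single tree."""
    tie = count()  # tie-breaker so equal weights never compare the subtrees
    worklist = [(weight, next(tie), subtree) for weight, subtree in operands]
    heapq.heapify(worklist)
    while len(worklist) > 1:
        # Take the two lowest-weight nodes and combine them in a subtree.
        w1, _, t1 = heapq.heappop(worklist)
        w2, _, t2 = heapq.heappop(worklist)
        # The new interior node weighs the sum of its leaves.
        heapq.heappush(worklist, (w1 + w2, next(tie), (op, t1, t2)))
    return worklist[0][2]

# Two constants, three variables, and a preserved subtree "M" that is
# assumed (for illustration) to contain four leaves.
operands = [(0, 2), (0, 3), (1, "a"), (1, "b"), (1, "c"), (4, "M")]
print(huffman_balance("+", operands))
```

In this example the two constants are combined first, so they end up grouped in one subtree, and the heavy preserved node `M` is combined last, which places it directly under the root.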
Results • Mixed • Improves a few programs by a lot, but does not improve many programs, on the TRIPS simulator • Huffman minimizes the weighted sum over the tree (each leaf's weight times its depth) • Baer and Bovet minimize the length of the critical path • In practice, they often attain the same result for expression reduction • For software fanout trees, Huffman seems to tolerate unknown latencies through the program better than Hartley and Casavant, which minimizes the length of the critical path given non-unit weights
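The two objectives can be compared on a small example. The sketch below is an illustration only: it assumes nested-tuple trees, a hypothetical `weight` table, unit latency per operation, and it reads a leaf's weight as the time at which that operand becomes available when computing the critical path.

```python
def leaf_depths(tree, depth=0):
    """Yield (leaf, depth) pairs for a nested (op, left, right) tuple tree."""
    if isinstance(tree, tuple):
        yield from leaf_depths(tree[1], depth + 1)
        yield from leaf_depths(tree[2], depth + 1)
    else:
        yield tree, depth

def weighted_path_length(tree, weight):
    """Sum of leaf weight times leaf depth: the quantity Huffman minimizes."""
    return sum(weight[leaf] * d for leaf, d in leaf_depths(tree))

def critical_path(tree, weight):
    """Latest finish time, assuming a leaf is ready at time weight[leaf]."""
    return max(weight[leaf] + d for leaf, d in leaf_depths(tree))

weight = {"a": 1, "b": 1, "c": 1, "M": 4}
left_deep = ("+", ("+", ("+", "M", "a"), "b"), "c")
huffman = ("+", ("+", ("+", "a", "b"), "c"), "M")

print(weighted_path_length(left_deep, weight), critical_path(left_deep, weight))  # 18 7
print(weighted_path_length(huffman, weight), critical_path(huffman, weight))      # 12 5
```

Here the Huffman-shaped tree is better on both measures, consistent with the observation that the two approaches often reach the same result.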
Summary • Reorganize trees of commutative and associative operations • Use Huffman coding to produce an overall balanced tree • Increase ILP • Decrease critical path length • Group constants
Next Time • P. Briggs, Register Allocation via Graph Coloring, PhD dissertation, Rice University, April 1992, Chapters 1, 2, 3, 6, 7, 8 & 9 • Skim and/or cherry pick depending on your interests