Evaluation of Offset Assignment Heuristics

Evaluation of Offset Assignment Heuristics. Johnny Huynh, Jose Nelson Amaral, Paul Berube University of Alberta, Canada Sid-Ahmed-Ali Touati Universite de Versailles, France. Outline. Background Traditional Approach to Offset Assignment Simple Offset Assignment

Presentation Transcript


  1. Evaluation of Offset Assignment Heuristics Johnny Huynh, Jose Nelson Amaral, Paul Berube University of Alberta, Canada Sid-Ahmed-Ali Touati Universite de Versailles, France

  2. Outline • Background • Traditional Approach to Offset Assignment • Simple Offset Assignment • Address-Register Assignment • Improving the Problem Model • Optimal Address-Code Generation • Memory Layout Permutations • Evaluating Current Heuristics • Methodology • Results • Conclusions and Future Work

  4. Background • Digital Signal Processors (DSPs) have few general purpose registers • Program variables kept in memory • Address Registers (AR) used to access variables • After a variable is accessed, the AR can be auto-incremented (or decremented) by one word in the same cycle.

  5. Processor Model • Texas Instruments TMS320C54X DSP family: • Accumulator-based DSP • 8 Address Registers • Initializing an address register requires 2 cycles of overhead • Explicit address computations require 1 cycle of overhead • Using auto-increment (or auto-decrement) has no overhead.

  6. Processor Model Example: add ‘A’ and ‘B’, store in accumulator • Explicit address computation (A at 0x1000, B at 0x1002): $AR0 = &A; $ACC = *$AR0; $AR0 = $AR0 + 2; $ACC += *$AR0 • Auto-increment (A at 0x1000, B at 0x1001): $AR0 = &A; $ACC = *$AR0++; $ACC += *$AR0
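The cycle counts from the processor model can be tallied with a small helper. This is a minimal sketch, assuming a single address register and the costs listed on the Processor Model slide (2 cycles to initialize, 1 cycle per explicit address computation, 0 for auto-increment or auto-decrement). The function name `single_register_overhead` is hypothetical.

```python
INIT_COST = 2      # cycles to initialize an address register (from the slide)
EXPLICIT_COST = 1  # cycles for an explicit address computation (from the slide)


def single_register_overhead(addresses):
    """Overhead cycles when one address register visits `addresses` in order."""
    if not addresses:
        return 0
    cost = INIT_COST
    for prev, cur in zip(addresses, addresses[1:]):
        # A step of at most one word is covered by auto-increment/decrement.
        if abs(cur - prev) > 1:
            cost += EXPLICIT_COST
    return cost


# B two words away from A: initialization plus one explicit computation.
explicit = single_register_overhead([0x1000, 0x1002])
# B adjacent to A: initialization only.
auto_inc = single_register_overhead([0x1000, 0x1001])
```

Under these assumptions the adjacent placement saves one cycle of overhead, which is the effect the two code sequences on the slide illustrate.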

  8. The Offset-Assignment Problem • Given k address registers and a basic block accessing n variables, find a memory layout that minimizes address-computation overhead. • How should the variables be placed in memory? • Which register should access each variable?

  9. Outline • Background • Traditional Approach to Offset Assignment • Simple Offset Assignment • Address-Register Assignment • Improving the Problem Model • Optimal Address-Code Generation • Memory Layout Permutations • Evaluating Current Heuristics • Methodology • Results • Conclusions and Future Work

  10. Traditional Approach to Offset Assignment [Flow diagram: Basic Block -> Generate Access Sequence -> Access Sequence -> Address Register Assignment -> Sub-Sequences -> Simple Offset Assignment (one per sub-sequence) -> Sub-Layouts -> Address-Code Generation -> Address-Computation Overhead]

  11. Traditional Approach: Simple Offset Assignment (SOA) • In 1992, Bartley introduced the simplest form of the offset assignment problem: given a single address register and a basic block with n variables, find a memory layout that minimizes overhead. • Equivalent to finding a maximum weight path cover (NP-complete) • Many researchers have proposed heuristics for this problem: • Liao et al. (1996) • Leupers and Marwedel (1996) • Sugino et al. (1996)

  12. Simple Offset Assignment (SOA) • Fix the access sequence • Assume only one address register (k = 1) • Find an ordering of variables in memory (memory layout) that has minimum overhead. Ex. Access Sequence: ‘a d b e c f b e c f a d’ [Figure: access graph over variables A to F with four edges of weight 2, and the resulting memory layout]
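For very small inputs the SOA problem stated above can be solved by exhaustive search. The sketch below is not one of the heuristics discussed in the talk; it assumes that overhead is counted as the number of transitions between variables that are more than one word apart, and it ignores the constant cost of initializing the single register. The names `soa_overhead` and `best_layout` are hypothetical.

```python
from itertools import permutations


def soa_overhead(accesses, layout):
    """Count transitions that auto-increment/decrement cannot cover."""
    pos = {var: i for i, var in enumerate(layout)}
    return sum(
        1
        for x, y in zip(accesses, accesses[1:])
        if abs(pos[x] - pos[y]) > 1
    )


def best_layout(accesses):
    """Try every ordering of the variables; feasible only for small n."""
    variables = sorted(set(accesses))
    return min(permutations(variables), key=lambda lay: soa_overhead(accesses, lay))


sequence = "adbecfbecfad"  # the example access sequence from the slide
layout = best_layout(sequence)
cost = soa_overhead(sequence, layout)
```

The search visits n! layouts, so it only serves as a reference point for tiny examples.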

  13. Simple Offset Assignment (SOA) • Create Access Graph G = (V, E) • V = variables • The weight of an edge is the number of times its two variables are accessed consecutively • A path defines a memory layout, so find the Maximum Weight Path Cover • NP-Complete! Ex. Access Sequence: ‘a d b e c f b e c f a d’ [Figure: access graph over variables A to F with four edges of weight 2, and the resulting memory layout]
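The access graph and a greedy path cover can be sketched as follows. This is an illustrative heuristic that picks edges in order of decreasing weight; it is written in the spirit of the greedy SOA heuristics named earlier but has not been checked against any published algorithm. The names `access_graph` and `greedy_path_cover` are hypothetical.

```python
from collections import Counter


def access_graph(accesses):
    """Edge weight = number of consecutive accesses of the two variables."""
    graph = Counter()
    for x, y in zip(accesses, accesses[1:]):
        if x != y:
            graph[frozenset((x, y))] += 1
    return graph


def greedy_path_cover(accesses):
    """Pick heavy edges first; keep degree <= 2 and avoid cycles."""
    graph = access_graph(accesses)
    degree = Counter()
    parent = {v: v for v in set(accesses)}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    # Ties are broken alphabetically so the result is deterministic.
    for edge, weight in sorted(graph.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        x, y = sorted(edge)
        if degree[x] < 2 and degree[y] < 2 and find(x) != find(y):
            chosen.append((x, y, weight))
            degree[x] += 1
            degree[y] += 1
            parent[find(x)] = find(y)
    return chosen


cover = greedy_path_cover("adbecfbecfad")
covered_weight = sum(w for _, _, w in cover)
```

Each chosen edge places two variables next to each other in memory, so the covered weight is the number of transitions that need no explicit address computation.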

  15. Traditional Approach: General Offset Assignment (GOA) • Problem presented by Liao et al. in 1996. • Given k address registers and a basic block with n variables, find an assignment of variables to address registers that minimizes the total overhead of all registers. • This problem formulation is more accurately described as Address-Register Assignment (ARA). • It contains SOA instances as sub-problems, so it is at least NP-hard. • Many researchers have proposed heuristics for address-register assignment: • Leupers and Marwedel (1996) • Sugino et al. (1996) • Zhuang et al. (2003)

  16. General Offset Assignment (GOA) • Fix the access sequence • Allow multiple address registers (k > 1) • Find an ordering of variables in memory (memory layout) that has minimum overhead. • Assign each variable to an address register to form access sub-sequences. Ex. Access Sequence: ‘a d b e c f b e c f a d’ Sub-sequence 1: ‘a b c b c a’ Sub-sequence 2: ‘d e f e f d’ [Figure: access graph over variables A to F]
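Forming the sub-sequences from a variable-to-register assignment is a simple filter. The sketch below assumes the assignment is given as a dictionary; the name `split_by_register` and the register labels 0 and 1 are hypothetical.

```python
def split_by_register(accesses, register_of):
    """Group accesses by the register their variable is assigned to."""
    sub_sequences = {}
    for var in accesses:
        sub_sequences.setdefault(register_of[var], []).append(var)
    return {reg: "".join(seq) for reg, seq in sub_sequences.items()}


# Assignment matching the example: {a, b, c} on one register, {d, e, f} on the other.
assignment = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
subs = split_by_register("adbecfbecfad", assignment)
```

The order of accesses within each sub-sequence is preserved, which is what lets each one be treated as its own SOA instance.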

  17. General Offset Assignment (GOA) • Each sub-sequence can be viewed as an independent SOA problem. • Solve each sub-sequence as an independent SOA problem. • More appropriate to call this problem the Address Register Assignment (ARA) problem. • Requires solving SOA instances, so is at least NP-hard. Ex. Access Sequence: ‘a d b e c f b e c f a d’ Sub-sequence 1: ‘a b c b c a’ Sub-sequence 2: ‘d e f e f d’ [Figure: the two resulting memory sub-layouts]

  19. Address-Code Generation • Recall that variables are assigned to address registers. • There is nothing left to decide: each address register has a defined sequence of accesses. • This imposes the restriction that all accesses to a variable are made by a single address register. Ex. Access Sequence: ‘a d b e c f b e c f a d’, stepped through one access at a time, with AR0 and AR1 each walking its own memory sub-layout. The final accesses to ‘a’ and ‘d’ require explicit address computations.
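The overhead under this restriction can be tallied per register. This is a minimal sketch, assuming the processor-model costs from earlier in the talk (2 cycles per register initialization, 1 cycle per explicit address computation) and treating each sub-layout independently. The name `restricted_overhead` is hypothetical.

```python
INIT_COST = 2      # cycles to initialize an address register
EXPLICIT_COST = 1  # cycles for an explicit address computation


def restricted_overhead(accesses, sub_layouts):
    """Total overhead when each sub-layout is served by its own register."""
    total = 0
    for layout in sub_layouts:
        pos = {var: i for i, var in enumerate(layout)}
        # Accesses to this register's variables, in program order.
        sub = [var for var in accesses if var in pos]
        if not sub:
            continue
        total += INIT_COST
        total += EXPLICIT_COST * sum(
            1 for x, y in zip(sub, sub[1:]) if abs(pos[x] - pos[y]) > 1
        )
    return total


overhead = restricted_overhead("adbecfbecfad", ["abc", "def"])
```

For the example, each register pays its initialization plus one explicit computation for the jump back to the first variable of its sub-layout.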

  27. Traditional Approach to Offset Assignment [Example flow: access sequence ‘a d b e c f b e c f a d’ -> Address Register Assignment -> sub-sequences ‘a b c b c a’ and ‘d e f e f d’ -> Simple Offset Assignment on each -> sub-layouts [a, b, c] and [d, e, f]. Each sub-sequence and its sub-layout is accessed by its own address register (AR0, AR1).]

  28. Outline • Background • Traditional Approach to Offset Assignment • Simple Offset Assignment • Address-Register Assignment • Improving the Problem Model • Optimal Address-Code Generation • Memory Layout Permutations • Evaluating Current Heuristics • Methodology • Results • Conclusions and Future Work

  29. Optimal Address-Code Generation • Given a fixed access sequence and memory layout, it is possible to generate optimal addressing code in polynomial time: • Minimum-Cost Circulation (Gebotys, 1997) • Minimum-Weight Perfect Matching (Udayanarayanan, 2000)

  30. Optimal Address-Code Generation • Build a network-flow graph • Vertices represent variable accesses • For each access ai that occurs before another aj, there is an edge (ai, aj) (not all are shown in the graph). • Edges represent an opportunity for a register to access variables. • Each unit of flow represents the accesses performed by an address register. • Optimal address code is found by finding a minimum-cost circulation. [Figure: flow network with source S and sink T for the access sequence a1 = A, a2 = D, a3 = B, a4 = E, a5 = C, a6 = F, a7 = B, a8 = E, a9 = C, a10 = F, a11 = A, a12 = D and the memory layout B C A D E F, with registers AR1 and AR2. Annotations: all vertices require one unit of flow; outbound edges from S have cost = 0; inbound edges to T have cost = 0; edge costs between accesses depend on the distance between the variables accessed; one edge is labeled capacity = number of ARs, cost = initialization overhead.]
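For small inputs, the minimum overhead for a fixed layout can also be found by dynamic programming over register positions. The sketch below is a swapped-in technique for illustration, not the minimum-cost circulation formulation described above. It assumes the processor-model costs from earlier in the talk (2 cycles per initialization, 1 cycle per explicit computation, 0 for a step of at most one word). The name `optimal_overhead` is hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

INIT_COST = 2      # cycles to initialize an address register
EXPLICIT_COST = 1  # cycles for an explicit address computation

State = Tuple[Optional[int], ...]  # position of each register, None if unused


def optimal_overhead(accesses: Sequence[str], layout: Sequence[str], k: int) -> int:
    """Minimum overhead with k registers, any register may access any variable."""
    pos = {var: i for i, var in enumerate(layout)}
    best: Dict[State, int] = {(None,) * k: 0}
    for var in accesses:
        target = pos[var]
        nxt: Dict[State, int] = {}
        for state, cost in best.items():
            for reg, cur in enumerate(state):
                if cur is None:
                    step = INIT_COST
                elif abs(cur - target) <= 1:
                    step = 0  # covered by auto-increment/decrement
                else:
                    step = EXPLICIT_COST
                new_state = state[:reg] + (target,) + state[reg + 1:]
                if cost + step < nxt.get(new_state, float("inf")):
                    nxt[new_state] = cost + step
        best = nxt
    return min(best.values())


sequence = "adbecfbecfad"
cost_figure_layout = optimal_overhead(sequence, "bcadef", 2)
cost_concatenated = optimal_overhead(sequence, "abcdef", 2)
```

The number of states grows as (n + 1)^k, so this only serves as a check on tiny examples; the flow-based methods cited above are the polynomial-time route.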

  31. Traditional Approach to Offset Assignment [Flow diagram with annotations: Access Sequence -> Address Register Assignment (NP-Hard) -> Sub-Sequences -> Simple Offset Assignment, one per sub-sequence (NP-Complete) -> Sub-Layouts -> Address-Code Generation (solved, but not used!) -> Address-Computation Overhead]

  32. Memory Layout Permutations (MLP) • Since optimal address-code generation algorithms exist, they can be applied after a memory layout is formed (by traditional approaches). • However, the traditional approach generates multiple sub-layouts that were originally assumed to be independent. • How is a single memory layout formed from a set of sub-layouts?

  33. Memory Layout Permutations • Let M_i be a memory sub-layout. • Let M_i^r be the reciprocal of M_i (the same sub-layout in reverse order). • Given an access sequence and m memory sub-layouts, arrange {(M_1 | M_1^r), …, (M_m | M_m^r)}, choosing either M_i or M_i^r for each i, such that overhead is minimum when the sub-layouts are placed contiguously in memory.

  34. Memory Layout Permutations Example: access sequence ‘a d b e c f b e c f a d’ -> Address Register Assignment (this is an optimal address register assignment) -> sub-sequences ‘a b c b c a’ and ‘d e f e f d’ -> Simple Offset Assignment on each (these are optimal simple offset assignments) -> sub-layouts {a, b, c} and {d, e, f} -> Memory Layout Permutations: [a, b, c, d, e, f], [f, e, d, c, b, a], [c, b, a, d, e, f], [f, e, d, a, b, c], [a, b, c, f, e, d], [d, e, f, c, b, a], [c, b, a, f, e, d], [d, e, f, a, b, c] • All possible memory layout permutations have cost > 4. • The optimal layout {b, c, a, d, e, f}, with cost = 4, is not found.
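The candidate layouts in this example can be enumerated mechanically: every ordering of the sub-layouts, with each sub-layout placed either forward or reversed. This is a minimal sketch; the name `memory_layout_permutations` is hypothetical.

```python
from itertools import permutations, product


def memory_layout_permutations(sub_layouts):
    """All contiguous arrangements of the sub-layouts, each forward or reversed."""
    layouts = []
    for order in permutations(sub_layouts):
        for flips in product((False, True), repeat=len(order)):
            layout = []
            for sub, flip in zip(order, flips):
                layout.extend(reversed(sub) if flip else sub)
            layouts.append(tuple(layout))
    return layouts


candidates = memory_layout_permutations([("a", "b", "c"), ("d", "e", "f")])
```

With m sub-layouts this produces m! * 2^m candidates, and none of them interleaves variables from different sub-layouts, which is why a layout such as {b, c, a, d, e, f} cannot appear.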

  35. Outline • Background • Traditional Approach to Offset Assignment • Simple Offset Assignment • Address-Register Assignment • Improving the Problem Model • Optimal Address-Code Generation • Memory Layout Permutations • Evaluating Current Heuristics • Methodology • Results • Conclusions and Future Work

  36. Experimental Methodology: Evaluating the Solution Space • Testcases are DSP code kernels from the UTDSP benchmark suite. • Use gcc to obtain access sequences. • The quality of a memory layout is evaluated using the minimum-cost circulation technique. • The entire solution space is found for each access sequence, to be used as a point of reference. [Flow diagram: Basic Block -> Compile with gcc -> Access Sequence -> Compute Overhead of All Layouts using Minimum-Cost Flow]

  37. Experimental Methodology: Evaluating Current Heuristics • Identified and implemented three Address-Register Assignment heuristic algorithms: • Leupers • Sugino • Zhuang [Flow diagram: Access Sequence -> ARA (Leupers, Sugino, Zhuang) -> Sub-Sequences -> SOA (Liao, Leupers, ALOMA, OFU, B&B) -> Sub-Layouts -> Memory Layout Permutations -> Memory Layouts -> Compute Overhead for each layout via Minimum-Cost Circulation -> Distribution of Overhead values]

  38. Experimental Methodology: Evaluating Current Heuristics • Identified and implemented five Simple Offset Assignment heuristic algorithms: • Liao • Leupers • ALOMA • Order-First Use (OFU) • Branch and Bound (B&B) [Flow diagram: Access Sequence -> ARA (Leupers, Sugino, Zhuang) -> Sub-Sequences -> SOA (Liao, Leupers, ALOMA, OFU, B&B) -> Sub-Layouts -> Memory Layout Permutations -> Memory Layouts -> Compute Overhead for each layout via Minimum-Cost Circulation -> Distribution of Overhead values]

  39. Experimental Methodology: Evaluating Current Heuristics • Each combination of ARA and SOA algorithm generates a set of sub-layouts. • All possible memory layout permutations are generated, forming a set of memory layouts. • Each memory layout is evaluated using the Minimum-Cost Circulation technique. [Flow diagram: Access Sequence -> ARA (Leupers, Sugino, Zhuang) -> Sub-Sequences -> SOA (Liao, Leupers, ALOMA, OFU, B&B) -> Sub-Layouts -> Memory Layout Permutations -> Memory Layouts -> Compute Overhead for each layout via Minimum-Cost Circulation -> Distribution of Overhead values]

  40. Results • The 15 combinations of algorithms produce 15 distributions of overhead values. • The distributions are aggregated into one distribution. • The aggregate distribution represents the solution space of all current algorithms.
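The bookkeeping behind the aggregation can be sketched as follows. The algorithm names come from the methodology slides; the function name `aggregate` is hypothetical, and the two small distributions passed to it are made-up placeholder values used only to show the mechanics, not measured results.

```python
from collections import Counter
from itertools import product

ARA_ALGORITHMS = ["Leupers", "Sugino", "Zhuang"]
SOA_ALGORITHMS = ["Liao", "Leupers", "ALOMA", "OFU", "B&B"]

# Every (ARA, SOA) pairing yields one distribution of overhead values.
combinations = list(product(ARA_ALGORITHMS, SOA_ALGORITHMS))


def aggregate(distributions):
    """Merge per-combination histograms (overhead value -> layout count)."""
    total = Counter()
    for dist in distributions:
        total.update(dist)
    return total


# Placeholder histograms, NOT data from the study.
toy = aggregate([Counter({4: 1, 5: 2}), Counter({5: 1})])
```

Three ARA heuristics paired with five SOA heuristics give the 15 combinations mentioned on the slide.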

  41. Results • Memory layouts have a significant impact on overhead. • Some layouts have 100% higher overhead than the minimum. • Over 99% of all layouts have an overhead that is 50% higher than the minimum.

  42. Results • Memory layouts produced by traditional approaches have a large range of possible overhead values -- sometimes the same as the entire solution space itself. • In some cases, no combination of ARA and SOA heuristics can produce an optimal layout.

  44. Distribution of Overhead Values. Testcase: iir_arr_swp (infinite impulse response filter)

  45. Exhaustive Solution Space. Testcase: iir_arr_swp (infinite impulse response filter)

  46. Algorithmic Solution Space. Testcase: iir_arr_swp (infinite impulse response filter)

  47. Efficiency of SOA Algorithms • For each SOA algorithm, combine it with each of the 3 ARA algorithms to generate 3 distributions of overhead values. • The distributions can be aggregated to form a single distribution. [Flow diagram: Access Sequence -> ARA (Leupers, Sugino, Zhuang) -> Sub-Sequences -> SOA (Liao, Leupers, ALOMA, OFU, B&B) -> Sub-Layouts -> Memory Layout Permutations -> Memory Layouts -> Compute Overhead for each layout via Minimum-Cost Circulation -> Distribution of Overhead values]
