
A Graph Based Algorithm for Data Path Optimization in Custom Processors

J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski, Center for Embedded Computer Systems, University of California, Irvine



Presentation Transcript


  1. A Graph Based Algorithm for Data Path Optimization in Custom Processors J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski Center for Embedded Computer Systems University of California, Irvine

  2. Outline • Introduction • Design Methodology • Initial Allocation • Architecture Wizard • Critical Path Extraction • Spill Algorithm • Results • Conclusion

  3. Introduction (1 of 2) • Rising SoC complexity • Short time to market • Need for processors specialized for different application domains • General-purpose processors • Often slow and power-hungry • Full HW design • Expensive, and rigid for debugging and feature extension • Custom processor • Adapts the data path to a given application • Need for automatic generation of application-specific architectures

  4. Introduction (2 of 2) • Previous work in High-Level Synthesis • Integer linear programming [Landwehr et al.] • Force-directed scheduling [Paulin and Knight] • Finding minimal cliques [Tseng and Siewiorek] • Branch-and-bound [Marwedel] • The proposed methodology separates allocation from scheduling and binding

  5. Design Methodology • Define the application’s maximum requirements • ALAP schedule • Initial Allocation chooses components from the Component DB (CDB) • Selects as many units as the ALAP schedule needs • Architecture Wizard (AW) analyzes component utilization • Based on the schedule and profiling data • Optimized architecture • Using the design constraints
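To make the "maximum requirements" step concrete, here is a minimal Python sketch, assuming unit-latency operations inside one basic block given as a dependence DAG. It computes an ALAP schedule and then counts, per FU type, the largest number of operations that land in the same cycle, which is the number of units an allocation needs to run that ALAP schedule. The names `alap_schedule` and `max_requirements` and the toy data are hypothetical; the actual tool works from its own representation and component database.

```python
from collections import Counter
from typing import Dict, List


def alap_schedule(op_types: Dict[str, str], succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Unit-latency ALAP schedule of a DAG: each op is placed as late as its successors allow."""
    height: Dict[str, int] = {}  # length of the longest chain of successors below each op

    def h(op: str) -> int:
        if op not in height:
            height[op] = 1 + max((h(s) for s in succs.get(op, [])), default=-1)
        return height[op]

    for op in op_types:
        h(op)
    length = 1 + max(height.values())  # schedule length in cycles
    return {op: length - 1 - height[op] for op in op_types}


def max_requirements(op_types: Dict[str, str], schedule: Dict[str, int]) -> Dict[str, int]:
    """Largest number of same-type ops sharing a cycle, i.e. units needed for this schedule."""
    per_cycle = Counter((schedule[op], t) for op, t in op_types.items())
    req: Dict[str, int] = {}
    for (_, t), count in per_cycle.items():
        req[t] = max(req.get(t, 0), count)
    return req


# Toy block: (x*y + u*v) + 1, plus an independent w + z.
op_types = {"m1": "mul", "m2": "mul", "a1": "alu", "a2": "alu", "a3": "alu"}
succs = {"m1": ["a1"], "m2": ["a1"], "a1": ["a2"]}

sched = alap_schedule(op_types, succs)
print(sched)                              # {'m1': 0, 'm2': 0, 'a1': 1, 'a2': 2, 'a3': 2}
print(max_requirements(op_types, sched))  # {'mul': 2, 'alu': 2}
```

Pushing every operation as late as possible tends to pile independent work (here `a2` and `a3`) into the same cycles, which is why the result is a generous upper bound that the later phases then trim.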

  6. Initial Allocation and Component Selection • Define the maximum requirement • Based on statistics for operators and data transfers • Find “the best fit” in the CDB for the given requirements • Storage (RF and memory): • Minimum difference in number of ports • Functional units: • The most general unit that executes the given operation • Buses: • Source buses: N if N is even, N+1 if N is odd, where N = # of RF output ports • Destination buses = # of RF input ports
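The storage "best fit" and the bus-count rule can be sketched as follows. This assumes a hypothetical CDB of register files described only by output (read) and input (write) port counts, and interprets "minimum difference in number of ports" as the smallest total surplus among register files that provide at least the required ports; the slide does not spell out the exact metric. `RegFile`, `best_fit_rf` and `bus_counts` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RegFile:
    name: str
    out_ports: int  # read ports, feeding the source buses
    in_ports: int   # write ports, fed by the destination buses


def best_fit_rf(cdb: List[RegFile], need_out: int, need_in: int) -> RegFile:
    """Pick the RF with the fewest surplus ports among those that meet the requirement."""
    fits = [rf for rf in cdb if rf.out_ports >= need_out and rf.in_ports >= need_in]
    if not fits:
        raise ValueError("no register file in the CDB provides enough ports")
    return min(fits, key=lambda rf: (rf.out_ports - need_out) + (rf.in_ports - need_in))


def bus_counts(rf: RegFile) -> Tuple[int, int]:
    """Source buses: N if N is even, N + 1 if odd (N = RF output ports); destination = RF input ports."""
    n = rf.out_ports
    source = n if n % 2 == 0 else n + 1
    return source, rf.in_ports


# Hypothetical CDB entries, named <read ports>r<write ports>w.
cdb = [RegFile("rf_2r1w", 2, 1), RegFile("rf_3r2w", 3, 2),
       RegFile("rf_4r2w", 4, 2), RegFile("rf_8r4w", 8, 4)]
rf = best_fit_rf(cdb, need_out=3, need_in=2)
print(rf.name, bus_counts(rf))  # rf_3r2w (4, 2)
```

With three RF output ports the rule rounds the source buses up to four; the slide gives the rule but not its rationale.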

  7. Architecture Wizard: Overview • Goal of Phase II • Reduce the number of resources used • Under performance and utilization constraints • Inputs: • Schedule for the maximum configuration • Execution frequencies (profiler) • Utilization and performance constraints (designer) • Component Database (CDB) • Outputs: • Architecture netlist • Report

  8. Architecture Wizard: Tool Flow • Histograms for • Each functional unit type • Each group of in/out ports of a storage unit • For the basic blocks (BBs) on the critical path, for each histogram: • Vary the number of units • Estimate execution time and utilization • Allocate the data path when the constraints are satisfied • Use the same heuristics as for the initial allocation
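The Architecture Wizard loop (vary the unit count, estimate, accept once the constraints hold) might look like this for a single histogram. The estimator is passed in as a function. `naive_estimate` is a hypothetical stand-in that simply stretches every overloaded cycle and never moves operations into empty slots, so it is more pessimistic than the spill estimate the slides describe. Searching upward from one unit returns the smallest count that meets both bounds and otherwise falls back to the ALAP maximum; the 20% and 75% bounds in the example are the values listed on the Results slide.

```python
import math
from typing import Callable, Sequence, Tuple

# estimate(histogram, n_units) -> (delta, utilization)
Estimator = Callable[[Sequence[int], int], Tuple[float, float]]


def naive_estimate(hist: Sequence[int], n: int) -> Tuple[float, float]:
    """Hypothetical stand-in: a cycle needing d ops on n units takes ceil(d / n) cycles (at least 1)."""
    base = len(hist)
    cycles = sum(max(1, math.ceil(d / n)) for d in hist)
    delta = (cycles - base) / base          # relative extra execution time
    utilization = sum(hist) / (n * cycles)  # busy unit-cycles / available unit-cycles
    return delta, utilization


def choose_units(hist: Sequence[int], estimate: Estimator,
                 max_delta: float, min_util: float) -> int:
    """Smallest unit count meeting both constraints; otherwise keep the ALAP maximum."""
    n_max = max(hist)
    for n in range(1, n_max + 1):
        delta, util = estimate(hist, n)
        if delta <= max_delta and util >= min_util:
            return n
    return n_max


# Ops per cycle for one FU type in a basic block: one burst of 3, then 1 per cycle.
hist = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(choose_units(hist, naive_estimate, max_delta=0.20, min_util=0.75))  # 1
```

The single burst of three would force three units in the initial allocation, but one unit costs only two extra cycles out of twelve while keeping it fully busy.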

  9. Critical Path Extraction • Critical path: • The sequence of BBs from start to end that contributes the most to the execution time • Start with the graph of the application • Create a directed acyclic graph • Create the dual graph: • For each edge e_x, create a node E_x • For each node B_y, create (# inputs × # outputs) edges • Transform into a shortest-path problem • Compute weights as 1/w_i or W_max - w_i • Find the shortest path
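A sketch of the dual-graph transformation and shortest-path search follows. It assumes the control-flow graph has already been made acyclic (back edges removed), that each basic block has a positive weight w (for example, its estimated cycles times its execution frequency), and that Dijkstra's algorithm is used for the shortest path. Virtual entry and exit edges are added so the start and end blocks also turn into dual-graph edges. Function and parameter names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def critical_path(weights: Dict[str, float], edges: List[Edge], start: str, end: str,
                  reciprocal: bool = False) -> List[str]:
    """Heaviest start-to-end block sequence via a dual graph and Dijkstra (CFG must be acyclic)."""
    entry: Edge = ("<entry>", start)  # virtual edge into the start block
    exit_: Edge = (end, "<exit>")     # virtual edge out of the end block
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for e in edges + [entry, exit_]:
        outgoing[e[0]].append(e)
        incoming[e[1]].append(e)

    # Dual graph: each CFG edge is a node; each block B_y becomes |in| x |out| edges
    # carrying the transformed weight, so a heavy block becomes a cheap step.
    w_max = max(weights.values())
    dual: Dict[Edge, List[Tuple[float, Edge, str]]] = defaultdict(list)
    for block, w in weights.items():
        cost = 1.0 / w if reciprocal else w_max - w
        for a in incoming[block]:
            for b in outgoing[block]:
                dual[a].append((cost, b, block))

    # Standard Dijkstra from the entry edge to the exit edge.
    dist: Dict[Edge, float] = {entry: 0.0}
    prev: Dict[Edge, Tuple[Edge, str]] = {}
    heap = [(0.0, entry)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == exit_:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for cost, nxt, block in dual[node]:
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost
                prev[nxt] = (node, block)
                heapq.heappush(heap, (d + cost, nxt))

    # Walk back from the exit edge, collecting the blocks passed through.
    path, node = [], exit_
    while node != entry:
        node, block = prev[node]
        path.append(block)
    return path[::-1]


weights = {"B0": 10, "B1": 50, "B2": 20, "B3": 10}  # e.g. cycles x execution frequency
edges = [("B0", "B1"), ("B0", "B2"), ("B1", "B3"), ("B2", "B3")]
print(critical_path(weights, edges, "B0", "B3"))                   # ['B0', 'B1', 'B3']
print(critical_path(weights, edges, "B0", "B3", reciprocal=True))  # ['B0', 'B1', 'B3']
```

The W_max - w_i weighting reproduces the truly heaviest path only when every start-to-end path visits the same number of blocks, since a path of k blocks costs k·W_max - Σw; the toy CFG satisfies that condition.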

  10. “Spill” - Flattening Algorithm • Utilization profile for each • FU type and in/out port group of a storage unit • Type and number of instances of the other components are unchanged • For the chosen number of FUs • Estimate extra cycles (Δ) by postponing operations into empty slots • Maximize component utilization • Utilization = Σ used FUs / (chosen # of FUs × execution time) • Compute global Δ and utilization • Per-block estimation • Execution frequencies
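The spill estimate and the frequency-weighted global figures can be sketched as below. Assumptions: a single FU type, per-block histograms of operations per cycle (for example, taken from the ALAP schedule), and postponed operations are simply carried into the next cycle that has a free unit, ignoring data dependences; operations still pending at the end of a block cost ceil(carry / n) extra cycles. Utilization follows the slide's formula, Σ used FUs / (n × execution time). The names `spill` and `global_estimate` and the profiler counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def spill(hist: List[int], n: int) -> Tuple[int, int]:
    """Flatten one block's profile onto n units; return (extra cycles, operations executed)."""
    carry = 0  # operations postponed from earlier cycles
    for demand in hist:
        carry = max(0, carry + demand - n)  # n ops issue, the rest spill to the next slot
    extra = math.ceil(carry / n)            # leftovers need extra cycles after the block
    return extra, sum(hist)


def global_estimate(blocks: Dict[str, List[int]], freqs: Dict[str, int],
                    n: int) -> Tuple[float, float]:
    """Frequency-weighted global delta and utilization for n units of one FU type."""
    base = cycles = ops = 0
    for name, hist in blocks.items():
        extra, used = spill(hist, n)
        f = freqs[name]
        base += f * len(hist)
        cycles += f * (len(hist) + extra)
        ops += f * used
    delta = (cycles - base) / base
    utilization = ops / (n * cycles)  # sum of used FUs / (chosen # x execution time)
    return delta, utilization


blocks = {"loop": [2, 0, 2, 0], "exit": [3]}
freqs = {"loop": 100, "exit": 1}  # hypothetical profiler counts
for n in (1, 2, 3):
    d, u = global_estimate(blocks, freqs, n)
    print(f"{n} unit(s): delta = {d:.1%}, utilization = {u:.1%}")
```

In this toy data the exit block needs three units in one cycle, yet with a single unit the global Δ is about 0.5% and utilization is 100%, because that block runs once while the loop body runs 100 times.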

  11. Results • Applications: bdist2 (MPEG2 encoder), OnesCounter, Sort (bubble sort), dct32 (MP3) • Δ = 20%, utilization = 75%

  12. Conclusion • Automatic generation of the data path • Separates allocation from scheduling and binding • Initial Allocation: creates a dense architecture • Architecture Wizard: refines the architecture for the given constraints • Future work and issues • Reduce area • Reduce FU complexity • Further reduce interconnect • Features • Pipelining, chaining, forwarding, special function units

  13. Thank You!
