A Graph Based Algorithm for Data Path Optimization in Custom Processors

J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski • Center for Embedded Computer Systems • University of California, Irvine

Outline • Introduction • Design Methodology • Initial Allocation • Architecture Wizard • Critical Path Extraction • Spill Algorithm • Results • Conclusion

Introduction (1 of 2) • Complexity of SoC is rising • Short time to market • Need for processors specialized for different application domains • General purpose processors • Often slow and power hungry • Full HW design • Expensive and rigid for debugging and feature extension • Custom processor • Adapt the data path to a given application • Need for automatic generation of application specific architectures

Introduction (2 of 2) • Previous work in High Level Synthesis • Integer linear programming [Landwehr et al.] • Force-directed scheduling [Paulin and Knight] • Finding minimal cliques [Tseng and Siewiorek] • Branch-and-bound [Marwedel] • Proposed methodology separates the allocation from scheduling and binding

Design Methodology • Define application’s maximum requirements • ALAP schedule • Initial Allocation chooses from Component DB (CDB) • Select as many units as needed for ALAP • Architecture Wizard (AW) analyzes component utilization • Based on the schedule and profiling data • Optimized Architecture • Using the design constraints
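
The "maximum requirements from an ALAP schedule" step can be illustrated with a minimal sketch. It assumes unit-latency operations and a dependency DAG given as a successor map; the function names, the operation names, and the "mul"/"add" types are hypothetical and not taken from the slides.

```python
from collections import Counter

def alap_schedule(succs, length=None):
    """As-late-as-possible step for each op, assuming unit latency (an assumption)."""
    level = {}  # op -> length of the longest chain from op to a sink, op included

    def depth(op):
        if op not in level:
            level[op] = 1 + max((depth(s) for s in succs.get(op, [])), default=0)
        return level[op]

    ops = set(succs) | {s for targets in succs.values() for s in targets}
    for op in ops:
        depth(op)
    total = max(level.values()) if length is None else length
    # An op with a chain of k ops (itself included) must start k steps before the end.
    return {op: total - level[op] for op in ops}

def max_units(schedule, op_type):
    """Largest number of same-type ops sharing one step: the initial unit count."""
    per_step = Counter((step, op_type[op]) for op, step in schedule.items())
    need = {}
    for (_, kind), count in per_step.items():
        need[kind] = max(need.get(kind, 0), count)
    return need

# Hypothetical example: a -> c, b -> c, c -> d, e -> d
succs = {"a": ["c"], "b": ["c"], "c": ["d"], "e": ["d"]}
op_type = {"a": "mul", "b": "mul", "c": "add", "d": "add", "e": "add"}
schedule = alap_schedule(succs)
units = max_units(schedule, op_type)
```

Here `e` is pushed to step 1, next to `c`, so the sketch allocates two adders and two multipliers as the starting point that a later phase could reduce.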

Initial Allocation and Component Selection • Define max requirement • Based on the statistics for operators and data transfer • Finding “the best fit” in CDB for given requirements • Storage (RF and Memory) • Min difference in number of ports • Functional units: • The most general unit executing given operation • Buses: • Source buses: • N, if N is even • (N+1), if N is odd • Where N = # of RF output ports • Destination buses = # of RF input ports
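
The bus-count rule and the "minimum port difference" fit can be sketched as below. The component database layout (a list of dicts with `name` and `ports` keys) and both function names are assumptions for illustration. The slide also does not say whether a candidate may have fewer ports than required, so this sketch simply minimizes the absolute difference.

```python
def source_buses(rf_output_ports):
    """N source buses if N is even, N + 1 if N is odd (rule from the slide)."""
    n = rf_output_ports
    return n if n % 2 == 0 else n + 1

def destination_buses(rf_input_ports):
    """One destination bus per register-file input port."""
    return rf_input_ports

def best_fit_storage(cdb, required_ports):
    """Pick the storage component whose port count is closest to the requirement.

    `cdb` is a hypothetical list of {"name": ..., "ports": ...} entries.
    """
    return min(cdb, key=lambda comp: abs(comp["ports"] - required_ports))

# Hypothetical component database entries
cdb = [
    {"name": "RF_2port", "ports": 2},
    {"name": "RF_4port", "ports": 4},
    {"name": "RF_8port", "ports": 8},
]
chosen = best_fit_storage(cdb, required_ports=5)
```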

Architecture Wizard - Overview • Goal of Phase II • Reducing number of used resources • Under performance and utilization constraints • Inputs: • Schedule for the Max Configuration • Execution frequencies (Profiler) • Utilization and performance constraints (Designer) • Component Data Base (CDB) • Outputs: • Architecture Net-List • Report

Architecture Wizard: Tool Flow • Histograms for • A functional unit type • Group of in/out ports of a storage unit • For the basic blocks (BB) in the critical path, for each histogram • Vary number of units • Estimate execution and utilization • Allocate data path • when constraints satisfied • Use the same heuristics as for the initial allocation

Critical Path Extraction • Critical Path: • A sequence of BBs from start to end that contributes the most to the execution time • Start with the graph of the application • Create a directed acyclic graph • Create the dual graph • ∀ edge e_x, create a node E_x • ∀ node B_y, create (# inputs × # outputs) edges • Transform to the shortest path problem • Compute weights as 1/w_i or W_max - w_i • Find the shortest path
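
The dual-graph construction and the shortest-path step can be sketched as follows, using the W_max - w_i weighting and Dijkstra's algorithm. This is a minimal illustration, not the authors' implementation: the virtual entry and exit edges, the function name, and the example weights are assumptions. Note that the transformed shortest path is a heuristic and need not equal the maximum-weight path when candidate paths contain different numbers of blocks.

```python
import heapq

def critical_path(weights, edges, start, end):
    """weights: block -> contribution to execution time; edges: (src, dst) pairs of a DAG."""
    w_max = max(weights.values())
    # Assumption: virtual entry/exit edges give the first and last block
    # one input and one output edge each.
    all_edges = [("__in__", start)] + list(edges) + [(end, "__out__")]
    incoming, outgoing = {}, {}
    for idx, (src, dst) in enumerate(all_edges):
        outgoing.setdefault(src, []).append(idx)
        incoming.setdefault(dst, []).append(idx)
    # Dual graph: every original edge becomes a node; every block contributes
    # (inputs x outputs) dual edges, each weighted W_max - w(block).
    dual = {idx: [] for idx in range(len(all_edges))}
    for block, w in weights.items():
        for i in incoming.get(block, []):
            for o in outgoing.get(block, []):
                dual[i].append((o, w_max - w, block))
    # Dijkstra from the entry edge to the exit edge (all weights are >= 0).
    source, target = 0, len(all_edges) - 1
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        if node == target:
            break
        for nxt, cost, block in dual[node]:
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, block)
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return []
    # Walk back through the dual nodes, collecting the block crossed at each step.
    path, node = [], target
    while node != source:
        node, block = prev[node]
        path.append(block)
    return path[::-1]

# Hypothetical basic blocks and weights
weights = {"A": 2, "B": 10, "C": 3, "D": 4}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
path = critical_path(weights, edges, "A", "D")
```

With these weights the heavy block `B` gets a transformed cost of 0, so the path through `A`, `B`, `D` is selected over the one through `C`.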

“Spill” - Flattening Algorithm • Utilization profile for each • FU type and in/out port of storage unit • Type and number of instances of other components is unchanged • For chosen number of FUs • Estimate extra cycles (Δ) by postponing operations into empty slots • Maximize component utilization • Utilization = Σ Used FUs / (chosen # of FUs × Exec. Time) • Compute global Δ and utilization • Per block estimation • Execution frequencies
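
The per-block estimate and the frequency-weighted global figures can be sketched as below. This is a simplified illustration under stated assumptions: the histogram is a list of "units needed per cycle" for one FU type, excess operations are pushed to the next cycle without checking data dependencies, and the function names and example numbers are hypothetical.

```python
def spill(histogram, chosen):
    """Flatten a per-cycle demand histogram onto `chosen` units.

    Returns (extra_cycles, utilization) for one basic block.
    """
    carried = 0   # operations postponed from earlier cycles
    used = []     # units actually busy in each cycle after flattening
    for demand in histogram:
        total = demand + carried
        run = min(total, chosen)
        used.append(run)
        carried = total - run
    while carried > 0:  # leftover work extends the block by extra cycles
        run = min(carried, chosen)
        used.append(run)
        carried -= run
    extra = len(used) - len(histogram)
    utilization = sum(used) / (chosen * len(used))
    return extra, utilization

def global_estimate(blocks, chosen):
    """blocks: list of (histogram, execution_frequency) pairs.

    Returns (relative increase in cycles, overall utilization),
    weighting each block by its execution frequency.
    """
    base = extra_total = work = 0
    for histogram, freq in blocks:
        extra, _ = spill(histogram, chosen)
        base += freq * len(histogram)
        extra_total += freq * extra
        work += freq * sum(histogram)
    return extra_total / base, work / (chosen * (base + extra_total))

# Hypothetical profile: one hot block and one cooler block
extra, util = spill([3, 1, 0, 2, 4], chosen=2)
delta, global_util = global_estimate([([3, 1, 0, 2, 4], 10), ([1, 1], 5)], chosen=2)
```

In this example, dropping from four units to two adds one cycle to the first block while raising its utilization from 0.5 to about 0.83, which is the kind of trade-off the Δ and utilization constraints are meant to bound.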

Results • Applications: bdist2 (MPEG2 encoder), OnesCounter, Sort (bubble sort), dct32 (MP3) • Δ = 20%, Utilization = 75%

Conclusion • Automatic generation of data path • Separate allocation from scheduling and binding • Initial Allocation – creates dense architecture • Architecture Wizard – refines architecture for given constraints • Future work and issues • Reduce area • Reduce complexity of FU • Further reduce interconnect • Features • Pipelining, chaining, forwarding, special function units

Thank You!
