
Codesigned On-Chip Logic Minimization

Codesigned On-Chip Logic Minimization. Roman Lysecky & Frank Vahid* Department of Computer Science and Engineering University of California, Riverside *Also with the Center for Embedded Computer Systems, UC Irvine


Presentation Transcript


  1. Codesigned On-Chip Logic Minimization
     Roman Lysecky & Frank Vahid*
     Department of Computer Science and Engineering, University of California, Riverside
     *Also with the Center for Embedded Computer Systems, UC Irvine
     This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and a Department of Education GAANN fellowship

  2. Introduction (On-chip Logic Minimization)
     [Figure: system-on-chip with block labels Proc., I$, D$, MEM, DMA, and an on-chip minimizer made up of an ARM7 and MEM. Numbered steps: 1. Initialize minimizer, 2. Execute minimizer, 3. Indicate completion]

  3. On-Chip Minimization Applications (IP Routing Table Reduction)
     • IP routing table reduction
       • Routing tables of large network routers have over 30,000 entries
       • Fast IP routing lookup is difficult without using large hardware resources
     • Ternary CAM (McAuley & Francis, 1993)
       • A TCAM can perform a routing table lookup in a single cycle
       • Requires large hardware resources and has high power consumption
     • Mask Extension (Liu, 2002)
       • Uses two-level logic minimization to reduce the size of the routing table
       • Good results, but did not consider off-chip communication
     [Figure: an incoming IP packet with destination IP 138.23.16.9 is looked up in the routing table (Prefix -> Next hop: 138.23.16.x -> Port 7, 138.23.x.x -> Port 5, 125.x.x.x -> Port 3); the longest prefix match selects Port 7]
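The longest-prefix-match lookup in the figure can be sketched in software as below. This is a minimal illustration, not the TCAM or mask-extension approach from the slide: it scans the table linearly. The `Route` type and all function names are hypothetical, and the prefix lengths (24, 16, and 8 bits) are an assumption read off the `x` wildcards in the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One routing-table entry (hypothetical layout). */
typedef struct {
    uint32_t prefix;  /* network bits; host bits are zero */
    uint8_t  length;  /* number of leading bits that must match (0..32) */
    int      port;    /* next-hop port */
} Route;

/* Pack a dotted-quad address into a 32-bit value. */
uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Mask with the top `length` bits set. */
uint32_t prefix_mask(uint8_t length) {
    assert(length <= 32);
    return length == 0 ? (uint32_t)0 : (uint32_t)0xFFFFFFFFu << (32 - length);
}

/* Return the port of the longest matching prefix, or -1 if none matches. */
int lookup_port(const Route *table, size_t n, uint32_t dest) {
    int best_len = -1;
    int best_port = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].length);
        if ((dest & mask) == table[i].prefix && (int)table[i].length > best_len) {
            best_len = (int)table[i].length;
            best_port = table[i].port;
        }
    }
    return best_port;
}

/* Look up `dest` in the three-entry table from the figure
   (prefix lengths are assumed, see the lead-in). */
int lookup_slide_table(uint32_t dest) {
    const Route table[] = {
        { ip4(138, 23, 16, 0), 24, 7 },
        { ip4(138, 23, 0, 0),  16, 5 },
        { ip4(125, 0, 0, 0),    8, 3 },
    };
    return lookup_port(table, sizeof table / sizeof table[0], dest);
}
```

Calling `lookup_slide_table(ip4(138, 23, 16, 9))` matches both the 24-bit and the 16-bit prefix and keeps the longer one. Shrinking the table, as the slide proposes, reduces the number of entries a lookup like this (or a TCAM) has to hold.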

  4. On-Chip Minimization Applications (Access Control List Reduction)
     • Access Control List (ACL)
       • Used to restrict IP traffic through network routers
       • ACL size can range anywhere from 300 (UCR CS&E Dept.) to 10,000 (AOL)
       • A common use is to block a particular protocol or port number to avoid attacks such as denial-of-service attacks
     • ACL Minimization
       • Similar approach to the one used for IP routing table reduction
       • However, the order of the list must be preserved
     [Figure: ACL input format with fields Type, Protocol, In IP, In Port, Out IP, Out Port, Action]
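Why list order matters can be seen in a small first-match evaluator. This is a sketch under assumptions the slides do not state: rules are evaluated top to bottom, the first matching rule decides, and a packet that matches nothing is denied. The `AclRule` type uses only two of the slide's fields, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ANY (-1)               /* wildcard for a rule field */

typedef enum { DENY = 0, PERMIT = 1 } Action;

enum { PROTO_TCP = 6 };        /* IANA protocol number for TCP */

/* Simplified rule: only Protocol, Out Port, and Action are modeled. */
typedef struct {
    int    protocol;           /* protocol number or ANY */
    int    out_port;           /* destination port or ANY */
    Action action;
} AclRule;

/* First-match evaluation: the earliest matching rule decides. */
Action acl_evaluate(const AclRule *rules, size_t n, int protocol, int out_port) {
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int proto_ok = rules[i].protocol == ANY || rules[i].protocol == protocol;
        int port_ok  = rules[i].out_port == ANY || rules[i].out_port == out_port;
        if (proto_ok && port_ok) {
            return rules[i].action;
        }
    }
    return DENY;               /* assumed implicit deny when nothing matches */
}

/* Same two rules in two different orders. */
const AclRule deny_first[] = {
    { PROTO_TCP, 23,  DENY   },
    { PROTO_TCP, ANY, PERMIT },
};
const AclRule permit_first[] = {
    { PROTO_TCP, ANY, PERMIT },
    { PROTO_TCP, 23,  DENY   },
};
```

With `deny_first`, a TCP packet to port 23 is denied. With `permit_first`, the broader rule matches first and the same packet is permitted. A minimizer that merges or reorders overlapping rules freely would change behavior in exactly this way.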

  5. On-Chip Minimization Applications (Dynamic Hardware/Software Partitioning)
     • Dynamic hardware/software partitioning (JIT compilation for FPGAs)
       • Dynamically detects frequently executed loops and re-implements those software loops using on-chip configurable logic
       • Requires logic synthesis tools to be embedded on-chip
     [Figure: Warp Processor block diagram with labels Profiler, MIPS/ARM, I$, D$, Dynamic Partitioning Module, Configurable Logic]

  6. ROCM
     • On-chip Logic Minimization Requirements
       • Limited data and instruction memory available
       • Quality of results must still be close to optimal
       • Execution time should remain reasonable
     • On-chip Logic Minimization Goal
       • Focus on developing an on-chip logic minimization tool that produces acceptable results with reasonable increases in execution time while using limited memory resources
     • ROCM – Riverside On-Chip Minimizer
       • Two-level minimization tool
       • Uses a combination of approaches from Espresso-II (Brayton et al., 1984) and Presto (Svoboda & White, 1979)
       • Eliminates the need to compute the off-set, which reduces memory usage
       • Uses a single expand phase instead of multiple iterations
       • On average, results are only 2% larger than the optimal solution

  7. ROCM Results (Performance/Memory Usage)
     • ROCM executing on a 40 MHz ARM7 requires less than 1 second
     • Small code size of only 22 kilobytes
     • Average data memory usage of only 1 megabyte
     [Chart: results for a 40 MHz ARM7 (Triscend A7) and a 500 MHz Sun Ultra60]

  8. Codesign ROCM (Hardware Coprocessor)
     • Customized ROCM enables us to develop an efficient hardware coprocessor
     • Profiled the execution of ROCM-32 and ROCM-128 using the ARM port of the SimpleScalar simulator
     • Determined which critical loops/functions are suitable for implementation in hardware
     • Identified six critical kernels that comprised 91% of the total execution time but only 2% of the code size

  9. Codesign ROCM (Minimization Coprocessor)
     [Figure: on-chip minimizer made up of an ARM7, MEM, and the minimization coprocessor (Min. Coproc.); the coprocessor connects through a Proc/Mem interface (data, addr) and contains the kernels Tautology.1, IsCov, SetLit, Cofactor.1, DoesInter, and GetLit]

  10. Codesign ROCM (Minimization Coprocessor)
      [Figure: the same coprocessor block diagram (Proc/Mem interface with data and addr; kernels Tautology.1, IsCov, SetLit, Cofactor.1, GetLit) with the DoesIntersect kernel expanded. Datapath labels: inputs aImpl, dImpl, numLits (widths 64, 64, 5); "<< 1"; "<< 32 (odd)"; "32 (even)"; "== 0"; output retVal]
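A software analogue of the DoesIntersect kernel can be sketched as follows. The transcript does not give ROCM's implicant encoding, so this assumes the positional-cube notation common in Espresso-style minimizers: two bits per literal (01 = complemented, 10 = uncomplemented, 11 = don't care, 00 = empty), packed into one 64-bit word for up to 32 literals. That assumption is at least consistent with the 64-bit inputs and the odd/even bit groups labeled in the figure. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVEN_BITS 0x5555555555555555ULL  /* low bit of every 2-bit field */

/* Two cubes intersect iff, for every literal in use, the bitwise AND of
   their 2-bit fields is non-empty (not 00).
   Encoding is an assumption: field i occupies bits 2i and 2i+1. */
bool does_intersect(uint64_t a, uint64_t b, unsigned num_lits) {
    assert(num_lits <= 32);
    uint64_t both = a & b;
    /* Fold each field's two bits onto its even bit: set iff field != 00. */
    uint64_t nonempty = (both | (both >> 1)) & EVEN_BITS;
    /* Even-bit mask covering only the literals actually in use. */
    uint64_t used = (num_lits == 32)
        ? EVEN_BITS
        : (EVEN_BITS & ((1ULL << (2 * num_lits)) - 1));
    return (nonempty & used) == used;
}
```

For two literals, the cube x0·x1' is encoded as `0x6` and the cube x0 as `0xE`. They share the point x0 = 1, x1 = 0, so they intersect. The cube x0' (`0xD`) conflicts with x0·x1' on literal 0, so they do not. The whole check is a handful of word-wide bit operations, which suggests why a kernel like this maps well to a small hardware datapath.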

  11. Codesign ROCM Results (Execution Time)
      • Average speedup of 7.8
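As a rough plausibility check, and not a calculation taken from the slides, Amdahl's law relates the reported speedup to the profile on slide 8. It assumes that the 91% share of execution time attributed to the six kernels also holds for the modified software, and it ignores processor/coprocessor communication overhead. Here f is the accelerated fraction and s is the kernel speedup, both symbols introduced for this illustration.

```latex
S_{\text{overall}} = \frac{1}{(1 - f) + f/s}, \qquad f = 0.91

% Upper bound as the kernels become infinitely fast:
\lim_{s \to \infty} S_{\text{overall}} = \frac{1}{1 - 0.91} \approx 11.1

% Kernel speedup implied by the reported average S_{\text{overall}} = 7.8:
s = \frac{f}{1/S_{\text{overall}} - (1 - f)}
  = \frac{0.91}{0.1282 - 0.09} \approx 23.8
```

Under these assumptions the reported 7.8 sits below the 11.1 ceiling, and reaching it would require the kernels themselves to run roughly 24 times faster in hardware. That figure is illustrative only; the slides do not report per-kernel speedups.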

  12. Codesign ROCM Results (Energy Consumption)
      • Average energy reduction of 59.2%

  13. Codesign ROCM (Minimization Coprocessor)
      • Software modifications were required to achieve the speedup of 7.8
        • Data structures/algorithms were not suitable for hardware implementation
        • Reorganized data structures
        • Customized width of data items
        • Eliminated memory allocation within critical regions
      • Not automated by current hardware/software partitioning tools

  14. Codesign ROCM (Minimization Coprocessor)
      Original C Code:
          for (i = 0; i < F->numImplicants; i++) {
              if (!DoesIntersect(implicant, xj)) continue;
              for (k = 0; k < xj->numLiterals; k++) {
                  // determine coImplicant
                  ...
              }
              AddImplicant(cofactor, &coImplicant);  // requires dynamic memory allocation
          }
      Slide callouts: "28.5% of total exec. time", "Only 3.5% of total exec. time", "Requires dynamic memory allocation" (on the highlighted AddImplicant(cofactor, &coImplicant) call), "Move to HW"

  15. Codesign ROCM (Minimization Coprocessor)
      Modified C Code:
          // determine size of cofactor initially
          cofactorSize = 0;
          for (i = 0; i < F->numImplicants; i++) {
              if (!DoesIntersect(implicant, xj)) continue;
              cofactorSize++;
          }
          // allocate all memory outside of main loop
          cofactor->implicants = malloc(…);
          for (i = 0; i < F->numImplicants; i++) {
              if (!DoesIntersect(implicant, xj)) continue;
              for (k = 0; k < xj->numLiterals; k++) {
                  // additional initialization code needed for each iteration
                  coImplicant = &(cofactor->implicants[index++]);
                  ...
              }
          }
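The count-then-allocate pattern on this slide can be written as a self-contained sketch. It is not ROCM's code. The `Implicant` and `Cover` types, the two-bits-per-literal encoding (01 = complemented, 10 = uncomplemented, 11 = don't care), and the cofactor step (bitwise OR with the complement of `xj`, the standard positional-cube cofactor) are all assumptions made for illustration. In this version the output slot is taken once per intersecting implicant, before any per-literal work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EVEN_BITS 0x5555555555555555ULL

typedef struct { uint64_t bits; } Implicant;                 /* hypothetical */
typedef struct { Implicant *implicants; size_t numImplicants; } Cover;

/* Mask covering the 2*num_lits bits in use. */
static uint64_t used_mask(unsigned num_lits) {
    assert(num_lits <= 32);
    return num_lits == 32 ? ~0ULL : ((1ULL << (2 * num_lits)) - 1);
}

/* True iff no in-use literal field of (a AND b) is empty. */
static bool cubes_intersect(uint64_t a, uint64_t b, unsigned num_lits) {
    uint64_t both = a & b;
    uint64_t nonempty = (both | (both >> 1)) & EVEN_BITS;
    uint64_t used = EVEN_BITS & used_mask(num_lits);
    return (nonempty & used) == used;
}

/* Build the cofactor of F with respect to xj in two passes.
   Returns 0 on success, -1 if allocation fails. */
int build_cofactor(const Cover *F, const Implicant *xj, unsigned num_lits,
                   Cover *cofactor) {
    /* Pass 1: determine the size of the cofactor. */
    size_t count = 0;
    for (size_t i = 0; i < F->numImplicants; i++) {
        if (cubes_intersect(F->implicants[i].bits, xj->bits, num_lits)) count++;
    }

    /* Allocate all memory once, outside the main loop. */
    cofactor->implicants = NULL;
    cofactor->numImplicants = 0;
    if (count > 0) {
        cofactor->implicants = malloc(count * sizeof *cofactor->implicants);
        if (cofactor->implicants == NULL) return -1;
    }
    cofactor->numImplicants = count;

    /* Pass 2: fill preallocated slots; no allocation in the hot loop. */
    size_t index = 0;
    for (size_t i = 0; i < F->numImplicants; i++) {
        uint64_t c = F->implicants[i].bits;
        if (!cubes_intersect(c, xj->bits, num_lits)) continue;
        Implicant *coImplicant = &cofactor->implicants[index++];
        coImplicant->bits = (c | ~xj->bits) & used_mask(num_lits);
    }
    return 0;
}
```

The trade-off is that the intersection test runs twice per implicant. Because the main loop no longer allocates, it becomes a fixed-size, allocation-free region that can be moved to hardware.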

  16. Conclusions & Future Work
      • Developed codesigned on-chip logic minimization
        • Performance improvement of nearly 8X compared to the earlier software-only implementation
        • Energy reduction of almost 60%
      • New directions in hardware/software partitioning
        • Designer effort was required to rewrite algorithms and fine-tune data structures
        • Could better hardware/software partitioning tools automate this?
