
Codesign Extended Applications

Codesign Extended Applications. Brian Grattan, Greg Stitt, Frank Vahid* Dept of Computer Science & Engineering University of California, Riverside *Also with the Center for Embedded Computer Systems at UC Irvine


Presentation Transcript


  1. Codesign Extended Applications
     Brian Grattan, Greg Stitt, Frank Vahid*
     Dept of Computer Science & Engineering, University of California, Riverside
     *Also with the Center for Embedded Computer Systems at UC Irvine
     This work was supported in part by the National Science Foundation and by NEC C&C Research Labs

  2. Outline
     • Introduction: Hardware/Software Partitioning
       • And the common assumption of a single specification
     • Different Algorithms in Hardware/Software
     • Codesign Extended Applications
     • Experiments
     • Future Work and Conclusions
     CODES’02 – Codesign Extended Applications Brian Grattan, Greg Stitt, Frank Vahid, Univ. of California, Riverside

  3. Introduction – Hw/Sw Partitioning
     • Hw/sw partitioning can speed up software
       • Shown by numerous researchers
         • E.g., Balboni, Fornaciari, Sciuto CODES’96; Eles, Peng, Kuchcinski, Doboli DAES’97; Gajski, Vahid, Narayan, Gong Prentice-Hall 1997; Grode, Knudsen, Madsen DATE’98; many others
       • 1.5 to 10x common
       • Some examples like image processing get 100-800x speedup
         • E.g., Cameron project, FCCM’02
     • Can reduce energy too
       • E.g.
         • Henkel, Li CODES’98
         • Wan, Ichikawa, Lidsky, Rabaey CICC’98
         • Stitt, Grattan, Villarreal, Vahid FCCM’02
           • 60-80% energy savings measured on real single-chip uP/FPGA devices

  4. Hw/Sw Partitioning on Single-Chip Platforms
     • Numerous single-chip commercial devices with uP and FPGA
       • Triscend E5 (shown)
       • Triscend A7
       • Atmel FPSLIC
       • Xilinx Virtex II Pro
       • Altera Excalibur
       • More sure to come…
     • Make hw/sw partitioning even more attractive
     [Figure: Triscend E5 chip, with regions labeled configurable logic, uP and peripherals, and cache/memory]

  5. Hw/Sw Partitioning – Commercial Tools Evolving
     • Commercial products evolving
       • Synopsys’ Nimble compiler (2000) attempt
       • Proceler
         • Microprocessor Report’s 2001 Technology of the Year Award
       • Others coming…

  6. Hw/Sw Partitioning – Single-Spec Assumption
     • Assumption – Start from a single specification
       • Typically sw source
     • Partitioning
       • Find critical sw kernels, map some to hw
     • This assumption is made in most research efforts as well as commercial tools
     [Figure: Specification feeds the Hw/sw partitioner, which splits it into Sw (Compilation, producing Binaries) and Hw (Synthesis, producing Netlists)]

  7. Digital Camera Example
     • Developed with intent of exploring hw/sw tradeoffs
       • Captures images, compresses, uploads to PC
     • Soon found that a single specification wasn’t reasonable
       • Two key functions had different hw/sw algorithms
         • CRC
         • DCT
     [Figure: digital camera block diagram with CCD, CCD pre-processor, DCT, Huffman encoder, CRC calculation, controller, and communications blocks]

  8. Digital Camera Example
     [Figure: Spec (DCT, Huffman, CRC, CCD, Ctrl) feeds the Hw/sw partitioner; Sw (Huff., CCD, Ctrl) goes to Compilation and Binaries; Hw (CRC, DCT) goes to Synthesis and Netlists, marked "Weak"]
     • Results in weak hw design
       • We would have written CRC and DCT differently had we known they’d be mapped to hw
       • Yet, we’d keep the original algorithms if they ended up in software

  9. Different Algorithms in Hw vs. Sw
     • The single-specification assumption doesn’t always hold
     • Key observation
       • Designers often use very different algorithms if a behavior is mapped to hardware versus if that behavior is mapped to software
     • Widely known by designers
     • In textbooks
     • Also known in parallel processing – sequential and parallel algorithms

  10. Different Algorithms – Sorting Example
      • Suppose desired behavior fills a buffer, sorts the buffer, and transmits the sorted list
        Fill() Sort() Transmit()
      • Sort() in software – Quicksort
        • Simple and fast in sw
        • Poor in hw, can’t be parallelized well
      • Sort() in hardware – Parallel Mergesort
        • Very fast in hardware
        • Slow in sw (if sequential) due to overhead
      • Derive one from the other?
      [Figure: a single Quicksort block contrasted with an array of parallel mergesort (MS) blocks]
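
The contrast on this slide can be sketched in C. The following is a minimal bottom-up merge sort, offered as an illustration rather than the authors' hardware design; the names `merge_runs` and `merge_sort_bottom_up` are hypothetical. Within each pass, every merge works on a disjoint range of the buffer, which is the property a hardware implementation could exploit by running the merges concurrently. Executed sequentially, as here, the extra copying is pure overhead compared with an in-place quicksort.

```c
#include <stddef.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int *src, int *dst, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* Bottom-up merge sort; scratch must hold at least n ints. */
void merge_sort_bottom_up(int *buf, int *scratch, size_t n)
{
    for (size_t width = 1; width < n; width *= 2) {
        /* Each merge in this pass touches a disjoint range, so in
           hardware the iterations of this loop could run in parallel. */
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(buf, scratch, lo, mid, hi);
        }
        memcpy(buf, scratch, n * sizeof *buf);
    }
}
```

Calling `merge_sort_bottom_up(a, tmp, n)` sorts `a` in ascending order, using `tmp` as working storage.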

  11. Different Algorithms – CRC Example
      • CRC – Cyclic Redundancy Check
        • Used for error checking during communication, stronger than parity
        • Mathematically, divides a constant into the data and saves the remainder
      • Main Function … calls crc() with parameters:
        • init_crc – initial value
        • *data – pointer to data
        • len – length of data
        • jinit – initializing options
      • crc() returns: value of CRC for given data
      [Figure: Main Function calling crc(); transmitted layout shown as crc/data/data/data]
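
The "divide and keep the remainder" description can be made concrete with a bit-serial sketch. The slides do not name the generator polynomial, so this example assumes the common 16-bit CCITT polynomial x^16 + x^12 + x^5 + 1 (0x1021), processed most-significant bit first; the function name `crc16_bitwise` is hypothetical, and the buffer is indexed from 0 rather than from 1 as in the deck's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed generator polynomial: x^16 + x^12 + x^5 + 1. */
#define CRC16_POLY 0x1021u

/* Bit-serial CRC: long division over GF(2), one data bit per step. */
uint16_t crc16_bitwise(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);      /* bring in the next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)                /* top bit set: subtract (XOR) the divisor */
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;                               /* the remainder */
}
```

With an initial value of 0, the ASCII string "123456789" yields 0x31C3, the published check value for this polynomial and bit order.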

  12. Different Algorithms – CRC in Hardware
      • Hardware Version
        • Knowing the generator polynomial, one can calculate the XORs for each individual bit
        • Each CRC value is the result of bit-wise XORs with the data and the previous CRC value
        • Synthesizes to hw very nicely; but getting bits and shifting are inefficient in sw

          char crc_hw(…)
          {
              unsigned short j, crc_value = init_crc;
              unsigned short new_crc_value;

              if (jinit >= 0)
                  crc_value = ((uchar) jinit) | (((uchar) jinit) << 8);
              for (j = 1; j <= len; j++) {
                  new_crc_value = bit(4,data[j]) ^ bit(0,data[j]) ^ bit(8,crc_value) ^ bit(12,crc_value);  // bit 0
                  new_crc_value = new_crc_value | (bit(5,data[j]) ^ bit(1,data[j]) ^ bit(9,crc_value) ^ bit(13,crc_value)) << 1;   // bit 1
                  new_crc_value = new_crc_value | (bit(6,data[j]) ^ bit(2,data[j]) ^ bit(10,crc_value) ^ bit(14,crc_value)) << 2;  // bit 2
                  // … continue for bits 3 through 7 …
              }
              return (new_crc_value);
          }
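
The three XOR equations shown on this slide can be checked against a bit-serial update. The sketch below assumes the generator polynomial is x^16 + x^12 + x^5 + 1 (0x1021), which the slides do not state; under that assumption the equations for bits 0 through 2 match the serial computation for every CRC value and data byte. Only the three equations given on the slide are used; the elided bits are not reconstructed. All function names here are hypothetical.

```c
#include <stdint.h>

/* Extract bit n of v, mirroring the bit() helper used on the slide. */
static unsigned bit(unsigned n, unsigned v)
{
    return (v >> n) & 1u;
}

/* Reference: push one byte through the divider, one bit at a time. */
static uint16_t crc16_step_serial(uint16_t crc, uint8_t d)
{
    crc ^= (uint16_t)(d << 8);
    for (int i = 0; i < 8; i++)
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    return crc;
}

/* Bits 0..2 of the next CRC, from the slide's per-bit XOR equations. */
static unsigned crc16_low3_parallel(uint16_t crc, uint8_t d)
{
    unsigned b0 = bit(4, d) ^ bit(0, d) ^ bit(8, crc) ^ bit(12, crc);
    unsigned b1 = bit(5, d) ^ bit(1, d) ^ bit(9, crc) ^ bit(13, crc);
    unsigned b2 = bit(6, d) ^ bit(2, d) ^ bit(10, crc) ^ bit(14, crc);
    return b0 | (b1 << 1) | (b2 << 2);
}

/* Returns 1 if both forms agree on bits 0..2 for all inputs, else 0. */
int crc16_equations_agree(void)
{
    for (unsigned crc = 0; crc <= 0xFFFFu; crc++)
        for (unsigned d = 0; d <= 0xFFu; d++)
            if ((crc16_step_serial((uint16_t)crc, (uint8_t)d) & 7u) !=
                crc16_low3_parallel((uint16_t)crc, (uint8_t)d))
                return 0;
    return 1;
}
```

In hardware each equation is a handful of XOR gates evaluated in parallel, whereas in software each `bit()` call costs a shift and a mask, which is the inefficiency the slide points out.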

  13. Different Algorithms – CRC in Software
      • Software Version
        • Before doing any calculations, create an initialization table that calculates the CRC for each individual character
        • Use data as index into initialization table and execute two XORs
        • Requires lookups, but faster for a sequential calculation

          char crc_sw(…) // Source: Numerical Recipes in C
          {
              unsigned short initialize_table(unsigned short crc, unsigned char one_char);
              static unsigned short icrctb[256], init = 0;
              unsigned short tmp1, j, crc_value = init_crc;

              if (!init) {  // build the 256-entry table on the first call only
                  init = 1;
                  for (j = 0; j <= 255; j++) {
                      icrctb[j] = initialize_table(j << 8, (uchar) 0);
                  }
              }
              if (jinit >= 0)
                  crc_value = ((uchar) jinit) | (((uchar) jinit) << 8);
              for (j = 1; j <= len; j++) {
                  tmp1 = data[j] ^ HIBYTE(crc_value);
                  crc_value = icrctb[tmp1] ^ (LOBYTE(crc_value) << 8);
              }
              return (crc_value);
          }
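
A self-contained version of the table-lookup idea is sketched below. It is not the Numerical Recipes routine: the polynomial 0x1021 is an assumption, the names `crc16_build_table` and `crc16_table` are hypothetical, and the buffer is indexed from 0 instead of 1. The per-byte work matches the slide's description: one XOR to form the table index and one XOR to combine the table entry with the shifted low byte.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc_table[256];
static int crc_table_ready = 0;

/* Entry i holds the CRC of the single byte i, starting from a zero CRC. */
static void crc16_build_table(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint16_t r = (uint16_t)(i << 8);
        for (int b = 0; b < 8; b++)
            r = (r & 0x8000u) ? (uint16_t)((r << 1) ^ 0x1021u)
                              : (uint16_t)(r << 1);
        crc_table[i] = r;
    }
    crc_table_ready = 1;
}

/* Table-driven CRC: one lookup and two XORs per data byte. */
uint16_t crc16_table(uint16_t crc, const uint8_t *data, size_t len)
{
    if (!crc_table_ready)
        crc16_build_table();
    for (size_t i = 0; i < len; i++) {
        uint8_t idx = (uint8_t)((crc >> 8) ^ data[i]);           /* data XOR high byte */
        crc = (uint16_t)(crc_table[idx] ^ (uint16_t)(crc << 8)); /* entry XOR (low byte << 8) */
    }
    return crc;
}
```

The 512-byte table is cheap in a microprocessor's memory but would be a costly structure to synthesize, which is one reason this form suits software better than hardware.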

  14. Different Algorithms – DCT
      • DCT – Discrete Cosine Transform
        • Computationally intensive, numerous matrix multiplies
        • Accounts for perhaps 70% of JPEG encoding time
      • Dozens of possible algorithms
        • Best algorithm depends largely on computational resources
        • Certainly different for sw and hw
      • Doing multiplications in floating-point vs. fixed-point
        • Multiplication by a constant can be efficiently mapped to hardware, but accuracy will be lost by not using floating-point
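
The floating-point versus fixed-point tradeoff can be illustrated with a single constant multiply. This sketch is not taken from the authors' DCT; it assumes a Q15 format and uses cos(pi/4) as a representative DCT coefficient, and the function names are hypothetical. The fixed-point form needs only an integer multiply, an add, and a shift, but its result is rounded to an integer.

```c
#include <stdint.h>

/* cos(pi/4) ~ 0.70710678 in Q15: round(0.70710678 * 32768) = 23170. */
#define COS_PI_4_Q15 23170

/* Floating-point constant multiply: keeps the fractional part. */
double scale_float(int16_t sample)
{
    return sample * 0.70710678118654752;
}

/* Fixed-point constant multiply: integer multiply, round, shift.
   Right-shifting a negative value is implementation-defined in C;
   most compilers perform an arithmetic shift. */
int16_t scale_q15(int16_t sample)
{
    int32_t product = (int32_t)sample * COS_PI_4_Q15;
    return (int16_t)((product + (1 << 14)) >> 15);
}
```

For a sample of 1000, the floating-point result is about 707.107 while the Q15 result is 707, so the fractional part is lost; such rounding errors accumulate across the many multiplies in a DCT.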

  15. Codesign Extended Applications (CEAs)
      • Basic idea:
        • Write two versions of certain functions
          • Only the critical functions, and
          • Only those with different sw and hw algorithms
        • Typically only a handful of these
          • Most time is spent in just a few critical functions
        • Include both function versions in the specification
          • But use compiler flags to include either sw or hw version

          main()
          {
              …
              crc();
              …
          }

          char crc(…)
          {
          #ifdef cea_crc_hw
              return crc_hw(…);
          #else
              return crc_sw(…);
          #endif
          }

          % gcc -Dcea_crc_hw main.c
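
A compilable sketch of the same pattern follows. Both function bodies are stand-ins chosen for this illustration, not the authors' code: `crc_hw` here is a plain bit-serial loop in place of code that would drive the hardware block, `crc_sw` is a byte-at-a-time shift-and-XOR form in place of the table lookup, and the polynomial 0x1021 is an assumption. Only the `cea_crc_hw` flag name comes from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the software-oriented algorithm: one byte per iteration. */
uint16_t crc_sw(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t x = (uint8_t)((crc >> 8) ^ data[i]);
        x ^= (uint8_t)(x >> 4);
        crc = (uint16_t)((crc << 8) ^ (x << 12) ^ (x << 5) ^ x);
    }
    return crc;
}

/* Stand-in for the hardware-oriented algorithm: one bit per iteration. */
uint16_t crc_hw(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* The rest of the application calls crc(); the flag picks the version. */
uint16_t crc(uint16_t init_crc, const uint8_t *data, size_t len)
{
#ifdef cea_crc_hw
    return crc_hw(init_crc, data, len);
#else
    return crc_sw(init_crc, data, len);
#endif
}
```

Compiling with `-Dcea_crc_hw` selects `crc_hw`; without the flag, `crc_sw` is used. Because both versions compute the same function, callers of `crc()` are unaffected by the choice.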

  16. CEAs when using C/C++ and VHDL
      • VHDL code

          if (rst = '1') then
              crc <= "0000000000000000";
              done <= '0';
          elsif (clk'event and clk = '1') then
              if (enable = '1') then
                  if done = '0' then
                      crc <= nextCRC16_D8(input, crc);
                      done <= '1';
                  end if;
              else
                  done <= '0';
                  output <= crc;
              end if;
          end if;

      • C code

          crc_hw(…inputs…)
          {
              /* Hardware crc... */
              for (j = 1; j <= len; j++) {
                  TSHORT(to_hw) = data[j];  /* write the next byte to the hw block */
                  TBYTE(enable) = 1;        /* pulse enable so the hw consumes it */
                  TBYTE(enable) = 0;
              }
              crc_value = TSHORT(result);   /* read back the final CRC */
              return (crc_value);
          }

  17. CEAs Enable Hw/Sw Partitioning Tool
      • Traditional hw/sw partitioner
        • Compiler, estimators, search heuristics, technology files, etc.
        • Drawback: heavy impact on tool flow
      • CEAs plus platforms result in simple partitioner
        • Script uses existing compiler, synthesis, and evaluation (simulation or physical measurement)
        • Drawbacks: must write two versions of critical functions, script may use simpler search function
        • Different partitioners for different domains
      [Figure: traditional flow. Specification feeds the Hw/sw partitioner (essentially a compiler, search heuristic, and estimator; a heavy-duty tool), which produces Sw for Compilation into Binaries and Hw for Synthesis into Netlists]
      [Figure: CEA flow. The CEA feeds a Script (search heuristic and tool control; a lightweight tool), which drives Sw Compilation into Binaries and Hw Synthesis into Netlists, with an Evaluator feeding results back]
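
One small piece of such a script is turning a candidate partition into a compile command. The sketch below assumes each critical function follows the flag naming shown earlier in the deck (`cea_crc_hw`); the helper `cea_build_command`, the bitmask encoding, and any function name other than `crc` are hypothetical. A full script would also launch synthesis and the evaluator, which are not shown.

```c
#include <stdio.h>

/* Build "gcc [-Dcea_<name>_hw ...] <source>" for one partition.
   Bit i of mask set means names[i] is mapped to hardware.
   Returns 0 on success, -1 if the output buffer is too small. */
int cea_build_command(unsigned mask, const char *const names[], unsigned count,
                      const char *source, char *out, size_t out_size)
{
    size_t used;
    int n = snprintf(out, out_size, "gcc");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;

    for (unsigned i = 0; i < count; i++) {
        if (!(mask & (1u << i)))
            continue;                     /* function stays in software */
        n = snprintf(out + used, out_size - used, " -Dcea_%s_hw", names[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, out_size - used, " %s", source);
    if (n < 0 || (size_t)n >= out_size - used)
        return -1;
    return 0;
}
```

With `names = {"crc", "dct"}` and `mask = 1`, the helper produces `gcc -Dcea_crc_hw main.c`. Looping `mask` from 0 to 2^count - 1 enumerates every partition, which is practical only because a CEA has just a handful of dual-version functions.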

  18. Experiments
      • Compared hw and sw CRC algorithms
        • Synthesized to FPGA
        • Compiled to MIPS uP
      • Demonstrates need for different algorithms

      Sw and hw CRC algorithms in FPGA.
      Algorithm                 Size (Blocks)   Delay (clock cycles/character)
      Hardware CRC algorithm    19              1
      Software CRC algorithm    44              3

      Sw and hw CRC algorithms on a microprocessor.
      Algorithm                 Size (Assembly Lines)   Clock Cycles
      Software CRC algorithm    1061                    180,000
      Hardware CRC algorithm    1298                    814,000

  19. Experiments
      • Wrote small signal processing example as CEA
        • Wrote sw and hw versions of core functions
        • In this case, algorithms were similar
      • Set up power measurement for two real platforms
        • XS40 (board with microcontroller chip and Xilinx FPGA chip)
        • E5 (single chip with microcontroller and FPGA)
      • Partitioning script automatically partitioned and measured power and cycles (overnight – due to place & route time)
      • Demonstrates how CEAs enable simple yet practical hw/sw partitioning
        • Easily migrates to different platforms, different chips

      Partitioning results on E5 device.
      Multiply   Sum   Bit-Share   Energy (Joules)
      SW         SW    SW          12.4
      SW         SW    HW          8.6
      SW         HW    SW          8.8
      HW         SW    SW          8.0
      SW         HW    HW          4.8
      HW         SW    HW          Does not route
      HW         HW    SW          Does not route
      HW         HW    HW          Does not route
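
The selection step of a partitioning script can be sketched using the E5 measurements reported on this slide. The data values are transcribed from the slide's table; the struct layout, the use of a negative energy to mark partitions that did not route, and the function name `best_partition` are assumptions made for this illustration, not the authors' script.

```c
#include <stddef.h>

enum { SW = 0, HW = 1 };

struct partition {
    int multiply, sum, bit_share;  /* SW or HW for each function */
    double joules;                 /* measured energy; negative if it did not route */
};

/* Measurements transcribed from the slide's table for the E5 device. */
const struct partition e5_results[8] = {
    { SW, SW, SW, 12.4 },
    { SW, SW, HW,  8.6 },
    { SW, HW, SW,  8.8 },
    { HW, SW, SW,  8.0 },
    { SW, HW, HW,  4.8 },
    { HW, SW, HW, -1.0 },  /* does not route */
    { HW, HW, SW, -1.0 },  /* does not route */
    { HW, HW, HW, -1.0 },  /* does not route */
};

/* Exhaustive search: return the routable partition with the lowest energy,
   or NULL if none routed. */
const struct partition *best_partition(const struct partition *r, size_t n)
{
    const struct partition *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (r[i].joules < 0.0)
            continue;              /* skip partitions that failed place & route */
        if (best == NULL || r[i].joules < best->joules)
            best = &r[i];
    }
    return best;
}
```

On this data the search selects Multiply in software with Sum and Bit-Share in hardware, at 4.8 J. Relative to the all-software 12.4 J, that is a saving of (12.4 - 4.8) / 12.4, or about 61%.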

  20. Issues and Future Work
      • Issues
        • What if hw versions not used after partitioning? Wasted effort?
        • Verification of all possible combinations?
        • Must use wisely or problem grows unwieldy
      • Future work
        • More examples, more platforms
        • Several versions of the same function
          • One hardware area-conscious
          • One hardware speed-conscious
          • One software code-size-conscious
          • One software speed-conscious
          • …more…
        • Experimenting with communication between hardware and software
          • DMA transfer, wide-access memories, …

  21. Conclusions
      • Basic hw/sw partitioning assumption of a single specification doesn’t always hold
      • Codesign Extended Applications help support different algorithms
      • CEAs enable hw/sw partitioning in existing tool flows
        • Utilizes existing compilation, synthesis, mapping, evaluation tools, and platforms
        • Simple yet effective approach to hw/sw partitioning
