
The Philips TriMedia

The Philips TriMedia. A VLIW Architecture. By Jurjen Westra. TM-1 Block Diagram. SDRAM. Main Memory Interface. Image Coprocessor. Video In. VLD Coprocessor. Audio In. Video Out. Audio Out. Timers. I2C Interface. Sync Serial Interface. VLIW CPU. 32K I$. 16K D$. PCI interface.


Presentation Transcript


  1. The Philips TriMedia: A VLIW Architecture. By Jurjen Westra

  2. TM-1 Block Diagram. Blocks: SDRAM, Main Memory Interface, Image Coprocessor, Video In, VLD Coprocessor, Audio In, Video Out, Audio Out, Timers, I2C Interface, Sync Serial Interface, VLIW CPU, 32K I$, 16K D$, PCI interface. TM has 128 general-purpose 32-bit registers.

  3. VLIW means relying on compiler techniques. Only cache misses are handled at run time. Compiler: • Scheduling / instruction-level parallelism • Operation guarding • Speculation • Profiling for recompiling • Grafting (loop unrolling) • Alias analysis

  4. Traditional Scheduling versus VLIW Scheduling [diagram: two orderings of the operations A, B, C and D for each approach]

  5. [Diagram: Instruction Cache; Issue Slots 1 to 5; Execution Units 1, 2, ..., 27] But not all issue slots have access to all (types of) execution units!

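  6. Functional units: latency and issue-slot access (5 issue slots; the last column counts the slots marked "x" for that unit, blank latency = not listed on the slide)
     Unit       Latency   Issue slots with access
     CONST                5
     ALU                  5
     SHIFTER              2
     FALU       3         2
     DSPALU     2         2
     DSPMUL     3         2
     BRANCH     3         3
     IFMUL      3         2
     FCOMP                1
     DMEM       3         2
     DMEMSPEC   3         1
     FTOUGH     17/16     1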
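The slot restrictions in the table are what the compiler's scheduler has to work around. The following is a minimal sketch of greedy list scheduling under those limits, not the TriMedia compiler's actual algorithm. It assumes that the number of slots marked for a unit acts as a per-cycle capacity for that unit (the slide does not say which slots share hardware), and it assumes a latency of 1 cycle wherever the slide leaves the latency blank. The function name `schedule` and the operation names are hypothetical.

```python
from collections import defaultdict

ISSUE_WIDTH = 5  # operations per VLIW instruction

# unit -> (issue slots with access, latency in cycles).
# Latency 1 for CONST, ALU, SHIFTER and FCOMP is an ASSUMPTION:
# the slide leaves those entries blank.
UNITS = {
    "CONST": (5, 1), "ALU": (5, 1), "SHIFTER": (2, 1), "FALU": (2, 3),
    "DSPALU": (2, 2), "DSPMUL": (2, 3), "BRANCH": (3, 3), "IFMUL": (2, 3),
    "FCOMP": (1, 1), "DMEM": (2, 3), "DMEMSPEC": (1, 3),
}


def schedule(ops):
    """Greedy list scheduling.

    ops: list of (name, unit, [names it depends on]) in program order;
         every dependency must appear earlier in the list.
    Returns {name: cycle in which the operation is issued}.
    """
    ready_at = {}                                   # name -> cycle result is usable
    issued = {}                                     # name -> issue cycle
    per_unit = defaultdict(lambda: defaultdict(int))  # cycle -> unit -> ops issued
    per_cycle = defaultdict(int)                    # cycle -> ops issued
    for name, unit, deps in ops:
        slots, latency = UNITS[unit]
        # Earliest cycle at which all inputs are available.
        cycle = max((ready_at[d] for d in deps), default=0)
        # Slide forward until an issue slot and a unit of this type are free.
        while per_cycle[cycle] >= ISSUE_WIDTH or per_unit[cycle][unit] >= slots:
            cycle += 1
        per_unit[cycle][unit] += 1
        per_cycle[cycle] += 1
        issued[name] = cycle
        ready_at[name] = cycle + latency
    return issued


example = schedule([
    ("ld_a", "DMEM", []),
    ("ld_b", "DMEM", []),
    ("ld_c", "DMEM", []),              # third load must wait: only 2 DMEM slots
    ("sum", "ALU", ["ld_a", "ld_b"]),  # waits 3 cycles for the loads
    ("shl", "SHIFTER", ["sum"]),
])
print(example)
```

In this example the third load slips to the next instruction because only two slots reach DMEM, and the dependent add cannot issue until the 3-cycle load latency has elapsed.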

  7. Guarding
     C code:
       if (R2 > R3) R4 = R4 + R5; else R4 = R4 + R6;
     Assembly:
       igtr R7 R2 R3    add R4 R4 R6    …    …
       IF R7 add R4 R4 R5    …    …    ...
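To illustrate what guarding buys, here is a small Python model that compares the branching C code with an if-converted version in which both additions are computed and a guard decides which result is written back. It is a sketch of the idea only: the inverted guard held in `r8` is an assumption introduced for the else arm and does not appear in the slide's assembly, and the function names are hypothetical.

```python
def branchy(r2, r3, r4, r5, r6):
    """Reference behaviour: the C code from the slide, with a real branch."""
    if r2 > r3:
        r4 = r4 + r5
    else:
        r4 = r4 + r6
    return r4


def guarded(r2, r3, r4, r5, r6):
    """If-converted version: no control flow, each write carries a guard."""
    r7 = int(r2 > r3)      # igtr R7 R2 R3: guard for the 'then' arm
    r8 = 1 - r7            # ASSUMED inverted guard for the 'else' arm
    # Both adds execute and read the old value of r4 ...
    then_val = r4 + r5
    else_val = r4 + r6
    # ... but only the operation whose guard is set commits its result.
    if r7:
        r4 = then_val
    if r8:
        r4 = else_val
    return r4


print(guarded(5, 3, 10, 1, 2))  # R2 > R3, so R4 + R5
print(guarded(1, 3, 10, 1, 2))  # otherwise R4 + R6
```

Because neither arm depends on a branch outcome, the scheduler is free to place both additions in whatever issue slots are available.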

  8. Characteristics (1) • Custom ops => loss of VLIW character • Big- or little-endian • R0 and R1 have values 0 and 1 respectively • No integer status flags, but case-specific bit patterns • 32 interrupt vectors • Interrupts are delayed

  9. Characteristics (2) • 11-cycle read-miss penalty • 3-cycle write-miss penalty • Functional units require 1 cycle recovery time • Byte-addressable; 8-, 16- and 32-bit loads and stores • Register file supports up to 5 writes per cycle (latency) • Register file supports up to 15 reads per cycle • Paging (64 bytes) • Instruction length: 2-23 bytes; compressed

  10. Example: MPEG-2 decoder • DVD "Batman" bitstream (4-9 Mbit/s) • 7% instruction-cache misses • 27% data-cache misses • CPI (clock cycles per VLIW instruction): 1.37 • Total performance: 2.9 ops/clock
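The last two figures can be combined into an estimate of how full the VLIW instructions are. The short calculation below assumes that both numbers were measured over the same run and that the ops/clock figure includes stall cycles; the slide does not state this, so treat the result as indicative only. The variable names are illustrative.

```python
cpi = 1.37            # clock cycles per VLIW instruction (from the slide)
ops_per_clock = 2.9   # operations per clock cycle (from the slide)
issue_slots = 5       # issue slots per VLIW instruction

# ops/instruction = (ops/clock) * (clocks/instruction)
ops_per_instruction = ops_per_clock * cpi
slot_utilisation = ops_per_instruction / issue_slots

print(f"{ops_per_instruction:.2f} ops per VLIW instruction")
print(f"{slot_utilisation:.0%} of issue slots filled")
```

Under these assumptions, roughly four of the five issue slots carry a useful operation in an average instruction.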
