Intro to the “c6x” VLIW processor

Presentation Transcript


  1. Intro to the “c6x” VLIW processor
     • Texas Instruments TMS320C6000 series
     • TMS320C6700 subseries: includes floating point
     • VLIW = Very Long Instruction Word

  2. Operations in Parallel [diagram: registers, function units]

  3. Operations in Parallel [diagram: registers, bypassing, function units]

  4. Non-orthogonal [diagram: two register files, bypass, function units]

  5. Non-orthogonal [diagram: register files A and B, bypass, function units L1 S1 M1 D1 and L2 S2 M2 D2] *** See TI's picture ***

  6. Specialized Function Units
     • L units: arithmetic, compare, and logical ops
     • S units: arithmetic, logical, branches, constant generation
     • M units: multiplies
     • D units: address generation / memory accesses

  7. Complicated hardware [diagram: two register files]

  8. Explicit parallelism [diagram: two register files]

  9. Simple VLIW encoding
     • Slots that cannot be utilized are filled with no-ops
     • Bad for code density, cache utilization, energy, ...
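The cost of no-op padding can be put into numbers. This is a rough sketch only: the issue width of 8 is taken from the eight function units on the earlier slide, and the schedule is a made-up example, not measured code.

```python
WIDTH = 8  # assumed issue width: one slot per function unit (L1 S1 M1 D1 L2 S2 M2 D2)


def fixed_width_words(ops_per_cycle):
    """Instruction words used when every cycle is padded to WIDTH slots with no-ops."""
    return WIDTH * len(ops_per_cycle)


def nop_fraction(ops_per_cycle):
    """Fraction of the encoded words that are padding no-ops."""
    total = fixed_width_words(ops_per_cycle)
    return (total - sum(ops_per_cycle)) / total


# Hypothetical schedule: useful operations issued in each of four cycles.
schedule = [3, 1, 8, 2]
print(fixed_width_words(schedule), sum(schedule), nop_fraction(schedule))
```

For this made-up schedule, more than half of the encoded words are padding.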

  10. C6X: Packets
     • One bit of each instruction indicates whether next instruction can be executed in parallel (0 = “EOP”)
     • Any slot can go to any function unit
     • Example bits: 0 1 0 1 1 1 1 1

  11. C6X: Packets
     • One bit of each instruction indicates whether next instruction can be executed in parallel
     • Any slot can go to any function unit
     • Example bits: 0 1 0 1 1 1 1 1

  12. C6X: Packets
     • One bit of each instruction indicates whether next instruction can be executed in parallel
     • Any slot can go to any function unit
     • Example bits (two 8-word groups): 0 1 0 1 1 1 1 1 / 1 1 1 1 1 1 0 0
     • Packet cannot cross an 8-word boundary
     • Resources constrain which instructions can be combined in the same packet
     • You can branch into the middle of a packet!
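The parallel-bit rule can be sketched as a small grouping routine. This is an illustration, not TI's decoder: the function name and the bit pattern are hypothetical, and the bits are assumed to be listed in program order with 0 marking the last instruction of a packet.

```python
def split_execute_packets(p_bits):
    """Group the instruction indices of one 8-word fetch group into packets.

    p_bits[i] == 1: instruction i+1 executes in parallel with instruction i.
    p_bits[i] == 0: instruction i ends the current packet ("EOP").
    """
    packets, current = [], []
    for index, p in enumerate(p_bits):
        current.append(index)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # A trailing 1 would chain into the next group, which the
        # "cannot cross an 8-word boundary" rule forbids.
        raise ValueError("packet would cross the 8-word boundary")
    return packets


# Hypothetical pattern: a 3-wide packet, a single instruction, a 4-wide packet.
print(split_execute_packets([1, 1, 0, 0, 1, 1, 1, 0]))
```

Only the instruction count per packet is modeled here; the resource constraints on which instructions may share a packet are not checked.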

  13. Explicit scheduling
     • Delay slots must be respected: no HW interlocks or scoreboarding
     • Multiply: 1 delay slot
     • Load: 4 delay slots
     • Branch: 5 delay slots
     Right (the multiply's delay slot is left free before B5 is read):
         B5 := B3 * B2
         <nop>
         B7 := B5 + B1
     Wrong (the add reads the old value of B5):
         B5 := B3 * B2
         B7 := B5 + B1
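The Right/Wrong pair can be reproduced with a toy model of a machine that has no interlocks. This is a sketch under stated assumptions: one instruction per cycle, operands read at issue, results written once the delay slots have elapsed. The `run` helper and the register values are invented for illustration.

```python
import operator

DELAY_SLOTS = {"mul": 1, "add": 0}  # from the slide: multiply has 1 delay slot
OPS = {"mul": operator.mul, "add": operator.add}


def run(schedule, regs):
    """Issue one instruction (or None for a no-op) per cycle, with no interlocks."""
    regs = dict(regs)
    pending = []  # (cycle at which the result becomes visible, dest, value)
    for cycle, instr in enumerate(schedule):
        # Commit every result whose delay slots have elapsed by this cycle.
        for _, dest, value in sorted(p for p in pending if p[0] <= cycle):
            regs[dest] = value
        pending = [p for p in pending if p[0] > cycle]
        if instr is not None:
            op, dest, a, b = instr
            value = OPS[op](regs[a], regs[b])  # operands are read at issue time
            pending.append((cycle + 1 + DELAY_SLOTS[op], dest, value))
    for _, dest, value in sorted(pending):  # drain results still in flight
        regs[dest] = value
    return regs


start = {"B1": 10, "B2": 3, "B3": 4, "B5": 100, "B7": 0}  # B5 holds a stale 100
mul = ("mul", "B5", "B3", "B2")
add = ("add", "B7", "B5", "B1")

right = run([mul, None, add], start)  # B7 = 4*3 + 10
wrong = run([mul, add], start)        # B7 = stale B5 + 10
print(right["B7"], wrong["B7"])
```

In the wrong schedule nothing stalls or faults: the add simply computes with the stale B5, which is why the scheduling is described as explicit.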

  14. Predicated execution
     • Why? To get rid of branches (5 delay slots * 8 wide = up to 40 issue slots to fill per branch)
     • Basic idea: a comparison result is stored to a condition register; this register is then used as an operand of other instructions, and its value causes those operations to be selectively enabled or squashed.
     • Condition registers: A1, A2, B0, B1, B2
     • Example: if (B3 < B4) B3++ else B4++

  15. Predicated execution
     With branches:
               cmp B3, B4
               bge L2
               <nop>
               B3 := B3+1
               b DONE
               <nop>
         L2:   B4 := B4+1
         DONE:
     With predicates:
               cmplt B3, B4, B0
         [B0]  B3 := B3+1
         [!B0] B4 := B4+1
     ...and the last two can be issued in parallel!
     Control dependency has been converted to data dependency...
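The same if-conversion can be sketched in a high-level language. This only models the semantics, assuming a squashed operation leaves its destination unchanged. The helper names are hypothetical.

```python
def with_branches(b3, b4):
    """Reference version: control flow decides which update runs."""
    if b3 < b4:
        b3 += 1
    else:
        b4 += 1
    return b3, b4


def predicated(pred, old, new):
    """Model of a predicated write: the new value lands only if pred is set."""
    return new if pred else old


def with_predicates(b3, b4):
    """If-converted version: both updates are always issued."""
    b0 = int(b3 < b4)                      # cmplt B3, B4, B0
    new_b3 = predicated(b0, b3, b3 + 1)    # [B0]  B3 := B3+1
    new_b4 = predicated(not b0, b4, b4 + 1)  # [!B0] B4 := B4+1
    return new_b3, new_b4


print(with_predicates(2, 5), with_predicates(5, 2))
```

Both predicated updates read only the original B3, B4 and B0, so neither depends on the other, which is what allows them to share a packet.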

  16. Assembly details
                 .text
                 .align  32
                 .global proc
         proc:   mvk     4, b3
                 mvk     5, b4
                 cmpgt   b3, b4, b0
           [ b0] mvk.S2  9, b5
        || [!b0] mvk.S1  8, a5
                 stw     a5, *-a15[4]
                 .....

  17. Fetch/execute pipeline
     PG  generate program address
     PS  program address send
     PW  program memory access
     PR  fetch reaches CPU boundary
     DP  instruction dispatch
     DC  instruction decode
     E1  execute 1
     E2  execute 2
     E3  execute 3
     E4  execute 4
     E5  execute 5

  18. Addressing Modes
     Mode              C equivalent
     *R                (*R)
     *+R[ucst5]        (R[ucst5])
     *-R[ucst5]        (R[-ucst5])
     *+R[offsetR]      (R[offsetR])
     *-R[offsetR]      (R[-offsetR])
     Special case, 15-bit offsets: *+B15[ucst15], *+B14[ucst15]

  19. Addressing Modes: pre/post increment/decrement
     *++R, *R++
     *++R[ucst5], *R++[ucst5]
     *--R[ucst5], *R--[ucst5]
     *++R[offsetR], *R++[offsetR]
     *--R[offsetR], *R--[offsetR]
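The difference between the offset, pre-update and post-update forms can be modeled as follows. This is a simplified sketch: memory is a Python list, the register holds an element index rather than a byte address (mirroring the "C equivalent" column), and the `access` helper and its mode strings are invented for illustration.

```python
def access(mem, regs, name, mode, offset=1):
    """Return the element addressed by `mode`, updating regs[name] if required.

    mode is one of "*+R", "*-R" (base unchanged), "*++R", "*--R" (update
    first, then access) or "*R++", "*R--" (access first, then update).
    Plain *R corresponds to "*+R" with offset 0.
    """
    base = regs[name]
    if mode == "*+R":
        addr = base + offset
    elif mode == "*-R":
        addr = base - offset
    elif mode == "*++R":
        addr = regs[name] = base + offset
    elif mode == "*--R":
        addr = regs[name] = base - offset
    elif mode == "*R++":
        addr, regs[name] = base, base + offset
    elif mode == "*R--":
        addr, regs[name] = base, base - offset
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return mem[addr]


mem = [10, 20, 30, 40, 50]
regs = {"A4": 2}
print(access(mem, regs, "A4", "*+R", 1), regs["A4"])   # offset form: A4 unchanged
print(access(mem, regs, "A4", "*R++", 2), regs["A4"])  # post-increment
print(access(mem, regs, "A4", "*--R", 1), regs["A4"])  # pre-decrement
```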

  20. Resources http://www.cs.cmu.edu/~tcal/15745/
