
GenTera’s I M A G I N E 3


Presentation Transcript


  1. GenTera’s IMAGINE 3 Introducing: GenTera’s IMAGINE3 HANS DE VRIES

  2. GenTera’s IMAGINE 3 Building Blocks
     - Imagine 3 Core Processor: Multi-Stream (32) Scalar / Vector Processor, 80 Billion operations / second
     - Advanced High Quality 3D Graphics / Volume processing Pipelines: 220 Billion operations / second
     - Motion Estimator: 100 Billion op/s
     - Graphics Mask Generator
     - PCI/AGP Bus interface: 0.5 Gigabyte/s
     - 128 bit DDR-SDRAM Bus: 4.2 Gigabyte/s
     - Data (Video) Input: 160 Megabyte/s
     - Data (Video) Output: 1.0 Gigabyte/s
     - Data flow Ring Input: 2.0 Gigabyte/s
     - Data flow Ring Output: 2.0 Gigabyte/s

  3. GenTera’s IMAGINE 3 Core Processor
     HISC™ processor architecture
     Registers (each 2x32 bit):
     - 120 General Purpose registers
     - 256 Vector registers
     - 256x4 MAC Vector registers
     - 128 Special Purpose control registers
     - 1200 control table registers
     Performance:
     - 80 Billion operations per second (320 operations per cycle), including 64 Multiply Accumulates per cycle with saturate
     - 10 Gigabyte per second streaming I/O (memory & processor I/O)
     - 40 Conditional operations per cycle
     - 24 internal addresses per cycle
     - 32 simultaneous concatenated vector streams (32 bit) (128 in byte mode)
     - Single cycle 2D and 3D addressing modes (1D, 2D and 3D memory management)
     Tools: C and C++ compiler; Assembler, Linker, Debugger; Visual Simulator; Soft In-circuit Emulator
     Libraries: Image Processing Library; 3D graphics Library; Multi Media Library; Machine Vision Library
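The throughput figures on this slide imply a clock rate that the slide itself does not state. The short Python sketch below derives it from the quoted numbers; the variable names are illustrative, and the result is an inference, not a published specification.

```python
# Back-of-envelope check of the core throughput figures quoted on the slide.
OPS_PER_SECOND = 80e9            # "80 Billion operations per second"
OPS_PER_CYCLE = 320              # "320 operations per cycle"
STREAM_BYTES_PER_SECOND = 10e9   # "10 Giga Byte per second streaming I/O"

clock_hz = OPS_PER_SECOND / OPS_PER_CYCLE        # implied clock rate
cycle_ns = 1e9 / clock_hz                        # implied cycle time
bytes_per_cycle = STREAM_BYTES_PER_SECOND / clock_hz

print(f"implied clock:      {clock_hz / 1e6:.0f} MHz")
print(f"implied cycle time: {cycle_ns:.1f} ns")
print(f"streamed per cycle: {bytes_per_cycle:.0f} bytes")
```

The implied 4 ns cycle agrees with the "(4 ns)" quoted on the 3D texture/volume Hardware slide.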

  4. GenTera’s IMAGINE 3 HISC Processor Architecture
     HISC: Hierarchical Instruction Set Computer
     - RISC LEVEL: provides C and C++ compatibility.
     - VLIW LEVEL: a moderate-length VLIW instruction word plus a fully programmable bus interconnect directly controlled by the instruction code.
     - VARIABLE LENGTH VECTOR PROCESSING: enables up to 32 simultaneous and concatenated Vector Processing Streams. Word-based Vector Processing (32, 2x16, 4x8) is applied symmetrically throughout the entire architecture.
     - EXTENDED VECTOR PROCESSING: numerous function-specific Control Registers add extended functionality that is activated by the group of extended operations (as opposed to the basic operations). This increases the effective instruction word for vector operations to 1000+ bits.

  5. GenTera’s IMAGINE 3 Core Processor Examples of Basic Processor Stream performance (from external memory to external memory) Standard GUI functions: Screen to Screen Copy 2000 Mega pixels/s 8 bit pixels 500 Mega pixels/s 32 bit pixels 3 operand ROPS 1000 Mega pixels/s 8 bit pixels Bitmap to Color expansion 2000 Mega pixels/s 8 bit pixels Windows Direct Draw GUI functions: Pseudo to True Color 500 Mega pixels/s 8 bit pseudo to 16 bit or 32 bit colors True Color to Pseudo 500 Mega pixels/s 32,16 bit color to 8 bit pseudo color Z buffer aware copy 666 Mega pixels/s 8 bit pixels, 16 bit Z buffer 500 Mega pixels/s 16 bit pixels, 16 bit Z buffer Alpha Blended Copy 250 Mega pixels/s 32 bit ARGB pixels
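The screen-to-screen copy rates can be sanity-checked against the 4.2 Gigabyte/s memory bus quoted on the Building Blocks slide. The sketch below assumes every copied pixel is read once and written once, with no other traffic; that accounting is an assumption, not something the slides state.

```python
MEMORY_BUS_GB_PER_S = 4.2  # 128 bit DDR-SDRAM bus figure from the Building Blocks slide

def copy_traffic_gb_per_s(mega_pixels_per_s, bits_per_pixel):
    """Memory traffic for a copy, assuming one read and one write per pixel."""
    bytes_per_pixel = bits_per_pixel / 8
    return mega_pixels_per_s * 1e6 * bytes_per_pixel * 2 / 1e9

# Screen to Screen Copy rows from the slide: (Mega pixels/s, bits per pixel).
for rate, bits in ((2000, 8), (500, 32)):
    traffic = copy_traffic_gb_per_s(rate, bits)
    print(f"{rate} Mpixel/s at {bits} bit: {traffic:.1f} GB/s "
          f"({traffic / MEMORY_BUS_GB_PER_S:.0%} of the bus)")
```

Under this accounting both copy rates correspond to 4.0 GB/s, just below the quoted bus bandwidth.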

  6. GenTera’s IMAGINE 3 Core Processor Examples of Core Processor stream performance (2) (from external memory to external memory) Multi Media Functions: (numbers in result pixels/s) YUV to RGB conversion 500 Mega pixels/s ( 32 bit color, 16 bit hi-color, 8 bit pseudo) DCT and IDCT (8x8 blocks) 167 Mega pixels/s ( 16 bit values, 32 bit calculations) DCT and IDCT (8x8 blocks) 667 Mega pixels/s ( 8 bit values, 16 bit calculations) Photo shop type Image Processing Functions: (numbers in result pixels/s) 3x3 kernel convolution 2000 Mega pixels/s (8 bit pixels, 16 bit calculations) 7x7 kernel convolution 500 Mega pixels/s (8 bit pixels, 16 bit calculations) Bi-cubic Rotation 1000 Mega pixels/s (8 bit pixels, 16 bit calculations) Bi-cubic Scaling 1000 Mega pixels/s (8 bit pixels, 16 bit calculations) 3D graphics Geometry: (4x4) homogeneous transformations plus perspective divides for X , Y and Z for meshed triangles in 32 bit floating point (IEEE): 50 Million triangles/s

  7. GenTera’s IMAGINE 3 Core Processor Data Read Ports REG A0 VIO 0 A0 REG A1 VIO 1 A1 REG B0 DIO 0 B0 REG B1 DIO 1 B1 Data Write Ports Data Write Ports VIO WR Interconnect (100 % connectivity) REG WR0 DIO WR REG WR1 X0 MACX0 ALU X0 X1 MAC X1 ALU X1 Y0 MACY0 ALU Y0 Y1 MAC Y1 ALU Y1 Data Processing Units

  8. GenTera’s IMAGINE 3 Core Processor Control Register Busses Control reg bus 1 bits [63:32] A1/0 DIO A0/1 I3D1 A1/0 MES1 B1 RING1 B1 REG A1B1 ALU X1 ALU Y1 MAC X1 MAC Y1 VIO 1 B1/0 MSK1 VAU 1 bus interconnect VAU 0 A0/1 I3D0 B0 MES0 B0 RING0 A0B0 REG X0 ALU Y0 ALU X0 MAC Y0 MAC B0/1 VIO 0 MSK0 Control reg bus 0 bits [31:0] SEQ MTAB EMI

  9. GenTera’s IMAGINE 3 Instruction Word Highly orthogonal VLIW instruction word Data Processing Functions 36 24 12 0 63 59 48 Dd Wr0 B0 A0 Y0 X0 ND0= 0 Da Wr1 B1 A1 Y1 X1 127 123 112 100 88 76 64

  10. GenTera’s IMAGINE 3 Interconnect A0 A1 B0 B1 X0 X1 Y0 Y1 A0 A1 B0 B1 X0 X1 Y0 Y1 Select path 1 Select path 2 Data Processing Unit A0 A1 B0 B1 X0 X1 Y0 Y1 Instruction Word provides 8-way Interconnectivity In Scalar-Processing Mode Select path Data Write Port

  11. GenTera’s IMAGINE 3 Interconnect
      Instruction Word provides 100% Interconnectivity In Vector Processing Mode
      [Figure: Select path 1 and Select path 2 feed a Data Processing Unit, and a further select path feeds the Data Write Port. Each path is drawn with the same source list: A0 REG, A0 MEM, B0 REG, B0 MEM, X0 ALU, X0 MAC, Y0 ALU, Y0 MAC, A1 REG, A1 MEM, B1 REG, B1 MEM, X1 ALU, X1 MAC, Y1 ALU, Y1 MAC.]

  12. GenTera’s IMAGINE 3 Instruction Word Data processing instruction fields 24 20 16 12 8 4 0 Y0 X0 1 MAC path 1 path 2 1 MAC path 1 path 2 0 0 ALU path 1 path 2 0 0 ALU path 1 path 2 0 1 Shift, Ufu path 1 path 2 0 1 Shift, Ufu path 1 path 2

  13. GenTera’s IMAGINE 3 Instruction Word Data read ports instruction fields 32 28 24 48 44 40 36 A0 B0 memory port memory port 0 0 0 0 DIO read size 0 0 0 0 VIO function size register port register port 0 0 Be31 0 0 Be20 16 bit imm. [15:8] 16 bit imm. [7:0] size 0 1 register size 0 1 register 1 11 bit signed immediate size 1 0 control register

  14. GenTera’s IMAGINE 3 Instruction Word DIO address / data and (control-) register write ports fields 123 59 56 52 48 63 127 62 58 ND 0 DIO address DIO rd/wr Wr0 DIO address select DIO data select register port Non data- processing function wr addr x wr data path 0 register size rd addr x rd addr path 1 control register size

  15. GenTera’s IMAGINE 3 Parallel Conditional Processing
      64 bit Uniform Status Register, one 8 bit status field per byte:
      - [63:56] Status for Byte 7 (X1, Y1)
      - [55:48] Status for Byte 6 (X1, Y1)
      - [47:40] Status for Byte 5 (X1, Y1)
      - [39:32] Status for Byte 4 (X1, Y1)
      - [31:24] Status for Byte 3 (X0, Y0)
      - [23:16] Status for Byte 2 (X0, Y0)
      - [15:8] Status for Byte 1 (X0, Y0)
      - [7:0] Status for Byte 0 (X0, Y0)
      ALU Status (ALU, Shifts, Unary functions): Overflow, Carry, Minus, Zero (S0 C0 M0 Z0)
      MAC Status: Wrong, Lower, Higher, Inside (W0 L0 H0 I0)
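To make the register layout concrete, here is a minimal Python sketch that splits a 64 bit status value into its eight per-byte fields. The slide gives the byte positions and the flag names, but not the bit order inside each field; the sketch assumes the X unit occupies the upper nibble, the Y unit the lower nibble, and that the flags run from the top bit down in the order listed. Those choices, and all function names, are hypothetical.

```python
ALU_FLAGS = ("S", "C", "M", "Z")  # Overflow, Carry, Minus, Zero
MAC_FLAGS = ("W", "L", "H", "I")  # Wrong, Lower, Higher, Inside

def nibble_flags(nibble, names):
    """Names of the flags set in a 4 bit field (first name = top bit, assumed)."""
    return {name for bit, name in enumerate(names) if nibble & (0b1000 >> bit)}

def decode_status(register, names=ALU_FLAGS):
    """Split the 64 bit status register into eight per-byte entries."""
    lanes = []
    for byte in range(8):
        field = (register >> (8 * byte)) & 0xFF
        unit = 1 if byte >= 4 else 0  # bytes 4..7 belong to X1/Y1, bytes 0..3 to X0/Y0
        lanes.append({
            "byte": byte,
            f"X{unit}": nibble_flags(field >> 4, names),   # upper nibble: assumed
            f"Y{unit}": nibble_flags(field & 0xF, names),  # lower nibble: assumed
        })
    return lanes

status = 0x81 << (8 * 5)  # byte 5: X1 overflow flag and Y1 zero flag set
print(decode_status(status)[5])
```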

  16. GenTera’s IMAGINE 3 Parallel Conditional Processing
      Status: Generation, Collection and Application
      [Figure: block diagram linking the X0, Y0, X1 and Y1 ALU / MAC units, ports A0 / B0 and A1 / B1, VEC. REG., V0 / V1, MSK and VAU, with status lines numbered 0 to 7.]

  17. GenTera’s IMAGINE 3 Register File ADDRESSES GENERAL PURPOSE REGISTERS, VECTOR REGISTERS DATA PORTS I N T E R N A L B U S M A T R I X 8 x Write Indices Write Port C Input BUS select Write Data 2,4,8 x 256 vector registers 2 x 32 bit wide 4 x 16 bit wide 8 x 8 bit wide up to 24 independent and conditional byte addresses up to 8 independent and conditional byte write enables Write Port C Vector Index generators 8 x Read A Indices Read Port A Vector Index generators Read Port A output BUS register Read A Data 2,4,8 x A1 8 x Read B Indices Read Port B Vector Index generators A0 120 general registers 2 x 32 bit / 4 x16 bit / 8 x 8 bit Read Port B output BUS register 2 x Write Address Read B Data 2,4,8 x General Register Addresses From the Instruction Code B1 2 x Read A Address B0 2 x Read B Address

  18. GenTera’s IMAGINE 3 Function Units MAC Vector Registers 256 words x 64 bit A L U Arithmetic, Boolean, Shift / Rotate, Unary Functions 4 x 8, 2 x 16, 1 x 32 32 bit float MULTIPLIER (un)signed x (un)signed binary point at: end, middle or top graphics formats ( 0.0..1.0 == 00..ff ) 4 x 8, 2 x 16, 1 x 32 32 bit float ACCUMULATOR Variable Range Clamp

  19. GenTera’s IMAGINE 3 Multiplier / Accumulator
      8 bit Matrix functions:
      - Quad Inproduct (16 multiplies & 12 adds per MAC)
      - Matrixvec (16 multiplies & 12 adds per MAC)
      Diagram captions:
      - 32 bit input data into a 4-tap shift register (4 times for each byte)
      - 32 bit input data distributed to all four columns (4 times for 4 bytes)
      [Figure: two diagrams, each a grid of 16 cells labelled 8 bit / 16 bit / 16 bit, followed by four 16 bit outputs.]

  20. GenTera’s IMAGINE 3 Multiplier / Accumulator
      8 bit Matrix functions: Open GL Blend Function (8 multiplies & 4 adds per MAC)
      [Figure: two diagrams, each captioned "32 bit input data into a 4-tap shift register (4 times for each byte)", each with four cells labelled 8 bit / 16 bit / 16 bit, followed by four 16 bit outputs.]
      Coefficients fixed or derived from the input operands:
       0 BLEND_CONSTANT
       1 BLEND_ZERO
       2 BLEND_ONE
       3 SRC_COLOR
       4 INV_SRC_COLOR
       5 SRC_ALPHA
       6 INV_SRC_ALPHA
       7 DST_ALPHA
       8 INV_DST_ALPHA
       9 DST_COLOR
      10 INV_DST_COLOR
      11 SRC_ALPHA_SATURATE
      12 BOTH_SRC_ALPHA (source), BOTH_SRC_ALPHA (dest)
      13 BOTH_INV_SRC_ALPHA (source), BOTH_INV_SRC_ALPHA (dest)
      14 MAX_INTENSITY (source), MAX_INTENSITY (dest)
      15 MIN_INTENSITY (source), MIN_INTENSITY (dest)
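The "8 multiplies & 4 adds" count matches a four-channel blend in which each channel computes source x source factor + destination x destination factor. A minimal Python sketch of that arithmetic follows, using the graphics format from the Function Units slide (0.0..1.0 == 00..ff). Only a handful of the listed factors are modelled, and the rounding and the function names are assumptions made for illustration, not the chip's documented behaviour.

```python
def factor(name, src, dst):
    """Per-channel blend factors (0..255) for pixels given as (A, R, G, B) tuples."""
    src_alpha = src[0]
    table = {
        "BLEND_ZERO": (0,) * 4,
        "BLEND_ONE": (255,) * 4,
        "SRC_COLOR": src,
        "DST_COLOR": dst,
        "SRC_ALPHA": (src_alpha,) * 4,
        "INV_SRC_ALPHA": (255 - src_alpha,) * 4,
    }
    return table[name]

def blend(src, dst, src_factor, dst_factor):
    """Blend two ARGB pixels: 2 multiplies and 1 add per channel, then saturate."""
    sf = factor(src_factor, src, dst)
    df = factor(dst_factor, src, dst)
    out = []
    for s, d, fs, fd in zip(src, dst, sf, df):
        value = (s * fs + d * fd + 127) // 255  # rounding is an assumption
        out.append(min(255, value))             # saturate to 8 bits
    return tuple(out)

# Classic alpha blend with an opaque source: the source pixel wins.
print(blend((255, 200, 100, 50), (255, 10, 20, 30), "SRC_ALPHA", "INV_SRC_ALPHA"))
```

With both factors set to BLEND_ONE the sum can exceed 255, which is where the saturation step matters.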

  21. GenTera’s IMAGINE 3 Multiplier / Accumulator
      16 bit Matrix functions:
      - Convolute (4 multiplies & 2 adds per Multiplier)
      - Transform (4 multiplies & 2 adds per Multiplier)
      Diagram captions:
      - 32 bit input data into a 2-tap shift register (2 times for each 16 bit word)
      - 32 bit input data distributed to both columns (2 times for each 16 bit word)
      [Figure: two diagrams, each with four cells labelled 16 bit / 32 bit / 32 bit, followed by two 16 bit labels.]
      Mix:
        MH [63:32] = Coef10 [31:0] . Mb [31:16] + Coef11 [31:0] . Ma [31:16]
        ML [31:0]  = Coef00 [31:0] . Mb [15:0]  + Coef01 [31:0] . Ma [15:0]
      Merge:
        MH [63:32] = Coef10 [31:0] . Ma [31:16] + Coef11 [31:0] . Ma [15:0]
        ML [31:0]  = Coef00 [31:0] . Mb [31:16] + Coef01 [31:0] . Mb [15:0]
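The Mix and Merge formulas translate directly into code. The Python sketch below follows them term by term; it treats all operands as unsigned and wraps each half to 32 bits, which is an assumption, since the slide does not specify signedness or overflow handling. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def hi16(word):
    """Bits [31:16] of a 32 bit word."""
    return (word >> 16) & 0xFFFF

def lo16(word):
    """Bits [15:0] of a 32 bit word."""
    return word & 0xFFFF

def mix(ma, mb, c00, c01, c10, c11):
    """Mix: high halves combine into MH, low halves into ML."""
    mh = (c10 * hi16(mb) + c11 * hi16(ma)) & MASK32
    ml = (c00 * lo16(mb) + c01 * lo16(ma)) & MASK32
    return (mh << 32) | ml

def merge(ma, mb, c00, c01, c10, c11):
    """Merge: both halves of Ma combine into MH, both halves of Mb into ML."""
    mh = (c10 * hi16(ma) + c11 * lo16(ma)) & MASK32
    ml = (c00 * hi16(mb) + c01 * lo16(mb)) & MASK32
    return (mh << 32) | ml

ma, mb = 0x00030004, 0x00050006
print(hex(mix(ma, mb, 1, 2, 3, 4)))    # MH = 3*5 + 4*3, ML = 1*6 + 2*4
print(hex(merge(ma, mb, 1, 2, 3, 4)))  # MH = 3*3 + 4*4, ML = 1*5 + 2*6
```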

  22. GenTera’s IMAGINE 3 Multiplier/Accumulator
      Each of the 4 Multiplier/Accumulators handles all operations by utilizing the same hardware!
      Operating modes:
      - 32 x 32 bit extern, 32 x 32 bit intern, 32 x 32 bit floating point, 64 bit accumulate
      - 16 x 16 bit extern, 16 x 32 bit intern, 32 bit accumulate
      - 8 x 8 extern, 8 x 16 intern
      Imagine 3 operations per cycle:
      - 64: 8x16 bit: quad in-product (4 comp.)
      - 64: 8x16 bit: 4x4 matrix x vector
      - 32: 8x16 bit: Open GL blending functions
      - 16: 16x16 bit: in-product, cross-product
      - 16: 16x16 bit: complex product
      - 16: 16x32 bit: FIR filter
      - 16: 16x32 bit: in-product, cross-product
      - 16: 16x32 bit: complex product

  23. GenTera’s IMAGINE 3 Vector processing
      Variable length vector processing made simple.
      [Figure: dataflow graph of the example, with positions numbered 1 to 35.]
      ACTUAL ASSEMBLY CODE FOR THE EXAMPLE ABOVE:
      repeat, graph (label_1);;;
      label_1:
        genad(A0) => B0=input, A0=rd4x8(ri) => X0=mult(A,V,nuu ) ===>
        genad(A1) =>A1=rd4x8(ri) => Y0=subsat(X0,A1), B1=rd4x8(RING_Data) => X1=mult(Y0,B1,nus) ===>
        DA=Again ==> D0=word4x8(uI), X0=addsat(X1,D0) => Y0=matxvec(X0), Y1=inproduct(X0) =====>
        X1=addsat(Y0,Y1) => outputV1;

  24. GenTera’s IMAGINE 3 10 Gigabyte Streaming I/O The Imagine 3 core can stream data from memory or other processors at 10 GByte/sec. (Compared to 0.48 GByte/sec. for the Imagine 1 ) VECTOR UNITS: Simultaneous input and output to and from memory IMAGINE 3 Internal Data Processing Core Dataflow Ring input Dataflow Ring output DATA CACHE or 3D GRAPHICS /VOLUME pipelines INPUT AND OUTPUT

  25. GenTera’s IMAGINE 3 Non-aligned S I M D SIMD processing made simple with non-aligned memory accesses (No complex time-consuming shift-mask-merge operations needed) 8 bit 8 bit 8 bit 8 bit 8 bit 8 bit 8 bit 8 bit 32 bit memory word 32 bit memory word 32 bit memory word 32 bit word
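For comparison, this is roughly what the shift-mask-merge sequence looks like when a 32 bit word has to be assembled in software from two aligned memory words. The Python sketch is illustrative only: it assumes little-endian byte order and a hypothetical helper name, and it models the work the slide says the Imagine 3 avoids, not anything the chip executes.

```python
def load_unaligned_32(words, byte_offset):
    """Read a 32 bit word at any byte offset from a list of aligned 32 bit words."""
    index, byte_in_word = divmod(byte_offset, 4)
    shift = 8 * byte_in_word
    low = words[index] >> shift                # shift: drop bytes before the offset
    if shift == 0:
        return low & 0xFFFFFFFF                # aligned case needs no merge
    high = words[index + 1] << (32 - shift)    # shift: bring in bytes from the next word
    return (low | high) & 0xFFFFFFFF           # merge, then mask to 32 bits

memory = [0x44332211, 0x88776655]  # bytes 11 22 33 44 55 66 77 88 (little-endian)
print(hex(load_unaligned_32(memory, 1)))  # bytes 22 33 44 55
```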

  26. GenTera’s IMAGINE 3 Non Aligned Vector Accesses 2 Input and 2 output vectors simultaneous 32 bit words 2 x 16 bit words 16 bit words 4 x 8 bit words 8 bit words 2 x 8 bit words

  27. GenTera’s IMAGINE 3 Memory Vector Accesses I m a g i n e 3 P r o c e s s o r C o r e E x t e r n a l M e m o r y I n t e r f a c e Vector I/O Vector Access Units: up to 32 vectors in flight data/color input conversion 2D restructuring Vector pipeline 2 kB Vector pre-fetch buffer data/color input conversion 2D restructuring Vector pipeline 2 kB Vector pre-fetch buffer data/color output conversion 2D restructuring Vector pipeline 2.25 kB Vector write buffer data/color output conversion 2D restructuring Vector pipeline 2.25 kB Vector write buffer Mask Unit 256 pixels / voxels Mask Unit 256 pixels / voxels

  28. GenTera’s IMAGINE 3 1, 2 and 3D memory management
      2D tiles (X, Y), each labelled 1 MByte PAGE:
      - 1024 x 1024 8 bit pixel TILE
      - 512 x 1024 16 bit pixel TILE
      - 256 x 1024 32 bit pixel TILE
      3D bricks (X, Y, Z):
      - 256 x 128 x 128 8 bit voxel BRICK
      - 128 x 128 x 128 16 bit voxel BRICK
      - 64 x 128 x 128 32 bit voxel BRICK
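The tile and brick dimensions can be checked by computing their memory footprints. The Python sketch below only multiplies out the dimensions given on the slide; the dictionary layout and function name are illustrative.

```python
from math import prod

MIB = 1 << 20  # 1 MByte, taken here as 2**20 bytes

# Dimensions from the slide, keyed by bits per pixel / voxel.
TILES = {8: (1024, 1024), 16: (512, 1024), 32: (256, 1024)}
BRICKS = {8: (256, 128, 128), 16: (128, 128, 128), 32: (64, 128, 128)}

def footprint_bytes(dims, bits_per_element):
    """Memory footprint of a tile or brick with the given dimensions."""
    return prod(dims) * bits_per_element // 8

for bits, dims in TILES.items():
    print(f"tile  {dims} at {bits} bit: {footprint_bytes(dims, bits) / MIB:.0f} MByte")
for bits, dims in BRICKS.items():
    print(f"brick {dims} at {bits} bit: {footprint_bytes(dims, bits) / MIB:.0f} MByte")
```

Every tile comes out at exactly 1 MByte, matching the page label, while every brick with the listed dimensions occupies 4 MByte. The transcript does not say how bricks map onto pages.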

  29. GenTera’s IMAGINE 3 3D texture/volume Hardware
      Very High Quality, 220 Billion operations/sec: 2 x 440 operations per cycle (4 ns)
      - Texture Quality: Bilinear, Trilinear and QUAD interpolation.
      - Texture Types: 32 bit ARGB, 16 bit (4 types), 8, 4, 2 and 1 bit pseudo color, 16 bit and 32 bit greyscale (signed and unsigned), 2x16 bit complex.
      - Texture Size: 16,384 x 16,384 max (2D), 2048 x 2048 x 2048 max (3D).
      - Texture Dimension: 1, 2 and 3 dimensional textures.
      - Texture Clamping: Clamp and Wrap for all 3 co-ordinates.
      - Texture Border: 0 or 1 pixel texture borders, Border Color supported.
      - Texture MIP maps: up to 16 levels, selection made for each individual pixel.
      - Perspective division for all 9 parameters: S, T, R, Alpha, Red, Green, Blue, Fog, Z.
      - Perspective Correct Texture Mapping, Perspective Correct Texture Lighting, Perspective Correct Linear and Exponential (2 types) Fog, Perspective Correct Depth Buffering.

  30. GenTera’s IMAGINE 3 3D graphics Pipelines Perspect. MIP map processing pipeline Texel Interp./ Lighting control unit 3D graphics pipeline control unit External Memory with MIP Map Textures 4 - 6 stages Memory Access Input Fifo / Port Select Memory Access Re-order buffers Perspective 3D co-ordinate Generator 5 stages Perspective MIP Map Addresses Calculations 2 stages Texel Selection / Expansion Texel Color Look Up Memory Access Data Load unit D BUS Bresenham Edge Start Interpolators (Q,R,S,T,Z-1) (F,A,R,G,B) Pixel Value Interpolators (Q,R,S,T,Z-1) (F,A,R,G,B) Vector Start Interpolators (Q,R,S,T,Z-1) (F,A,R,G,B) Texel Interpolation / Lighting Summation stage Texel Interpolation / Lighting Multiply stage Perspective Interpolation Coefficients Memory Access Internal Delay Line for Interpolation, Lighting & Fog Coefficients 3 - 17 stages Perspective 3D correct Lighting 5 stages Texel Interpolation / Lighting coefficients generator Perspective Lighting & Fog Coefficients

  31. GenTera’s IMAGINE 3 3D texture/volume Hardware 3D graphics Pipeline + Core stream performance (from external memory to external memory) Direct Draw functions: (numbers in result pixels/s) Bilinear Image Scale: 333 Mega pixels/s (32 bit gray scale or 32 bit color pixels ) Bilinear Image Rotate: 333 Mega pixels/s (32 bit gray scale or 32 bit color pixels ) Bilinear Affine Transform: 333 Mega pixels/s (32 bit gray scale or 32 bit color pixels ) MPEG functions: (numbers in result pixels/s) Bilinear Scaling plus kYUV to αRGB 333 Mega pixels/s (32 bit αRGB pixels) 3D functions: (numbers in result pixels/sec) Z-buffered, Perspective Correct, Bilinear Interpolated Texture mapping with perspective correct lighting and exponential fog (Texture size up to 16k x 16k), MIP-Mapping: 300 Mega pixels/sec. (32 bit αRGB pixels, 16 bit hi-color, 8 bit pseudo, 16 bit Z values)

  32. GenTera’s IMAGINE 3 Fan Beam Back projection The 3D Texture/Volume pipelines and the Multiplier / Accumulators in the Imagine 3 can handle eight 16 bit linear interpolated samples per cycle with 32 bit accuracy. Vector Direction Back Projection Direction

  33. GenTera’s IMAGINE 3 Cone beam reconstruction The back projection in cone beam systems requires the inverse perspective mapping from filtered images back to a 3D volume. The Imagine 3 performs this directly with its 3D volume pipelines.

  34. GenTera’s IMAGINE 3 De-blur filtering
      FIR filter performance (16 bit input, 32 bit calculations):
      - 128 taps: 32 Mega-pixels / second
      - 256 taps: 16 Mega-pixels / second
      - 512 taps: 8 Mega-pixels / second
      Filtered Backprojection for Medical Imaging, 324 x 512 to 256 x 256 (324 projections of 512 values, 256 x 256 result image):
      - De-blur filtering: 10 ms (256 taps)
      - Backprojection: 11 ms
      - Reconstruction: 21 ms
      Filtered Backprojection for Medical Imaging, 840 x 928 to 512 x 512 (840 projections of 928 values, 512 x 512 result image):
      - De-blur filtering: 100 ms (512 taps)
      - Backprojection: 108 ms
      - Reconstruction: 208 ms
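These FIR numbers are roughly what one would expect from the "16: 16x32 bit: FIR filter" operations per cycle listed on the Multiplier/Accumulator slide. The Python sketch below reproduces them under two assumptions that are not stated on this slide: a 250 MHz clock (inferred from 80 billion operations per second at 320 per cycle) and one multiply-accumulate per filter coefficient per output sample.

```python
MACS_PER_CYCLE = 16   # "16: 16x32 bit: FIR filter" (Multiplier/Accumulator slide)
CLOCK_HZ = 250e6      # assumed: 80e9 operations/s divided by 320 operations/cycle

def fir_samples_per_second(taps):
    """Output rate if every output sample costs one MAC per filter coefficient."""
    return MACS_PER_CYCLE * CLOCK_HZ / taps

def filter_time_ms(projections, values_per_projection, taps):
    """Time to filter every sample of every projection."""
    samples = projections * values_per_projection
    return samples / fir_samples_per_second(taps) * 1e3

for taps in (128, 256, 512):
    print(f"{taps} coefficients: {fir_samples_per_second(taps) / 1e6:.2f} M samples/s")
print(f"324 x 512, 256 coefficients: {filter_time_ms(324, 512, 256):.1f} ms")
print(f"840 x 928, 512 coefficients: {filter_time_ms(840, 928, 512):.1f} ms")
```

The model gives rates slightly below the rounded slide figures and filtering times within about a millisecond of the quoted 10 ms and 100 ms.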

  35. GenTera’s IMAGINE 3 De-blur filtering (FFT) Complex input Fast Fourier Transform performance (vectorized) 32 bit Floating Point 32 bit Integer 16 bit Integer 256 Point: 8 μs 4 μs 2.0 μs 512 Point: 18 μs 9 μs 4.4 μs 1024 Point: 40 μs 20 μs 10 μs 2048 Point: 88 μs 44 μs 22 μs 4096 Point: 192 μs 96 μs 48 μs 8192 Point: 436 μs 218 μs 109 μs 16384 Point: 896 μs 448 μs 224 μs 1200 projections of 960 values 512 x 512 result image Filtered Back-projection for Medical Imaging 1200 x 960 to 512 x 512 FFT filtering 106 ms (2048 point FP) Back-projection 157 ms Reconstruction 263 ms
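The 32 bit floating point column scales the way an FFT is expected to, in proportion to N log2 N. The Python sketch below fits the quoted times with N log2(N) / 256 microseconds; the constant 256 is simply read off the first table entry and is not a figure from the slides.

```python
import math

# 32 bit floating point column from the slide: points -> microseconds.
QUOTED_FP32_US = {256: 8, 512: 18, 1024: 40, 2048: 88,
                  4096: 192, 8192: 436, 16384: 896}

def model_us(points):
    """N log2 N scaling, with the constant fitted to the 256 point entry."""
    return points * math.log2(points) / 256

for points, quoted in QUOTED_FP32_US.items():
    print(f"{points:6d} points: quoted {quoted:4d} us, model {model_us(points):6.1f} us")

# FFT filtering line: 1200 projections, one 2048 point FP transform each.
filtering_ms = 1200 * QUOTED_FP32_US[2048] / 1000
print(f"1200 x 2048 point FFT: {filtering_ms:.1f} ms")
```

The model matches every quoted entry except 8192 points, where the slide gives 436 μs against 416 μs from the model; the transcript offers no explanation, so the quoted value is left as it stands. The 105.6 ms result corresponds to the quoted 106 ms of FFT filtering if one 2048 point transform is counted per projection.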

  36. GenTera’s IMAGINE 3 Radar Display Processing Cartesian to Polar conversion with bi-linear interpolation 32 bit colors: 250 Mega-pixels /second

  37. GenTera’s IMAGINE 3 Motion Estimators
      Motion Estimation Unit for MPEG1…MPEG4 video encoding, 100 Billion operations / second
      - software controllable
      - arbitrary MxN kernel sizes up to 256 by 256
      - arbitrary search space sizes up to 4096 by 4096 for HDTV and higher
      - allows optimizing algorithms (reduced search space)
      - forward and backward prediction
      - vector processing co-operation with core for bi-cubic pixel interpolation / rotation
      Performance: compare a 16x16 pixel block with any other 16x16 pixel block (half, quarter, 1/8th, 1/16th pixels with bi-cubic interpolation): 120 Million Block Compares / second
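A block compare can be pictured with a small software model. The slide does not name the matching metric, so the Python sketch below uses the sum of absolute differences (SAD), a common choice in video encoders, purely as an assumption; the exhaustive search and all function names are likewise illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))

def crop(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def best_match(block, frame, size):
    """Exhaustive search: return (cost, top, left) of the best matching position."""
    best = None
    for top in range(len(frame) - size + 1):
        for left in range(len(frame[0]) - size + 1):
            cost = sad(block, crop(frame, top, left, size))
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best

frame = [[4 * y + x for x in range(4)] for y in range(4)]
print(best_match(crop(frame, 1, 2, 2), frame, 2))

# Pixel comparisons implied by the quoted rate for 16x16 blocks.
pixel_compares_per_second = 120e6 * 16 * 16
print(f"{pixel_compares_per_second / 1e9:.2f} billion pixel comparisons / second")
```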

  38. GenTera’s IMAGINE 3 Graphics Mask Generators
      Generates Transparent and Opaque Masks for 512 pixels; multiple units work in parallel:
      - Window Mask Generator: automatically clips pixels outside the View Port (scissoring)
      - Span line Mask Generator: for Concave Polygons and arbitrary Objects
      - Range Mask Generator: for Depth Buffer Tests, Stencil Buffer Tests, Alpha Test, Chroma Keying Tests et cetera
      - Complex Mask Generator: for Concave and Complex Polygons according to the odd/even or winding rules
      - Alpha Mask Generator: for objects with partially covered pixels

  39. GenTera’s IMAGINE 3 Graphics Mask Generators
      - The Window is defined by the Window registers (Window X min / max, Window Y min / max).
      - The Spanline registers define the outlines of the triangle (Spanline Y min / max, Spanline 0 to 3 Start / End).
      - The Complex Mask (Complex mask 0 to 3) is used in this example to hold the Polygon Stipple pattern.
      - The Range Mask (Range mask 0 to 3) contains the result of the Depth buffer test (overlapping triangle).
      Other diagram labels: Spanline Delta Start, Spanline Delta End, Spanline Address, Spanline Length (-1), Overlap triangle.

  40. GenTera’s IMAGINE 3 Multi media I/O units
      Video Output:
      - (Α), R, G, B outputs with 330 MHz dot clock for 1800 x 1400 screen format at 90 Hz.
      - 12 (16) bit video out for Studio Quality video processing.
      - Interface to DVI-TFT transmitters for high resolution, high quality LCD displays.
      Video Input:
      - CCIR 656: 8 bit digital video input for NTSC, PAL, SECAM, HDTV and custom formats
      Audio Codec 97 Interface:
      - Standard from Intel, Creative Labs, Yamaha, Analog Devices and National Semiconductor
      - Supports Analog speakers, Microphone, Headphone + Headphone micro, Telephony and Modem signals, CD analog audio in, Analog Video Sound In, PC beep in, et cetera
      Digital Audio:
      - 4 stereo serial I/O ports (I2S type and S type emulation capabilities)
      - Supports CD, DVD and Dolby AC3 input or output
      External Device Control:
      - 8 bit classic μP interface bus and I2C type emulation capability
      - MIDI interface (Input and output for synthesizers and keyboards)

  41. GenTera’s IMAGINE 3 Real Time Support
      MULTI MEDIA REAL TIME SUPPORT
      Level 1 Events (1 microsecond response time requirement):
      - Horizontal Sync interrupts, Video I/O interrupts, Register Virtualization interrupts
      Level 2 Events (2 - 100 microsecond response time requirement):
      - Communication Fifo interrupts, Mailbox Interrupts, I2S Fifo Interrupts, AC97 Fifo Interrupts, MIDI Interrupt, I2C interrupt, Vertical Sync Interrupts, Scheduler Clock Tick, et cetera
      Threads (100 microsecond - 10 millisecond response time requirement):
      - Host Command Queues Manager
      - Audio Stream managers
      - Modem Stream managers
      - User definable threads

  42. GenTera’s IMAGINE 3 High-end Board 8 Processors: 3.2 Tera operations/s 4 GigaByte memory IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3

  43. GenTera’s IMAGINE 3 High-end Board 8 Imagine 3 processors, 3200 Billion operations per second 32 GigaByte per second Memory Bandwidth 16 GigaByte per second Inter-Processor Bandwidth - Perspective Volume Rendering: 1000 x 1000 x 1000 at 15 frames/second (based on 25% volume traversal) - Cone Beam Reconstruction: 512 x 512 x 512 from 10002x128 in 4 seconds - Real Time 3D ultra sound reconstruction and visualization - Real Time HDTV MPEG 4 video encoding - Advanced Radar Processing
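The board totals follow from the per-chip figures on the Building Blocks slide. The Python sketch below adds them up; it assumes the three per-chip operation rates (core, 3D pipelines, motion estimator) are simply summed, and it pairs the 16 GigaByte per second inter-processor figure with the 2.0 Gigabyte/s Dataflow Ring output of each chip. Both pairings are inferences from the numbers, not statements in the deck.

```python
PROCESSORS = 8

# Per-chip figures from the Building Blocks slide.
CORE_GOPS, GRAPHICS_GOPS, MOTION_GOPS = 80, 220, 100
MEMORY_GB_PER_S = 4.2
RING_GB_PER_S = 2.0

chip_gops = CORE_GOPS + GRAPHICS_GOPS + MOTION_GOPS
board_gops = PROCESSORS * chip_gops
board_memory_gb_per_s = PROCESSORS * MEMORY_GB_PER_S
board_ring_gb_per_s = PROCESSORS * RING_GB_PER_S

print(f"per chip: {chip_gops} billion operations/s")
print(f"board:    {board_gops} billion operations/s")
print(f"memory:   {board_memory_gb_per_s:.1f} GB/s")
print(f"ring:     {board_ring_gb_per_s:.1f} GB/s")
```

The operation count and the ring bandwidth match the slide exactly. The memory sum comes to 33.6 GB/s against the quoted 32 GigaByte per second; the transcript does not explain the difference.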

  44. GenTera’s IMAGINE 3 High Speed Dataflow Ring Up to 2 Gigabyte per second Dataflow Ring (SSTL-2) Point-to-point with Broadcast options and auto configuration IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3

  45. GenTera’s IMAGINE 3 High Speed System I/O The Dataflow Ring also provides very high speed System I/O. Entry level system can use the programmable Video Data I/O for general purpose I/O. ( 160 MB/s per processor, 1 GB/s per processor ) Video out 1 GB/s Video In 160 MB/s Data- flow input: Up to 2.0 GB/s IMAGINE 3 Optional System I/O FPGA e.g: Xilinx Virtex II IMAGINE 3 IMAGINE 3 IMAGINE 3 Data- Flow Output: Up to 2.0 GB/s IMAGINE 3 IMAGINE 3 IMAGINE 3 IMAGINE 3

  46. GenTera’s IMAGINE 3 Pipeline Processing The Dataflow Ring allows long vector processing pipelines over multiple processors. Here an example with just 2 processors Vector Read from memory Vector Write to memory Vector Read from memory Vector Write to memory 256 entry vector register ALU MAC as FIR filter ALU Dataflow Ring Dataflow Ring Dataflow Ring MAC as 3D blend unit ALU Bi linear Interpolated Data from the Graphics pipeline Bi linear Interpolated Data from the Graphics pipeline

  47. GenTera’s IMAGINE 3 128 bit memory bus (reads) 16 kbyte 1st Level data cache 16 kbyte 1st Level instruction cache 128 bit Dual 3D-graphics pipelines Dual 128 word x 128 bit Vector input fifo’s PCI/AGP Memory Read access Video Output 128 word x 128 bit fifo 4.2 Gigabyte /second Memory Bus: 128 bit PC2100

  48. GenTera’s IMAGINE 3 128 bit memory bus (writes) 4.2 Gigabyte /second Memory Bus. (128 bit PC2100) 16 kbyte 1st level data cache 16 word x 128 bit write buffer PCI/AGP Memory Write access 128 bit Dual 128 word x 128 bit Vector output fifos 8-fold address interleaved memory reads and writes. Out of order accesses with coherency checking
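The 4.2 Gigabyte/second figure is consistent with the peak rate of a 128 bit bus built from PC2100 memory. The Python sketch below works it out from the standard PC2100 (DDR-266) timing of two transfers per 133 MHz clock; those timing constants come from the memory standard, not from the slides.

```python
BUS_BITS = 128                 # memory bus width from the slide
MEMORY_CLOCK_HZ = 133.33e6     # PC2100 / DDR-266 clock (from the DDR standard)
TRANSFERS_PER_CLOCK = 2        # double data rate: both clock edges carry data

bytes_per_transfer = BUS_BITS // 8
peak_bytes_per_second = bytes_per_transfer * MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK

print(f"peak bandwidth: {peak_bytes_per_second / 1e9:.2f} GB/s")
```

The result, about 4.27 GB/s, rounds to the figure quoted on both memory bus slides.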

  49. GenTera’s IMAGINE 3 END GenTera’s IMAGINE3 HANS DE VRIES
