Multi-core SOC for Future Media Processing

Qin Xing, Yan Xiaolang

The Institute of VLSI Design, Zhejiang University

Outline
  • Opportunities & challenges from media processing
  • Multimedia algorithm characteristics & mapping
  • Multi-core SOC architecture & technology
  • Benchmarking results
  • Project status
  • Future work

Opportunities
  • Video conference
  • IP-phone
  • Smart terminal
  • PDA
  • Video camera
  • HDTV
  • Set-top box

Challenges: multiple standards

[Figure: bit rate in Mbit/s (axis 0 to 6) against year (1994 to 2005) for five encoder generations, from the 1st MPEG-2 encoder to the 5th generation encoder. Standards marked on the chart: MPEG-2, MPEG-4, H.263, H.26L, H.264 / MPEG-4 Part 10, WMV, VP3, AVS.]

Challenges: excellent hardware
  • Very high computation complexity
    • H.264 encoding of 720 x 576 pixels @ 30 frames/s needs up to 30 GOPS
  • Multiple standards co-exist
    • Demands of flexibility & programmability
  • Low power
  • Low cost

Best choice: application-specific instruction-set processor (ASIP)
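To put the 30 GOPS figure in perspective, the sketch below converts it into an average operation budget per pixel. The function names are hypothetical and the calculation is simple arithmetic on the slide's numbers, not a measurement.

```c
#include <stdint.h>

/* Pixels processed per second for a given frame size and frame rate. */
uint64_t pixels_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Average operations available per pixel for a total budget given in
 * operations per second (hypothetical helper, for illustration only). */
double ops_per_pixel(double ops_per_second,
                     uint32_t width, uint32_t height, uint32_t fps)
{
    return ops_per_second / (double)pixels_per_second(width, height, fps);
}
```

With the slide's figures, 720 x 576 at 30 frames/s is 12,441,600 pixels/s, so 30 GOPS corresponds to roughly 2,400 operations per pixel.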

Multimedia algorithm characteristics
  • Outer loop and inner loop
    • Outer loop:
      • Interface (GUI)
      • OS (Linux)
      • Bit-stream parsing (pack/unpack, VLC, CABAC)
      • Data transfer
    • Inner loop:
      • Regular algorithms (prediction, FIR, DCT, motion estimation)

Multimedia algorithm mapping
  • Programmable and heterogeneous processors are the preferred choice for the implementation
    • General MCU (RISC core): outer loop
    • Enhanced DSP (EDSP, with bit-wise operations): outer loop
    • Vector processor (VP, VLIW + SIMD): inner loop
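The mapping above can be sketched as a small lookup. The enum names and the function are hypothetical. The slides assign outer-loop work to the MCU and the EDSP and inner-loop work to the vector processor; how outer-loop tasks are split between the MCU and the EDSP is an assumption made here for illustration (bit-stream parsing goes to the EDSP because of its bit-wise operations).

```c
/* Hypothetical task identifiers, named after the slide's categories. */
typedef enum {
    TASK_GUI, TASK_OS, TASK_DATA_TRANSFER,       /* outer loop */
    TASK_BITSTREAM_PARSING,                      /* outer loop, bit-level work */
    TASK_PREDICTION, TASK_FIR, TASK_DCT,
    TASK_MOTION_ESTIMATION                       /* inner loop */
} media_task;

/* The three processor classes named on the slide. */
typedef enum { CORE_MCU, CORE_EDSP, CORE_VP } core_kind;

/* Map a task to a processor class. The MCU/EDSP split is an assumption. */
core_kind map_task_to_core(media_task t)
{
    switch (t) {
    case TASK_GUI:
    case TASK_OS:
    case TASK_DATA_TRANSFER:
        return CORE_MCU;    /* control-dominated outer-loop work */
    case TASK_BITSTREAM_PARSING:
        return CORE_EDSP;   /* pack/unpack, VLC, CABAC */
    default:
        return CORE_VP;     /* regular, data-parallel inner loops */
    }
}
```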

Multi-core SOC architecture
  • Top level

Media processing kernel

Inside the media processing kernel

[Block diagram. Labeled blocks: EDSP control path; vector control path; GAG1 to GAG4; GDM; GTM; V-DM1 to V-DM4; 2D crossbar connection network; E-DP; V-DP1 to V-DP4; DMA and off-chip memories.]

Technologies: specialized instruction set

    __asm {
        mov      edx, mptr
        movdqu   xmm1, [edx]
        packssdw xmm1, xmm1      // read m5[0] from memory to xmm1
    }
    __asm {
        movdqu   xmm4, [edx + 48]
        packssdw xmm4, xmm4      // read m5[3] from memory
    }
    __asm {
        movq     xmm5, xmm1
        psubw    xmm1, xmm3      // m6[1] = (m5[0] - m5[2]);
        paddw    xmm3, xmm5      // m6[0] = (m5[0] + m5[2]);
        movq     xmm5, xmm2
        psraw    xmm2, 1
        psubw    xmm2, xmm4      // m6[2] = (m5[1] >> 1) - m5[3]
        psraw    xmm4, 1
        paddw    xmm4, xmm5      // m6[3] = m5[1] + (m5[3] >> 1)
    }

    for (j = 0; j < BLOCK_SIZE; j++) {
        for (i = 0; i < BLOCK_SIZE; i++) {
            m5[i] = img->cof[i0][j0][i][j];
        }
        m6[0] = (m5[0] + m5[2]);
        m6[1] = (m5[0] - m5[2]);
        m6[2] = (m5[1] >> 1) - m5[3];
        m6[3] = m5[1] + (m5[3] >> 1);
    }

Integer IDCT in H.264: 13 cycles with Intel MMX, 6 cycles with our instruction set (IS).
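For reference, here is a minimal scalar sketch of the butterfly that both listings compute. `idct4_stage1` mirrors the four `m6` assignments from the slide. `idct4_stage2` is not on the slide; it is the usual second butterfly of the H.264 4-point inverse transform, added only to show how `m6` is consumed. Both function names are hypothetical.

```c
/* First butterfly stage of the 4-point inverse transform, as on the slide.
 * Note: >> on a negative int is implementation-defined in C; common
 * compilers perform an arithmetic shift, which is what the transform expects. */
void idct4_stage1(const int m5[4], int m6[4])
{
    m6[0] = m5[0] + m5[2];
    m6[1] = m5[0] - m5[2];
    m6[2] = (m5[1] >> 1) - m5[3];
    m6[3] = m5[1] + (m5[3] >> 1);
}

/* Second butterfly stage (assumption: standard H.264 form, not from the slide). */
void idct4_stage2(const int m6[4], int out[4])
{
    out[0] = m6[0] + m6[3];
    out[1] = m6[1] + m6[2];
    out[2] = m6[1] - m6[2];
    out[3] = m6[0] - m6[3];
}
```

Each stage is a handful of independent adds, subtracts, and shifts, which is why it maps well onto SIMD instructions that process several coefficients at once.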

Technologies: instruction merging

6-tap sub-pixel interpolation

    result = 0;
    pres_y = dy == 1 ? y_pos : y_pos + 1;
    pres_y = max(0, min(maxold_y, pres_y));           // load
    for (x = -2; x < 4; x++)                          // control
    {
        pres_x = max(0, min(maxold_x, x_pos + x));    // load
        result += imY[pres_y][pres_x] * COEF[x + 2];  // computation, permutation and load
    }
    result1 = max(0, min(255, (result + 16) / 32));   // computation

Instruction mix before merging:
  • Load/store: 30%
  • Permutation: 25%
  • Computation: 35%
  • Control: 10%

After merging:
  • Load/store and permutation merged
  • Computation
  • Control

Merging load/store with permutation halves the execution time.
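The interpolation loop can be written as a self-contained scalar sketch for a single row. The names `clamp_int` and `interp6_row` are hypothetical. The slide only names the coefficient array `COEF`; the values used here are the H.264 six-tap filter (1, -5, 20, 20, -5, 1), which is an assumption about what the slide intends.

```c
/* Clamp v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Six-tap filter coefficients (assumed: the H.264 half-sample filter). */
static const int COEF[6] = { 1, -5, 20, 20, -5, 1 };

/* Interpolate the half-sample between row[x_pos] and row[x_pos + 1].
 * max_x is the last valid index; out-of-range taps are clamped to the edge. */
int interp6_row(const unsigned char *row, int max_x, int x_pos)
{
    int result = 0;
    for (int x = -2; x < 4; x++) {
        int pres_x = clamp_int(x_pos + x, 0, max_x);  /* address clamp (load) */
        result += row[pres_x] * COEF[x + 2];          /* multiply-accumulate */
    }
    /* Round, scale by 1/32, and clip to the 8-bit sample range. */
    return clamp_int((result + 16) / 32, 0, 255);
}
```

Each tap needs an address clamp and a load before the multiply-accumulate, which illustrates why load/store and data arrangement take such a large share of the instruction mix.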

Benchmarking results for CPU core
  • CK520

Simulation results for DSP performance
  • Enhanced DSP
    • CAVLC (context-adaptive variable-length coding)
    • OGG (new audio standard)

Simulation results for DSP performance
  • Vector processor
    • H.264 baseline decoder

Project status
  • Finished 2 versions of CPU Core
  • Released DSP instruction set
  • Writing and verifying RTL of the enhanced DSP
  • Benchmarking vector processor
  • Developing software tools

Future work
  • Scheduling for task-level parallelism (TLP) between heterogeneous processors
  • Simulation/debugging tools for heterogeneous processors
  • Methodologies for design space exploration


Thank you!
