
Chapter 9: Interconnection Networks and Multiprocessing



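Chapter 9: Interconnection Networks and Multiprocessing. 7.1 Basic concepts of interconnection networks. 7.2 Structure of interconnection networks. An interconnection network is a network built from switching elements according to a given topology and control scheme; it connects the nodes of a computer system to one another. Nodes: processors, memory modules, or other devices. The position of the interconnection network in the system is shown in the figure. Topologically, an interconnection network is a set of interconnections, or mappings, from input nodes to output nodes. 7.1.1 Functions and characteristics of interconnection networks. An interconnection network can be described from 4 different aspects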


Presentation Transcript


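Chapter 9: Interconnection Networks and Multiprocessing

7.1 Basic concepts of interconnection networks

7.2 Structure of interconnection networks


7.1 Basic concepts of interconnection networks

7.1.1 Functions and characteristics of interconnection networks

[Figure: position of the interconnection network in the system]

  • An interconnection network can be described from 4 different aspects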

      • SIMD


9

7.1


  • 9

    7.1


    9

    7.1

    7.1.2

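    Let x be an input port number, x = 0, 1, ..., N-1.

    f(x) is the interconnection function: under f, input x is connected to output f(x).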


    9

    7.1

    • Cycle notation for f(x):

      (x0 x1 x2 ... xj-1)

      means f(x0) = x1, f(x1) = x2, ..., f(xj-1) = x0,

      where j is the length of the cycle.

    • k


  • 9

    7.1

    • Let n = log2 N, so that each of the N port addresses can be written as an n-bit binary number.

    • For N = 8, n = 3.


    9

    7.1

    N=8


    9

    7.1


    9

    7.1


    9

    7.1

    • N=8

    N=8


    9

    7.1


  • 9

    7.1

        • N8B(x)R(x)


    9

    7.1

    • N8

    N=8


    9

    7.1

    • PM2I

      • PM2I

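      • PM2+i(x) = (x + 2^i) mod N

        PM2-i(x) = (x - 2^i) mod N

        where 0 ≤ x ≤ N-1, 0 ≤ i ≤ n-1, n = log2 N, and N is the number of nodes.

      • PM2I comprises 2n interconnection functions in total.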


    9

    7.1

    • For N = 8 there are 6 PM2I functions:

      • PM2+0: (0 1 2 3 4 5 6 7)

      • PM2-0: (7 6 5 4 3 2 1 0)

      • PM2+1: (0 2 4 6)(1 3 5 7)

      • PM2-1: (6 4 2 0)(7 5 3 1)

      • PM2±2: (0 4)(1 5)(2 6)(3 7)
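
    The cycle lists for N = 8 can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names `pm2i` and `cycles` are hypothetical helpers introduced only for illustration. It applies PM2±i(x) = (x ± 2^i) mod N and follows each node until the cycle closes.

```python
def pm2i(x, i, n_nodes, sign=+1):
    """PM2+i (sign=+1) or PM2-i (sign=-1): (x +/- 2**i) mod N."""
    return (x + sign * 2 ** i) % n_nodes


def cycles(f, n_nodes):
    """Decompose the permutation f on 0..N-1 into its cycles."""
    seen = set()
    result = []
    for start in range(n_nodes):
        if start in seen:
            continue
        cycle = []
        x = start
        while x not in seen:          # follow x -> f(x) until the cycle closes
            seen.add(x)
            cycle.append(x)
            x = f(x)
        result.append(tuple(cycle))
    return result


if __name__ == "__main__":
    N = 8
    for i in range(3):                # n = log2(8) = 3
        for sign, label in ((+1, "+"), (-1, "-")):
            print(f"PM2{label}{i}:", cycles(lambda x: pm2i(x, i, N, sign), N))
```

    Each cycle is printed starting from its smallest node, so PM2-0 appears as (0, 7, 6, 5, 4, 3, 2, 1), which is the same cycle as (7 6 5 4 3 2 1 0) written from a different starting point. For N = 8, PM2+2 and PM2-2 produce the same mapping, which is why they share one line in the list.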


    9

    7.1

    N=8 PM2I


    9

    • The ILLIAC network is built from the PM2±0 and PM2±n/2 functions.

    ILLIAC


    9

    7.1

    7.1.3


    9

    7.1

    • b


    9

    7.1

    • Bbw

    • w


  • 9

    7.2

    • N

    7.2.1


    9

    7.2

    • NN-1

    • 1

    • 2

    • N1

    • b=1


    9

    7.2


    9

    7.2

    2.

    • 2

    • N/2

    • N


    9

    7.2


    9

    7.2

      • 15

      • 1


    9

    3.

    • 2

    N=16

    • 7

    • 2


    9

    7.2

    • Node i is connected to node j when |j - i| = 2^r, where r = 0, 1, 2, ..., n-1 and n = log2 N.

      • Node degree: 2n - 1

      • Diameter: n/2


    9

    7.2

    4.

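    • Example: a 5-level binary tree with 31 nodes.

      A k-level complete binary tree has N = 2^k - 1 nodes.

      • Maximum node degree: 3

      • Diameter: 2(k-1)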

      • N1

      • 2


    9

    7.2


    9

    7.2

    5.


    9

    7.2

    6.

      • Example: a 3×3 mesh

      • A k-dimensional mesh with N = n^k nodes has interior node degree 2k and diameter k(n-1).

      • nn

        • 4

        • 2n/2


    9

    7.2


    9

    7.2

    7.

    • n

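    • An n-cube (hypercube) has N = 2^n nodes arranged along n dimensions.

      Example: 8 nodes form a 3-cube.

      A 4-cube is formed by linking the corresponding nodes of two 3-cubes.

    • An n-cube can be built from two (n-1)-cubes by linking their 2^(n-1) pairs of corresponding nodes.

    • In an n-cube, the node degree is n and the diameter is n.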


    9

    7.2


    9


    9

    6-1

    n4166425610244096


    9

    • d=2D=N/2N

    • d = 4N>4D = 2 (-1)8 (-1)N = 4d=24

    • d=log2ND=log2N(log2N)2


    9

    6-2

    3r

    • (1)

    • (2)

    • (3)

    • (4)

    • (5)

    = r3

    = 3(r-1)

    = 3(r-1)r2

    = r2

    = 6


    9

    6-3

    Cube8zxy3


    9

    7.2

    7.2.2


    9

    7.2


    9

    7.2

    2.

    • Both MIMD and SIMD computers use multistage interconnection networks (MIN, Multistage Interconnection Network).

      • ab

      • ab

        • ab

        • abab2ab2kk1


    9

    7.2


    9

    7.2


    9

    7.2

    • 22

      224


    9

    7.2

        • ii10in1n


    9

    7.2

      • Omega

    88Omega


    9

    7.2

    • An N-input Omega network has log2 N stages, each with N/2 2×2 switches, for a total of (N log2 N)/2 switches.
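
    As an illustration of the stage and switch counts, here is a minimal Python sketch of an Omega network. It assumes the usual construction (a perfect shuffle in front of every stage of 2×2 switches) and destination-tag routing; neither detail survives in the transcript, so both are assumptions, and the names `shuffle`, `omega_route`, and `switch_count` are hypothetical.

```python
def shuffle(x, n_bits):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    mask = (1 << n_bits) - 1
    return ((x << 1) | (x >> (n_bits - 1))) & mask


def omega_route(src, dst, n_bits):
    """Return the line numbers visited when routing src -> dst.

    Assumed destination-tag routing: at stage s the 2x2 switch sends the
    message to its upper output (0) or lower output (1) according to bit s
    of dst, most significant bit first.
    """
    pos = src
    path = [pos]
    for stage in range(n_bits):
        pos = shuffle(pos, n_bits)                 # inter-stage wiring
        bit = (dst >> (n_bits - 1 - stage)) & 1    # routing bit for this stage
        pos = (pos & ~1) | bit                     # switch output selection
        path.append(pos)
    return path


def switch_count(n_inputs):
    """(N log2 N) / 2 two-by-two switches; N must be a power of two."""
    n_bits = n_inputs.bit_length() - 1
    return n_inputs * n_bits // 2


if __name__ == "__main__":
    print(omega_route(src=5, dst=2, n_bits=3))
    print(switch_count(8))
```

    After log2 N stages every address bit of the source has been shifted out and replaced by a destination bit, so the message always ends on line `dst`.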


    9

    7.2

      • 22

      • C0C1C2

        ABCDC0

        EFGHC1

        IJKLC2


    9

    7.2


    9

    7.2

    • 33

      n


    9

    7.2

    3.

    • An n×n crossbar switch can realize all n! permutations without blocking.

    • C.mmp


    9

    7.2


    9

    crossbar

    • N

    • 1


    9


    9

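    • Input queuing (IQ)

    • Output queuing (OQ)

    • Virtual output queuing (VOQ)

    • Combined input-crosspoint queuing (CICQ)

    • Combined input-output queuing (CIOQ)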


    9

    VOQ


    9

    (CICQ)


    9

    P1

    P2

    P16

    M2

    M1

    M16

    6-16

    2014/8/31

    64


    9

    7.2


    9

    2

    5

    3

    1

    4

    7.3.1 ()

    1

    6-36

    6-36



    9

    64512

    25681~2256



    9

    2

    3-37



    9

    1

    TCS

    2

    D

    3

    4

    6-37

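    T = ( Lt / B )D + L / B, where Lt is the length of the path-setup packet, B is the channel bandwidth, D is the number of hops, and L is the message length.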



    9

    (packet switch)

    MPP


    9

    1

    TSF

    D

    2

    3

    4

    VLSI

    T = ( L / B )D + L / B = ( D + 1 )L / B

    6-38



    9

    (virtual cut through)

    T = ( Lh / B )D + L / B = ( LhD + L ) / B, where Lh is the length of the packet header. When L >> LhD, T ≈ L / B.



    9

    VLSI


    9

    R/A

    R/A

    R/A

    N4

    N0

    N2

    N3

    Flit



    9

    • (R/A)(D)(a)R/A

    • (S)(b)R/Ai

    • D(c)R/A

    • i D (d)i+1


    9

    """"


    9

    A

    BAB""""


    9

    T = TfD + L / B = ( Lf / B )D + L / B = ( LfD + L ) / B

    where Lf is the flit length and Tf = Lf / B is the time to move one flit across one hop. When LfD << L, T ≈ L / B.
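
    The latency formulas for the switching techniques in this section can be compared side by side. This is a minimal Python sketch using only the formulas given on the slides; the function names and the sample parameter values are assumptions chosen for illustration, and all lengths are taken to be in the same unit (bits, with B in bits per time unit).

```python
def circuit_switching(L, B, D, Lt):
    """T = (Lt / B) * D + L / B: path setup over D hops, then the message."""
    return (Lt / B) * D + L / B


def store_and_forward(L, B, D):
    """T = (L / B) * D + L / B = (D + 1) * L / B."""
    return (D + 1) * L / B


def virtual_cut_through(L, B, D, Lh):
    """T = (Lh * D + L) / B: only the header is delayed at each hop."""
    return (Lh * D + L) / B


def wormhole(L, B, D, Lf):
    """T = (Lf * D + L) / B: only one flit is delayed at each hop."""
    return (Lf * D + L) / B


if __name__ == "__main__":
    # Hypothetical values: 1024-bit message, bandwidth 8 bits per time unit,
    # 4 hops, and 8-bit setup packet / header / flit.
    L, B, D = 1024, 8, 4
    print("circuit switching  :", circuit_switching(L, B, D, Lt=8))
    print("store and forward  :", store_and_forward(L, B, D))
    print("virtual cut-through:", virtual_cut_through(L, B, D, Lh=8))
    print("wormhole           :", wormhole(L, B, D, Lf=8))
```

    With these sample values store-and-forward latency grows with D + 1, while the other three stay close to L / B because the per-hop term is small compared with the message length.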



    9

    • VLSI

    • ""


    9

    IBM SP2/


    9

    3.3


    9

    4.6.1

    Skew/(Ready/Acknowledge)


    9

    4.6.2

    4.20(a),nnn-VLSI4.20(b)n3n4.20(c)


    9

    4.20


    9

    4.6.3

    (Fabric)SRAMDRAMVLSI


    9

    1.

    FIFO4.21,FIFO


    9


    9

    4.21


    9

    Head-Of-Line


    9

    2.

    3.42FIFO100%


    9

    4.22FIFOn


    9

    4.22


    9

    3.

    Shared Poolnn2n2n2nSRAM


    9

    4.

    3.34Expected Coverage of Outputs


    9

    4.6.4

    n4.23Grant(Assert)


    9


    9

    4.7

    IBM SP-1SP-2IBM SP-1SP-2840MB/s

    1SPRack84-2-161616


    9

    1 SP


    9

    25523-40-MHz1082/


    9

    31FIFO167FIFO1288-Chunk2RAM25FIFO


    9

    2 IBM SP(vulcan)


    9

    8888FIFORAM8--8-DeserializerSerializer


    9

    LRUFIFOLRU

    SP64CRCCRCCRCCRC


    9

    H3C 12500


    9

    SIMD

    SIMD


    9

    SIMD

    SIMDSIMD


    9

    CU

    PE

    M

    IN

    N

    M

    7.2 SIMD

    --SIMD

    1.

    PE/PEM

    PE

    PE


    9

    /

    /

    CU

    CU

    PE0

    PE

    PE

    PEn-1

    P0

    Pn-1

    M0

    Mn-1

    IN

    2.

    /

    PE


    9

    CU

    PE0

    PE1

    PEn-1

    IN

    M0

    M1

    Mm-1

    PEINPE

    PEM

    3.

    C=<NFIM>

    PE

    /


    9

    4.

    PE

    • PE

    • PE

    • INPE

    • +

    • PE+CU+


    9

    • /

    • /


    9

    I(i,j+1)

    I(i+1,j+1)

    I(i-1,j+1)

    I

    Ii,j

    Si,j

    I(i-1,j)

    I(i+1,j)

    512*512

    512*512

    I(i,j)

    I(i-1,j-1)

    I(i+1,j-1)

    I(i,j-1)

    5.

    0255

    S

    =Iij8


    9

    512

    PE0 PE1 PE31

    PE32

    PE992 PE1023

    512

    16

    PEJ

    16

    32*32

    16*16=256

    512*512=262144

    262144/256=1024


    9

    1

    1

    16

    PEJ

    16

    16

    16

    1

    1

    • PE

    4*16+1*4=68

    PE1

    262144/(256+68) ≈ 809
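
    The arithmetic on these two slides can be reproduced with a short Python sketch. It assumes the reading suggested by the numbers: a 512×512 image is split over a 32×32 array of PEs, each PE processes a 16×16 block, and the 4*16+1*4 = 68 term counts the boundary pixels a PE exchanges with its neighbours (4 edges of 16 pixels plus 4 corner pixels). That interpretation and the function name are assumptions, not statements from the source.

```python
def smoothing_speedup(image_side=512, pe_grid_side=32):
    """Return (number of PEs, pixels per PE, boundary transfers, speedup)."""
    block_side = image_side // pe_grid_side      # 16 pixels per block edge
    pixels_per_pe = block_side * block_side      # 16 * 16 = 256
    transfers = 4 * block_side + 4 * 1           # 4 edges + 4 corners = 68
    total_pixels = image_side * image_side       # 512 * 512 = 262144
    n_pes = total_pixels // pixels_per_pe        # 262144 / 256 = 1024
    speedup = total_pixels / (pixels_per_pe + transfers)
    return n_pes, pixels_per_pe, transfers, speedup


if __name__ == "__main__":
    n_pes, pixels, transfers, speedup = smoothing_speedup()
    print(n_pes, pixels, transfers, round(speedup, 1))
```

    The speedup stays below the PE count of 1024 because each PE spends 68 extra steps on boundary communication in addition to its 256 pixel updates.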


    9

    PE  Initial  Step 1  Step 2        Step 3
    0   A0       A0+A1   A0+A1+A2+A3   A0+A1+...+A7
    1   A1
    2   A2       A2+A3
    3   A3
    4   A4       A4+A5   A4+A5+A6+A7
    5   A5
    6   A6       A6+A7
    7   A7
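
    The table describes a pairwise (tree) summation in which the number of active PEs halves at every step. The following minimal Python sketch simulates that schedule sequentially; the name `tree_sum` is hypothetical, and the sketch assumes the number of values is a power of two, as in the 8-PE example.

```python
def tree_sum(values):
    """Sum the values the way the table does: log2(N) pairwise-add steps.

    At the step with stride s, each PE whose index is a multiple of 2*s
    adds in the partial sum held by the PE s positions to its right.
    Returns (total held by PE 0, number of steps).
    """
    acc = list(values)
    n = len(acc)
    stride = 1
    steps = 0
    while stride < n:
        for pe in range(0, n, 2 * stride):
            acc[pe] += acc[pe + stride]   # on real hardware these adds run in parallel
        stride *= 2
        steps += 1
    return acc[0], steps


if __name__ == "__main__":
    total, steps = tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, steps)
```

    For 8 values the loop runs 3 times, matching the three step columns of the table, whereas a single processor would need 7 additions in sequence.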

    N

    1

    N-1

    2


    9

    PE0

    PEM0

    PE63

    PEM63

    PE

    APPA

    I/O

    B6500

    PE

    CU

    4.SIMD

    ILLIAC-IV

    19651972

    SIMD

    2K64

    64PE


    9

    i-8

    PE0

    PE0

    PE0

    i-1

    i

    i+1

    PE0

    PE0

    PE0

    i+8

    PE0

    PE0

    PE0

    • PE


    9

    PE

    PE

    BSP

    1979

    SIMD

    +


    9

    17MM

    NW2

    NW1

    CU

    16PE

    5

    512KW

    PEMM

    MM


    9

    • SIMD

    • PECU

    • PE

    • PE


    9

    1.

    =2


    9

    0 1 2 3 4

    a0

    a1

    a2

    a3

    a4

    a5

    a6

    a7

    a8

    a9

    a10

    a11

    a12

    a13

    a14

    =5

    =5

    5


    9

    0 1 2 3

    a00

    a01

    a02

    a03

    a10

    a11

    a12

    a13

    4*4

    a20

    a21

    a22

    a23

    a30

    a31

    a32

    a33

    2.

    1

    =

    =4


    9

    0 1 2 3

    a00

    a01

    a02

    a03

    a13

    a10

    a11

    a12

    4*4

    a22

    a23

    a20

    a21

    a31

    a32

    a33

    a30

    =4

    2

    --/

    =


    9

    M>N

    d1d2=m

    m=22P+1P>0 d1=22P

    d2=1

    BSPm=17

    n=16m>nP=2 m=22P+1 d1= 4

    d1=22Pd2=1


    9

    j=a mod m

    0 1 2 3 4 5 6

    a00

    a10

    a20

    a30

    a01

    a11

    *

    0123

    i= a/n

    4*5

    a31

    a02

    a12

    a22

    a32

    *

    a21

    a23

    a33

    a04

    a14

    *

    a03

    a13

    *

    a24

    a34

    *

    m=7n=6

    7


    9

    []

    S = Σ (i = 1 to 8) ai × bi

    ()

    (1)PESISD;

    (2)SISD;

    (3)8SIMD;

    (4)8MIMD

    24SIMDMIMD()1SISDSIMDPEPEPEMIMDPEPE


    9

    []

    (1)PESISD46(87


    9

    (2)SISD15


    9

    (3)8SIMD

    13


    9

    (4)8MIMD13


    9


    9

    8.1

    8.2

    8.3

    8.4

    8.5

    8.6


    9

    8.1

    1.

    2.

      • 3.(128)


    9

    8.1

    • Flynn's taxonomy:

      SISD, SIMD, MISD, MIMD

    • MIMD

      • MIMD

      • MIMD

        clusterMIMD

    8.1.1


    9

    8.1

    • MIMD

        • Cache

        • SMP

          Symmetric shared-memory MultiProcessor

        • UMA (Uniform Memory Access)


    9

    8.1


    9

    8.1

        • IO


    9

    8.1


    9

    8.1

      • 28


    9

    8.1

    8.1.2

        • DSM (Distributed Shared-Memory)

          NUMA (Non-Uniform Memory Access)


    9

    8.1

      • -

    • loadstore


  • 9

    8.1

      • RPC (Remote Procedure Call)


    9

    8.1

  • 3


  • 9

    8.1


  • 9

    8.1

    • Cache


  • 9

    8.1


    9

    8.1

    8.1.3

    =


    9

    8.1

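    • Example 8.1: To achieve a speedup of 80 with 100 processors, what fraction of the original computation must be parallel?

      By Amdahl's law, 80 = 1 / ( f/100 + (1 - f) ), where f is the parallel fraction.

    Solving gives f ≈ 0.9975, so at most 0.25% of the computation may be sequential.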
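
    The figure of 0.9975 can be checked by inverting Amdahl's law. The sketch below is a minimal Python illustration; the function names are hypothetical, and f denotes the parallel fraction (an assumed symbol, since the slide's own notation did not survive).

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup = 1 / ((1 - f) + f / p)."""
    f, p = parallel_fraction, n_processors
    return 1.0 / ((1.0 - f) + f / p)


def required_parallel_fraction(target_speedup, n_processors):
    """Solve Amdahl's law for f: f = (1 - 1/S) / (1 - 1/p)."""
    s, p = target_speedup, n_processors
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    f = required_parallel_fraction(80, 100)
    print(round(f, 4))                       # parallel fraction
    print(round(amdahl_speedup(f, 100), 6))  # plugging f back in recovers the target
```

    Setting 1/80 = (1 - f) + f/100 gives 0.99 f = 0.9875, so f is about 0.9975.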


    9

    8.1

    2.

    • 1001000


    9

    8.1


    9

    8.1

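    Example 8.2: A 32-processor multiprocessor has a remote memory access time of 400 ns. The processor clock is 1 GHz and the IPC is 2 when all accesses hit in the local Cache. How much faster is the machine with no remote accesses than when 0.2% of instructions involve a remote access?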


    9

    8.1

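    Base CPI = 1/IPC = 1/2 = 0.5

    With 0.2% of instructions making a remote access:

    CPI = base CPI + remote access rate × remote access cost
        = 0.5 + 0.2% × remote access cost

    Remote access cost = remote access time / clock cycle time = 400 ns / 1 ns = 400 cycles

    CPI = 0.5 + 0.2% × 400 = 1.3

    The machine with no remote accesses is therefore 1.3/0.5 = 2.6 times faster than the one in which 0.2% of instructions make a remote access.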
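
    The same calculation as a minimal Python sketch, using the numbers of the example (400 ns remote access, 1 GHz clock, IPC of 2, 0.2% remote accesses). The function name and parameter names are hypothetical.

```python
def effective_cpi(base_ipc, remote_fraction, remote_access_ns, clock_ghz):
    """CPI = base CPI + remote access rate * remote access cost (in cycles)."""
    base_cpi = 1.0 / base_ipc
    cycle_ns = 1.0 / clock_ghz                      # 1 GHz -> 1 ns per cycle
    remote_cost_cycles = remote_access_ns / cycle_ns
    return base_cpi + remote_fraction * remote_cost_cycles


if __name__ == "__main__":
    base = effective_cpi(2, 0.0, 400, 1.0)          # no remote accesses
    with_remote = effective_cpi(2, 0.002, 400, 1.0)
    print(round(base, 3), round(with_remote, 3), round(with_remote / base, 2))
```

    Even a remote access rate of 0.2% more than doubles the CPI here, because each remote access costs 400 cycles.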


    9

    8.1


  • 9

    8.1


    9

    • 8.1

    • Cache

    • Cache

      Cache

    8.2


    9

    8.2

    • Cache

      • IO

        CacheIO/

      • Cache

    8.2.1 Cache


    9

    8.2

    ABCache


    9

    8.2

        • What:

        • When:

        • PXXXP


    9

    8.2

    • PXQXQP

    • ( )


  • 9

    8.2

    Cache

    • Cache

    8.2.2


    9

    8.2

    • Cache

        • directory

        • snooping

          • Cache


    9

    8.2

    • CacheCache

    • ()


  • 9

    8.2

    Cache


    9

    8.2

    • Cache

      Cache


    9

    8.2

      • Cache

        Cache

      • AB


    9

    8.2

    8.2.3

      • Cache

      • CacheCache


    9

    8.2

    • Cache

      • Cachetag

      • valid

      • Cache

        • shared clean

        • exclusive dirty

          CacheCache


    9

    8.2

    • Cache

    • CacheCPUCache

      • Cache


    9

    8.3

    • CacheCacheCache

    • (()Cache)

      Cache?


    9

    8.3

    Cache

    8.3.1 Cache

      • Cache


    9

    8.3


    9

    8.3


    9

    8.3

      • clean

    • Cache

      • Cache


    9

    8.3

    • Cache


  • 9

    8.3

    CacheCache


    9

    P

    CPU

    Cache

    Cache

    C

    A

    K

    B

    (Home)


    9

    8.3

      • Cache

      • /


    9

    8.3

      • Cache


    9

    8.3

    8.3.2


    9

    8.3


    9

    8.3


    9

    8.3


    9

    8.3

      • Cache


    9

    8.3

    • Cache

    • Cache


    9

    8.4

    8.4.1


    9

    8.4

    • atomic exchange

        • 0: the lock is free

        • 1: the lock is held

    • 101


    9

    8.4

    • test_and_set

    • fetch_and_increment: returns the value of a memory location and atomically increments it by 1

      • LL (load linked, also called load locked)

      • SC (store conditional)


    9

    8.4

      • LLSCSC

      • SC

      • SC

        • 1

        • 0

      • LL


    9

    8.4

    R1

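    try:  OR    R3, R4, R0    // copy the exchange value from R4 into R3
          LL    R2, 0(R1)     // load linked: read 0(R1) into R2
          SC    R3, 0(R1)     // store conditional: write R3 to 0(R1);
                              // R3 is set to 1 on success, 0 on failure
          BEQZ  R3, try       // the store failed (R3 = 0): try again
          MOV   R4, R2        // put the loaded value into R4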

    R4R1LLSCSC0R3


    9

    8.4

    • LL/SC

      fetch_and_increment

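      try:  LL      R2, 0(R1)     // load linked: read the counter
            DADDIU  R2, R2, #1    // add 1
            SC      R2, 0(R1)     // store conditional: write back the new value
            BEQZ    R2, try       // the store failed (R2 = 0): try again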

    • LL


    9

    8.4

    8.4.2


    9

    8.4

    • Cache

      0

      R1

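              DADDIU  R2, R0, #1    // R2 = 1, the "locked" value
      lockit: EXCH    R2, 0(R1)     // atomically exchange R2 with the lock
              BNEZ    R2, lockit    // old value was 1: already locked, spin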


    9

    8.4

    • Cache

      • Cache

        • Cache


    9

    8.4

      • Cache

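      • lockit: LD      R2, 0(R1)     // read the lock
                BNEZ    R2, lockit    // not free: keep spinning on the read
                DADDIU  R2, R0, #1    // R2 = 1, the "locked" value
                EXCH    R2, 0(R1)     // atomic exchange
                BNEZ    R2, lockit    // old value was 1: lost the race, spin again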

      • 3


    9

    3


    9

    8.4

    • LLSC

      LL

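      lockit: LL      R2, 0(R1)     // load linked: read the lock
              BNEZ    R2, lockit    // not free: spin
              DADDIU  R2, R0, #1    // R2 = 1, the "locked" value
              SC      R2, 0(R1)     // store conditional: try to take the lock
              BEQZ    R2, lockit    // the store failed (R2 = 0): spin again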
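
    The spin-lock variants above can be mimicked in a high-level language to show the control flow. The following is a minimal Python sketch, not the slides' code: Python has no atomic exchange instruction, so the hypothetical `_exchange` method emulates EXCH with an internal `threading.Lock` (an assumption made only so the example runs). The `acquire` loop follows the "read first, exchange only when the lock looks free" pattern.

```python
import threading
import time


class SpinLock:
    """Spin lock sketch: 0 means free, 1 means held."""

    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def _exchange(self, new_value):
        """Emulated atomic exchange: store new_value, return the old one."""
        with self._atomic:
            old_value = self._value
            self._value = new_value
            return old_value

    def acquire(self):
        while True:
            while self._value != 0:       # spin on an ordinary read
                time.sleep(0)             # yield to other threads
            if self._exchange(1) == 0:    # old value 0: we took the lock
                return

    def release(self):
        self._exchange(0)                 # write 0 to free the lock


def run_demo(n_threads=4, increments=200):
    """Each thread increments a shared counter under the spin lock."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_demo())
```

    Spinning on a plain read keeps waiting processors away from the exchange operation until the lock appears free, which is the point of testing before exchanging.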


    9

    8.4

    8.4.3


    9

    8.4

    8.310100Cache

    1010

    210


    9

    8.4

    i

    • iLL

    • iSC

    • 1

      i2i+1

      n

      1012012000


    9

    8.4


  • 9

    8.4

      • lockunlock

      • count

      • total


    9

    8.4

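    lock(counterlock);              // ensure the update is atomic
    if (count == 0) release = 0;    // first arrival resets release
    count = count + 1;              // count this arrival
    unlock(counterlock);            // release the lock
    if (count == total) {           // all processes have arrived
        count = 0;                  // reset the counter
        release = 1;                // release the waiting processes
    }
    else {                          // some processes have not arrived yet
        spin(release == 1);         // wait for the release signal
    }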


    9

    8.4

    • counterlock

    • release

    • spinrelease=1

  • ()


  • 9

    8.4

      • sense_reversinglocal_sense1

      • sense_reversing

        10Cache20410020400


    9

    8.4

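    local_sense = !local_sense;     // toggle local_sense
    lock(counterlock);              // ensure the update is atomic
    count++;                        // count this arrival
    unlock(counterlock);            // release the lock
    if (count == total) {           // all processes have arrived
        count = 0;                  // reset the counter
        release = local_sense;      // release the waiting processes
    }
    else {                          // some processes have not arrived yet
        spin(release == local_sense);   // wait for the release signal
    }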
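
    A runnable version of the sense-reversing barrier can be sketched in Python with threads. This is an illustration under assumptions, not the slides' implementation: the class and function names are hypothetical, per-thread `local_sense` is kept in `threading.local`, and the arrival check is done while the lock is still held (a small deviation from the pseudocode that avoids reading `count` outside the lock).

```python
import threading
import time


class SenseReversingBarrier:
    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = False
        self._counterlock = threading.Lock()
        self._local = threading.local()      # holds each thread's local_sense

    def wait(self):
        # Toggle this thread's local_sense (starts as False).
        local_sense = not getattr(self._local, "sense", False)
        self._local.sense = local_sense
        with self._counterlock:              # atomic update of count
            self.count += 1
            last = self.count == self.total
            if last:
                self.count = 0               # reset for the next use
        if last:
            self.release = local_sense       # release everyone
        else:
            while self.release != local_sense:   # spin until released
                time.sleep(0)


def run_demo(n_threads=4, phases=5):
    """Return, for every thread and phase, how many arrivals it saw after the barrier."""
    barrier = SenseReversingBarrier(n_threads)
    arrivals = [0] * phases
    observed = []
    guard = threading.Lock()

    def worker():
        for phase in range(phases):
            with guard:
                arrivals[phase] += 1
            barrier.wait()
            with guard:
                observed.append(arrivals[phase])

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    print(run_demo())
```

    Because each use of the barrier waits for the opposite value of `release`, a fast thread that re-enters the barrier cannot be confused by the release signal of the previous phase, so the same barrier object can be reused across phases.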


    9

    8.4


    9

    8.5

    • Thread-Level Parallelism (TLP)


    9

    8.5

    • fine-grained

      • CPU


    9

    8.5

    • coarse-grained

      • Cache

      • CPU


    9

    8.5

    8.5.1

      • Simultaneous MultiThreading (SMT)

  • SMT


  • 9

    8.5

    • 4


    9

    8.5

    4


    9

    8.5

      • ILP


    9

    8.5

    • :


    9

    8.5

    8.5.2


    9

    8.5


    9

    8.5

    • Cache


    9

    8.5

      • CacheTLB


    9

    8.5


    9

    8.5.3

    8/


    9

    SMT


    9

    8.5

      • CacheCache

      • 8


    9

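    • Origin 2000 scales from 1 to 128 processors.

    • Origin 2000 uses the MIPS R10000 processor and runs IRIX, a 64-bit UNIX operating system.

    • Origin is a NUMA architecture.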

    8.6


    9

    Origin


    9

    8.6

    12MIPS R10000L2 CacheHubHubIO

    • Origin: 4GB

    • Hub4

    • 780Mbps

    • IO: 2 × 780 Mbps (1.56 Gbps)

    • Origin6OriginASIC


    9

    • Origin1128

    4


    9

    • 816

    16


    9

    8.6

    • 128

      128Origin 20004

      • SMPSMP

      • Origin


    9

    128


    9

    8.6

    • OriginCPU

        • CPU195MHz

        • Cache

        • CPU

        • CPU


    9

    8.6

    OriginCPU


    9

    8.6

    • Origin

      Hub

      1.56 Gbps (2 × 780 Mbps)


    9

    8.6

    • Origin

      L1 CacheL2 Cache

      • L1 CacheR10000

      • L1 CacheCacheCache

        /

      • L2 CacheSRAM


    9

    8.6

    • Cache

      • CacheCache

