SIMD Architectures

Presentation Transcript


  1. SIMD Architectures Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan

  2. Data Parallel Model
  • Operations can be performed in parallel on each element of a large, regular data structure, such as an array
  • One control processor broadcasts to many PEs (see Ch. 1, Fig. 1-25, page 45 of [CSG99])
  • When computers were large, the control portion could be amortized over many replicated PEs
  • A condition flag per PE lets individual PEs skip an instruction
  • Data is distributed across the PEs' local memories (block layout sketched after this slide)
  • Early-1980s VLSI brought a SIMD rebirth: 32 1-bit PEs plus their memory fit on a single chip
  • Data parallel programming languages lay the data out across the processors
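
As a rough illustration of the "data is distributed across the PEs' local memories" bullet, the sketch below (my own, not from the slides) computes a BLOCK-style layout that assigns each element of an array to a PE; the sizes and the names chunk and owner are purely illustrative:

    ! A minimal sketch (my illustration) of the "data distributed in each
    ! memory" idea: a BLOCK layout assigns element i of an n-element array
    ! to PE (i-1)/ceil(n/p), so each PE holds one contiguous chunk.
    program block_layout
      implicit none
      integer, parameter :: n = 20, p = 4      ! 20 elements over 4 PEs
      integer :: i, chunk, owner

      chunk = (n + p - 1) / p                  ! ceiling(n/p) elements per PE
      do i = 1, n
         owner = (i - 1) / chunk               ! PE numbers 0 .. p-1
         print '(a,i3,a,i2)', 'element', i, ' lives on PE', owner
      end do
    end program block_layout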

  3. Data Parallel Model
  • Vector processors have similar ISAs, but no data placement restriction
  • SIMD led to data parallel programming languages
  • Advancing VLSI led to single-chip FPUs and then to whole, fast µProcs, making SIMD less attractive
  • The SIMD programming model led to the Single Program Multiple Data (SPMD) model
  • All processors execute an identical program
  • Data parallel programming languages are still useful; they do the communication all at once: "Bulk Synchronous" phases in which all processors communicate after a global barrier (sketch after this slide)
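
A minimal sketch of the SPMD, bulk-synchronous pattern, written here with Fortran 2008 coarrays (my choice of notation, not something the slides prescribe): every image runs the identical program on its own data, and sync all is the global barrier separating the compute phase from the communication phase.

    ! Sketch of SPMD with a bulk-synchronous phase (my illustration).
    program spmd_bulk_sync
      implicit none
      integer, parameter :: n = 4
      real :: local(n)
      real :: total[*]                      ! one coarray scalar per image
      integer :: i, img

      img = this_image()
      local = [(real(img * i), i = 1, n)]   ! purely local compute phase
      total = sum(local)

      sync all                              ! global barrier ends the compute phase

      if (img == 1) then                    ! communication phase: image 1 gathers
         do i = 2, num_images()
            total = total + total[i]
         end do
         print *, 'global sum =', total
      end if
    end program spmd_bulk_sync

Any compiler with coarray support can run this with several images; the point is only the phase structure, not the particular reduction.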

  4. SIMD Programming – High-Performance Fortran (HPF)
  • Single Program Multiple Data (SPMD)
  • The FORALL construct is similar to a fork:
        FORALL (I = 1:N)
           A(I) = B(I) + C(I)
        END FORALL
  • Data mapping in HPF serves two goals (directive sketch after this slide):
      1. Reduce interprocessor communication
      2. Balance the load among processors
  • http://www.npac.syr.edu/hpfa/
  • http://www.crpc.rice.edu/HPFF/
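
A sketch of how HPF expressed the data mapping (assuming an HPF compiler; PROCESSORS, DISTRIBUTE and ALIGN are the standard HPF directives, while the array size and processor count here are made up for illustration):

    ! HPF-style sketch (my illustration): block-distribute a and align
    ! b and c with it, so the FORALL needs no interprocessor communication.
    program hpf_forall
      implicit none
      integer, parameter :: n = 1024
      real :: a(n), b(n), c(n)
      integer :: i
    !HPF$ PROCESSORS procs(8)
    !HPF$ DISTRIBUTE a(BLOCK) ONTO procs
    !HPF$ ALIGN b(:) WITH a(:)
    !HPF$ ALIGN c(:) WITH a(:)

      b = 1.0
      c = 2.0

      ! Because a, b and c are mapped identically, each processor updates
      ! only its own block.
      forall (i = 1:n) a(i) = b(i) + c(i)
    end program hpf_forall

With all three arrays mapped the same way, the FORALL touches only local data on every processor, which is exactly the communication-reducing goal the slide lists.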

  5. How does an SIMD computer work?
  • A host computer is necessary to do the I/O operations
  • The user program is loaded into the control memory
  • The data is distributed to all the memory modules
  • The control unit (CU) decodes each instruction and executes it itself if it is a scalar instruction; if it is a vector instruction, it broadcasts the control signals to the PEs, which carry out the execution
  • Before broadcasting the control signals, the CU broadcasts an enable vector that enables the participating PEs (sketch after this slide)
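
A toy sketch (mine, not the slides') of the control flow just described: the scalar instruction stays in the CU, while the vector instruction is preceded by an enable vector and then executed in lockstep by the enabled PEs, modelled here with a logical mask and a WHERE statement.

    ! Toy model of the CU/PE split (my illustration).
    program simd_control_unit
      implicit none
      integer, parameter :: npe = 8
      real    :: local_mem(npe)            ! one word of local memory per PE
      logical :: enable(npe)               ! enable vector broadcast by the CU
      real    :: scalar_acc
      integer :: i

      local_mem = [(real(i), i = 1, npe)]

      ! "Scalar instruction": executed in the control unit only.
      scalar_acc = 10.0

      ! "Vector instruction": the CU broadcasts the enable vector first ...
      enable = local_mem > 3.0
      ! ... then broadcasts the operation; enabled PEs execute it in lockstep.
      where (enable) local_mem = local_mem + scalar_acc

      print *, local_mem
    end program simd_control_unit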

  6. Masking and Data Routing Mechanisms
  • A, B, C – working registers
  • Si – status flag (1 = active, 0 = inactive)
  • Ri – data routing register
  • Di – holds the address
  • Ii – index register
  (a masked routing step is sketched after this slide)
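
A sketch of one masked step with data routing (my illustration; only the register names come from the slide): the routing registers R are circularly shifted so each PE sees its neighbour's value, and only PEs whose status flag S is set update their working register A.

    ! Masked execution plus end-around routing (my illustration).
    program mask_and_route
      implicit none
      integer, parameter :: npe = 8
      real    :: a(npe), r(npe)            ! working register A_i, routing register R_i
      logical :: s(npe)                    ! status flag S_i (true = active)
      integer :: i

      a = [(real(i), i = 1, npe)]
      r = a
      s = [(mod(i, 2) == 1, i = 1, npe)]   ! activate the odd-numbered PEs only

      r = cshift(r, shift=1)               ! route: every R_i receives R_{i+1}, end-around
      where (s) a = a + r                  ! masked execution of the broadcast add

      print *, a
    end program mask_and_route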

  7. Example

  8. Matrix Multiplication
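
The slide itself is a worked figure; as a purely illustrative sketch of how the data-parallel model expresses the computation (not the slide's own algorithm), each (i, j) element of the product can be assigned to its own PE:

    ! Data-parallel matrix multiply sketch (my illustration).
    program dp_matmul
      implicit none
      integer, parameter :: n = 4
      real :: a(n, n), b(n, n), c(n, n)
      integer :: i, j

      call random_number(a)
      call random_number(b)

      ! Every (i, j) entry is an independent dot product of a row of A
      ! and a column of B, so all n*n of them can proceed in parallel.
      forall (i = 1:n, j = 1:n) c(i, j) = sum(a(i, :) * b(:, j))

      print *, maxval(abs(c - matmul(a, b)))   ! sanity check, should be ~0.0
    end program dp_matmul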

  9. N * N Mesh

  10. The Illiac IV Architecture
  • Distributed memory architecture
  • 64 PEs connected as an 8×8 2-D mesh with end-around connections (neighbour addressing sketched after this slide)
  • LDB: Local Data Buffer – 64 entries, 64 bits each
  • PEM: 2K × 64-bit memory (2048 words of 64 bits)

  11. The Illiac IV Network

  12. Maspar MP-1 Architecture
  • Configurations with 1K–16K PEs are available
  • Each PE has a 4-bit ALU, a 1-bit logic unit, a 64-bit mantissa unit, a 16-bit exponent unit, and communication input and output ports
  • Each PE has 40 32-bit registers available to the programmer
  • Each processor board has 1024 PEs arranged as 64 PE clusters (PECs) of 16 PEs each
  • Each PEC is a chip connected to its 8 neighbors via an octagonal mesh
  • Another network, the multistage crossbar network, uses three router stages to provide the function of a 1024×1024 crossbar for routing from any PEC to any other PEC
