270 likes | 305 Views
Explore the development of early models like associative processing and parallel processors, comparing SIMD vs MIMD designs, global vs distributed memory architectures, PRAM shared-memory model in the context of parallel processing. Learn about various design choices and architectures.
E N D
Parallel processors come in many different varieties. • Thus, we often deal with abstract models of real machines. Part I
Development of Early Models (1) • Associative processing (AP) was perhaps the earliest form of parallel processing. • Associative or content-addressable memories (AMs, CAMs), which allow memory cells to be accessed based on contents rather than their physical locations within the memory array. • AMI AP architectures are essentially based on incorporating simple processing logic into the memory array so as to remove the need for transferring large volumes of data through the limited-bandwidth interface between the memory and the processor (the von Neumann bottleneck) Part I
Development of Early Models (2) • the AM/AP model has evolved through the incorporation of additional capabilities, so that it is in essence converging with SIMD-type array processors. Part I
Development of Early Models (3) • neural networks • Cellular automata Part I
SIMD Vs. MIMD (1) • Most early parallel machines had SIMD designs. • Within the SIMD category, two fundamental design choices exist: • Synchronous versus asynchronous SIMD • A possible cure is to use the asynchronous version of SIMD, known as SPMD • Custom- versus commodity-chip SIMD Part I
SIMD Vs. MIMD (2) • In the 1990s, the MIMD paradigm has become more popular recently. • MIMD machines are most effective for medium- to coarse-grain parallel applications, where the computation is divided into relatively large subcomputations or tasks whose executions are assigned to the various processors. Part I
SIMD Vs. MIMD (3) • Within the MIMD class, three fundamental issues or design choices are subjects of ongoing debates in the research community. • MPP-massively or moderately parallel processor • Is it more cost-effective to build a parallel processor out of a relatively small number of powerful processors or a massive number of very simple processors • Tightly versus loosely coupled MIMD • network of workstations (NOW), cluster computing, Grid Computing • Explicit message passing versus virtual shared memory Part I
Global Vs. Distributed Memory (1) • Within the MIMD class of paranel processors, memory can be global or distributed. • Global memory may be visualized as being in a central location where all processors can access it with equal ease. • memorylatency-hiding techniques must be employed. An example of such methods is the use of multithreading. Part I
Global Vs. Distributed Memory (2) • Examples for both the processor-to-memory and processor-to-processor networks include: • an abstract model of global-memory computers, known as PRAM. • One approach to reducing the amount of data that must pass through the processor-tomemory interconnection network is to use a private cache memory. (locality of data access, cache coherence problem) Part I
Global Vs. Distributed Memory (3) • Distributed-memory architectures can be conceptually viewed as in Fig. 4.5. • In addition to the types of interconnection networks enumerated for shared-memory parallel processors, distributed-memory MIMD architectures can also be interconnected by a variety of direct networks. (as nonuniform memory access (NUMA) architectures) Part I
PRAM Shared-Memory Model (1) • The theoretical model used for conventional or sequential computers (SISD class) is known as the random-access machine (RAM) • The parallel version of RAM (PRAM), constitutes an abstract model of the class of global-memory parallel processors. The abstraction consists of ignoring the details of the processor-to-memory interconnection network and taking the view that each processor can access any memory location in each machine cycle, independent of what other processors are doing. Part I
PRAM Shared-Memory Model (2) • In the formal PRAM model, a single processor is assumed to be active initially. In each computation step, each active processor can read from and write into the shared memory and can also activate another processor. • Even though the global-memory architecture was introduced as a subclass of the MIMD class, the abstract PRAM model depicted in Fig. 4.6 can be SIMD or MIMD. Part I
PRAM Shared-Memory Model (3) • This implies that each instruction cycle would have to consume Ω(log p) real time. • The above point is important when we try to compare PRAM algorithms with those for distributed-memory models. An O(log p)-step PRAM algorithm may not be faster than an O(1og2p)-step algorithm for a hypercube architecture. Part I
Distributed-Memory or Graph Models (1) • Given the internal processor and memory structures in each node, a distributed-memory architecture is characterized primarily by the network used to interconnect the nodes. • This network is usually represented as a graph. • Important parameters of an interconnection network include • Network diameter: the longest of the shortest paths between various pairs of nodes • Bisection (band)width: the smallest number (total capacity) of links that need to be cut in order to divide the network into two subnetworks of half the size. • Vertex or node degree: the number of communication ports required of each node Part I
Distributed-Memory or Graph Models (2) • Even though the distributed-memory architecture was introduced as a subclass of the MIMD class, machines based on networks of the type shown in Fig. 4.8 can be SIMD- or MIMD-type. • Fig. 4.9 are available for reducing bus traffic by taking advantage of the locality of communication within small clusters of processors. Part I