
Computing Machinery Chapter 11: Alternative Architectures



  1. Computing Machinery Chapter 11: Alternative Architectures

  2. Flynn's Taxonomy

  3. Parallel Architectures Functional Diagrams

  4. Pipeline Processing

  5. PRAM (Parallel Random Access Machine)
  EREW - Exclusive Read/Exclusive Write
  CREW - Concurrent Read/Exclusive Write
  ERCW - Exclusive Read/Concurrent Write (not used)
  CRCW - Concurrent Read/Concurrent Write

  6. Concurrent Read/Exclusive Write (CREW) In this model, a particular address in shared memory can be read by multiple processors concurrently. However, only one processor at a time can write to a particular address in shared memory. Concurrent means that the order in which two operations occur does not affect the outcome (or state) of the system.
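
To make the exclusive-write rule concrete, here is a minimal sketch of one synchronous CREW-PRAM step. The Access record, the crewStep function, and the conflict check are illustrative inventions, not part of the chapter: any number of simulated processors may read an address in the same step, but scheduling two writes to one address is rejected.

```cpp
// Minimal CREW-PRAM step simulator (illustrative sketch, not from the chapter).
#include <iostream>
#include <map>
#include <stdexcept>
#include <vector>

struct Access { int processor; int address; bool isWrite; int value; };

// Apply one synchronous PRAM step; throw if the exclusive-write rule is broken.
void crewStep(std::map<int, int>& sharedMem, const std::vector<Access>& ops) {
    std::map<int, int> writers;  // address -> number of writers this step
    for (const Access& op : ops)
        if (op.isWrite && ++writers[op.address] > 1)
            throw std::runtime_error("CREW violation: two writes to one address");
    for (const Access& op : ops)  // reads are unrestricted; each write is unique
        if (op.isWrite) sharedMem[op.address] = op.value;
        else std::cout << "P" << op.processor << " reads "
                       << sharedMem[op.address] << "\n";
}

int main() {
    std::map<int, int> mem{{0, 42}};
    // Three processors read address 0 concurrently while one writes address 1.
    crewStep(mem, {{0, 0, false, 0}, {1, 0, false, 0},
                   {2, 0, false, 0}, {3, 1, true, 7}});
}
```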

  7. Concurrent Read/Concurrent Write (CRCW) - In the concurrent read, concurrent write PRAM model, multiple processors can read from or write to the same address in shared memory concurrently. Several alternative interpretations of the concurrent write operation have been studied; the resolution policy can be chosen from operations such as RANDOM, PRIORITY, MAX, and SUM, as sketched below.
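
The deterministic policies named above can be sketched directly. The resolve helper below is an invented illustration of how a CRCW memory might combine the values several processors write to one address in a single step; RANDOM is omitted to keep the example deterministic.

```cpp
// Sketch of CRCW concurrent-write resolution (illustrative only).
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

enum class Policy { Priority, Max, Sum };

// Combine the values several processors wrote to ONE address in one step.
// 'writes' is ordered by processor index, so PRIORITY = lowest-numbered wins.
int resolve(const std::vector<int>& writes, Policy p) {
    switch (p) {
        case Policy::Priority: return writes.front();
        case Policy::Max:      return *std::max_element(writes.begin(), writes.end());
        case Policy::Sum:      return std::accumulate(writes.begin(), writes.end(), 0);
    }
    return 0;
}

int main() {
    std::vector<int> writes{5, 9, 2};  // processors 0, 1, 2 write concurrently
    std::cout << resolve(writes, Policy::Priority) << "\n";  // 5
    std::cout << resolve(writes, Policy::Max) << "\n";       // 9
    std::cout << resolve(writes, Policy::Sum) << "\n";       // 16
}
```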

  8. Parallel Architecture Performance Analysis Speed - The speed of a computing system is the amount of work accomplished (e.g., number of instructions completed) in a specified time, so we normally refer to processing speed in terms of instructions per second. Speedup - The speedup for a multi-processor system is the ratio of the time required to solve a problem on a single-processor computer to the time required on the multi-processor computer. Since speedup is the ratio of two quantities that have the same units (seconds), it is a unitless quantity.

  9. Efficiency - The efficiency of an n-processor multi-processor computer system is defined as the speedup of the multi-processor divided by the number of processors, n. Traditionally it has been assumed that efficiency cannot be greater than unity (1).
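
A small worked example ties the two definitions together; the timing numbers are invented for illustration.

```cpp
// Worked example of speedup and efficiency (definitions from slides 8-9;
// the timing numbers are assumptions chosen for illustration).
#include <iostream>

int main() {
    double t1 = 120.0;  // single-processor time in seconds (assumed)
    double tn = 20.0;   // n-processor time in seconds (assumed)
    int    n  = 8;      // number of processors

    double speedup    = t1 / tn;      // unitless: seconds / seconds
    double efficiency = speedup / n;  // traditionally assumed <= 1

    std::cout << "speedup = "    << speedup    << "\n";  // 6
    std::cout << "efficiency = " << efficiency << "\n";  // 0.75
}
```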

  10. Simultaneous Multithreading (SMT) The functional difference between conventional multiprocessing and SMT is that in the former each logical processor is a separate physical processor, while in the latter a single set of arithmetic and logic units is shared among the logical processors within one physical core.

  11. Scheduling Priority in SMT When two instructions contend for a resource, the one from the higher-priority thread slot wins. To prevent indefinite postponement, the SMT scheduling policy rotates the priority ranking periodically, as the sketch below illustrates.
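
A minimal model of this rotating-priority policy, assuming a simple round-robin rotation; the SmtArbiter structure is an invented illustration, not real hardware.

```cpp
// Sketch of rotating-priority arbitration between SMT thread slots.
#include <iostream>
#include <vector>

struct SmtArbiter {
    int numSlots;
    int topSlot = 0;  // slot currently holding highest priority

    // Among contending slots, the one closest to topSlot in rotation
    // order wins this cycle's resource.
    int arbitrate(const std::vector<bool>& requesting) {
        for (int i = 0; i < numSlots; ++i) {
            int slot = (topSlot + i) % numSlots;
            if (requesting[slot]) return slot;
        }
        return -1;  // no slot requested the resource
    }
    // Rotate the ranking periodically so no slot is postponed indefinitely.
    void rotate() { topSlot = (topSlot + 1) % numSlots; }
};

int main() {
    SmtArbiter arb{2};
    for (int cycle = 0; cycle < 4; ++cycle) {
        std::cout << "cycle " << cycle << ": slot "
                  << arb.arbitrate({true, true}) << " wins\n";  // 0,1,0,1
        arb.rotate();
    }
}
```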

  12. Internal Organization of an SMT Architecture

  13. Commercial Vector Processor Super-Computers The vector processor concept is still a viable one. While general-purpose vector supercomputers are being surpassed by much less expensive multiprocessors, the vector processor concept has a likely future in special applications such as audio and video processing and computer-generated imagery.

  14. Array Processor for Video Decoding

  15. Shared-Memory Multiprocessor For speed and efficiency, each processor of a shared-memory multiprocessor system keeps a cache of local memory, periodically updated from a common shared memory. The shared memory of a parallel processing system needs a management scheme that ensures that all processors keep a current version of all data values (this is called memory coherence).

  16. MESI Protocol In the MESI protocol, a two-bit tag designates the status of each cache line:
  modified - The data value in this cache has been altered and is not currently held in the cache of any other processor. The line must be written back to shared memory before it is overwritten by another word.
  exclusive - The data value is held only by the current processor and has not been modified. When this value in cache is overwritten, it does not need to be written back to shared memory.
  shared - Copies of this value may be stored in the caches of other processors.
  invalid - This cache line is not valid. To validate the data, the cache line must be updated from shared memory.
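
The textbook transitions can be condensed into a small state machine. This is a simplified sketch of the generic protocol, not any particular CPU's implementation; real controllers add bus transactions and write-backs.

```cpp
// Condensed MESI state machine for one cache line (simplified sketch).
#include <iostream>

enum class Mesi { Modified, Exclusive, Shared, Invalid };
enum class Event { LocalRead, LocalWrite, RemoteRead, RemoteWrite };

Mesi next(Mesi s, Event e) {
    switch (e) {
        case Event::LocalRead:
            // Simplified: a miss fills as Shared; it could fill as
            // Exclusive when no other cache holds the line.
            return s == Mesi::Invalid ? Mesi::Shared : s;
        case Event::LocalWrite:
            return Mesi::Modified;  // a Shared line first claims ownership
        case Event::RemoteRead:
            // A Modified line is written back before it is shared.
            return (s == Mesi::Modified || s == Mesi::Exclusive) ? Mesi::Shared : s;
        case Event::RemoteWrite:
            return Mesi::Invalid;   // our copy is now stale
    }
    return s;
}

int main() {
    Mesi s = Mesi::Exclusive;
    s = next(s, Event::LocalWrite);   // Exclusive -> Modified
    s = next(s, Event::RemoteRead);   // Modified  -> Shared (after write-back)
    s = next(s, Event::RemoteWrite);  // Shared    -> Invalid
    std::cout << (s == Mesi::Invalid ? "line is Invalid\n" : "unexpected\n");
}
```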

  17. Multicore Data Coherence The MOESI protocol is an extension of the MESI protocol that adds a new status called owned. A processor can write to a cache line it owns even if other processors are holding copies; when it modifies data it owns, it is responsible for updating the copies held by the other processors. The MOESI protocol is used in multicore CPUs, in which processor-to-processor communication is much faster than access to shared memory.
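
A one-transition sketch of the refinement: on a remote read, a Modified line becomes Owned, and the owner supplies the data cache-to-cache instead of writing it back to shared memory first. Simplified illustration only.

```cpp
// Sketch of the MOESI refinement to the MESI remote-read transition.
#include <iostream>

enum class Moesi { Modified, Owned, Exclusive, Shared, Invalid };

Moesi onRemoteRead(Moesi s) {
    if (s == Moesi::Modified)  return Moesi::Owned;  // keep dirty data, share it
    if (s == Moesi::Exclusive) return Moesi::Shared;
    return s;                                        // other states unchanged
}

int main() {
    Moesi s = onRemoteRead(Moesi::Modified);
    std::cout << (s == Moesi::Owned ? "Owned: this cache supplies the line\n" : "\n");
}
```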

  18. 4-D Hypercube Interconnections

  19. Neural Networks

  20. The Future of Computer Architecture: The End of Moore's Law It is believed that the ability to achieve process shrinks will continue into the early 2010s, but not much beyond. Specifically, the quantum mechanical properties of electrons and other atoms begin to dominate in the substrate when the feature size reaches around 50 nanometers. At sizes smaller than this, only a few electrons are needed to saturate the channel, and statistical fluctuations due to thermal effects will make the switching of transistors difficult to control.
