Parallelism
Can we make it faster?

The RAM model

  • The RAM (Random Access Machine) model of computation assumes:

    • There is a single processing unit

    • There is an arbitrarily large amount of memory

    • Accessing any arbitrarily chosen (i.e. random) memory location takes unit time

  • This simple model is a very useful guide for algorithm design

    • For maximum efficiency, “tuning” to the particular hardware is required

  • The RAM model breaks down when the assumptions are violated

    • If an array is so large that only a portion of it fits in memory (the rest is on disk), very different sorting algorithms should be used

Approaches to parallelism

  • The basic question is, do the processing units share memory, or do they send messages to one another?

  • A thread consists of a single flow of control, a program counter, a call stack, and a small amount of thread-specific data

    • Threads share memory, and communicate by reading and writing to that memory

    • This is thread-based or shared-memory parallel processing

    • Java “out of the box” is thread-based

  • A process is a thread that has its own private memory

    • Processes (sometimes called actors) send messages to one another

    • This is message-passing parallel processing
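
The two approaches can be contrasted in a small Java sketch. The class and method names (TwoStyles, sharedMemorySum, messagePassingSum) are hypothetical, and the "message passing" half only imitates the style inside one JVM by handing results through a queue; real processes would not share an address space.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class TwoStyles {
    // Shared-memory style: both threads read and write the same counter.
    static int sharedMemorySum(int n) throws InterruptedException {
        AtomicInteger total = new AtomicInteger(0);
        Runnable work = () -> {
            for (int i = 1; i <= n; i++) {
                total.addAndGet(i);          // every update touches shared state
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        return total.get();
    }

    // Message-passing style: each worker keeps its state private and
    // sends a single result message to the receiver's mailbox.
    static int messagePassingSum(int n) throws InterruptedException {
        BlockingQueue<Integer> mailbox = new ArrayBlockingQueue<>(2);
        Runnable work = () -> {
            int local = 0;                   // private to this worker
            for (int i = 1; i <= n; i++) {
                local += i;
            }
            try {
                mailbox.put(local);          // the only communication
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        new Thread(work).start();
        new Thread(work).start();
        return mailbox.take() + mailbox.take();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sharedMemorySum(100));    // 10100
        System.out.println(messagePassingSum(100));  // 10100
    }
}
```

Both methods compute the same value; the difference is whether the workers communicate on every step through shared state or once through a message.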

The PRAM model

  • An obvious extension to the RAM model is the Parallel Random Access Machine (PRAM) model, which assumes:

    • There are multiple processing units

    • There is an arbitrarily large amount of memory

    • Accessing any memory location takes unit time

  • The third assumption is “good enough” for many in-memory sequential programs, but not good enough for parallel programs

    • If the processing units share memory, then complicated and expensive synchronization mechanisms must be used

    • If the processing units do not share memory, then each has its own (fast) local memory, and communicates with other processes by sending messages to them (much slower--especially if over a network!)

  • Bottom line: Because there seems to be no way to meet the unit time assumption, the PRAM model is seriously broken!

The CTA model

  • The Candidate Type Architecture model makes these assumptions:

    • There are P standard sequential processors, each with its own local memory

      • One of the processors may be acting as “controller,” doing things like initialization and synchronization

    • Processors can access non-local memory over a communication network

      • Non-local memory is between 100 times and 10000 times slower to access than local memory (based on common architectures)

    • A processor can make only a very small number (maybe 1 or 2) of simultaneous non-local memory accesses

Consequences of CTA

  • The CTA model does not specify how many processors are available

    • The programmer does not need to plan for some specific number of processors

    • More processors may cause the code to execute somewhat more quickly

  • The CTA model does specify a huge discrepancy between local and non-local memory access

    • The programmer should minimize the number of non-local memory accesses
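
To get a feel for why non-local accesses dominate, here is a rough back-of-the-envelope sketch. The cost model (one time unit per local access, a fixed multiple of that per non-local access) and the name CtaCostSketch are assumptions made for illustration; the ratio of 1,000 is simply a value inside the 100 to 10000 range quoted above.

```java
class CtaCostSketch {
    // Estimated running time, measured in units of one local memory access.
    // Assumes every non-local access costs exactly latencyRatio local accesses.
    static long estimatedCost(long localAccesses, long nonLocalAccesses, long latencyRatio) {
        return localAccesses + nonLocalAccesses * latencyRatio;
    }

    public static void main(String[] args) {
        long allLocal = estimatedCost(1_000_000, 0, 1_000);
        long onePercentRemote = estimatedCost(990_000, 10_000, 1_000);
        System.out.println(allLocal);          // 1000000
        System.out.println(onePercentRemote);  // 10990000
    }
}
```

Under these assumptions, moving just 1% of the accesses to non-local memory makes the estimate roughly eleven times larger.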

Costs of parallelism

  • It would be great if having N processors meant our programs would run N times as fast, but...

  • There is overhead involved in setting up the parallelism, which we don’t need to pay for a sequential program

  • There are parts of any program that cannot be parallelized

  • Some processors will be idle because there is nothing for them to do

  • Processors have to contend for the same resources, such as memory, and may have to wait for one another


Overhead

  • Overhead is any cost incurred by the parallel algorithm but not by the corresponding sequential algorithm

    • Communication among threads and processes (a single thread has no other threads with which to communicate)

    • Synchronization is when one thread or process has to wait for results or events from another thread or process

    • Contention for a shared resource, such as memory

      • Java’s synchronized is used to wait for a lock to become free

    • Extra computation to combine the results of the various threads or processes

    • Extra memory may be needed to give each thread or process the memory required to do its job
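
Several of these costs show up even in a very small program. The following is a minimal sketch, assuming a hypothetical PartialSums class that sums an array with a fixed number of threads; the comments mark where each kind of overhead appears.

```java
class PartialSums {
    private long total = 0;                      // shared state, guarded by "this"

    private synchronized void add(long value) {  // contention: threads wait for this lock
        total += value;
    }

    static long parallelSum(int[] data, int threadCount) {
        PartialSums shared = new PartialSums();
        Thread[] threads = new Thread[threadCount];
        int chunk = (data.length + threadCount - 1) / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int start = Math.min(data.length, t * chunk);
            final int end = Math.min(data.length, start + chunk);
            threads[t] = new Thread(() -> {
                long local = 0;                  // extra memory: one accumulator per thread
                for (int i = start; i < end; i++) {
                    local += data[i];
                }
                shared.add(local);               // extra computation: combine partial results
            });
            threads[t].start();                  // setup cost a sequential loop never pays
        }
        for (Thread thread : threads) {
            try {
                thread.join();                   // synchronization: wait for every worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (shared) {
            return shared.total;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // 500500
    }
}
```

Each thread takes the lock only once, after finishing its slice; locking on every element would give the same answer with far more contention.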

Amdahl’s law

  • Some proportion P of a program can be made to run in parallel, while the remaining (1 - P) must remain sequential

  • If there are N processors, then the computation can be done in (1 - P) + P/N of the time the sequential program takes

  • The maximum speedup is then 1 / ((1 - P) + P/N)

  • As N goes to infinity, the maximum speedup is 1/(1 - P)

  • For example, if P = 0.75, the maximum speedup is (1/0.25), or four times
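
The law is easy to evaluate numerically. This is a small sketch with a hypothetical Amdahl class; p is the parallelizable proportion P and n is the number of processors N.

```java
import java.util.Locale;

class Amdahl {
    // Maximum speedup with n processors when a proportion p can run in parallel.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    // Limit of the speedup as the number of processors goes to infinity.
    static double limit(double p) {
        return 1.0 / (1.0 - p);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.3f", speedup(0.75, 4)));   // 2.286
        System.out.println(String.format(Locale.ROOT, "%.3f", speedup(0.75, 40)));  // 3.721
        System.out.println(String.format(Locale.ROOT, "%.3f", limit(0.75)));        // 4.000
    }
}
```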

Consequences of Amdahl’s law

  • If 75% of a process can be parallelized, and there are four processors, then the possible speedup is 1 / ((1 - 0.75) + 0.75/4) = 2.286

  • But with 40 processors--ten times as many--the speedup is only 1 / ((1 - 0.75) + 0.75/40) = 3.721

  • This has led many people (including Amdahl) to conclude that having lots of processors won’t help very much

  • However....

  • For many problems, as the data set gets larger,

    • The inherently sequential part of the program remains (fairly) constant

    • Thus, the sequential proportion (1 - P) becomes smaller

    • So: The greater the volume of data, the more speedup we can get
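
This argument can be made concrete with a toy cost model. Everything numeric here is an assumption chosen for illustration: the sequential part takes a fixed 25 time units, each data item adds 0.75 units of parallelizable work, and there are 40 processors. ScaledSpeedup is a hypothetical name.

```java
class ScaledSpeedup {
    // Parallelizable proportion P when the sequential part costs a fixed
    // sequentialTime and the parallelizable part costs timePerItem per item.
    static double parallelFraction(double sequentialTime, double timePerItem, long items) {
        double parallelTime = timePerItem * items;
        return parallelTime / (sequentialTime + parallelTime);
    }

    // Amdahl's law: maximum speedup with n processors.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        long[] sizes = {100L, 10_000L, 1_000_000L};
        for (long items : sizes) {
            double p = parallelFraction(25.0, 0.75, items);
            System.out.println(items + " items: P = " + p
                    + ", speedup on 40 processors = " + speedup(p, 40));
        }
    }
}
```

With 100 items the model gives P = 0.75, the case worked out above; as the item count grows, P approaches 1 and the speedup approaches the processor count.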

Idle time

  • Idle time results when

    • There is a load imbalance--one process may have much less work to do than another

    • A process must wait for access to memory or some other shared resource

      • Data in registers is most quickly accessed

      • Data in a cache is next most quickly accessed

        • A level 1 cache is the fastest, but also the smallest

        • A level 2 cache is larger, but slower

      • Memory--RAM--is much slower

      • Disk access is very much slower


Dependencies

  • A dependency is when one thread or process requires the result of another thread or process

    • Example: (a + b) * (c + d)

      • The additions can be done in parallel

      • The multiplication must wait for the results of the additions

      • Of course, at this level, the hardware itself handles the parallelism

  • Threads or processors that depend on results from other threads or processors must wait for those results
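
The (a + b) * (c + d) example can be written out at the thread level as a sketch. It uses CompletableFuture from java.util.concurrent (available since Java 8); the class name DependencySketch is hypothetical, and for work this small the task overhead far outweighs any gain, so it only illustrates the shape of the dependency.

```java
import java.util.concurrent.CompletableFuture;

class DependencySketch {
    static int compute(int a, int b, int c, int d) {
        // The two additions are independent, so they may run in parallel.
        CompletableFuture<Integer> left = CompletableFuture.supplyAsync(() -> a + b);
        CompletableFuture<Integer> right = CompletableFuture.supplyAsync(() -> c + d);
        // The multiplication depends on both sums; thenCombine runs it
        // only after both futures have completed.
        return left.thenCombine(right, (x, y) -> x * y).join();
    }

    public static void main(String[] args) {
        System.out.println(compute(1, 2, 3, 4)); // 21
    }
}
```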

Parallelism in Java

  • Java uses the shared memory model

    • There are various competing Java packages (such as Akka and Kilim) to support message passing, but nothing yet in the official Java release

  • The programming language Erlang has developed the message passing approach

  • Scala is a Java competitor that supports both approaches

    • Scala’s message passing is based on Erlang

Concurrency in Java, I

  • Java Concurrency in Practice, by Brian Goetz, is the book to have if you need to do much concurrent programming in Java

  • The following 11 points are from his summary of basic principles

  • It’s the mutable state, stupid!

  • Make fields final unless they need to be mutable.

  • Immutable objects are automatically thread-safe.

  • Encapsulation makes it practical to manage the complexity.

  • Guard each mutable variable with a lock.

Concurrency in Java, II

  • Guard all variables in an invariant with the same lock.

  • Hold locks for the duration of compound actions.

  • A program that accesses a mutable variable from multiple threads without synchronization is a broken program.

  • Don’t rely on clever reasoning about why you don’t need to synchronize.

  • Include thread safety in the design process—or explicitly document that your class is not thread-safe.

  • Document your synchronization policy.
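
A few of these principles can be seen together in a short sketch. The classes ImmutablePoint and NumberRange are hypothetical illustrations written for this purpose, not code quoted from the book.

```java
// Immutable: every field is final and there are no setters, so instances
// can be shared between threads without any locking.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }
}

// Mutable: the invariant lower <= upper involves both fields.
// Synchronization policy: every access to lower and upper holds the lock on "this".
class NumberRange {
    private int lower = 0;
    private int upper = 0;

    // Checking and updating both fields is one compound action,
    // so the lock is held for its whole duration.
    synchronized void set(int newLower, int newUpper) {
        if (newLower > newUpper) {
            throw new IllegalArgumentException("lower > upper");
        }
        lower = newLower;
        upper = newUpper;
    }

    synchronized boolean contains(int value) {
        return lower <= value && value <= upper;
    }

    synchronized int width() {
        return upper - lower;
    }
}
```

If lower and upper were guarded by different locks, or set by two separate synchronized setters, another thread could observe a moment when lower > upper.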

Functional programming

  • In functional programming (FP):

    • A function is a value

      • It can be assigned to variables

      • It can be passed as an argument to another function

      • It can be returned as the result of a function call

      • There are much briefer ways of writing a literal function

        • Scala example: a => 101 * a

    • A function acts like a function in mathematics

      • If you call it with the same arguments, you will get the same result. Every time. Guaranteed.

      • Functions have no side effects

    • Immutable values are strongly emphasized over mutable values

      • Some languages, such as Haskell, don’t allow mutable values at all

      • Computation proceeds by the application of functions, not by changing the state of mutable variables
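
The "function is a value" points can also be sketched in Java, assuming Java 8 or later for lambda expressions and java.util.function.Function. The names FunctionValues, multiplier, and applyTwice are hypothetical.

```java
import java.util.function.Function;

class FunctionValues {
    // A function returned as the result of a function call.
    static Function<Integer, Integer> multiplier(int factor) {
        return a -> factor * a;
    }

    // A function passed as an argument to another function.
    static int applyTwice(Function<Integer, Integer> f, int x) {
        return f.apply(f.apply(x));
    }

    public static void main(String[] args) {
        // A literal function assigned to a variable.
        Function<Integer, Integer> times101 = a -> 101 * a;
        System.out.println(times101.apply(2));             // 202
        System.out.println(applyTwice(multiplier(3), 5));  // 45
    }
}
```

These functions have no side effects and touch no mutable state, so calling them with the same arguments always gives the same result.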

Why functional programming?

  • Here are the three most important reasons that functional programming is better for concurrency than imperative programming:

    • Immutable values are automatically thread safe

    • Immutable values are automatically thread safe

    • Immutable values are automatically thread safe

Why not functional programming?

  • Functional languages—Lisp, Haskell, ML, OCaml—have long been regarded as only for ivory-tower academics

  • Functional languages are “weird” (meaning: unfamiliar)

What’s happening now?

  • Moore’s law has ended

    • Instead of getting faster processors, we’re now getting more of them

    • Consequently, parallelism, and concurrency, have become much more important

  • After about ten years, CIS 120 is once again starting with OCaml

  • Python has gotten more functional

  • Other languages are getting more functional

  • Microsoft is starting to promote F# (based on ML?)

  • Java 8 will have some functional features

  • Scala is a hybrid object/functional language based on Java, and is freely available now

The End

…for now
