Skip this Video
Download Presentation

Loading in 2 Seconds...

play fullscreen
1 / 19

Parallelism - PowerPoint PPT Presentation

  • Uploaded on

Parallelism. Can we make it faster?. The RAM model. The RAM (Random Access Machine) model of computation assumes: There is a single processing unit There is an arbitrarily large amount of memory Accessing any arbitrarily chosen (i.e. random) memory location takes unit time

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Parallelism' - yaron

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript


Can we make it faster?

the ram model
The RAM model
  • The RAM (Random Access Machine) model of computation assumes:
    • There is a single processing unit
    • There is an arbitrarily large amount of memory
    • Accessing any arbitrarily chosen (i.e. random) memory location takes unit time
  • This simple model is very useful guide for algorithm design
    • For maximum efficiency, “tuning” to the particular hardware is required
  • The RAM model breaks down when the assumptions are violated
    • If an array is so large that only a portion of it fits in memory (the rest is on disk), very different sorting algorithms should be used
approaches to parallelism
Approaches to parallelism
  • The basic question is, do the processing units share memory, or do they send messages to one another?
  • A thread consists of a single flow of control, a program counter, a call stack, and a small amount of thread-specific data
    • Threads share memory, and communicate by reading and writing to that memory
    • This is thread-based or shared-memory parallel processing
    • Java “out of the box” is thread-based
  • A process is a thread that has its own private memory
    • Threads (sometimes called actors) send messages to one another
    • This is message-passing parallel processing
the pram model
The PRAM model
  • An obvious extension to the RAM model is the Parallel Random Access model, which assumes:
    • There are multiple processing units
    • There is an arbitrarily large amount of memory
    • Accessing any memory location takes unit time
  • The third assumption is “good enough” for many in-memory sequential programs, but not good enough for parallel programs
    • If the processing units share memory, then complicated and expensive synchronization mechanisms must be used
    • If the processing units do not share memory, then each has its own (fast) local memory, and communicates with other processes by sending messages to them (much slower--especially if over a network!)
  • Bottom line: Because there seems to be no way to meet the unit time assumption, the PRAM model is seriously broken!
the cta model
The CTA model
  • The Candidate Type Architecture model makes these assumptions:
    • There are P standard sequential processors, each with its own local memory
      • One of the processors may be acting as “controller,” doing things like initialization and synchronization
    • Processors can access non-local memory over a communication network
      • Non-local memory is between 100 times and 10000 times slower to access than local memory (based on common architectures)
    • A processor can make only a very small number (maybe 1 or 2) of simultaneous non-local memory accesses
consequences of cta
Consequences of CTA
  • The CTA model does not specify how many processors are available
    • The programmer does not need to plan for some specific number of processors
    • More processors may cause the code to execute somewhat more quickly
  • The CTA modes does specify a huge discrepancy between local and non-local memory access
    • The programmer should minimize the number of non-local memory accesses
costs of parallelism
Costs of parallelism
  • It would be great if having N processors meant our programs would run N times as fast, but...
  • There is overhead involved in setting up the parallelism, which we don’t need to pay for a sequential program
  • There are parts of any program that cannot be parallelized
  • Some processors will be idle because there is nothing for them to do
  • Processors have to contend for the same resources, such as memory, and may have to wait for one another
  • Overhead is any cost incurred by the parallel algorithm but not by the corresponding sequential algorithm
    • Communication among threads and processes (a single thread has no other threads with which to communicate)
    • Synchronization is when one thread or process has to wait for results or events from another thread or process
    • Contention for a shared resource, such as memory
      • Java’s synchronized is used to wait for a lock to become free
    • Extra computation to combine the results of the various threads or processes
    • Extra memory may be needed to give each thread or process the memory required to do its job
amdahl s law
Amdahl’s law
  • Some proportion P of a program can be made to run in parallel, while the remaining (1 - P) must remain sequential
  • If there are N processors, then the computation can be done in (1 - P) + P/N time
  • The maximum speedup is then 1 . (1 - P) + P/N
  • As N goes to infinity, the maximum speedup is 1/(1 - P)
  • For example, if P = 0.75, the maximum speedup is (1/0.25), or four times
consequences of amdahl s law
Consequences of Amdahl’s law
  • If 75% of a process can be parallelized, and there are four processors, then the possible speedup is1 / ((1 - 0.75) + 0.75/4) = 2.286
  • But with 40 processors--ten times as many--the speedup is only1 / ((1 - 0.75) + 0.75/40) = 3.721
  • This has led many people (including Amdahl) to conclude that having lots of processors won’t help very much
  • However....
  • For many problems, as the data set gets larger,
    • The inherently sequential part of the program remains (fairly) constant
    • Thus, the sequential proportion P becomes smaller
    • So: The greater the volume of data, the more speedup we can get
idle time
Idle time
  • Idle time results when
    • There is a load imbalance--one process may have much less work to do than another
    • A process must wait for access to memory or some other shared resource
      • Data is registers is most quickly accessed
      • Data in a cache is next most quickly accessed
        • A level 1 cache is the fastest, but also the smallest
        • A level 2 cache is larger, but slower
        • Memory--RAM--is much slower
        • Disk access is very much slower
  • A dependency is when one thread or process requires the result of another thread or process
    • Example: (a + b) * (c + d)
      • The additions can be done in parallel
      • The multiplication must wait for the results of the additions
      • Of course, at this level, the hardware itself handles the parallelism
  • Threads or processors that depend on results from other threads or processors must wait for those results
parallelism in java
Parallelism in Java
  • Java uses the shared memory model
    • There are various competing Java packages (such as Akka and Kilim) to support message passing, but nothing yet in the official Java release
  • The programming language Erlang has developed the message passing approach
  • Scala is a Java competitor that supports both approaches
    • Scala’s message passing is based on Erlang
concurrency in java i
Concurrency in Java, I
  • Java Concurrency in Practice, by Brian Goetz, is the book to have if you need to do much concurrent programming in Java
  • The following 11 points are from his summary of basic principles
  • It’s the mutable state, stupid!
  • Make fields final unless they need to be mutable.
  • Immutable objects are automatically thread-safe.
  • Encapsulation makes it practical to manage the complexity.
  • Guard each mutable variable with a lock.
concurrency in java ii
Concurrency in Java, II
  • Guard all variables in an invariant with the same lock.
  • Hold locks for the duration of compound actions.
  • A program that accesses a mutable variable from multiple threads without synchronization is a broken program.
  • Don’t rely on clever reasoning about why you don’t need to synchronize.
  • Include thread safety in the design process—or explicitly document that your class in not thread-safe.
  • Document your synchronization policy.
functional programming
Functional programming
  • In functional programming (FP):
    • A function is a value
      • It can be assigned to variables
      • It can be passed as an argument to another function
      • It can be returned as the result of a function call
      • There are much briefer ways of writing a literal function
        • Scala example: a => 101 * a
    • A function acts like a function in mathematics
      • If you call it with the same arguments, you will get the same result. Every time. Guaranteed.
      • Functions have no side effects
    • Immutable values are strongly emphasized over mutable values
      • Some languages, such as Haskell, don’t allow mutable values at all
      • Computation proceeds by the application of functions, not by changing the state of mutable variables
why functional programming
Why functional programming?
  • Here are the three most important reasons that functional programming is better for concurrency than imperative programming:
    • Immutable values are automatically thread safe
    • Immutable values are automatically thread safe
    • Immutable values are automatically thread safe

Why not functional programming?

  • Functional languages—Lisp, Haskell, ML, OCaml—have long been regarded as only for ivory-tower academics
  • Functional languages are “weird” (meaning: unfamiliar)
what s happening now
What’s happening now?
  • Moore’s law has ended
    • Instead of getting faster processors, we’re now getting more of them
    • Consequently, parallelism, and concurrency, have become much more important
  • After about ten years, CIS 120 is once again starting with OCaml
  • Python has gotten more functional
  • Other languages are getting more functional
  • Microsoft is starting to promote F# (based on ML?)
  • Java 8 will have some functional features
  • Scala is a hybrid object/functional language based on Java, and is freely available now
the end
The End

…for now