2009 Parallelism Odyssey

Presentation Transcript


  1. 2009 Parallelism Odyssey CodeCampOZ ’07 Joel Pobar joelpobar@gmail.com http://callvirt.net/blog/

  2. Agenda • Hardware • The Current State • More Moore’s Law • Memory Models • Programming Models • Languages • Plumbing • Demos

  3. Definitions • Concurrency: Dijkstra – “Concurrency occurs when two or more execution flows are able to run simultaneously” • Parallelism: simultaneous execution of (parts of) the same task

  4. Hardware – Performance: the multi-core era
  [Chart: log transistors/die and log CPU clock frequency vs. year – transistors/die grow >30%/year (10,000 in 1975, ~100 M in 2003, ~5 B by 2015), while clock frequency grows <10%/year (1 MHz in 1975, 3 GHz in 2003, still <10 GHz by 2015)]
  • Processors don’t get way faster anymore – you just get a whole lot more slow ones!

  5. Hardware – What’s the problem?
  • Power ≈ ½ · C · V² · A · f (worked example below)
  • In other words:
    → dissipated power is linear w.r.t. capacitance, activity and frequency
    → power increases quadratically with CPU core voltage
    → less voltage == more leakage (a lower supply voltage forces a lower threshold voltage, which leaks more)
    → leakage generally increases exponentially with smaller fab processes
  • 90 → 65 → 45 nm lithography advances
  • Great if you want more processor yield per wafer
  • Smaller transistors == more transistors per die == more features!
  • Reduction in Vcore (offset by the increase in transistor count)
  • Wires get smaller
  • Increased resistance == slower wires
  • Typically the “global” wires are the problem
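A worked illustration of the ½ · C · V² · A · f relationship: a small C# sketch (not from the talk) that plugs in made-up values for capacitance, activity factor and frequency, and varies only the core voltage to show the quadratic term.

using System;

class DynamicPower
{
    // Dynamic (switching) power: P ≈ ½ · C · V² · A · f
    static double Power(double capacitance, double voltage, double activity, double frequency)
    {
        return 0.5 * capacitance * voltage * voltage * activity * frequency;
    }

    static void Main()
    {
        // Illustrative, made-up numbers: 30 nF effective switched capacitance,
        // 30% activity factor, 3 GHz clock.
        double c = 30e-9, a = 0.3, f = 3e9;

        double p12 = Power(c, 1.2, a, f);   // baseline at 1.2 V core voltage
        double p10 = Power(c, 1.0, a, f);   // drop the core voltage to 1.0 V
        double p15 = Power(c, 1.5, a, f);   // raise it to 1.5 V

        Console.WriteLine("P(1.2 V) = {0:F1} W", p12);
        Console.WriteLine("P(1.0 V) = {0:F1} W ({1:P0} of baseline)", p10, p10 / p12);
        Console.WriteLine("P(1.5 V) = {0:F1} W ({1:P0} of baseline)", p15, p15 / p12);
        // The ratios depend only on (V1/V2)²: halving the voltage quarters the dynamic power.
    }
}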

  6. Hardware • Power has been increasing as Voltage, Leakage, Activity and Frequency have been increasing

  7. Hardware – The end result • Maxed-out thermal envelope + “slower” wires → slower CPU frequency scaling → reduced activity • Where to from here?

  8. Hardware – Let’s take a quick look back

  9. Hardware – Smoke and mirrors
  • Instruction execution throughput sped up by:
  • Superscalar architecture → executing multiple instructions at once → out-of-order instruction execution (OOE)
  • Exploiting the memory cache hierarchy → more L1, L2 and now even L3 (see the sketch below)
  • Compiler & VM optimizations → processor-type optimization
  • Simultaneous multithreading → Intel’s HyperThreading → MULTICORE!
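The cache-hierarchy point is easy to demonstrate. A hypothetical C# sketch (not from the talk): the same summation done in two traversal orders – the cache-friendly order is typically several times faster on a modern CPU.

using System;
using System.Diagnostics;

class CacheDemo
{
    const int N = 4096;

    static void Main()
    {
        var data = new int[N, N];

        var sw = Stopwatch.StartNew();
        long sum1 = 0;
        for (int row = 0; row < N; row++)        // row-major: walks memory sequentially,
            for (int col = 0; col < N; col++)    // so almost every access is a cache hit
                sum1 += data[row, col];
        Console.WriteLine("row-major:    {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        long sum2 = 0;
        for (int col = 0; col < N; col++)        // column-major: jumps N*4 bytes per access,
            for (int row = 0; row < N; row++)    // defeating the cache line and the prefetcher
                sum2 += data[row, col];
        Console.WriteLine("column-major: {0} ms", sw.ElapsedMilliseconds);
    }
}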

  10. Hardware – OOE
  • Instructions are usually reordered to achieve better throughput
  [Diagram: instruction stream annotated with per-instruction latencies (1 1 1 4 1 1 4 1 cycles)]

  11. Hardware – ILP
  • Allows multiple instructions to be executed at the same time on different registers
  [Diagram: decoder buffers and instruction/data caches feeding an instruction scheduler, which dispatches to multiple execution units – FPUs, ALUs, MMX, …]

  12. Agenda • Hardware • The Current State • More Moore’s Law • Memory Models • Programming Models • Languages • Plumbing • Demos

  13. Programming Models – Server side
  • Server per-client work-unit parallelism
  • Web server – implicit request parallelism
  • SQL: implicit data parallelism
  • Scale-out possible, but bottlenecks can occur at layer boundaries
  • Typically hard to scale out to lots of machines
  • Clusters (Beowulf, Windows HPC)
  • Grid computing / cycle stealing (Sun Grid, G2, Alchemi)
  • Map/Reduce (Hadoop [Java, open source], Google MapReduce [not available])

  14. Programming Models – Client side
  • Shared memory, threads and locks (sketch below)
    → most used, most disastrous
    → synchronisation is costly – shared-memory accesses across multiple CPUs don’t scale (cache misses etc.)
    → tough “heisenbugs”
  • Loop parallelism: OpenMP
  • Message passing: CCR, MPI, Erlang
  • Functional languages: implicit, no shared state
  • Software transactional memory → IMO: most likely to solve the problem
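A minimal sketch of the first bullet – the shared-memory, threads-and-locks model the slide calls “most used, most disastrous”. The names (counter, gate) are hypothetical; the point is that correctness hinges on every access taking the lock.

using System;
using System.Threading;

class SharedCounter
{
    static long counter = 0;
    static readonly object gate = new object();

    static void Main()
    {
        ThreadStart work = () =>
        {
            for (int i = 0; i < 1000000; i++)
            {
                lock (gate)          // without this lock the read-modify-write races
                {                    // and the final count comes up short – a classic heisenbug
                    counter++;
                }
            }
        };

        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        Console.WriteLine("counter = {0}", counter);   // 2,000,000 with the lock in place
    }
}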

  15. Programming Models – Message passing • Message passing systems • TODO:// Erlang code
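The slide’s TODO calls for Erlang; as a hedged stand-in, here is the same message-passing shape in C# – a worker thread that owns its own state and communicates only through a mailbox. Mailbox is a hypothetical helper written here, not a CCR or MPI type.

using System;
using System.Collections.Generic;
using System.Threading;

// A toy mailbox: the only way the two threads interact is by passing messages.
class Mailbox<T>
{
    private readonly Queue<T> queue = new Queue<T>();

    public void Send(T message)
    {
        lock (queue)
        {
            queue.Enqueue(message);
            Monitor.Pulse(queue);   // wake up a waiting receiver
        }
    }

    public T Receive()
    {
        lock (queue)
        {
            while (queue.Count == 0)
                Monitor.Wait(queue);
            return queue.Dequeue();
        }
    }
}

class MessagePassingDemo
{
    static void Main()
    {
        var mailbox = new Mailbox<string>();

        var worker = new Thread(() =>
        {
            string msg;
            while ((msg = mailbox.Receive()) != "stop")   // react to messages until told to stop
                Console.WriteLine("worker got: " + msg);
        });
        worker.Start();

        mailbox.Send("hello");
        mailbox.Send("world");
        mailbox.Send("stop");
        worker.Join();
    }
}

The shape mirrors an Erlang receive loop: no state is shared, so nothing outside the mailbox needs a lock.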

  16. Programming Models – Functional • TODO:// Scheme code
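The slide’s TODO calls for Scheme; a hedged C# stand-in making the functional point – pure functions over inputs that are never mutated leave nothing to lock, so the elements could be evaluated in any order, or on any core. Square is a hypothetical example function.

using System;
using System.Linq;

class FunctionalDemo
{
    // Pure function: the output depends only on the input, nothing is mutated.
    static int Square(int x) => x * x;

    static void Main()
    {
        int[] input = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Because Square has no side effects and `input` is never written,
        // each element can be processed independently – there is nothing to lock.
        int[] squares = input.Select(Square).ToArray();
        int total = squares.Sum();

        Console.WriteLine(string.Join(" ", squares));   // 1 4 9 16 25 36 49 64
        Console.WriteLine("sum = " + total);            // 204
    }
}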

  17. MapReduce • Nice functional programming model (similar to Google’s MapReduce model) • Scheduling, latency, file system, resource management • Things to think about: Hyperthreading, Programming model, code distribution, security, resource management for dummies, automatic scheduling and tuning • What we did…
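For reference, a single-machine sketch of the map/reduce shape in C# – word count, with a map step emitting (word, 1) pairs and a reduce step summing per key. This illustrates the model only; it is not the MapReduce.NET experiment described in the talk.

using System;
using System.Collections.Generic;
using System.Linq;

class WordCount
{
    // Map: one document -> a list of (word, 1) pairs.
    static IEnumerable<KeyValuePair<string, int>> Map(string document) =>
        document.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(word => new KeyValuePair<string, int>(word.ToLowerInvariant(), 1));

    // Reduce: all counts for one word -> a single total.
    static int Reduce(IEnumerable<int> counts) => counts.Sum();

    static void Main()
    {
        string[] documents =
        {
            "the quick brown fox",
            "the lazy dog",
            "the fox and the dog"
        };

        // Each document could be mapped on a different core or machine;
        // the group-by plays the role of the shuffle before the reduce step.
        var totals = documents
            .SelectMany(Map)
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Select(group => new { Word = group.Key, Count = Reduce(group) });

        foreach (var entry in totals)
            Console.WriteLine("{0}: {1}", entry.Word, entry.Count);
    }
}

In a real deployment the map and reduce steps run on different machines, which is where the slide’s concerns – scheduling, code distribution, resource management – come in.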

  18. MapReduce.NET experiment
