This lecture introduces data parallelism, MapReduce concepts, and distributed computing. It covers OCaml programming, parallelization techniques (SIMD, MIMD, and distributed computing), functional programming, and Google's large-scale implementation of the MapReduce paradigm (Fall 2008).
Lecture #4: Introduction to Data Parallelism and MapReduce
CS492 Special Topics in Computer Science: Distributed Algorithms and Systems
Today’s Topics • Short quiz on programming in OCaml
How to parallelize (I)
• Run-length encoding
• Fibonacci function
• Calculation of π
• Word count
• Inverted index
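To make the word-count example concrete, here is a minimal sketch in OCaml of the map/reduce decomposition the lecture has in mind: a map phase that turns each document into (word, 1) pairs, a grouping step, and a reduce phase that sums the counts per word. The function names (`map_doc`, `reduce_word`, `word_count`) are illustrative, not from any particular library.

```ocaml
(* map phase: each document yields (word, 1) pairs *)
let map_doc (doc : string) : (string * int) list =
  String.split_on_char ' ' doc
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* reduce phase: sum the counts collected for one word *)
let reduce_word (word : string) (counts : int list) : string * int =
  (word, List.fold_left ( + ) 0 counts)

(* group intermediate pairs by key, then reduce each group;
   in a real MapReduce run the grouping (shuffle) is done by the runtime *)
let word_count (docs : string list) : (string * int) list =
  let pairs = List.concat_map map_doc docs in
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (w, c) ->
      let prev = try Hashtbl.find tbl w with Not_found -> [] in
      Hashtbl.replace tbl w (c :: prev))
    pairs;
  Hashtbl.fold (fun w cs acc -> reduce_word w cs :: acc) tbl []
```

Because `map_doc` touches each document independently and `reduce_word` touches each key independently, both phases parallelize trivially across machines.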
How to parallelize (II)
• SIMD
• MIMD via shared memory
• MIMD via message passing
• Distributed computing
MapReduce
• Functional programming: the “map / reduce” way of thinking about problem solving
• Google’s runtime library supporting the MR paradigm at very large scale
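The “map / reduce” way of thinking comes straight from functional programming: transform each element independently (map), then combine the results with an associative operation (reduce, i.e. a fold). A two-line OCaml sketch of sum-of-squares shows the pattern:

```ocaml
(* map: square each element independently *)
let squares = List.map (fun x -> x * x) [ 1; 2; 3; 4 ]

(* reduce: fold the mapped results into one value *)
let sum_of_squares = List.fold_left ( + ) 0 squares
```

Because each `map` application is independent and `( + )` is associative, the same computation can be split across many machines, which is exactly the structure Google's runtime exploits.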
MapReduce Execution Overview (CS492, Fall 2008)
How popular is MapReduce?
• In September 2007, Google used 11,081 “machine-years” (roughly, CPU-years) on MapReduce jobs alone
• Assuming all machines were 100% busy and ran only MR jobs, that one month’s usage implies 11,081 × 365 / 30 ≈ 134,818 machines running continuously
• If a rack holds 176 CPUs (88 1U dual-processor machines): 134,818 / 176 ≈ 766 racks
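The back-of-the-envelope arithmetic above can be checked directly; this is only the slide's own estimate restated as code, under the slide's assumptions (100% utilization, MR-only workload):

```ocaml
(* 11,081 machine-years consumed in a single month (Sep 2007) *)
let machine_years = 11_081.

(* machine-years per month -> machines busy all month long *)
let machines = machine_years *. 365. /. 30.   (* ~134,818 *)

(* 88 1U dual-processor boxes per rack = 176 CPUs *)
let cpus_per_rack = 176.
let racks = machines /. cpus_per_rack         (* ~766 *)
```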
Reading material
• J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, Vol. 51, No. 1, Jan. 2008
• J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” USENIX OSDI 2004