
If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)?


Presentation Transcript


  1. If Parallelism Is The New Normal, How Do We Prepare Our Students (And Ourselves)? Joel Adams Department of Computer Science Calvin College

  2. An Anecdote about CCSC:MW This story has nothing to do with parallel computing, but it may be of interest… Did you know that if it were not for CCSC:MW, CS Education Week would likely not exist?

  3. How CCSC:MW → CS Ed Week At CCSC:MW in 2008: • The ACM-CSTA's Chris Stephenson gave the keynote, describing the decline of CS in high schools • No Child Left Behind was killing HS CS! • I'm pretty apolitical, but ...

  4. How CCSC:MW → CS Ed Week I decided to visit my Congressman, Rep. Vernon Ehlers, ranking member of the House Committee on Science & Technology (a Physics PhD and former Calvin prof). • He was surprised to hear of the problems (esp. enrollment declines) CS was facing.

  5. How CCSC:MW → CS Ed Week Rep. Ehlers contacted the ACM, specifically Cameron Wilson. They worked together on CS Education Week, which the House passed 405-0 in 2009. • CCSC:MW catalyzed CS Education Week!

  6. What's Happening Now? There is a bill currently in Congress: • H.R. 2536: The CS Education Act of 2013 • It seeks to strengthen K-12 CS education, and make CS a core subject. • It currently has 116 co-sponsors (62R, 54D); is supported by ACM, NCWIT, Google, MS, ... • It has been referred to the Committee on Early Childhood, Elementary, and Secondary Ed., chaired by Rep. Todd Rokita (R, IN).

  7. Most Representatives Are Unaware

  8. What Can You Do? There is strength in numbers: • Contact your Congressional representative and ask them to co-sponsor HR 2536. • If you are in Rep. Rokita's district… (!) • More co-sponsors improve its chances. • Tweet to Rep. Rokita (@ToddRokita) • Tell him you support HR 2536 – the CS Education Act of 2013 – and want it to pass.

  9. And Now, Back To Today's Topic Overview • The past • How our computing foundation has shifted • The present • Today's hardware & software landscapes • The future? • Preparing ourselves & our students

  10. [Chart: processor temperature over time, actual and projected through 2020, climbing toward that of a hot plate and, eventually, the sun]

  11. The Heat Problem… • … was not caused by Moore's Law • It was caused by manufacturers doubling clock speeds every 18-24 months • This was the "era of the free lunch" for software developers: • If your software was sluggish, faster hardware would fix your problem within two years!

  12. Solving the Heat Problem… • In 2005, manufacturers stopped doubling clock speeds because of the heat, power consumption, electron leakage, … • This ended the "era of the free lunch" • Software will no longer speed up on its own.

  13. [Chart: CPU clock speed (frequency) trend over time; the curve flattens around 2005]

  14. But Moore's Law Continued • Every 2 years, manufacturers could still double the transistors in a given area: • 2006: Dual-core CPUs • 2008: Quad-core CPUs • 2010: 8-core CPUs • 2012: 16-core CPUs • … • Each of these cores has the full functionality of a traditional CPU.

  15. 12 Years of Moore's Law • 2001: ohm.calvin.edu: 18 nodes, each with: • One 1-GHz Athlon CPU • 1 GB RAM / node • Gigabit Ethernet • Linux • ~$60,000 (funded by NSF). • 2013: Adapteva Parallella: • A dual-core 1-GHz ARM Cortex-A9 • 16-core Epiphany coprocessor • 1 GB RAM • Gigabit Ethernet, USB, HDMI, … • Ubuntu Linux • ~$99 (but free via its university program!)

  16. Multiprocessors are Inexpensive • 2014: Nvidia Jetson TK1 • Quad-core ARM A15 • Kepler GPU w/ 192 CUDA cores • 2 GB RAM • Gigabit Ethernet, HDMI, USB, … • Ubuntu Linux • ~$200

  17. Multiprocessors are Everywhere

  18. Some Implications • Traditional sequential programs will not run faster on today's hardware. • They may well run slower because the manufacturers are decreasing clock speeds. • The only software that will run faster is parallel software designed to scale with the number of cores.

  19. Categorizing Parallel Hardware Parallel systems fall into three categories: • Shared memory: multicore CPUs • Distributed memory: older clusters • Heterogeneous systems: accelerators (GPUs, coprocessors), newer clusters, modern supercomputers

  20. Hardware: A Diverse Landscape • Shared-memory systems • Distributed-memory systems • Heterogeneous systems [Diagrams: a shared-memory system, with Core1–Core4 sharing one Memory; a distributed-memory system, with CPU1…CPUN each paired with its own Mem1…MemN and connected by a Network]

  21. CS Curriculum 2013 Because of this hardware revolution, the advent of cloud computing, and so on, CS2013 has added a new knowledge area: Parallel and Distributed Computing (PDC)

  22. What is PDC? It goes beyond traditional concurrency: • Parallel emphasizes: • Throughput / performance (and timing) • Scalability (performance improves with # of cores) • New topics like speedup, Amdahl's Law, … • Distributed emphasizes: • Multiprocessing (no shared memory) • MPI, MapReduce/Hadoop, BOINC, … • Cloud computing • Mobile apps accessing scalable web services
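  Amdahl's Law makes a nice worked example here: if s is the fraction of a program that must run sequentially, the speedup on n cores is at most 1 / (s + (1 - s)/n). Below is a minimal sketch (my illustration, not from the talk; the 10% serial fraction is made up) that tabulates the bound:

      // Amdahl's Law: upper bound on speedup for n cores, given the
      // fraction of the program that must remain sequential.
      #include <iostream>

      double amdahlSpeedup(double serialFraction, int cores) {
          return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
      }

      int main() {
          // assume a (hypothetical) program that is 10% sequential
          for (int n = 1; n <= 64; n *= 2) {
              std::cout << n << " cores: speedup <= "
                        << amdahlSpeedup(0.10, n) << '\n';
          }
          // no matter how many cores, the speedup never exceeds 1/0.10 = 10
      }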

  23. Software: Communication Options In shared-memory systems, programs may: • Communicate via the shared memory • Languages: Java, C++11, … • Libraries: POSIX threads, OpenMP • Communicate via message passing • Message-passing languages: Erlang, Scala, … • Libraries: the Message Passing Interface (MPI)
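  To make the first option concrete, here is a minimal OpenMP sketch (mine, not the talk's; the names are illustrative) in which threads communicate through a shared array: each thread accumulates a partial sum in its own slot, and the slots are combined sequentially afterward:

      // Communication via shared memory: each thread writes a partial
      // sum into the slot of a shared array indexed by its thread id.
      #include <omp.h>
      #include <iostream>
      #include <vector>

      int main() {
          const int N = 1000000;
          std::vector<double> partial;

          #pragma omp parallel
          {
              #pragma omp single
              partial.assign(omp_get_num_threads(), 0.0);  // one slot per thread
                                                           // (implicit barrier here)
              int id = omp_get_thread_num();
              #pragma omp for
              for (int i = 0; i < N; ++i) {
                  partial[id] += i;                        // no race: one writer per slot
              }
          }
          double total = 0.0;
          for (double p : partial) { total += p; }         // sequential combine
          std::cout << "sum = " << total << '\n';          // expect N*(N-1)/2
      }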

  24. CS Curriculum 2013 (CS2013) • The CS2013 core includes 15 hours of parallel & distr. computing (PDC) topics • 5 hours in core Tier 1 • 10 hours in core Tier 2 + related topics in System Fundamentals (SF) • How/where do we cover these topics in the CS curriculum?

  25. Model 1: Create a New Course Add a new course to the CS curriculum that covers the core PDC topics: • If someone else has to teach this new course, dealing with PDC is their problem, not mine! • The CS curriculum is already full! • What do we drop to make room?

  26. Model 2: Across the Curriculum Sprinkle 15+ hours (3 weeks) of PDC across our core CS courses, not counting SF: • Students see relationship of PDC to data structures, algorithms, prog. lang., … • Easier to make room for 1 week in 1 course than jettison an entire course. • Spreads the effort across multiple faculty • All those faculty have to be "on board"

  27. Calvin CS Curriculum • Year 1: Fall: Intro to Computing, Calculus I; Spring: Data Structures, Calculus II • Year 2: Fall: Algorithms & DS, Intro. Comp. Arch., Discrete Math I; Spring: Programming Lang., Discrete Math II • Year 3: Fall: Software Engr., Adv. Elective; Spring: OS & Networking, Adv. Elective, Statistics • Year 4: Fall: Adv. Elective, Sr. Practicum I; Spring: Adv. Elective (HPC), Sr. Practicum II, Perspectives on Comp.

  28. Why Introduce Parallelism in CS2? • For students to be facile with parallelism, they need to see it early and often. • Performance (Big-Oh) is a topic that's first addressed in CS2. • Data structures let us store large data sets • Slow sequential processing of these sets provides a natural motivation for parallelism.

  29. Parallel Topics in CS2 • Lecture topics: • Single threading vs. multithreading • The single-program-multiple-data (SPMD), fork-join, parallel loop, and reduction patterns • Speedup, asymptotic performance analysis • Parallel algorithms: searching, sorting • Race conditions: non-thread-safe structures • Lab exercise: Compare sequential vs. parallel matrix operations using OpenMP
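  The race-conditions bullet is easy to demonstrate live. A minimal OpenMP sketch of the kind of demo one might use (my own, not from the talk): many threads increment a shared counter, first with a race, then with the race removed:

      // counter++ is a read-modify-write, so concurrent increments
      // can be lost; the first loop usually prints a total < N.
      #include <omp.h>
      #include <iostream>

      int main() {
          const int N = 1000000;
          long counter = 0;

          #pragma omp parallel for
          for (int i = 0; i < N; ++i) {
              counter++;                 // RACE: unsynchronized shared update
          }
          std::cout << "racy:  " << counter << '\n';

          counter = 0;
          #pragma omp parallel for reduction(+:counter)
          for (int i = 0; i < N; ++i) {
              counter++;                 // fixed: each thread updates a private copy
          }
          std::cout << "fixed: " << counter << '\n';   // always prints N
      }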

  30. Lab Exercise: Matrix Operations Given a Matrix class, the students: • Measure the time to perform sequential addition and transpose methods • For each of three different approaches: • Use the approach to parallelize those methods • Record execution times in a spreadsheet • Create a chart showing time vs # of threads Students directly experience the speedup…
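  A self-contained sketch of the lab's core measurement (my reconstruction; the course's actual Matrix class will differ): time the addition loop, then rerun with OMP_NUM_THREADS=1, 2, 4, … and chart the results:

      // Parallel matrix addition, timed with omp_get_wtime().
      #include <omp.h>
      #include <iostream>
      #include <vector>

      using Matrix = std::vector<std::vector<double>>;

      int main() {
          const int N = 2000;
          Matrix m1(N, std::vector<double>(N, 1.0));
          Matrix m2(N, std::vector<double>(N, 2.0));
          Matrix m3(N, std::vector<double>(N));

          double start = omp_get_wtime();
          #pragma omp parallel for            // rows are divided among the threads
          for (int i = 0; i < N; ++i) {
              for (int j = 0; j < N; ++j) {
                  m3[i][j] = m1[i][j] + m2[i][j];
              }
          }
          double elapsed = omp_get_wtime() - start;
          std::cout << "addition took " << elapsed << " s\n";
      }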

  31. Addition: m3 = m1 + m2 • Single-threaded: ~36 steps • Multi-threaded (4 threads): ~9 steps [Diagram: the element-wise additions divided among four threads]

  32. Transpose: m2 = m1.transpose() • Single-threaded: ~24 steps • Multi-threaded (4 threads): ~6 steps [Diagram: the transpose divided among four threads]

  33. [Chart: execution time vs. number of threads, from the students' spreadsheets]

  34. Programming Project • Parallelize other Matrix operations • Multiplication • Assignment • Constructors • Equality • Some operations (file I/O) are inherently sequential, providing a useful lesson…

  35. Alternative Exercise/Project • Parallelize image-processing operations: • Color-to-grayscale • Invert (negative) • Blur, Sharpen • Sepia-tinting • Many students will find photo-processing to be more engaging than matrix ops.
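  Color-to-grayscale, for instance, is embarrassingly parallel: each output pixel depends only on one input pixel. A sketch assuming a flat array of RGB bytes (the pixel layout and names are my assumptions, not the course's):

      // Each thread converts its share of the pixels independently.
      #include <omp.h>
      #include <cstdint>
      #include <vector>

      // rgb holds 3 bytes per pixel; gray holds 1 byte per pixel.
      void toGrayscale(const std::vector<std::uint8_t>& rgb,
                       std::vector<std::uint8_t>& gray, int numPixels) {
          #pragma omp parallel for
          for (int p = 0; p < numPixels; ++p) {
              // standard luminance weights for red, green, blue
              gray[p] = static_cast<std::uint8_t>(0.299 * rgb[3*p]
                                                + 0.587 * rgb[3*p + 1]
                                                + 0.114 * rgb[3*p + 2]);
          }
      }

      int main() {
          const int n = 1024;
          std::vector<std::uint8_t> rgb(3 * n, 128), gray(n);
          toGrayscale(rgb, gray, n);
      }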

  36. Assessment All students complete end-of-course evaluations with open-ended feedback: • They really like the week on parallelism • Covering material that is not in the textbook makes CS2 seem fresh and cutting edge • Students really like learning how they can use all their cores instead of just one • Having students experience speedup is key (and even better if they can see it)

  37. More Implications • Software developers who cannot build parallel apps will be unable to leverage the full power of today's hardware. • At a competitive disadvantage? • Designing / writing parallel apps is very different from designing / writing sequential apps. • Pros think in terms of parallel design patterns

  38. Parallel Design Patterns … are industry-standard strategies that parallel professionals have found useful over 30+ years of practice. … often have direct support built into popular platforms like MPI and OpenMP. … are likely to remain useful, regardless of future PDC developments. … provide a framework for PDC concepts.

  39. Algorithm Strategy Patterns Example 1: Most parallel programs use one of just three parallel algorithm strategy patterns: • Data decomposition: divide up the data and process it in parallel. • Task decomposition: divide the algorithm into functional tasks that we perform in parallel (to the extent possible). • Pipeline: divide the algorithm into linear stages, through which we "pump" the data. Of these, only data decomposition scales well…

  40. Data Decomposition (1 thread) [Diagram: Thread 0 processes the entire data set]

  41. Data Decomposition (2 threads) [Diagram: the data set split between Thread 0 and Thread 1]

  42. Data Decomposition (4 threads) [Diagram: the data set split among Threads 0–3]
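  The diagrams above correspond to the classic SPMD "slices" computation: each thread derives its own contiguous chunk of the index range from its id. A minimal OpenMP sketch (mine, not from the slides):

      // Explicit data decomposition: each thread computes its own
      // slice [start, end) of the N items.
      #include <omp.h>
      #include <cstdio>

      int main() {
          const int N = 16;
          #pragma omp parallel
          {
              int id         = omp_get_thread_num();
              int numThreads = omp_get_num_threads();
              int start      = id * N / numThreads;
              int end        = (id + 1) * N / numThreads;
              for (int i = start; i < end; ++i) {
                  std::printf("thread %d handles item %d\n", id, i);
              }
          }
      }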

  43. Task Decomposition Independent functions in a sequential computation can be "parallelized":

      int main() {
        x = f();
        y = g();
        z = h();
        w = x + y + z;
      }

  [Diagram: main() runs on Thread 0, while f(), g(), and h() run on Threads 1, 2, and 3]
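  In standard C++ the slide's pseudocode can be realized with std::async; here f, g, and h are trivial stand-ins for whatever independent computations the program performs (a sketch, not the talk's code):

      // Task decomposition: f(), g(), and h() run concurrently,
      // and the main thread combines their results.
      #include <future>
      #include <iostream>

      int f() { return 1; }    // stand-ins for independent computations
      int g() { return 2; }
      int h() { return 3; }

      int main() {
          auto x = std::async(std::launch::async, f);   // Thread 1
          auto y = std::async(std::launch::async, g);   // Thread 2
          auto z = std::async(std::launch::async, h);   // Thread 3
          int w = x.get() + y.get() + z.get();          // main thread waits, combines
          std::cout << "w = " << w << '\n';
      }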

  44. Pipeline Programs with non-independent functions can still be pipelined:

      int main() {
        ...
        while (fin) {
          fin >> a;
          b = f(a);
          c = g(b);
          d = h(c);
          fout << d;
        }
        ...
      }

  [Diagram: time-steps 0–6; Thread 0 (main) reads a0…a6, Thread 1 computes f(a), Thread 2 computes g(b), Thread 3 computes h(c), each value flowing through the stages one time-step behind the previous stage]
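  A runnable, deliberately minimal two-stage version of this idea (my sketch, not the talk's code): the stages run on separate threads and hand values through a mutex-protected queue; f and g are stand-ins:

      #include <condition_variable>
      #include <iostream>
      #include <mutex>
      #include <queue>
      #include <thread>

      std::queue<int> q;               // the channel between the two stages
      std::mutex m;
      std::condition_variable cv;
      bool done = false;

      int f(int a) { return a + 1; }   // stand-in stage computations
      int g(int b) { return b * 2; }

      void stage1() {                  // produces f(a) for each input
          for (int a = 0; a < 10; ++a) {
              int b = f(a);
              { std::lock_guard<std::mutex> lk(m); q.push(b); }
              cv.notify_one();
          }
          { std::lock_guard<std::mutex> lk(m); done = true; }
          cv.notify_one();
      }

      void stage2() {                  // consumes each b and prints g(b)
          while (true) {
              std::unique_lock<std::mutex> lk(m);
              cv.wait(lk, []{ return !q.empty() || done; });
              if (q.empty() && done) break;
              int b = q.front(); q.pop();
              lk.unlock();
              std::cout << g(b) << '\n';
          }
      }

      int main() {
          std::thread t1(stage1), t2(stage2);
          t1.join(); t2.join();
      }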

  45. Scalability • If a program gets faster as more threads/cores are used, its performance scales. • For the three algorithm strategy patterns: • Only data decomposition scales well.

  46. The Reduction Pattern Programs often need to combine the local results of N parallel tasks: • When N is large, O(N) time is too slow • The reduction pattern does it in O(lg(N)) time. To sum these 8 numbers: 6 8 9 1 5 7 2 4 • Step 1: 14 10 12 6 • Step 2: 24 18 • Step 3: 42
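  OpenMP supports this pattern directly through its reduction clause; a sketch using the slide's eight numbers (the runtime combines the threads' partial sums for us, conceptually in the pairwise tree shown above):

      #include <omp.h>
      #include <iostream>

      int main() {
          int values[8] = {6, 8, 9, 1, 5, 7, 2, 4};   // the slide's numbers
          int sum = 0;
          #pragma omp parallel for reduction(+:sum)
          for (int i = 0; i < 8; ++i) {
              sum += values[i];       // each thread sums into a private copy
          }
          std::cout << "sum = " << sum << '\n';       // prints 42
      }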

  47. A Parallel Pattern Taxonomy

  48. Faculty Development Resources • National Computational Science Institute (NCSI) offers workshops each summer: • www.computationalscience.org/workshops/ • The XSEDE Education Program offers workshops, bootcamps, and facilities: • www.xsede.org/curriculum-and-educator-programs • The LittleFe Project offers "buildouts" at which participants can build (and take home) a free portable Beowulf cluster: • littlefe.net

  49. LittleFe • LittleFe (v4): 6 nodes • Dual-core Atom CPU • Nvidia ION2 w/ 16 CUDA cores • 2 GB RAM • Gigabit Ethernet, USB, … • Custom Linux distro (BCCD) • Pelican case • ~$2500 (but free at "buildouts"!)

  50. Faculty Development Resources • CSinParallel is an NSF-funded project to help CS educators integrate PDC topics. • 1-3 hour hands-on PDC "modules" in: • Different level courses • Different languages • Different parallel design patterns (patternlets) • Workshops (today, here; summer 2015 in Chicago) • Community of supportive people to help work through problems and issues. • csinparallel.org
