CS 341 Programming Language Design and Implementation

kayla

Presentation Transcript


  1. CS 341 Programming Language Design and Implementation
     • Administrative:
       • Final project part 1, topic selection: due Mon, 4/21 @ 11am
     • Today:
       • parallel programming for performance…
     CS 341 -- 18 Apr 2014

  2. Async vs. Parallel Programming… (Lecture 30)
     • Async programming:
       • Better responsiveness…
       • Long-running operations
       • I/O operations
       • OS calls
     • Parallel programming:
       • Better performance…
       • Numerical processing
       • Data analysis
       • Big data

  3. HW solution for better performance?
     • Multiple cores…

  4. SW solution for better performance?
     • Threading across different cores…
     [Diagram: the main thread starts Work1 and Work2 on two worker threads, then continues with its own work ("My work…"). Work1 runs Stmt1; Stmt2; Stmt3; and Work2 runs Stmt4; Stmt5; Stmt6; with each thread on a different core (C).]
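The picture on this slide can be sketched in C# with two explicit threads. This is a minimal sketch, not code from the lecture: the `SumRange` workload and the class and method names are hypothetical stand-ins for Work1 and Work2.

```csharp
using System;
using System.Threading;

static class WorkerThreadsSketch
{
    // Hypothetical stand-in for the slide's Work1 / Work2 bodies.
    public static long SumRange(int from, int to)
    {
        long sum = 0;
        for (int i = from; i < to; i++) sum += i;
        return sum;
    }

    public static long RunOnTwoWorkers(int n)
    {
        long part1 = 0, part2 = 0;
        int mid = n / 2;

        var work1 = new Thread(() => part1 = SumRange(0, mid));
        var work2 = new Thread(() => part2 = SumRange(mid, n));

        work1.Start();   // <<start Work1>>
        work2.Start();   // <<start Work2>>

        // "My work…": the main thread is free to do something else here.

        work1.Join();    // wait for both workers before reading their results
        work2.Join();
        return part1 + part2;
    }

    static void Main()
    {
        Console.WriteLine(RunOnTwoWorkers(1000));
    }
}
```

Whether the two workers actually land on different cores is up to the operating system scheduler; the code only makes that possible.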

  5. First things to consider…
     • how to divide up the data?
       • rows? columns? blocks? nodes? sub-trees?
     • is the workload evenly distributed, or unpredictable?
       • if the workload is even, we can decide ahead of time how to divide it up ("static")
       • if the workload is unpredictable, we need an adaptive approach ("dynamic")
     • how to map threads onto the data?
       • which threads touch which data?
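The static and dynamic choices above might look like this in C#. It is a sketch under assumptions: `Work` is a made-up workload whose cost grows with `i`, and all names are hypothetical. The static version fixes one contiguous block per worker before any work starts; the dynamic version leaves the distribution of iterations to `Parallel.For` at run time.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class PartitioningSketch
{
    // Hypothetical per-item work whose cost grows with i (an uneven workload).
    public static long Work(int i)
    {
        long acc = 0;
        for (int k = 0; k < i; k++) acc += k % 7;
        return acc;
    }

    // "Static": split [0, n) into one contiguous block per worker, ahead of time.
    public static (int Start, int End)[] StaticRanges(int n, int workers)
    {
        var ranges = new (int Start, int End)[workers];
        int chunk = (n + workers - 1) / workers;   // ceiling division
        for (int w = 0; w < workers; w++)
            ranges[w] = (Math.Min(w * chunk, n), Math.Min((w + 1) * chunk, n));
        return ranges;
    }

    public static long SumStatic(int n, int workers)
    {
        var ranges = StaticRanges(n, workers);
        var partial = new long[workers];
        // One parallel iteration per pre-computed block.
        Parallel.For(0, workers, w =>
        {
            for (int i = ranges[w].Start; i < ranges[w].End; i++)
                partial[w] += Work(i);
        });
        long total = 0;
        foreach (long p in partial) total += p;
        return total;
    }

    // "Dynamic": Parallel.For hands out iterations to threads at run time.
    public static long SumDynamic(int n)
    {
        long total = 0;
        Parallel.For(0, n, i => Interlocked.Add(ref total, Work(i)));
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumStatic(2000, 4) == SumDynamic(2000));
    }
}
```

With this uneven workload the last static block holds the most expensive items, so its worker finishes last while the others sit idle; that imbalance is what the dynamic approach is meant to avoid.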

  6. Demo:
     • The importance of data layout & locality…
     • Matrix multiplication

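  7. Going parallel
     Naïve, triply-nested sequential solution:

         for (int i = 0; i < N; i++)
         {
           for (int j = 0; j < N; j++)
           {
             for (int k = 0; k < N; k++)
               C[i][j] += (A[i][k] * B[k][j]);
           }
         }

     Going parallel: the outer loop becomes a Parallel.For, so each row i of C can be computed on a different thread:

         Parallel.For(0, N, (i) =>
         {
           for (int j = 0; j < N; j++)
           {
             for (int k = 0; k < N; k++)
               C[i][j] += (A[i][k] * B[k][j]);
           }
         });

     [Diagram: Sequential, fork, Parallel, join, Sequential]
     • Works great! 2x faster on 2 cores, 4x faster on 4 cores, …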
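To try the comparison from this slide, a self-contained harness could look like the following. It is a sketch, not the lecture's demo code: the matrix size, the fill functions, and the helper names are assumptions, and the timings it prints depend on the machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class NaiveMatMulSketch
{
    // Builds an n x n jagged matrix, filling entry (row, col) with init(row, col).
    public static double[][] NewMatrix(int n, Func<int, int, double> init)
    {
        var m = new double[n][];
        for (int i = 0; i < n; i++)
        {
            m[i] = new double[n];
            for (int j = 0; j < n; j++) m[i][j] = init(i, j);
        }
        return m;
    }

    public static double[][] MultiplySequential(double[][] A, double[][] B, int N)
    {
        var C = NewMatrix(N, (row, col) => 0.0);
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
        return C;
    }

    public static double[][] MultiplyParallel(double[][] A, double[][] B, int N)
    {
        var C = NewMatrix(N, (row, col) => 0.0);
        // Iteration i writes only row C[i], so rows can run concurrently without locks.
        Parallel.For(0, N, i =>
        {
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
        });
        return C;
    }

    static void Main()
    {
        const int N = 300;   // hypothetical size, chosen to run quickly
        var A = NewMatrix(N, (row, col) => row + col);
        var B = NewMatrix(N, (row, col) => row - col);

        var sw = Stopwatch.StartNew();
        MultiplySequential(A, B, N);
        Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        MultiplyParallel(A, B, N);
        Console.WriteLine($"parallel:   {sw.ElapsedMilliseconds} ms");
    }
}
```

The parallel version needs no synchronization because no two iterations of the outer loop write the same element of C.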

  8. But wait…
     • What’s the other half of the chip?
       • Memory cache…
     • Are we using it effectively?

  9. Memory architecture
     • Key features:
       • Registers
       • L1 cache
       • L2 cache
       • L3 cache
       • RAM
     • Each level is about 10x slower to access than the one before it:
       • L1: 1 cycle
       • L2: 10 cycles
       • L3: 100 cycles
       • RAM: 1000 cycles
     [Diagram: cores (C) sharing an L3 cache]
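A quick calculation shows why these numbers matter. The sketch below uses the slide's cycle counts; the two access mixes are hypothetical, chosen only to illustrate the effect of a few slow accesses.

```csharp
using System;

static class AccessCostSketch
{
    // Access costs in cycles, taken from the slide: L1, L2, L3, RAM.
    static readonly double[] Cycles = { 1, 10, 100, 1000 };

    // fractions[i] = share of accesses served by level i; the shares should sum to 1.
    public static double AverageCycles(double[] fractions)
    {
        double avg = 0;
        for (int i = 0; i < Cycles.Length; i++) avg += fractions[i] * Cycles[i];
        return avg;
    }

    static void Main()
    {
        // Hypothetical mix: 90% L1, 6% L2, 3% L3, 1% RAM.
        Console.WriteLine(AverageCycles(new[] { 0.90, 0.06, 0.03, 0.01 }));
        // Hypothetical cache-friendly mix: 99% L1, 1% L2.
        Console.WriteLine(AverageCycles(new[] { 0.99, 0.01, 0.00, 0.00 }));
    }
}
```

In the first mix the average comes to about 14.5 cycles per access, and the 1% of accesses that reach RAM account for 10 of them. In the second mix the average is about 1.09 cycles.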

  10. A better matrix multiply…

  11. Cache-friendly matrix multiplication: Step 1
     • Loop interchange so the inner-most loop goes along a row…

         Parallel.For(0, N, (i) =>
         {
           for (int k = 0; k < N; k++)
             for (int j = 0; j < N; j++)
               C[i][j] += (A[i][k] * B[k][j]);
         });

     • Factor of 2-10x improvement!
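The two loop orders can be compared side by side. This is a sketch with hypothetical names and sizes: both versions parallelize over `i`, so only the order of the two inner loops differs. Because each element of C still accumulates its products in the same `k` order, the two results should match exactly.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class LoopInterchangeSketch
{
    public static double[][] NewMatrix(int n, Func<int, int, double> init)
    {
        var m = new double[n][];
        for (int i = 0; i < n; i++)
        {
            m[i] = new double[n];
            for (int j = 0; j < n; j++) m[i][j] = init(i, j);
        }
        return m;
    }

    // i-j-k: the inner loop walks down a column of B (B[0][j], B[1][j], ...),
    // touching a different row array on every iteration.
    public static void MultiplyIJK(double[][] A, double[][] B, double[][] C, int N)
    {
        Parallel.For(0, N, i =>
        {
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
        });
    }

    // i-k-j: the inner loop walks along row B[k] and row C[i],
    // whose elements sit next to each other in memory.
    public static void MultiplyIKJ(double[][] A, double[][] B, double[][] C, int N)
    {
        Parallel.For(0, N, i =>
        {
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    C[i][j] += A[i][k] * B[k][j];
        });
    }

    static void Main()
    {
        const int N = 500;   // hypothetical size
        var A = NewMatrix(N, (row, col) => row + col);
        var B = NewMatrix(N, (row, col) => row - col);
        var C1 = NewMatrix(N, (row, col) => 0.0);
        var C2 = NewMatrix(N, (row, col) => 0.0);

        var sw = Stopwatch.StartNew();
        MultiplyIJK(A, B, C1, N);
        Console.WriteLine($"i-j-k: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        MultiplyIKJ(A, B, C2, N);
        Console.WriteLine($"i-k-j: {sw.ElapsedMilliseconds} ms");
    }
}
```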

  12. Cache-friendly matrix multiplication: Step 2
     • Work in blocks so they fit in the L1 cache…

         for (int jj = 0; jj < N; jj += BS)
         {
           int jjEND = Math.Min(jj + BS, N);

           // initialize:
           for (int i = 0; i < N; i++)
             for (int j = jj; j < jjEND; j++)
               C[i][j] = 0.0;

           // block multiply:
           for (int kk = 0; kk < N; kk += BS)
           {
             int kkEND = Math.Min(kk + BS, N);

             for (int i = 0; i < N; i++)
               for (int k = kk; k < kkEND; k++)
                 for (int j = jj; j < jjEND; j++)
                   C[i][j] += (A[i][k] * B[k][j]);
           }
         }

     • Another factor of 2-4x…
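The blocked loop nest can be checked against the naïve version with a small harness. This is a sketch under assumptions: the helper names are hypothetical, and the block size `BS = 64` is only a starting value to tune, since the best choice depends on the cache size of the machine. `N = 200` is deliberately not a multiple of `BS`, so the `Math.Min` edge handling is exercised.

```csharp
using System;

static class BlockedMatMulSketch
{
    public static double[][] NewMatrix(int n, Func<int, int, double> init)
    {
        var m = new double[n][];
        for (int i = 0; i < n; i++)
        {
            m[i] = new double[n];
            for (int j = 0; j < n; j++) m[i][j] = init(i, j);
        }
        return m;
    }

    // Reference result: naïve triple loop into a zero-initialized C.
    public static double[][] MultiplyNaive(double[][] A, double[][] B, int N)
    {
        var C = NewMatrix(N, (row, col) => 0.0);
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
        return C;
    }

    // Blocked version from the slide: BS columns of C at a time, BS values of k at a time.
    public static double[][] MultiplyBlocked(double[][] A, double[][] B, int N, int BS)
    {
        var C = NewMatrix(N, (row, col) => 0.0);
        for (int jj = 0; jj < N; jj += BS)
        {
            int jjEND = Math.Min(jj + BS, N);
            for (int kk = 0; kk < N; kk += BS)
            {
                int kkEND = Math.Min(kk + BS, N);
                for (int i = 0; i < N; i++)
                    for (int k = kk; k < kkEND; k++)
                        for (int j = jj; j < jjEND; j++)
                            C[i][j] += A[i][k] * B[k][j];
            }
        }
        return C;
    }

    public static bool SameMatrix(double[][] X, double[][] Y, int N)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (X[i][j] != Y[i][j]) return false;
        return true;
    }

    static void Main()
    {
        const int N = 200, BS = 64;   // hypothetical sizes
        var A = NewMatrix(N, (row, col) => row + col);
        var B = NewMatrix(N, (row, col) => row - col);
        Console.WriteLine(SameMatrix(MultiplyNaive(A, B, N), MultiplyBlocked(A, B, N, BS), N));
    }
}
```

Since `NewMatrix` already returns zeros, the explicit initialization pass from the slide is folded into the allocation here.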

  13. Current state of parallel programming
     • in mainstream languages…

  14. Parallel execution model in C#
     [Diagram: a call to Parallel.For( ... ) produces tasks. Layers shown, from the program down to the hardware: Task Parallel Library; Task Scheduler; .NET Thread Pool with its worker threads; Resource Manager; App Domains inside a Windows Process (.NET); Windows; cores (C).]
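One way to observe this model is to count how many distinct threads execute the iterations of a `Parallel.For`. This is a sketch with hypothetical names; the spin loop is a made-up stand-in for real work, and the count it reports varies by machine and run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class ExecutionModelSketch
{
    // Runs `iterations` loop bodies and returns how many distinct
    // managed threads ended up executing them.
    public static int ThreadsUsed(int iterations)
    {
        var ids = new ConcurrentDictionary<int, bool>();
        Parallel.For(0, iterations, i =>
        {
            ids.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            Thread.SpinWait(100_000);   // hypothetical stand-in for real work
        });
        return ids.Count;
    }

    static void Main()
    {
        Console.WriteLine($"cores:        {Environment.ProcessorCount}");
        Console.WriteLine($"threads used: {ThreadsUsed(1000)}");
    }
}
```

The thread count is usually far smaller than the iteration count, because the iterations are grouped into tasks that run on a small set of reused worker threads (plus, typically, the calling thread) rather than on one new thread each.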
