
CIS 610: Many-core visualization libraries

CIS 610: Many-core visualization libraries. Hank Childs, University of Oregon. Jan. 21st, 2013. Schedule for this class: we have done 5 lectures in 2 weeks; we should have done 4 lectures over the last two weeks; we will do 3 lectures this week; we will be one full week ahead of schedule.


Presentation Transcript


  1. CIS 610: Many-core visualization libraries Hank Childs, University of Oregon Jan. 21st, 2013

  2. Schedule for this class • We have done 5 lectures in 2 weeks • We should have done 4 lectures over the last two weeks • We will do 3 lectures this week • We will then be one full week ahead of schedule • We will cancel two lectures over the coming weeks

  3. Schedule this week • Tuesday lecture: today • Review of data parallel operations, general discussion of packages so far • Thursday lecture: Ken Moreland • (Thursday colloquium @ 12: Ken Moreland) • Friday lecture: Ken Moreland • 8:30-10 (I can’t make this time) • 11-12:30 • 11:30-1:00

  4. Upcoming schedule • Tuesday, Jan 28th • 10 minute presentation by each student on the project they want to pursue • Non-binding • Discuss the problem, and some initial thoughts about how to do it in many-core libraries

  5. Upcoming schedule • Thursday, Jan 30th • Group session debugging problems. • Important that you have started your project by then.

  6. Upcoming schedule • Weeks following • Series of 20 minute presentations, 3 per lecture • Two flavors of presentation: • “Update on my project” • “Overview of a paper I read”

  7. How this class will be graded • You will all submit a report at the end of the quarter. • The report will include: • A summary of what you have done • It will focus on your project • You should also include • Presentations made • Porting of libraries • Assistance to other students • Bugs debugged (or reported) • Etc…

  8. How this class will be graded • It is not curved • If you all decide not to present papers, you will all be penalized • I expect you will all get very good grades • But it is important that you work hard and accomplish something in this class • Play with the libraries, present papers in class, and really try to nail your “research project”

  9. Lectures • I expect you will all make about 3 presentations • 1 research update, 2 papers • 2 research updates, 1 paper • Some lectures in the short term on CUDA, Thrust, data parallelism, etc, would probably be helpful.

  10. EAVL: Extreme-scale Analysis and Visualization Library. Jeremy Meredith, January 2014

  11. A Simple Data-Parallel Operation
      void CellToCellDivide(Field &a, Field &b, Field &c) {
          for_each(i) c[i] = a[i] / b[i];
      }
      void CalculateDensity(...) {
          //...
          CellToCellDivide(mass, volume, density);
      }
      Slide callouts: "Internal Library API Provides This" and "Algorithm Developer Writes This".

  12. Functor + Iterator Approach
      void CalculateDensity(...) {
          //...
          CellToCellBinaryOp(mass, volume, density, Divide());
      }
      template <class T>
      void CellToCellBinaryOp(Field &a, Field &b, Field &c, T f) {
          for_each(i) f(a[i], b[i], c[i]);
      }
      struct Divide {
          void operator()(float &a, float &b, float &c) { c = a / b; }
      };
      Slide callouts: "Internal Library API Provides This" and "Algorithm Developer Writes This".

  13. Custom Functor
      void CalculateDensity(...) {
          //...
          CellToCellBinaryOp(mass, volume, density, MyFunctor());
      }
      template <class T>
      void CellToCellBinaryOp(Field &a, Field &b, Field &c, T f) {
          for_each(i) f(a[i], b[i], c[i]);
      }
      struct MyFunctor {
          void operator()(float &a, float &b, float &c) { c = a + 2*log(b); }
      };
      Slide callouts: "Internal Library API Provides This" and "Algorithm Developer Writes These".
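The functor approach from slides 11 to 13 can be sketched as compilable C++. This is only an illustration under stated assumptions: `Field` is a hypothetical stand-in (a plain `std::vector<float>`), the slides' `for_each(i)` is replaced by an ordinary serial loop, and the functor is passed by value so that a temporary such as `Divide()` can be used as the argument. EAVL's real types and dispatch are not shown in the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's Field type (an assumption).
using Field = std::vector<float>;

// "Library" side: applies any binary functor to corresponding elements.
// A serial loop stands in for the slides' for_each(i).
template <class Functor>
void CellToCellBinaryOp(const Field &a, const Field &b, Field &c, Functor f) {
    assert(a.size() == b.size());
    c.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        f(a[i], b[i], c[i]);
}

// A functor the library might ship.
struct Divide {
    void operator()(float a, float b, float &c) const { c = a / b; }
};

// A custom functor written by the algorithm developer.
struct MyFunctor {
    void operator()(float a, float b, float &c) const { c = a + 2 * std::log(b); }
};

// "Developer" side: composes the library call with a functor.
Field CalculateDensity(const Field &mass, const Field &volume) {
    Field density;
    CellToCellBinaryOp(mass, volume, density, Divide());
    return density;
}
```

Swapping `Divide()` for `MyFunctor()` changes the per-element arithmetic without touching the loop, which is the point of the design.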

  14. Data Parallelism Basics

  15. Map with 1 input, 1 output. Simplest data-parallel operation. Each result item can be calculated from its corresponding input item alone.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      result:   6 14  0  2  8  0  0  8 10  6  2  0
      Functor: struct f { float operator()(float x) { return x*2; } };

  16. Map with 2 inputs, 1 output. With two input arrays, the functor takes two inputs. You can also have multiple outputs.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      y:        2  4  2  1  8  3  9  5  5  1  2  1
      result:   5 11  2  2 12  3  9  9 10  4  3  1
      Functor: struct f { float operator()(float a, float b) { return a+b; } };
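As a rough serial analogue of the two map slides, the standard library's `std::transform` applies a functor element by element. The names `Twice`, `Sum`, `MapUnary`, and `MapBinary` are made up for this sketch; the slides call both functors `f`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Unary functor from the 1-input map slide (named f there).
struct Twice {
    float operator()(float x) const { return x * 2; }
};

// Binary functor from the 2-input map slide (named f there).
struct Sum {
    float operator()(float a, float b) const { return a + b; }
};

// Map with 1 input, 1 output: result[i] depends only on x[i].
std::vector<float> MapUnary(const std::vector<float> &x) {
    std::vector<float> result(x.size());
    std::transform(x.begin(), x.end(), result.begin(), Twice());
    return result;
}

// Map with 2 inputs, 1 output: result[i] depends only on x[i] and y[i].
std::vector<float> MapBinary(const std::vector<float> &x,
                             const std::vector<float> &y) {
    assert(x.size() == y.size());
    std::vector<float> result(x.size());
    std::transform(x.begin(), x.end(), y.begin(), result.begin(), Sum());
    return result;
}
```

Because no element depends on any other, every iteration could run on a separate thread, which is what makes map the simplest data-parallel primitive.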

  17. Scatter with 1 input (and thus 1 output). Possibly inefficient, risks of race conditions and uninitialized results. (Can also scatter to larger array if desired.) Often used in a scatter_if–type construct.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      indices:  2  4  1  5  5  0  4  2  1  2  1  4
      result:   0  1  3  -  0  4   (no entry of indices equals 3, so that slot is never written; slots with several writers depend on write order)
      No functor.
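A serial sketch of scatter makes both risks on this slide concrete. The function name and the `written` mask are assumptions added for illustration. In this serial loop the last write to a slot wins; in a parallel run the winner would be unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial scatter: result[indices[i]] = x[i].
// 'written' records which output slots received at least one value, so the
// caller can tell initialized slots from untouched ones.
void Scatter(const std::vector<float> &x, const std::vector<int> &indices,
             std::vector<float> &result, std::vector<bool> &written) {
    assert(x.size() == indices.size());
    written.assign(result.size(), false);
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t dst = static_cast<std::size_t>(indices[i]);
        assert(dst < result.size());
        result[dst] = x[i];   // several i may share dst: a race in parallel
        written[dst] = true;
    }
}
```

With the slide's `x` and `indices` and a 6-element output, slot 3 is never written, and slots 1, 2, 4, and 5 each have more than one writer.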

  18. Gather with 1 input (and thus 1 output). Unlike scatter, no risk of uninitialized data or race condition. Plus, parallelization is over a shorter indices array, and caching helps more, so it can be more efficient.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      indices:  1  9  6  9  3
      result:   7  3  0  3  1
      No functor.
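Gather is the mirror image of scatter: each output slot reads from one input slot. A minimal serial sketch, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial gather: result[i] = x[indices[i]].
// Every output slot is written exactly once, so there is no race and no
// uninitialized output, and the loop runs over the (shorter) indices array.
std::vector<float> Gather(const std::vector<float> &x,
                          const std::vector<int> &indices) {
    std::vector<float> result(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        const std::size_t src = static_cast<std::size_t>(indices[i]);
        assert(src < x.size());
        result[i] = x[src];
    }
    return result;
}
```

Reading the same input slot twice (index 9 in the slide's example) is harmless, whereas writing the same output slot twice in a scatter is not.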

  19. Reduction with 1 input (and thus 1 output). Example: max-reduction. Sum is also common. Often a fat-tree-based implementation.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      result:   7
      Functor: struct f { float operator()(float a, float b) { return a>b ? a : b; } };
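One way to picture a tree-based reduction is the pairwise scheme sketched below, where each pass combines elements that are `stride` apart. This is a serial illustration of the general idea, not a claim about how EAVL implements it; `Max` and `TreeReduce` are names invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Max functor from the slide (named f there).
struct Max {
    float operator()(float a, float b) const { return a > b ? a : b; }
};

// Pairwise, tree-shaped reduction. Within one pass the combinations touch
// disjoint elements, so a pass could run in parallel; there are about
// log2(n) passes. Works on a copy of the input and requires n >= 1.
template <class Functor>
float TreeReduce(std::vector<float> v, Functor f) {
    assert(!v.empty());
    for (std::size_t stride = 1; stride < v.size(); stride *= 2)
        for (std::size_t i = 0; i + stride < v.size(); i += 2 * stride)
            v[i] = f(v[i], v[i + stride]);
    return v[0];
}
```

The functor should be associative (max and integer-valued sums are), since the tree combines elements in a different order than a left-to-right loop would.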

  20. Inclusive Prefix Sum (a.k.a. Scan) with 1 input/output. Value at result[i] is the sum of values x[0]..x[i]. Surprisingly efficient parallel implementation. Basis for many more complex algorithms.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      result:   3 10 10 11 15 15 15 19 24 27 28 28
      No functor.
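To show why a scan can be parallelized at all, here is a sketch of one well-known formulation, usually attributed to Hillis and Steele, run serially. The slide does not say which algorithm EAVL uses, so treat this purely as an illustration; `InclusiveScan` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hillis-Steele style inclusive scan. Each pass reads only 'cur' and writes
// only 'next', so all elements of a pass could be computed in parallel.
// After the pass with a given stride, cur[i] holds the sum of up to
// 2*stride consecutive inputs ending at position i.
std::vector<int> InclusiveScan(const std::vector<int> &x) {
    std::vector<int> cur = x;
    std::vector<int> next(x.size());
    for (std::size_t stride = 1; stride < cur.size(); stride *= 2) {
        for (std::size_t i = 0; i < cur.size(); ++i)
            next[i] = (i >= stride) ? cur[i] + cur[i - stride] : cur[i];
        cur.swap(next);
    }
    // Sanity check: the first element of an inclusive scan is x[0].
    assert(cur.empty() || cur.front() == x.front());
    return cur;
}
```

This variant does more total additions than a serial loop (roughly n log n instead of n) in exchange for only about log2(n) dependent passes. Serially, `std::partial_sum` from `<numeric>` gives the same result.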

  21. Exclusive Prefix Sum (a.k.a. Scan) with 1 input/output. Initialize with zero; value at result[i] is the sum of values only up to x[i-1]. May be more commonly used than inclusive scan.
      index:    0  1  2  3  4  5  6  7  8  9 10 11
      x:        3  7  0  1  4  0  0  4  5  3  1  0
      result:   0  3 10 10 11 15 15 15 19 24 27 28
      No functor.
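For reference, the exclusive scan's definition fits in a short serial loop. This sketch only pins down the semantics (the function name is made up); a parallel library would compute the same values differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial exclusive scan: result[i] = x[0] + ... + x[i-1], with result[0] = 0.
std::vector<int> ExclusiveScan(const std::vector<int> &x) {
    std::vector<int> result(x.size());
    int running = 0;  // sum of all elements before position i
    for (std::size_t i = 0; i < x.size(); ++i) {
        result[i] = running;
        running += x[i];
    }
    // Sanity check: an exclusive scan always starts at zero.
    assert(result.empty() || result.front() == 0);
    return result;
}
```

The result is the inclusive scan shifted right by one position with a zero inserted at the front, which is why it works directly as a table of starting offsets.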

  22. Writing Algorithms in EAVL

  23. Example: Threshold

  24. Threshold • Keep a cell if it meets some criteria, else discard it • Criteria: • Pressure > 2 • 10 < temperature < 20 • (Figure: cells that meet the criteria)

  25. How to implement threshold • Iterate over cells • If a cell meets the criteria, then place that cell in the output • Output is an unstructured mesh

  26. Example: Thresholding an RGrid (a) • Explicit cells can be combined with structured coordinates. (Diagram: two data sets, one with an eavlStructuredCellSet and one with an eavlExplicitCellSet; each has eavlCoordinates and eavlField#0, eavlField#1, eavlField#2.)

  27. Example: Thresholding an RGrid (b) • A second cell set can be added which refers to the first one. (Diagram: two data sets, one with an eavlStructuredCellSet and one with an eavlSubset plus an eavlStructuredCellSet; each has eavlCoordinates and eavlField#0, eavlField#1, eavlField#2.)

  28. Starting Mesh. We want to threshold a mesh based on its density values (shown here).
      Mesh cells (3 rows of 4):
          43 47 52 63
          32 38 42 49
          31 37 41 38
      index:     0  1  2  3  4  5  6  7  8  9 10 11
      density:  43 47 52 63 32 38 42 49 31 37 41 38
      If we threshold 35 < density < 45, we want this result: (figure of the thresholded mesh).

  29. Which Cells to Include? Evaluate a Map operation with this functor:
      struct InRange {
          float lo, hi;
          InRange(float l, float h) : lo(l), hi(h) { }
          int operator()(float x) { return x>lo && x<hi; }
      };
      index:     0  1  2  3  4  5  6  7  8  9 10 11
      density:  43 47 52 63 32 38 42 49 31 37 41 38
      inrange:   1  0  0  0  0  1  1  0  0  1  1  1   (result of InRange())
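The flagging step can be tried out directly with `std::transform` standing in for the parallel map. `FlagCells` is a hypothetical wrapper name; `InRange` follows the slide, with `const` added to its call operator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Functor from the slide: 1 if lo < x < hi (both bounds exclusive), else 0.
struct InRange {
    float lo, hi;
    InRange(float l, float h) : lo(l), hi(h) {}
    int operator()(float x) const { return x > lo && x < hi; }
};

// Map step of threshold: one 0/1 flag per input cell.
std::vector<int> FlagCells(const std::vector<float> &density,
                           float lo, float hi) {
    assert(lo < hi);
    std::vector<int> inrange(density.size());
    std::transform(density.begin(), density.end(), inrange.begin(),
                   InRange(lo, hi));
    return inrange;
}
```

Storing the flags as `int` rather than `bool` is what lets the later reduce and scan steps add them up.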

  30. How Many Cells in Output? Evaluate a Reduce operation using the Add<> functor. We can use this to create output cell length arrays.
      index:     0  1  2  3  4  5  6  7  8  9 10 11
      inrange:   1  0  0  0  0  1  1  0  0  1  1  1
      result (plus): 6

  31. Where Do the Output Cells Go?
      input cell:    0  1  2  3  4  5  6  7  8  9 10 11
      output cell:   0  -  -  -  -  1  2  -  -  3  4  5   (- marks a discarded cell)
      How do we create this mapping?

  32. Create Input-to-Output Indexing? Exclusive Scan (exclusive prefix sum) gives us the output index positions.
      index:     0  1  2  3  4  5  6  7  8  9 10 11
      inrange:   1  0  0  0  0  1  1  0  0  1  1  1
      startidx:  0  1  1  1  1  1  2  3  3  3  4  5

  33. Scatter Input Arrays to Output? NO. We can do this, but scatters can be risky/inefficient. Assuming we have multiple arrays to process, we can do something better....
      index:            0  1  2  3  4  5  6  7  8  9 10 11
      density:         43 47 52 63 32 38 42 49 31 37 41 38
      startidx:         0  1  1  1  1  1  2  3  3  3  4  5
      output_density:  43 38 42 37 41 38   (from input cells 0, 5, 6, 9, 10, 11)
      Race condition unless we add a mask array!

  34. Create Output-to-Input Indexing? We want to work in the shorter output-length arrays and use gathers. A specialized scatter in EAVL creates this reverse index.
      index:     0  1  2  3  4  5  6  7  8  9 10 11
      startidx:  0  1  1  1  1  1  2  3  3  3  4  5
      revindex:  0  5  6  9 10 11
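The slide does not show EAVL's specialized scatter, so the following is only a guess at equivalent behavior: a masked scatter that writes each kept cell's own index to the position given by `startidx`. The function name and signature are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the output-to-input index: revindex[startidx[i]] = i for every
// input cell i whose flag is set. The flag acts as the mask, so each output
// slot has exactly one writer and no race can occur.
std::vector<int> BuildReverseIndex(const std::vector<int> &inrange,
                                   const std::vector<int> &startidx,
                                   std::size_t outputCount) {
    assert(inrange.size() == startidx.size());
    std::vector<int> revindex(outputCount);
    for (std::size_t i = 0; i < inrange.size(); ++i) {
        if (inrange[i]) {
            const std::size_t dst = static_cast<std::size_t>(startidx[i]);
            assert(dst < revindex.size());
            revindex[dst] = static_cast<int>(i);
        }
    }
    return revindex;
}
```

Without the mask, input cells 1 through 5 would all target output slot 1, because a discarded cell shares its `startidx` value with the next kept cell.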

  35. Gather Input Mesh Arrays to Output? We can now use simple gathers to pull input arrays (density, pressure) into the output mesh.
      index:            0  1  2  3  4  5  6  7  8  9 10 11
      density:         43 47 52 63 32 38 42 49 31 37 41 38
      revindex:         0  5  6  9 10 11
      output_density:  43 38 42 37 41 38
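Putting the steps from slides 29 to 35 together, a serial end-to-end sketch might look like this. It uses plain vectors rather than EAVL's mesh and field classes, and `ThresholdIndex` and `GatherField` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the output-to-input index for all cells with lo < field[i] < hi.
std::vector<int> ThresholdIndex(const std::vector<float> &field,
                                float lo, float hi) {
    const std::size_t n = field.size();

    // Map: flag each cell.
    std::vector<int> inrange(n);
    for (std::size_t i = 0; i < n; ++i)
        inrange[i] = field[i] > lo && field[i] < hi;

    // Reduce: number of output cells.
    const int count = std::accumulate(inrange.begin(), inrange.end(), 0);

    // Exclusive scan: output position of each kept cell.
    std::vector<int> startidx(n);
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        startidx[i] = running;
        running += inrange[i];
    }

    // Masked scatter: record which input cell feeds each output slot.
    std::vector<int> revindex(static_cast<std::size_t>(count));
    for (std::size_t i = 0; i < n; ++i)
        if (inrange[i])
            revindex[static_cast<std::size_t>(startidx[i])] = static_cast<int>(i);
    return revindex;
}

// Gather: pull any per-cell array into the output using the same index.
std::vector<float> GatherField(const std::vector<float> &field,
                               const std::vector<int> &revindex) {
    std::vector<float> out(revindex.size());
    for (std::size_t j = 0; j < revindex.size(); ++j) {
        const std::size_t src = static_cast<std::size_t>(revindex[j]);
        assert(src < field.size());
        out[j] = field[src];
    }
    return out;
}
```

The reverse index is computed once and then reused for every field on the mesh, so each additional array costs only one gather over the shorter output length.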
