
Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture

Farhana Aleen, Nate Clark, Georgia Institute of Technology
Modified by Michelle Goodstein, LBA Reading Group, 6/4/09



Presentation Transcript


  1. Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture Farhana Aleen, Nate Clark Georgia Institute of Technology Modified by Michelle Goodstein LBA Reading Group 6/4/09

  2. Motivation Extracting performance from multi-core is hard. Programmers need to write parallel programs; automatic compiler-based parallelization helps.

  3. Source Of Parallelism: Commutativity sum(5); sum(10) yields 15, and sum(10); sum(5) also yields 15. More generally, an application that calls foo(a); foo(b) produces the same output as one that calls foo(b); foo(a).

  4. Existing Approach Of Detecting Commutativity Execute the function in two different orders and check equivalence of memory: sum(x); sum(y) leaves x+y in memory, and sum(y); sum(x) leaves y+x.
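The existing approach can be sketched as follows. This is a minimal illustration, assuming a hypothetical `sum()` that accumulates into a piece of program state; the names are illustrative, not the paper's implementation.

```cpp
// Hypothetical accumulator state that sum() updates.
struct State { long total = 0; };

void sum(State& s, long v) { s.total += v; }

// Existing approach (sketch): execute the two calls in both orders on
// copies of the same initial state, then check the memory is identical.
bool commutes_by_memory(long x, long y) {
    State s1, s2;
    sum(s1, x); sum(s1, y);   // order 1: sum(x) then sum(y)
    sum(s2, y); sum(s2, x);   // order 2: sum(y) then sum(x)
    return s1.total == s2.total;   // x + y == y + x
}
```

For `sum` the check succeeds, because addition leaves bit-identical memory in either order; the next slide shows where this memory-equality test is too strict.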

  5. Opportunities Missed By Existing Approach Insertion of elements into a hash-set (vector<linked-list>): insert(2); insert(6) leaves the bucket as [6, 2], while insert(6); insert(2) leaves it as [2, 6]. The memory layouts differ, even though the set contents are identical.

  6. The Idea class hash_set{ vector<linked_list> set; insert(); remove(); is_member(); } In either insertion order, is_member(2) answers Yes, and remove(6) leaves a set containing only 2. • Identical memory does not matter • Final output matters
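A minimal sketch of the slide's hash_set makes the point concrete. The bucket count, hash function, and `layout()` accessor below are illustrative additions; only the `vector<linked_list>` shape and the `insert`/`remove`/`is_member` interface come from the slide.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Sketch of the slide's hash_set: a vector of buckets, each bucket a
// linked list with the newest element at the front.
class hash_set {
public:
    explicit hash_set(std::size_t nbuckets = 4) : set(nbuckets) {}
    void insert(int v) { if (!is_member(v)) bucket(v).push_front(v); }
    void remove(int v) { bucket(v).remove(v); }
    bool is_member(int v) const {
        const auto& b = set[v % set.size()];
        return std::find(b.begin(), b.end(), v) != b.end();
    }
    // Expose one bucket so the example can show the layout difference.
    std::list<int> layout(std::size_t i) const { return set[i]; }
private:
    std::list<int>& bucket(int v) { return set[v % set.size()]; }
    std::vector<std::list<int>> set;
};
```

With 4 buckets, 2 and 6 hash to the same bucket, so the two insertion orders produce the bucket lists [6, 2] and [2, 6]: different memory, but `is_member` and `remove` behave identically on both.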

  7. Our Approach: Step 1 Symbolically execute the two instructions in both orders (I1 then I2, and I2 then I1) from the same initial memory M, and check whether the resulting memory layouts M1,2 and M2,1 are identical. If they are not, check the reader functions.

  8. Step 2: Checking Reader Functions Given a candidate function insert() whose two memories M1,2 and M2,1 differ, apply each reader of the candidate's output (is_member(), remove()) to both memories and compare the results: M'1,2 == M'2,1 and M''1,2 == M''2,1. Then recursively check the readers of the readers' output, and so on.
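The reader check above can be sketched as follows. This is an illustrative simplification, not the paper's implementation: memory is abstracted to a layout-sensitive snapshot, and one reader (`is_member`) is probed on both memories.

```cpp
#include <algorithm>
#include <vector>

// Layout-sensitive memory snapshot: element order matters for equality,
// mirroring how M(1,2) and M(2,1) can differ byte-for-byte.
struct Memory { std::vector<int> elems; };

// A reader of the candidate function's output.
bool reader_is_member(const Memory& m, int v) {
    return std::find(m.elems.begin(), m.elems.end(), v) != m.elems.end();
}

// Step 2 (sketch): the orders commute if no reader can distinguish
// the two memories, even though the memories themselves differ.
bool readers_agree(const Memory& m12, const Memory& m21,
                   const std::vector<int>& probes) {
    for (int v : probes)
        if (reader_is_member(m12, v) != reader_is_member(m21, v))
            return false;
    return true;
}
```

For the hash-set example, the snapshots [2, 6] and [6, 2] are unequal as memory, yet `readers_agree` holds for any probe value.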

  9. Pros/Cons Of Our Approach Pros: identifies more commutativity, finds more parallelism. Cons: more equivalence checking.

  10. Equivalence Checking Options Random testing is fast but inaccurate; symbolic execution is accurate but slow; random interpretation offers both speed and accuracy.

  11. Random Interpretation: Example Input(x, y); a = x + y; if (x != y) ...; assert(b == 2x) • Choose random values for the input variables: x = 2, y = 3, so a = x + y = 5 • At the condition if (x != y), replicate the initial memory state and execute both sides, adjusting values to satisfy each branch: the taken branch executes b = 2x, giving state (x = 2, y = 3, a = 5, b = 4); the fall-through branch executes b = a, with x adjusted to 3 so that x == y, giving state (x = 3, y = 3, a = 6, b = 6) • Merge the two states with an affine join of v1 and v2 w.r.t. weight w: φw(v1, v2) = w·v1 + (1 − w)·v2 • With w = 3, the joined state is (x = 5, y = 3, a = 8, b = 10), and assert(b == 2x) holds (10 = 2·5)
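The affine join on the slide can be checked numerically. Using the trace's values as reconstructed above (weight w = 3, fall-through state x = 3, y = 3, a = 6, b = 6 as v1, taken state x = 2, y = 3, a = 5, b = 4 as v2):

```cpp
// Affine join of two values with respect to weight w:
//   phi_w(v1, v2) = w*v1 + (1 - w)*v2
long affine_join(long w, long v1, long v2) { return w * v1 + (1 - w) * v2; }

// Joining the two branch states variable-by-variable with w = 3:
//   x: phi(3, 3, 2) = 5    y: phi(3, 3, 3) = 3
//   a: phi(3, 6, 5) = 8    b: phi(3, 6, 4) = 10
// and the assertion b == 2x holds in the joined state (10 == 2 * 5).
```

The joined state preserves every linear relationship that holds on both branches, which is why the single assert check on the joined state suffices.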

  12. Random Interpretation In Equivalence Checking Starting from the same initial memory, interpret foo(x); foo(y) and foo(y); foo(x), then compare the resulting modified memories.

  13. Why Random Interpretation Works Avoids the scalability problem: the affine join superposes all execution paths, and linear relationships are the same before and after the join. The error probability is very low, at most (# of joins)/2^64, and running multiple times decreases the error probability exponentially.

  14. (Added Slide) Probability details • Low error probability: in general, at most 1 bad random value per join in the program • Prob(error) = (# of joins)/2^64 • Empirically (prior work): the number of joins increases linearly with the number of program statements, with a coefficient of 0.5 to 5.2 • Assume a 1000-statement commutative function: Prob(error) ≤ (5.2 × 1000)/2^64 ≈ 2.8 × 10^−16 • To decrease the error further, increase the number of runs
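The slide's arithmetic can be reproduced directly (2^64 is the size of the random-value space, 5.2 the empirical upper coefficient on joins per statement):

```cpp
#include <cmath>

// Failure probability per run: at most one bad random value per join,
// out of 2^64 possible values.
double error_probability(double num_joins) {
    return num_joins / std::pow(2.0, 64);
}
// For a 1000-statement function with coefficient 5.2:
// error_probability(5.2 * 1000) is about 2.8e-16.
```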

  15. Experimental Methodology • Trimaran compiler • Programs scheduled for an infinite-issue machine with a perfect memory system • Pointer analysis: stack- and heap-sensitive • Tested on SPECint2000 and MediaBench

  16. (Added) Experimental Methodology • In some ways, an "upper bound" on commutativity: the machine can issue as many instructions as are commutative, and memory is perfect • Not a true upper bound, though: random interpretation will sometimes fail or give up

  17. (Added) Suggested Parallelism • Suppose a sorting algorithm prints to stderr if a debug flag is set • It cannot be parallelized because of dependences between the writes, but a human can differentiate • The compiler identifies things that are almost parallel; if the human states that the semantic changes (e.g., printf ordering) do not matter, parallelize • Otherwise, ignore

  18. Analysis Time: Commutativity Analysis

  19. % Functions Commutative

  20. Parallelism Uncovered

  21. Summary Commutativity is a significant source of parallelism. Identical memory does not matter for identifying commutative functions. Our technique: 13% more commutative functions detected, 28% more parallelism uncovered.

  22. Thank you

