
Scalable Context-sensitive Points-to Analysis using Multi-dimensional Bloom Filter.

Rupesh Nasre. Indian Institute of Science, India. Jointly with: Dr. Kaushik Rajan, Prof. R. Govindarajan, Prof. Uday P. Khedker. Dec 14, 2009.


Presentation Transcript


  1. Rupesh Nasre. Indian Institute of Science, India. Jointly with: Dr. Kaushik Rajan, Prof. R. Govindarajan, Prof. Uday P. Khedker. Dec 14, 2009. Scalable Context-sensitive Points-to Analysis using Multi-dimensional Bloom Filter.

  2. Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis.

  3. What is Pointer Analysis? Pointer analysis is the mechanism of statically finding out possible run-time values of a pointer. We deal with C/C++. Example: a = &x; b = a; if (b == *p) { ... } else { ... } Here a points to x, and a and b are aliases. Alias analysis versus points-to analysis.

  4. Normalized input. • address-of assignment: a = &x • copy assignment: a = b • load assignment: a = *p • store assignment: *p = a Our analysis is flow-insensitive and context-sensitive.
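The four normalized statement forms can be illustrated with a small exact (non-bloom) solver. This is only a sketch: it assumes inclusion-based semantics (each statement adds pointees and never removes them), ignores contexts, and the tuple encoding and the name `analyze` are made up for illustration.

```python
from collections import defaultdict

def analyze(statements):
    """Flow-insensitive points-to analysis over normalized statements.

    Each statement is (kind, lhs, rhs), kind in {"addr", "copy", "load", "store"}.
    Uses exact sets and no contexts (assumption for this sketch).
    """
    pts = defaultdict(set)
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":           # lhs = &rhs
                new = {(lhs, rhs)}
            elif kind == "copy":         # lhs = rhs
                new = {(lhs, o) for o in pts[rhs]}
            elif kind == "load":         # lhs = *rhs
                new = {(lhs, o) for t in pts[rhs] for o in pts[t]}
            elif kind == "store":        # *lhs = rhs
                new = {(t, o) for t in pts[lhs] for o in pts[rhs]}
            else:
                raise ValueError(kind)
            for target, obj in new:
                if obj not in pts[target]:
                    pts[target].add(obj)
                    changed = True
    return {v: s for v, s in pts.items() if s}

# a = &x; b = a; p = &a; c = *p
result = analyze([("addr", "a", "x"), ("copy", "b", "a"),
                  ("addr", "p", "a"), ("load", "c", "p")])
```

Because the analysis is flow-insensitive, the statement order only affects how many passes are needed, not the final result.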

  5. Context sensitivity. caller1() { fun(&x); } caller2() { fun(&y); } fun(int *ptr) { a = ptr; } intra-procedural: {(a, x), (a, y), (a, z), ...}. context-insensitive: {(a, x), (a, y)}. context-sensitive: {(a, x)} along main-...-caller1-fun, {(a, y)} along main-...-caller2-fun.
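The difference between the context-insensitive and context-sensitive results on this slide can be shown with plain dictionaries. This is an illustrative sketch only; the data layout (a list of call sites, keys of the form `(variable, context)`) is an assumption and not taken from the talk.

```python
from collections import defaultdict

# Call sites of fun(ptr) from the slide: (calling context, argument object).
calls = [("caller1", "x"), ("caller2", "y")]

# Context-insensitive: a single points-to set for `a`, merged over all callers.
insensitive = {"a": {obj for _, obj in calls}}

# Context-sensitive: a separate points-to set for `a` per calling context.
sensitive = defaultdict(set)
for context, obj in calls:
    sensitive[("a", context)].add(obj)
```

The context-sensitive table has one entry per (variable, context) pair, which is where the storage cost on the next slide comes from.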

  6. Context sensitivity. main() { S1: f(&x); S2: f(&y); } f(a) { S3: g(a); S4: g(b); } g(z) { ... } • Storage requirement increases exponentially. Along S1-S3-S5-S7, a points to {x1, x3, x5, x7}. Along S1-S3-S5-S8, a points to {x1, x3, x5, x8}. Along S1-S3-S6-S7, a points to {x1, x3, x6, x7}. Along S1-S3-S6-S8, a points to {x1, x3, x6, x8}. Along S1-S4-S5-S7, a points to {x1, x4, x5, x7}. Along S1-S4-S5-S8, a points to {x1, x4, x5, x8}. Along S1-S4-S6-S7, a points to {x1, x4, x6, x7}. Along S1-S4-S6-S8, a points to {x1, x4, x6, x8}. Along S2... Exponential blow-up of contexts. [Figure: invocation graph. main calls f at S1 and S2; each f calls g at S3 and S4.]
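The blow-up can be made concrete by enumerating the call strings. The sketch below assumes two call sites at each of four levels, as in the S1...S8 paths listed on the slide; the function name `contexts` is made up for illustration.

```python
from itertools import product

def contexts(call_sites_per_level):
    """All call strings when each level offers the given call sites."""
    return ["-".join(path) for path in product(*call_sites_per_level)]

levels = [["S1", "S2"], ["S3", "S4"], ["S5", "S6"], ["S7", "S8"]]
paths = contexts(levels)
print(len(paths))   # 16 contexts for 4 levels with 2 call sites each
print(paths[:2])    # ['S1-S3-S5-S7', 'S1-S3-S5-S8']
```

With k call sites per level and depth d, the number of contexts is k^d, so an exact context-sensitive table needs one points-to set per pointer for each of them.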

  7. Tackling scalability issues. How about exploiting commonality across contexts to store points-to information? e.g. BDDs (Berndl et al, PLDI 2003, Whaley et al, PLDI 2004). How about not storing complete contexts? e.g. k-cfa approach (O. Shivers, PhD Thesis, CMU, 1991), one level flow (M. Das, PLDI 2000). Can we have a probabilistic data structure that approximates the storage? Can we control the false-positive rate?

  8. Observation. Points-to information is sparse. Average dereference size (a few tens) <<< number of address-taken variables (a few millions).

  9. Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis. This is the first work using bloom filter for program analysis.

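  10. Bloom Filter. A bloom filter is a probabilistic data structure for membership queries, and is typically implemented as a fixed-sized array of bits. [Figure: storing elements e1, e2, e3 with two hash functions. Under hash1, e1 and e3 share one bit and e2 sets another; under hash2, e1 and e2 share one bit and e3 sets another.]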
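A minimal bloom filter along these lines might look as follows. This is a generic sketch, not the implementation from the talk: the class name, the use of salted SHA-256 as the family of hash functions, and the Python int used as the bit array are all assumptions.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a fixed-size bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter(num_bits=64, num_hashes=2)
for element in ("e1", "e2", "e3"):
    bf.add(element)
```

A stored element is always reported as present; an element that was never stored is reported as present only if all of its bit positions happen to be set by other elements.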

  11. Bloom Filter. N = size of bloom filter in bits, n = number of elements added, h = number of hash functions, P = false positive rate. P = (1/2)^h / (1 - nh/N). E.g. for N = 1,000,000, n = 10,000 and h = 4, P = 6.5%; for h = 8, P = 0.4%. Gives probabilistic guarantee on precision loss.
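The two quoted rates can be checked numerically. The sketch below assumes the slide's formula reads P = (1/2)^h / (1 - nh/N); that reading is an assumption recovered from the transcript fragments, chosen because it reproduces both quoted numbers. The function name is made up.

```python
def false_positive_rate(N, n, h):
    """P = (1/2)^h / (1 - n*h/N), under the reading assumed above."""
    return 0.5 ** h / (1 - n * h / N)

p4 = false_positive_rate(1_000_000, 10_000, 4)
p8 = false_positive_rate(1_000_000, 10_000, 8)
print(f"h=4: {p4:.1%}, h=8: {p8:.1%}")  # h=4: 6.5%, h=8: 0.4%
```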

  12. Storing points-to information in bloom filter. points-to information: {(p, a), (a, x), (a, y), (q, a)}. hash(p, a) = 2, hash(a, x) = 0, hash(a, y) = 2, hash(q, a) = 5. Does p point to a? What all variables does p point to? [Figure: a 10-bit vector indexed 0 to 9 with bits 0, 2 and 5 set; (a, x) maps to bit 0, (p, a) and (a, y) both map to bit 2, (q, a) maps to bit 5.]
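The two questions on this slide can be tried out with the slide's own hash values. In the sketch below, (a, y) is deliberately left out of the stored set to show how a collision with (p, a) on bit 2 produces a false positive; the lookup-table hash and the function names are assumptions for illustration.

```python
# Hash values from the slide: each (pointer, pointee) pair maps to one of 10 bits.
HASH = {("p", "a"): 2, ("a", "x"): 0, ("a", "y"): 2, ("q", "a"): 5}

stored = [("p", "a"), ("a", "x"), ("q", "a")]  # (a, y) deliberately not stored
bits = [0] * 10
for pair in stored:
    bits[HASH[pair]] = 1

def may_point_to(pointer, obj):
    """Membership query: True may be a false positive."""
    return bits[HASH[(pointer, obj)]] == 1

def pointees(pointer):
    # A bloom filter cannot be enumerated directly, so test each candidate pair.
    candidates = [o for (p, o) in HASH if p == pointer]
    return [o for o in candidates if may_point_to(pointer, o)]

print(may_point_to("p", "a"))  # True: stored
print(may_point_to("a", "y"))  # True: false positive, collides with (p, a)
```

Answering "what does p point to?" requires one query per candidate object, which is part of why a single flat filter is awkward for this analysis.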

  13. Processing input. • address-of assignment: a = &x • copy assignment: a = b ?? • load assignment: a = *p ?? • store assignment: *p = a ??

  14. Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis.

  15. A 2-D structure. [Figure: one row per pointer (p, q, r, ...), each holding a separate bit vector for hash1 and for hash2.]

  16. A multi-dimensional structure. [Figure: a cube of bits with axes for pointers (p, q, r), contexts, hash functions and pointees.] We call it multi-dimensional bloom filter or simply multi-bloom.

  17. Accessing multibloom. Earlier: bloom[v], where v = hash(pointer, context, object). Now: multibloom[p][c][x], where p = hash(pointer), c = hash(context), x = hash(object).

  18. Multibloom. Multibloom is a 5-tuple: <P, C, H, B, M> P = number of entries for pointers. C = number of entries for contexts. H = number of hash functions. B = bit-vector size for each hash function. M = number of entries for multi-level pointers. You can play around with parameters keeping the total size of the bloom filter under control, with a probabilistic guarantee over precision loss.
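The 5-tuple can be pictured as a nested array indexed by separate hashes. The following is a hedged sketch, not the authors' implementation: the helper `hash_to`, the use of SHA-256, the index order `cells[p][c][i][j]`, the default bucket 0, and the assumption that the total size is the product P x C x H x B x M bits are all illustrative choices.

```python
import hashlib
from dataclasses import dataclass, field

def hash_to(key, size):
    """Hypothetical helper: map any key to an index in range(size)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % size

@dataclass
class Multibloom:
    P: int  # entries for pointers
    C: int  # entries for contexts
    H: int  # hash functions
    B: int  # bits per bit vector
    M: int  # entries (buckets) for multi-level pointers
    cells: list = field(init=False, repr=False)

    def __post_init__(self):
        # cells[p][c][i][j] is a B-bit vector stored as a Python int.
        self.cells = [[[[0] * self.M for _ in range(self.H)]
                       for _ in range(self.C)]
                      for _ in range(self.P)]

    def total_bits(self):
        return self.P * self.C * self.H * self.M * self.B

    def add(self, pointer, context, obj, bucket=0):
        p, c = hash_to(pointer, self.P), hash_to(context, self.C)
        for i in range(self.H):
            self.cells[p][c][i][bucket] |= 1 << hash_to((i, obj), self.B)

    def may_point_to(self, pointer, context, obj, bucket=0):
        p, c = hash_to(pointer, self.P), hash_to(context, self.C)
        return all(self.cells[p][c][i][bucket] >> hash_to((i, obj), self.B) & 1
                   for i in range(self.H))

mb = Multibloom(P=8, C=4, H=4, B=10, M=2)
mb.add("p", "main-S1-S3", "x")
```

Changing any one parameter changes the memory footprint linearly, which is what lets the size be kept fixed while the other dimensions are traded against each other.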

  19. Handling copy statement (a = b). c = hash(context); for each hash function i { for each bucket j { source = mb[b][c][i][j]; destination = mb[a][c][i][j]; mb[a][c][i][j] = destination bitwise-or source; } }
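The copy rule above translates almost directly into code. In this sketch each `mb[v][c][i][j]` is a bit vector stored as a Python int, pointers are used directly as dictionary keys, and the returned `changed` flag (useful for fixed-point iteration) is an addition that is not on the slide.

```python
def handle_copy(mb, a, b, c, H, M):
    """Process a = b in context entry c: OR b's bit vectors into a's."""
    changed = False
    for i in range(H):          # each hash function
        for j in range(M):      # each bucket
            merged = mb[a][c][i][j] | mb[b][c][i][j]
            if merged != mb[a][c][i][j]:
                mb[a][c][i][j] = merged
                changed = True
    return changed

# One context entry, two hash functions, one bucket.
mb = {
    "a": [[[0b00000], [0b00000]]],
    "b": [[[0b00010], [0b10000]]],
}
first = handle_copy(mb, "a", "b", c=0, H=2, M=1)
second = handle_copy(mb, "a", "b", c=0, H=2, M=1)
```

The second call finds nothing new to add, which is the condition a fixed-point loop over all statements would use to stop.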

  20. Example. • h(x) = 1, h(y) = 4, hs(p1) = 0, hs(p2) = 1. Statement: multibloom processing. • p1 = &x: set bit 1 corresponding to x. • p2 = &y: set bit 4 corresponding to y. • p3 = &p1: bitwise-OR p1's bucket. • p4 = &p2: bitwise-OR p2's bucket. • p3 = p4: bitwise-OR corresponding buckets of p3 and p4. • p5 = *p3: bitwise-OR p3's buckets, bitwise-OR with p5's bucket. [Figure: bit vectors of p1 to p5.]

  21-27. Example (animation of slide 20). [Figure: bit vectors of p1 to p5, starting empty and filled in as each statement is processed in turn.]

  28. Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis.

  29. Performance. • Benchmarks: SPEC 2000 C/C++, httpd, sendmail. • Framework: LLVM. • Platform: Intel Xeon, 2GHz clock, 4MB L2, 3GB RAM. • Precision: NoAlias percentage. NoAlias % is the percentage of queries that return NoAlias for all pairs of pointers in each function of the program. It need not be 100% for an exact analysis.
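The NoAlias percentage defined above can be computed as sketched below. The alias test used here is an assumption in the spirit of a bloom filter: two pointers may alias only if their bit vectors intersect under every hash function. The data layout and function names are made up for illustration.

```python
from itertools import combinations

def may_alias(vecs_p, vecs_q):
    """May-alias only if the bit vectors intersect under every hash function."""
    return all(vp & vq for vp, vq in zip(vecs_p, vecs_q))

def no_alias_percentage(functions):
    """functions: {function name: {pointer: [bit vector per hash function]}}."""
    total = no_alias = 0
    for pointers in functions.values():
        for p, q in combinations(sorted(pointers), 2):
            total += 1
            no_alias += not may_alias(pointers[p], pointers[q])
    return 100.0 * no_alias / total if total else 0.0

# r shares a bit with p and q under the second hash function but not the
# first, so the first hash function is enough to report NoAlias for r.
program = {"f": {"p": [0b0001, 0b0100],
                 "q": [0b0001, 0b0100],
                 "r": [0b1000, 0b0100]}}
percentage = no_alias_percentage(program)
```

Here two of the three pointer pairs in `f` are reported as NoAlias. A higher percentage means a more precise analysis, and false positives in the bloom filter can only lower it.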

  30. Multibloom. Multibloom is a 5-tuple: <P, C, H, B, M> P = number of entries for pointers. C = number of entries for contexts. H = number of hash functions. B = bit-vector size for each hash function. M = number of entries for multi-level pointers.

  31. Multibloom parameters. [Figure: the multibloom cube with axes labelled Pointers (P), with entries p, q, r; Contexts (C); hash functions (H); and Pointees (B).] Another dimension M for multi-level pointers.

  32. Performance (vortex) (C-H-B) = (4-4-10) = (number of entries for contexts, number of hash functions, size of the bit-vector per hash function)

  33. Experimental evaluation: Time(s).

  34. Experimental evaluation: Memory (KB).

  35. Experimental evaluation: Precision (NoAlias %). User has control over the trade-off between memory, analysis time and precision.

  36. Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis.

  37. Client Analysis Mod/Ref analysis. Output: NoModRef, Ref, Mod, ModRef. Precision: NoModRef %.

  38. Experimental evaluation: Mod/Ref.

  39. Related work. • L. O. Andersen, Program analysis and specialization for the C programming language, PhD Thesis, DIKU, 1994. • B. Steensgaard, Points-to Analysis in Almost Linear Time, POPL 1996. • J. Whaley and M. S. Lam, Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, PLDI 2004. • B. Hardekopf and C. Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007. • V. Kahlon, Bootstrapping: a technique for scalable flow and context-sensitive pointer alias analysis, PLDI 2008.

  40. Take away. By using a multi-dimensional bloom filter, one can trade off the precision, memory and time of an analysis to suit one's needs, with a probabilistic guarantee on precision loss.

  41. Rupesh Nasre. Indian Institute of Science, India. nasre@csa.iisc.ernet.in. Scalable Context-sensitive Points-to Analysis using Multi-dimensional Bloom Filter.

  42. Example. Program: p3 = p1, p2 = p3, p1 = &x, p2 = &y, p3 = p2. Let h1(x) = 0, h2(x) = 5, h1(y) = 3, h2(y) = 3. [Figure: each of p1, p2, p3 has one 12-bit vector (bits 0 to 11) per hash function h1 and h2, all initially empty.]

  43-44. Example, iteration 1. p3 = p1 and p2 = p3 change nothing, since p1 and p3 are still empty.

  45. Example, iteration 1. p1 = &x sets bit 0 in p1's h1 vector and bit 5 in its h2 vector.

  46. Example, iteration 1. p2 = &y sets bit 3 in both p2's h1 vector and its h2 vector.

  47. Example, iteration 1. p3 = p2 copies bit 3 into both p3's h1 vector and its h2 vector.

  48. Example, iteration 2. p3 = p1 adds bit 0 to p3's h1 vector and bit 5 to its h2 vector.

  49. Example, iteration 2. p2 = p3 adds bit 0 to p2's h1 vector and bit 5 to its h2 vector.

  50. Example, iteration 2. The remaining statements change nothing. p1 holds bit 0 under h1 and bit 5 under h2; p2 and p3 hold bits 0 and 3 under h1 and bits 3 and 5 under h2.
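The iterations in this example can be replayed with a few lines of code. The sketch uses the program and hash values from the slides and stores each 12-bit vector as a Python int; contexts and buckets are left out, and the tuple encoding of statements is an assumption for illustration.

```python
H1 = {"x": 0, "y": 3}   # h1 from the slides
H2 = {"x": 5, "y": 3}   # h2 from the slides

program = [("copy", "p3", "p1"), ("copy", "p2", "p3"),
           ("addr", "p1", "x"), ("addr", "p2", "y"),
           ("copy", "p3", "p2")]

def run(program):
    # state[pointer] = [bit vector under h1, bit vector under h2]
    state = {p: [0, 0] for p in ("p1", "p2", "p3")}
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for kind, lhs, rhs in program:
            if kind == "addr":   # lhs = &rhs: set the pointee's bit per hash
                new = [state[lhs][0] | 1 << H1[rhs],
                       state[lhs][1] | 1 << H2[rhs]]
            else:                # lhs = rhs: bitwise-OR per hash function
                new = [state[lhs][0] | state[rhs][0],
                       state[lhs][1] | state[rhs][1]]
            if new != state[lhs]:
                state[lhs] = new
                changed = True
    return state, iterations

def set_bits(vector):
    return [i for i in range(12) if vector >> i & 1]

state, iterations = run(program)
print({p: [set_bits(v) for v in vecs] for p, vecs in state.items()})
# {'p1': [[0], [5]], 'p2': [[0, 3], [3, 5]], 'p3': [[0, 3], [3, 5]]}
```

The loop runs three passes: the first two change the state as in iterations 1 and 2 of the slides, and a third pass is needed only to confirm that nothing changes any more.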
