1 / 68

Data Representation Synthesis PLDI’2011 * , ESOP’12, PLDI’12 * CACM’12

Data Representation Synthesis PLDI’2011 * , ESOP’12, PLDI’12 * CACM’12. Peter Hawkins, Stanford University Alex Aiken, Stanford University Kathleen Fisher, DARPA Martin Rinard, MIT Mooly Sagiv, TAU. * Best Paper Award. http://theory.stanford.edu/~hawkinsp/. Background.

barton
Download Presentation

Data Representation Synthesis PLDI’2011 * , ESOP’12, PLDI’12 * CACM’12

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Representation SynthesisPLDI’2011*, ESOP’12, PLDI’12*CACM’12 Peter Hawkins, Stanford University Alex Aiken, Stanford University Kathleen Fisher, DARPA Martin Rinard, MIT Mooly Sagiv, TAU * Best Paper Award http://theory.stanford.edu/~hawkinsp/

  2. Background • High level formalisms for static program analysis • Circular attribute grammars • Horn clauses • Interprocedural Analysis • Context free reachability • Implemented in SLAM/SDV • Shape Analysis • Low level pointer data structures

  3. Composing Data Structures filesystems filesystem=1 filesystem=2 s_list s_list s_files s_files file=14 file=7 f_list f_list f_fs_list f_fs_list file=6 file=5 f_list f_list f_fs_list f_fs_list file=2 f_list f_fs_list

  4. Problem: Multiple Indexes +Concurency Access Patterns filesystems filesystem=1 filesystem=2 s_list s_list s_files s_files • Find all mounted filesystems • Find cached files on each filesystem • Iterate over all used or unused cached files in Least-Recently-Used order file=14 file=7 f_list f_list file_in_use f_fs_list f_fs_list file=6 file=5 f_list f_list file_unused f_fs_list f_fs_list file=2 f_list f_fs_list

  5. Disadvantages of linked shared data structures • Error prone • Hard to change • Performance may depend on the machine and workload • Hard to reason about correctness • Concurrency makes it harder • Lock granularity • Aliasing

  6. Our thesis • Very high level programs • No pointers and shared data structures • Easier programming • Simpler reasoning • Machine independent • The compiler generates pointers and multiple concurrent shared data structures • Performance comparable to manually written code

  7. Our Approach • Program with “database” • States are tables • Uniform relational operations • Hide data structures from the program • Functional dependencies express program invariants • The compiler generates low level shared pointer data structures with concurrent operations • Correct by construction • The programmer can tune efficiency • Autotuning for a given workload

  8. Conceptual Programming Model insert query … insert … remove … query … insert … remove shared database

  9. Relational Specification • Program states as relations • Columns correspond to properties • Functional dependencies define global invariants

  10. The High Level Idea Decomposition RelScala query <inuse:T> {fs, file} Compiler Concurrent Compositions ofData Structures, Atomic Transactions Scala List * query(FS* fs, File* file) { lock(fs) ; for (q= file_in_use; …) ….

  11. Filesystem • Three columns {fs, file, inuse} • fs:int  file:int inuse:Bool • Functional dependencies • {fs, file} { inuse}

  12. Filesystem (operations) • query <inuse:T> {fs, file }= • [<fs:2, file:7>, <fs:1, file:6>]

  13. Filesystem (operations) • insert <fs:1, file:15> <inuse:T>

  14. Filesystem (operations) • remove <fs:1>

  15. Directed Graph Data Structure • Three columns {src, dst, weight} • src  dst  weight • Functional dependencies • {src, dst} { weight} • Operations • query <src:1> {dst, weight} • query <dst:5> {src, weight}

  16. Plan • Compiling into sequential code (PLDI’11) • Adding concurrency (PLDI’12)

  17. Mapping Relations into Low Level Data Structures • Many mappings exist • How to combine several existing data structures • Support sharing • Maintain the relational abstraction • Reasonable performance • Parametric mappings of relations into shared combination of data structures • Guaranteed correctness

  18. The RelC Compiler Relational Specification • fsfileinuse • {fs, file}  {inuse} • foreach <fs, file, inuse> filesystemss.t. fs= 5 • do … inuse fs list array C++ list list Graph decomposition file fs, file RelC inuse

  19. Decomposing Relations • Represents subrelations using container data structures • A directed acyclic graph(DAG) • Each node is a sub-relation • The root represents the whole relation • Edges map columns into the remaining sub-relations • Shared node=shared representation

  20. Decomposing Relations into Functions Currying • fsfileinuse • {fs, file}  {inuse} • fsfileinuse group-by {fs} group-by {inuse} inuse FS  (FILEINUSE) fs INUSE  FS  FILE • fs file • fileinuse fs, file • group_by {fs, file} file group-by {file} FILEINUSE FS  FILE INUSE • inuse FS  (FILEINUSE) INUSE  (FS  FILE INUSE)

  21. Filesystem Example {fs, file, inuse} inuse fs fs:1 fs:2 file fs, file inuse file:14 file:2 file:5 file:7 file:6

  22. Memory Decomposition(Left) inuse fs fs:1 fs:2 file fs, file inuse file:14 file:2 file:5 file:7 file:6 inuse:F inuse:T Inuse:T inuse:F Inuse:F

  23. Filesystem Example inuse fs inuse:T inuse:F file fs, file inuse • {fs, file} { inuse} fs:2 file:7 fs:1 file:6 fs:1 file:14 fs:2 file:5 fs:1 file:2

  24. Memory Decomposition(Right) inuse fs inuse:T inuse:F file fs, file inuse • {fs, file} { inuse} fs:2 file:7 fs:1 file:6 fs:1 file:14 fs:2 file:5 fs:1 file:2 inuse:T inuse:T inuse:F Inuse:F Inuse:F

  25. Decomposition Instance inuse fs inuse:F inuse:T fs:1 fs:2 file fs, file fs  file inuse {fs, file} { inuse} inuse file:5 file:14 file:6 file:2 file:7 fs:1 file:6 fs:1 file:2 fs:2 file:7 fs:2 file:5 fs:1 file:14 inuse:F inuse:T inuse:F Inuse:T Inuse:F

  26. Decomposition Instance inuse fs list array inuse:F inuse:T fs:1 fs:2 list list file fs, file s_list fs  file inuse fs  file inuse {fs, file} { inuse} {fs, file} { inuse} inuse file:5 file:14 file:6 file:2 file:7 fs:1 file:6 fs:1 file:2 fs:2 file:7 fs:2 file:5 fs:1 file:14 f_list f_fs_list f_fs_list f_fs_list inuse:F inuse:T inuse:F Inuse:T Inuse:F f_list f_list

  27. Decomposing Relations Formally(PLDI’11) • fsfileinuse • {fs, file}  {inuse} x inuse fs let w: {fs, file,inuse}  {inuse} = {inuse} in let y : {fs}  {file, inuse} = {file} list {w} in let z : {inuse}  {fs, file, inuse} = {fs,file} list {w} in let x: {}  {fs, file, inuse} = {fs} clist{y}  {inuse} array{z} y z clist array file fs, file list w list inuse

  28. Memory State filesystems filesystem=1 filesystem=2 inuse fs list s_list s_list array s_files s_files file=5 file=7 list file=14 list file fs, file f_list f_list f_list f_fs_list f_fs_list f_fs_list fs  file inuse {fs, file} { inuse} inuse file=6 f_list f_fs_list file_in_use file=2 f_list file_unused f_fs_list

  29. Adequacy Adequacy Not every decomposition is a good representation of a relation A decomposition is adequate if it can represent every possible relation matching a relational specification enforces sufficient conditions for adequacy

  30. Adequacy of Decompositions • All columns are represented • Nodes are consistent with functional dependencies • Columns bound to paths leading to a common node must functionally determine each other

  31. Respect Functional Dependencies file,fs  {file, fs}  {inuse} inuse

  32. Adequacy and Sharing inuse fs fs, file file Columns bound on a path to an object x must functionally determine columns bound on any other path to x  {fs, file}{inuse, fs, file} inuse

  33. Adequacy and Sharing inuse fs fs file Columns bound on a path to an object x must functionally determine columns bound on any other path to x  {fs, file}  {inuse, fs} inuse

  34. The RelC Compiler PLDI’11 ReLC inuse fs list array list list file fs, file Compiler inuse Sequential Compositions ofData Structures C++

  35. Query Plans foreach <fs, file, inuse> filesystems if inuse=T do … scan inuse fs list array list fs, file list file scan inuse Cost proportional to the number of files

  36. Query Plans foreach <fs, file, inuse> filesystems if inuse=T do … access inuse fs list array fs, file list list file scan inuse Cost proportional to the number of files in use

  37. Removal and graph cuts remove <fs:1> filesystems fs:2 fs list s_list array s_files list file:7 list file f_list inuse:T f_fs_list inuse file:5 f_list inuse:F f_fs_list

  38. Abstraction Theorem • If the programmer obeys the relational specification and the decomposition is adequate and if the individual containers are correct • Then the generated low-level code maintains the relational abstraction remove <fs:1> relation relation   low level code remove <fs:1> low-level state low-level state

  39. Autotuner • Given a fixed set of primitive types • list, circular list, doubly-linked list, array, map, … • A workload • Exhaustively enumerate all the adequate decompositions up to certain size • The compiler can automatically pick the best performing representation for the workload

  40. Directed Graph Example (DFS) • Columns srcdst weight • Functional Dependencies • {src, dst}  {weight} • Primitive data types • map, list dst dst src src src dst map map map map map map … list list dst list src list src list list src dst dst weight weight weight weight weight

  41. Synthesizing Concurrent Programs PLDI’12

  42. Multiple ADTs Invariant: Every element that added to eden is either in eden or in longterm public void put(K k, V v) {       if (this.eden.size() >= size) {this.longterm.putAll(this.eden);this.eden.clear();        }this.eden.put(k, v);}

  43. OOPSLA’11 Shacham • Search for all public domain collection operations methods with at least two operations • Used simple static analysis to extract composed operations • Two or more API calls • Extracted 112 composed operations from 55 applications • Apache Tomcat, Cassandra, MyFaces – Trinidad, … • Check Linearizability of all public domain composed operations

  44. Motivation: OOPSLA’11 Shacham 15%Open Non Linearizable 47%Linearizable 38%Non Linearizable

  45. Relational Specification • Program states as relations • Columns correspond to properties • Functional dependencies define global invariants

  46. The High Level Idea Concurrent Decomposition RelScala query <inuse:T> {fs, file} ConcurrentHashMap Compiler HashMap Concurrent Compositions ofData Structures, Atomic Transactions Scala List * query(FS* fs, File* file) { lock(…) for (q= file_in_use; …) ….

  47. Two-Phase Locking Attach a lock to each piece of data • Well-locked: To perform a read or write, a thread must hold the corresponding lock • Two-phase: All lock acquisitions must precede all lock releases Two phase locking protocol: Theorem [Eswaran et al., 1976]: Well-locked, two-phase transactions are serializable

  48. Two Phase Locking Decomposition Decomposition Instance Attach a lock to every edge Two Phase Locking  Serialiazability We’re done! Problem 1: Can’t attach locks to container entries Problem 2: Too many locks Butler Lampson/David J. Wheeler: “Any problem in computer science can be solved with another level of indirection.”

  49. Two Phase Locking Decomposition Decomposition Instance Attach a lock to every edge Two Phase Locking  Serialiazability We’re done! Problem 1: Can’t attach locks to container entries Problem 2: Too many locks

  50. Lock Placements Decomposition Decomposition Instance 1. Attach locks to nodes

More Related