Bloom : Big Systems, Small Programs


Presentation Transcript


  1. Bloom: Big Systems, Small Programs. Neil Conway, UC Berkeley

  2. Distributed Computing

  3. Programming Languages

  4. Optimization. Programming languages: data prefetching; register allocation; loop unrolling; function inlining. Distributed computing: global coordination, waiting; caching, indexing; replication, data locality; partitioning, load balancing.

  5. Warnings. Programming languages: undeclared variables; type mismatches; sign conversion mistakes. Distributed computing: replica divergence; inconsistent state; deadlocks; race conditions.

  6. Debugging. Programming languages: stack traces; gdb; log files, printf. Distributed computing: full-stack visualization, analytics; consistent global snapshots; provenance analysis.

  7. Developer productivity is a major unsolved problem in distributed computing.

  8. We can do better! … provided we’re willing to make changes.

  9. Design Principles

  10. Centralized Computing • Predictable latency • No partial failure • Single clock • Global event order

  11. Taking Order For Granted: Global event order

  12. Distributed Computing • Unpredictable latency • Partial failures • No global event order

  13. Alternative #1: Enforce global event order at all nodes (“Strong Consistency”)

  14. Alternative #1: Enforce global event order at all nodes (“Strong Consistency”). Problems: • Availability (CAP) • Latency

  15. Alternative #2: Ensure correct behavior for any network order (“Weak Consistency”)

  16. Alternative #2: Ensure correct behavior for any network order (“Weak Consistency”). Problem: With traditional languages, this is very difficult.

  17. The “ACID 2.0” Pattern. Associativity: X ○ (Y ○ Z) = (X ○ Y) ○ Z (“batch tolerance”). Commutativity: X ○ Y = Y ○ X (“reordering tolerance”). Idempotence: X ○ X = X (“retry tolerance”).
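
As a concrete (if minimal) illustration, not from the talk itself: set union satisfies all three laws, which is exactly what lets replicas batch, reorder, and retry merges safely. A Python sketch:

```python
# A minimal sketch (mine, not the talk's code): set union as an
# "ACID 2.0" merge. Each assertion corresponds to one tolerance.

def merge(x: frozenset, y: frozenset) -> frozenset:
    return x | y

a, b, c = frozenset({1}), frozenset({2}), frozenset({3})

assert merge(a, merge(b, c)) == merge(merge(a, b), c)  # batch tolerance
assert merge(a, b) == merge(b, a)                      # reordering tolerance
assert merge(a, a) == a                                # retry tolerance
```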

  18. “When I see patterns in my programs, I consider it a sign of trouble … [they are a sign] that I'm using abstractions that aren't powerful enough.” —Paul Graham

  19. Bounded Join Semilattices. A triple ⟨S, ⊔, ⊥⟩ such that: • S is a set • ⊔ is a binary operator (“least upper bound”): associative, commutative, and idempotent • ⊔ induces a partial order on S: x ≤ y iff x ⊔ y = y • ∀x ∈ S: ⊥ ⊔ x = x

  20. Bounded Join Semilattices. Lattices are objects that grow over time. An interface with an ACID 2.0 merge() method: • Associative • Commutative • Idempotent

  21. Growing over time: Set (merge = union); Increasing Int (merge = max); Boolean (merge = or)
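
As a sketch, these three lattices can be rendered in Python as objects exposing the merge() interface from the previous slide (the class names are mine, not the talk's):

```python
# A minimal sketch of the slide's three lattices. In each case merge()
# computes a least upper bound: associative, commutative, idempotent.

class SetUnion:
    def __init__(self, items=frozenset()):
        self.items = frozenset(items)
    def merge(self, other):                  # least upper bound = union
        return SetUnion(self.items | other.items)

class MaxInt:
    def __init__(self, n=0):
        self.n = n
    def merge(self, other):                  # least upper bound = max
        return MaxInt(max(self.n, other.n))

class BoolOr:
    def __init__(self, b=False):
        self.b = b
    def merge(self, other):                  # least upper bound = or
        return BoolOr(self.b or other.b)

# Because each merge is ACI, the object's value only grows over time,
# regardless of the order in which merges arrive.
```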

  22. CRDTs: Convergent Replicated Data Types • e.g., registers, counters, sets, graphs, trees Implementations: • Statebox • Knockbox • riak_dt
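
The libraries listed above implement many such types; as an illustration only, here is a minimal sketch of a grow-only counter (G-Counter), a classic convergent design. The class and method names are mine, not drawn from any of those libraries:

```python
# A minimal G-Counter sketch: one slot per replica, merged pointwise
# with max. Increments are local; states exchanged in any order converge.

class GCounter:
    def __init__(self):
        self.slots = {}                      # replica_id -> local count

    def increment(self, replica_id, n=1):
        self.slots[replica_id] = self.slots.get(replica_id, 0) + n

    def value(self):
        return sum(self.slots.values())

    def merge(self, other):                  # pointwise max: ACI
        merged = GCounter()
        for rid in self.slots.keys() | other.slots.keys():
            merged.slots[rid] = max(self.slots.get(rid, 0),
                                    other.slots.get(rid, 0))
        return merged
```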

  23. Lattices represent disorderly data. How can we represent disorderly computation?

  24. f : S → T is a monotone function iff: ∀a, b ∈ S: a ≤ b in S implies f(a) ≤ f(b) in T

  25. Over time: Set (merge = union) → Increasing Int (merge = max), via the monotone function size(); Increasing Int → Boolean (merge = or), via the monotone function size() >= 3
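
A small Python sketch of this pipeline (the function names are mine): a union-merged set feeds size(), which is monotone into max-merged ints, which feeds a threshold test that is monotone into or-merged booleans. Once the threshold fires it can never un-fire, so acting on it requires no coordination:

```python
def size(s: frozenset) -> int:          # monotone: bigger set, bigger size
    return len(s)

def at_least_three(n: int) -> bool:     # monotone: once true, stays true
    return n >= 3

observed = frozenset()
for delta in [{1}, {2}, {2}, {3}]:      # any order, duplicates tolerated
    observed |= delta
    print(size(observed), at_least_three(size(observed)))
```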

  26. Consistency As Logical Monotonicity (CALM)

  27. Case Study

  28. Questions • Will cart replicas eventually converge? (“Eventual Consistency”) • What will the client observe on checkout? (Goal: checkout reflects all session activity) • To achieve #1 and #2, how much ordering is required?

  29. Design #1: Mutable State

      Add(item x, count c):
        if kvs[x] exists:
          old = kvs[x]
          kvs.delete(x)
        else:
          old = 0
        kvs[x] = old + c

      Remove(item x, count c):
        if kvs[x] exists:
          old = kvs[x]
          kvs.delete(x)
          if old > c:
            kvs[x] = old - c

      Non-monotonic!
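
This pseudocode translates almost directly to Python. A hypothetical port (the names kvs, add, remove follow the slide) makes the order-sensitivity visible: two replicas that see the same add and remove in different orders end up in different states.

```python
# A sketch of Design #1 in Python, preserving the slide's semantics
# (including remove's silent no-op when old <= c).

def add(kvs, x, c):
    old = kvs.pop(x, 0)
    kvs[x] = old + c

def remove(kvs, x, c):
    if x in kvs:
        old = kvs.pop(x)
        if old > c:
            kvs[x] = old - c

r1, r2 = {}, {}
add(r1, "beer", 2); remove(r1, "beer", 2)   # replica 1: add, then remove
remove(r2, "beer", 2); add(r2, "beer", 2)   # replica 2: messages reordered
print(r1, r2)                                # {} vs {'beer': 2}: divergence!
```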

  30. Non-monotonic! Conclusion: Every operation might require coordination!

  31. Design #2: “Disorderly”

      Add(item x, count c):
        Add x,c to add_log

      Remove(item x, count c):
        Add x,c to del_log

      Checkout():
        Group add_log by item ID; sum counts.
        Group del_log by item ID; sum counts.
        For each item, subtract deletes from adds.

      Non-monotonic!
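
For contrast, a hypothetical Python rendering of the disorderly design (the log names follow the slide; Counter and the helpers are my choices): the add and remove paths only grow logs, so replicas can merge them in any order; only checkout's subtraction is order-sensitive.

```python
from collections import Counter

add_log, del_log = [], []                    # grow-only logs

def add(x, c):    add_log.append((x, c))     # monotone: just append
def remove(x, c): del_log.append((x, c))     # monotone: just append

def checkout():
    adds, dels = Counter(), Counter()
    for x, c in add_log: adds[x] += c        # group by item, sum counts
    for x, c in del_log: dels[x] += c
    # the subtraction below is the one non-monotonic step
    return {x: n for x in adds if (n := adds[x] - dels[x]) > 0}

add("beer", 2); add("chips", 1); remove("beer", 1)
print(checkout())                            # {'beer': 1, 'chips': 1}
```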

  32. Monotonic. Conclusion: Replication is safe; might need to coordinate on checkout

  33. Takeaways • Avoid: mutable state update. Prefer: immutable data, monotone growth • Major difference in coordination cost! Coordinate once per operation vs. coordinate once per checkout • We’d like a type system for monotonicity

  34. Language Design

  35. Disorderly Programming • Order-independent: default • Order dependencies: explicit • Order as part of the design process • Tool support • Where is order needed? Why?

  36. The Disorderly Spectrum: ASM → C → Java → Lisp, ML → Haskell → SQL, LINQ, Datalog → Bloom. Moving right: higher-level, more “declarative”, more powerful optimizers.

  37. Bloom ≈ declarative agents: processes that communicate via asynchronous message passing. Each process has a local database. Logical rules describe computation and communication (“SQL++”).

  38. Each agent has a database of values that changes over time. All values have a location and timestamp.

  39. left-hand-side <= right-hand-side. If the RHS is true (SELECT …), then the LHS is true (INTO lhs). When and where is the LHS true?

  40. Temporal Operators

      <=   Computation     same location, same timestamp
      <+   Persistence     same location, next timestamp
      <-   Deletion        same location, next timestamp
      <~   Communication   different location, non-deterministic timestamp
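
A toy evaluator (my simplification, not the real Bud runtime) can make the timing concrete: facts derived with <= are visible within the same timestep, computed to fixpoint, while <+ insertions and <- deletions only take effect at the next timestep.

```python
# A sketch of one Bloom-like timestep over a set of tuple facts.

def tick(state, same_step_rules, deferred_inserts, deferred_deletes):
    # <= : same location, same timestamp; deduce to fixpoint
    changed = True
    while changed:
        derived = set()
        for rule in same_step_rules:
            derived |= rule(state)
        changed = not derived <= state
        state = state | derived
    # <+ / <- : take effect at the next timestamp
    return (state | deferred_inserts) - deferred_deletes

# Usage: one rule deriving a path fact from a link fact.
state = {("link", "a", "b")}
rules = [lambda s: {("path", x, y) for (tag, x, y) in s if tag == "link"}]
state = tick(state, rules, deferred_inserts=set(), deferred_deletes=set())
print(state)   # contains both the link and the derived path
```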

  41. Observe Compute Act
