
Presentation Transcript


  1. Invariant Boundaries • Dr. Eric A. Brewer • Professor, UC Berkeley • Co-Founder & Chief Scientist, Inktomi

  2. Our Perspective • Inktomi builds two distributed systems: • Global Search Engines • Distributed Web Caches • Based on scalable cluster & parallel computing technology • But very little use of classic DS research...

  3. “Distributed Systems” don’t work... • There exist working DS: • Simple protocols: DNS, WWW • Inktomi search, Content Delivery Networks • Napster, Verisign, AOL • But these are not classic DS: • Not distributed objects • No RPC • No modularity • Complex ones are single owner (except phones)

  4. Concept: Invariant Boundaries • Claim: we don’t understand boundaries • Solution: make the “Invariant Boundary” explicit • Goal: simpler, easier, faster path to correctness • [Diagram: a region where the invariant holds, surrounded by a region where the invariant may not hold]

  5. Three Basic Issues • Where is the state? • Consistency vs. Availability • Communication Boundaries

  6. Santa Clara Cluster • Very uniform • No monitors • No people • No cables • Working power • Working A/C • Working BW

  7. Boundaries for Kinds of State • Stateless • Front ends • Immutable state • Soft State (rebuild on restart) • Durable Single-writer (e.g. user data) • Emissaries, horizontal partitioning • Durable Multi-writer • Fiefdoms, DBMS
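
One way to read the “soft state” bullet is as derived data that can always be regenerated from a durable source, so losing it on restart costs performance, not correctness. A minimal sketch, not from the slides; the class name and the fetch_from_durable_store helper are hypothetical:

```python
# Hypothetical sketch of soft state: a cache that is rebuilt on demand,
# so discarding it on restart affects performance, not correctness.
class SoftStateCache:
    def __init__(self, fetch_from_durable_store):
        self._fetch = fetch_from_durable_store   # authoritative source of truth
        self._cache = {}                         # soft state: safe to discard

    def get(self, key):
        if key not in self._cache:               # miss (or post-restart): rebuild
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def restart(self):
        self._cache.clear()                      # simulate losing all soft state
```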

  8. Persistent State is HARD • Classic DS focus on the computation, not the data • this is WRONG, computation is the easy part • Data centers exist for a reason • can’t have consistency or availability without them • Other locations are for caching only: • proxies, basestations, set-top boxes, desktops • phones, PDAs, … • Distributed systems can’t ignore location • Invariant Boundary is small

  9. Berkeley Ninja Architecture • [Diagram] Base: scalable, highly-available platform for persistent-state services • Active Proxies (AP): bootstrap thin devices into the infrastructure, run mobile code • Clients reach the Base over the Internet: workstations & PCs, PDAs (e.g. IBM Workpad), cellphones, pagers, etc.

  10. Berkeley Ninja Architecture, annotated with kinds of state • [Diagram] Base (platform for persistent-state services): consistent, shared state • Active Proxies: soft state, immutable state • Single-user devices (workstations & PCs, PDAs such as the IBM Workpad, cellphones, pagers) connect over the Internet

  11. Three Basic Issues • Where is the state? • Consistency vs. Availability • Communication Boundaries

  12. Data is only consistent *inside* the consistency boundary; data outside is not consistent (“reference” data) • [Diagram: a single consistent region, “Consistent A”]

  13. Data is only consistent *inside* the consistency boundary; data outside is not consistent (“reference” data) • [Diagram: two separate consistent regions, “Consistent A” and “Consistent B”]

  14. The CAP Theorem • Consistency • Availability • Tolerance to network Partitions • Theorem: you can have at most two of these invariants for any shared-data system

  15. The CAP Theorem • Consistency • Availability • Tolerance to network Partitions • Theorem: you can have at most two of these invariants for any shared-data system • Corollary: a consistency boundary must choose A or P
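
The corollary can be made concrete with a small sketch, not from the slides: a replica cut off from its peers must either refuse writes (keep consistency, forfeit availability) or accept them locally and reconcile later (keep availability, forfeit consistency). The class and field names are hypothetical.

```python
# Hypothetical sketch of the corollary: during a partition, a replica must
# choose between consistency (reject the write) and availability (accept it
# and reconcile after the partition heals).
class Replica:
    def __init__(self, prefer_availability):
        self.prefer_availability = prefer_availability
        self.data = {}
        self.pending = []          # writes accepted while partitioned

    def write(self, key, value, partitioned):
        if partitioned and not self.prefer_availability:
            raise RuntimeError("partitioned: write rejected to preserve consistency")
        self.data[key] = value
        if partitioned:
            self.pending.append((key, value))   # reconcile when the partition heals
```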

  16. Forfeit Partitions (keep Consistency and Availability) • Examples: single-site databases, cluster databases, LDAP, fiefdoms • Traits: 2-phase commit, cache validation protocols, the “inside”

  17. Forfeit Availability (keep Consistency and Partition tolerance) • Examples: distributed databases, distributed locking, majority protocols • Traits: pessimistic locking, make minority partitions unavailable

  18. Forfeit Consistency (keep Availability and Partition tolerance) • Examples: Coda, Web caching, DNS, emissaries • Traits: expirations/leases, conflict resolution, optimistic, the “outside”
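
A minimal sketch of the “expirations/leases” trait, not from the slides: outside the consistency boundary, a reader tolerates data that may be stale, but only up to an agreed TTL. The class name and fields are hypothetical.

```python
import time

# Hypothetical sketch of expiration-based (lease-like) weak consistency:
# the value may be stale, but never older than its TTL.
class ExpiringEntry:
    def __init__(self, value, ttl_seconds):
        self.value = value
        self.expires_at = time.time() + ttl_seconds

    def read(self):
        if time.time() >= self.expires_at:
            return None            # expired: caller must re-fetch from the origin
        return self.value          # possibly stale, but bounded by the TTL
```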

  19. ACID vs. BASE (but it’s a spectrum) • ACID: strong consistency; isolation; focus on “commit”; nested transactions; availability?; conservative (pessimistic); difficult evolution (e.g. schema); “small” Invariant Boundary; the “inside” • BASE: weak consistency, stale data OK; availability first; best effort; approximate answers OK; aggressive (optimistic); “simpler” and faster; easier evolution (XML); “wide” Invariant Boundary; outside the consistency boundary

  20. Consistency Boundary Summary • Can have consistency & availability within a cluster. No partitions within boundary! • OS/Networking better at A than C • Databases better at C than A • Wide-area databases can’t have both • Disconnected clients can’t have both

  21. Three Basic Issues • Where is the state? • Consistency vs. Availability • Communication Boundaries

  22. The Boundary • The interface between two modules • client/server, peers, libraries, etc… • Basic boundary = the procedure call • thread traverses the boundary • two sides are in the same address space • What invariants don’t hold across it? • [Diagram: client C calling server S]

  23. Different Address Spaces • What if the two sides are NOT in the same address space? • IPC or LRPC • Can’t do pass-by-reference (pointers) • Most IPC screws this up: pass by value-result • There are TWO copies of args not one • What if they share some memory? • Can pass pointers, but… • Need synchronization between client/server • Not all pointers can be passed
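
The “two copies of args” point can be shown with a small simulation, not from the slides: under value-result passing, the callee mutates its own copy, so any aliasing the caller relied on is silently broken. The function names are hypothetical.

```python
import copy

# Hypothetical sketch of pass-by-value-result across an IPC boundary:
# the argument is copied in, the server mutates its own copy, and the
# result is copied back out, so there are two copies and pointer
# aliasing (pass-by-reference) no longer works.
def ipc_call(server_fn, arg):
    marshalled = copy.deepcopy(arg)      # copy in (the "value" half)
    server_fn(marshalled)                # server works on its own copy
    return copy.deepcopy(marshalled)     # copy out (the "result" half)

def server_append(items):
    items.append("added-by-server")

caller_list = ["a", "b"]
result = ipc_call(server_append, caller_list)
assert caller_list == ["a", "b"]                 # caller's copy is untouched...
assert result == ["a", "b", "added-by-server"]   # ...only the returned copy changed
```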

  24. Partial Failure • Can the two sides fail independently? • RPC, IPC, LRPC • Can’t be transparent (like RPC) !! • New exceptions (other side gone) • Idempotent calls? • Use Transaction Ids (to solve replay problem) • Reclaim local resources • e.g. kernels leak sockets over time => reboot • RPC tries to hide these issues (but fails) • Use Level 4/7 switches to hide failures?
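
One reading of the “transaction ids” bullet, sketched here with hypothetical names and not taken from the slides: the client tags each request with an id, and the server remembers completed ids so a retry after a lost reply is not applied twice.

```python
import uuid

# Hypothetical sketch of making a non-idempotent call safe to retry after a
# partial failure: replayed transaction ids return the cached reply instead
# of being applied again.
class Server:
    def __init__(self):
        self.balance = 0
        self.completed = {}                   # txn_id -> cached reply

    def deposit(self, txn_id, amount):
        if txn_id in self.completed:          # replay after a lost reply
            return self.completed[txn_id]
        self.balance += amount
        self.completed[txn_id] = self.balance
        return self.balance

server = Server()
txn = str(uuid.uuid4())
server.deposit(txn, 100)
server.deposit(txn, 100)                      # retry: applied only once
assert server.balance == 100
```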

  25. Resource Allocation • How to reclaim resources allocated for client? • Usually timeout exception cleans up • Release locks? (must track them!) • How to avoid long delays while holding resources? • How long to remember client? • Delayed responses (past timeout) must be ignored • Problem with leases: • Great for servers, but… • Client’s lease may expire mid operation • Hard to make client updates atomic with multiple leases (2PC?) • Which things have leases? (can be hidden)
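
A minimal sketch of the lease problem described above, not from the slides and with hypothetical names: the server can reclaim the resource once the lease expires, which is exactly why the client must re-check validity mid-operation.

```python
import time

# Hypothetical sketch of a lease: the grant is time-bounded so the server
# can reclaim resources, and the client must verify the lease is still
# valid (or renew it) before relying on it.
class Lease:
    def __init__(self, duration_seconds):
        self.expires_at = time.time() + duration_seconds

    def valid(self):
        return time.time() < self.expires_at

    def renew(self, duration_seconds):
        if not self.valid():
            raise RuntimeError("lease expired; another client may now hold the resource")
        self.expires_at = time.time() + duration_seconds
```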

  26. Trust the other side? • What if we don’t trust the other side? • Or partial trust (legal contract), or malicious? • Have to check args, no pointer passing • Limited release of information (leaks) • Kernels get this right: • copy/check args • use opaque references (e.g. File Descriptors) • Most systems do not • TCP, Napster, web browsers • Security boundaries tend to be explicit • Holes come from services!
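
The kernel pattern the slide praises (copy/check args, hand back opaque references) can be sketched as follows; this is an illustration with hypothetical names, not code from the talk.

```python
import os

# Hypothetical sketch of the kernel-style boundary: arguments are checked on
# entry, and the caller receives an opaque handle (like a file descriptor),
# never a pointer into the server's own data structures.
class FileServer:
    def __init__(self, root):
        self.root = os.path.abspath(root)
        self.open_files = {}                      # handle -> file object
        self.next_handle = 0

    def open(self, path):
        full = os.path.abspath(os.path.join(self.root, path))
        if not full.startswith(self.root + os.sep):   # check args: no escaping the root
            raise PermissionError("path outside served directory")
        handle = self.next_handle                 # opaque reference returned to caller
        self.next_handle += 1
        self.open_files[handle] = open(full, "rb")
        return handle

    def read(self, handle, size):
        return self.open_files[handle].read(size)
```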

  27. Multiplexing clients? • Does the server have to: • Deal with high concurrency? • Say “no” sometimes (graceful degradation) • Treat clients equally (fairness) • Bill for resources (and have an audit trail) • Isolate clients’ performance, data, … • These all affect the boundary definition

  28. Boundary evolution? • Can the two sides be updated independently? (NO) • The DLL problem... • Boundaries need versions • Negotiation protocol for upgrade? • Promises of backward compatibility? • Affects naming too (version number)
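
The “negotiation protocol for upgrade” bullet admits a very small sketch, not from the slides: each side advertises the interface versions it speaks and they settle on the highest common one before any real calls are made.

```python
# Hypothetical sketch of version negotiation across a boundary: agree on the
# highest mutually supported interface version, or refuse to talk at all.
def negotiate(client_versions, server_versions):
    common = set(client_versions) & set(server_versions)
    if not common:
        raise RuntimeError("no common interface version; cannot talk")
    return max(common)

# e.g. a v3 client against a server that has only been upgraded to v2
assert negotiate({1, 2, 3}, {1, 2}) == 2
```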

  29. Example: protocols vs. APIs • Protocols have been more successful than APIs • Some reasons: • protocols are pass by value • protocols designed for partial failure • not trying to look like local procedure calls • explicit state machine, rather than call/return (this exposes exceptions well) • Protocols still not good at trust, billing, evolution
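
To illustrate the “explicit state machine” point, here is a sketch (hypothetical state names, not from the slides) in which “the other side went away” is an ordinary, named transition rather than a hidden exception behind a call/return interface.

```python
# Hypothetical sketch of a protocol as an explicit state machine: timeouts
# and partial failure are visible states, not surprises.
class ProtocolClient:
    def __init__(self):
        self.state = "IDLE"

    def send_request(self):
        assert self.state == "IDLE"
        self.state = "REQUEST_SENT"

    def on_response(self):
        assert self.state == "REQUEST_SENT"
        self.state = "DONE"

    def on_timeout(self):
        assert self.state == "REQUEST_SENT"
        self.state = "FAILED"        # the other side went away: explicit in the protocol
```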

  30. Example: XML • XML doesn’t solve any of these issues • It is RPC with an extensible type system • It makes evolution better? • two sides need to agree on schema • can ignore stuff you don’t understand • Must agree on meaning, not just tags • Can mislead us to ignore/postpone the real issues
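
The evolution point (“can ignore stuff you don’t understand”, but “must agree on meaning”) can be sketched without any XML machinery; the field names below are hypothetical and the example is not from the talk.

```python
# Hypothetical sketch of schema evolution at a boundary: unknown fields are
# ignored, but the fields both sides agreed on must be present and mean the
# same thing to both.
REQUIRED_FIELDS = {"order_id", "amount"}       # the shared schema both sides agree on

def parse_order(message):
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"peer violates agreed schema, missing: {missing}")
    return {k: v for k, v in message.items() if k in REQUIRED_FIELDS}

order = parse_order({"order_id": 7, "amount": 42, "loyalty_tier": "gold"})
assert order == {"order_id": 7, "amount": 42}   # unknown field silently ignored
```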

  31. Example: services • Claim: you can’t magically convert a class to a service • Behavior depends on the boundaries that callers cross…. • Trusted? Multiplexed? Partial failure? Namespaces? • Shouldn’t TRY to be transparent • Instead: make it easier to state boundary assumptions (and check them)

  32. Annotated Interfaces • IDL can annotate interfaces: • “Timeout” => client may not respond • “Trusted” => no malicious clients • “Malicious” => client may be hostile • “Multiplexed” => many simultaneous callers • “Idempotent” • Etc. • Could be checked statically in some cases, dynamically in others • Annotations + Boundaries => fewer bugs
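
In the absence of an IDL, the same idea can be sketched with decorators; this is an illustrative sketch with hypothetical annotation names, not a proposal from the talk. The annotations are recorded on the function so tools can inspect them, and some can also be checked at call time.

```python
import functools

# Hypothetical sketch of annotated interfaces: decorators record the boundary
# assumptions ("timeout", "idempotent", "multiplexed"); some could be checked
# dynamically when the call is made, others only by offline tools.
def boundary(**annotations):
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args, **kwargs):
            if annotations.get("multiplexed"):
                pass   # a real system might enforce a concurrency limit here
            return fn(*args, **kwargs)
        checked.boundary_annotations = annotations   # visible to tools / static checks
        return checked
    return wrap

@boundary(timeout=True, idempotent=True, multiplexed=True)
def lookup(key):
    return key.upper()

assert lookup.boundary_annotations["idempotent"] is True
```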

  33. Lessons for Applications • Make boundaries very explicit • Not just client/server • Independent systems • Third-party software • Third-party services (RPC to vendors/partners) • Have a few big modules… • Otherwise too many boundaries, and no invariants • Examples: Apache, Oracle, Inktomi search engine • Big Modules are well supported • Big Modules justify their cost (Lampson)

  34. Partial checklist • What is shared? (namespace, schema?) • What kind of state in each boundary? • How would you evolve an API? • Lifetime of references? Expiration impact? • Graceful degradation as modules go down? • External persistent names? • Consistency semantics and boundary?

  35. Conclusions • Most systems are fragile • Root causes: • False transparency: assuming locality, trust, privacy… • Implicitly changing boundaries: consistency, partial failure, … • Some of the causes: • focus on computation, not data • ignoring location distinctions • overestimating the consistency boundary • degraded boundaries (RPC is not a PC) • Invariant Boundaries • Help understanding, documentation • Simplify detection • Simpler and easier than full specifications (but weaker)

  36. ACID vs. BASE • DBMS research is about ACID (mostly) • But we forfeit “C” and “I” for availability, graceful degradation, and performance This tradeoff is fundamental. BASE: • Basically Available • Soft-state • Eventual consistency
