Bigger, Better, Faster, More

jovita
Presentation Transcript


  1. Bigger, Better, Faster, More: An Introduction to Super-Scalability

  2. But first… The Arms Race

  3. 1 ENIAC, 1 Teletype

  4. 1 Mainframe, N Terminals

  5. N Servers, N Terminals

  6. N Servers, N PCs

  7. N Web Servers, N Browsers

  8. N Web Servers, N AJAX Apps

  9. N Clusters, N AJAX Apps

  10. N Clusters, N*M Phones

  11. N Cloudlets, N*M Phones

  12. And So On…

  13. What is Scalability?

  14. Scalability = Ability to do More

  15. More What?

  16. More Processing

  17. Processing Takes Resources

  18. Types of Resources: Network, CPU, Disk, Memory

  19. Types of Utilization: Time / Throughput, Space / Capacity

  20. Types of Utilization: Time / Throughput, Space / Capacity, Complexity, Locking

  21. Resources & Utilization

  22. We Want More! (but how to scale?)

  23. How to Scale: Just make it bigger (vertical scaling)

  24. We Want Even More! (super-scalability)

  25. Scaling Strategies

  26. Bigger (Space)
      Not Super:
      • One big data store
      • One big memory store
      • Make it bigger
      • Make it redundant
      • E.g. Full activity logging
      Partitioning:
      • Sharding / Hashing
      • Growth = Add Partition
      • Tradeoff: Splitting Partitions
      • Tradeoff: Redundancy becomes a distribution problem
      [Diagram labels: A, B, C, …]
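A minimal sketch of the "Sharding / Hashing" idea on slide 26, and of why "Growth = Add Partition" comes with the "Splitting Partitions" tradeoff. The function names `shard_for` and `moved_keys`, the `user-N` keys, and the use of MD5 as the hash are assumptions made for illustration; the slides do not name a hashing scheme.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_keys(keys, old_shards: int, new_shards: int):
    """Return the keys whose shard changes when the partition count changes."""
    return [k for k in keys
            if shard_for(k, old_shards) != shard_for(k, new_shards)]


# Hypothetical keys, used only to show how many records must be relocated.
keys = [f"user-{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{len(moved)} of {len(keys)} keys change shard going from 4 to 5 partitions")
```

With plain modulo hashing, adding one partition relocates a large share of the keys, which is one way to read the "Splitting Partitions" tradeoff. Schemes such as consistent hashing are commonly used to reduce that movement, though the slides do not say which approach the author had in mind.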

  27. Better (Complexity)
      Not Super:
      • Number of objects increases
      • As relations increase, add time or space requirements
      • Common with graph problems
      • E.g. PageRank
      Distribution:
      • Chop up problem / workload
      • Map/Reduce
      • Tradeoff: coordination
      • Tradeoff: network
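The "Chop up problem / workload" and "Map/Reduce" bullets on slide 27 can be sketched with a word count, the customary small example. This is a single-process illustration under assumed names (`map_chunk`, `reduce_counts`, `word_count`); in a real deployment each chunk would go to a separate worker, which is where the coordination and network tradeoffs appear.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk of the input, independently of other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, num_chunks=3):
    # Chop the workload into num_chunks slices (round-robin by line).
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]
    # Each map_chunk call could run on a different machine.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


print(word_count(["a b", "b c", "c c"]))
```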

  28. Faster (Time)
      Not Super:
      • Tune your code
      • Tune your database
      • Tune your network
      • Better hardware
      Optimization:
      • As fast as possible
      • Can’t scale as fast as growth
      • Specialization – ONE thing
      • Caching – reduces work in trade for space
      • Tradeoff: space
      • Tradeoff: coordination

  29. More (Locking)
      Not Super:
      • One at a time
      • Serialized access
      Parallelizing / Estimating:
      • Separate reads & writes
      • Non-locking estimation
      • Reduce contention
      • Tradeoff: space
      • Tradeoff: coordination
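One possible reading of "Non-locking estimation" and "Reduce contention" on slide 29 is a counter split into per-worker slots: each worker writes only its own slot, and readers sum the slots without taking a lock. The class `ShardedCounter` and the helper `run` are assumptions made for this sketch, not something named in the slides.

```python
import threading


class ShardedCounter:
    """Counter split into one slot per writer to avoid a shared lock."""

    def __init__(self, num_slots: int):
        self.slots = [0] * num_slots  # the space tradeoff: one slot per writer

    def increment(self, slot: int) -> None:
        # Only the worker that owns this slot ever writes it, so no lock is taken.
        self.slots[slot] += 1

    def estimate(self) -> int:
        # Lock-free read: may lag slightly while writers are still running.
        return sum(self.slots)


def run(workers: int = 4, per_worker: int = 10_000) -> int:
    counter = ShardedCounter(workers)

    def work(slot: int) -> None:
        for _ in range(per_worker):
            counter.increment(slot)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # After all writers have finished, the estimate is exact.
    return counter.estimate()


print(run())
```

While the writers are active, `estimate()` returns a value that may be a little behind; the total is exact only once they have stopped. That is the estimation the sketch trades for not serializing access.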

  30. But Which Technique(s)?

  31. It Depends!

  32. All: Divide & Conquer
      • Partitions: Data & Processing
      • Sharding
      • Worker Processes
      • Coordination: Distribution & Ordering
      • Queues & Managers
      • Separate Read/Write Access
      • What does this make the system look like?
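The "Worker Processes" and "Queues & Managers" bullets on slide 32 can be sketched with a task queue feeding a small pool of workers. Threads stand in for worker processes here, and the function name `process_all` and the squaring task are placeholders chosen for illustration.

```python
import queue
import threading


def process_all(items, num_workers=3):
    """Distribute items to workers through a queue and collect their results."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: the manager has no more work
                return
            results.put(item * item)  # placeholder for real processing

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    squares = []
    while not results.empty():
        squares.append(results.get())
    # Results arrive in whatever order workers finish, so ordering is restored here.
    return sorted(squares)


print(process_all([1, 2, 3, 4]))
```

The final sort is a small instance of the "Coordination: Distribution & Ordering" bullet: once work is divided, putting results back in order becomes a separate step.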

  33. And now… Some Theory

  34. ACID: reliable transaction systems
      • Atomicity – all or nothing
      • Consistency – always correct
      • Isolation – changesets executed independently
      • Durability – once committed, stays so
      Really hard to scale in one big block (although SSDs + RAM help!)
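A brief sketch of "Atomicity – all or nothing" from slide 34, using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the names `alice` and `bob`, and the `transfer` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# The credit to bob succeeds first, the debit from alice violates the CHECK,
# and the rollback undoes the credit as well.
print(transfer(conn, "alice", "bob", 500), balances(conn))
```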

  35. Maybe It’s Not so Important? (it depends)

  36. BASE is easier
      • Basically Available
      • Soft State
      • Eventual Consistency
      • A node will either eventually get a change or retire
      • Well…still need conflict resolution
      BASE is NOT ACID (get it?)
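Slide 36 notes that eventual consistency still needs conflict resolution. One simple policy, sketched here as an assumption rather than as the presenter's recommendation, is last-write-wins: every value carries a timestamp and the newer one survives a merge. The `merge` function and the sample replica contents are hypothetical.

```python
def merge(local, remote):
    """Last-write-wins merge of two replicas.

    Each replica maps key -> (timestamp, value). On equal timestamps the
    local copy is kept, so a real system would need a proper tie-breaker.
    """
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two replicas that accepted different writes while out of contact.
replica_1 = {"cart": (1, ["book"]), "name": (5, "Ann")}
replica_2 = {"cart": (3, ["book", "pen"]), "name": (2, "Anne")}

a = merge(replica_1, replica_2)
b = merge(replica_2, replica_1)
print(a == b, a)
```

When timestamps differ, both replicas reach the same state whichever side initiates the merge, which is the "eventually" in eventual consistency. The cost is that the older write is silently discarded.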

  37. Can we have a Balanced pH?

  38. CAP Theorem
      • Choose TWO:
      • Consistency
      • Availability
      • Partition tolerance
      [Diagram: Manager, Replica 1, Replica 2, Client 1, Client 2; callout "Double Outage!" shown twice]

  39. Designing a scalable system

  40. It Depends!

  41. Understand Your Scale Points
      • Log
      • Profile
      • Tune
      • Test
      • Divide
      • Compare
      • Partition
      • No, really, log a lot

  42. Fallacies of Distributed Computing
      • The network is reliable.
      • Latency is zero.
      • Bandwidth is infinite.
      • The network is secure.
      • Topology doesn't change.
      • There is one administrator.
      • Transport cost is zero.
      • The network is homogeneous.

  43. Some “Scaly” Tools

  44. CQRS Pattern
      • Separate operations for:
      • Command – perform an action
      • Query – returns data about state
      • Promotes simpler programs
      • Allows Command Queues
      • Reduces locking
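A small sketch of the command/query split on slide 44, including a command queue. The `Inventory` class, its method names, and the `add`/`remove` commands are invented for illustration and do not come from the presentation.

```python
from collections import deque


class Inventory:
    """Commands change state and return nothing; queries return data and change nothing."""

    def __init__(self):
        self._stock = {}
        self._commands = deque()  # the command queue

    # Command side: record the intent, apply it later.
    def submit(self, command: str, item: str, qty: int) -> None:
        self._commands.append((command, item, qty))

    def process_commands(self) -> None:
        # A single consumer applies commands in order, so writers never contend.
        while self._commands:
            command, item, qty = self._commands.popleft()
            current = self._stock.get(item, 0)
            if command == "add":
                self._stock[item] = current + qty
            elif command == "remove":
                self._stock[item] = max(0, current - qty)

    # Query side: read-only.
    def stock_of(self, item: str) -> int:
        return self._stock.get(item, 0)


inv = Inventory()
inv.submit("add", "widget", 5)
print(inv.stock_of("widget"))  # still 0: the command is queued, not yet applied
inv.process_commands()
print(inv.stock_of("widget"))  # 5
```

Queries see the state as of the last processed command, so reads can lag behind submitted writes. That lag is the price of letting a single consumer apply changes without locks.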

  45. A Scaly Stack

  46. Infrastructure as a Service

  47. Platform as a Service

  48. Application as a Service
      • Salesforce?
      • (Also sort of a platform)
      • Whateva!

  49. Cassandra: An Example
