Horizontal Scaling and Coordination
Jeff Chase, Duke University
Growth and scale: the Internet
How to handle all those client requests raining on your server?
Servers Under Stress
[Figure: response time and response rate (throughput) versus load; ideal behavior up to saturation, then overload, thrashing, collapse]
[Figure axes: response rate (throughput) versus request arrival rate (offered load); response time versus load (concurrent requests, or arrival rate)]
Add servers or “bricks” for scale and robustness.
Issues: state storage, server selection, request routing, etc.
SQL query API
SQL: Structured Query Language
Caches can help if much of the workload is simple reads.
Multi-core server scaling, MxN communication, replacement, consistency
[Figure: log-log popularity plot. x: log rank; y: log share of accesses (or log $$$)]
It turns out this matters.
With Zipf power-law popularity distributions, the best possible (ideal) hit rate of a cache is logarithmic in its size.
…and logarithmic in the population served.
The hit rate also depends on how frequently objects are updated at their source.
Intuition. The “head” (most popular objects) is cached easily. After that: diminishing benefits. The “tail” is effectively random.
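The diminishing-returns intuition can be checked numerically. The sketch below is an illustration under stated assumptions, not a measurement: it assumes a static universe of 100,000 objects, a pure Zipf exponent of 1.0, and an ideal cache that always holds the most popular objects. The names `zipf_probs` and `ideal_hit_rate` are hypothetical.

```python
def zipf_probs(n, alpha):
    """Access probability of each object by rank (rank 1 is most popular)."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_rate(probs, cache_size):
    """Hit rate of a cache that holds exactly the cache_size most popular objects."""
    return sum(probs[:cache_size])


# Assumed parameters, chosen only for illustration.
probs = zipf_probs(100_000, 1.0)
rates = {size: ideal_hit_rate(probs, size) for size in (10, 100, 1000, 10000)}
for size, rate in rates.items():
    print(f"cache size {size:>6}: ideal hit rate {rate:.3f}")
```

Under these assumptions each tenfold increase in cache size adds roughly the same increment to the hit rate, which is what "logarithmic in its size" means in practice.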
Approximates a sum over a universe of n objects...
...of the probability of access to each object x...
…times the probability x was accessed since its last change.
C is just a normalizing constant for the Zipf-like popularity distribution, which must sum to 1. C is not to be confused with CN.
0 < α < 1
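The annotations above describe a sum of (probability of access to x) times (probability x was accessed since its last change). One way to make that concrete is sketched below. It is a discrete version under assumptions that are not stated in the slides: requests to each object and changes to it are independent Poisson processes, `request_rate` is the aggregate request rate of the whole population, and `change_rate` is the per-object update rate. All function and parameter names are hypothetical.

```python
def hit_rate(n, alpha, request_rate, change_rate):
    """Sum over objects of p(x) * P(x was accessed since its last change)."""
    weights = [x ** -alpha for x in range(1, n + 1)]
    c = sum(weights)  # normalizing constant C, so that the p(x) sum to 1
    total = 0.0
    for w in weights:
        p = w / c                       # probability a request is for this object
        object_rate = request_rate * p  # request rate seen by this object
        # With two competing Poisson processes, the most recent event before
        # a request was an access (not a change) with this probability.
        fresh = object_rate / (object_rate + change_rate)
        total += p * fresh
    return total


# Assumed values: 10,000 objects, alpha = 0.8, one change per object per unit time.
for population_rate in (10.0, 100.0, 1000.0, 10000.0):
    print(population_rate, round(hit_rate(10_000, 0.8, population_rate, 1.0), 3))
```

In this sketch a larger population (higher `request_rate`) raises the hit rate, and a higher `change_rate` lowers it, matching the dependence on population served and on update frequency.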
Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the WWW. Karger, Lehman, Leighton, Panigrahy, Levine, Lewin. ACM STOC, 1997. 1000+ citations
Distributed hash table
node IP address
[image from Morris, Stoica, Shenker, etc.]
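A minimal sketch of the consistent hashing idea follows: node addresses and keys are hashed onto the same ring, and each key belongs to the first node at or after its position. The class `HashRing`, the helper `ring_position`, the choice of SHA-1, and the IP addresses are assumptions made for illustration; real systems typically add virtual nodes and replication, which are omitted here.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string (node IP address or object key) to a point on the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        bisect.insort(self._points, (ring_position(node), node))

    def remove(self, node: str) -> None:
        self._points.remove((ring_position(node), node))

    def lookup(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._points, (ring_position(key), ""))
        return self._points[i % len(self._points)][1]  # wrap around the ring


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("10.0.0.5")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The point of the design is visible in the usage lines: when a node joins, the only keys that change owner are those that now map to the new node, so the rest of the cached state stays where it is.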
Each P proposes a value to the others.
All nonfaulty P agree on a value in a bounded time.
Coulouris and Dollimore
A network partition is any event that blocks all message traffic between subsets of nodes.
C-A-P choose two
CA: available, and consistent, unless there is a partition.
CP: always consistent, even in a partition, but a reachable replica may deny service if it is unable to agree with the others (e.g., quorum).
AP: a reachable replica provides service even in a partition, but may be inconsistent.
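The CP and AP choices can be contrasted with a small sketch. It assumes a fixed replica group and a majority quorum rule, and it counts the replica itself among the reachable ones; the function names are hypothetical and no real replication protocol is implied.

```python
def majority(total_replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return total_replicas // 2 + 1


def cp_can_serve(reachable: int, total_replicas: int) -> bool:
    """CP: serve only if this side of the partition can assemble a quorum."""
    return reachable >= majority(total_replicas)


def ap_can_serve(reachable: int, total_replicas: int) -> bool:
    """AP: any reachable replica serves, at the risk of inconsistency."""
    return reachable >= 1


# Assumed scenario: five replicas split 3/2 by a partition.
for side in (3, 2):
    print(f"side with {side} replicas: CP serves={cp_can_serve(side, 5)}, "
          f"AP serves={ap_can_serve(side, 5)}")
```

Because two disjoint sides cannot both hold a majority, at most one side serves under the CP rule, which is why consistency survives the partition while the minority side denies service.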
Butler Lampson is a Technical Fellow at Microsoft Corporation and an Adjunct Professor at MIT… He was one of the designers of the SDS 940 time-sharing system, the Alto personal distributed computing system, the Xerox 9700 laser printer, two-phase commit protocols, the Autonet LAN, the SPKI system for network security, the Microsoft Tablet PC software, the Microsoft Palladium high-assurance stack, and several programming languages. He received the ACM Software Systems Award in 1984 for his work on the Alto, the IEEE Computer Pioneer award in 1996 and von Neumann Medal in 2001, the Turing Award in 1992, and the NAE’s Draper Prize in 2004.
Wait for majority
Wait for majority
“Can I lead b?”
Nodes may compete to serve as leader, and may interrupt one another’s rounds. It can take many rounds to reach consensus.
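The round structure can be sketched as follows. This is a simplified, in-process illustration in the style of a Paxos-like protocol, and that mapping is itself an assumption: "Can I lead b?" is treated as a request to lead ballot b, acceptors refuse ballots older than one they have already promised, and a leader proceeds only after hearing from a majority. There is no network, no failure handling, and no retry logic; `Acceptor` and `run_round` are hypothetical names.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, value) most recently accepted, if any

    def can_i_lead(self, ballot):
        """Answer 'Can I lead b?': promise only if b is newer than any promise."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Accept a value unless a newer ballot has been promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def run_round(acceptors, ballot, my_value):
    """One leader round; returns the agreed value, or None if the round fails."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.can_i_lead(ballot) for a in acceptors]
    promises = [prior for ok, prior in replies if ok]
    if len(promises) < quorum:  # wait for majority: not reached
        return None
    prior = [p for p in promises if p is not None]
    # A leader must adopt the value of the highest earlier ballot, if one exists.
    value = max(prior, key=lambda p: p[0])[1] if prior else my_value
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= quorum else None  # wait for majority again


acceptors = [Acceptor() for _ in range(3)]
first = run_round(acceptors, ballot=1, my_value="A")
second = run_round(acceptors, ballot=2, my_value="B")
stale = run_round(acceptors, ballot=1, my_value="C")
print(first, second, stale)
```

In this sketch a later leader with a higher ballot cannot overturn a value that a majority already accepted, and a leader using a stale ballot is refused, which is how competing leaders can interrupt one another's rounds without breaking agreement.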
Similar: Hadoop HDFS