(The Case for) Methodology Research
Indranil Gupta
March 7, 2006
CS598IG Fall 2006
Big Picture
Distributed systems with large numbers of processes (Grid, P2P systems, Web, …)
…require scalable and reliable distributed protocols inside (Multicast, Replication, Voting, …)
Efforts to understand existing systems, and design simple, effective systems.
For any "project" or "problem", design (i) a solution, (ii) a methodology underlying the design of the solution(s), and (iii) (optionally) a tie between this methodology and at least one other methodology.
Protocol Design Methodology =
An organized, documented set of building blocks, rules and/or guidelines for design of a class of protocols, possibly amenable to automated code generation.
[adapted from FOLDOC]
Marshall McLuhan: “Technology is an extension of our natural faculties”.
Bill Gates: “Automation of any activity will magnify both its efficiencies and inefficiencies”.
x = fraction of receptives, y = fraction of stashers, z = fraction of averse processes
“Endemic Protocol” for Migratory Replication
Convergence Complexity: typically exponentially fast
Set of stashers changes every 40.6 s (on average)
No long horizontal lines
No vertical stripes
No temporal or hostid-wise correlation of stasher set
Endemic Protocol under Massive Failures: 50% of computers in this 100,000-computer system fail at time t=5000 s.
The file does not disappear.
Endemic Protocol under Churn: Even under 25% churn (injected throughout), the file does not disappear.
File Flux Rate (system-wide): Number of transfers of a given file per protocol period. It is low, at 1-2 per second.
e.g., “Voting” on good and bad replicas of a file, as required in digital libraries
All One-Time-Sampling Actions
f(X) may be
Not Completely Partitionable…
e.g., LV equations:
Automatic Code Generation
D[x] = 0.3*x^2*z^2 - 0.3*x^2*y^2
D[y] = 0.3*y^2*z^2 - 0.3*x^2*y^2
D[z] = -0.3*x^2*z^2 - 0.3*y^2*z^2 + 0.3*x^2*y^2 + 0.3*x^2*y^2
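As a rough check on the three equations above, the following C sketch integrates them with a forward-Euler loop. The struct name `fractions`, the functions `derivative` and `integrate`, and the choice of Euler stepping are illustrative assumptions; none of them come from the slides or from the generated code. Because the right-hand sides cancel term by term, x + y + z should stay constant up to floating-point rounding.

```c
#include <assert.h>

/* Fractions of processes in each of the three states. */
struct fractions { double x, y, z; };

/* Right-hand sides, copied term by term from the D[x], D[y], D[z] lines. */
struct fractions derivative(struct fractions s)
{
    const double c = 0.3;
    double xz = c * s.x * s.x * s.z * s.z;   /* 0.3*x^2*z^2 */
    double yz = c * s.y * s.y * s.z * s.z;   /* 0.3*y^2*z^2 */
    double xy = c * s.x * s.x * s.y * s.y;   /* 0.3*x^2*y^2 */
    struct fractions d;
    d.x = xz - xy;
    d.y = yz - xy;
    d.z = -xz - yz + xy + xy;
    return d;
}

/* Forward Euler (an assumed integrator): take `steps` steps of size `dt`. */
struct fractions integrate(struct fractions s, double dt, int steps)
{
    assert(dt > 0.0 && steps >= 0);
    for (int i = 0; i < steps; i++) {
        struct fractions d = derivative(s);
        s.x += dt * d.x;
        s.y += dt * d.y;
        s.z += dt * d.z;
    }
    return s;
}
```

Calling `integrate` from a starting point such as x = 0.5, y = 0.3, z = 0.2 with a small `dt` gives a quick way to inspect the trajectory of the equations before any protocol code is generated from them.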
C code over
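void schedule_timer_event (int nodeid,
                           struct pp_payload* payload)
{
    int curr_term, to_state, prev_state;
    /* curr_state and num_states are not declared in this fragment;
       assumed to be an int* and an int declared elsewhere. */
    curr_state = get_state();
    prev_state = *curr_state;
    /* Do nothing if the payload's state differs from the current state. */
    if (*curr_state != payload->state) return;
    curr_term = payload->term;
    if (*curr_state == ST_x && curr_term == 0) {
        int *states, *exponents;
        num_states = 2;
        states = (int*)malloc(num_states*sizeof(int));
        exponents = (int*)malloc(num_states*sizeof(int));
        /* Array indices were lost in the fragment; [0] and [1] are assumed. */
        states[0] = ST_y; states[1] = ST_x;
        exponents[0] = 1; exponents[1] = 0;
        ots (ST_z, 0.5, num_states, /* remaining arguments cut off in the source */
    }
    if (*curr_state == ST_y && curr_term == 0) {
        /* body not shown in the source */
    }
}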
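The slides do not define `ots`, so the following is only a guess at what a one-time-sampling action might look like, written to make the fragment above easier to read. Everything here is hypothetical: the function name `ots_step`, its parameter order, the `peers` array standing in for the rest of the system, and the rule itself (flip a coin with probability `prob`, sample `exponents[i]` random peers for each listed state, and move to `to_state` only if every sampled peer is in the required state).

```c
#include <stdlib.h>

enum state { ST_x, ST_y, ST_z };

/* Uniform random double in [0, 1). */
double uniform01(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/*
 * Hypothetical one-time-sampling step for a single process (assumed
 * semantics, not taken from the slides).
 *   current          : state of the process taking the step
 *   peers, num_peers : states of the processes that can be sampled
 *   to_state         : state to move to if the action fires
 *   prob             : coin-flip probability for attempting the action
 *   states/exponents : for each i, sample exponents[i] peers and require
 *                      each sampled peer to be in states[i]
 * Returns the new state, or `current` if the action does not fire.
 */
enum state ots_step(enum state current,
                    const enum state *peers, int num_peers,
                    enum state to_state, double prob,
                    int num_states,
                    const enum state *states, const int *exponents)
{
    if (num_peers <= 0 || uniform01() >= prob)
        return current;
    for (int i = 0; i < num_states; i++) {
        for (int k = 0; k < exponents[i]; k++) {
            enum state sampled = peers[rand() % num_peers];
            if (sampled != states[i])
                return current;   /* one mismatch cancels the action */
        }
    }
    return to_state;
}
```

Under these assumptions, the values set up in the fragment (states ST_y and ST_x with exponents 1 and 0, target ST_z, probability 0.5) would mean: with probability 0.5, sample one peer, and move to ST_z if that peer is in ST_y.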
(very very (very) brief)
Ease of Protocol Specification: A protocol designer no longer has to write a C/C++/Java program several thousand lines long to design a new system. Design is a matter of writing only a few rules.
Formal Verification: Any such declarative design can potentially be run through specially-built verification engines that find bugs in the design, or better still, analyze the scalability and fault-tolerance of the protocol.
On-line distributed debugging: Because execution history can be exported as a set of relational tables, distributed debugging of a deployed distributed system can be achieved by writing the appropriate P2 rules.
Yet another language
[Distributed Protocols Research Group, UIUC] http://www-faculty.cs.uiuc.edu/~indy/rsrch.htm
(to be continued) Overlays [B.-T. Loo et al]