
(The Case for) Methodology Research

Indranil Gupta

March 7, 2006

CS598IG Fall 2006


Big Picture

  • Distributed systems with large numbers of processes…

    • Grid, P2P systems, Web, …

  • …require scalable and reliable distributed protocols inside

    • Multicast, Replication, Voting, …

  • Researchers design protocols to optimize message and time complexity, reliability, process overheads, etc.

  • However, the only assistance for this design comes from research literature and experience. This is a laborious, almost “seat of the pants” approach.

  • Leads to complex system internals, e.g., credit-card systems [Spec03], information systems [CRA], the Grid, the Internet,…

    Efforts to understand existing systems, and design simple, effective systems.


More ailments…

  • The research community generates thousands of ideas every month. How many of these are used? Reused? Preserved? When projects finish, papers go into archives. No reuse may lead to reinvention of the wheel.

  • Today, there is minimal reuse of ideas from one research project in another

    • exceptions: use of modified programming languages such as Cyclone (a variant of C) by the Security community

  • This “barrier” is likely because of the inherent requirement that research projects maximize the percentage of “unique” contributions

    • in the above example, the new PL was not known in the security community, hence it worked.


More Ailments…(2)

  • A different kind of gap is the one between theory and systems.

  • Other fields of science have already developed methodologies

    • Synthesis in hardware design. [Ambrosio et al, Bluespec]

    • Design patterns

    • Methodologies are needed for maturity in a field of science.


How do you Attack these Problems?

Design Methodologies

  • Simple Thesis

    For any "project" or "problem": (i) design a solution, (ii) design a methodology underlying the design of the solution(s), and (iii) (optional) tie this methodology to at least one other methodology.

    • Calls for a new layer of Methodology Research

    • Does not solve the mentioned problems, but attacks them

    • Is more powerful than meets the eye


Methodology: Definition

Protocol Design Methodology =

An organized, documented set of building blocks, rules and/or guidelines for design of a class of protocols, possibly amenable to automated code generation.

[adapted from FOLDOC]


Advantages of Methodologies

Composable Methodologies

  • “Archival” of ideas and results.

  • Systematic reuse of ideas and results.

  • Help designer systematically design new protocols with provable properties.

  • Theoreticians and Practitioners

    • Methodologies are understood by both theoreticians and practitioners. E.g., Classes of survivable storage archs. [Wylie et al] and ones on [Probabilistic I/O automata].

    • “Composability” is a term as familiar to both sides as “fault-tolerance” and “scalability” (with slightly varying interpretations)

    • Methodologies also allow both theoreticians and practitioners to apply their solutions more “generally” and to exchange ideas in a systematic manner.

Innovative Methodologies


Advantages (2)

  • Systematic Generalization of an Approach

    • in a sense, a methodology captures the mode of thinking of the designer (without a psychological examination or MRI).

  • Systematic tie-in with existing systems

  • Shorten Life-span of research projects.

  • These advantages are especially evident after a methodology has been discovered

    • “If only I had realized there was an underlying design methodology, I might have designed these protocols much quicker”


Is Methodology Research New?

  • No! It’s been going on for decades (see papers in this session). The goal is to recognize this, encourage it and bring it to the surface.

  • How is a methodology different from

    • a design philosophy (e.g., end to end principle or localized algorithms)?

      • a philosophy is more generally applicable and is a frame of mind for the designer. Methodologies have the power to build entire solutions, and already have multiple philosophies inherently embedded in them. Methodologies are closer to building the actual protocols and the system.

    • a protocol family or a framework or a paradigm? It's the same; methodology research encourages the development of these for specific problem areas.


Questions

  • Should there be an overarching methodology?

    • Probably not, too expansive and raises too much contention. Allow order to emerge.

  • Should there be standard ways to express methodologies?

    • Not yet. Over time, disparate methodologies may merge. Standardization should be emergent through integration.


Taxonomy of Methodologies

  • Inherent Nature

    • Innovative Methodologies: open the opportunity to create completely novel protocols

    • Composable Methodologies: building blocks and composition rules

    • We will see examples of both of these today

  • Expression

    • Formal Rules (restricted, but rigorous) – either formal rules or a high level code generation PL

    • Informal (guidelines, larger # of interpretations)

  • Discovery of Methodologies

    • Retroactive: for existing systems

    • Progressive: for novel protocols (e.g., innovative M’s)

    • Auxiliary


Methodologies: Challenges and Potentials

  • Are there systematic protocol design methodologies?

  • Can we automate part of protocol design?

     Marshall McLuhan: “Technology is an extension of our natural facilities”.

     Bill Gates: “Automation of any activity will magnify both its efficiencies and inefficiencies”.


1. Innovative M.: Probabilistic Protocols

  • How does one assist the innovative process of design?

  • Scientific disciplines use differential equations to represent ideas, results, and phenomena

    • Biology, Physics, Chemistry, Electrical Engg., Economics, Sociology..

    • Many phenomena here are scalable and reliable

  • Methodologies to translate differential equations into protocols.

  • Potential to innovate protocols that inherit scale and reliability of original equations.

  • We give rigorous design methodologies for this

  • We show how to design practical protocols for real applications


Related Work+Model

  • Differential Equations used to study algorithms for independent vertex sets [Worm.95], 3-SAT [Achl.01], load balancing [Mitz.01]

    • Our focus is the opposite direction: converting differential equations into distributed protocols

  • Distributed Computing with infinite number of processes, and relation to very large groups: [Kur.81, Mer.00, Mitz.01] – We analyze infinite groups

  • We assume an asynchronous system with no clock drift

  • [FLP85], Randomized protocols [Motwani text], Probabilistic I/O Automata [Lyn.97, Wu97]


A Working Example

  • Endemic Diseases: e.g., Flu, Measles [in static populations]

    x= fraction of receptives, y=stashers, z=averse

  • translate into Migratory Replication

    • E.g., Persistent Distributed Storage of Files.

      • [R. Anderson] “Where a file once inserted, can never be deleted, even by a gun at your wife’s head”.

    • E.g., Migrating leader committee membership, e.g., for multicast buffering


Mapping

  • Differential Eqn. → State Machine

  • Map

    • Each Variable to a state

    • Each Term to an Action

[Figure: “Endemic Protocol” for Migratory Replication, a state machine over the states x, y, z with a Flipping Action and a One-Time-Sampling Action]
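
The two action types named on this slide can be sketched as per-process step functions. The slides do not define the actions, so the semantics here are an assumption drawn from their names and from the ots(...) call in the generated C snippet later in the deck: a Flipping action is a local coin toss, and a One-Time-Sampling action samples a few peers once and changes state only if every sampled peer is in the required state and a coin toss succeeds. The function names and parameters are hypothetical.

```python
import random


def flipping(state, from_state, to_state, prob, rng):
    """Assumed Flipping action: a process in from_state moves to to_state
    with probability prob; processes in other states are unaffected."""
    if state == from_state and rng.random() < prob:
        return to_state
    return state


def one_time_sampling(state, from_state, to_state, prob, required, peers, rng):
    """Assumed One-Time-Sampling action: sample one peer state per entry of
    `required` (uniformly, with replacement) and move to to_state only if
    every sample matches and a coin toss with probability prob succeeds."""
    if state != from_state:
        return state
    sampled = [rng.choice(peers) for _ in required]
    all_match = all(got == want for got, want in zip(sampled, required))
    if all_match and rng.random() < prob:
        return to_state
    return state


rng = random.Random(0)  # fixed seed so runs are repeatable
# A process in state x that must see one peer in state y before moving to z.
next_state = one_time_sampling("x", "x", "z", 0.5, ["y"], ["y", "z", "x"], rng)
```

Under this reading, a term with a single variable maps to a Flipping action, while a product term maps to a One-Time-Sampling action whose `required` list holds the other factors of the term.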


Analysis

  • System analysis through Phase Portraits

    • Behavior starting from different initial points

  • Differential Equation (hence system) has a trivial and a non-trivial equilibrium point.

    • The trivial point is a saddle point.

    • The non-trivial point is a stable point.

Convergence Complexity: typically exponentially fast


Performance

In Practice -- Untraceability

Set of stashers changes every 40.6 s (on average)

No long horizontal lines

No vertical stripes

No temporal or hostid-wise correlation of stasher set


Performance

In Practice -- Effect of Failures

Endemic Protocol under Massive Failures: 50% of computers in this 100,000-computer system fail at time t=5000 s.

The file does not disappear.


Performance

In Practice -- Effect of Failures

Endemic Protocol under Churn: Even under 25% churn (injected throughout), the file does not disappear.


Performance

In Practice -- Network Traffic

File Flux Rate (system-wide): Number of transfers of a given file per protocol period. Low, at 1-2 per second.


A Brief Second Example

  • Lotka-Volterra Model of Competition

    x=#rabbits, y=#sheep

  • “Two species competing for the same resource typically cannot co-exist”.


“LV Protocol” for Majority Selection

e.g., “Voting” on good and bad replicas of a file, required in digital libraries [LOCKSS 03]

All One-Time-Sampling Actions


Phase Portrait of the LV Protocol

  • Four equilibrium points

  • X=Y=0: unstable point

  • X=N and Y=N: stable points

  • X=Y=Z(=N/3): saddle point

  • Initial points with X<Y converge to Y=N

  • Initial points with X>Y converge to X=N

  • Initial points with X=Y converge to X=Y=Z(=N/3) (this last case is disturbed by small perturbations)
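
The convergence claims above can be explored numerically. The sketch below applies forward Euler integration to the three-variable system exactly as it is written on the Automatic Code Generation slide, using population fractions x = X/N, y = Y/N, z = Z/N. The step size, step count, starting point and function names are assumptions chosen for illustration.

```python
def lv_rhs(x, y, z, c=0.3):
    """Right-hand sides of the system as written on the code-generation slide."""
    dx = c * x**2 * z**2 - c * x**2 * y**2
    dy = c * y**2 * z**2 - c * x**2 * y**2
    dz = -c * x**2 * z**2 - c * y**2 * z**2 + 2 * c * x**2 * y**2
    return dx, dy, dz


def trajectory(x, y, z, dt=0.1, steps=2000):
    """Forward Euler integration; returns every visited (x, y, z) point."""
    points = [(x, y, z)]
    for _ in range(steps):
        dx, dy, dz = lv_rhs(x, y, z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        points.append((x, y, z))
    return points


# Start with x < y; the fractions sum to 1.
path = trajectory(0.3, 0.4, 0.3)
x_end, y_end, z_end = path[-1]
```

For this system, d(y - x)/dt = 0.3 z^2 (y^2 - x^2), so a start point with x < y keeps x < y and the gap only widens, which agrees with the slide's statement that such points head toward Y=N. Plotting `path` for several start points gives a rough phase portrait.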


A Level Up

  • Methodology

    f(X) may be

    • Complete: all right-hand sides sum to zero, and have

    • Completely Partitionable: (a) complete and (b) negative and positive terms are matched

    • Polynomial: all terms polynomials

    • Restricted polynomial: (a) polynomial and (b) for each x in X, each negative term in the equation for x contains x as a factor

e.g.,
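
These properties can be checked mechanically once a system is written as lists of signed terms. The sketch below encodes the three-variable system from the Automatic Code Generation slide and tests it. The data layout and function names are hypothetical, and "matched" is interpreted here as a one-to-one pairing of each negative term with a positive term that has the same monomial and magnitude, which is an assumption about the intended definition.

```python
from collections import Counter, defaultdict

# Each variable maps to its right-hand side: a list of (coefficient, monomial)
# terms, where a monomial is a tuple of (variable, exponent) pairs.
SYSTEM = {
    "x": [(0.3, (("x", 2), ("z", 2))), (-0.3, (("x", 2), ("y", 2)))],
    "y": [(0.3, (("y", 2), ("z", 2))), (-0.3, (("x", 2), ("y", 2)))],
    "z": [(-0.3, (("x", 2), ("z", 2))), (-0.3, (("y", 2), ("z", 2))),
          (0.3, (("x", 2), ("y", 2))), (0.3, (("x", 2), ("y", 2)))],
}


def is_complete(system, tol=1e-12):
    """All right-hand sides sum to zero, monomial by monomial."""
    totals = defaultdict(float)
    for terms in system.values():
        for coeff, monomial in terms:
            totals[monomial] += coeff
    return all(abs(total) <= tol for total in totals.values())


def is_completely_partitionable(system):
    """Complete, and negative terms pair off one-to-one with positive terms."""
    positives, negatives = Counter(), Counter()
    for terms in system.values():
        for coeff, monomial in terms:
            bucket = positives if coeff > 0 else negatives
            bucket[(abs(coeff), monomial)] += 1
    return is_complete(system) and positives == negatives


def is_restricted_polynomial(system):
    """Every negative term in the equation for v contains v as a factor."""
    for var, terms in system.items():
        for coeff, monomial in terms:
            if coeff < 0 and dict(monomial).get(var, 0) < 1:
                return False
    return True


checks = (is_complete(SYSTEM),
          is_completely_partitionable(SYSTEM),
          is_restricted_polynomial(SYSTEM))
```

Every term here is a monomial by construction, so the polynomial property holds trivially for anything expressible in this layout.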


Methodology - I

  • Theorem: Flipping and One-Time-Sampling suffice to map a differential equation system that is completely partitionable and restricted polynomial into an equivalent protocol.

    • E.g.,

  • This class includes many interesting and useful processes, e.g., endemic replication, majority selection protocol and epidemic multicast


Methodology - II

Not Completely Partitionable…

  • Equation Rewriting into equivalent forms

    • To rewrite as complete equations, introduce new variable z and set to

    • Rewrite equation to have

    • Massage Terms to be completely partitionable

      e.g., LV equations:
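
One way to read the first rewriting step, consistent with the definition of a complete system: add a variable z whose derivative cancels the sum of the other right-hand sides. The slide leaves the exact formula unstated, so the construction below is an assumption, and the symbols f_i and k are introduced only for illustration.

```latex
\begin{align*}
\dot{x}_i &= f_i(X), \quad i = 1, \dots, k
  && \text{(original system)} \\
\dot{z} &= -\sum_{i=1}^{k} f_i(X)
  && \text{(new variable)} \\
\dot{z} + \sum_{i=1}^{k} \dot{x}_i &= 0
  && \text{(all right-hand sides now sum to zero)}
\end{align*}
```

When the variables are population fractions that initially sum to 1, this choice keeps z = 1 - (x_1 + ... + x_k) at all times.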


Other Protocols from this Methodology

  • Rabbits and Sheep (LV Model) → Voting protocol

    • Distributed digital libraries

  • Bees (D’Silva Model) → Adaptive Grid Computing

    • Grids and clusters

  • Spread of Epidemics and Rumors → Epidemic protocols (retroactive!)

    • Used in Kelips


Summary

  • Methodologies for mapping Differential Equation systems into equivalent Distributed Protocols

    • Flipping and One-Time-Sampling Actions for restricted polynomial equation systems

    • Equation Rewriting

  • Generated Protocols: Endemics for migratory replication, LV protocol for majority selection, epidemic multicast

    • Folklore file system based on endemic protocol

  • Many more details in PODC paper.


Summary (2)


Future/Ongoing Work

  • Equation Rewriting techniques, e.g., Is complete == completely partitionable?

  • Mapping Equations with implicit t variable, or no t variable

    • Do methodologies for these other differential equation types make sense?

  • Building file and web caching systems using these protocols


Automatic Code Generation

Differential equations:

D[x] = 0.3*x^2*z^2 - 0.3*x^2*y^2

D[y] = 0.3*y^2*z^2 - 0.3*x^2*y^2

D[z] = -0.3*x^2*z^2 - 0.3*y^2*z^2 + 0.3*x^2*y^2 + 0.3*x^2*y^2

C code over Berkeley sockets:

void schedule_timer_event (int nodeid, struct pp_payload* payload)
{
    int curr_term, to_state, prev_state;
    int* curr_state;
    float p;

    curr_state = get_state();
    prev_state = *curr_state;

    /* Skip the event if the payload's state does not match the node's current state. */
    if (*curr_state != payload->state) return;

    curr_term = payload->term;

    /* State x, term 0: call ots (presumably the One-Time-Sampling action) with ST_z and 0.5. */
    if (*curr_state == ST_x && curr_term == 0)
    {
        int num_states;
        int *states, *exponents;

        num_states = 2;
        states = (int*)malloc(num_states*sizeof(int));
        exponents = (int*)malloc(num_states*sizeof(int));

        states[0] = ST_y; states[1] = ST_x;
        exponents[0] = 1; exponents[1] = 0;

        ots (ST_z, 0.5, num_states, states, exponents);
    }

    if (*curr_state == ST_y && curr_term == 0)
    {
        /* remainder of the snippet is not shown on the slide */
    }
}

[Figure: DIFFGEN Toolkit. Labels on the differential system: equation variable, positive terms, negative terms, differential equation constant, matched terms, and each equation term's variables and exponents. Label on the C code: schedule_timer_event snippet from fixedclient.c.]
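
As a rough illustration of the first step such a toolkit needs, the sketch below parses equations in the format shown on this slide into signed terms with per-variable exponents. It is not the DIFFGEN parser: the function name, the regular expressions and the output layout are all hypothetical, and it handles only the `coefficient*var^exp*...` shape used on the slide.

```python
import re

# One term: optional sign, a decimal coefficient, then zero or more *var^exp factors.
TERM_RE = re.compile(r"([+-]?)\s*(\d+(?:\.\d+)?)((?:\s*\*\s*[A-Za-z]\w*\^\d+)*)")
FACTOR_RE = re.compile(r"([A-Za-z]\w*)\^(\d+)")


def parse_equation(line):
    """Parse 'D[v] = c*a^i*b^j - ...' into (v, [(signed_coeff, {var: exp}), ...])."""
    lhs, rhs = line.split("=", 1)
    head = re.fullmatch(r"\s*D\[(\w+)\]\s*", lhs)
    if head is None:
        raise ValueError("left-hand side must look like D[name]")
    terms = []
    for sign, coeff, factors in TERM_RE.findall(rhs):
        value = float(coeff) * (-1.0 if sign == "-" else 1.0)
        exponents = {name: int(exp) for name, exp in FACTOR_RE.findall(factors)}
        terms.append((value, exponents))
    return head.group(1), terms


variable, terms = parse_equation("D[x] = 0.3*x^2*z^2 - 0.3*x^2*y^2")
```

Each negative term of an equation would then become one action for processes in that equation's state, with the remaining factors describing what to sample.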


Translating Natural Phenomena into Distributed Protocols

  • Appealing exercise but ridden with potholes

    • Is the phenomenon a good match?

    • Does the distributed system protocol behave exactly as the original phenomenon does?

    • Are there any side-effects because we have PCs and not bees interacting?

  • Design methodologies are a simple answer to these quandaries

    • Derive the protocol from a model of the phenomenon, not from the phenomenon itself

    • Run the “stupid test”: can I design a simpler, more efficient algorithm without using the natural analogy?


2. Composable M.: Survivable Storage Architectures [Wylie et al]

  • Setting:

    • Federated Storage Architectures: Federated array of Bricks (FAB), HP or Collective Intelligent Bricks (CIB), IBM.

    • Clients make requests to collection of servers (fopen(), fwrite(), fread(), fclose())

  • Need to support different “assumptions” (application- and deployment- dependent) from one piece of software

  • How?

(very very (very) brief)


Wylie et al

  • Develop a family of protocols, each for a specific system model. Allow the application the flexibility of choosing the right mix at install-time or run-time

  • Possible models from combinations of:

    • Timing: synchronous or asynchronous

    • Server: crash-stop, omission, crash-recovery, Byzantine, hybrid

    • Client: same choices as above

    • Repair by clients: allow client to repair or not

  • Relevance: synchronous → LAN, crash-stop → closely controlled, Byzantine → untrusted environment
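
To get a feel for the size of this design space, the sketch below enumerates every combination of the listed choices. It assumes that all combinations are meaningful, which the slides do not claim, so the count is an upper bound; the dictionary keys are hypothetical names.

```python
from itertools import product

TIMING = ["synchronous", "asynchronous"]
FAILURE = ["crash-stop", "omission", "crash-recovery", "Byzantine", "hybrid"]
CLIENT_REPAIR = [True, False]

# One dictionary per candidate system model: 2 * 5 * 5 * 2 combinations.
models = [
    {"timing": timing, "server": server, "client": client, "client_repair": repair}
    for timing, server, client, repair in product(TIMING, FAILURE, FAILURE, CLIENT_REPAIR)
]

# Example: the models the slide associates with an untrusted environment.
byzantine_servers = [m for m in models if m["server"] == "Byzantine"]
```

A protocol family indexed this way makes the install-time or run-time choice a lookup on the model rather than a separate code base per deployment.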


Wylie et al

  • Evident: reuse of protocols from broad distributed systems literature

    • A lot from theory too!

  • Lacking/needed:

    • How are the protocols (for different system models) composed?

    • What are the building blocks?

    • What composition rules are used?


3. Innovative/Composable M.: Implementing Declarative Overlays [B.-T. Loo et al]


Motivation

  • Variety of overlays have been designed

    • Chord, Pastry, Kelips – DHTs

    • Narada, SRM, RMTP, Bimodal Multicast – multicasts

  • Question: can we specify the design of each of these systems (or a class of them) in a declarative language?

    • Specify the goals of the system rather than the lower-level implementation

    • P2: declarative logic language for overlay design

      • Prolog-like rules


Working Example - OurDHT

  • materialize(succ, 120, infinity, keys(2))

    • Each node maintains a table succ, whose tuples are retained for 120 s and whose size is unbounded; keys specifies the position of the primary key in the tuple

  • stabilize (X) :- periodic(X,E,3)

    • stabilize is a table that has a row for X if periodic has a row (X,E,3) for some E

    • In reality, stabilize is an event that gets invoked according to the stream periodic, i.e., once every 3 seconds
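
The materialize declaration describes a soft-state table: tuples expire after a lifetime and are replaced when a new tuple arrives with the same primary key. A minimal sketch of that behaviour follows. The class and method names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and treating the key position as 1-based is an assumption.

```python
class SoftStateTable:
    """Table whose tuples expire `lifetime` seconds after insertion and are
    overwritten by a newer tuple with the same primary key."""

    def __init__(self, lifetime, key_position):
        self.lifetime = lifetime
        self.key_index = key_position - 1  # assumption: positions count from 1
        self.rows = {}  # primary key -> (tuple, insertion time)

    def insert(self, row, now):
        self.rows[row[self.key_index]] = (row, now)

    def scan(self, now):
        """Drop expired tuples, then return the live ones."""
        self.rows = {
            key: (row, born)
            for key, (row, born) in self.rows.items()
            if now - born < self.lifetime
        }
        return [row for row, _ in self.rows.values()]


# Mirrors materialize(succ, 120, infinity, keys(2)): 120 s lifetime, key in field 2.
succ = SoftStateTable(lifetime=120, key_position=2)
succ.insert(("n1", 42, "addr-a"), now=0)
succ.insert(("n1", 42, "addr-b"), now=10)  # same key: replaces the first tuple
live = succ.scan(now=100)
```

The size bound from the declaration is omitted because it is infinity in this example.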


OurDHT (contd.)

  • OurDHT is organized as a (surprise!) logical ring

  • Each object in the p2p system lies somewhere along the ring

  • lookupResults@R(R,K,S,SI,E) :- node@NI(NI,N), lookup@NI(NI,K,R,E), succ@NI(NI,S,SI), K in (N,S]

    • returns a successful lookup result if the key K sought by the received lookup lies between the receiving node's identifier and that of its successor
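
The condition K in (N,S] is a half-open interval on a ring, so it has to handle the case where the interval wraps past the largest identifier. A minimal sketch of that test, with a hypothetical function name and plain integers standing in for node identifiers:

```python
def in_half_open_ring_interval(key, low, high):
    """True if key lies in (low, high] when walking clockwise around the ring."""
    if low < high:
        return low < key <= high
    # The interval wraps around the end of the identifier space.
    # When low == high, the interval covers the whole ring.
    return key > low or key <= high


# Node 60 with successor 4 on a ring of identifiers 0..63 owns keys 61..63 and 0..4.
owned = [k for k in range(64) if in_half_open_ring_interval(k, 60, 4)]
```

The low == high case corresponds to a ring with a single node, which is then responsible for every key.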


OurDHT (contd.)

  • sendSuccessors@SI(SI,NI) :- stabilize@NI(NI,_), succ@NI(NI,_,SI)

    • a node asks its successors (all if there are multiple successors) to send it their own successors, whenever the stabilize event is issued at that node

  • succ@PI(PI,S,SI) :- sendSuccessors@NI(NI,PI), succ@NI(NI,S,SI)

    • installs the returned successor at the original node


That’s it!

  • We’ve specified, using 5 rules, a ring-based DHT

  • The P2 paper goes on to specify the entire Chord protocol in 47 rules! (compare to the 1000’s of lines of code that would need to be “hand-written”)

    • Performance of P2-generated Chord is comparable to hand-coded Chord

  • The P2 paper also specifies the Narada multicast protocol in a mere 16 rules!


Pros and Cons

  • Ease of Protocol Specification: A protocol designer no longer has to write a C/C++/Java program several thousand lines long to design a new system. Design is a matter of writing only a few rules.

  • Formal Verification: Any such declarative design can potentially be run through specially-built verification engines that find bugs in the design, or better still, analyze the scalability and fault-tolerance of the protocol.

  • On-line distributed debugging: Execution history can be exported as a set of relational tables, so distributed debugging of a deployed distributed system can be achieved by writing the appropriate P2 rules.


Pros and Cons

  • Breadth: The same language P2 can be used to design other p2p overlays beyond Chord (e.g., the Narada overlay) - this makes possible quantitative comparisons among these systems that are much more believable than mere simulation-based comparisons. In addition, hybrid designs can be explored.

  • Yet another language

  • Learning Curve

  • When will all the pros above get done (if ever)?

  • What about optimizations – will P2-generated code create room for discovering as many optimizations as hand-coded Chord?


Other Methodology Languages

  • RAML=metarouting framework for routing [Maltz et al, SIGCOMM 04]

    • Useful for designing protocols such as BGP, etc.


Back to the Big Picture

  • Goal: create new resources for the protocol designer

    • beyond research literature and experience

      Approach:

  • Methodologies: E.g., [Innovative] To translate differential equation systems into equivalent protocols. [Composable] to reuse protocol design (P2).

  • Automation: E.g., DiffGen toolkit (PODC 2004 poster) that takes as input diff. eqns (Mathematica format), and spews out ready-to-deploy code.

[Distributed Protocols Research Group, UIUC] http://www-faculty.cs.uiuc.edu/~indy/rsrch.htm


(to be continued)

