## Local Computations in Large-Scale Networks

### Veracity Radius – Capturing the Locality of Distributed Computations

### Scalable Load-Distance Balancing

Material

- I. Keidar and A. Schuster: “Want Scalable Computing? Speculate!” SIGACT News, Sep 2006. http://www.ee.technion.ac.il/people/idish/ftp/speculate.pdf
- Y. Birk, I. Keidar, L. Liss, A. Schuster, and R. Wolff: “Veracity Radius - Capturing the Locality of Distributed Computations”. PODC'06. http://www.ee.technion.ac.il/people/idish/ftp/veracity_radius.pdf
- Y. Birk, I. Keidar, L. Liss, and A. Schuster: “Efficient Dynamic Aggregation”. DISC'06. http://www.ee.technion.ac.il/people/idish/ftp/eff_dyn_agg.pdf
- E. Bortnikov, I. Cidon, and I. Keidar: “Scalable Load-Distance Balancing in Large Networks”. DISC'07. http://www.ee.technion.ac.il/people/idish/ftp/LD-Balancing.pdf

Brave New Distributed Systems

- Large-scale

Thousands of nodes and more ..

- Dynamic

… coming and going at will ...

- Computations

… while actually computing something together.

This is the new part.

Today’s Huge Dist. Systems

- Wireless sensor networks
- Thousands of nodes, tens of thousands coming soon
- P2P systems
- Reporting millions online (eMule)
- Computation grids
- Harnessing thousands of machines (Condor)
- Publish-subscribe (pub-sub) infrastructures
- Sending lots of stock data to lots of traders

Not Computing Together Yet

- Wireless sensor networks
- Typically disseminate information to central location
- P2P & pub-sub systems
- Simple file sharing, content distribution
- Topology does not adapt to global considerations
- Offline optimizations (e.g., clustering)
- Computation grids
- “Embarrassingly parallel” computations

Emerging Dist. Systems – Examples

- Autonomous sensor networks
- Computations inside the network, e.g., detecting trouble
- Wireless mesh network (WMN) management
- Topology control
- Assignment of users to gateways
- Adapting p2p overlays based on global considerations
- Data grids (information retrieval)

Autonomous Sensor Networks

The data center is too hot!

Let’s turn on the sprinklers (need to backup first)

Let’s all reduce power

Autonomous Sensor Networks

- Complex autonomous decision making
- Detection of over-heating in data-centers
- Disaster alerts during earthquakes
- Biological habitat monitoring
- Collaboratively computing functions
- Does the number of sensors reporting a problem exceed a threshold?
- Are the gaps between temperature reads too large?

Wireless Mesh Networks

- Infrastructure (unlike MANET)
- City-wide coverage
- Supports wireless devices
- Connections to Mesh and out to the Internet
- “The last mile”
- Cheap
- Commodity wireless routers (hot spots)
- Few Internet connections

Decisions, Decisions

- Assigning users to gateways
- QoS for real-time media applications
- Network distance is important
- So is load
- Topology control
- Which links to set up out of many “radio link” options
- Which nodes connect to Internet (act as gateways)
- Adapt to varying load

Centralized Solutions Don’t Cut It

- Load
- Communication costs
- Delays
- Fault-tolerance

Classical Dist. Solutions Don’t Cut It

- Global agreement / synchronization before any output
- Repeated invocations to continuously adapt to changes
- High latency, high load
- By the time synchronization is done, the input may have changed … the result is irrelevant
- Frequent changes -> computation based on inconsistent snapshot of system state
- Synchronizing invocations initiated at multiple locations typically relies on a common sequencer (leader)
- difficult and costly to maintain

Locality to the Rescue!

- Nodes make local decisions based on communication (or synchronization) with some proximate nodes, rather than the entire network
- Infinitely scalable
- Fast, low overhead, low power, …

The Locality Hype

- Locality plays a crucial role in real life large scale distributed systems

John Kubiatowicz et al., on global storage: “In a system as large as OceanStore, locality is of extreme importance…”

C. Intanagonwiwat et al., on sensor networks: “An important feature of directed diffusion is that … are determined by localized interactions...”

N. Harvey et al., on scalable DHTs: “The basic philosophy of SkipNet is to enable systems to preserve useful content and path locality…”

What is Locality?

- Worst case view
- O(1) in problem size [Naor & Stockmeyer,1993]
- Less than the graph diameter [Linial, 1992]
- Often applicable only to simplistic problems or approximations
- Average case view
- Requires an a priori distribution of the inputs

To be continued…

Interesting Problems Have Inherently Global Instances

- WMN gateway assignment: arbitrarily high load near one gateway
- Need to offload as far as the end of the network
- Percentage of nodes whose input exceeds threshold in sensor networks: near-tie situation
- All “votes” need to be counted

Fortunately, they don’t happen too often

Speculation is the Key to Locality

- We want solutions to be “as local as possible”
- WMN gateway assignment example:
- Fast decision and quiescence under even load
- Computation time and communication adaptive to distance to which we need to offload
- A node cannot locally know whether the problem instance is local
- Load may be at other end of the network
- Can speculate that it is (optimism)

Computations are Never “Done”

- Speculative output may be over-ruled
- Good for ever-changing inputs
- Sensor readings, user loads, …
- Computing ever-changing outputs
- User never knows if output will change
- due to bad speculation or unreflected input change
- Reflecting changes faster is better
- If input changes cease, output will eventually be correct
- With speculation same as without

Summary: Prerequisites for Speculation

- Global synchronization is prohibitive
- Many instances amenable to local solutions
- Eventual correctness acceptable
- No meaningful notion of a “correct answer” at every point in time
- When the system stabilizes for “long enough”, the output should converge to the correct one

The Challenge: Find a Meaningful Notion for Locality

- Many real world problems are trivially global in the worst case
- Yet, practical algorithms have been shown to be local most of the time!
- The challenge: find a theoretical metric that captures this empirical behavior

Reminder: Naïve Locality Definitions

- Worst case view
- Often applicable only to simplistic problems or approximations
- Average case view
- Requires an a priori distribution of the inputs

Instance-Locality

- Formal instance-based locality:
- Local fault mending [Kutten,Peleg95, Kutten,Patt-Shamir97]
- Growth-restricted graphs [Kuhn, Moscibroda, Wattenhofer05]
- MST [Elkin04]
- Empirical locality: voting in sensor networks
- Although some instances require global computation, most can stabilize (and become quiescent) locally
- In small neighborhood, independent of graph size
- [Wolff,Schuster03, Liss,Birk,Wolff,Schuster04]

“Per-Instance” Optimality Too Strong

- Instance: assignment of inputs to nodes
- For a given instance I, algorithm A_I does:
- if (my input is as in I) output f(I); else send message with input to neighbor
- Upon receiving message, flood it
- Upon collecting info from the whole graph, output f(I)
- Convergence and output stabilization in zero time on I
- Can you beat that?

Need to measure optimality per-class, not per-instance

Challenge: capture attainable locality

Local Complexity [BKLSW’06]

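- Let
- $\mathcal{G}$ be a family of graphs
- $P$ be a problem on $\mathcal{G}$
- $M$ be a performance measure
- Classification $\mathcal{C}_G$ of inputs to $P$ on a graph $G$ into classes $C$
- For class of inputs $C$, $M_{LB}(C)$ be a lower bound for computing $P$ on all inputs in $C$
- Locality: $\forall G \in \mathcal{G}\ \forall C \in \mathcal{C}_G\ \forall I \in C:\ M_A(I) \le \mathrm{const} \cdot M_{LB}(C)$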

- A lower bound on a single instance is meaningless!

The Trick is in The Classification

- Classification based on parameters
- Peak load in WMN
- Proximity to threshold in “voting”
- Independent of system size
- Practical solutions show clear relation between these parameters and costs
- Parameters not always easy to pinpoint
- Harder in more general problems
- Like “general aggregation function”

Yitzhak Birk, Idit Keidar, Liran Liss, Assaf Schuster, and Ran Wolff

Dynamic Aggregation

- Continuous monitoring of aggregate value over changing inputs
- Examples:
- More than 10% of sensors report seismic activity
- Maximum temperature in data center
- Average load in computation grid

The Setting

- Large graph (e.g., sensor network)
- Direct communication only between neighbors
- Each node has a changing input
- Inputs change more frequently than topology
- Consider topology as static
- Aggregate function f on multiplicity of inputs
- Oblivious to locations
- Aggregate result computed at all nodes

Goals for Dynamic Aggregation

- Fast convergence
- If from some time t onward inputs do not change …
- Output stabilization time from t
- Quiescence time from t
- Note: nodes do not know when stabilization and quiescence are achieved
- If after stabilization input changes abruptly…
- Efficient communication
- Zero communication when there are zero changes
- Small changes ⇒ little communication

Standard Aggregation Solution: Spanning Tree

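[Figure: majority vote aggregated over a spanning tree. Partial counts (1 black; 2 black; 7 black, 1 white) flow up to the root, which totals 20 black, 12 white, and the result “black!” is announced back down the tree.]

Global communication!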
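The spanning-tree approach can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the tree, the vote values, and the names `aggregate_up`, `tree`, and `votes` are all hypothetical. It shows why the approach is global: every non-root node contributes one upward message, whatever the inputs are.

```python
from collections import Counter

def aggregate_up(tree, votes, root):
    """Sum vote counts from the leaves toward the root (convergecast)."""
    def subtree_counts(node):
        counts = Counter({votes[node]: 1})
        for child in tree.get(node, []):
            counts += subtree_counts(child)  # models one message per tree edge
        return counts
    return subtree_counts(root)

# Hypothetical 7-node tree: node -> list of children.
tree = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
votes = {"root": "black", "a": "black", "b": "white",
         "c": "black", "d": "white", "e": "black", "f": "black"}

totals = aggregate_up(tree, votes, "root")
winner = max(totals, key=totals.get)
messages_up = len(votes) - 1  # every non-root node reports to its parent
print(dict(totals), winner, messages_up)
```

The same number of messages is needed to push the result back down, so the cost grows with the size of the graph even when the outcome is obvious locally.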

The Bad News

- Virtually every aggregation function has instances that cannot be computed without communicating with the whole graph
- E.g., majority voting when close to the threshold “every vote counts”
- Worst case analysis: convergence, quiescence times are $\Omega(\text{diameter})$

Local Aggregation – Intuition

- Example – Majority Voting:
- Consider a partition in which every set has the same aggregate result (e.g., >50% of the votes are for ‘1’)
- Obviously, this result is also the global one!

[Figure: a network partitioned into sets whose local majorities for ‘1’ range from 51% to 98%.]
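The intuition above is easy to check numerically. The sketch below uses a made-up partition (the vote lists are assumptions, not data from the slides): if every set of a partition has a strict majority of ones, the union does too, because the per-set inequalities add up.

```python
def majority(votes):
    """Return True if strictly more than half of the 0/1 votes are 1."""
    return 2 * sum(votes) > len(votes)

# Hypothetical partition of a network's votes into three disjoint sets.
partition = [
    [1, 1, 0],        # 67% ones
    [1, 0, 1, 1, 0],  # 60% ones
    [1, 1, 1, 0],     # 75% ones
]

local_results = [majority(block) for block in partition]
all_votes = [v for block in partition for v in block]
global_result = majority(all_votes)
print(local_results, global_result)
```

The converse does not hold: when the sets disagree, no conclusion can be drawn locally, which is exactly the near-tie case that forces wider communication.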

Veracity Radius (VR) for One-Shot Aggregation [BKLSW,PODC’06]

- Roughly speaking: the min radius $r_0$ such that $\forall r \ge r_0$: all r-neighborhoods have the same result
- Example: majority

- Radius 1: wrong result
- Radius 2: correct result
- VR = 2
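A brute-force computation makes the rough definition concrete. This sketch assumes the no-slack reading (plain r-hop neighborhoods), a connected graph, and majority voting as the aggregate; the helper names and the 7-node path instance are hypothetical and chosen only for illustration.

```python
from collections import deque

def neighborhood(graph, v, r):
    """Nodes within r hops of v, found by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def majority(values):
    return 2 * sum(values) > len(values)

def veracity_radius(graph, inputs, f=majority):
    """Smallest r0 such that every r-neighborhood with r >= r0 agrees with f on the whole graph."""
    target = f(list(inputs.values()))
    r0 = 0
    # In a connected graph, radius len(graph) - 1 already covers every node.
    for r in range(len(graph)):
        agree = all(
            f([inputs[u] for u in neighborhood(graph, v, r)]) == target
            for v in graph
        )
        if not agree:
            r0 = r + 1
    return r0

# Hypothetical instance: a path of 7 nodes with a single dissenting endpoint.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
inputs = dict(enumerate([0, 1, 1, 1, 1, 1, 1]))
vr = veracity_radius(path, inputs)
print(vr)
```

On this instance the dissenting endpoint sees a wrong or tied result at radius 0 and 1 and the correct one from radius 2 onward, so the computed value is 2 regardless of how long the path is made by appending more 1-inputs.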

Introducing Slack

- Examine “neighborhood-like” environments that:
- (1) include an a(r)-neighborhood for some a(r)<r
- (2) are included in an r-neighborhood
- Example: a(r)=max{r-1,r/2}

[Figure: with slack, r = 2 still gives the wrong result compared with the global result; VR_a = 3.]

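VR Yields a Class-Based Lower Bound

[Figure: two input assignments, I and I’. Within distance r-1 of node v both contain n1 a’s and n2 b’s; beyond that, one contains only a’s and the other only b’s.]

- VR for both input assignments is r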
- Node v cannot distinguish between I and I’ in fewer than r steps
- Lower bound of r on both output stabilization and quiescence
- Trivially tight bound for output stabilization

Veracity Radius Captures the Locality of One-Shot Aggregation [BKLSW,PODC’06]

- I-LEAG (Instance-Local Efficient Aggregation on Graphs)
- Quiescence and output stabilization proportional to VR
- Per-class within a factor of optimal
- Local: depends on VR, not graph size!
- Note: nodes do not know VR or when stabilization and quiescence are achieved
- Can’t expect to know you’re “done” in dynamic aggregation…

Local Partition Hierarchy

- Topology static
- Input changes more frequently
- Build structure to assist aggregation
- Once per topology change
- Spanning tree, but with locality properties

[Figure legend: level 0, 1, and 2 edges; level 0, 1, and 2 pivots.]

Minimal Slack LPH for Meshes with a(r)=max(r-1,r/2)

The I-LEAG Algorithm

- Phases correspond to LPH levels
- Communication occurs within a cluster only if there are nodes with conflicting outputs
- All of the cluster’s nodes hold the same output when the phase completes
- All of the cluster’s neighbors know the cluster’s output
- Conflicts are detected without communication
- I-LEAG reaches quiescence once the last conflict is detected

I-LEAG’s Operation (Majority Voting)

[Figure legend: input, output, message, tree edge, conflict (marked “!”).]

Initialization: node’s output is its input

Startup: Communication Among Tree Neighbors

Recall neighbor values; they will be used in all phases

Phase 0 Conflict Resolution

Updates sent by clusters that had conflicts

Phase 1 Conflict Resolution

Updates sent by clusters that had conflicts

Phase 2 Conflict Detection

Using information sent at phase 0

No Communication

Phase 2 Conflict Resolution

This region has been idle since phase 0

No conflicts found, no need for resolution

Simulation Study

- VR also explains the locality of previous algorithms

Naïve Dynamic Aggregation

- Periodically,
- Each node samples input, initiates I-LEAG
- Each instance I of I-LEAG takes O(VR(I)) time, but sends $\Omega(|V|)$ messages
- Sends messages even when no input changes
- Costly in sensor networks
- To save messages, must compromise freshness of result

Dynamic Aggregation at Two Timescales

- Efficient multi-shot aggregation algorithm (MultI-LEAG)
- Converges to correct result before sampling the inputs again
- Sampling time may be proportional to graph size
- Efficient dynamic aggregation algorithm (DynI-LEAG)
- Sampling time is independent of graph size
- Algorithm tracks the global result as closely as possible

Dynamic Lower Bound

- Previous sample (instance) also plays a role
- Example (majority voting):
- Multi-shot lower bound: $\max\{VR_{prev}, VR\}$
- On quiescence and output stabilization
- Assumes sending zero messages when there are zero changes

[Figure: three consecutive samples: I1 (VR2), I2 (0 changes), I3 (VR=0).]

Dynamic Aggregation: Take II

- Initially, run local one-shot algorithm A (~VR)
- Store the distance that information travels in this instance, dist
- Let D = A’s worst-case convergence time
- Every D time, run a new iteration (MULTI-A)
- If input did not change, do nothing
- If input changed, run full information protocol up to dist (~VR_prev)
- If new instance’s VR isn’t reached, invoke A anew (~VR)
- Update dist
- Matches the $\max\{VR_{prev}, VR\}$ lower bound, within the same factor as A

A is for I-LEAG

- I-LEAG uses a pre-computed partition hierarchy
- LPH: Local Partition Hierarchy – cluster sizes bounded both from above and from below (doubling sizes)
- Spanning tree in each cluster, rooted at pivot
- Computed once per topology
- I-LEAG phases correspond to LPH levels
- Active phase: full-information from cluster pivot
- Phase result communicated to cluster and its neighbors
- Phase active only if there is a conflict in the previous level
- Conflicts detected without new communication

MultI-LEAG

- The Veracity Level (VL) of node v is the highest LPH level in which v’s cluster has a conflict (VL < log VR + 1)
- A MultI-LEAG iteration’s phases correspond to LPH levels:
- Phase level < VL: propagate changes (if any) to pivot
- active only if there are changes
- Phase level ≥ VL: fall back to I-LEAG
- active only if new VR is larger than previous
- Cache partial aggregate results in pivot nodes
- allows conflict detection between active and passive clusters

MultI-LEAG Operation

- Case I: No changes

… no conflicts

… no conflicts

… no changes to report

All is quiet…

Abrupt Change Flips Outcome

Clusters at VL recalculate, others forward up

MultI-LEAG Observations

- $O(\max\{VR_{prev}, VR\})$ output stabilization and quiescence
- Message efficient:
- Communication only in clusters with changes, only when radius < $\max\{VR_{prev}, VR\}$
- Sampling time is O(Diameter)
- Good for cheap periodic aggregation
- Can we do closer monitoring?

Dynamic Aggregation Take III: DynI-LEAG

- Sample inputs every O(1) link delays
- Close monitoring, rapidly converges to correct result
- Run multiple MultI-LEAG iterations concurrently
- Challenges:
- Pipelining phases with different (doubling) durations
- Intricate interaction among concurrent instances, e.g., which phase 4 updates are used in a given phase 5
- Avoiding state explosion for multiple concurrent instances

Ruler Pipelining

- Partial iterations, fewer in every level
- Changes only communicated once

[Figure: timeline (t) of phases 0, 1, and 2 across sampling intervals, showing full iterations and partial iterations.]

- Memory usage: O(log(Diameter))
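One way to picture "fewer partial iterations in every level" is the ruler sequence, where the iteration sampled at tick t climbs as many levels as t has trailing zero bits. The sketch below is an assumption about the schedule suggested by the slide title, not the DynI-LEAG specification; `top_phase` and `max_level` are hypothetical names.

```python
def top_phase(tick, max_level):
    """Highest phase reached by the iteration sampled at `tick` (ruler sequence)."""
    if tick == 0:
        return max_level  # treat the first sample as a full iteration
    trailing_zeros = (tick & -tick).bit_length() - 1
    return min(trailing_zeros, max_level)

max_level = 2
schedule = [top_phase(t, max_level) for t in range(8)]
print(schedule)  # [2, 0, 1, 0, 2, 0, 1, 0]

# How many of the 8 iterations reach each level.
reach = [sum(1 for top in schedule if top >= level) for level in range(max_level + 1)]
print(reach)  # [8, 4, 2]
```

Under this assumed schedule, each level sees half as many iterations as the level below it, which fits phases whose durations double from level to level.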

VL and Output Estimation

- Problem: correct output and VL of an iteration is guaranteed only after O(Diameter) time
- cannot wait that long…
- Solution: choose iteration with highest VL according to most recent information
- Use this VL for new iterations and its output as MultI-LEAG’s current output estimation
- Eventual convergence and correctness guaranteed

DynI-LEAG Operation

The influence of a conflict is proportional to its level

[Figure: timeline (t) of phases 0, 1, and 2, marking phases below VL and phases above VL; “previous VL” = 2.]

Dynamic Aggregation: Conclusions

- Local operation is possible
- in dynamic systems
- that solve inherently global problems
- MultI-LEAG delivers periodic correct snapshots at minimal cost
- DynI-LEAG responds immediately to input changes with a slightly higher message rate

Edward Bortnikov, Israel Cidon, Idit Keidar

Load-Distance Balancing

- Two sources of service delay
- Network delay (depends on distance to server)
- Congestion delay (depends on server load)
- Total = Network + Congestion
- Input
- Network distances and congestion functions
- Optimization goal
- Minimize the maximum total delay
- NP-complete, 2-approximation exists
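The objective can be written down directly. The following sketch evaluates the maximum total delay of an assignment on a tiny made-up instance; the distances, the linear congestion function, and the names `max_delay`, `nearest`, and `best` are illustrative assumptions. The exhaustive search is only viable at this toy size, given that the problem is NP-complete.

```python
from itertools import product

def max_delay(assignment, distance, congestion):
    """Maximum over users of network delay plus congestion delay at the assigned server."""
    load = {}
    for server in assignment.values():
        load[server] = load.get(server, 0) + 1
    return max(distance[user][server] + congestion(load[server])
               for user, server in assignment.items())

def congestion(load):
    return 3 * load  # assumed linear congestion delay

# Hypothetical instance: three users crowded near server "s1".
distance = {"u1": {"s1": 1, "s2": 4},
            "u2": {"s1": 1, "s2": 4},
            "u3": {"s1": 2, "s2": 3}}
servers = ["s1", "s2"]

# Extreme local solution: every user picks its nearest server.
nearest = {user: min(d, key=d.get) for user, d in distance.items()}

# Extreme global solution: try every assignment and keep the cheapest.
candidates = (dict(zip(distance, choice))
              for choice in product(servers, repeat=len(distance)))
best = min(candidates, key=lambda a: max_delay(a, distance, congestion))

print(max_delay(nearest, distance, congestion), max_delay(best, distance, congestion))
```

Here the nearest-server assignment overloads "s1", while moving one user to the farther server trades a little network delay for much less congestion.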

Distributed Setting

- Synchronous
- Distributed assignment computation
- Initially, users report location to the closest servers
- Servers communicate and compute the assignment
- Requirements:
- Eventual quiescence
- Eventual stability of assignment
- Constant approximation of the optimal cost (parameter)

Impact of Locality

- Extreme global solution
- Collect all data and compute assignment centrally
- Guarantees optimal cost
- Excessive communication/network latency
- Extreme local solution
- Nearest-Server assignment
- No communication
- No approximation guarantee (can’t handle crowds)
- No “one-size-fits-all”?

Workload-Sensitive Locality

- The cost function is distance-sensitive
- Most assignments go to the near servers
- … except for dissipating congestion peaks
- Key to distributed solution
- Start from the Nearest-Server assignment
- Load-balance congestion among near servers
- Communication locality is workload-sensitive
- Go as far as needed …
- … to achieve the required approximation

Skewed Load

Load-Balance

Tree Clustering

- As long as some cluster has improvable cost
- Double it (merge with hierarchy neighbor)
- Clusters aligned at $2^i$ indices
- Simple, O(log N) convergence time
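The doubling rule can be sketched on a line of servers. This is a toy under stated assumptions: the number of servers is a power of two, clusters are index ranges, and the `improvable` predicate (average load above a capacity) is a stand-in, since the slides do not define improvability. None of the names come from the original work.

```python
def tree_clustering(num_servers, improvable):
    """Double improvable clusters until none remains or one cluster spans everything.

    Clusters are (start, size) ranges with size 2**i and start a multiple of size.
    Returns the final clusters and the number of doubling rounds.
    """
    clusters = {(i, 1) for i in range(num_servers)}
    rounds = 0
    while True:
        to_grow = sorted(c for c in clusters if c[1] < num_servers and improvable(c))
        if not to_grow:
            return sorted(clusters), rounds
        for start, size in to_grow:
            if (start, size) not in clusters:
                continue  # already absorbed by a merge earlier in this round
            parent = (start - start % (2 * size), 2 * size)
            # The parent absorbs every cluster inside its range, including the sibling.
            clusters = {c for c in clusters
                        if not (parent[0] <= c[0] < parent[0] + parent[1])}
            clusters.add(parent)
        rounds += 1

# Hypothetical per-server loads with a congestion peak at server 0.
loads = [9, 1, 1, 1, 2, 2, 2, 2]

def improvable(cluster, capacity=3):
    start, size = cluster
    return sum(loads[start:start + size]) / size > capacity

clusters, rounds = tree_clustering(len(loads), improvable)
print(clusters, rounds)
```

Only the cluster around the peak grows, and it stops as soon as its average load fits the capacity; since sizes double each round, no cluster can grow more than log2(N) times.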

Ripple Clustering

- Adaptive merging
- Better cost in practice
- As long as some cluster is improvable
- Merge with smaller-cost neighbors
- Conflicts possible
- [Figure: conflicting merges among neighboring clusters A, B, and C.]
- Random tie-breaking to resolve
- Many race conditions (we love it :-)
