Seven O’Clock: A New Distributed GVT Algorithm using Network Atomic Operations

David Bauer, Garrett Yaun, Christopher Carothers (Computer Science)

Murat Yuksel, Shivkumar Kalyanaraman (ECSE)


Global Virtual Time

Defines a lower bound on any unprocessed event in the system.

Defines the point beyond which events should not be reclaimed.

  • Imperative that GVT computation operate as efficiently as possible.
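As a minimal sketch (not the paper's code), GVT can be viewed as the minimum over every processor's local virtual time and every in-transit message timestamp; `compute_gvt` and its argument names are illustrative:

```python
# Illustrative sketch: GVT is the minimum over every processor's local
# virtual time (LVT) and every in-transit ("transient") message timestamp.
def compute_gvt(local_virtual_times, in_transit_timestamps):
    """Lower bound on any unprocessed event in the system."""
    return min(min(local_virtual_times),
               min(in_transit_timestamps, default=float("inf")))

# Three processors with LVTs 5, 7, 9 and one delayed message stamped 6:
gvt = compute_gvt([5, 7, 9], [6])   # -> 5
# Events with timestamp < gvt can safely be reclaimed (fossil collected).
```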


Key Problems

Simultaneous Reporting Problem: arises “because not all processors will report their local minimum at precisely the same instant in wall-clock time”.

Transient Message Problem: a message is delayed in the network and neither the sender nor the receiver considers that message in their respective GVT calculation.

Asynchronous Solution: create a synchronization, or “cut”, across the distributed simulation that divides events into two categories: past and future.

Consistent Cut: a cut where there is no message scheduled in the future of the sending processor, but received in the past of the destination processor.


Mattern’s GVT Algorithm

Construct cut via message-passing

Cost: O(log n) if tree, O(N) if ring

  • If large number of processors, then free pool exhausted waiting for GVT to complete


Fujimoto’s GVT Algorithm

Construct cut using shared memory flag

Cost: O(1)

Sequentially consistent memory model ensures proper causal order

  • Limited to shared memory architecture


Memory Model

Sequentially consistent does not mean instantaneous

Memory events are only guaranteed to be causally ordered

Is there a method to achieve sequentially consistent shared memory in a loosely coordinated, distributed environment?


GVT Algorithm Differences

*cost of algorithm much higher


Network Atomic Operations

Goal: each processor observes the “start” of the GVT computation at the same instance of wall clock time

Definition: An NAO is an agreed upon frequency in wall clock time at which some event is logically observed to have happened across a distributed system.


Network Atomic Operations

[Figure: along wall-clock time, each NAO expiration causes every processor to Update Tables and then Compute GVT (possible operations provided by a complete sequentially consistent memory model).]
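Because an NAO is an agreed-upon frequency rather than a message, each processor can compute the next NAO expiration from its own (synchronized) clock with no communication. A hypothetical sketch, with illustrative names not taken from the paper:

```python
import math

# Hypothetical sketch: the NAO is an agreed-upon *frequency* in wall clock
# time, so every processor independently derives the same expiration
# instants from an agreed start time -- no messages are exchanged.
def next_nao(now, start, frequency):
    """Wall-clock time of the next NAO expiration at or after `now`."""
    elapsed_intervals = math.ceil((now - start) / frequency)
    return start + elapsed_intervals * frequency

# With NAOs every 0.5 s starting at t=0, a processor reading its clock at
# t=1.3 observes the next NAO at t=1.5, as does every other processor.
```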


Clock Synchronization

  • Assumption: all processors share a highly accurate, common view of wall clock time.

  • Basic building block: CPU timestamp counter

    • Measures time in terms of clock cycles, so a gigahertz CPU clock has a granularity of 10^-9 secs

    • Sending events across the network has much larger granularity depending on technology: ~10^-6 secs on 1000Base-T
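To make the scale gap concrete, a rough sketch using `time.perf_counter_ns` as a stand-in for the CPU timestamp counter (the 10^-9 and 10^-6 figures come from the slide; everything else is illustrative):

```python
import time

# A CPU cycle counter ticks at nanosecond scale (~1e-9 s for a 1 GHz
# clock), while a network send costs on the order of microseconds
# (~1e-6 s on 1000Base-T). time.perf_counter_ns() stands in here for the
# CPU timestamp counter; it reports integer nanoseconds.
t0 = time.perf_counter_ns()
t1 = time.perf_counter_ns()
tick_ns = t1 - t0              # back-to-back reads: observable granularity

cycle_time_1ghz = 1.0 / 1e9    # one cycle on a 1 GHz CPU: ~1e-9 s
network_send = 1e-6            # order-of-magnitude send cost on 1000Base-T
ratio = network_send / cycle_time_1ghz   # sends are ~1000x coarser
```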


Clock Synchronization

  • Issues: clock synchronization, drift and jitter

  • Ostrovsky and Patt-Shamir:

    • provably optimal clock synchronization

    • clocks have drift and the message latency may be unbounded

  • Well researched problem in distributed computing – we used simplified approach

    • simplified approach helpful in determining if system working properly


Max Send Δt

  • Definition: max_send_delta_t is maximum of

    • worst case bound on the time to send an event through the network

    • twice synchronization error

    • twice max clock drift over simulation time

  • add a small amount of time to the NAO expiration

    • Similar to sequentially consistent memory

  • Overcomes:

    • Transient message problem, clock drift/jitter and clock synchronization error


Max Send Δt: Clock Drift

  • Clock drift causes CPU clocks to become unsynchronized

    • Long running simulations may require multiple synchs

    • Or, we account for it in the NAO

  • Max Send Δt overcomes clock drift by ensuring no event “falls between the cracks”


Max Send Δt

[Figure: LP1 and LP2 along wall-clock time; GVT cut points separated by tmax, with Δmax margins marked on either side.]

  • What if clocks are not well synched?

    • Let Dmax be the maximum clock drift.

    • Let Smax be the maximum synchronization error.

  • Solution: Re-define tmax as

    t’max = max(tmax , 2*Dmax , 2*Smax)

  • In practice both Dmax and Smax are very small in comparison to tmax.
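The redefinition above transcribes directly into code; a small sketch (the function name is illustrative):

```python
def t_prime_max(t_max, d_max, s_max):
    """Worst-case send bound adjusted for clock drift and sync error.

    t_max: worst-case network send time; d_max: maximum clock drift;
    s_max: maximum synchronization error (all in the same time unit).
    """
    return max(t_max, 2 * d_max, 2 * s_max)

# In practice drift and sync error are tiny relative to t_max, so the
# network bound dominates:
t_prime_max(1e-3, 1e-6, 5e-6)   # -> 0.001
```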


Transient Message Problem

  • Max Send Δt: worst-case bound on the time to send an event through the network

    • guarantees events are accounted for by either the sender or the receiver


Simultaneous Reporting Problem

  • Problem arises when processors do not start GVT computation simultaneously

  • Seven O’Clock does start simultaneously across all CPUs; therefore, the problem cannot occur


GVT

[Figure: processors A through E exchange events with timestamps 5, 7, 9, and 10; at the NAO each reports its local minimum (LVT: 7, LVT: min(5,9), LVT: 5) and GVT = min(5,7).]




Simulation: Seven O’Clock GVT Algorithm

[Figure: successive NAOs along wall-clock time trigger GVT computation #1 and GVT computation #2.]

  • Assumptions:

    • Each processor has a highly accurate clock

    • A message passing interface w/o ack is available

    • The worst case bound on the time to transmit a message through the network tmax is known.

  • Properties:

    • a clock-based algorithm for distributed processors

    • creates a sequentially consistent view of distributed memory

[Figure: LP1 through LP4 along wall-clock time; the NAO plus tmax defines the cut point; LP1 reports 5, LP2 reports LVT=min(5,9), LP3 reports 9, LP4 reports LVT=min(7,9), and GVT=min(5,7).]
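Putting the pieces together, a toy sketch of the cut (hypothetical API, not the authors' implementation): at the NAO instant every LP reports the minimum of its LVT and the timestamps of events it sent within tmax of the cut (covering transient messages), and GVT is the global minimum:

```python
# Toy sketch of the Seven O'Clock cut. Every LP observes the same NAO
# wall-clock instant, so no extra messages or acknowledgements are needed
# to agree on the cut point.
def lp_report(lvt, recently_sent_timestamps):
    """One LP's contribution at the NAO cut: its LVT folded together with
    any events it sent within t_max of the cut (transient messages)."""
    return min([lvt, *recently_sent_timestamps])

def seven_oclock_gvt(reports):
    """GVT is the minimum over all LP reports."""
    return min(reports)

# Mirroring the figure: LP1 reports 5, LP2 reports min(5, 9) = 5,
# LP3 reports 9, LP4 reports min(7, 9) = 7, so GVT = 5.
reports = [lp_report(5, []), lp_report(9, [5]),
           lp_report(9, []), lp_report(9, [7])]
gvt = seven_oclock_gvt(reports)   # -> 5
```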


Limitations

  • NAOs cannot be “forced”

    • agreed upon intervals cannot change

  • Simulation End Time

    • worst case: a complete NAO interval elapses with only one event remaining to process

    • amortized over entire run-time, cost is O(1)

  • Exhausted Event Pool

    • requires tuning to ensure enough optimistic memory available


Uniqueness

  • Only real-time based GVT algorithm

  • Zero-cost consistent cut → truly scalable

    • O(1) cost → optimal

  • Only algorithm which is entirely independent of available event memory

    • Event memory loosely tied to GVT algorithm


Performance Analysis: Models

r-PHOLD

  • PHOLD with reverse computation

  • Modified to control the percent of remote events (normally 75%)

  • Destinations still decided using a uniform random number generator → all LPs are possible destinations

TCP-Tahoe

  • TCP-Tahoe ring of Campus Networks topology

  • Same topology design as used by PDNS in MASCOTS ’03

  • Model limitations required us to increase the number of LAN routers in order to simulate the same network


Performance Analysis: Clusters



Maximize distribution (round-robin among nodes) VERSUS maximize parallelization (use all CPUs before using additional nodes)


NetSim Cluster: Comparing 10% and 25% remote events (using 1 CPU per node)


NetSim Cluster: Comparing 10- and 25% remote events Network Atomic Operations

(using 1 CPU per node)


TCP Model Topology

Single Campus

10 Campus Networks in a Ring

Our model contained 1,008 campus networks in a ring, simulating > 540,000 nodes.




Future Work & Conclusions

  • Investigate “power” of different models by computing spectral analysis

    • GVT now in frequency domain

    • Determine max length of rollbacks

  • Investigate new ways of measuring performance

    • Models too large to run sequentially

    • Account for hardware effects (even in a NOW there are fluctuations in HW performance)

    • Account for model LP mapping

    • Account for different cases, e.g., 4 CPUs distributed across 1, 2, and 4 nodes

