slide1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner PowerPoint Presentation
Download Presentation
Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner

Loading in 2 Seconds...

play fullscreen
1 / 19

Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner - PowerPoint PPT Presentation


  • 149 Views
  • Uploaded on

Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner Johannes Schneider David Hasenfratz Roger Wattenhofer. Without easy and efficient parallel programming methods…. “computer science will become washing machine science.“.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner' - armina


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1
Johannes Schneider

Transactional Memory:

How to Perform Load Adaption

in a Simple And Distributed Manner

Johannes Schneider

David Hasenfratz

Roger Wattenhofer

without easy and efficient parallel programming methods
Johannes SchneiderWithout easy and efficient parallel programming methods…

“computer science will become washing machine science.“

how to handle access to shared data
Johannes SchneiderHow to handle access to shared data?
  • Locks, Monitors…
    • Coarse grained vs. fine grained locking

easy but slow program demanding, time consuming but fast programs

  • Problems
    • difficult
    • error prone
    • Composability

Thread 1 Thread 2

lock all data

modify/use data

unlock all data

lock A

lock B

modify/use A,B

lock C

modify/use A,B,C

unlock A

modify/use B,C

unlock B,C

lock B

lock A

modify/use A,B

unlock A,B

Only 1 thread can execute

Deadlock!

transactional memory tm a possible solution
Johannes SchneiderTransactional memory(TM) - a possible solution

Begin transaction

modify/use data

End transaction

  • Simple for the programmer
  • Composable
  • Idea from database community
  • Many TM systems (internally) still use locks
  • But the TM system (not the programmer) takes care of
    • Performance
    • Correctness (no deadlocks...)

Method A.x()

Begin Transaction

B.y()

End Transaction

Method B.y()

Begin transaction

End transaction

transactional memory systems
Johannes SchneiderTransactional memory systems
  • If transactions modify

different data, everything is ok

    • the same data, conflicts arise that must be resolved
        • Transactions might get delayed or aborted
        • Job of a contention manager
  • A transaction keeps track of all modified values
  • It restores all values, if it is aborted
  • A transaction successfully finishes with a commit
conflicts a contention manager decides
Johannes SchneiderConflicts – A contention manager decides
  • Abort or delay a transaction, i.e. adapt load
  • Distributed
    • Each thread has its own manager
  • Example
    • Initially: A=1, B=1

Manager 1 Manager 2

Manager 2

Manager 1

Trans.1

Trans. 2

Trans. 1

Trans. 2

T1

T1

T1

B:=2

A:=3

A:=2

B:=2

A:=3

A:=2

conflict

conflict

Abort (undo all changes, i.e. set A:=1)‏ and restart (after a while)

Abort (set B:=1) and restart

OR

wait and retry

Delay to adapt load!

prior work
Johannes SchneiderPrior work
    • Contention Managers[PODC03,PODC05,ISAAC09…]
    • System load was not (explicitly) considered
  • Load adaption (based on contention)
    • Estimate contention intensity: CI [SPAA08]
      • If abort: CI = a CI + (1-a) with parameter a [0,1]
      • If commit: CI = a CI
      • If CI > parameter b then resort to central scheduler
    • Keep a transaction queue per core [PODC08]
      • Central dispatcher assigns transactions to a core, i.e. its queue
      • Each core iteratively executes transactions from queue
      • If transaction A on core 1 is aborted due to B on core 2

then A is appended to the queue of core 2

    • Central scheduler will become a bottleneck

D

C

Core 2

Core 1

A

B

B aborts A

A

D

Core 2

Core 1

C

B

this paper
Johannes SchneiderThis paper
  • Theoretical analysis
  • Decentralized (simple) approaches to load adaption
    • based on contention
strategies
Johannes Schneider

Conflict graph

A conflicted with C

D conflicted with B

Strategies

A

  • Ignore: Do not learn from conflicts
    • ImmediateRestart
  • Stay real: Remember faced conflicts
    • SerializeFacedConflicts
    • Do not schedule prior conflicting transactions

concurrently

  • Be cautious: Assume additional conflicts
    • SerializeAll
    • All transactions in a subgraph are assumed to conflict

B

C

D

B

A

C

D

B

A

A

B

D

C

C

D

load adaption strategies
Johannes SchneiderLoad Adaption Strategies
  • AbortBackoff
    • If aborted wait for a random time [0,2#aborts]
    • Priority = number of aborts #aborts
  • Who wins a conflict?
    • 2 strategies
      • Estimate the work done
      • Unrelated to work done
theory part model
Johannes SchneiderTheory Part - Model
  • n transactions (and threads)
    • Start concurrently on n cores
  • Transaction
    • sequence of operations
    • operation takes 1 time unit
    • duration (number of operations) tTis fixed
    • 2 types of operations
      • Write = modify (shared) resource and lock it until commit
      • Compute/abort/commit
  • Ignore overhead of load adaption
    • Remembering transactions, scheduling…

Core 1

Core 2

Core n

A

Z

B

A

moderate parallelism
Johannes SchneiderModerate parallelism
  • Shared counter
    • Conflicts directly after transaction start
  • Linked List
    • Conflicts at arbitrary time
  • Expected time span until all transactions committed
  • Speed-up log n (at best)

Transaction run time

#transactions

substantial parallelism
Johannes SchneiderSubstantial parallelism

T1

  • Worst case
  • Conflict graph is d-ary tree of logarithmic height
  • Exponential gap in worst case
    • SerializeAll and others

T2

T3

T5

T4

practical investigation
Johannes SchneiderPractical investigation
    • Remembering conflicts causes too much overhead
    • Good for analysis but not for implementation
  • Quickadapter
    • Serializes transactions
    • Each core has a “waiting” flag
    • If aborted, set flag and wait until flag unset
    • If commit, unset some flag
  • AbortBackOff
  • (Also considered some variants)
practical investigation1
Johannes SchneiderPractical investigation
  • Evaluation on 16 core machine
  • DSTM2 system
    • Visible readers
  • Six benchmarks
    • Little parallelism
      • Shared counter, Sorted List (accessed objects not released), Listcounter
    • Considerable parallelism
      • Red Black Tree, LFUCache, RandomAccessArray
  • Compare new load adaption policies to existing contention managers
discussion
Johannes SchneiderDiscussion
  • Hard to keep maximum throughput, also in [SPAA08, PODC08]
    • Even without conflicts
    • Improvement for 1 benchmark worsens another
  • On average better than schemes without load adaption
conclusion
Johannes SchneiderConclusion
  • Simple and distributed load adaption strategies
  • Theory
    • (For now) constants and parameters matter a lot
  • Practice
    • Hard to keep load at peak for all usage patterns
slide18
Johannes Schneider

Thanks for your attention!

Questions?

???

\vspace{10pt}

analysis abortbackoff for counter
Johannes SchneiderAnalysis AbortBackoff for counter
  • Recall: If aborted wait for a random time [0,2#aborts]
  • Assume #aborts ~ log (ntT) + x (for some x)
  • Define: a(x) := fraction of active nodes
    • a(0) = 1 (after time ~2log (ntT) = ntTa constant fraction still active)
  • Chance conflict for interval [0,2#aborts] Interval [0, 2log(ntT)+x ]

~ a(x) ntT/ 2log (ntT) +x = a(x) /2x

  • a(x+1) = a(x)/2x = 1/2∑i=0..x i~ 1/2x2
  • a(√log n) = 1/2(√log n)2 = 1/n
  • ∑i=0.. log (ntT) +√log n length interval = ∑i=0.. .. log (ntT) +√log n 2i = ntT 2√log n+1

T3

T2

T1

a(x)ntT= 3/n ntT= 3tT