BulkCommit

BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment



BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment. Authors: Xuehai Qian, Benjamin Sahelices, Depei Qian. Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, 2014.


BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment

Authors: Xuehai Qian, Benjamin Sahelices, Depei Qian

  • DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

  • NATIONAL INSTITUTE OF TECHNOLOGY

  • KARNATAKA-SURATHKAL

  • 2014

Presented by:

Pravin Ramteke (CS13F08), Sawan Belekar (13IS04F)

Course Instructor:

Dr. Basavaraj Talawar



Motivation

• Architectures that continuously execute atomic blocks, or chunks (e.g., TCC, BulkSC)
  • Chunk: a group of dynamically contiguous instructions executed atomically
  • Provide performance and programmability advantages [Hammond04][Ahn09]
• Chunk commit is an important operation: it makes the state of a chunk visible atomically
• We focus on designs with lazy detection of conflicts
  • Lazy detection provides higher concurrency in codes with many conflicts
• Parallelizing the commit is challenging
  • It requires a consistent conflict-resolution decision across all the distributed directory modules
  • Therefore, most current schemes have some sequential steps in the commit
• In addition, current lazy conflict resolution is sub-optimal
  • It incurs a squash even when there is only a Write-After-Write (WAW) conflict


Lifetime of a Chunk

[Figure: timeline of a chunk. Execution is followed by Commit, which consists of Grouping and then Propagation.]

• Execution:
  • Reads and writes bring lines into the cache
  • No written line is made visible to other processors
  • Execution ends when the last instruction of the chunk completes
• Commit: make the chunk state visible atomically
  • Grouping: set the relative order of any two conflicting chunks
    • Grabbing the directory: locking the local memory lines and detecting conflicts
    • After a commit grabs all the relevant directories, it is guaranteed to commit successfully
  • Propagation: make the stores in a chunk visible to the rest of the system
    • Involves sending invalidations and updating directory states
    • Atomicity is ensured because the relevant cache lines are logically locked by signatures during the process


Inefficiency: Squash on WAW-Only

[Figure: processors P0 and P1 both write x. Conventional system: the writes sit in store buffers and the WAW is serialized without re-execution. Chunk-based system: chunks C0 and C1 both write x in their L1 caches; the WAW-only conflict is serialized with a squash, so C1 is squashed and re-executed. Directory states for x shown in the figure: D in P0, S in P0&P1, D in P1.]
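The distinction this slide draws can be sketched in a few lines. The sketch below is only an illustration: it uses exact Python sets of line addresses where the hardware uses signatures, and the function name and return labels are hypothetical, not taken from the slides.

```python
def classify_conflict(committing_writes, other_reads, other_writes):
    """Classify the conflict a committing chunk causes in another chunk.

    All arguments are sets of cache-line addresses. This is an assumption
    made for illustration: real hardware would track them with signatures.
    """
    read_conflict = committing_writes & other_reads    # other chunk read the line
    write_conflict = committing_writes & other_writes  # both chunks wrote the line
    if read_conflict:
        return "squash"      # the other chunk consumed stale data
    if write_conflict:
        return "waw_only"    # candidate for serialization without a squash
    return "none"
```

A chunk that only wrote x falls in the "waw_only" case; a baseline lazy scheme would squash it anyway, which is the inefficiency the slide points at.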


Contribution: BulkCommit

• BulkCommit: a commit protocol with parallel grouping and squash-free serialization of WAW-only conflicts
  • IntelliSquash: no squash on WAW
    • Insight: use the L1 cache as the "store buffer" for the chunk
  • IntelliCommit: parallel grouping without broadcast
    • Insight: use a preemption mechanism to ensure a consistent order of two conflicting chunks
• BulkCommit tries to achieve the optimal commit protocol design


Outline

• Motivation
• IntelliSquash
• IntelliCommit
• Evaluation


IntelliSquash: Insight

• Challenge: the speculative data produced by a chunk cannot be lost when the chunk is ready to commit
• Solution: use the L1 cache as the "store buffer" for a chunk
  • Similar to the store buffer in a conventional system
  • On receiving an invalidation, the speculative dirty words of a line are preserved
• Absent bit: it is set when
  • The line is not present
  • The line contains some speculative words
• Per-word dirty bit (not shown)

[Figure: line m in the L1 caches of P0 and P1 around a commit, with per-line bits labeled spV, d, and A. Directory states for m shown in the figure: S in P0&P1 and D in P1.]
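A minimal sketch of the invalidation behavior described on this slide follows. The class, field names, and the four-word line size are assumptions chosen for illustration, and the sketch reads the two Absent-bit conditions as holding together (line not present, but some speculative words kept); the slides do not spell this out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_LINE = 4  # assumed line size, for illustration only


@dataclass
class CacheLine:
    words: List[Optional[int]]
    dirty: List[bool] = field(default_factory=lambda: [False] * WORDS_PER_LINE)
    valid: bool = True
    absent: bool = False  # the "A" bit from the slide

    def speculative_write(self, index: int, value: int) -> None:
        # A chunk's store marks only the written word as dirty.
        self.words[index] = value
        self.dirty[index] = True

    def invalidate(self) -> None:
        """Handle an invalidation caused by another chunk's commit."""
        if any(self.dirty):
            # Keep the speculatively written words, drop the rest, and mark
            # the line Absent instead of squashing the chunk.
            self.words = [w if d else None for w, d in zip(self.words, self.dirty)]
            self.absent = True
        else:
            # No speculative data in the line: a plain invalidation.
            self.words = [None] * WORDS_PER_LINE
            self.valid = False
```

The sketch covers only lines the chunk has written but not read; a line that was also read speculatively would still need a squash.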


IntelliSquash: Merge Operation

• Performed when the whole line with the Absent bit set is brought to the cache
• Merge the remote non-speculative cache line with the local speculative words
  • On misses to a word not present
  • On commit
    • The line is not accessed again
    • Therefore, need to bring the line to the cache as if there were a miss
• Unset the Absent (A) bit

[Figure: P0 reads line m, whose Absent bit is set; the dirty word is merged with the non-speculative line. Directory states for m shown in the figure: D in P1 and S in P0&P1.]
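The merge step can be illustrated with a small function. This is a sketch under assumptions: the line is modeled as plain Python lists, and the function name and return shape are hypothetical.

```python
def merge_line(local_words, dirty_bits, incoming_words):
    """Merge an incoming non-speculative line with local speculative words.

    local_words    -- words held locally (None where the word is not present)
    dirty_bits     -- per-word flags, True where the local chunk wrote the word
    incoming_words -- the non-speculative copy of the whole line
    Returns (merged_words, absent_bit); the merge clears the Absent bit.
    """
    if not (len(local_words) == len(dirty_bits) == len(incoming_words)):
        raise ValueError("all three sequences must cover the same line")
    merged = [
        local if dirty else incoming
        for local, dirty, incoming in zip(local_words, dirty_bits, incoming_words)
    ]
    return merged, False
```

Locally written words win over the incoming copy, so the chunk's stores are ordered after the remote writer's without any re-execution.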


IntelliCommit Protocol

• On chunk commit:
  • Processor sends commit requests to all the relevant directory modules
• Directory module receives a commit request:
  • Locks the memory lines
  • Responds with commit_ack
• Processor counts the number of commit_ack messages received
  • Processor sends commit_confirm when it has received the expected number of commit_ack messages

[Figure: processor P and directory modules D0 to D3; the group is formed once all expected commit_ack messages have arrived.]
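The conflict-free path of this handshake can be sketched as below. It is a sequential toy model, not the protocol implementation: the Directory class, its fields, and try_commit are hypothetical names, and only the message names commit_ack and commit_confirm come from the slide.

```python
class Directory:
    """Hypothetical directory module that owns a set of line addresses."""

    def __init__(self, owned_lines):
        self.owned_lines = set(owned_lines)
        self.locked = set()

    def commit_request(self, lines):
        # Lock the chunk's lines that map to this module, then acknowledge.
        self.locked |= set(lines) & self.owned_lines
        return "commit_ack"


def try_commit(chunk_lines, directories):
    """Send commit requests to the relevant modules and count the acks."""
    relevant = [d for d in directories if d.owned_lines & set(chunk_lines)]
    acks = sum(1 for d in relevant if d.commit_request(chunk_lines) == "commit_ack")
    # The group is formed once every relevant module has acknowledged.
    return "commit_confirm" if acks == len(relevant) else None
```

Only the modules that own some line of the chunk are contacted, which matches the "parallel grouping without broadcast" point made earlier in the deck.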


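Conflicting Chunks Trying to Commit

• Overlapping directory modules may receive the commit requests of two chunks in opposite orders
• Need to avoid deadlock

[Figure: four nodes (P0/D0, P1/D1, P2/D2, P3/D3); chunks C0 and C3 try to commit to overlapping directory modules.]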


IntelliCommit: Deadlock Resolution

• Basic idea: enforce a consistent order between two conflicting chunks
• Piggyback a hardware-generated random number with the commit request
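One way to picture how a piggybacked number yields a consistent order is a pure comparison over the request contents. The sketch below makes two assumptions that the slides do not state: the lower number wins, and ties are broken by processor ID. The type and function names are hypothetical.

```python
from typing import NamedTuple


class CommitRequest(NamedTuple):
    random_tag: int  # hardware-generated random number carried by the request
    proc_id: int     # assumed tie-breaker, not taken from the slides


def higher_priority(a: CommitRequest, b: CommitRequest) -> CommitRequest:
    """Return the winning request; lower (random_tag, proc_id) wins by assumption."""
    return min(a, b)
```

Because the result depends only on the two requests and not on their arrival order, every directory module that sees both requests picks the same winner without communicating with the others.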


Why Does IntelliCommit Work?

1. When the directory group of a chunk is already formed, the chunk cannot be preempted by another chunk

2. All the modules involved in a conflict reach the same decision on which chunk has the higher priority, locally


IntelliCommit Implementation

• Extra messages (P = Processor, D = Directory):
  • preempt_request (D→P)
  • preempt_ack (P→D)
  • preempt_nack (P→D)
  • preempt_finish (D→P)
• Commit Ack Counter (CAC): number of commit_ack messages not yet received
• Preemption Vector (PV) (N = #P = #D):
  • Each processor: N counters of size log(N)
  • PV[i] at Pj = k: Pj's chunk is preempted by Pi's chunk in k directories
  • Increase PV[i]: about to send preempt_ack for Pi's chunk
  • Decrease PV[i]: received a preempt_finish for Pi's chunk
• When to send commit_confirm?
  • (CAC == 0) && (for each i, PV[i] == 0)
  • That is, all commit_ack messages have been received and the chunk is not preempted by any other chunk in any directory
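The counter bookkeeping above can be sketched as a small per-processor object. The class and method names are hypothetical; the choice to answer preempt_nack once the group is formed is an assumption drawn from the earlier statement that a chunk with a formed group cannot be preempted.

```python
class CommitState:
    """Per-processor bookkeeping for one committing chunk (illustrative)."""

    def __init__(self, num_processors, expected_acks):
        self.cac = expected_acks        # commit_ack messages still missing
        self.pv = [0] * num_processors  # pv[i]: directories where Pi's chunk preempts ours
        self.group_formed = False

    def on_commit_ack(self):
        self.cac -= 1

    def on_preempt_request(self, i):
        # Assumption: a chunk whose group is already formed refuses preemption.
        if self.group_formed:
            return "preempt_nack"
        self.pv[i] += 1                 # about to send preempt_ack for Pi's chunk
        return "preempt_ack"

    def on_preempt_finish(self, i):
        self.pv[i] -= 1

    def try_send_commit_confirm(self):
        # (CAC == 0) && (for each i, PV[i] == 0)
        if self.cac == 0 and all(count == 0 for count in self.pv):
            self.group_formed = True
            return True
        return False
```

In this model a chunk that has collected every commit_ack still waits while any PV entry is non-zero, and proceeds as soon as the matching preempt_finish arrives.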



Evaluation

• Cycle-accurate NoC simulation with processor and cache models
• Number of cores: 16 and 64
• 11 SPLASH-2 and 7 PARSEC applications
• One or two outstanding chunks
• Implemented most distributed commit protocols:
  • Scalable TCC (ST)
  • ScalableBulk (SB)
  • BulkCommit without IntelliSquash (BC-SQ)
  • BulkCommit (BC)


SPLASH-2 Performance

• BulkCommit reduces both squash and commit time


PARSEC Performance


One and Two Outstanding Chunks

• Using two outstanding chunks is not always useful due to the set restriction
  • Two chunks from the same processor cannot write the same cache set
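The set restriction can be expressed as a simple check over the two chunks' write addresses. The cache geometry below (64-byte lines, 128 sets) and the function names are assumptions for illustration only; the slides do not give the L1 parameters.

```python
LINE_BYTES = 64  # assumed cache-line size
NUM_SETS = 128   # assumed number of L1 sets


def cache_set(address: int) -> int:
    """Map a byte address to its cache set index."""
    return (address // LINE_BYTES) % NUM_SETS


def violates_set_restriction(older_chunk_writes, younger_chunk_writes) -> bool:
    """True if the two chunks write at least one common cache set."""
    older_sets = {cache_set(a) for a in older_chunk_writes}
    return any(cache_set(a) in older_sets for a in younger_chunk_writes)
```

Two writes to different lines can still collide under this rule when their addresses map to the same set, which is one reason a second outstanding chunk does not always help.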



Conclusion

• Proposed BulkCommit: a commit protocol with parallel grouping and squash-free serialization of WAW-only conflicts
• Key properties:
  • Serializing WAW between chunks without squashing
    • Exploits the similarity between a chunk commit and an individual store
  • Parallel grouping
    • Uses preemption mechanisms to order two conflicting chunks consistently
• Results:
  • Eliminates the commit bottleneck even with a single outstanding chunk
  • Reduces the squash time for some applications

