
Multi Core Processors and Casino Programming

Multi Core Processors and Casino Programming. W. J. Paul, Vienna 2014. Layers of system architecture. Physical gates. Different programming models on different layers.


Presentation Transcript


  1. Multi Core Processors and Casino Programming. W. J. Paul, Vienna 2014

  2. Layers of system architecture
     • Different programming models on different layers
     • Instruction set architecture (ISA) …
     • …
     • Parallel C + devices + macro assembly + assembly + interrupts
     (Diagram labels: physical gates, ISA, hypervisor)

  3. Layer n of system architecture
     • User sees the programming model (purple) provided by layer n
     • Implementer implements it in the programming model of layer n-1 (white)
     • Implementations are usually simple or wrong
     • KISS
     (Diagram labels: layer n-1, layer n)

  4. Layer n of system architecture
     • User sees the programming model (purple) provided by layer n
     • Implementer implements it in the programming model of layer n-1 (white)
     • Implementations are usually simple
     • Easy IF we know the programming model on layer n-1
     (Diagram labels: layer n-1, layer n)

  5. If we only kind of know the programming model of layer n-1 …
     (Diagram labels: layer n-1, n …)

  6. The casino is presently everywhere
     • The ISA of multicore systems is only kind of known
     • The list of operating conditions in these 3000 pages might be incomplete
     • A complete list can be obtained by a correctness proof of the processor hardware
     • The semantics stack on top is not completely defined + justified

  7. Match

  8. Mismatch

  9. Mismatch
     • Manufacturers of real-time systems
       • avoid multicore, or
       • presently turn off all parallel features they can
     • They know what they are doing

  10. Roadmap / plan of talk
     • ISA-sp for multicore processors
       • MIPS 86 = MIPS + TSO
     • Below:
       • hardware correctness for multicore nondeterministic ISA
       • collect operating conditions
       • bottom of roadmap: digital gates
       • bottom: physical gates
     • Above:
       • define semantics layers
       • justify arguing about implementation in lower layers
       • ownership and order reduction

  11. ISA-sp
     • X64 ISA model
     • E. Cohen: communicating sequential components; order of steps nondeterministic
     • sb: store buffer
     • mmu: memory management unit; walking of page tables nondeterministic (speculation)
     • APIC: device, interrupts
     • disk: for booting
     (Diagram labels: disk, APIC, mem + caches, sb, mmu, core)
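The role of the store buffer in such a model can be illustrated with a small toy simulation. The sketch below is not the ISA-sp model from the talk; it is a minimal Python illustration, under the assumption of a TSO-like machine where each core has a FIFO store buffer, reads forward from the core's own buffer, and buffer drains are separate steps. The names `Core`, `sb_step` and `store_buffering_litmus` are hypothetical.

```python
from collections import deque


class Core:
    """One core with a FIFO store buffer in front of shared memory (toy TSO model)."""

    def __init__(self, mem):
        self.mem = mem      # shared memory, a dict address -> value
        self.sb = deque()   # pending (address, value) writes, oldest first

    def write(self, addr, value):
        # A write enters the store buffer; memory is not changed yet.
        self.sb.append((addr, value))

    def read(self, addr):
        # Forward the newest buffered write to addr, otherwise read memory.
        for a, v in reversed(self.sb):
            if a == addr:
                return v
        return self.mem.get(addr, 0)

    def sb_step(self):
        # Separate step: the oldest buffered write reaches memory.
        if self.sb:
            a, v = self.sb.popleft()
            self.mem[a] = v


def store_buffering_litmus(drain_before_reads):
    """Core 0: x := 1; read y.  Core 1: y := 1; read x."""
    mem = {}
    c0, c1 = Core(mem), Core(mem)
    c0.write("x", 1)
    c1.write("y", 1)
    if drain_before_reads:
        c0.sb_step()
        c1.sb_step()
    return c0.read("y"), c1.read("x")
```

Depending on when the store buffer steps are scheduled, both cores can read 0, an outcome that is impossible under sequentially consistent memory. This is the kind of behavior the ownership discipline on the later slides is meant to hide.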

  12. Nondeterministic ISA
     • Hardware correctness
       • induction on cycles t of deterministic hardware
       • ne(t): number of nondeterministic ISA steps completed at cycle t
     • Oracle input o for these steps
       • unit stepped
       • initial walk guessed of MMU
       • walk used by core
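The relation between oracle inputs and ne(t) can be made concrete with a small sketch. Everything here is an illustrative assumption: the two-unit machine (core and store buffer only), the fixed `PROGRAM`, the convention that `ne(t)` counts the ISA steps completed before cycle t, and the function names. The point is only that the nondeterministic ISA becomes deterministic once the oracle sequence is fixed, and that the hardware configuration at cycle t is compared with the ISA configuration after ne(t) steps.

```python
from itertools import accumulate

# Hypothetical straight-line program: each entry is a write (address, value).
PROGRAM = (("x", 1), ("y", 2))


def isa_step(config, unit):
    """One nondeterministic-ISA step; the oracle input `unit` names the unit stepped."""
    pc, sb, mem = config
    if unit == "core" and pc < len(PROGRAM):
        # The core executes the next write into the store buffer.
        return (pc + 1, sb + (PROGRAM[pc],), mem)
    if unit == "sb" and sb:
        # The store buffer commits its oldest entry to memory.
        (addr, val), rest = sb[0], sb[1:]
        return (pc, rest, {**mem, addr: val})
    return config  # step not enabled: configuration unchanged


def run(oracle, config=(0, (), {})):
    """Return the ISA computation c_0, c_1, ... determined by the oracle sequence."""
    trace = [config]
    for unit in oracle:
        trace.append(isa_step(trace[-1], unit))
    return trace


def ne(steps_per_cycle):
    """ne(t) for t = 0..T as prefix sums of the ISA steps completed in each cycle."""
    return [0] + list(accumulate(steps_per_cycle))
```

With `oracle = ["core", "core", "sb", "sb"]` and a hypothetical hardware run that completes `[1, 0, 2, 1]` ISA steps in cycles 0 to 3, the induction claim at cycle t would relate the hardware configuration to `run(oracle)[ne([1, 0, 2, 1])[t]]`.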

  13. Implementation-dependent operating conditions
     • Pipeline stages
     • Old: when is a write to the gpr visible?
     • Forwarding and stalling
     (Diagram, pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back)

  14. Implementation-dependent operating conditions
     • Pipeline stages
     • When is the write of an instruction visible?
     • Speculation
     • Kröning 1999
     (Diagram, pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back)

  15. Implementation-dependent operating conditions
     • Pipeline stages
     • When is the write of an instruction or page table by another processor visible?
     • Drain pipe + store buffer + sync
     (Diagram, pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back)

  16. invlpg
     • Pipeline stages
     • Core:
       • step at stage 'memory'
     • IMMU:
       • step at stage 'pc-translate'; speculation in ISA
     • Pipeline walk wo in ghost registers
       • invariant: wo in virtual tlb
     • Core step(wo)
       • only allowed if invariant holds
     • Invariant:
       • inhibit use of a translation in the tlb invlpg'd by an instruction in stages decode … memory
       • roll back pc-translate using a translation invlpg'd at stage fetch (speculative execution)
     • Interrupt in stage decode
       • changes to untranslated mode
       • IMMU step in stage pc-translate would not occur in deterministic ISA
       • was speculated in nondeterministic ISA (even with deterministic MMU)
     (Diagram, pipeline stages: pc-translate, wo, fetch, decode, execute, ea-translate, memory, gpr write back)

  17. invlpg: can be implemented without software condition in nondeterministic ISA
     • Pipeline stages
     • Core:
       • step at stage 'memory'
     • IMMU:
       • step at stage 'pc-translate'; speculation in ISA
     • Pipeline walk wo in ghost registers
       • invariant: wo in virtual tlb
     • Core step(wo)
       • only allowed if invariant holds
     • Invariant:
       • inhibit use of a translation in the tlb invlpg'd by an instruction in stages decode … memory
       • roll back pc-translate using a translation invlpg'd at stage fetch (speculative execution)
     • Interrupt in stage decode
       • changes to untranslated mode
       • IMMU step in stage pc-translate would not occur in deterministic ISA
       • was speculated in nondeterministic ISA (even with deterministic MMU)
     (Diagram, pipeline stages: pc-translate, wo, fetch, decode, execute, ea-translate, memory, gpr write back)

  18. Current research / last for hardware
     • Pipeline stages
     • When are device steps visible in multicore machines?
     (Diagram, pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back)

  19. ISA + devices and driver correctness (Dublin 2009)
     • Hardware is parallel even with a sequential processor
     • ISA nondeterministic concurrent, 1 step at a time
     • Disable interrupts of devices >1 and don't poll them
     • Reorder their device steps out of the driver run of dev 1
     • Pre and post conditions for drivers …
     (Diagram labels: dev 1, proc, dev k)

  20. ISA + devices and driver correctness
     • Disable interrupts of devices >1 and don't poll them
     • Reorder their device steps out of the driver run of dev 1
     • Pre and post conditions for drivers …
     • Assumes absence of side channels
     (Diagram labels: dev 1, proc, dev k)

  21. ISA + devices and driver correctness
     • Disable interrupts of devices >1 and don't poll them
     • Reorder their device steps out of the driver run of dev 1
     • Pre and post conditions for drivers …
     • Device 1: motor
     • Device 2: climate control
     • Side channel: power consumption
     (Diagram labels: dev 1, proc, dev k)

  22. C + assembly (Kirkland 2013, extended)

  23. C + devices
     • Implementation
       • access device ports by assembly code
       • do not allocate C variables to ports
       • disable interrupts during the run of translated C code
     • Order reduction: device steps can be reordered to the assembly portion
     • Semantics
       • configurations (a,c,d) or (a,d)
       • d for device
       • device steps only for (a,d)

  24. Ownership (1): concept
     • Classify addresses
       1. local (e.g. C stack)
       2. shared and read only (e.g. program)
       3. shared owned (temporarily local/locked)
       4. shared writeable not owned (locks)
     • Invariants:
       • at most 1 owner …
       • disjointness …
     • Safe programs: act like the names of the address classes suggest
       • accesses to class 4 are atomic at the language level
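The four address classes and the two named invariants can be written down as a small checker. This is only a sketch under assumptions: the slide elides the exact invariants ("…"), so the reading used here (owned and local sets are pairwise disjoint, and every address sits in a set matching its class) is one plausible interpretation, and all names (`invariants_hold`, `access_is_safe`, the dictionary layout) are hypothetical.

```python
from itertools import combinations

# Address classes, numbered as on the slide.
LOCAL, SHARED_RO, SHARED_OWNED, SHARED_WRITABLE = 1, 2, 3, 4


def invariants_hold(addr_class, local, owned):
    """Check 'at most 1 owner' and 'disjointness' (assumed reading of the slide).

    addr_class: dict address -> class (1..4)
    local, owned: dict thread -> set of addresses
    """
    sets = list(local.values()) + list(owned.values())
    # Pairwise disjoint sets imply that no address has two owners.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    typed = (all(addr_class[a] == LOCAL for s in local.values() for a in s)
             and all(addr_class[a] == SHARED_OWNED for s in owned.values() for a in s))
    return disjoint and typed


def access_is_safe(addr_class, local, owned, thread, addr, is_write, atomic=False):
    """A safe access behaves as the name of the address class suggests."""
    cls = addr_class[addr]
    if cls == LOCAL:
        return addr in local.get(thread, set())
    if cls == SHARED_RO:
        return not is_write
    if cls == SHARED_OWNED:
        return addr in owned.get(thread, set())
    return atomic  # class 4: must be atomic at the language level
```

For example, with a lock word in class 4 and a buffer in class 3 owned by thread `t0`, a write to the buffer by `t0` is safe, any access to it by another thread is not, and the lock may be accessed by every thread as long as the access is atomic.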

  25. Ownership (2): Def: structured parallel C (almost folklore)
     • Classify addresses
       1. local (e.g. C stack)
       2. shared and read only (e.g. program)
       3. shared owned (temporarily local/locked)
       4. shared writeable not owned (locks)
     • Multiple C threads
     • Sequentially consistent memory!
       • shared: heap + global variables
       • local: stacks
     • Safe w.r.t. ownership
     • Class 4 access: volatile
     • Interleave at (compiler consistency points before) class 4 accesses

  26. Ownership (3): structured parallel C to parallel assembly
     • IF
       • translate threads with a sequential compiler
       • translate volatile C access to interlocked ISA access
       • at most 1 class 4 access between two interleaving points (e.g. no global pointer chasing to a global variable)
     • THEN
       • ISA program safe
       • multicore ISA simulates parallel C
     • Baumann 2014
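The third hypothesis, at most one class 4 access between two interleaving points, is easy to state as a check over a thread's event sequence. The sketch below assumes a hypothetical encoding in which a thread's run is a list whose entries are either the marker `"IP"` (an interleaving point) or the class number (1 to 4) of an accessed address; neither the encoding nor the function name comes from the talk.

```python
def at_most_one_class4_per_block(events):
    """True if no block between two interleaving points has more than one class 4 access.

    events: list of "IP" markers and address class numbers 1..4 (hypothetical encoding).
    """
    count = 0
    for ev in events:
        if ev == "IP":
            count = 0          # a new block starts at every interleaving point
        elif ev == 4:
            count += 1
            if count > 1:
                return False   # second class 4 access in the same block
    return True
```

Dereferencing a shared global pointer to reach a shared global variable inside one block shows up as two class 4 accesses without an intervening `"IP"`, for instance `["IP", 4, 1, 4]`, which the check rejects.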

  27. Ownership (4): parallel store buffer reduction in ISA-sp
     • Maintain local dirty bits
       • dirty: class 4 write since last local sb-flush
       • class 4 read only if dirty = 0
     • Cohen, Schirmer (ITP 2010): store buffers invisible
       • formal, 70 pages proof
       • no mmu
     • Push through hierarchy
       • implement sb-flush as compiler intrinsic in C
     (Diagram labels: C, compiler, m-asm, m-assembler, ISA-u = asm, before, ISA-sp)
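The dirty-bit discipline can be sketched on a toy store buffer model. This is an illustration under assumptions, not the formalization from the cited proof: the class `DisciplinedCore`, its method names, and the choice to raise an exception on a violating read are all hypothetical, and only class 4 accesses are modeled.

```python
from collections import deque


class DisciplinedCore:
    """Toy core with a store buffer and a local dirty bit (hypothetical model)."""

    def __init__(self, mem):
        self.mem = mem        # shared memory, a dict address -> value
        self.sb = deque()     # pending (address, value) writes, oldest first
        self.dirty = False    # class 4 write since the last local sb-flush?

    def class4_write(self, addr, value):
        self.sb.append((addr, value))
        self.dirty = True

    def sb_flush(self):
        # Drain the whole store buffer to memory, then clear the dirty bit.
        while self.sb:
            addr, value = self.sb.popleft()
            self.mem[addr] = value
        self.dirty = False

    def class4_read(self, addr):
        # The discipline: a class 4 read is only allowed if dirty = 0.
        if self.dirty:
            raise RuntimeError("class 4 read with dirty bit set: sb-flush required first")
        return self.mem.get(addr, 0)
```

In this model, two cores that each write a shared flag and then read the other's flag must flush in between, so the outcome in which both read 0 cannot occur; for programs obeying the discipline the store buffers are not observable.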

  28. Ownership (5): parallel store buffer reduction in ISA-sp
     • Maintain local dirty bits
       • dirty: class 4 write since last local sb-flush
       • class 4 read only if dirty = 0
     • Chen, Cohen, Kovalev (VSTTE 2014): store buffers invisible
       • 94 pages proof
       • with mmu
       • page tables local to processor + mmu, or shared
       • new ownership class: locally shared. Processor access while local mmu walks: class 4
     (Diagram labels: C, compiler, m-asm, m-assembler, ISA-u = asm, before, ISA-sp)

  29. Ownership (6): semantics of C + interrupts (Pentchev 2014)
     • C program thread + handler threads
       • ownership discipline between program and handler thread
       • interleave at consistency points around class 4 accesses
     • Parallel C program threads + handler threads
       • ownership as for structured parallel C for local threads + handlers
       • new ownership class: locally shared between program thread and handler

  30. Summary
     • Hardware
       • search for software conditions almost completed (except multicore + devices)
       • so far only known types of software conditions found
       • with nondeterministic ISA no software conditions for use of invlpg
     • Software stack
       • C + assembly
       • C + devices
       • structured parallel C
       • store buffer reduction with MMUs
       • C + interrupts

  31. Once this research is done
     • we could quit
     • if we wanted to
