
Theory of Memory

W. Paul, Saarland University and DFKI. bmb+f project Verisoft-XT. Joint work with Ulan Degebaev and Norbert Schirmer, Saarland University.


Presentation Transcript


  1. Theory of Memory. W. Paul, Saarland University and DFKI. bmb+f project Verisoft-XT. Joint work with Ulan Degebaev and Norbert Schirmer, Saarland University.

  2. Unites theories of store buffers interlocking caches, cache coherence, out of order execution, X64 instruction set, address translation, optimized compilation, structured parallel C semantics. Explains why a hypervisor might run structured parallel C. VCC is supposed to mirror structured parallel C semantics, thus VCC might be(come) sound. Why might this be important?

  3. Specifying Memory (figure: x, M(x))

  4. Store Buffer (figure: memory M, sbuf(y), r(j), w(i))

  5. Store Buffer (figure: memory M, sbuf(y), r(j), w(i))
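The store buffer slides show only a diagram, so the following is a minimal Python sketch of one common reading, not the talk's formal model. It assumes a FIFO buffer per processor, forwarding of a processor's own pending writes to its own reads, and writes that reach the shared memory M only when the buffer drains. The class name `StoreBufferCore` and the default value 0 for unwritten addresses are hypothetical.

```python
from collections import deque


class StoreBufferCore:
    """One processor's view: a FIFO store buffer in front of shared memory.

    Hypothetical model for illustration; not taken from the slides.
    """

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value (the memory M)
        self.sbuf = deque()    # pending (address, value) writes, oldest first

    def write(self, addr, value):
        # A write only enters the local buffer; other processors cannot see it yet.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward the newest pending write to this address, if there is one.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        # Otherwise read shared memory (assumed to be 0 where never written).
        return self.memory.get(addr, 0)

    def drain_one(self):
        # Retire the oldest pending write to shared memory.
        if self.sbuf:
            addr, value = self.sbuf.popleft()
            self.memory[addr] = value

    def flush(self):
        # Drain the whole buffer, oldest write first.
        while self.sbuf:
            self.drain_one()
```

With two such cores sharing one dictionary, a write by the first core is visible to its own reads immediately but to the second core only after the first core drains its buffer. This is the visibility gap that later slides aim to hide ("sbufs invisible").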

  6. Caches (figure: M, ca)

  7. Many Caches: Snooping (figure: M, ca(1), ca(p))

  8. Many Caches (figure: M, x.la, x.off, ca(1), ca(p))

  9. Many Caches (figure: M, x.la, x.off, ca(1), ca(p))

  10. Many Caches (figure: M, x.off, ca(1), ca(p))

  11. Overlapping Transactions (figure: c, b, public (a), a, c, c)

  12. Sequentially Consistent Memory, lemma 5 (figure: c, b, public (a), a, c, c)

  13. Tomasulo Schedulers for OOO (figure: IF, issue, reservation stations, funct. units, CDB, ROB, WB)

  14. Two Memory Units (figure: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, ROB)

  15. Single Processor OOO correctness, lemma 6 (figure: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, ROB)

  16. Multi Processor OOO implementation (figure: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, data(i,j), ROB)

  17. Multi Processor OOO correctness, lemma 7 (figure: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, data(i,j), ROB)

  18. Multi Processor OOO correctness, lemma 7 (figure: m, RS, RS, sbuf, MMU, funct. units, LS, CDB, data(i,j), ROB)

  19. X64 architecture • CPU core • R: user registers • SR: system registers • CR3 • acc: access • segmentation • mmu: memory management unit • tlb: translation lookaside buffer • memory system • mm: main memory • ca: cache • sbuf: store buffer (figure: mm, ca, sbuf, acc, mmu, tlb, acc, CR3, segmentation, core, R)

  20. Segmentation off, lemma 8 • 1 segment • large as entire address space • segmentation invisible (figure: mm, ca, sbuf, acc, mmu, tlb, acc, CR3, segmentation, core, R)

  21. Bad news: cache state is visible • CPU core • acc: access • acc.adr: address • acc.r: rights (user, write, exe) • acc.data • acc.mmode: memory mode • WB: write back • WT: write through ... • NC: no cache (figure: mm or devices, ca, sbuf, acc, mmu, tlb, acc, CR3, core, R)

  22. Good news: no device, no NC mode • acc.mmode: memory mode • WB: write back • WT: write through ... • NC: no cache (not used) (figure: mm, ca, sbuf, acc, mmu, tlb, acc, CR3, core, R)

  23. Sequentially Consistent Physical Memory, lemma 9 • acc.mmode: memory mode • WB: write back • WT: write through ... (mix on same address) • PM: sequentially consistent physical memory abstraction • Proof: MOESI invariants are maintained (figure: PM, sbuf, acc, mmu, tlb, acc, CR3, core, R)
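The slide names "MOESI invariants" without listing them. As a hedged illustration, the sketch below checks the standard per-line compatibility rule of the MOESI protocol (Modified, Owned, Exclusive, Shared, Invalid): a copy in M or E excludes every other valid copy, and at most one cache may hold the line in O. The function name `moesi_compatible` is hypothetical, and the invariants used in the talk's proof may be stronger than this check.

```python
from collections import Counter

VALID_STATES = {"M", "O", "E", "S", "I"}


def moesi_compatible(states):
    """Check the textbook MOESI compatibility rule for ONE cache line.

    `states` holds one state letter per cache. This is an illustrative
    check, not the invariant set used in the talk.
    """
    if not all(s in VALID_STATES for s in states):
        raise ValueError("unknown cache line state")
    count = Counter(states)
    exclusive = count["M"] + count["E"]
    if exclusive > 1:
        # Two caches cannot both hold the line exclusively.
        return False
    if exclusive == 1:
        # An M or E copy must be the only valid copy.
        return count["O"] == 0 and count["S"] == 0
    # Without M/E: any number of S copies, but at most one owner.
    return count["O"] <= 1
```

For example, one Owned copy next to several Shared copies is accepted, while a Modified copy next to a Shared copy is rejected.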

  24. Initialize page tables • 1 processor • sbuf invisible • operating mode: paging disabled • mmu invisible • set up page table tree in PM (figure: PM, page tables, sbuf, acc, mmu, tlb, acc, CR3, core, R)

  25. Translated Linear Memory • many processors • operating mode: paging enabled • keep tlb consistent (figure: PM, page tables, sbuf, acc, mmu, tlb, acc, CR3, core, R)

  26. Translated Consistent Linear Memory + sbufs, lemma 10 • many processors • operating mode: paging enabled • keep tlb consistent (figure: LM, page tables, sbuf, acc, CR3, core, R)
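To make "page table tree" and "keep tlb consistent" concrete, here is a toy Python sketch of a four-level walk with a translation cache. It assumes the usual x86-64 layout for 4 KiB pages (four 9-bit indices and a 12-bit offset) and ignores rights, large pages, and every other field of real page table entries. The nested-dictionary tables, the `Mmu` class, and its `invalidate` method are hypothetical stand-ins.

```python
PAGE_BITS = 12    # 4 KiB pages
INDEX_BITS = 9    # 512 entries per table
LEVELS = 4
OFFSET_MASK = (1 << PAGE_BITS) - 1


def split_linear(la):
    """Split a linear address into four table indices (top level first) and an offset."""
    indices = []
    for level in reversed(range(LEVELS)):
        shift = PAGE_BITS + level * INDEX_BITS
        indices.append((la >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, la & OFFSET_MASK


def walk(root, la):
    """Walk the toy page table tree; leaf entries are physical frame numbers."""
    indices, offset = split_linear(la)
    node = root
    for idx in indices:
        node = node.get(idx)
        if node is None:
            raise KeyError("page fault")
    return (node << PAGE_BITS) | offset


class Mmu:
    """Toy MMU: caches page -> frame translations in a dictionary 'tlb'."""

    def __init__(self, root):
        self.root = root   # plays the role of the tree CR3 points to
        self.tlb = {}

    def translate(self, la):
        page = la >> PAGE_BITS
        if page not in self.tlb:
            self.tlb[page] = walk(self.root, page << PAGE_BITS) >> PAGE_BITS
        return (self.tlb[page] << PAGE_BITS) | (la & OFFSET_MASK)

    def invalidate(self, la):
        # Software must drop the cached entry after changing the page tables.
        self.tlb.pop(la >> PAGE_BITS, None)
```

If a leaf entry is changed after a translation was cached, `translate` keeps returning the old frame until `invalidate` is called for that page. Ruling out such stale translations is what keeping the tlb consistent has to achieve.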

  27. C0: Pascal with C syntax; configurations • c = (pr, rd, lms, hm, gm) • pr: program rest • rd: recursion depth • lms: [0 : recursion depth] → {local memories} • hm: heap memory • gm: global memory • subvariables • (m,i)[17].gpr[3] • value of pointers: subvariables (figure: memory m, va(c,(m,i)), size(m,i), ba(m,i))

  28. Parallel C • c = (pr, rd, lms, hm, gm) • pr: program rest • rd: recursion depth • lms: [0 : recursion depth] → {local memories} • hm: heap memory • gm: global memory • Share • gm • hm • Interleave at small steps semantics steps (figure: memory m, va(c,(m,i)), size(m,i), ba(m,i))

  29. Parallel C • c = (pr, rd, lms, hm, gm) • pr: program rest • rd: recursion depth • lms: [0 : recursion depth] → {local memories} • hm: heap memory • gm: global memory • Share • gm • hm • Interleave at small steps semantics steps • Problem: • Processor interleaves instructions of compiled programs code(p) (figure: memory m, va(c,(m,i)), size(m,i), ba(m,i))
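The problem on slide 29 can be illustrated with a small Python experiment. Two threads each increment a shared global `x`. Interleaved at the granularity of a C statement, the result is always 2; interleaved at the granularity of the instructions a compiler might emit (load, add, store), an update can be lost. The three-instruction expansion and all helper names below are assumptions made for illustration only.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def statement_steps(tid):
    """'x = x + 1' as one indivisible small step of the C semantics."""
    def inc(gm, regs):
        gm["x"] = gm["x"] + 1
    return [inc]


def compiled_steps(tid):
    """The same statement as three machine-level steps (hypothetical code(p))."""
    def load(gm, regs):
        regs[tid] = gm["x"]
    def add(gm, regs):
        regs[tid] += 1
    def store(gm, regs):
        gm["x"] = regs[tid]
    return [load, add, store]


def run(schedule):
    gm = {"x": 0}   # shared global memory
    regs = {}       # one private register per thread id
    for step in schedule:
        step(gm, regs)
    return gm["x"]


def outcomes(program):
    """All final values of x over every interleaving of two threads."""
    return {run(s) for s in interleavings(program(0), program(1))}
```

Here `outcomes(statement_steps)` contains only 2, whereas `outcomes(compiled_steps)` also contains 1, for instance when both loads run before either store.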

  30. Simulation relation consis(c, alloc, d) (figure: LM, alloc(c,y), y, alloc(c,p), p)
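The slide gives only the name and arguments of the simulation relation. One plausible reading, sketched below under strong simplifying assumptions, is that `consis(c, alloc, d)` holds when the linear memory `d` stores each C variable's value at its allocated address, and stores for each pointer the allocated address of the variable it points to. Flattening the configuration `c` to a dictionary of one-word variables and the `Ptr` class are hypothetical simplifications, not the talk's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """A C-level pointer value: the name of the variable it points to."""
    target: str


def consis(c, alloc, d):
    """Toy consistency check between a C configuration and linear memory.

    c:     dict, variable name -> value (an int or a Ptr), a flattened stand-in
           for the configuration (pr, rd, lms, hm, gm)
    alloc: dict, variable name -> address in linear memory
    d:     dict, address -> word stored in linear memory
    """
    for name, value in c.items():
        if isinstance(value, Ptr):
            # A pointer is represented by the address of its target.
            expected = alloc[value.target]
        else:
            expected = value
        if d.get(alloc[name]) != expected:
            return False
    return True
```

A step-by-step simulation proof for a non-optimizing compiler would then show that this relation is preserved from one C step to the next, possibly with a changed `alloc`.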

  31. Non optimizing compiler: step by step simulation

  32. Optimizing compiler: simulation between IO-steps

  33. IO-steps (1): volatile accesses

  34. Volatiles Sequentially Consistent, lemma 11

  35. Structured Parallel C • Implement Locks using Volatiles • IO-steps (2): lock release • Run Processors alone on locked portions of linear memory • Lemma 1: sbufs invisible • Lemma 10: Ordinary C code in linear memory

  36. Summary • Implement Locks using Volatiles • IO-steps (2): lock release • Run Processors alone on locked portions of linear memory • Lemma 1: sbufs invisible • Lemma 10: Ordinary C code in linear memory • Outlined correctness proof for implementation of structured parallel C • Initialisation • compilation
