
ATLAS (a.k.a. RAMP Red) Parallel Programming with Transactional Memory


Presentation Transcript


  1. ATLAS (a.k.a. RAMP Red): Parallel Programming with Transactional Memory
  Njuguna Njoroge and Sewook Wee
  Transactional Coherence and Consistency, Computer Systems Lab, Stanford University
  http://tcc.stanford.edu/prototypes

  2. Why we built ATLAS
  • Multicore processors expose the challenges of multithreaded programming
  • Transactional Memory simplifies parallel programming
    • As simple as coarse-grain locks
    • As fast as fine-grain locks
  • Currently missing for evaluating TM: fast TM prototypes to develop software on
  • FPGAs' improving capabilities make them attractive for CMP prototyping
    • Fast: can operate at > 100 MHz
    • More logic, memory, and I/Os
    • Larger libraries of pre-designed IP cores
  • ATLAS: 8-processor Transactional Memory system
    • 1st FPGA-based hardware TM system
    • Member of the RAMP initiative → RAMP Red

  3. ATLAS provides …
  • Speed
    • > 100x speed-up over SW simulator [FPGA 2007]
  • Rich software environment
    • Linux OS
    • Full GNU environment (gcc, glibc, etc.)
  • Productivity
    • Guided performance tuning
    • Standard GDB environment + deterministic replay

  4. TCC’s Execution Model
  • Transaction
    • Building block of a program
    • Critical region
    • Executed atomically & isolated from others

  5. TCC’s Execution Model
  [Figure: three CPUs execute transactions concurrently (e.g., ld 0xaaaa, ld 0xdddd); each arbitrates for commit and commits its write set (st 0xbeef); a CPU that speculatively read a committed address (ld 0xbeef) undoes its work and re-executes]
  • In TCC: All Transactions All The Time [PACT 2004]

  6. CMP Architecture for TCC
  • Speculatively Read (R) bits: set on a speculative load (e.g., ld 0xdeadbeef)
  • Speculatively Written (W) bits: set on a speculative store (e.g., st 0xcafebabe)
  • Violation detection: compare incoming commit addresses to the R bits
  • Commit: read pointers from the Store Address FIFO, flush addresses whose W bits are set

  7. ATLAS: 8-way CMP on the BEE2 Board
  • Control FPGA
    • Linux PPC @ 300 MHz
    • Launches TCC apps
    • Handles system services for the TCC PowerPCs
    • Fabric runs @ 100 MHz
  • User FPGAs
    • 4 FPGAs for a total of 8 TCC CPUs
    • PPCs, TCC caches, BRAMs, and busses run @ 100 MHz

  8. ATLAS Software Overview
  • TM applications can be easily written with the TM API
  • The ATLAS profiler provides runtime profiling and guided performance tuning
  • The ATLAS subsystem provides Linux OS support for the TM application
  [Figure: software stack — TM Application / TM API / ATLAS Profiler / ATLAS Subsystem / Linux OS / ATLAS HW on BEE2]

  9. ATLAS Subsystem
  [Figure: the Linux PPC transfers the initial context to TCC PPC0–PPC7, invokes the parallel work, and joins it on exit with application statistics; transactions on the TCC PPCs commit, or are violated and re-executed]

  10. ATLAS System Support
  • A TCC PPC requests OS support (TLB miss, system call)
  • The Linux PPC regenerates and services the request
  • The Linux PPC replies back to the requestor
  • Execution serializes if the request is irrevocable
    • System call
    • Page-out

  11. Coding with TM API: histogram

     int main(int argc, char* argv[]) {
         /* … sequential code … */
         TM_PARALLEL(run, NULL, numCpus);
         /* … sequential code … */
     }

     // static scheduling with interleaved access to A[]
     void* run(void* args) {
         int i = TM_GET_THREAD_ID();
         for (; i < NUM_LOOP; i += TM_GET_NUM_THREAD()) {
             TM_BEGIN();
             bucket[A[i]]++;
             TM_END();
         }
         return NULL;
     }

  • OpenTM will provide high-level (OpenMP-style) pragmas

  12. Guided Performance Tuning
  • TAPE: light-weight runtime profiler [ICS 2005]
  • Tracks the most significant violations (longest loss time)
    • Violated object address
    • PC where the object was read
    • Loss time & number of occurrences
    • Committing thread’s ID and transaction PC
  • Tracks the most significant overflows (longest duration)
    • Overflow: when speculative state can no longer stay in the TCC cache
    • PC where the overflow occurs
    • Overflow duration & number of occurrences
    • Type of overflow (LRU or Write Buffer)

  13. Deterministic Replay
  • All Transactions All The Time
    • TM 101: a transaction executes atomically and in isolation
    • TM’s illusion: a transaction starts after all older transactions finish
  • Only need to record the order of commit
    • Minimal runtime overhead & footprint size: 1 byte / transaction
  [Figure: in the logging execution, T0–T2 run and their commit order is recorded (LOG: T0 T1 T2); in the replay execution, the token arbiter enforces the commit order specified in the LOG]

  14. Useful Features of Replay
  • Monitoring code in the transaction
    • Remember: we only record the transaction order
  • Verification
    • The log is not written in stone
    • Complete runtime scenario coverage is possible
  • Choice of running replay on:
    • ATLAS itself
      • HW support for other debugging tools (see next slide)
    • A local machine (your favorite desktop or workstation)
      • Runs natively on the faster local machine, sequentially
      • Seamless access to existing debugging tools

  15. GDB Support
  • Current status
    • GDB integrated with local-machine replay
    • GDB provides debuggability while guaranteeing deterministic replay
  • Work in progress:
    • Breakpoints
      • Thread-local BP vs. global BP
      • Stop the world by controlling the commit token
    • Stepping
      • Backward stepping: a transaction is always ready to roll back
      • Transaction stepping
    • Unlimited data-watch (ATLAS only)
      • Separate monitor TCC cache to register data-watches

  16. Conclusion: ATLAS provides
  • Speed
    • > 100x speed-up over SW simulator [FPGA 2007]
  • Software environment
    • Linux OS
    • Full GNU environment (gcc, glibc, etc.)
  • Productivity
    • TAPE: guided performance tuning
    • Deterministic replay
    • Standard GDB environment
  • Future work
    • High-level language support (Java, Python, …)

  17. Questions and Answers
  • tcc_fpga_xtreme@mailman.stanford.edu
  • ATLAS team members
    • System hardware – Njuguna Njoroge, PhD candidate
    • System software – Sewook Wee, PhD candidate
    • High-level languages – Jiwon Seo, PhD candidate
    • HW performance – Lewis Mbae, BS candidate
  • Past contributors
    • Interconnection fabric – Jared Casper, PhD candidate
    • Undergrads – Justy Burdick, Daxia Ge, Yuriy Teslar
