1 / 29

Hydra VM: Extracting Parallelization From Legacy Code Using STM

Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran. Hydra VM: Extracting Parallelization From Legacy Code Using STM . VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute and State University. Outline. Motivation & Objectives

cato
Download Presentation

Hydra VM: Extracting Parallelization From Legacy Code Using STM

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mohamed. M. Saad Mohamed A. Mohamedin & Prof. BinoyRavindran Hydra VM: Extracting Parallelization From Legacy Code Using STM VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute and State University

  2. Outline • Motivation & Objectives • Background • Transactional Memory • Jikes RVM • Program Reconstruction • Architecture • Profiler, Builder & Runtime • Future Work

  3. Motivation • Why Multicores? • Difficult to make single-core clock frequencies even higher • Deeply pipelined circuits • heat problems • speed of light problems • difficult design and verification • large design teams necessary • server farms need expensive air-conditioning

  4. Motivation • No fast CPUs any more, just more cores! • Trend is using multi-core & hyper-threading

  5. Motivation • At 2005, Sun Niagara (8 cores with HT run 32 HWT) • At 2010, Supermicro (48-core AMD Opteron). • Now, Sun  make boxes with between 128-512 hardware threads (16 HWT/core, 8 cores/CPU) !! What About Software!!! Are we ready for this HW ?!

  6. Objective • Many applications are designed to use few threads • Legacy systems were designed to run at a single processor • Multi-threading programming is headache for developers (race situations, concurrent access, …) • HydraVM: Java Virtual Machine Prototype based on Jikes RVM and targets utilizing large number of cores through detecting automatically possible parallel portions of code

  7. Background • Transactional Memory • Jikes RVM (Adaptive Online Architecture)

  8. Atomicity • Atomicity: An operation (or set of operations) appears to the rest of the system to occur instantaneously Example (Money Transfer): …… synchronized { from = from - amount to = to + amount } …… …… Example (Money Transfer): …… …… account1.lock() account2.lock() from = from - amount to = to + amount account1.unlock() account2.unlock() …… …… ≈ Y account1 account2 X

  9. Mutual Locks “Classical Approach” • Drawbacks • Deadlock • Livelock • Starvation • Priority Inversion • Non-composable • Cost of managing the lock • Non-scalable on multiprocessors Y A B X

  10. Transactional Memory • Simplifies parallel programming by allowing a group of load and store instructions to execute in an atomicway using additional primitives account1y account2y Y Example (Money Transfer): …… …… START-TRANSACTION from = from - amount to = to + amount END-TRANSACTION …… …… account1 account2 account1x account2x X Commit or Rollback & Retry

  11. Transactional Memory • Each transaction has ReadSet & WriteSet • Transactions conflict if have the same variable(s) at ReadSet / WriteSet • Conflict Resolution using Contention Managerthat employs different policies (Aggressive, Polite, Back-Off, Random, …..) • Aborted code undo changes (if required) and retries again

  12. Transactional Memory • Transactions may be nested (multiple levels) • Inner transaction share the ReadSet/WriteSet of parent • Inner transactions conflicts with each other and with other higher level transactions • Aborting parent transaction forces abort for children • Inner transactions changes are visible to parents once commit successfully, but hidden from outside world till commit of highest level

  13. Transactional Memory • Hardware Transactional Memory Modifications in processors, cache and bus protocol ex; unbounded HTM, TCC, …. • Software Transactional Memory Software runtime library or the programming language support Minimal hardware support; CAS, LL/SC ex; RSTM, DSTM, ESTM, .. • Hybrid Transactional Memory Exploits HTM support to achieve hardware performance for transactions that do not exceed the HTM’s limitations, and STM otherwise ex; LogTM, HyTM, … • Distributed Transactional Memory Extends transaction primitives to distributed environment (network of multiple machines) ex; HyFlow, DecentSTM, GenSTM, …

  14. Jikes RVM • Mature modular open source Java virtual machine designed for research purposes. Unlike most other JVMs it is written in Java! • Adaptive Online System

  15. Program Reconstruction “The Main Idea” • We view program as a set of basic building blocks • Each block is a set of instructions • Block has single entry and multiple exists • Blocks may access the same memory (variables) • It is possible to reconstruct the program from these blocks by rearranging it differently with some changes to the control instructions. • It is even possible to assign each set of blocks to different thread

  16. Example 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return int counter = 0; for(inti=0; i<2; i++)      if(Math.random()>0.3)        counter++;      else          counter--; 

  17. Example public class Test{   public static void foo(){int counter = 0;     for(inti=0; i<12; i++)        if(Math.random()>0.3)           counter++;        else           counter--;   }    public static void zoo(){System.out.println("hi");   }   public static void main(String[] args){inti=6;        if(i<10)foo();        else                zoo();   }}

  18. Architecture

  19. ArchitectureProfiler • Split code into Basic Block • Inject loaded classes with additional instructions to monitor: • Program Flow (Which Basic Blocks are accessed and in what order?) • Memory accessed by each Basic Block • Which Basic Block is doing I/O ?

  20. ArchitectureProfiler 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 write J write C visit B1 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl read K write K visit B2 14: ifle 23 17: iinc 1, 1 read C write C visit B3 20: goto 26 23: iinc 1, -1 read C write C visit B4 26: iinc 2, 1 read J write J visit B5 29: iload_2 30: bipush 12 visit B6 read J 32: if_icmplt 7 35: return Example: int C = 0; for(int J=0; J<2; J++)      if(Math.random()>0.3)        C++;      else          C--;  0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return

  21. ArchitectureRecompilation • Recompile the Java class bytecode into machine-code • Replace and reload class definition at memory

  22. ArchitectureCode Execution • Running the profiled code • Collecting flow & memory access information and store it at the knowledge repository

  23. ArchitectureBuilder • Analyze knowledge repository information and know: • Which Blocks can be grouped together • Which groups of blocks can be parallelized

  24. ArchitectureBuilder • Program can be represented as a string (each character is a basic block). • Example: for (Integer i = 0; i < DIMx; i++) { for (Integer j = 0; j < DIMx; j++) { for (Integer k = 0; k < DIMy; k++) { C[i][j] += A[i][k] * B[k][j]; } } } abjbhcfefghcfefghijbhcfefghcfefghijk ab(jb(hcfefg)2hi)2jk

  25. ArchitectureBuilder ab(jb(hcfefg)2hi)2k • Externalize common blocks patterns as methods • Generated methods may be nested • Reconstruct the program as producer-consumer pattern • Collector • Provides Executor with suitable blocks as Tasks to execute according to flow up-to time • Executor • Allocates core threads • Assign tasks to threads • Requests Collector for more blocks based on program flow, after all threads complete

  26. ArchitectureBuilder • Problems • Threads may conflict when access the same variables • Threads may finish out of normal order • Collector may generate invalid tasks • Lets represents each Thread as Transaction • When two transactions conflicts abort one that has newer blocks relative to normal execution • Transaction will not commit unless its preceding one in timeline is finished • Transaction timeout if not reachable

  27. ArchitectureCode Execution – revisit • Collects which transactions conflicts and commit rate • We can refine the constructed program • Builder re-organize generated blocks and recompile the code again

  28. Ongoing & Future Work • Complete the implementation of HydraVM • Profiling by monitoring memory instead of generating new instructions • Automatically uses of Java NIO to handle I/O operations and generate callbacks to process it • Using thread scheduling techniques instead of TM • Formal verification of reconstructed programs matches desired semantics

  29. Questions www.hydravm.org msaad@vt.edu

More Related