JIT-Compiler-Assisted Distributed Java Virtual Machine

Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau. The Systems Research Group, Department of Computer Science and Information Systems, The University of Hong Kong. Presented by Cho-Li Wang.


Presentation Transcript


  1. JIT-Compiler-Assisted Distributed Java Virtual Machine. Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau. The Systems Research Group, Department of Computer Science and Information Systems, The University of Hong Kong. Presented by Cho-Li Wang.

  2. Outline
     • Distributed Java Virtual Machine (DJVM)
     • Design tradeoffs
     • Related work
     • JESSICA2 DJVM
       • JIT-compiler-assisted dynamic thread migration
       • Global Object Space (GOS) for location-transparent object access
     • Experimental results and a demo
     • Conclusion and future work

  3. Distributed Java Virtual Machine (DJVM)
     • A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support true parallel execution of a multithreaded Java application.
     • A DJVM provides all the JVM services, in compliance with the Java language specification.
     • A DJVM provides the illusion that the program is running on a single, more powerful machine: a Single System Image (SSI).
     Example multithreaded Java program (each worker is a Java thread):

         import java.util.*;

         // Each worker thread sums the integers below n.
         class worker extends Thread {
             private long n;
             public worker(long N) { n = N; }
             public void run() {
                 long sum = 0;
                 for (long i = 0; i < n; i++) sum += i;
                 System.out.println("N=" + n + " Sum=" + sum);
             }
         }

         public class test {
             static final int N = 100;   // number of worker threads
             public static void main(String args[]) {
                 worker[] w = new worker[N];
                 Random r = new Random();
                 for (int i = 0; i < N; i++) w[i] = new worker(r.nextLong());
                 for (int i = 0; i < N; i++) w[i].start();
                 try {
                     // Wait for every worker to finish.
                     for (int i = 0; i < N; i++) w[i].join();
                 } catch (Exception e) {}
             }
         }

     Diagram labels: Java thread; Single System Image; bytecode execution engine; DJVM (thread, heap, class); four JVMs.

  4. Design Tradeoffs of a DJVM
     • How to manage the threads? (Thread scheduler)
       • Distributed thread scheduling
       • Initial thread placement vs. migration
     • How to store the data? (Heap)
       • Object store: a global heap shared by threads?
       • Memory consistency: Java memory model?
       • Can an off-the-shelf DSM be used, or something else?
     • How to process the bytecode? (Execution engine)
       • Execution engine: interpretation, Just-in-Time (JIT) compilation, or static compilation
       • High performance?

  5. Related work
     • cJVM (IBM Haifa Research)
       • Interpreter mode execution
       • Embedded OO-based DSM (proxy)
     • JAVA/DSM (Rice University)
       • Interpreter mode execution
       • Heap built on top of a page-based DSM
     • JESSICA (HKU)
       • Thread migration
       • Interpreter mode execution
       • Heap built on top of a page-based DSM
     • Jackal, Hyperion
       • Static compilation
       • Linked to an object-based DSM
     Diagram labels (in slide order): remote creation, interpreter, embedded OO-based DSM (proxy); manual distribution, interpreter, page-based DSM; transparent migration, interpreter, page-based DSM; remote creation, static compilation, OO-based DSM.

  6. JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
     • A multithreaded Java program runs on a set of JESSICA2 JVMs (one master and five workers in the diagram).
     • JIT compiler mode
     • Thread migration using portable Java frames
     • Global Object Space: a shared global heap spanning all cluster nodes

  7. JESSICA2 Main Features
     • Cluster-aware bytecode execution engine (JITEE)
       • JVM operated in Just-In-Time (JIT) compilation mode
       • Cluster-aware: global naming scheme for threads, objects, etc.
     • JIT-compiler-assisted dynamic thread migration
       • Runtime capturing and restoring of thread execution context
       • No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
       • Enables dynamic load balancing
     • Global Object Space (GOS)
       • Provides location-transparent object access for threads
       • Tightly integrated with the JVM
       • Memory consistency: compliant with the Java Memory Model (JMM)
       • Various optimizing schemes: adaptive home migration, synchronized method shipping, object pushing
     • I/O redirection

  8. JESSICA2 thread migration (in a JIT-enabled JVM)
     • RTC: Raw Thread Context
     • BTC: Bytecode-oriented Thread Context = thread id + Java frames (class name, method signature, PC, operand stack pointer, local variables, …)
     • The RTC is transformed into the BTC directly inside the JIT compiler.
     • Diagram steps:
       • (1) Alert (load monitor, thread scheduler)
       • (2) On the source node: stack analysis and stack capturing (RTC to BTC)
       • (3) The migration manager transfers the frames; on the destination node: frame parsing and restore execution
     • Other diagram labels: thread, frames, PC, JVM, method area
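The BTC definition above (thread id plus a list of Java frames) can be pictured as a small data structure. The following is a minimal sketch only: every class, field, and method name is hypothetical and does not come from the JESSICA2 source, and locals are stored as plain strings for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names below are hypothetical, not JESSICA2 code.

/** One Java frame: class name, method signature, bytecode PC, operand stack depth, locals. */
final class JavaFrame {
    final String className;
    final String methodSignature;
    final int bytecodePc;
    final int operandStackDepth;
    final List<String> locals = new ArrayList<>();

    JavaFrame(String className, String methodSignature, int bytecodePc, int operandStackDepth) {
        this.className = className;
        this.methodSignature = methodSignature;
        this.bytecodePc = bytecodePc;
        this.operandStackDepth = operandStackDepth;
    }

    /** Records one local variable as "type:value" and returns this frame for chaining. */
    JavaFrame addLocal(String type, String value) {
        locals.add(type + ":" + value);
        return this;
    }

    /** Header line loosely modeled on the "method CPI::run()V@111" notation. */
    String header() {
        return "method " + className + "::" + methodSignature + "@" + bytecodePc
                + " local=" + locals.size() + ";stack=" + operandStackDepth;
    }
}

/** BTC = thread id + Java frames. */
final class BytecodeThreadContext {
    final int threadId;
    final List<JavaFrame> frames = new ArrayList<>();

    BytecodeThreadContext(int threadId) {
        this.threadId = threadId;
    }

    BytecodeThreadContext push(JavaFrame frame) {
        frames.add(frame);
        return this;
    }

    /** Platform-neutral text form that could be shipped to another node. */
    String dump() {
        StringBuilder sb = new StringBuilder("thread " + threadId + "\n");
        for (JavaFrame f : frames) {
            sb.append(f.header()).append('\n');
        }
        return sb.toString();
    }
}
```

Because the context is expressed in bytecode terms (method, bytecode PC, typed locals) rather than raw machine addresses, it does not depend on the native stack layout of the source node.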

  9. Thread Stack Transformation
     • Stack capturing turns the Raw Thread Context (RTC) into the Bytecode-oriented Thread Context (BTC); stack restoration turns the BTC back into an RTC.
     • Raw Thread Context (RTC); %esp is the stack pointer:

         %esp:    0x00000000
         %esp+4:  0x082ca809
         %esp+8:  0x08225400
         %esp+12: 0x08266bc0

         %esp:    0x00000000
         %esp+4:  0x086243c
         %esp+8:  0x08623200
         %esp+12: 0x08293010
         ...
         %eax = 0x08623200
         %ebx = 0x08293010

     • Bytecode-oriented Thread Context (BTC):

         Frames{
           method CPI::run()V@111
           local=13;stack=0;
           var:
             arg0:CPI, 33, 0x8225400
             local1: [D; 33, 0x8266bc0@2
             local2: int, 2;
             ...

     • Legend: method id; bytecode program counter; node id; "[" = array; "D" = double
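The legend entries "[" (array) and "D" (double) match the standard JVM field descriptor encoding, so `[D` reads as `double[]`. The sketch below decodes such descriptors. It assumes the BTC uses standard descriptors for these entries (the slide also spells out `int` in full, so the real format may mix notations), and the class name `Descriptors` is hypothetical.

```java
// Illustrative helper (hypothetical name): decodes a standard JVM field
// descriptor such as "[D" into Java source notation such as "double[]".
final class Descriptors {
    static String decode(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String base = descriptor.substring(dims);
        String name;
        switch (base) {
            case "B": name = "byte"; break;
            case "C": name = "char"; break;
            case "D": name = "double"; break;
            case "F": name = "float"; break;
            case "I": name = "int"; break;
            case "J": name = "long"; break;
            case "S": name = "short"; break;
            case "Z": name = "boolean"; break;
            default:
                // Object types are written as L<internal/class/name>;
                if (base.length() > 2 && base.startsWith("L") && base.endsWith(";")) {
                    name = base.substring(1, base.length() - 1).replace('/', '.');
                } else {
                    throw new IllegalArgumentException("not a field descriptor: " + descriptor);
                }
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }
}
```

For example, `Descriptors.decode("[D")` yields `double[]`, which is the type shown for `local1` in the BTC listing.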

  10. Thread State Capturing: Details
      • Migration points: (1) head of a basic block (loop); (2) before a method invocation
      • JIT compilation stages: bytecode verifier, construct control flow graph, bytecode translation, intermediate code, code generation, linking and constant resolution
      • Instrumentation added by the JIT compiler:
        • Add migration checking code (cmp mflag,0)
        • Add object checking (local or remote object)
        • Add type and register spilling
      • Other diagram labels: invoke, native code, Global Object Space, Java frame, C frame, Java frame detection, raw stack, thread stack
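The migration check (`cmp mflag,0`) is inserted by the JIT compiler into native code, but the idea can be shown at source level. The sketch below is an analogy only, not the JESSICA2 mechanism: a flag is polled at the head of a loop, and when it is set the loop returns a snapshot from which execution can resume later. The names `MigrationPointDemo` and `Snapshot` are hypothetical; only `mflag` comes from the slide.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Source-level analogy of a migration point; all names except mflag are hypothetical.
final class MigrationPointDemo {
    // Stands in for the migration flag tested by "cmp mflag,0".
    static final AtomicBoolean mflag = new AtomicBoolean(false);

    /** State captured at a migration point: loop index and running sum. */
    static final class Snapshot {
        final long i;
        final long sum;
        final boolean finished;

        Snapshot(long i, long sum, boolean finished) {
            this.i = i;
            this.sum = sum;
            this.finished = finished;
        }
    }

    /** Sums i..n-1 onto sum, polling the flag at the head of every iteration. */
    static Snapshot run(long n, long i, long sum) {
        for (; i < n; i++) {
            if (mflag.get()) {
                // Migration point: hand back the live state instead of continuing.
                return new Snapshot(i, sum, false);
            }
            sum += i;
        }
        return new Snapshot(i, sum, true);
    }
}
```

A caller would pass the returned `i` and `sum` back into `run` (conceptually on another node) to continue from the same point, so the check costs one flag test per iteration during normal execution.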

  11. Restoring: Dynamic Register Patching (on the i386 architecture)
      • The rebuilt register context is loaded by small code stubs, which then jump to a restore point inside the compiled method:

          reg1 <- value1
          jmp restore_point1

          reg1 <- value1
          reg2 <- value2
          jmp restore_point0

      • Compiled methods (native code):

          Method1(){ ... restore_point1: }
          Method0(){ ... restore_point0: }

      • Stack frames shown (the diagram marks the direction of stack growth): frame 1 (%ebp, ret addr), frame 0 (%ebp, ret addr), trampoline frame (%ebp), bootstrap frame (%ebp)

          bootstrap(){ trampoline(); closing handler(); }

      • %ebp: i386 frame pointer. "Ret addr": return address of the current function call.

  12. Global Object Space (GOS)
      • Provides a global heap abstraction for the DJVM
      • Home-based object coherence protocol, compliant with the Java Memory Model
      • OO-based to reduce false sharing
      • Non-blocking communication
        • Uses the threaded I/O interface inside the JVM for communication to hide latency
      • Adaptive object home migration mechanism
        • Takes advantage of JVM runtime information for optimization
      • Optimizations: home migration, synchronized method shipping, object pushing
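The slide does not say how the adaptive home migration decision is made. As one way to picture it, the sketch below moves an object's home to a remote node once that node has written the object a fixed number of times in a row. The threshold rule, the class name `HomeTracker`, and its methods are assumptions made for illustration, not the JESSICA2 policy.

```java
// Hypothetical home-migration policy: the rule and all names are assumptions.
final class HomeTracker {
    private int homeNode;
    private int lastWriter = -1;
    private int consecutiveRemoteWrites = 0;
    private final int threshold;

    HomeTracker(int initialHome, int threshold) {
        this.homeNode = initialHome;
        this.threshold = threshold;
    }

    /** Records a write from the given node and returns the home node afterwards. */
    int recordWrite(int node) {
        if (node == homeNode) {
            // A write at the home node resets the remote-writer streak.
            lastWriter = node;
            consecutiveRemoteWrites = 0;
            return homeNode;
        }
        if (node == lastWriter) {
            consecutiveRemoteWrites++;
        } else {
            lastWriter = node;
            consecutiveRemoteWrites = 1;
        }
        if (consecutiveRemoteWrites >= threshold) {
            // One remote node dominates the writes: make it the new home.
            homeNode = node;
            consecutiveRemoteWrites = 0;
        }
        return homeNode;
    }

    int home() {
        return homeNode;
    }
}
```

With a threshold of 3, three consecutive writes from node 1 move the home from node 0 to node 1, while writes that alternate between nodes 1 and 2 leave the home where it is. The intent of such a rule is that a single dominant writer stops paying remote update costs.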

  13. Experimental environment
      • HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz, 512 MB RAM, 40 GB disk)
      • Network: 312-port Foundry FastIron 1500 non-blocking switch (100 Mbit/s)
      • Kaffe JVM version 1.0.6; Linux kernel 2.4.18-3 (Red Hat 7.3)

  14. Migration overhead during normal execution (SPECJVM98 benchmark)

  15. Migration overhead analysis
      • Overall migration latency (2-10 ms)
      • Migration time breakdown (LT program)

  16. GOS Optimizations (using 4 PCs)
      • NO = no optimizations
      • H = home migration
      • HS = home migration + synchronized method shipping
      • HSP = HS + object pushing

  17. Application benchmark (figure; x-axis: number of nodes)

  18. JESSICA2 vs JESSICA (CPI)

  19. Parallel Ray Tracing (using 64 nodes of the Gideon 300 cluster)
      • Linux 2.4.18-3 kernel (Red Hat 7.3)
      • 64 nodes: 108 seconds
      • 1 node: 4402 seconds (about 1.2 hours)
      • Speedup = 4402/108 ≈ 40.76
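As a worked check of these figures, using the usual definitions of speedup and parallel efficiency (the efficiency value is derived here and is not stated on the slide):

```latex
\[
S_{64} = \frac{T_1}{T_{64}} = \frac{4402\ \text{s}}{108\ \text{s}} \approx 40.76,
\qquad
E_{64} = \frac{S_{64}}{64} \approx \frac{40.76}{64} \approx 0.64
\]
```

In other words, the 64-node run reaches roughly 64% of ideal linear speedup.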

  20. Demo
      • Execution steps
        • Create the display panel
        • Start the ray tracing program on node 26 with 8 threads
        • Add two more nodes: 27 and 28
        • Add five more nodes: 29, 30, 31, 32, 33

  21. Conclusions
      • Dynamic Java thread migration makes true parallel execution of Java threads possible and enables dynamic load balancing.
      • Runtime ("Just-In-Time") code instrumentation for thread state capturing and restoring is feasible.
      • An embedded GOS layer can take advantage of JVM runtime information to reduce communication overhead.

  22. Advantages of native code instrumentation
      • Lightweight
        • Reuses the JIT compiler's internal data structures and control flow analysis functions
        • Instrumented native code is more efficient than instrumented bytecode
      • Transparent
        • No source code modification
        • No new API introduced
        • No preprocessing

  23. Future work
      • Advanced thread migration mechanism without overhead during normal execution
      • Incremental distributed GC
      • Enhanced single I/O space to benefit more real-life applications
      • Parallel I/O support

  24. Thanks
      • JESSICA2 webpage: http://www.csis.hku.hk/~clwang/projects/JESSICA2.html
