
HotSpot™: A Huge Step Beyond JIT’s

Zhanyong Wan, May 1st, 2000





Presentation Transcript


  1. HotSpot™: A Huge Step Beyond JIT’s • Zhanyong Wan • May 1st, 2000

  2. Sources of Information • From Sun’s web-site • HotSpot white paper http://java.sun.com/products/hotspot/whitepaper.html • Various articles on Sun’s web-site http://java.sun.com/products/hotspot/ • From other web-sites • Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997) http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf • The HotSpot Virtual Machine, Bill Venners http://www.artima.com/designtechniques/hotspot.html • HotSpot: A new breed of virtual machine, Eric Armstrong http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html

  3. Overview • Why Java is different • Why JIT is not good enough • What HotSpot does • The HotSpot architecture • Memory model • Thread model • Adaptive optimization • Conclusions

  4. History • 1st generation JVM • Purely interpreting • 30 - 50 times slower than C++ • 2nd generation JVM • JIT compilers • 3 - 10 times slower than C++ • Static compilers • Better performance than JIT’s

  5. The Future? • HotSpot • Dynamic, fully optimizing compiler • Close-to-C++ performance • May even exceed the speed of C++ in the future

  6. Questions of Interest • How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler? • How does HotSpot score? (The collection of technologies used by HotSpot.) • Where did they get the ideas? • Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C++)? • Can Java be made to surpass the performance of C++, or is this hype?

  7. Why Java Is Different (from C++) • Granularity of factoring • Smaller classes • Smaller methods • More frequent calls • Standard compiler analysis fails • Dynamic dispatch • Slower calls for virtual functions • Much more frequent than in C++ • Sophisticated run-time system • Allocation, garbage collection • Threads, synchronization • Dynamically changing program • Classes loaded/discarded on the fly

  8. Why Java Is Different (cont’d) • Distributed in a portable form • A compiler can generate optimal machine code for a particular processor version • e.g. Pentium vs. Pentium II • Welcomes dynamic compilation (developed in the last decade)!

  9. Find the Java Bottleneck • Time used in a typical Java program executed w/ the JDK interpreter: • Allocation/GC: 1/6 • Synchronization: 1/6 • Byte code: 2/3 • Native methods: negligible • Performance critical code: the “hot spots”
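A quick calculation shows why these fractions matter once bytecode execution gets faster. The sketch below assumes, purely for illustration, that a compiler makes bytecode execution 10 times faster while allocation/GC and synchronization costs stay unchanged; the factor of 10 and the class name `BottleneckShares` are assumptions, not figures from the talk.

```java
// Hypothetical illustration of the slide's time breakdown.
// The speedup factor passed in is an assumption, not a measurement.
class BottleneckShares {
    // Fractions of run time under the interpreter, taken from the slide.
    static final double GC = 1.0 / 6;
    static final double SYNC = 1.0 / 6;
    static final double BYTECODE = 2.0 / 3;

    // Total run time (relative to the interpreter) if only bytecode
    // execution becomes `factor` times faster.
    static double totalAfterSpeedup(double factor) {
        return GC + SYNC + BYTECODE / factor;
    }

    // Share of the new total that is spent in allocation/GC.
    static double gcShareAfterSpeedup(double factor) {
        return GC / totalAfterSpeedup(factor);
    }

    public static void main(String[] args) {
        double factor = 10.0; // assumed compiler speedup on bytecode only
        System.out.printf("overall speedup: %.2fx%n", 1.0 / totalAfterSpeedup(factor));
        System.out.printf("GC share of run time: %.1f%%%n", 100 * gcShareAfterSpeedup(factor));
    }
}
```

Under these assumptions the program as a whole runs only about 2.5 times faster, and allocation/GC grows from 1/6 to roughly 42% of the run time, which is why the later slides treat GC and synchronization as bottlenecks in their own right.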

  10. Why JIT Is Not Good Enough • Compiles on a method-by-method basis when a method is first invoked • Compilation consumes “user time” • Startup latency • Dilemma: either good code or fast compiler • Gains of better optimization may not justify extra compile time • More concerned w/ generating code quickly than w/ generating the quickest code • Root of problem: compilation is too eager

  11. The Baaad Way to Optimize • People try to help: the optimization lore • Make methods final or static • Large classes/methods • Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch) • Avoid creating lots of short-lived objects • Avoid synchronization (very expensive) • Against good OO design! • “Premature optimization is the root of all evil.” (Donald Knuth)

  12. The HotSpot Way to Optimize • Optimize only when you know you have a problem • A program starts off being interpreted • A profiler collects run-time info in the background • After a while, a set of hot spots is identified • A thread is launched to compile the methods in the hot spots • Execution of the program is *not* blocked • “Take your time!” – fully optimizing • Take advantage of the late compilation: run-time info used • Once a method is compiled, it doesn’t need to be interpreted • Native code can be discarded when the hot spots change • Keeping the footprint small • Bytecode is always kept around
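The interpret-first, compile-later flow on this slide can be mimicked in a few lines. This is a toy model, not HotSpot's mechanism: the class `MixedModeSketch`, the threshold of 1000 calls, and the use of a simple invocation counter in place of a background profiler are all assumptions made for illustration. Two functions stand in for the interpreted and the compiled versions of one method.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Toy model of one method that starts interpreted and is later "compiled".
class MixedModeSketch {
    static final int HOT_THRESHOLD = 1000; // assumed value, for illustration only

    private final IntUnaryOperator interpreted;   // slow path, always kept around
    private final IntUnaryOperator optimized;     // stands in for compiled native code
    private final AtomicInteger invocations = new AtomicInteger();
    private volatile IntUnaryOperator installed;  // null until "compilation" finishes
    private final ExecutorService compilerThread = Executors.newSingleThreadExecutor();

    MixedModeSketch(IntUnaryOperator interpreted, IntUnaryOperator optimized) {
        this.interpreted = interpreted;
        this.optimized = optimized;
    }

    int invoke(int arg) {
        IntUnaryOperator fast = installed;
        if (fast != null) {
            return fast.applyAsInt(arg);          // compiled code is used once available
        }
        if (invocations.incrementAndGet() == HOT_THRESHOLD) {
            // The method just became hot: compile it on another thread.
            // The caller is not blocked and keeps interpreting meanwhile.
            compilerThread.execute(() -> installed = optimized);
        }
        return interpreted.applyAsInt(arg);
    }

    boolean isCompiled() {
        return installed != null;
    }

    // Models discarding native code when the hot spots change.
    // The "bytecode" (the interpreted function) is never discarded.
    void discardCompiledCode() {
        installed = null;
        invocations.set(0);
    }

    // Stops the compiler thread and waits for pending work. After this call
    // the sketch can no longer schedule new compilations.
    void shutdown() {
        compilerThread.shutdown();
        try {
            compilerThread.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would construct it with two equivalent functions, call `invoke` in a loop, and observe `isCompiled()` switching to true some time after the threshold is crossed, without any call having waited for the compiler.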

  13. The HotSpot Way (cont’d) • Tackles each of the bottlenecks • Adaptive optimization • Fast, accurate garbage collection • Fast thread synchronization • Performance • 2-3 times faster than JITs • Comparable to C++ • Most importantly, eliminates the “performance excuse” for poor designs/code

  14. The HotSpot Architecture • Memory model • Thread model • Adaptive compiler

  15. The HotSpot Memory Model • Object references • Java 2 SDK: as indirect handles • Relocating objects made easy • A significant performance bottleneck • HotSpot: as direct pointers • A performance boost • GC must adjust all references to an object when it is relocated • Object headers • Java 2 SDK: 3-word • HotSpot: 2-word • 2 bits for GC mark (reference count removed?) • An 8% savings in heap size
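The trade-off between handles and direct pointers can be seen in a small model. In the sketch below the heap is an array and an "address" is an index into it; the class `ReferenceModelSketch` and all of its members are hypothetical names chosen for illustration and do not reflect the layout of any real VM.

```java
// Toy model contrasting indirect handles with direct pointers.
class ReferenceModelSketch {
    final String[] heap = new String[8];    // heap slots; an address is an index
    final int[] handleTable = new int[4];   // handle -> current address

    // Handle model: every access costs two loads (table, then heap).
    String readViaHandle(int handle) {
        return heap[handleTable[handle]];
    }

    // Direct model: every access costs one load.
    String readDirect(int address) {
        return heap[address];
    }

    // Handle model: relocation updates exactly one table entry.
    void relocateWithHandles(int handle, int to) {
        int from = handleTable[handle];
        heap[to] = heap[from];
        heap[from] = null;
        handleTable[handle] = to;
    }

    // Direct model: relocation must find and fix every reference to the
    // object. Returns how many references were adjusted.
    int relocateWithDirectPointers(int from, int to, int[] allReferences) {
        heap[to] = heap[from];
        heap[from] = null;
        int updated = 0;
        for (int i = 0; i < allReferences.length; i++) {
            if (allReferences[i] == from) {
                allReferences[i] = to;
                updated++;
            }
        }
        return updated;
    }
}
```

The direct model pays once, at relocation time, for what the handle model pays on every single access, which matches the slide's claim that direct pointers are a performance boost as long as the collector can find all references.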

  16. Garbage Collection Background • GC traditionally considered inefficient • Takes 1/6 of the time in an interpreting JVM • Even worse in a JIT VM • Modern GC technology • Performs substantially better than explicit freeing • How can this be true? • Unnecessary copies avoided • Memory segmentation, spatial locality

  17. The HotSpot Garbage Collector • A high-level GC framework • New collection algorithms can be “plugged-in” • Currently has 3 cooperating GC algorithms • Major features • Fast allocation and reclamation • Fully accurate: guarantees full memory reclamation • Completely eliminates memory fragmentation • Incremental, no perceivable pauses (usually < 10ms) • Small memory overhead • 2-bit GC mark per object • 2-word object header (instead of 3-word in the Java 2 SDK)

  18. The HotSpot GC: Accuracy • A partially accurate (conservative) collector must • Either avoid relocating objects • Or use handles to refer indirectly to objects (slow) • The HotSpot collector • Fully accurate • All inaccessible objects can be reclaimed • All objects can be relocated • Eliminates memory fragmentation • Increases memory locality

  19. The HotSpot GC: the Structure • Three cooperating collectors • A generational copying collector • For short-lived objects • A mark-compact “old object” collector • For longer-lived objects when the live object set is small • An incremental “pauseless” collector • For longer-lived objects when the live object set is big

  20. Generational Copying Collector • Observation: the vast majority (often > 95%) of the objects are very short-lived • The way it works • A memory area is reserved as an object “nursery” • Allocation is just updating a pointer and checking for overflow: extremely fast • By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the “old object” memory area
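The nursery scheme above boils down to bump-pointer allocation plus an occasional evacuation of survivors. The following is a simplified sketch under stated assumptions: sizes are counted in abstract words, liveness is decided by a caller-supplied predicate (a stand-in for tracing from roots), and the names `NurserySketch`, `oldSpaceWords`, and `collections` are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy nursery: allocation bumps a pointer; overflow triggers a minor collection.
class NurserySketch {
    private final int capacity;          // nursery size in words
    private final IntPredicate isLive;   // hypothetical stand-in for root tracing
    // Bookkeeping that exists only so this sketch can find survivors;
    // a real allocator would not maintain such a list. Entries are {id, size}.
    private final List<int[]> objects = new ArrayList<>();
    private int top = 0;                 // next free word in the nursery
    private int nextId = 0;
    int oldSpaceWords = 0;               // words moved to the old-object area
    int collections = 0;                 // number of minor collections so far

    NurserySketch(int capacity, IntPredicate isLive) {
        this.capacity = capacity;
        this.isLive = isLive;
    }

    // Allocates an object and returns its id.
    int allocate(int sizeInWords) {
        if (sizeInWords <= 0 || sizeInWords > capacity) {
            throw new IllegalArgumentException("bad size: " + sizeInWords);
        }
        if (top + sizeInWords > capacity) {   // the overflow check
            minorCollection();
        }
        int id = nextId++;
        objects.add(new int[] {id, sizeInWords});
        top += sizeInWords;                   // the pointer bump
        return id;
    }

    // Moves the few survivors to old space and resets the nursery.
    // Dead objects cost nothing: they are simply forgotten.
    private void minorCollection() {
        for (int[] object : objects) {
            if (isLive.test(object[0])) {
                oldSpaceWords += object[1];
            }
        }
        objects.clear();
        top = 0;
        collections++;
    }
}
```

Note that the cost of a minor collection depends on the number of survivors, not on the amount of garbage, which is what makes the scheme attractive when most objects die young.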

  21. Mark-Compact Collector • Rare case • Triggered by low-memory conditions or programmatic requests • Time proportional to the size of the set of live objects • Calls for an incremental collector when the size is large

  22. Incremental Pauseless Collector • An alternative to the mark-compact collector • Relatively constant pause time even w/ extremely large data sets • Suitable for server applications and soft real-time applications (games, animations) • The way it works • The “train” algorithm • Breaks up GC pauses into tiny pauses • Not a hard real-time algorithm: no guarantee for upper limit on pause times • Side-benefit: better memory locality • Tends to relocate tightly-coupled objects together

  23. The HotSpot Thread Model • Native thread support • Currently supports Solaris & 32-bit Windows • Preemption • Multiprocessing • Per-thread activation stack is shared w/ native methods • Fast calls between C and Java

  24. Thread Synchronization • Takes 1/6 of the time in an interpreting JVM • (I think) the proportion can be even higher for a JIT • HotSpot’s thread synchronization • Ultra-fast (“a breakthrough”) • Constant time for all uncontended (no rival) synch • Fully scalable to multiprocessor • Makes fine-grain synch practical, encouraging good OO design
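The talk does not say how HotSpot achieves constant-time uncontended synchronization. One generic way to get that property, shown here only as an illustration and not as HotSpot's implementation, is a lock whose fast path is a single compare-and-set. The class `ThinLockSketch` is hypothetical, is not reentrant, and simply spins when contended.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Toy lock with a constant-time uncontended path (one compare-and-set).
// Not reentrant; contended acquisition just spins, for simplicity.
class ThinLockSketch {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final AtomicInteger slowPathEntries = new AtomicInteger();

    void lock() {
        Thread me = Thread.currentThread();
        if (owner.compareAndSet(null, me)) {
            return;                          // fast path: no rival, one atomic operation
        }
        slowPathEntries.incrementAndGet();   // contended: fall back to the slow path
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();
        }
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread does not hold the lock");
        }
    }

    // How often a caller found the lock already taken.
    int slowPathEntries() {
        return slowPathEntries.get();
    }
}
```

With a single thread the slow path is never entered, so each lock/unlock pair costs two atomic operations regardless of how many locks exist in the program.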

  25. Adaptive Inlining • Method invocations reduce the effectiveness of optimizers • Standard optimizers don’t perform well across method boundaries (need bigger block of code) • Inlining is the solution • Inlining has problems • Increased memory footprint • Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C++) • HotSpot uses run-time information to • Inline only the critical methods • Limit the set of methods that might be invoked at a certain point
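To make "limit the set of methods that might be invoked at a certain point" concrete, the sketch below records which receiver classes reach a call site and then shows, written out by hand, what a guarded inline of the single observed target could look like. The names `CallSiteProfileSketch`, `Shape`, `Square`, and `Circle` are invented for this example; a real compiler would perform the transformation on its own internal representation, not on source code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy receiver-type profile for one virtual call site, plus a hand-written
// version of the guarded inline a compiler might derive from it.
class CallSiteProfileSketch {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    private final Map<Class<?>, Integer> receiverCounts = new HashMap<>();

    // Called each time the site is executed while profiling.
    void record(Object receiver) {
        receiverCounts.merge(receiver.getClass(), 1, Integer::sum);
    }

    // The only receiver class seen so far, or null if none or several were seen.
    Class<?> monomorphicTarget() {
        return receiverCounts.size() == 1
                ? receiverCounts.keySet().iterator().next()
                : null;
    }

    // Equivalent of `shape.area()` after inlining under the assumption
    // that the profile showed only Square at this site.
    static double areaWithGuardedInline(Shape shape) {
        if (shape.getClass() == Square.class) {   // cheap type guard
            Square square = (Square) shape;
            return square.side * square.side;     // inlined body of Square.area()
        }
        return shape.area();                      // rare case: normal dynamic dispatch
    }
}
```

The guard keeps the code correct for any receiver while letting the common case run without a virtual call, and the inlined body becomes visible to the other optimizations.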

  26. Dynamic Deoptimization • Simple inlining may violate Java semantics • A program can change the patterns of method invocation • A Java program can change on the fly via dynamic class loading/discarding • Optimizations may become invalid • Must be able to deoptimize dynamically! • HotSpot can deoptimize (revert to bytecode?) a hot spot even during the execution of the code for it.
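The dependency between an optimization and the set of loaded classes can be modeled as follows. This is a deliberately simplified sketch: it only invalidates optimized code between calls, whereas the slide says HotSpot can do so even while the code is executing. `DeoptimizationSketch`, `Animal`, `Dog`, and `Cat` are hypothetical names, and "class loading" is simulated by an explicit method call.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: optimized code that assumes Dog is the only Animal
// implementation, and is thrown away when that assumption breaks.
class DeoptimizationSketch {
    interface Animal { String sound(); }
    static class Dog implements Animal { public String sound() { return "woof"; } }
    static class Cat implements Animal { public String sound() { return "meow"; } }

    private final Set<Class<?>> loadedImplementations = new HashSet<>();
    private boolean optimizedCodeValid = false;

    // Simulates the class loader announcing a new implementation.
    void classLoaded(Class<? extends Animal> implementation) {
        boolean isNew = loadedImplementations.add(implementation);
        if (isNew && loadedImplementations.size() > 1) {
            optimizedCodeValid = false;   // assumption violated: deoptimize
        }
    }

    // Simulates compiling the hot spot; the inlining is only legal while
    // Dog is the sole loaded implementation.
    void compileHotSpot() {
        optimizedCodeValid = loadedImplementations.size() == 1
                && loadedImplementations.contains(Dog.class);
    }

    boolean isOptimized() {
        return optimizedCodeValid;
    }

    String call(Animal animal) {
        if (optimizedCodeValid) {
            return "woof";                // inlined Dog.sound(), no dispatch at all
        }
        return animal.sound();            // bytecode semantics: full dynamic dispatch
    }
}
```

Without the invalidation in `classLoaded`, a `Cat` passed to `call` after compilation would wrongly answer "woof", which is the kind of semantic violation the slide warns about.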

  27. Fully Optimizing Compiler • Performs all the classic optimizations • Dead code elimination • Loop invariant hoisting • Common sub-expression elimination • Constant propagation • And more … • Java-specific optimizations • Null-check elimination • Range-check elimination • Global graph coloring register allocator • Highly portable • Relying on a small machine description file
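A before/after pair makes several of these optimizations easier to picture. The second method below is a hand-written approximation of what an optimizer could produce from the first; it is meant as an illustration of the listed techniques, not as actual HotSpot output, and the class and method names are invented for this example.

```java
// Source-level illustration of loop-invariant hoisting, common
// sub-expression elimination, and range-check elimination.
class ClassicOptimizationsSketch {
    // As a programmer might write it.
    static int sumScaledNaive(int[] data, int a, int b) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            int scale = a * b + 1;               // recomputed on every iteration
            sum += data[i] * scale + (a * b);    // a * b computed a second time
        }
        return sum;
    }

    // What an optimizer could turn it into.
    static int sumScaledOptimized(int[] data, int a, int b) {
        int ab = a * b;        // common sub-expression computed once
        int scale = ab + 1;    // loop invariant hoisted out of the loop
        int n = data.length;   // also acts as the single null check on `data`
        int sum = 0;
        for (int i = 0; i < n; i++) {
            // i is provably within [0, n), so the per-access range check on
            // data[i] can be dropped in the generated machine code.
            sum += data[i] * scale + ab;
        }
        return sum;
    }
}
```

Both methods return the same value for every input, including on integer overflow, since the rewrite only reorders and reuses identical computations.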

  28. Transparent Debugging & Profiling Semantics • Native code generation & optimization fully transparent to the programmer • Uses two stacks • One real, one simulating • Overhead of two stacks? • Pure bytecode semantics: easy debugging & profiling • Question: what’s the point of a transparent profiling semantics?

  29. Performance Evaluation • Micro-benchmarks: not the way • No or few method calls/synchronizations • Small live data set • No correlation w/ real programs • Give unrealistic results for HotSpot • SPEC JVM98 benchmark • The only industry-standard benchmark for Java • Predictive of the performance across a number of real applications

  30. Where are the ideas from? • Mostly from the last decade’s academic work • Dynamic compilation • Modern GC • HotSpot puts them together • Academic research is relevant!

  31. (My) Conclusions • HotSpot is great • Many new technologies previously only seen in academia • Java performance may come close to or exceed the current implementation of C++ • However, Sun’s argument that Java can be faster than C++ is not convincing yet: • C++ has better control over machine resources • Many technologies used in HotSpot can be exploited for C++ as well. Especially: • Fast synchronization • Dynamic compilation • Maybe GC (for some dialects of C++) • Whether Java can exceed C++ remains to be tested
