Exploring Multi-Threaded Java Application Performance on Multicore Hardware

This presentation explores the performance of multi-threaded Java applications on multicore hardware, examining the impact of thread scheduling and hardware resources on performance. The study includes experiments with frequency scaling, isolation of threads, and pairing of application and collection threads. Gain insights on how to optimize performance for multi-threaded applications on multicore hardware.

Presentation Transcript


  1. Exploring Multi-Threaded Java Application Performance on Multicore Hardware • Jennifer B. Sartor, Lieven Eeckhout • Ghent University, Belgium • OOPSLA 2012 presentation, October 24th 2012

  2. Modern Software & Hardware • Managed languages • Ubiquitous, but added runtime layer • Many service threads interact with application • JIT compilation, on-stack replacement, collector • Stop the application, possibly critical • Share hardware resources • Multicore with multiple sockets • How do we schedule threads with constrained resources? • Scale core frequency for power • Use caches of all sockets, or limit communication

  3. Extensive Performance Study • Multi-threaded Java application on multicore, multi-socket hardware • Large space to explore • Number of threads • Thread-to-core/socket mapping • Pairing or isolating application and JVM threads • Pinning • Impact of frequency scaling • Difference between startup and steady state • Question: How do choices with scheduling and hardware resources affect performance?

  4. Experimental Machine: Nehalem • Scale frequency per socket to 1.596 or 3.059 GHz

  5. Gain Insight on Scheduling • Application • Java Virtual Machine • Garbage collector • Just-in-time compiler with on-stack replacement • Cao et al. [ISCA 2012] studied JVM amenability to heterogeneity by measuring service threads’ performance per energy • We study end-to-end performance

  6. Roadmap • Cost of Isolation • Frequency Scaling • Pairing Threads

  7. Experimental Methodology • Jikes Research Virtual Machine (Dec 2011) • Generational Immix collector • 1.5, 2, and 3x minimum heap sizes • Multithreaded DaCapo benchmarks 9.12-bach • Avrora, lusearch (with fix), pmd, sunflow, xalan • Also, pseudojbb2005 • Timed 10 invocations • Steady state, measure 15th iteration • Startup, measure 1st iteration
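The startup versus steady-state measurement on this slide can be sketched as a small timing harness. This is a minimal illustration, not the authors' setup: the class `IterationTimer` and its methods are hypothetical names, and it assumes that "invocation" means a fresh JVM launch, so the harness itself would be launched 10 times and the results aggregated outside the JVM.

```java
/** Hypothetical harness: times each in-process iteration of a workload. */
final class IterationTimer {
    // The slide measures the 15th iteration for steady state.
    static final int STEADY_STATE_ITERATION = 15;

    /** Runs the workload repeatedly; returns wall-clock nanoseconds per iteration. */
    static long[] timeIterations(Runnable workload, int iterations) {
        if (iterations < 1) {
            throw new IllegalArgumentException("iterations must be >= 1");
        }
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Startup: the 1st iteration, which includes class loading and JIT warm-up. */
    static long startup(long[] nanos) {
        return nanos[0];
    }

    /** Steady state: the 15th iteration (requires at least 15 timed iterations). */
    static long steadyState(long[] nanos) {
        return nanos[STEADY_STATE_ITERATION - 1];
    }
}
```

A caller would time 15 iterations with `IterationTimer.timeIterations(workload, 15)` and then read `startup(...)` and `steadyState(...)` from the returned array.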

  8. Baseline Setup • Pin application & collection threads • [Diagram: Nehalem cores 0-7 across Socket 0 and Socket 1; legend: application threads, JVM service threads (collection, compilation)]

  9. Boosting Socket Frequency • 1.596 to 3.059 GHz • 27-50% improvement in execution time • [Diagram: Nehalem cores 0-7 across Socket 0 and Socket 1]
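As a rough sanity check on these numbers, assume an idealized model in which execution time scales inversely with core frequency. The model and the symbols f_low and f_high are illustrative and not from the slides:

```latex
\frac{f_{\text{high}}}{f_{\text{low}}} = \frac{3.059}{1.596} \approx 1.92,
\qquad
1 - \frac{f_{\text{low}}}{f_{\text{high}}} = 1 - \frac{1.596}{3.059} \approx 0.48
```

Under that assumption a purely compute-bound run would finish about 48% sooner at the higher frequency, which sits near the top of the reported 27-50% range. Benchmarks toward the lower end are presumably limited by something other than core frequency.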

  10. Exploring The Cost of Isolation • Collection threads • [Diagram: Nehalem cores 0-7 across Socket 0 and Socket 1]

  11. Isolating Collection Threads • Isolating collector does not significantly hurt performance

  12. Exploring The Cost of Isolation • Compiler thread • [Diagram: Nehalem cores 0-7 across Socket 0 and Socket 1]

  13. Isolating Compiler Thread at Startup • Isolating compiler at startup has little impact

  14. Isolating On-Stack-Replace at Startup • Isolating OSR at startup improves performance

  15. Exploring The Cost of Isolation • All JVM service threads • [Diagram: Nehalem cores 0-7 across Socket 0 and Socket 1]

  16. Isolating All JVM Threads • Isolating service threads significantly hurts only one benchmark

  17. Exploring Frequency Scaling • Baseline: JVM service threads isolated, all cores at highest frequency • [Diagram: Nehalem cores 0-3 on Socket 0, cores 4-7 on Socket 1]

  18. Exploring Frequency Scaling • Lower frequency of JVM service threads versus lower frequency of application threads • [Diagram: Nehalem cores 0-7]

  19. Lower Frequency: Collector vs App • Lowering collector frequency affects performance 5x less than lowering application frequency

  20. Lower Freq at Startup: Compiler vs App • Lowering compiler frequency is not detrimental compared to lowering application frequency

  21. Lower Frequency: JVM vs App • Lowering JVM frequency affects performance 5x less than lowering application frequency

  22. Exploring Pairing Threads • Pair application and collection threads • [Diagram: Nehalem cores 0-7 across Socket 0 and Socket 1]

  23. Pairing App & Collector, 2 Sockets • For all benchmarks but avrora, pairing application and collector performs best

  24. Overall Performance Comparison • Either use 1 socket, or isolate compiler thread

  25. Conclusions: Scheduling Insights • 1 socket: # application threads = # collection threads • 2 sockets: isolate compilation thread; pair application and collection threads; set # application threads = # cores, with fewer collection threads • Increasing frequency matters more for application threads than for JVM service threads • Analyzed Java performance given hardware resources
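These guidelines can be turned into thread counts with a small helper. This is a sketch under stated assumptions, not code from the study: the class `ThreadPlan` and its factory methods are hypothetical. The slide says only that the two counts are equal on one socket, so setting both to the socket's core count is an assumption. It also does not say how many collection threads "fewer" means, so that count is left to the caller.

```java
/** Hypothetical helper that encodes the slide's guidelines as thread counts. */
final class ThreadPlan {
    final int applicationThreads;
    final int collectionThreads;
    final boolean isolateCompilerThread;

    private ThreadPlan(int applicationThreads, int collectionThreads,
                       boolean isolateCompilerThread) {
        this.applicationThreads = applicationThreads;
        this.collectionThreads = collectionThreads;
        this.isolateCompilerThread = isolateCompilerThread;
    }

    /** 1 socket: # application threads = # collection threads. */
    static ThreadPlan oneSocket(int coresPerSocket) {
        requirePositive(coresPerSocket, "coresPerSocket");
        // Assumption: both counts equal the number of cores in the socket.
        return new ThreadPlan(coresPerSocket, coresPerSocket, false);
    }

    /** 2 sockets: application threads = total cores, fewer collection threads. */
    static ThreadPlan twoSockets(int coresPerSocket, int collectionThreads) {
        requirePositive(coresPerSocket, "coresPerSocket");
        int totalCores = 2 * coresPerSocket;
        // "Fewer collection threads" is read here as strictly fewer than
        // application threads; the exact count is the caller's choice.
        if (collectionThreads < 1 || collectionThreads >= totalCores) {
            throw new IllegalArgumentException(
                "collectionThreads must be in [1, " + (totalCores - 1) + "]");
        }
        return new ThreadPlan(totalCores, collectionThreads, true);
    }

    private static void requirePositive(int value, String name) {
        if (value < 1) {
            throw new IllegalArgumentException(name + " must be >= 1");
        }
    }
}
```

For the two-socket, four-cores-per-socket machine in the slides, `ThreadPlan.twoSockets(4, 4)` yields 8 application threads, 4 collection threads, and an isolated compiler thread. Actually pinning or pairing threads on cores is outside the standard Java API and would need JVM or operating-system support.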
