1 / 51

uPortal Performance Optimization

uPortal Performance Optimization. Faizan Ahmed Architect and Engineering Group Enterprise Systems & Services RUTGERS faizan @rutgers.edu. What to Measure for Performance. Performance Planning What to measure (Set targets) Application Throughput Response time Memory Sizes

tibor
Download Presentation

uPortal Performance Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. uPortal Performance Optimization Faizan Ahmed Architect and Engineering Group Enterprise Systems & Services RUTGERS faizan@rutgers.edu

  2. What to Measure for Performance • Performance Planning • What to measure (Set targets) • Application Throughput • Response time • Memory Sizes • Transaction rate

  3. How to measure performance • Load testing • User Perception • Profiling.

  4. When to Optimize? • Optimize at the analysis and design stage • Determine general characteristics of objects, data, and users • Identify probable performance limitations from the determined specifications. • Eliminate any performance conflicts by extending, altering or restating the specifications.

  5. When Not to Optimize? • Do not optimize at code writing (implementation) stage .

  6. GC History • Garbage Collection (GC) has been around for a while (1960’s LISP, Smalltalk, Eiffel, Haskell, ML, Scheme and Modula-3) • Went mainstream in the 1990’s with Java (then C#) • JVM implementations have improved on GC algorithms/speed over the past decade

  7. GC Benefits/Cost • Benefits • Increased reliability • Decoupling of memory mgmt from program design • Less time spent debugging memory errors • Dangling points/memory leaks do not occur (Note: Java programs do NOT have memory leaks; “unintentional object retention” is more accurate) • Costs • Length of GC pause • CPU/memory utilization

  8. GC Options • Sun’s 1.3 JDK included 3 GC strategies • 1.4 JDK includes 6 and over a dozen command line options • Application type will demand strategy: • Real-time – short and bounded pauses • Enterprise – may tolerate longer or less predictable pauses in favor of higher throughput

  9. GC Phases • GC has two main phases: • Detection • Reclamation • Steps are either distinct: • Mark-Sweep, Mark-Compact • … or interleaved • Copying

  10. Fundamental GC Property “When an object becomes garbage, it stays garbage.”

  11. Reachability • Roots – reference to object in a static variable or local variable on an active stack frame • Root Objects – directly reachable from roots • Live Objects – all objects transitively reachable from roots • Garbage Objects – all other objects

  12. Reachability Example Runtime Stack Heap ref ref x x x

  13. GC Algorithms • Two basic approaches: • Reference Counting – keep a count of references to an object; when a count of zero, object is garbage • Tracing – trace out the graph of objects starting from roots and mark; when complete, unmarked objects are garbage

  14. Reference Counting • An early GC strategy • Advantages: • can be run in small chunks of time • Disadvantages: • does not detect cycles • overhead in incrementing/decrementing counters • Reference Counting is currently out-of favor

  15. Tracing • Basic tracing algorithm is known as mark and sweep • mark phase – GC traverses reference tree, marking objects • sweep phase – umarked objects are finalized/freed

  16. 1. mark-sweep start x Root x x x x 2. end of marking x aRoot x a x x a a x 3. end of sweeping Root Mark-Sweep Example

  17. Mark-Sweep • Problem is fragmentation of memory • Two strategies include: • Mark-Compact Collector – After mark, moves live objects to a contiguous area in the heap • Copy Collector – moves live objects to a new area

  18. 1. end of marking 2. end of compacting x aRoot x a x x a a x Root Mark-Compact Collector Example

  19. Copy Collector Example 1. copying start From x Root x x x x Unused To 2. end of copying Unused To From Root

  20. Performance Characteristics • Mark-Compact collection is roughly proportional to the size of the heap • Copy collection time is roughly proportional to the number of live objects

  21. Copying • Advantages: • Very fast…if live object count is low • No fragmentation • Fast allocation • Disadvantages: • Doubles space requirements – not practical for large heaps

  22. Observations • Two very interesting observations: • Most allocated objects will die young • Few references from older to younger objects will exist • These are known as the weak generational hypothesis

  23. Generations • Heap is split into generations, one young and one old • Young generation – all new objects are created here. Majority of GC activity takes place here and is usually fast (Minor GC). • Old generation – long lived objects are promoted (or tenured) to the old generation. Fewer GC’s occur here, but when they do, it can be lengthy (Major GC).

  24. Sun HotSpot Heap Layout Survivor Spaces Eden From To Young Old Permanent

  25. Before Minor GC Survivor Spaces Eden From To Unused x x x x x x x x x x Young Old Permanent

  26. After Minor GC Survivor Spaces Eden To From Empty Unused Young Old Permanent

  27. GC Notes • Algorithm used will vary based on VM type (e.g., client/server) • Algorithm used varies by generation • Algorithm defaults change between VM releases (e.g., 1.4 to 1.5)

  28. Young Generation Collectors • Serial Copying Collector • All J2SEs (1.4.x default) • Stop-the-world • Single threaded • Parallel Copying Collector • -XX:+UseParNewGC • JS2E 1.4.1+ (1.5.x default) • Stop-the-world • Multi-threaded • Parallel Scavenge Collector • -XX:UseParallelGC • JS2E 1.4.1+ • Like Parallel Copying Collector • Tuned for very large heaps (over 10GB) w/ multiple CPUs • Must use Mark-Compact Collector in Old Generation

  29. Serial vs. Parallel Collector Parallel Collector Serial Collector stop-the-world pause

  30. Old Generation Collectors • Mark-Compact Collector • All J2SEs (1.4.x default) • Stop-the-world • Single threaded • Train (or Incremental) Collector • -Xincgc • About 10% performance overhead • All J2SEs • To be replaced by CMS Collector • Concurrent Mark-Sweep (CMS) Collector • -XX:+UseConcMarkSweepGC • J2SE 1.4.1+ (1.5.x default (-Xincgc)) • Mostly concurrent • Single threaded

  31. Mark-Compact vs. CMS Collector Concurrent Mark-Sweep Collector Mark-Compact Collector stop-the-world pause initial mark concurrent marking remark concurrent sweeping

  32. Intergenerational References • What if an object in the older generation references an object in the younger generation? • Add old-to-young references to root set Young Generation Old Generation x x x x x Root set of references

  33. JDK 1.4.x Heap Option Summary Generation Low Pause Collectors Throughput Collectors Heap Sizes 1 CPU 2+ CPUs 1 CPU 2+ CPUs Young Serial Copying Collector (default) Parallel Copying Collector -XX:+UseParNewGC Copying Collector (default) Parallel Scavenge Collector -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+AggressiveHeap -XX:NewSize -XX:MaxNewSize -XX:SurvivorRatio Old Mark-Compact Collector (default) Concurrent Collector -XX:+UseConcMarkSweepGC Mark-Compact Collector (default) Mark-Compact Collector (default) -Xms -Xmx Permanent Can be turned off with –Xnoclassgc (use with care) -XX:PermSize -XX:MaxPermSize

  34. Heap Tuning • Analyze application under realistic load • Run with –verbose:gc and other options to identify GC information • Select GC engine based on need • Size areas of heap based on application profile – e.g., an app that creates many temporary objects may need a large eden area

  35. Verbose Minor GC Example 62134.872: [GC {Heap before GC invocations=1045: Heap def new generation total 898816K, used 778761K eden space 749056K, 100% used from space 149760K, 19% used to space 149760K, 0% used tenured generation total 1048576K the space 1048576K, 24% used compacting perm gen total 32768K, used 26906K the space 32768K, 82% used 62134.880: [DefNew Desired survivor size 138018816 bytes, new threshold 16 (max 16) - age 1: 3718312 bytes, 3718312 total - age 2: 5935096 bytes, 9653408 total - age 3: 2614616 bytes, 12268024 total - age 4: 1426056 bytes, 13694080 total - age 5: 2095808 bytes, 15789888 total - age 6: 3996288 bytes, 19786176 total - age 7: 550448 bytes, 20336624 total - age 8: 4058384 bytes, 24395008 total : 778761K->23823K(898816K), 0.1397140 secs] 1035112K->280173K(1947392K) Heap after GC invocations=1046: Heap def new generation total 898816K, used 23823K eden space 749056K, 0% used from space 149760K, 15% used to space 149760K, 0% used tenured generation total 1048576K, used 256350K the space 1048576K, 24% used compacting perm gen total 32768K, used 26906K the space 32768K, 82% used } , 0.1471464 secs]

  36. Verbose Major GC Example 199831.706: [GC {Heap before GC invocations=9786: Heap def new generation total 898816K, used 749040K eden space 749056K, 99% used from space 149760K, 0% used to space 149760K, 0% used tenured generation total 1048576K, used 962550K the space 1048576K, 91% used compacting perm gen total 32768K, used 27040K the space 32768K, 82% used 199831.706: [DefNew: 749040K->749040K(898816K), 0.0000366 secs]199831.707: [Tenured: 962550K->941300K(1048576K), 3.6172827 secs] 1711590K->1653534K(1947392K) Heap after GC invocations=9787: Heap def new generation total 898816K, used 712233K eden space 749056K, 95% used from space 149760K, 0% used to space 149760K, 0% used tenured generation total 1048576K, used 941300K the space 1048576K, 89% used compacting perm gen total 32768K, used 27007K the space 32768K, 82% used } , 3.6185589 secs]

  37. Programming Tips • No need to call System.gc() • Finalizers must be used with care! • The use of Object pools can be debated • Learn to use ThreadLocal for large Objects (but use with care) • Be cognizant of the number/size of Objects being created • Use caches judiciously (e.g., WeakHashMap’s make bad caches under a heavily loaded system)

  38. Conclusions • GC can be your friend (or enemy) • Put your app on an Object diet • Should always monitor an app with -verbose:gc -XX:+PrintGCDetails • Tune your heap only if there is a problem • A large heap may slow an app (more memory is not always good)

  39. Demonstrate Heap tuning JVM heap tuning was demonstrated for uPortal running on local machine using JDK 1.5.06 . (one hour)

  40. Demonstrate Memory leak A live demonstration was done to show the techniques of finding memory related issues in a java application. uPortal 2.5.2 code was used for examples. (1 hour and 30 min).

  41. Techniques for production env. • lsof command • File descriptors • Verbose gc • Kill -3 • Tune gc

  42. Deployment Architecture/scalability • Capacity planning • Clustering/Load Balancing

  43. Capacity Planning • Today - ~ 3,000 users • Fall 2003 - ~10,000 users • Spring 2004 - ~50,000 • Summer 2004 - ~75,000 • Fall 2004 – 300,00+ • For capacity planning, we recommend using 8% to estimate the peak number of concurrent users (~800 users)

  44. Requirements • SSL – all sessions will be through HTTPS • Maximum response time for a single page request - 5s • Availability – 24x7

  45. One powerful machine with everything

  46. Apache on one machine load balancing uPortal/Tomcat on smaller boxes Database Apache Other Backend Systems Tomcat/uPortal

  47. External cookie-based load balancer with Apache/Tomcat on smaller boxes Database Load Balancer/ SSL Encryption Engine Other Backend Systems Apache/ Tomcat/uPortal

  48. Hardware Chart

  49. Choices • SSL Acceleration • 1 Big Box or Many Little Boxes • Apache Load Balance or “Layer 4 Switch” • Linux/Intel or Solaris/Sparc

  50. ??????????

More Related