
uPortal Performance & Memory Issues




  1. uPortal Performance & Memory Issues Scott Battaglia scott_battaglia@rutgers.edu Rutgers, the State University of New Jersey

  2. Description of Problem • Amount of memory consumed by uPortal grows consistently • Continues to consume memory until there is no memory left • Application stops working properly and hangs • Consistent with definition of a memory leak

  3. Background • Launched myRutgers on uPortal 2.3 • Issue was not seen in our QA • Seeing issue in production since November 2004

  4. Background • Also seen in production by: • Yale University • University of Louisiana at Lafayette • University of California at Irvine • Cornell University

  5. Temporary Workaround • Monitor memory usage of uPortal • When free memory drops below 5%, bounce (restart) the JVM
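
A minimal sketch of the check behind this workaround, assuming the monitor runs inside the JVM and measures headroom against the maximum heap. The class and method names are hypothetical and not part of uPortal; the 5% threshold is the one from the slide.

```java
// Hypothetical helper; names are illustrative, not uPortal code.
final class HeapMonitor {
    /** Percentage of the maximum heap that is still available. */
    static double freeHeapPercent(long maxBytes, long totalBytes, long freeBytes) {
        long used = totalBytes - freeBytes;           // bytes currently occupied
        return 100.0 * (maxBytes - used) / maxBytes;  // headroom relative to the heap limit
    }

    /** Reads the figures for the running JVM. */
    static double currentFreeHeapPercent() {
        Runtime rt = Runtime.getRuntime();
        return freeHeapPercent(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }

    /** True when free heap has dropped below the restart threshold. */
    static boolean shouldRestart(double freePercent, double thresholdPercent) {
        return freePercent < thresholdPercent;
    }

    public static void main(String[] args) {
        double free = currentFreeHeapPercent();
        System.out.printf("free heap: %.1f%%, restart: %b%n", free, shouldRestart(free, 5.0));
    }
}
```

A reading taken just before a collection can look worse than reality, which is one reason the workaround may be too aggressive.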

  6. Issues with Workaround • May be too aggressive • In some cases, JVM may be able to garbage collect • Causes users on that JVM to lose their session • If the window of opportunity to restart is missed, it can also take down Apache

  7. Issues with Workaround • Ultimately, does nothing to resolve memory issue. • Just makes it barely livable

  8. History of Fixes • Removed caching of IPersons from PersonDirectory • CError and CSecureInfo now pass events to wrapped channels. • Restrict access to ChannelFactory’s channel cache, synchronized instantiateChannel method. • Guest sessions created on time out • AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them).
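
The last fix above concerns per-session state maps that were never cleaned out. The pattern can be sketched as follows; this is not the uPortal class, and `SessionStateRegistry` and its methods are hypothetical names used only to show why an explicit removal is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative pattern only; not the actual AbstractMultithreadedChannel code.
final class SessionStateRegistry {
    // Every entry stays strongly reachable until it is removed explicitly.
    private final Map<String, Object> stateBySession = new ConcurrentHashMap<>();

    void put(String sessionId, Object state) {
        stateBySession.put(sessionId, state);
    }

    /** Must be called when the session ends or times out, or the entry leaks. */
    void sessionEnded(String sessionId) {
        stateBySession.remove(sessionId);
    }

    int size() {
        return stateBySession.size();
    }
}
```

Without the `sessionEnded` call, the map grows by one entry per session for the life of the JVM.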

  9. But… • 3 months later, issue still exists. • Previous fixes resolved some memory leaks, but more remain. • The search continues…

  10. What’s Happening Today • Renewed effort to search for memory leaks • Initial Steps taken: • Retooling of Load Tests • Production Snapshots • Incremental Updates • Re-affirming that loadtest system matches production system

  11. Retooling of Load Tests • Attempt to mimic more closely what a user does in production. • More custom layouts • Fewer people logging out • Hitting more popular channels more aggressively

  12. Retooling of Load Tests • Attempt to accomplish same throughput • Determine average user session length • Determine rate at which users access system
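
Session length and arrival rate together fix the number of concurrent sessions a load test must sustain (Little's Law: concurrency = arrival rate × time in system). A small sketch of that arithmetic follows; the class name is hypothetical and any figures passed in are assumptions, not measurements from myRutgers.

```java
// Hypothetical helper for sizing a load test; not part of any load-test tool.
final class LoadModel {
    /** Little's Law: average concurrent sessions = arrival rate * average session length. */
    static double concurrentSessions(double arrivalsPerMinute, double avgSessionMinutes) {
        return arrivalsPerMinute * avgSessionMinutes;
    }

    /** Inverse form: arrival rate needed to hold a target concurrency. */
    static double arrivalsPerMinute(double targetConcurrent, double avgSessionMinutes) {
        return targetConcurrent / avgSessionMinutes;
    }
}
```

For example, an assumed 40 logins per minute with an assumed average session of 12.5 minutes corresponds to about 500 sessions open at once.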

  13. Retooling of Load Tests • Bought test system with same specs/setup as production systems • Ensure database optimizations are the same • Ensure uPortal configuration is the same (i.e. StatsRecorder)

  14. Production Snapshots • Only seeing issue in production • Need to capture production snapshots • JVM Heap Size initially set at 2 GB

  15. Production Snapshots • Lowered JVM Heap Size to 128 MB on a machine • Allows us to compare snapshots • When free memory falls to 10%, take the machine out of load balancing rotation • Garbage Collect

  16. Production Snapshots • Capture snapshot • Wait past session timeout • Currently set at 15 minutes • Garbage Collect again • Take new snapshot • Analyze Snapshot
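
The collect, wait, collect again sequence can be approximated in code to get a coarse before/after number. This sketch only reads total used heap through the standard `java.lang.management` API; it is not a substitute for a profiler snapshot, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Coarse heap baseline; a profiler snapshot is still needed to see which objects remain.
final class HeapBaseline {
    /** Requests a collection, then reports used heap in bytes. The GC request is only a hint. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // effectively equivalent to System.gc()
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Bytes still held after sessions should have expired, relative to the first reading. */
    static long growth(long firstReading, long secondReading) {
        return secondReading - firstReading;
    }
}
```

If the second reading, taken after the session timeout has passed, is not lower than the first, objects tied to expired sessions are probably still reachable.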

  17. Production Snapshots • What do they tell us? • They help us determine what objects are still in memory • They tell us how much memory those objects are using • They tell us how much memory the items they reference are using

  18. Understanding the Snapshots • Use YourKit Java Profiler to capture memory snapshots • YourKit consists of two parts: • Component that runs on server • Local application to open memory snapshots

  19. Understanding the Snapshots • YourKit reports: • Incoming and outgoing references • Totals for objects of each type • How much memory they consume • It also allows us to compare snapshots, showing the deltas of each object type. • uPortal community has about 20 licenses for YourKit

  20. Understanding the Snapshots • Name • Objects • Shallow Size (memory used by the objects themselves) • Retained Size (memory that would be freed if the objects were collected)

  21. Understanding the Snapshots • Trace the path to the root of the Garbage Collector • Option of seeing first path or multiple paths • In screenshot, we see first five

  22. Understanding the Snapshots • Example of object from “Retained Size” • Only reason this object still exists is because XRTreeFrag has not been GCed.

  23. Understanding the Snapshots • Comparison of two snapshots (users vs. no users) • See that XRTreeFrag retains number of objects

  24. Understanding the Snapshots • Also comparison of (users vs. no users) • See that UserInstance gets garbage collected, as does ChannelStaticData, etc.

  25. Incremental Updates • In order to determine the impact of changes to the uPortal framework, we’ve adopted an incremental update approach. • We apply one “fix” at a time, and monitor its impact.

  26. Incremental Updates • Currently in production… • Threadpool switch from homegrown to Backport Concurrent • Finalizer in UBC_Webmail • In the queue… • Update to AuthorizationImpl

  27. What’s Happening Today • Recently, flurry of activity on JASIG-DEV list about memory issues. • Backport Concurrent Threadpool • AuthorizationImpl • Finalizers in UBC_Webmail

  28. What’s Happening Today • Backport Concurrent Thread Library • Issues with current threadpool • Potential for deadlock or infinite loop • Potential for cleanup to fail in thread workers • UnboundedThreadpool that extends BoundedThreadpool
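
Backport Concurrent mirrors the `java.util.concurrent` API for older JVMs, so the replacement can be sketched with the standard classes. This is an assumption-laden illustration, not the patch described in the slides: `RenderPool`, the pool sizes, and the rejection policy are all hypothetical choices.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper; shows a bounded pool plus cleanup that cannot be skipped.
final class RenderPool {
    /** Fixed number of workers and a bounded queue, so neither can grow without limit. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, run the task in the submitting thread.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Wraps a task so that cleanup runs even if the task throws. */
    static Runnable withCleanup(Runnable task, Runnable cleanup) {
        return () -> {
            try {
                task.run();
            } finally {
                cleanup.run();
            }
        };
    }
}
```

The `finally` block addresses the cleanup concern listed above; the bounded queue trades unlimited buffering for back-pressure on the caller.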

  29. What’s Happening Today • Backport Concurrent Thread Library (cont) • Action Item • Aaron wrote patch against HEAD to replace thread library • Rutgers manually applied patch to 2.4.1 and placed into production. • Result: • Undetermined: Most students were on Spring Break • Preliminary results indicate it may offer a performance benefit rather than a memory leak fix

  30. What’s Happening Today • AuthorizationImpl • Current Issues • Retaining references to principals • No explicit removal of principal from cache • Copying of map on each newPrincipal call that results in a new principal

  31. What’s Happening Today • AuthorizationImpl • Action Item • Rutgers volunteered to provide fix for HEAD • Fix consists of replacing current newPrincipal method and replacing HashMap with a cache • Patch is scheduled to be loadtested and placed into production • Patch is scheduled to be committed to uPortal HEAD on successful test and deployment
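
The idea of replacing an unbounded HashMap with a cache can be sketched with a size-limited LRU map. This is not the Rutgers patch, which the slides say uses WhirlyCache; `BoundedCache` is a hypothetical stand-in built only from `java.util` classes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a real cache library; evicts the least recently used entry.
final class BoundedCache {
    static <K, V> Map<K, V> lru(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // drop the eldest entry once over capacity
            }
        });
    }
}
```

Unlike a plain HashMap, this map cannot retain more than `maxEntries` principals, so stale entries are eventually dropped without an explicit removal call.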

  32. What’s Happening Today • AuthorizationImpl • Consequences of Changes • Introduced a CacheFactory • Not specific to any one part of uPortal • CacheFactory is interface (plug your own in!) • Default CacheFactory using WhirlyCache • Allows for declaring cache settings and policy in XML • Allows for fine-grained caching strategies for each part of uPortal
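
The slides do not show the CacheFactory signature, so the following is only a guess at what a pluggable factory could look like. `CacheFactorySketch`, `getCache`, and `MapCacheFactory` are hypothetical names, and the default here is a plain map rather than WhirlyCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface; the real uPortal CacheFactory may differ.
interface CacheFactorySketch {
    /** Returns the cache registered under the given name, creating it if needed. */
    Map<Object, Object> getCache(String name);
}

// Simple default implementation: one unbounded concurrent map per cache name.
final class MapCacheFactory implements CacheFactorySketch {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    @Override
    public Map<Object, Object> getCache(String name) {
        return caches.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }
}
```

Callers depend only on the interface, so a deployment could swap in an implementation with eviction policies without touching the code that uses the caches.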

  33. What’s Happening Today • UBC_Webmail • Issue • Finalizers are not properly cleaning up • Action Item • Rutgers has volunteered to refactor Finalizers
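
One common alternative to relying on finalizers is explicit, deterministic cleanup. The sketch below shows that general technique with `AutoCloseable` and try-with-resources; it is not the UBC_Webmail refactoring, and `MailConnection` and `MailClient` are hypothetical names.

```java
// Hypothetical resource; stands in for whatever the channel holds open.
final class MailConnection implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    /** Releases the resource immediately; safe to call more than once. */
    @Override
    public void close() {
        open = false;
    }
}

final class MailClient {
    /** Uses a connection and returns it after it has been closed, for inspection. */
    static MailConnection fetchAndClose() {
        MailConnection used;
        try (MailConnection connection = new MailConnection()) {
            used = connection; // work with the connection here
        }                      // close() runs here, even if the body throws
        return used;
    }
}
```

Cleanup happens at a known point instead of whenever, or if, the garbage collector gets around to running a finalizer.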

  34. Continuing the Search… • Rutgers and other members of the uPortal community continue to search for the answer to the memory leaks

  35. What can we do to help? • Finalizer should be a last resort • If a viable open source project exists that fills the requirements, consider using that • Be aware of proper caching (where it's needed vs. where it's not needed, weak & soft references, etc.) • Avoid circular references wherever possible
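
As an illustration of the soft-reference advice, here is a minimal sketch of a cached value that the garbage collector may discard under memory pressure and that is reloaded on demand. `SoftCached` is a hypothetical class, not something taken from uPortal.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical holder: the value is cached, but the JVM may reclaim it when memory is tight.
final class SoftCached<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftCached(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if it was never loaded or has been reclaimed. */
    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            value = loader.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```

Because the cache itself holds no strong reference, it cannot be the reason an object survives collection, which is the property a long-lived cache needs.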

  36. The End (finally!) • Any questions, comments, concerns?
