
Performance Tuning in a Model 2 Web application

Bill Turner, Cornell University (wrt1@cornell.edu)


Presentation Transcript


  1. Performance Tuning in a Model 2 Web application. Bill Turner, Cornell University, wrt1@cornell.edu

  2. Problem Statement, August 2002 • Cornell University will begin billing users for actual Internet usage in July 2003 • A vendor is supplying an application that will give users access to their monthly bills and to information extracted from router logs • An administrative application is needed that will be used by a few hundred network and billing administrators to set ownership and account numbers for IP addresses

  3. Phase 1 • User requirements include that the application be Web based • The vendor application was to be a Java application running on the desktop, but the administrative app would be browser based • The application was being prototyped in Access, but the database would be Oracle • The GUI will be designed by our Web design group (so sample HTML will be provided)

  4. Phase 1 Design Decisions • The application was conceived as a set of JSPs that would communicate with a BLS (business logic server) through a client object that would provide sessioning • This would use the standard architecture and infrastructure for distributed applications • We could run on the usual servers • Support would be “normal” • Overkill for what we needed to do, but hey, I knew how to do it reasonably fast

  5. Phase 1 design

  6. Phase 1 implementation • Go into production early (October 2002) with a selected group of users • Performance isn’t great • UI doesn’t meet users’ needs the way they thought it would • Now that we’re done, we can start defining the requirements and start the enhancement list

  7. Phase 2 • A little surprise: the vendor application isn’t working out! • New plan: We’ll just put a whole bunch more stuff into the database, and let everybody use the Web application • About 2 orders of magnitude increase in number of users • 2-3x database complexity, and orders of magnitude increase in database size • A few more pages will be needed

  8. Phase 2 • Application complexity also increases, as for each administrative user, we need to distinguish between what is being administered and personal data, and need to provide an interface that doesn’t encumber non-administrative users • Plus of course once we’ve started using it, the administrative users now can contribute meaningfully to the application design

  9. Phase 2 Design Decisions • Since this is now going to be a more complex application, we’ve exceeded the capability of doing it all with JSPs • Maintainability, complexity for the developer • Time to go to Model 2 architecture • All activity being processed by a controller servlet • Servlet communicates with a client object that provides sessioning and through that, the BLS • Servlet directs to the appropriate JSP for display

  10. Model 2 architecture

  11. Phase 2 implementation • In user test, performance is slow (15-30 seconds) but deemed acceptable • UI design involves large nested tables and multiple forms, on one giant page, which nobody wants to give up • Biggest observed performance problem appears to be rendering by browser • Move to production • No complaints received • The next week, the developer goes on vacation

  12. Phase 3 • Begins 3 days after the developer goes away • Performance on big pages (subnets, accounts) for administrative users is terrible • 4 minutes, where we were expecting 15 seconds • The troops are rallied and a lot of changes are made • Performance is still terrible and nobody knows why • The original developer returns to find that fixing whatever is wrong is his highest priority

  13. What issues had been identified? • Some pages were turning out to be much larger than anticipated (6000 rows instead of 600) • Why was this an issue? • Each row on a page produced several parameters in the HTTP request that needed to be processed • Instead of data analysis during the design phase, guesses had been made as to the maximum number of IP addresses in a subnet and the maximum number of IP addresses charged to a single account • This resulted in an order of magnitude more parameters to process than had been expected, or seen in user testing

  14. What issues had been identified?

  15. What issues had been identified? • With the application design, processing of the input parameters included a sort. This followed a model given by another developer, using a static method in a utility package • Once this was identified as a problem, instead of using the utility method, the application was changed to select only the parameters needed to actually process the action, and then sort those • This cut about 20 seconds off of the response time for large forms (time to extract and sort parameters dropped to 300 ms)
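The fix on this slide can be sketched as follows. This is a minimal illustration, not the original code: the class and method names are hypothetical, and it assumes the parameter names relevant to an action share a recognizable prefix (in the real application, selection would be based on whatever naming the forms used).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParamSorter {
    // Instead of sorting every parameter name in the request (thousands of
    // them for a 6000-row form), keep only the names relevant to the
    // submitted action, then sort that much smaller subset.
    public static List<String> relevantSorted(List<String> allNames, String prefix) {
        List<String> selected = new ArrayList<>();
        for (String name : allNames) {
            if (name.startsWith(prefix)) {
                selected.add(name);
            }
        }
        Collections.sort(selected);
        return selected;
    }
}
```

In a servlet, `allNames` would come from `request.getParameterNames()`; the point is that the cost of the sort now scales with the handful of action parameters rather than with every per-row field on the page.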

  16. What issues had been identified? • Operations that were fast were still bottlenecks • Looking up information in our LDAP directory • 50 ms execution time seems fast until the user is waiting for it to be done 6000 times (300K ms = 5 minutes) • Caching the results on the BLS meant that at least after the first time a user looked at a page, the response time improved • Added a Properties object and checked there first for the name; if not found, then did the LDAP lookup and added to Properties

  17. What issues had been identified? • A great deal of suspicion about JSP performance • The worst-performing JSPs had been recoded to use static variables, StringBuffers, and MessageFormat:

      <%! protected static String mfdtl05 = "<TD align=… " %>
      …
      (in a loop)
      <% sb.append(MessageFormat.format(mfdtl05, new String[] {var1, var2}));
         out.println(sb.toString()); %>

  • This eliminated I/O method calls in the generated Java

  18. What issues had been identified? Method calls in the generated Java for the JSP (in the \tomcat\work\<application> directory, <jsp>n.java):

      // begin [file="F:\\tomcat\\webapps\\ubb\\User.jsp";from=(29,5);to=(29,34)]
          if (person.isBillAdmin()) {
      // end
      // HTML
      // begin [file="… \\webapps\\ubb\\User.jsp";from=(29,36);to=(30,33)]
          out.write("\r\n\t\t<TD><INPUT TYPE=\"submit\" NAME=\"");
      // end
      // begin [file="F:\\tomcat\\webapps\\ubb\\User.jsp";from=(30,36);to=(30,81)]
          out.print( Common.CMD_PAGE + Common.PAGE_SUBLEDGERLIST );
      // end
      // HTML
      // begin [file="… \\webapps\\ubb\\User.jsp";from=(30,83);to=(31,3)]
          out.write("\" VALUE=\"Display SubledgerList\">&nbsp;&nbsp;</TD>\r\n\t ");
      // end
      // begin [file="F:\\tomcat\\webapps\\ubb\\User.jsp";from=(31,5);to=(31,8)]
          }
      // end

  19. What issues had been identified? • A great deal of suspicion about application design • The communication between the Client object and the BLS was CORBA, using standard infrastructure components and following the model of other applications • Objects were all converted to beans when they were retrieved from the CORBA interface • There was suspicion about the conversion – was there excessive object instantiation?

  20. What issues had been identified? • A great deal of suspicion about server performance • When objects were migrated dev/test/prod, they seemed to get slower • This was entirely anecdotal • Systems group did not think this was unreasonable, but was also not very interested in pursuing the issue • An upgrade was already planned, but not in the near future • Browser rendering of nested tables was slow, especially for Netscape

  21. What did the developer do? • Log4j had been added to the project while he was away, so he carried this to an extreme, so there was measurement of each step in each transaction • In servlet, logged time in servlet’s doPost, time to process the form from the request, elapsed time for each Client method call, time spent in each JSP, etc. • request was forwarded, control returned to servlet’s doPost • In Client, logged time in each method call • In BLS, logged time in each method call • And at any spot anybody voiced suspicion about

  22. What did the developer do?

      boolean checkTiming = false;
      long start = 0;
      if (timerLog.isDebugEnabled()) checkTiming = true;
      …
      if (checkTiming) start = System.currentTimeMillis();
      // code to perform an action
      if (checkTiming) {
          long time = System.currentTimeMillis() - start;
          timerLog.debug("<class>.<method>: <action> took " + time + " ms.");
      }

  23. Model 2 architecture (diagram annotations) • Log total time in JSP • Log time in each method • Log time waiting for BLS • Log time in a lot of places

  24. So where was it spending its time?

      ubb.NetworkBilling - Remote User=wrt1
      client.Client - Client.getPerson took 104 ms.
      ubb.NetworkBilling - Retrieved Person object for wrt1 - William R Turner
      ubb.NetworkBilling - NetworkBilling.getParam took 2 ms.
      ubb.NetworkBilling - NetworkBilling.sortCommands: getParameterNames took 0 ms.
      ubb.NetworkBilling - NetworkBilling.sortCommands: extracting commands took 101 ms.
      ubb.NetworkBilling - NetworkBilling.sortCommands: sorting took 100 ms.
      ubb.NetworkBilling - NetworkBilling.sortCommands took 307 ms.
      client.Client - Client.getRemoteReference took 47 ms.
      client.Client - Client.getPersonAccounts took 395 ms.
      ubb.NetworkBilling - NetworkBilling.getParam took 0 ms.
      client.Client - Client.getRemoteReference took 38 ms.
      client.Client - Client.getDeptSubledger took 292 ms.
      client.Client - Client.getRemoteReference took 44 ms.
      client.Client - Client.getBillingBySubledger: data retrieval took 14008 ms.
      client.Client - Client.getBillingBySubledger: formatting took 960 ms.
      client.Client - Client.getBillingBySubledger took 15072 ms.
      ubb.NetworkBilling - NetworkBilling.prepareSubledgerList took 16209 ms.
      ubb.NetworkBilling - NetworkBilling.doPost: done gathering data in 19038 ms.
      ubb.NetworkBilling - NetworkBilling.doPost: got the ServletContext in 198 ms.
      ubb.NetworkBilling - NetworkBilling.doPost: got the RequestDispatcher in 197 ms.
      ubb.NetworkBilling - NetworkBilling: forwarding to /SubledgerList.jsp
      SubledgerList.jsp: execution time was 33944
      ubb.NetworkBilling - NetworkBilling.doPost: request dispatcher forward took 34434 ms.
      ubb.NetworkBilling - NetworkBilling.doPost: total execution time was 53867 ms.

  25. Where was it spending its time? • Did the change of the JSPs to use StringBuffers and MessageFormat instead of just text help? • No, actually it hurt! After all the timing was in, it could be seen that actually 30 seconds went into the JSP alone! • Going back to inline HTML dropped the time required for the JSP formatting the response to 1.5 seconds for the “bad” pages.

  26. Where was it spending its time? • I noticed that the Web server log (earlier slide) indicated that a lot of time was being spent in the Client calls to retrieve data • Corresponding BLS logs showed a lot less time being spent retrieving the data from the database and formatting it:

      2003-12-01 11:49:54 bls.Server.getBillingBySubledger: 5546 ms

  • We had a long struggle to explain the disparity • Was it the CORBA infrastructure? • Was it just a big application and that’s how fast the servers were?

  27. Where was it spending its time? • Well, eventually we found that something was wrong with the network configuration on the Web server • This server had been in production for years, and nobody ever noticed because it disproportionately affected large files – up to 5K wasn’t too bad • Discovered this by somebody FTPing a file back and forth to the server • Once these things were corrected, response time on the “bad” pages dropped from 4 minutes to about 30 seconds.

  28. What did we learn? • Most of the performance improvements that had been put in were counterproductive • Need to make changes one at a time - making several changes at once was counterproductive because they tended to cancel each other out • Need to add timing to confirm that your changes are helping

  29. What did we learn? • The prod environment was actually slower than the dev environment • Dev hardware was newer and much faster • Caused problems with the users because the application was more responsive during testing than in production

  30. What did we learn? • Time logging needs to be a part of the application from the beginning • Log4j lends itself to this • Need to time EVERYTHING, not just what you think is slow • Things that we knew were fast could still be a bottleneck • Depends on how many times they are done – are your design assumptions correct? • Things that we thought might be slow were not necessarily a problem

  31. What did we learn? • Not everything we thought we knew was correct! • Let’s look at a timing example for a very well-known performance technique: building strings by using StringBuffers with repeated .append() calls, then a final .toString() call, instead of repeatedly concatenating Strings

  32. Timing example
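The timing chart from this slide is not preserved in the transcript. The comparison it describes could be reproduced with a minimal harness like the one below (an assumed sketch, not the presenter's original); as the next slide notes, the JVM makes a huge difference, so measure on the JVM you actually run.

```java
public class ConcatTiming {
    // Build a string of n fragments by repeated String concatenation;
    // each += copies the whole string so far, giving quadratic cost.
    public static String byConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) s += "x";
        return s;
    }

    // Build the same string with a StringBuffer and a final toString(),
    // which grows the buffer in place.
    public static String byBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) sb.append("x");
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20000;
        long t0 = System.currentTimeMillis();
        byConcat(n);
        long t1 = System.currentTimeMillis();
        byBuffer(n);
        long t2 = System.currentTimeMillis();
        // The two methods produce identical strings; only construction cost differs.
        System.out.println("concat: " + (t1 - t0) + " ms, buffer: " + (t2 - t1) + " ms");
    }
}
```

The point of the slide sequence is not that StringBuffer always wins, but that the "well-known" answer should be verified by timing it in your own environment.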

  33. Lessons learned • You can’t optimize what you don’t measure • The things even senior developers know to be true aren’t necessarily true • Need to tune for the actual servers you will be running on • JVM makes a huge difference! • UI design can have vastly more impact on perceived performance than anything that can be done on the back end

  34. Questions?
