
Horizontal Scaling and Reliability Planning and Testing for Heavy Load



Presentation Transcript


  1. Horizontal Scaling and Reliability Planning and Testing for Heavy Load Steven Goeke Bill Frikken

  2. Outline • Project Background • Our Motivation • Testing Tools, Techniques, and Methods • Results • Conclusions

  3. Background on Georgia Tech • Six Colleges • 16,000 Graduate and Undergraduate • 5000 Faculty and Staff • The NSF ranks Tech 2nd in engineering R&D and 4th in industry-sponsored R&D • Four Campuses

  4. Background on WMU • Carnegie Research Extensive Institution • Seven Colleges • Six Regional Campuses • 28,000 Graduate and Undergraduate Students • 3,500 Faculty and Staff • Business Technology Research Park

  5. Motivation • It started with Wireless Western • Anytime, anywhere access to resources • A better e-communication infrastructure • Multi-platform, open source, end-of-life system • Be innovative with the solutions

  6. And then along came SIS • Much-needed replacement of the student information system • Eliminate Social Security numbers • Budget challenges – student records fee • Take advantage of a portal solution • GoWMU.wmich.edu – portal delivery • Content development in 4 weeks! • SSO (Single Sign-on) capabilities • Seamless access to Banner Self-Serve, WebCT, ECS, …

  7. We Want a Portal!! • Facilitate student/faculty communication • Enhance the student experience • Prestige • uPortal or Luminis • Banner – 9 years • WebCT – 4 years

  8. Motivation • BuzzPort is becoming mission critical • Expanding user base • Cost savings

  9. Current GT Architecture (diagram) • GT Network: WebCT, Banner Self-Service, others • Firewall(s) and Load Balancer • Trusted Network: Luminis 3.2 + Calendar (one per environment) • Private Network: Portal DBs + Banner (one set per environment) • Environments: Production, Test, Development

  10. GT FOS Architecture (diagram) • GT Network: WebCT, Banner Self-Service, others • Load Balancer and Firewall(s) • Private Network: nine web servers (WS) • Trusted Network: Resource + Calendar (one per environment) • Portal DBs + Banner (one set per environment) • Environments: Production, Test, Development

  11. WMU Architecture • What technologies deliver these services? • Sun hardware • Cisco 11503 Load Balancers • StorageTek D280 Storage Area Network • Single enterprise UserID – “Bronco NetID” • Kerberos • LDAP – Sun JES Directory • “Legacy” provisioning services • Multiple web-authentication schemes

  12. WMU 3-tier architecture

  13. Test and production hardware (WMU) • Test environment • 3 × Sun V210s – 1.334 GHz, 2 GB • 1 back-end box – PDS • 2 front-end web servers • Production environment • 2 back-end boxes – Sun V480s – 4 × 1.0 GHz, 8 GB • 3 front-end boxes – Sun V210s – 2 × 1.34 GHz, 8 GB

  14. Performance and growth • Back-end services are clustered and highly redundant • Veritas HA Cluster for JES • Dual drive paths to SAN • Front-end services are load-balanced • Horizontal scaling wherever possible • Multiple SunFire V1xx and V2xx servers
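The load-balanced front-end tier above distributes requests across identical web servers. A minimal round-robin sketch of that idea, with hypothetical server names (not the actual WMU hostnames):

```python
from itertools import cycle

# Illustrative pool of load-balanced front-end web servers.
SERVERS = ["ws1", "ws2", "ws3"]

class RoundRobinBalancer:
    """Cycle incoming requests evenly across a pool of web servers."""
    def __init__(self, servers):
        self._pool = cycle(servers)

    def pick(self):
        return next(self._pool)

balancer = RoundRobinBalancer(SERVERS)
assignments = [balancer.pick() for _ in range(6)]
print(assignments)  # each server handles two of the six requests
```

Real balancers (such as the Cisco 11503 mentioned above) add health checks and weighting, but the even-distribution principle is the same.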

  15. Testing Tools And Techniques • Georgia Tech: • RadView WebLOAD • 200, 500, 1000 Users • Ramp-Up over 30 minutes to target users • Sustain the load for 30 minutes • Simple Agenda: • Login, navigate to a group, post a message, logout • Measure: • Login Time, First Page Time, Average Page Time, and Response Time
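The ramp-up described above (reaching the target virtual-user count over 30 minutes) amounts to evenly spaced session start times. A sketch of that schedule in Python, illustrative only and not the actual WebLOAD configuration:

```python
def ramp_schedule(target_users, ramp_seconds):
    """Evenly spaced start offsets (seconds) so the full virtual-user
    count is reached by the end of the ramp-up window."""
    interval = ramp_seconds / target_users
    return [round(i * interval, 2) for i in range(target_users)]

# 1000 virtual users ramped over 30 minutes (1800 s):
starts = ramp_schedule(1000, 1800)
print(starts[0], starts[1], starts[-1])  # 0.0 1.8 1798.2
```

After the last user starts, each virtual user keeps looping the agenda for the 30-minute sustain period.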

  16. GT Load Test 1 • Date: 3/9/2005 • One web server (280R/2x1.2G/2G mem) • Time: 3:06PM – 3:44PM • Duration: 2327.48 sec • 1000 Sessions

  17. GT Load Test 1 - Results • Max Time to First Page – 5.098 sec (1000VC) • Max Login Time – 9.294 sec (1000VC) • Average Time to 1st Page: 2.337 sec • Average Login Time: 2.913 sec

  18. GT Load Test 2 • Date: 3/9/2005 • Three web servers • Time: 4:04PM – 5:06PM • Duration: 3766.32 sec • 500 Sessions

  19. GT Load Test 2 - Results • Max Time to First Page – 5.098 sec (1000VC) • Max Login Time – 9.294 sec (1000VC) • Average Time to 1st Page: 2.337 sec • Average Login Time: 2.913 sec

  20. Test Tools • JMeter • Apache tool for load and performance testing • Badboy • Exports functional tests for JMeter load testing • 1000 users within 30 minutes
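The metrics reported on the result slides (login time, page-load time) come down to wall-clock timing around each step of the agenda. A hedged sketch with a stubbed request function standing in for a real HTTP call (the actual tests used WebLOAD and JMeter against the portal):

```python
import time

def timed(step_fn, *args):
    """Return (result, elapsed_seconds) for one agenda step."""
    t0 = time.perf_counter()
    result = step_fn(*args)
    return result, time.perf_counter() - t0

# Stub standing in for a real HTTP login request (hypothetical).
def fake_login(user):
    time.sleep(0.01)  # pretend network latency
    return f"session-{user}"

session, elapsed = timed(fake_login, "vc001")
print(session, round(elapsed, 3))
```

Each virtual user records these per-step timings; the tool then aggregates them into the averages and maxima shown on the following slides.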

  21. Test Results – WMU initiated • Date: 6/8/2005 • 1000 Users over 30 minutes • Avg Login Time: 3.5 Seconds • Avg Page Load: ~1 second – 2.4 seconds • Max CPU Utilization • 15% Server 1 • 13% Server 2 • Avg Session Activity – 47 seconds

  22. Test Results – SCT initiated • Date: 6/6/2005 • 1000 Users over 4 Hours (20 min ramp up) • Avg Login Time: 3.932 Seconds • Max Login: 4.76 Seconds • Min Login: 2.758 Seconds • Avg Page Load: ~1 second – 2.4 seconds • Max CPU Utilization – 54% Single Server • Session Activity over 4 hours
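The summary figures on these result slides (average, max, and min login time) are simple aggregates over per-session samples. For illustration, with made-up sample values rather than the actual test data:

```python
# Hypothetical per-session login times in seconds (illustrative only).
samples = [2.758, 3.1, 3.9, 4.2, 4.76]

avg = sum(samples) / len(samples)
print(f"avg={avg:.3f} max={max(samples)} min={min(samples)}")
```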

  23. Test Results – Joint evaluation • Anticipated environment exceeded expectations • 2 Sources provided validation • Confidence moving ahead

  24. Luminis FOS – Features & Limitations • Limited failover capability - No session persistence • Still have single points of failure • Replicate the LDAP • Replicate the DB • Horizontal scalability at web tier • Phased patching

  25. Conclusions • Luminis FOS is a significant improvement • More complex • Machine allocation • Will we be implementing it?

  26. Next Steps • Test result conclusions • More stable testing environment • Production considerations • Test needs to resemble production • Horizontally scale before putting into production • Removing single points of failure

  27. Critical Success Factors • Top-level support • Good planning • Flexible project plan • Being “big picture” while still attending to details • Solid infrastructure • Relationships

  28. Questions?

  29. Contact Information • Steven Goeke • steven.goeke@oit.gatech.edu • Georgia Tech www.gatech.edu • BuzzPort buzzport.gatech.edu • Bill Frikken • bill.frikken@wmich.edu • Western Michigan University www.wmich.edu • GoWMU portal gowmu.wmich.edu • Office of Information Technology www.wmich.edu/oit


  31. GT Load Test 3 • Date: 3/10/2005 • Three web servers • Time: 1:21PM – 1:47PM • Duration: 1592.66 sec • 1000 Sessions

  32. GT Load Test – Login/1st Page Times

  33. GT Load Test 1 – Page/Connect/Response Time

  34. GT Load Test – Login/1st Page Times

  35. GT Load Test 2 – Page/Connect/Response Time

  36. GT Load Test 3 – results • Max Time to First Page – 4.067 sec (786VC) • Max Login Time – 0.983 sec (76VC) • Average Time to 1st Page: 2.178 sec • Average Login Time: 0.564 sec

  37. GT Load Test – Login/1st Page Times

  38. GT Load Test 3 – Page/Connect/Response Time

  39. GT Load Test 4 • Date: 3/10/2005 • Three web servers • Time: 2:18PM – 3:15PM • Duration: 3162.2 sec • 200 Sessions

  40. GT Load Test 4 – results • Max Time to First Page – 1.125 sec (34VC) • Max Login Time – 0.406 sec (150VC) • Average Time to 1st Page: 0.803 sec • Average Login Time: 0.283 sec

  41. GT Load Test – Login/1st Page Times

  42. GT Load Test 4 – Page/Connect/Response Time

  43. Results (Acadia1, CPU): 3:06PM–3:44PM (1000VC, 1 Tier); 4:04PM–5:06PM (500VC, 3 Tier)

  44. Results (Acadia1, Free Memory): 3:06PM–3:44PM (1000VC, 1 Tier); 4:04PM–5:06PM (500VC, 3 Tier)

  45. Results (Acadia2, CPU): 4:04PM–5:06PM (500VC, 3 Tier)

  46. Results (Acadia2, Free Memory): 4:04PM–5:06PM (500VC, 3 Tier)

  47. Results (Acadia3, CPU): 4:04PM–5:06PM (500VC, 3 Tier)

  48. Results (Acadia3, Free Memory): 4:04PM–5:06PM (500VC, 3 Tier)

  49. Results (Biscayne, CPU): 3:06PM–3:44PM (1000VC, 1 Tier); 4:04PM–5:06PM (500VC, 3 Tier)

  50. Results (Biscayne, Free Memory): 3:06PM–3:44PM (1000VC, 1 Tier); 4:04PM–5:06PM (500VC, 3 Tier)
