Performance Testing Basics and Troubleshooting

Understand performance testing types, tools, measurements, and key factors for performance engineers. Explore real-life case studies and their root causes.

moscar


Presentation Transcript


  1. All About Performance Testing (性能测试那些事儿) • 刘博 (Liu Bo) • boliu@thoughtworks.com

  2. WHERE WE ARE • 1. Basic Concept • 2. Troubleshooting

  3. WHAT IS PERFORMANCE TESTING? • To determine how a system performs in terms of responsiveness and stability under a particular workload. • To investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.

  4. PERFORMANCE TESTING TYPES • Load testing • Configuration testing • Isolation testing • Capacity testing • Stress testing • Soak testing • Spike testing

  5. PERFORMANCE TESTING TOOLS • NeoLoad, LoadRunner • Silk Performer, Rational Performance Tester • LoadUI, Gatling, Grinder, JMeter

  6. PERFORMANCE TESTING S.O.P • Identify Performance Acceptance Criteria • Plan and Design Tests • Identify, Configure and Validate the Test Environment • Implement, Validate and Verify the Test Script • Execute the Test (Warm-up first) • Analyze Results, Tune, and Retest
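
The "Execute the Test (Warm-up first)" and "Analyze Results" steps above can be sketched roughly as follows. This is a minimal, single-threaded illustration, not how any of the listed tools work internally; `run_load_step`, `summarize`, and the 95th-percentile acceptance limit are hypothetical names and choices made for this sketch.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_load_step(request: Callable[[], None],
                  warmup_calls: int,
                  measured_calls: int) -> List[float]:
    """Call `request` repeatedly and return latencies (seconds) of the measured calls only."""
    for _ in range(warmup_calls):
        request()  # warm-up: caches, connection pools, JIT; timings are discarded
    latencies = []
    for _ in range(measured_calls):
        start = time.perf_counter()
        request()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float], p95_limit_s: float) -> Dict[str, object]:
    """Compare the nearest-rank 95th percentile against an acceptance limit (assumed criterion)."""
    ordered = sorted(latencies)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n) in integer arithmetic
    p95 = ordered[rank - 1]
    return {
        "mean": statistics.mean(ordered),
        "p95": p95,
        "passed": p95 <= p95_limit_s,
    }
```

In practice `request` would wrap one scripted user action, and the acceptance limit would come from the criteria identified in the first step.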

  7. KEY MEASUREMENTS • Hardware Resource • CPU (Context Switches/sec, Processor Queue Length) • Memory (Pages/sec) • IO (Average Disk Queue Length, Network Usage) or IOPS • Software Resources • Web Server • Database • Customized Performance Counters • xVM • Logs

  8. KEY MEASUREMENTS • Monitoring Tools • AppDynamics, Dynatrace, New Relic, OneAPM • Performance counter tool along with testing tools or OS • Zabbix, Nagios

  9. KEY FACTORS FOR A PERFORMANCE ENGINEER • All-round • For the target system • Architecture Design • Cluster Configuration • Network Topology • Capacity of Test Agents • Communication

  10. PSEUDO PRODUCT 1

  11. CASE 1 – PHONE INTERVIEW SLOWS DOWN • Key Measurements • Get Sample • Start Interview • Page to Page • End Interview

  12. CASE 1 – PHONE INTERVIEW SLOWS DOWN • Performance degrades steadily by ~10% with build 0615, only in page-to-page time • CPU usage is ~10% higher in build 0615 • No such issue with build 0501 • No errors in logs • No such issue on Web Interview • ~150 bugs fixed between 0501 and 0615 • No performance bug fixed between 0501 and 0615
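
A comparison like the one above (one measurement regresses between two builds while the others do not) can be made mechanical. The sketch below is an assumption about how such a check might look, not the team's actual tooling; `percent_change`, `regressed_steps`, and the 5% threshold are hypothetical.

```python
import statistics
from typing import Dict, List


def percent_change(baseline: List[float], candidate: List[float]) -> float:
    """Percentage change of the median timing from the baseline build to the candidate build."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base * 100.0


def regressed_steps(baseline_by_step: Dict[str, List[float]],
                    candidate_by_step: Dict[str, List[float]],
                    threshold_pct: float = 5.0) -> Dict[str, float]:
    """Return the steps whose median timing grew by more than `threshold_pct` percent."""
    changes = {
        step: percent_change(baseline_by_step[step], candidate_by_step[step])
        for step in baseline_by_step
        if step in candidate_by_step
    }
    return {step: pct for step, pct in changes.items() if pct > threshold_pct}
```

Fed with per-step timings from builds 0501 and 0615, a check of this kind would flag only the steps that regressed, which narrows the search before reading through the ~150 fixes.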

  13. CASE 1 – ROOT CAUSE • One base class in the common framework was modified with extra features that are NOT supposed to be used by Phone Interview, causing unnecessary load/unload operations on Next/Previous page operations • Simulated clicks on Next/Previous page are extremely frequent, especially under heavy load • Actions?

  14. PSEUDO PRODUCT 1

  15. CASE 2 – WEB INTERVIEW TIMES OUT • Many Web Interviews timed out randomly in production • After a restart everything’s fine, but the error recurs as time goes on • Error calling WS method 'Method'. URL 'URL', Error codes: Client 5, HTTP -1, SOAP 0, TCP 10048 • IIS works well • Load is sometimes heavy but does not exceed the upper limit • Cannot reproduce in house with the given load/scenario • Not related to anti-virus software or firewall

  16. CASE 2 – CONTINUE INVESTIGATION • Increase load and monitor pages/sec from customized counters • Pages/sec drops dramatically when the issue is reproduced • The web tier server can then only handle interviews at a slow rate • Drill down into the entire web interview process in the back end, i.e. from client, to web server, and then to interview server • Every request to the web server opens a new TCP port! • netstat -an
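
To turn the `netstat -an` check above into a number that can be tracked during a test run, the output can be tallied by TCP state. This is a minimal sketch: `count_tcp_states` is a hypothetical helper, and the sample text uses made-up addresses purely to show the parsing.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state from `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "TCP"/"tcp" and end with the connection state;
        # headers and UDP rows are skipped.
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Illustrative sample only; addresses and ports are invented for the example.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    10.0.0.5:49160         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:49161         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:49162         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:49163         10.0.0.9:8080          ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

On a live server the text could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`; a TIME_WAIT count that keeps climbing under load is the pattern this case was looking for.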

  17. CASE 2 – ROOT CAUSE • TCP port exhaustion on the web tier server • Default release time for TCP TIME_WAIT is 4 minutes on Windows • Actions?
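
The root cause above implies a simple upper bound: if every request consumes a fresh local port that then sits in TIME_WAIT for 4 minutes, the sustainable rate of new connections is roughly the number of ephemeral ports divided by 240 seconds. The slides do not state the Windows version or port range, so the ranges below are assumptions (49152–65535 is the default dynamic range on newer Windows versions, 1025–5000 on older ones).

```python
def max_sustained_connections_per_second(first_port: int,
                                         last_port: int,
                                         time_wait_seconds: float) -> float:
    """Rough upper bound on new outbound connections per second before ports run out."""
    available_ports = last_port - first_port + 1
    return available_ports / time_wait_seconds


if __name__ == "__main__":
    time_wait = 4 * 60  # 4 minutes, as stated on the slide
    # Assumed port ranges; the actual range on the affected server is not given.
    newer = max_sustained_connections_per_second(49152, 65535, time_wait)
    older = max_sustained_connections_per_second(1025, 5000, time_wait)
    print(f"newer default range: about {newer:.1f} new connections/sec")
    print(f"older default range: about {older:.1f} new connections/sec")
```

The bound also frames the "Actions?" question: open fewer new connections (for example by reusing them), make more ports available, or shorten the TIME_WAIT period.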

  18. PSEUDO PRODUCT 2

  19. CASE 3 – ERRORS IN MULTI-TENANT ONLY • Errors occur after exactly 10 minutes with multi-tenant • No such issue with single-tenant • Massive errors in logs - not so helpful • CPU usage is higher than with single-tenant • GC activity is much higher (5% to 10% in CPU time) • No use adjusting -Xmx since physical memory is not the bottleneck

  20. CASE 3 – CONTINUE INVESTIGATION

  21. CASE 3 – CONTINUE INVESTIGATION • Architect team guarantees this issue is unrelated to single or multiple tenancy • System.gc() is called explicitly in code, but that call has existed for a long time • System.gc() is called only under a specific condition that is outside the test scope • Checked the Oracle Java documentation on GC policy and confirmed the correct one is in use • Checked JVM startup parameters with Ops

  22. CASE 3 – CONTINUE INVESTIGATION

  23. CASE 3 – CONTINUE INVESTIGATION

  24. CASE 3 – ROOT CAUSE • JVM startup parameter configuration on multi-tenant • Add -XX:NewRatio to adjust the sizes of the young and old generations and avoid frequent GC • Actions?
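
For reference, `-XX:NewRatio=N` sets the old generation to N times the size of the young generation, so the young generation gets 1/(N+1) of the heap. The sketch below only works through that arithmetic; the heap size and ratios are illustrative assumptions, since the slides do not give the values actually used.

```python
from typing import Tuple


def generation_sizes_mb(heap_mb: float, new_ratio: int) -> Tuple[float, float]:
    """Return (young_mb, old_mb) for a heap split according to -XX:NewRatio=new_ratio."""
    young = heap_mb / (new_ratio + 1)  # old : young = new_ratio : 1
    old = heap_mb - young
    return young, old


if __name__ == "__main__":
    heap_mb = 4096  # hypothetical heap size, not taken from the slides
    for ratio in (1, 2, 3):
        young, old = generation_sizes_mb(heap_mb, ratio)
        print(f"NewRatio={ratio}: young={young:.0f} MB, old={old:.0f} MB")
```

A smaller ratio gives the young generation more room, which is one way short-lived objects can be collected before they are promoted and trigger more expensive collections.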

  25. THANK YOU • Q & A
