

  1. CSC407: Software Architecture, Summer 2006: Performance. Greg Wilson, BA 3230, gvwilson@cs.utoronto.ca

  2. Introduction • Getting the right answer is important • Getting the right answer quickly is also important • If we didn’t care about speed, we’d do things by hand • Choosing the right algorithm is part of the battle • Choosing a good architecture is the other part • Only way to tell good from bad is to analyze and measure actual performance

  3. Example: File Server • Dedicated server handing out PDF and ZIP files • One CPU • 4 disks: PDFs on #1 and #2, ZIPs on #3 and #4 • Have to know the question to get the right answer • How heavy a load can it handle? • Would it make more sense to spread all files across all disks?

  4. We Call It Computer Science… • …because it’s experimental • Collect info on 1000 files downloaded in 200 sec

  5. Summary Statistics • Analyze all 1000 downloads in a spreadsheet • Yes, computer scientists use spreadsheets… • We’re justified in treating each type of file as a single class

  6. Modeling Requests • The concurrency level is the number of things of a particular class going on at once • Estimate by adding up total download time for PDF and ZIP files separately, and dividing by the actual elapsed time • N_PDF = 731.5/200 ≈ 3.7 • N_ZIP = 3207.7/200 ≈ 16.0 • Rounding off (4 and 16), the ZIP:PDF download ratio is about 4:1
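A minimal sketch of that estimate in Python, assuming a hypothetical list of (file type, download time) records; the names and sample values are placeholders, not the course's actual data:

```python
# Concurrency level for one class of request:
# total download time for that class / elapsed observation time.
ELAPSED = 200.0  # seconds of observation, as on the slide

# hypothetical measurements: (file_type, download_time_seconds)
downloads = [("pdf", 0.9), ("zip", 3.4), ("pdf", 0.6)]  # ... 1000 records

def concurrency(records, file_type, elapsed=ELAPSED):
    """Average number of simultaneous downloads of one class."""
    return sum(t for kind, t in records if kind == file_type) / elapsed

n_pdf = concurrency(downloads, "pdf")  # slide: 731.5 / 200 ≈ 3.7
n_zip = concurrency(downloads, "zip")  # slide: 3207.7 / 200 ≈ 16.0
```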

  7. Measuring Service Demands • What load does each request put on the disk and CPU? • Create N files of various sizes: 10KB, 100KB, 200KB, …, 1GB • Put them on a single-CPU, single-disk machine that's doing nothing else • Measure download times and fit straight lines against file size s • T_CPU = 0.1046s − 0.0604 (hm: a negative intercept is physically suspect) • T_disk = 0.4078s + 0.2919
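Those coefficients come from a straight-line fit of measured time against file size. A minimal sketch of such a fit; the sizes and timings below are made-up stand-ins for the real measurements:

```python
# Least-squares fit of download time T against file size s (MB):
# T = a*s + b, recovering per-request service-demand coefficients.
sizes = [0.01, 0.1, 0.2, 0.5, 1.0, 10.0, 100.0, 1000.0]   # MB (invented)
times = [0.30, 0.33, 0.37, 0.50, 0.70, 4.37, 41.1, 408.1]  # sec (invented)

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

a, b = fit_line(sizes, times)  # ≈ (0.4078, 0.29), like T_disk on the slide
```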

  8. Back To The Data • Use Mean Value Analysis to calculate service demands • Remember to divide disk requirements by 2, since each class of file is spread across two disks
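The slides don't show the MVA calculation itself, but the standard exact MVA recurrence for a single-class closed network looks like this sketch; the demand values are illustrative, not the server's measured ones:

```python
# Exact Mean Value Analysis for a single-class closed network.
# demands[i] is the service demand D_i (seconds) at resource i.
def mva(demands, population, think_time=0.0):
    """Return (throughput X, response time R) at n = population."""
    queue = [0.0] * len(demands)          # Q_i(0) = 0
    x = r = 0.0
    for n in range(1, population + 1):
        # residence time at each resource: R_i = D_i * (1 + Q_i(n-1))
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]
        r = sum(resid)                    # total response time
        x = n / (r + think_time)          # throughput (requests/sec)
        queue = [x * ri for ri in resid]  # Little's Law: Q_i(n) = X * R_i
    return x, r

# e.g. CPU plus two PDF disks; demands in seconds, invented for illustration
print(mva([0.04, 0.08, 0.08], population=20))
```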

  9. Thinking With Pictures

  10. …Thinking With Pictures

  11. Observations • After ~20 users, the server saturates • Maximum throughput for PDF files: • 12 files/sec in original configuration • 5 files/sec in balanced configuration • Maximum throughput for ZIP files: • 4.2 files/sec in original configuration • 6.6 files/sec in balanced configuration

  12. Service Level Agreements • SLA requires average download times of 20 sec (ZIP files) and 7 sec (PDF files) • Original configuration: ZIP threshold reached at approximately 100 users, when PDF download time is still only ~3 sec • Balanced configuration: ZIP threshold reached at ~165 users, and PDF download time is 6.5 sec • The balanced configuration supports more users while meeting both thresholds, so it is strictly superior

  13. How Did We Do That? • Key concern is quality of service (QoS) • Throughput: transactions/second, pages/second, etc. • Response time • And variation in response time • People would rather wait 10 minutes every day than 1 minute on nine days and 20 minutes on the tenth • Availability • 99.99% available ≈ 4.3 minutes lost every 30 days • That's not good enough for 911
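The availability figure is simple arithmetic; a one-function sketch:

```python
# Downtime implied by an availability target over a 30-day month.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes

def downtime_minutes(availability, period=MINUTES_PER_30_DAYS):
    """Minutes of downtime allowed by an availability fraction."""
    return (1.0 - availability) * period

print(downtime_minutes(0.9999))  # 4.32 minutes lost per 30 days
```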

  14. A Simple Database Server • Circles show resources • Boxes show queues • Throughput and response times depend on: • Service demand: how much time do requests need from resources? • System load: how many requests are arriving per second? • (diagram: requests queue for the CPU, then for the disk)

  15. Classes of Model • An open class is specified by the rate at which requests arrive • Throughput is an input parameter • A closed class is specified by the size of the customer population • E.g., total number of queries to be processed, or total number of system users • Throughput is an output • Can also have load-dependent and load-independent resources, mixed models, etc.

  16. Values We Can Measure • T: length of observation period • K: number of resources in the system • Bi: total busy time of resource i in observation period • Ai: number of request arrivals for resource i • A0 is total number of request arrivals for whole system • Ci: number of service completions for resource i • C0 is completions for whole system • In steady state for large T, Ai = Ci

  17. Values We Can Calculate • Si: mean service time at resource i (Bi/Ci) • Ui: utilization of resource i (Bi/T) • Xi: throughput of resource i (Ci/T) • In steady state Ai = Ci, so each resource's throughput equals its arrival rate; for the whole system, X0 = λ • Vi: average visit count for resource i (Ci/C0)

  18. Utilization Law • Utilization Ui = Bi/T = (Bi/Ci)·(Ci/T) • But Bi/Ci is Si, and Ci/T is the throughput Xi • So Ui = XiSi • I.e., utilization is the throughput times the service time, which makes sense

  19. Service Demand Law • Service demand Di is the total average time required per request from resource i • Di = UiT/C0 • I.e., fraction of time busy, times total time, over number of requests • But UiT/C0 = Ui/(C0/T) = Ui/X0 • I.e., service demand is utilization over throughput • Also Ui/X0 = (Bi/T)/(C0/T) = Bi/C0 = ViSi • So service demand is average number of visits times mean service time per visit
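Slides 16 through 19 fit together as a small calculator; a sketch with invented busy times and completion counts standing in for measurements:

```python
# Operational laws, computed from measurable quantities.
T = 200.0                            # observation period (seconds)
C0 = 1000                            # system-wide completions
B = {"cpu": 60.0, "disk1": 120.0}    # busy seconds per resource (invented)
C = {"cpu": 1000, "disk1": 800}      # completions per resource (invented)

X0 = C0 / T                          # system throughput
for i in B:
    S_i = B[i] / C[i]                # mean service time per visit
    U_i = B[i] / T                   # Utilization Law: U_i = X_i * S_i
    X_i = C[i] / T                   # resource throughput
    V_i = C[i] / C0                  # average visits per request
    D_i = V_i * S_i                  # Service Demand Law: also U_i / X0
    assert abs(D_i - U_i / X0) < 1e-9
    print(f"{i}: S={S_i:.3f} U={U_i:.3f} X={X_i:.2f} V={V_i:.2f} D={D_i:.3f}")
```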

  20. Little's Law • Average number of requests being processed at any time = throughput × average time each request stays in the system • So with: • 0.5 requests per second (= throughput) • 10 second response time (= time each request stays in system) • There must be an average of 0.5 × 10 = 5 requests in the system at any time

  21. Interactive Response Time Law • S clients accessing a database • Each client thinks for Z seconds between requests • Average database response time is R seconds • If M is the average number of clients thinking, and N is the average number of requests at the database, then S = M+N • Little's Law applied to clients: M = λZ • Little's Law applied to database: N = λR • So M+N = S = λ(Z+R) • Or R = S/λ − Z
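A two-line sketch of that law, with invented numbers:

```python
# Interactive Response Time Law: R = S/X - Z, where S clients each
# think Z seconds between requests and the system serves X req/sec.
def response_time(clients, throughput, think_time):
    return clients / throughput - think_time

print(response_time(clients=100, throughput=4.0, think_time=15.0))  # 10.0 s
```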

  22. The Weakest Link • Since Ui ≤ 1, X0 = Ui/Di ≤ 1/Di for all resources • So X0 ≤ 1/max{Di} • Remember Little's Law: N = R·X0 • I.e., number of concurrent transactions is response time × throughput • But R is at least the sum of the service demands • So N ≥ (ΣDi)·X0 • Or X0 ≤ N/(ΣDi) • So X0 ≤ min[1/max{Di}, N/(ΣDi)]
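Those two bounds combine into a quick feasibility check; a sketch with illustrative demands:

```python
# Throughput upper bound from slide 22:
# X0 <= min(1 / max(D_i), N / sum(D_i))
def throughput_bound(demands, n_users):
    return min(1.0 / max(demands), n_users / sum(demands))

demands = [0.05, 0.08]   # CPU and disk demands in seconds (invented)
for n in (1, 5, 50):
    print(n, throughput_bound(demands, n))
# 1 user: 7.69 (limited by sum of demands); 5+ users: 12.5 (disk-bound)
```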

  23. Amdahl's Law • Let: • t1 be a program's runtime on one CPU • tp be its runtime on p CPUs • ß be the fraction of the program that can run in parallel • Then the speedup is sp = t1/tp = 1/((1−ß) + ß/p)

  24. …Amdahl's Law • Example: • Want 32× speedup on a 64-processor machine • So ß must be 0.984 • I.e., over 98% of the code must run in parallel • Ouch • What if only half the code can run in parallel? • s64 is only 1.97 • Ouch again
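Both numbers fall straight out of the formula; a sketch:

```python
# Amdahl's Law with beta as the parallel fraction:
# speedup(p) = 1 / ((1 - beta) + beta / p)
def speedup(beta, p):
    return 1.0 / ((1.0 - beta) + beta / p)

print(speedup(0.984, 64))  # ≈ 31.9: the 32x-on-64-CPUs example
print(speedup(0.5, 64))    # ≈ 1.97: half-parallel code barely doubles
```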

  25. Hockney's Measures • Every pipeline has some startup latency • So characterize pipelines with two measures: • r∞ is the rate on an infinite data stream • n1/2 is the data volume at which half that rate is achieved • Improve real-world performance by: • Increasing throughput • Decreasing latency
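In this model the achieved rate on a stream of n items is r(n) = r∞ · n/(n + n1/2); a sketch with invented parameters:

```python
# Hockney's pipeline model: r(n) = r_inf * n / (n + n_half);
# at n == n_half the pipeline runs at exactly half its peak rate.
def rate(n, r_inf, n_half):
    return r_inf * n / (n + n_half)

R_INF, N_HALF = 100.0, 64.0        # invented peak rate and half-point
print(rate(64, R_INF, N_HALF))     # 50.0 = r_inf / 2
print(rate(4096, R_INF, N_HALF))   # ≈ 98.5, approaching r_inf
```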

  26. Some Quotations • Philosophers have only interpreted the world in various ways; the point, however, is to change it. • Karl Marx • You cannot manage what you do not measure. • Bill Hewlett • Measure twice, tune once. • Greg Wilson

  27. A Simple CGI • (diagram: request flow browser → /var/apache/httpd → /local/bin/python → /site/cgi-bin/app.cgi → /usr/bin/psql and disk I/O, annotated with measured times in seconds: 5.1, 5.3, 3.3, 2.7, 1.8, 0.7, 0.2, 0.3)

  28. How Did I Get These Numbers? • Shut down everything else on the test machine • Use ps and truss on Unix • sysinternals.org has lots of tools to help you find things on Windows • Use a script instead of a browser • Insert timers in the Python interpreter and recompile it • Could wrap in a timing script, but that distorts things • Measure import times in my own script • Rely on PostgreSQL's built-in monitors • Use a profiler

  29. Profiling • A profiler is a tool that can build a histogram showing how much time a program spent where • Can either instrument or sample the program • Both affect the program's performance • The more information you collect, the more distortion there is • Heisenberg's Law • Most can accumulate data over many program runs • Often want to distinguish the first run(s) from later ones • Caching, precompilation, etc.
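Python's built-in cProfile is an instrumenting profiler of the kind described here; a minimal sketch, where the workload function is just a placeholder:

```python
# cProfile instruments every function call, so it distorts timings a
# little; run it more than once to separate first-run effects (imports,
# caches) from steady-state behaviour.
import cProfile
import pstats

def workload():
    """Placeholder for the code being profiled."""
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```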

  30. …Profiling

  31. A Simple CGI Revisited • (the slide-27 diagram again, now annotated; the notes read:) • "can't do much about this" (the browser/httpd hop) • "fork/exec is expensive" (starting /local/bin/python, 1.8 s) • "import 0.6: what's going on here?" • "waiting our turn at the DB" (inside /site/cgi-bin/app.cgi) • "how many transactions? are they one class?" (at /usr/bin/psql)

  32. Room for Improvement • Forking a new Python interpreter for each request is expensive • So keep an instance of Python running permanently beside the web server, and re-initialize it for each request • FCGI/SCGI • Tomcat is usually run this way • The ability to do this is one of the reasons VM-based languages won the server wars
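None of this is the course's code, but the resident-interpreter idea can be sketched with nothing beyond Python's standard library; a real deployment would use FCGI/SCGI (or a servlet container like Tomcat) rather than this toy server:

```python
# A minimal sketch of the idea behind FCGI/SCGI: keep one Python
# process resident and handle requests in a loop, paying interpreter
# startup and import costs once instead of on every request.
from http.server import BaseHTTPRequestHandler, HTTPServer

import json   # imported once, at startup, standing in for any costly library

class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # per-request work only; no fork/exec, no re-import
        body = json.dumps({"path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), AppHandler).serve_forever()
```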

  33. …Room for Improvement • Reimporting the libraries is expensive, too • Rely on cached .pyc files • Or rewrite application around a request-handling loop • Modularity is your friend • Tightly-coupled components cannot be tuned independently • On the other hand, machine-independent code has machine-independent performance

  34. Too Much of a Good Thing

  35. After Our Changes • (diagram: the same request path after tuning; annotations:) • browser ↔ httpd now 2.6/2.8 s (was 5.3) • starting /local/bin/python now 0.1 s • /site/cgi-bin/app.cgi (1.9/2.5 s): "this has to be the next target" • /usr/bin/psql and disk I/O unchanged (0.7/0.2/0.3 s)

  36. When Do You Stop? • An optimization problem on its own • Time invested vs. likely performance improvements • Plan A: stop when you satisfy SLAs • Or beat them—always nice to have some slack • Plan B: stop when there are no obvious targets • Flat performance profiles are hard to improve • Plan C: stop when you run out of time • Plan D: stop when performance is "good enough"

  37. Five Timescales • Human activities fall into natural cognitive categories: • Continuous • Sip of coffee • Fresh pot • Buy some more beans • Harvest time • Tuning a well-written application usually just improves its performance within its category • Revolutions happen when things are moved from one category to another
