
Magpie: Distributed Profiling for Performance Analysis

Magpie: Distributed Profiling for Performance Analysis. Paul Barham (joint work with Rebecca Isaacs and Richard Mortier), 11th November 2002. What is Magpie? Bottom-up approach to characterising the workload of a distributed system. Observe concurrency, communication & latency.


Presentation Transcript


  1. Magpie: Distributed Profiling for Performance Analysis Paul Barham (joint work with Rebecca Isaacs and Richard Mortier) 11th November 2002

  2. What is Magpie? • Bottom-up approach to characterising the workload of a distributed system • Observe concurrency, communication & latency • Focus is on responsiveness, rather than throughput • Resource consumption is accounted to individual service invocations • CPU, disk and network b/w consumed by each request on each machine in the distributed system • “Low level accounting + higher level annotations” • Online measurement, offline processing • Why? Web Services = “wide area synchronous RPC”
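A minimal sketch of what "accounted to individual service invocations" could look like in practice. The event tuple layout, machine names and resource names below are assumptions made for illustration; the slides do not specify a record format.

```python
from collections import defaultdict


def account(events):
    """Sum resource usage per (request, machine, resource).

    `events` is an iterable of (request_id, machine, resource, amount)
    tuples; this layout is hypothetical, not Magpie's own format.
    """
    totals = defaultdict(float)
    for request_id, machine, resource, amount in events:
        totals[(request_id, machine, resource)] += amount
    return dict(totals)


# Made-up usage records for two requests.
events = [
    ("ccc00663", "WEB", "cpu_cycles", 1200.0),
    ("ccc00663", "SQL", "cpu_cycles", 800.0),
    ("ccc00663", "WEB", "cpu_cycles", 300.0),
    ("ccc000b9", "WEB", "net_tx_bytes", 127.0),
]
totals = account(events)
```

Keying on the request identifier rather than on the machine is what lets later stages compare whole requests instead of per-machine averages.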

  3. Motivation: Indy Performance Modelling Toolkit • Indy is a hybrid event/throughput based modelling kernel • Inputs: Topology, hardware, workload • Outputs: Utilizations, response times, bottlenecks, etc. • Scope is multi-tier server farms running .NET web sites • Our goal is to use Magpie to derive a generative model of the system workload suitable for input to Indy • Acquire a workload description with less human effort / knowledge than current approach of “incremental micro-benchmarking” • Extract a detailed model from a live, ‘representative’ system • Measure with a realistic mix of transaction types – caches! • Not just a long-term average across all transactions • Build a probabilistic model of the usage profile which includes “hidden” transaction types, e.g. error conditions, session state • Complex behaviour may not be easily observable manually • e.g. web transaction type discriminator is not necessarily the URL

  4. Outline of rest of talk • Instrumenting a .NET web site • Some raw data and visualizations • Data analysis by clustering • Current status and future work

  5. Duwamish7: VS.Net Enterprise Sample Site

  6. 10,000ft view of an e-commerce site [Diagram labels: Client, Web Server, SQL Server; arrows: http://someurl.aspx, SQL request, data, web page]

  7. A little bit more detail… [Diagram labels: Client(s), Web Server(s), SQL Server(s), w3wp, Cache, ASP.Net, Business Logic, ADO.Net, front end, Stored Procs, DBMS, Cache, CLR / Jit Compiler] • Site-specific content and code shown in purple • Requests can be either static content, or active – execute code to generate HTML • Code may execute SQL queries on database.

  8. Internal Architecture of IIS6 / ASP.NET • w3wp.exe • Per-“site” worker proc • ISAPI extension hosts CLR • ASP.Net, ADO.Net • Threadpools • One for CLR, one for IIS • Nested state machines • One per-request, event driven • Non-blocking upcalls • All async I/O • Uses single IOCompletionPort • Demux using “OVERLAPPED”s • http.sys • aka “Universal listener” • Kernel-mode driver • “TransmitFile() on steroids” • HTTP fragment cache [Diagram labels: aspnet_isapi.dll, Common Language Runtime, Application, ADO.Net, ASP.Net, CLR, Winsock, ISAPI “ECB”, Type-specific Handlers, Static, ISAPI, CGI, Per-Request FSMs, DONE, RESPONSE, LOG, AUTHORIZ, HANDLE, URLINFO, AUTHENT, START, W3_MAIN_CONTEXT, Main W3 Threadpool, ULATQ, HTTPAPI, Winsock, User, I/O Compn Port, Kernel, http.sys, TCP/IP]

  9. Low Level Profiling Kernel Tracing • Windows XP has very efficient low-level event tracing built in to the kernel (“ETW”) • Command-line tools for turning on or off tracing of specific system activities • Magpie uses ETW on each machine to capture • Every context switch • File IO • Disk IO • Network send and receive completions • Process and thread creation and deletion • Page faults • Overhead is surprisingly small! (~8MB log for 5 mins)
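As an illustration of why capturing every context switch is useful, the sketch below attributes elapsed time on each CPU to whichever thread was switched in. The (timestamp, cpu, new_thread_id) tuple is an assumed, simplified stand-in for an ETW context-switch record, not its real layout.

```python
from collections import defaultdict


def cpu_time_per_thread(cswitches, end_time):
    """Attribute elapsed time on each CPU to the thread that was running.

    `cswitches` is a time-ordered list of (timestamp, cpu, new_thread_id)
    tuples (hypothetical layout). `end_time` closes the last interval.
    """
    running = {}              # cpu -> (thread_id, switched_in_at)
    usage = defaultdict(int)  # thread_id -> accumulated time
    for timestamp, cpu, new_tid in cswitches:
        if cpu in running:
            tid, since = running[cpu]
            usage[tid] += timestamp - since
        running[cpu] = (new_tid, timestamp)
    for tid, since in running.values():
        usage[tid] += end_time - since
    return dict(usage)


# Thread f4c runs, is preempted by e50, then resumes (made-up timestamps).
switches = [(100, 0, "f4c"), (250, 0, "e50"), (300, 0, "f4c")]
usage = cpu_time_per_thread(switches, end_time=400)
```

Combined with a record of which request each thread is serving at a given moment, per-thread intervals like these can be charged to individual requests.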

  10. Stitching together Asynchronous I/O and Thread Synchronization Detours • Public tool written by Galen Hunt (MSR) • Can be used to intercept calls to any DLL and record their arguments etc. • Magpie usage: • Intercept calls to Winsock2 API • Allows communication to be followed (WSASend, WSARecv) • Intercept the HTTPAPI • Follows the processing of HTTP requests by http.sys • Intercept certain Win32 API calls • Observe thread synchronization (e.g. SetEvent, Waits) • Follow async I/O completions (GetQueuedCompletionStatus) • Directly observe threadpool.

  11. Higher Level Annotations ISAPI filter • Extension DLLs loaded into IIS (web server) process • Sees all HTTP requests • Filter registers with IIS to receive particular event notifications • Can examine and modify both incoming and outgoing streams of data • Magpie ISAPI filter: • Allocates a unique identifier to each incoming request and adds it to the HTTP headers • Generates additional trace events on entry and exit • Records cycle counter, request ID & resource usage • “Thread 274 is now working on Request CCC00017”
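The request-tagging behaviour described above can be sketched as follows. This is not ISAPI code: the class, the header name `X-Magpie-Request-Id`, the starting counter value and the use of `time.perf_counter_ns()` in place of the CPU cycle counter are all assumptions for illustration.

```python
import itertools
import time


class RequestTagger:
    """Hypothetical stand-in for the filter's tagging logic."""

    def __init__(self, first=0x17, prefix="ccc"):
        self._ids = itertools.count(first)
        self._prefix = prefix
        self.trace = []  # (counter, thread_id, kind, request_id)

    def on_request(self, headers, thread_id):
        # Allocate a unique identifier and add it to the request headers.
        request_id = f"{self._prefix}{next(self._ids):05x}"
        tagged = dict(headers)
        tagged["X-Magpie-Request-Id"] = request_id  # assumed header name
        self.trace.append((time.perf_counter_ns(), thread_id, "enter", request_id))
        return tagged

    def on_response(self, headers, thread_id):
        # Emit the matching exit event for the same request.
        request_id = headers["X-Magpie-Request-Id"]
        self.trace.append((time.perf_counter_ns(), thread_id, "exit", request_id))


tagger = RequestTagger()
request = tagger.on_request({"Host": "example.test"}, thread_id=274)
tagger.on_response(request, thread_id=274)
```

Because the identifier travels in the headers, later stages that see the same request can log it alongside their own thread ids.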

  12. Following the active content HTTPModule • Plugins for ASP.NET extensibility • Sees all active requests • Executes in the “managed” (CLR) environment • Each active request is processed by multiple HTTPModules, e.g. session, authentication, etc. • Magpie HTTPModule: • Stashes request identifier in (managed) thread local state • Records cycle counter, managed thread id, request ID & resource usage

  13. Accounting Database Activity SQL Profiler • Standard part of SQL Server 2000 distribution • Logs selected events to table or file • (events can be user defined) • Magpie SQL Profiling • “Wraps” original stored procedures using reflection • Wrappers generate user-defined profiler events before and after executing the original stored proc. • Recorded by the SQL Profiler in output trace • Data includes request ID, cycle counter + resource usage • Uses extended stored procedure to get cycle counter + resource usage stats

  14. Intercepting Outgoing RPCs Common Language Runtime Profiler • Two COM interfaces: • ICorProfilerCallback: provides notifications, e.g. function enter/leave, module/class load, thread mappings, .Net remoting, JIT compilation • ICorProfilerInfo: Runtime provides API which allows profiler to examine and modify VM state. • Magpie CLR Profiler: • Monitors CLR-to-OS thread mappings • Records thread ids, cycle counter + resource usage • Intercepts JIT compilation of relevant ADO.NET classes/methods • Rewrites byte-code (IL) to insert calls to our own profiling functions • Modifies SQL stored procedure invocations to call wrapped versions and pass an extra argument (i.e. request ID) • Lots of fun but very hairy code!

  15. Putting it all together… [Diagram labels: Web Server, SQL Server, IIS, ISAPI Filter, ASP.NET, HTTPModule, App Logic, ADO.NET, Re-written IL, CLR, CLR Profiler & IL Patcher, SQL Profiler, User-defined events, Stored Procs, Wrappers, Extended SPs, DBMS, Intercept HTTP API, Intercept WinSock2 API, HTTP, TDS, Kernel, http.sys, Pkt Capture Tap, PerfInfo]

  16. Raw Event Data – Yuk! • Logs sorted together using cycle counter • Slide annotations, in order of appearance: incoming HTTP request packet; IIS worker thread picks up request from http.sys; (async I/O Completion); IndyProf label; ASP worker thread takes over; HttpModule label; send request to SQL Server; blocking wait for reply from SQL Server
  ...
  574530113665473L 157.58.60.98.3173 > 192.168.187.66.http: P 9987:10368(381) \
      ack 80470 win 64240 (DF) (ttl 127, id 27528, len 421)
  574530113724014 TcpReceive svchost.exe 11c 381 192.168.187.066:80 157.058.060.098:3173
  ...
  574530114445762 f4c + HttpReceiveHttpRequest(ReqQueueHandle=1c0, RequestId=0, \
      Flags=1, pRequestBuffer=faae48, RequestBufferLength=9d0, pBytesReceived=0, pOverlapped=faae14) 0
  574530114613968 f4c - HttpReceiveHttpRequest() -> 3e5
  574530114724562 f4c + GetQueuedCompletionStatus(1b8,f5ff84,f5ff8c,f5ff98,0)
  574530114836287 f4c - GetQueuedCompletionStatus(,,,,) -> 0 5a8d11fc 0
  574530114947322 f4c + GetQueuedCompletionStatus(1b8,f5ff84,f5ff8c,f5ff98,1b7740)
  574530115070456 CSwitch 0 w3wp.exe 658 f4c Waiting WrQueue System 4 e50 3913680
  574530115394668 CSwitch 0 System 4 e50 Ready - w3wp.exe 658 f4c 324212
  574530115456005 f4c - GetQueuedCompletionStatus(,,,,) -> 1 5a8d11fc fab854
  574530115683276 f4c ! HttpReceiveHttpRequest Ov: fab854 CID: 40000284e2000000 \
      ReqID: 40000284e2000000 LA: 192.168.187.66:80 RA: 157.58.60.98:3173 \
      Bytes: 0 Flags: 0 Verb: 4 RawUrl: /duwamish7/categories.aspx?ID=843
  574530115882433 12673200697.357422 ccc00663 f4c /duwamish7/categories.aspx?ID=843
  574530116116697 f4c + GetQueuedCompletionStatus(1b8,f5ff84,f5ff8c,f5ff98,0)
  ...
  574530116959382 CSwitch 0 w3wp.exe 658 f4c Waiting WrQueue w3wp.exe 658 6dc 1564714
  574530117373185 12673200697.3574 ccc00663 6dc /duwamish7/categories.aspx?ID=843
  ...
  574530119411731 6dc + send(s=4f0, buf=2739044, len=7f, flags=0)
  574530119656299L 192.168.187.66.4681 > 192.168.187.68.ms-sql-s: P 23578:23705(127) \
      ack 227230 win 16735 (DF) (ttl 128, id 10136, len 167)
  574530119721688 6dc - send() -> 7f
  ...
  574530120169086 6dc + WSARecv(s=4f0, lpBuffers=499f4fc 16384, dwBufferCount=1, \
      lpNumberOfBytesRecvd=499f514, lpFlags=499f510, lpOverlapped=0 0, \
      lpCompletionRoutine=0)
  574530120325932 CSwitch 0 w3wp.exe 658 6dc Waiting UserRequest w3wp.exe 658 ec8 3366550
  ...
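"Logs sorted together using cycle counter" amounts to a k-way merge of per-source logs. Here is a minimal sketch using the standard library's `heapq.merge`. It assumes each source is already sorted and that all timestamps are on one comparable clock, and the (timestamp, source, text) tuple is an illustrative layout only.

```python
import heapq


def merge_logs(*logs):
    """Merge per-source event lists into one timestamp-ordered list.

    Each log must already be sorted by its first field (the counter).
    """
    return list(heapq.merge(*logs, key=lambda event: event[0]))


# Timestamps taken from the listing above; the event text is abbreviated.
packets = [
    (574530113665473, "pkt", "HTTP request packet"),
    (574530119656299, "pkt", "TDS request packet"),
]
etw = [
    (574530113724014, "etw", "TcpReceive"),
    (574530115070456, "etw", "CSwitch"),
]
detours = [
    (574530114445762, "detours", "+HttpReceiveHttpRequest"),
    (574530119411731, "detours", "+send"),
]
merged = merge_logs(packets, etw, detours)
```

Across machines the clocks differ, so a merge like this is only meaningful per machine or after clock alignment.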

  17. Visualisation Tools – PowerPoint Macros! [Figure: part of the visualisation of Transaction ccc00663: /duwamish7/categories.aspx?ID=843; x-axis: Time/s; key: blocked, IIS, ASP.NET, SQL, Disk, Unaccounted]

  18. Transaction ccc000b9: /duwamish7/categories.aspx?ID=831 [Figure: timeline with time marks 10.051s, 10.100s, 10.155s; rows: WEB.eec, WEB.398, SQL.9c4, Disk, Net RX, Net TX; event labels: !HttpReceiveHt, IndyProfReq, HttpModBegin, +send, -send, +WSARecv, -WSARecv, SQL:StartProf, SQL:EndProf, HttpModEnd, IndyProfResp, +HttpSendRespo, -HttpSendRespo; key: Blocked, IIS, ASP.NET, SQL, Disk, Other]

  19. Interleaving of Simultaneous Requests • Transactions between 28.8s and 31.0s • Each colour is a different transaction, grey = blocked [Figure: timeline from 28.84s to 31.07224012s; rows: WEB.cc8, WEB.cc4, WEB.b90, WEB.ad8, WEB.4b0, SQL.6cc, SQL.e84, Disk, Net RX, Net TX]

  20. Transaction ccc000b9: /duwamish7/book.aspx?ID=37816 • “An assortment of magical tastes”… [Figure: timeline from 22.6001556s to 22.8630055267s; rows: WEB.a18, WEB.848, SQL.130, Disk, Net RX, Net TX; event labels: GetQueuedComp, PostQueuedCom, HttpReceiveHt, HttpSendRespo, WaitForSingle, SetEvent, ContextSwitchI, IndyProfReq, IndyProfResp, HttpModBegin, HttpModEnd, WSARecv, send, DiskRead, SQL:StartProf, SQL:EndProf; key: blocked, IIS, ASP.NET, SQL, Disk, Other]

  21. How similar are two requests?

  22. Mining Some Structure… Issues include: • Multiple clocks (at least one per machine) • Lots of concurrency • Only partial orders provided by network traffic • Noisy observations – preemptive scheduling • Aperiodic sampling – irregular “events” Current approach… • Borrowing algorithms from gene-sequence comparison and speech recognition • Construct a “string” representation of traces • Cluster using variant of String Edit Distance

  23. Levenshtein String Edit Distance • Can be computed in O(|s1|*|s2|) time using simple dynamic programming algorithm • d('', '') = 0 • d(s, '') = d('', s) = |s| • d(s1+ch1, s2+ch2) = min( d(s1, s2) + (ch1==ch2 ? 0 : 1), d(s1+ch1, s2) + 1, d(s1, s2+ch2) + 1 ) • Example: s1 = “appropriate meaning”, s2 = “approximate matching”, aligned as “appropriate m-eaning” against “approximate matching”: d(s1,s2)=7
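The recurrence on this slide translates directly into a two-row dynamic programme. This is a minimal sketch; the function name and the row-at-a-time layout are choices made here, not taken from the talk.

```python
def levenshtein(s1, s2):
    """Levenshtein edit distance in O(|s1| * |s2|) time.

    prev[j] holds d(s1[:i-1], s2[:j]); curr[j] holds d(s1[:i], s2[:j]).
    """
    prev = list(range(len(s2) + 1))          # d('', s) = |s|
    for i, ch1 in enumerate(s1, start=1):
        curr = [i]                           # d(s, '') = |s|
        for j, ch2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j - 1] + (0 if ch1 == ch2 else 1),  # match / substitute
                curr[j - 1] + 1,                         # insert ch2
                prev[j] + 1,                             # delete ch1
            ))
        prev = curr
    return prev[-1]


example = levenshtein("appropriate meaning", "approximate matching")
```

Only two rows are kept at a time, so memory is O(|s2|) even though the time cost stays quadratic.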

  24. Partial ordering using packets…
  Web Server                        | SQL Server
  HTTPREQ 46265                     |
  -GetQueuedCompletionStatus a18    |
  !HttpReceiveHttpRequest a18       |
  IndyProfReq a18                   |
  +GetQueuedCompletionStatus a18    |
  CPU 4178433 Block                 |
  ----------------------------------|
  HttpModBegin dc0                  |
  +ProcessRequestMain               |
  +OnStateChange                    |
  -OnStateChange                    |
  +FlushBuffer                      |
  +send dc0                         |
  ----------------------------------+----------------------------------
  TDSREQ 23389                      = 23389 TDSREQ
  -send dc0                         | Unblock
  -FlushBuffer                      | SQL:StartProf 87c
  +ReadNetlib                       | CPU 31605 Block
  +WSARecv dc0                      | Unblock
  CPU 5038985 Block                 | CPU 31360 Block
  ACK 23390                         | Unblock
  Unblock                           | +WaitForSingleObject 87c
  CPU 13112 Block                   | -WaitForSingleObject 87c
                                    | +SetEvent 87c
                                    | CPU 33897 Block
                                    | Unblock
                                    | DiskRead 87c
                                    | CPU 4020880
                                    | +SetEvent 87c
                                    | SQL:EndProf 87c
                                    | CPU 182984
  ---------------------------------------------------------------------
  TDSRESP 8819                      = 8819 TDSRESP
  Unblock                           |
  -WSARecv dc0                      |
  -ReadNetlib                       |
  +OnStateChange                    |
  -OnStateChange                    |
  -ProcessRequestMain               |
  HttpModEnd dc0                    |
  +HttpSendResponseEntityBody dc0   |
  HTTPRESP 23391                    |
  HTTPRESP 23392                    |
  CPU 4179107 Block                 |
  ACK 46296                         |
  HTTPRESP 23393                    |
  HTTPRESP 23394                    |
  HTTPRESP 23395                    |
  ACK 46297                         |
  HTTPRESP 23396                    |
  HTTPRESP 23397                    |
  HTTPRESP 23398                    |
  ACK 46298                         |
  HTTPRESP 23399                    |
  HTTPRESP 23400                    |
  ACK 46299                         |
  ACK 46300                         |
  Unblock                           |
  -HttpSendResponseEntityBody dc0   |
  +PostQueuedCompletionStatus dc0   |
  -PostQueuedCompletionStatus dc0   |
  CPU 832046 Block                  |
  -GetQueuedCompletionStatus a18    |
  IndyProfResp a18                  |
  +HttpSendResponseEntityBody a18   |
  -HttpSendResponseEntityBody a18   |
  +HttpReceiveHttpRequest(A) a18    |
  -HttpReceiveHttpRequest a18       |
  +GetQueuedCompletionStatus a18    |
  CPU 2089400 Block                 |
  Unblock                           |
  ACK 23413                         |
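One way to turn such a packet-induced partial order into a single reproducible sequence is a topological sort with a fixed tie-break. The sketch below uses Kahn's algorithm with a min-heap. The event ids, the edges and the "smallest id first" tie-break are assumptions for illustration, since the talk does not say which rule Magpie uses.

```python
import heapq


def flatten(events, happens_before):
    """Deterministically linearise a partial order (Kahn's algorithm).

    `events` is a list of orderable ids. `happens_before` is a list of
    (earlier, later) pairs. Ties are broken by picking the smallest id.
    """
    successors = {e: [] for e in events}
    indegree = {e: 0 for e in events}
    for earlier, later in happens_before:
        successors[earlier].append(later)
        indegree[later] += 1
    ready = [e for e in events if indegree[e] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        event = heapq.heappop(ready)
        order.append(event)
        for nxt in successors[event]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(indegree):
        raise ValueError("happens-before relation contains a cycle")
    return order


# (machine, sequence number) ids: per-machine program order, plus one edge
# for the request packet and one for the response packet.
events = [("WEB", 1), ("WEB", 2), ("WEB", 3), ("SQL", 1), ("SQL", 2)]
edges = [
    (("WEB", 1), ("WEB", 2)), (("WEB", 2), ("WEB", 3)),  # web thread order
    (("SQL", 1), ("SQL", 2)),                            # SQL thread order
    (("WEB", 1), ("SQL", 1)),                            # request packet
    (("SQL", 2), ("WEB", 3)),                            # response packet
]
order = flatten(events, edges)
```

The heap makes the result independent of input order, which matters when the flattened sequence feeds a string comparison.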

  25. Example alphabet for trace strings…

  26. A Distance Metric for Magpie Traces • Assign each Magpie instrumentation point a discrete label • Each trace entry has an 8-tuple of resource usage deltas • (Web CPU, Web DISK, WAN Rx, WAN Tx, LAN Rx, LAN Tx, SQL CPU, SQL DISK) • Deterministically flatten the partial order into a total order • Consider as a string of ‘weighted characters’, where weight is the length of observation vector: • e.g. !1 (0 {1 >5 [1 ]0 <1 }4 B0 b4 $0 )0 B0 b0 Q0 q0 • Extend string-edit-distance to use normalised Euclidean distance between observation ‘vectors’ • Insert/delete cost = ||v|| (“distance of point from origin”) • Substitution cost = ||v1-v2|| (“distance between two points in 8D space”)
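A minimal sketch of an edit distance over such weighted characters, using the costs given on this slide. Two points are assumptions: substitution is only allowed between entries with the same label (otherwise the algorithm falls back to delete plus insert), and the normalisation of the vectors is left out because the slide does not define it.

```python
import math


def norm(v):
    """Euclidean length of an observation vector."""
    return math.sqrt(sum(x * x for x in v))


def diff_norm(v1, v2):
    """Euclidean distance between two observation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def trace_distance(t1, t2):
    """Edit distance between traces given as lists of (label, vector)."""
    prev = [0.0]
    for _, v in t2:                      # cost of inserting all of t2
        prev.append(prev[-1] + norm(v))
    for label1, v1 in t1:
        curr = [prev[0] + norm(v1)]      # cost of deleting t1 so far
        for j, (label2, v2) in enumerate(t2, start=1):
            best = min(prev[j] + norm(v1),        # delete from t1
                       curr[j - 1] + norm(v2))    # insert from t2
            if label1 == label2:                  # assumed restriction
                best = min(best, prev[j - 1] + diff_norm(v1, v2))
            curr.append(best)
        prev = curr
    return prev[-1]


# Two short traces with 8-dimensional resource vectors (made-up values).
a = [("!", (3, 0, 0, 0, 0, 0, 0, 0)), ("B", (0, 4, 0, 0, 0, 0, 0, 0))]
b = [("!", (0, 0, 0, 0, 0, 0, 0, 0)), ("B", (0, 4, 0, 0, 0, 0, 0, 0))]
d_ab = trace_distance(a, b)
```

With these costs, entries that consumed no resources are free to insert or delete, so the distance is driven by resource usage rather than by event counts alone.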

  27. Example String Edit Distances…

  28. Trivial Clustering Algorithm… • Doesn’t need to be very fancy… yet! • Uses a ‘representative’ trace as cluster centroid • Pick the best of 5 as in quicksort • Compute distance from each trace to each cluster centroid • Add a “best-so-far” threshold to string-edit algorithm • (dynamic programming algs are monotonic) • Compare inter/intra-cluster mean distances to decide when to create a new cluster • Periodically move ‘singleton’ clusters to an outliers list and try to merge back in at the end • Approx O(N * C) - where C is #clusters
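The "best-so-far" threshold and the nearest-centroid assignment can be sketched as below. The sketch is simplified in ways the slide does not describe: it uses plain Levenshtein distance on strings, takes the first member of a cluster as its centroid (no best-of-5), replaces the inter/intra-cluster comparison with a fixed threshold, and has no outlier list.

```python
def bounded_distance(s1, s2, bound):
    """Levenshtein distance, or None as soon as it must exceed `bound`.

    Row minima never decrease from one row to the next, so once a whole
    row is above the bound the final answer is too.
    """
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        curr = [i]
        for j, ch2 in enumerate(s2, start=1):
            curr.append(min(prev[j - 1] + (ch1 != ch2),
                            prev[j] + 1,
                            curr[j - 1] + 1))
        if min(curr) > bound:
            return None                   # early abandon
        prev = curr
    return prev[-1] if prev[-1] <= bound else None


def cluster(traces, new_cluster_threshold):
    """Assign each trace to its nearest centroid, or start a new cluster."""
    clusters = []                         # list of (centroid, members)
    for trace in traces:
        best_index, best = None, new_cluster_threshold
        for index, (centroid, _) in enumerate(clusters):
            d = bounded_distance(trace, centroid, best)
            if d is not None and d <= best:
                best_index, best = index, d   # tighten the bound
        if best_index is None:
            clusters.append((trace, [trace]))
        else:
            clusters[best_index][1].append(trace)
    return clusters


# Made-up label strings in the style of the trace alphabet.
traces = ["!(Rr)BbQq", "!(Rr)BbQq", "!({>[]<}Bb$)BbQq",
          "!({>[]<}Bb$)BbQ", "!(Rr)Bb"]
clusters = cluster(traces, new_cluster_threshold=3)
```

Each trace is compared once against each centroid, matching the roughly O(N * C) cost noted on the slide, and the shrinking bound lets most comparisons stop early.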

  29. Typical clusters…
  Centroid: !1(0R0r0)1B0b0Q0q0
  0.000000 static_ccc006c9 !1(0R0r0)1B0b0Q0q0
  0.088417 static_ccc00168 !1(0R0r0)1B0b0Q0q0
  0.003326 static_ccc00618 !1(0R0r0)1B0b0Q0q0
  0.060546 static_ccc006c3 !1(0R0r0)1B0b0Q0q0
  0.013645 static_ccc00616 !1(0R0r0)1B0b0Q0q0
  0.032970 static_ccc006c5 !1(0R0r0)0B0b0Q0q0
  0.048854 static_ccc006c4 !1(0R0r0)1B0b0Q0q0
  0.043589 static_ccc006c7 !1(0R1r0)1B0b0Q0q0
  0.106195 static_ccc0038e !1(0R0r0)1B0b0Q0q0
  0.057646 static_ccc00043 !1(0R0r0)1B0b0Q0q0
  0.025930 static_ccc00556 !1(0R0r0)1B0b0Q0q0
  0.008057 static_ccc0038f !1(0R0r0)1B0b0Q0q0
  ...
  Centroid: !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q0
  0.000000 chkout_ccc0041c !7(0{2>18[3]0<0}3B0b3$0)0B0b0Q0q0
  0.473224 chkout_ccc00092 !7(0{2>12[3]0<0}3B0b4$0)0B0b0Q0q0
  0.273929 chkout_ccc00246 !7(0{2>16[3]0<0}3B0b4$0)1B0b0Q0q0
  0.167314 chkout_ccc002a8 !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q0
  0.185258 chkout_ccc004be !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q0
  0.123095 chkout_ccc00675 !7(0{2>18[3]0<0}3B0b4$0)0B0b0Q0q0
  0.100177 chkout_ccc005f4 !7(0{2>17[3]0<0}3B0b4$0)0B0b0Q0q0
  0.318734 chkout_ccc001a5 !7(0{2>12[3]0<0}3B0b3$0)0B0b0Q0q0
  0.049554 chkout_ccc00165 !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q0
  0.369109 chkout_ccc003ac !7(0{2>12[3]0<0}3B0b3$0)0B0b0Q0q0
  ...
  Centroid: !1(0{2}5B0b4$0)0
  0.000000 books_ccc00301 !1(0{2}5B0b4$0)0
  0.214118 books_ccc00548 !1(0{2}5B0b4$0)0
  0.204555 books_ccc006fd !1(0{2}5B0b4$0)0
  0.150912 books_ccc0079f !1(0{2}5B0b4$0)0
  0.019864 books_ccc00873 !1(0{2}5B0b4$0)0
  0.208676 books_ccc00484 !1(0{2}5B0b4$0)0
  0.212472 books_ccc0029f !1(0{2}5B0b4$0)0
  0.036158 books_ccc00842 !1(0{2}5B0b4$0)0
  0.210166 books_ccc00517 !1(0{2}5B0b4$0)0
  0.171975 books_ccc00589 !1(0{2}5B0b4$0)0
  0.288412 books_ccc003d3 !1(0{4}5B0b4$0)0
  0.217915 books_ccc0069c !1(0{2}5B0b4$0)0
  0.238472 books_ccc0076e !1(0{2}5B0b3$0)0
  Centroid: !1(0{1>4[1]0<1}4B0b3$0)0B0b0Q0q0
  0.000000 books_ccc00748 !1(0{1>5[1]0<1}4B0b4$0)0B0b0Q0q0
  0.396412 logon_ccc005be !2(0{2>5[1]0<0}4B0b3$0)0B0b0Q0q0
  0.273409 books_ccc005d4 !1(0{1>4[1]0<1}4B0b5$0)1B0b0Q0q0
  0.069283 books_ccc006e7 !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
  0.103720 books_ccc00442 !1(0{1>4[1]0<1}4B0b3$0)0B0b0Q0q0
  0.044051 books_ccc00040 !1(0{1>4[1]0<1}4B0b3$0)0B0b0Q0q0
  0.208350 books_ccc000b9 !1(0{1>4[1]0<1}5B0b4$0)0B0b0Q0q0
  0.147137 books_ccc002cf !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
  0.099524 books_ccc0082b !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
  0.244620 books_ccc003ed !1(0{1>7[1]0<1}5B0b4$0)0B0b0Q0q0
  0.441445 logon_ccc001e0 !2(0{2>5[1]0<0}4B1b3$0)0B0b0Q0q0
  0.116913 books_ccc00105 !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
  0.096342 books_ccc00686 !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
  ...

  30. Visualisation of all Clusters • n² pairwise distances [Figure: axes Transaction ID × Transaction ID, showing Distance]

  31. Just the ‘active content’… [Figure: axes Transaction ID × Transaction ID, showing Distance]

  32. A First-cut Workload Model… • Just use cluster centroids and sizes to generate ‘similar’ transaction mix? • Pretty good at capturing coarse differences • e.g. Number of SQL requests • Doesn’t deal with continuous distributions very well • e.g. Cache/memory performance, zipf file size distributions • But is it better than just using URL-based averages? • Evaluation metric: • Take the original trace, assume each transaction is replaced by the centroid of its cluster and add up the RMS error. • Evaluate with and without the ‘outlier’ clusters.
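The evaluation metric can be sketched as a per-resource RMS error between each transaction's measured usage and the usage of its cluster's centroid. The function name, the vector layout and the numbers are illustrative assumptions, not the talk's data.

```python
import math


def rms_errors(transactions, assignment, centroids):
    """Per-resource RMS error when each transaction is replaced by the
    centroid of its cluster.

    transactions: list of resource-usage vectors, one per transaction
    assignment:   assignment[i] is the cluster index of transaction i
    centroids:    list of resource-usage vectors, one per cluster
    """
    count = len(transactions)
    dims = len(transactions[0])
    squared = [0.0] * dims
    for vector, cluster_index in zip(transactions, assignment):
        centroid = centroids[cluster_index]
        for k in range(dims):
            squared[k] += (vector[k] - centroid[k]) ** 2
    return [math.sqrt(total / count) for total in squared]


# Four transactions, two resources, two clusters; each centroid is the
# first member of its cluster (made-up numbers).
transactions = [(10.0, 2.0), (14.0, 2.0), (100.0, 50.0), (100.0, 58.0)]
assignment = [0, 0, 1, 1]
centroids = [(10.0, 2.0), (100.0, 50.0)]
errors = rms_errors(transactions, assignment, centroids)
```

Running the same function with clusters defined by URL instead of by trace similarity would give the baseline for the comparison the slide asks about.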

  33. Evaluation of Magpie cluster-based model • Results: per-resource RMS errors (across all transactions): • RMS error improvement over just using URLs:

  34. Better Models using Bayesian Learning? • Ongoing discussions with Michael Isard, Mike Tipping and Chris Bishop • Learn probabilistic models of resource usage by different request types • Construct the per-machine FSMs • Possibly apply coupled hidden Markov models (CHMMs)? [Figure: state sequences over time for Web and SQL; state labels: Process Req, Blocked, Send Pkt, Receive Pkt, Compute, Disk IO, Waiting; caption: “ignore fictitious details”]

  35. Current Status… • Using Matlab & Bayes Network Toolkit (BNT) • Start by trying to fit simple HMM to just static request cluster • One discrete hidden state, 8 continuous observed variables with assumed Gaussian distns. • “Priors” computed from mean / var of observations … unfortunately, none of the supplied learning algorithms converge claiming our data is “infinitely improbable”!

  36. Ongoing Work • Investigating better ways of extracting models from the data, esp. machine learning • Transfer Magpie ideas/tools to the Indy team • Use Magpie to learn parameters in the “live” system in order to calibrate processor, memory system and cache models (more speculative) • Exploring other types of distributed system, e.g. GXA / Web Services v2 → async messaging
