
Extension to PerfCenter: A Modeling and Simulation Tool for Datacenter Application



  1. Extension to PerfCenter: A Modeling and Simulation Tool for Datacenter Application
  Nikhil R. Ramteke. Advisor: Prof. Varsha Apte. Department of CSA, IISc. 27th May 2011.

  2. Multi-tiered Networked Applications
  Important performance metrics:
  • Response time
  • Utilization
  • Throughput
  • Waiting time
  • Queue length
  • Arrival rate
  • Blocking probability
  • Average service time
  [Figure: a multi-tier deployment with a web server, auth servers, and a DB server]

  3. Flow of a request through such a system
  [Figure: a "login" request flowing through web server, auth server, and DB server tiers deployed across Machines 1-9]

  4. PerfCenter
  • A performance modeling tool: it takes the system details as input and builds a model of the system.
  • The system model is built as a network of queues.
  • The built model is solved either by simulation or by analytical methods (illustrated below).
  • Open source, available at: http://www.cse.iitb.ac.in/perfnet/softperf/cape/home/wosp2008page
  [Figure: PerfCenter tool structure]
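To make the analytical route concrete, here is a minimal sketch, not PerfCenter source, of the closed-form M/M/1 results an analytical solver can apply to a single queueing station; the class and method names are illustrative assumptions.

    // Standard M/M/1 formulas for one queueing station with Poisson
    // arrivals (rate lambda) and exponential service (rate mu);
    // valid only for a stable queue, i.e. lambda < mu.
    class MM1 {
        static double utilization(double lambda, double mu) {
            return lambda / mu;                 // rho
        }
        static double meanNumberInSystem(double lambda, double mu) {
            double rho = lambda / mu;
            return rho / (1.0 - rho);           // L = rho / (1 - rho)
        }
        static double meanResponseTime(double lambda, double mu) {
            return 1.0 / (mu - lambda);         // W = L / lambda (Little's law)
        }
    }

A simulation, by contrast, estimates the same metrics by generating arrivals and service completions as events and averaging over a long run.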

  5. PerfCenter (Input Language)
  Host specification:

    host machine1[2]
    ram 1000
    cpu count 1
    cpu buffer 99999
    cpu schedP fcfs
    cpu speedup 1
    disk count 1
    disk buffer 99999
    disk schedP fcfs
    disk speedup 1
    . . .
    end

  Server specification:

    server web
    thread count 1
    thread buffer 9999
    thread schedP fcfs
    thread size 0.610
    staticsize 100
    requestsize 0.5
    task node2
    task node5
    task node9
    . . .
    end

  6. Feature Enhancements to PerfCenter (Problem Definition)
  Among the various enhancements possible, our contribution is the following:
  • Memory model: memory can be a bottleneck when deploying a server on a host.
  • Individual server utilization on a device: PerfCenter can predict the device utilization of a host, but it cannot estimate how much each server contributes to that utilization. This feature enables the user to find the bottleneck server quickly.
  • Timeouts and retries: aimed at capturing user behavior such as "stop-reload".

  7. Memory Usage Modeling
  PerfCenter system model for memory usage:
  • Servers: static size of the server, per-thread memory usage, per-request memory usage (which grows with queue length)
  • Hosts: RAM size of each host

  Per-server RAM utilization = (static size + thread size * total threads + request size * avg. length of the request queue) / RAM size

  Input language specification:

    server web
    staticsize 80
    thread size 2
    requestsize 2
    end

    host host1
    ram 2000
    end

  Metrics (a sketch of the computation follows):
  • util(host_name:ram)              // overall RAM utilization
  • util(host_name:server_name:ram)  // RAM utilization by one server
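A minimal sketch of the per-server utilization formula above; the class and field names are illustrative assumptions, not PerfCenter's actual class layout.

    // Per-server RAM utilization as defined on this slide:
    // (static + per-thread * threads + per-request * avg queue length) / RAM.
    class ServerMemoryModel {
        double staticSize;    // memory resident regardless of load
        double threadSize;    // memory per thread
        double requestSize;   // memory per queued request
        int totalThreads;

        double ramUtil(double avgQueueLength, double hostRamSize) {
            double used = staticSize
                        + threadSize * totalThreads
                        + requestSize * avgQueueLength;
            return used / hostRamSize;
        }
    }

Note that only avgQueueLength comes from the simulation run; the other inputs are static model parameters, which is why no change to dynamic statistics collection is needed (see the next slide).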

  8. Software Design Changes Required for the Memory Model and Individual Server Utilization
  Memory model:
  • Added members static size, thread size, and request size to the software server class,
  • Added member RAM size to the host class,
  • No change required to the dynamic statistics calculation in simulation: the average queue length computed at the end of the simulation is reused.
  Individual server utilization of host devices:
  • Must keep track of which server is issuing each request to the device,
  • Class member update: added total busy time and utilization variables to the software queue class,
  • Some additional bookkeeping during simulation (per-server statistics; sketched below).
  [Figure: requests R from servers S1, S2, S3 queued at a device]
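A sketch of the extra per-server bookkeeping, under the assumption (consistent with this slide) that each device busy period can be attributed to the software server that issued the request; all names are illustrative.

    import java.util.HashMap;
    import java.util.Map;

    // Accumulates device busy time per issuing server so that
    // util(host:server:device) can be reported at the end of a run.
    class DeviceUtilTracker {
        private final Map<String, Double> busyTimeByServer = new HashMap<>();
        private double totalTime;

        void recordBusyPeriod(String serverName, double busyTime) {
            busyTimeByServer.merge(serverName, busyTime, Double::sum);
        }

        void advanceClock(double elapsed) {
            totalTime += elapsed;
        }

        // Fraction of simulated time the device spent serving
        // requests issued by the given server.
        double utilizationOf(String serverName) {
            return busyTimeByServer.getOrDefault(serverName, 0.0) / totalTime;
        }
    }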

  9. Timeouts and Retries
  Characteristics of real users of server systems:
  • Impatience: users abandon a request if the response is not received within their expected time,
  • Retries: users often retry just after abandoning a request (e.g., the "stop-reload" behavior in a Web browser).
  This behavior is common in client-server applications. Timeouts may affect system performance in the following ways:
  • Reduction in throughput,
  • Completed requests may have already timed out, so successful requests need to be counted separately,
  • Utilization may decrease due to the reduced throughput,
  • Average response time may decrease due to the increase in request timeouts.

  10. Timeouts and Retries
  When a request is submitted to an application, one of the following can happen (a classification sketch follows the list):
  • Successfully completed [Goodput (G)],
  • Timeout during service [Badput (B)]: request processing is not aborted immediately; processing goes to completion, but the request is counted as failed. Possibility of retry,
  • Timeout in buffer [timeout-in-buffer rate (Tb)]: the request does not leave the queue immediately; when it is picked up by the software server, it is counted as failed,
  • Drop [drop rate (D)].
  [Figure: arrival of a request at a server and its four possible outcomes]
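A sketch of how a request's fate can be classified into these four outcomes; the method signature and field names are assumptions for illustration, not PerfCenter's actual code.

    // Classifies each finished (or rejected) request against its
    // deadline = arrival time + sampled timeout value.
    class OutcomeCounter {
        enum Outcome { GOODPUT, BADPUT, BUFFER_TIMEOUT, DROP }

        long goodput, badput, bufferTimeout, drops;

        Outcome classify(boolean dropped, double serviceStart,
                         double completion, double deadline) {
            if (dropped) { drops++; return Outcome.DROP; }
            if (serviceStart > deadline) {  // expired while still queued
                bufferTimeout++; return Outcome.BUFFER_TIMEOUT;
            }
            if (completion > deadline) {    // expired during service
                badput++; return Outcome.BADPUT;
            }
            goodput++; return Outcome.GOODPUT;
        }
    }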

  11. Timeouts and Retries (PerfCenter System Model)
  The mean timeout value is taken as an input together with a distribution; the timeout value of each request is then sampled from that distribution (a sampling sketch follows).
  Input language:

    loadparams
    timeout distribution_name(distribution_parameters)
    . .
    end

  E.g.:

    loadparams
    timeout exp(0.5)
    end
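A sketch of per-request timeout sampling for the exponential case, using inverse-transform sampling. Whether the configured parameter is a mean or a rate is a detail not spelled out here; this sketch assumes it is the mean, as the slide's wording suggests.

    import java.util.Random;

    // Draws an exponential timeout for each request:
    // X = -mean * ln(U), with U ~ Uniform(0, 1).
    class TimeoutSampler {
        private final Random rng = new Random();
        private final double meanTimeout;

        TimeoutSampler(double meanTimeout) {
            this.meanTimeout = meanTimeout;
        }

        double next() {
            return -meanTimeout * Math.log(1.0 - rng.nextDouble());
        }
    }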

  12. Timeouts and Retries (PerfCenter System Model)
  Overall G, B, D, and Tb can now be estimated with PerfCenter as follows.
  Output language:
  • gput()         // overall goodput
  • bput()         // overall badput
  • buffTimeout()  // overall timeout-in-buffer rate
  • droprate()     // overall drop rate

  13. Timeouts and Retries
  Software design changes (a sketch of the new members follows):
  • Added members timeout flag and mean timeout to the Request class,
  • Added number of requests processed, number of requests timed out in buffer, number of requests timed out in service, goodput, badput, drop rate, and timeout-in-buffer rate to the scenario simulation class,
  • No extra events are added.
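An illustrative sketch of the member additions described above; the field names and the rate definitions (outcome counts divided by simulated time) are assumptions, not PerfCenter's actual source.

    // Per-request state added for the timeout model.
    class Request {
        boolean timedOut;    // timeout flag
        double meanTimeout;  // mean of the configured timeout distribution
        double deadline;     // arrival time + sampled timeout value
    }

    // Per-scenario counters and derived metrics added for the model.
    class ScenarioSim {
        long processed;         // requests whose processing completed
        long timedOutInBuffer;  // expired before entering service
        long timedOutInService; // completed, but after their deadline
        long dropped;           // rejected on arrival

        double goodput(double simTime) { return (processed - timedOutInService) / simTime; }
        double badput(double simTime)  { return timedOutInService / simTime; }
        double buffTimeoutRate(double simTime) { return timedOutInBuffer / simTime; }
        double dropRate(double simTime) { return dropped / simTime; }
    }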

  14. Validation
  • Type of system: open
  • Service rate: 100
  • Arrival rate: varied from 10 to 100
  • Timeout rate: 10
  • Timeout distribution: exponential
  • Requests simulated: 1,000,000
  • Number of repetitions: 20
  Validation was done using sanity checks: results should follow expected rules and trends (one such check is sketched below).
  Scenario used for validation: input file
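One plausible sanity check, assuming no retries: in an open system every submitted request ends in exactly one of the four outcomes, so the four rates should sum to the offered arrival rate within simulation noise. The tolerance here is an arbitrary illustration.

    // Conservation check: G + B + Tb + D should equal lambda.
    class SanityCheck {
        static boolean conserves(double gput, double bput, double tbRate,
                                 double dropRate, double arrivalRate) {
            double total = gput + bput + tbRate + dropRate;
            return Math.abs(total - arrivalRate) <= 0.01 * arrivalRate;
        }
    }

Trend checks are similar in spirit: for example, as the arrival rate grows past the service rate, goodput should fall and the timeout-in-buffer rate should rise, as the result figures below show.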

  15. Results
  [Figure: RAM utilization vs. arrival rate]

  16. Results
  [Figure: G, B, Tb, and D vs. arrival rate. Goodput decreases as more requests time out in the buffer.]

  17. Results
  [Figure: utilization and throughput vs. arrival rate. The utilization curve follows the throughput (G + B) and starts decreasing because more requests are timing out in the buffer.]

  18. Results
  [Figure: individual server utilization vs. arrival rate. Utilization decreases due to more request timeouts.]

  19. Results
  [Figure: average response time vs. arrival rate. Average response time decreases due to timeouts.]

  20. Summary of Work Done
  Before midterm: background study of
  • Queuing theory,
  • Simulation modeling,
  • Performance issues of multi-tiered systems,
  • PerfCenter.
  After midterm: developed an abstraction and an input language, and updated the PerfCenter simulation engine for
  • Adding the memory model,
  • Updating the utilization model for individual server utilization on a device,
  • Adding the timeouts-and-retries model.

  21. Conclusion
  PerfCenter is a performance modeling tool, and with a few more useful features added, the most important being timeouts and retries, it can now serve performance analysts better. We validated our model using test experiments. Illustrative results show how PerfCenter can be used to estimate application performance in the presence of the following features:
  • Memory model,
  • Individual server utilization,
  • Timeouts-and-retries model.
  As the results show, these features can change datacenter sizing plans.
  Future work:
  • Predicting G, B, Tb, and D for individual queuing systems,
  • More validation to increase confidence in the tool,
  • More features to increase the power of the tool.

  22. References:
  1. R. P. Verlekar, V. Apte, P. Goyal, and B. Aggarwal. PerfCenter: A methodology and tool for performance analysis of application hosting centers. MASCOTS '07: Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2007, pages 201-208.
  2. Supriya Marathe, Varsha Apte, and Akhila Deshpande. PerfCenter: A performance modeling tool for application hosting centers. WOSP '08: Proceedings of the 7th International Workshop on Software and Performance, 2008.
  3. Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications. PHI Learning Private Limited, Eastern Economy edition, 2009.

  23. References:
  4. Averill M. Law and W. David Kelton. Simulation Modeling and Analysis. Tata McGraw-Hill, 2000.
  5. Daniel A. Menasce and Virgilio A. F. Almeida. Scaling for E-Business: Technologies, Models, Performance and Capacity Planning. Prentice Hall PTR, 2000.
  6. Supriya Marathe. Performance Modeling for Distributed Systems. Master's thesis, IIT Bombay, Mumbai, India, June 2008.
  7. Puram Niranjan Kumar. Validation, Defect Resolution and Feature Enhancements of PerfCenter. Master's thesis, IIT Bombay, Mumbai, India, June 2008.
