
Measuring Quality of Service on Worker Node in Cluster

Rohitashva Sharma, R S Mundada, Sonika Sachdeva, P S Dhekne, Computer Division, BARC, Mumbai, India

Helge Meinhard, Tony Cass, Olof Barring, CERN, Geneva, Switzerland

CHEP 06


INTRODUCTION

  • Quality of Service

    • Defines how well suited a node is for a given type of task

    • Needed for better (ideally optimum) utilization of resources

  • Computer Division, BARC, and IT Division, CERN, collaborated to explore ways to predict QoS


QoS – Definition

  • QoS indicates how well suited a node is for a given task

  • QoS relates the execution time of a task under load to its no-load execution time

  • QoS varies between 0 and 1

T_execution = Wall-clock execution time of a task

T_noload = Wall-clock execution time of the task on a given node without load

QoS = Quality of Service
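The relation itself is not preserved in this transcript. A minimal sketch of one reading that fits the definitions and the stated 0 to 1 range, assuming QoS is the ratio of no-load to loaded execution time. This ratio is an assumption, not a formula confirmed by the slides, and the function name `qos` is hypothetical.

```python
def qos(t_noload: float, t_execution: float) -> float:
    """Assumed relation: QoS = T_noload / T_execution (not confirmed by the slides)."""
    if t_noload <= 0 or t_execution <= 0:
        raise ValueError("execution times must be positive")
    # A loaded run should not be faster than the no-load run;
    # clamp so the result stays within (0, 1].
    return min(1.0, t_noload / t_execution)


idle_node = qos(60.0, 60.0)     # task runs as fast as on an unloaded node
loaded_node = qos(60.0, 120.0)  # task takes twice as long under load
print(idle_node, loaded_node)
```

Under this reading, a QoS of 1 means the node behaves as if unloaded, and smaller values mean proportionally longer execution times.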

Methodology

  • Three task categories

    • CPU intensive

    • Disk IO intensive

    • Network IO intensive

  • Representative probe programs for each category

  • Load generating program for each category

Methodology

  • Monitor system metrics

    • Load average, CPU utilization, memory utilization, disk utilization, swap utilization, etc.

  • Execute probe programs under different load conditions (generated using the load generating programs)

  • Correlate probe execution time with system metrics and the probe's no-load execution time
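The measurement loop described above can be sketched as follows. This is an illustration only: the presentation gathered metrics with the EDG Fabric Monitoring System, whereas this sketch substitutes Python's `os.getloadavg()` (Unix only) as a stand-in, and `run_probe` and `toy_cpu_probe` are hypothetical names.

```python
import os
import time


def run_probe(probe, *args):
    """Time one probe run and record the 1-minute load average seen just before it."""
    try:
        load_1min = os.getloadavg()[0]  # stand-in for the monitoring system
    except OSError:
        load_1min = None  # load average not obtainable on this platform
    start = time.perf_counter()
    probe(*args)
    elapsed = time.perf_counter() - start
    return {"load_avg_1min": load_1min, "t_execution": elapsed}


def toy_cpu_probe(n=200_000):
    # Hypothetical stand-in for a real CPU probe such as Linpack.
    total = 0
    for i in range(n):
        total += i * i
    return total


sample = run_probe(toy_cpu_probe)
print(sample)
```

Repeating `run_probe` while a load generator runs at different intensities would yield the (metric, execution time) pairs that the correlation step needs.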

Probe Selection

  • A probe should

    • Represent real-world applications

    • Have a short execution time

    • Be non-interactive

  • Selected probes are

    • Linpack for CPU intensive

    • Bonnie for Disk IO intensive

    • Network IO intensive (not considered)

Load Generating programs

  • Generate load in a given category

  • Should have a long execution time

  • Should provide a way to vary the load

  • Two types of disk IO load

    • Block IO (IO in large data blocks)

    • Character IO (IO in small data blocks)

SETUP

  • 32 node cluster

  • Each node consists of

  • EDG Fabric Monitoring System for gathering system metrics


CPU Probe

  • CPU probe in different loading conditions

  • Correlation using load average

  • Execution time varies linearly with load average

(Equation 1)

  • Problem in block IO load
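A linear dependence of execution time on load average could be checked with an ordinary least-squares fit. The sketch below is an assumption about how such a check might look, not the authors' procedure, and the data points are synthetic placeholders rather than measurements from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder values, NOT results from the presentation.
load_avg = [0.0, 1.0, 2.0, 3.0, 4.0]
t_exec = [10.0, 20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_line(load_avg, t_exec)
print(slope, intercept)
```

In such a fit the intercept plays the role of the no-load execution time, and the slope is the extra time per unit of load average.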

CPU Probe

  • Load average represents combined CPU and IO load

  • CPU probe depends only on CPU load

  • Two ways to isolate the CPU load

    • Average CPU load (VmStatR)

    • Calculate the CPU available to the probe


CPU Probe

  • Average CPU load

    • 1-minute running average of the run queue

    • Called VmStatR

    • Predicted QoS will be

(Equation 2)
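A 1-minute running average of the run queue can be sketched as a sliding window over timestamped samples. How VmStatR was actually computed is not stated in the transcript; the class below is a hypothetical illustration that assumes the run-queue length is sampled periodically (for example from the `r` column of `vmstat`).

```python
from collections import deque


class RunQueueAverage:
    """Sliding-window mean of run-queue length samples (illustrative only)."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, run_queue_length) pairs

    def add(self, timestamp, run_queue_length):
        self.samples.append((timestamp, run_queue_length))
        # Drop samples that are a full window or more older than the newest one.
        while self.samples and self.samples[0][0] <= timestamp - self.window:
            self.samples.popleft()

    def value(self):
        if not self.samples:
            return 0.0
        return sum(r for _, r in self.samples) / len(self.samples)


avg = RunQueueAverage()
# Hypothetical samples: (seconds, runnable processes)
for t, r in [(0, 2), (20, 4), (40, 6), (70, 8)]:
    avg.add(t, r)
print(avg.value())
```

By the time the sample at 70 s arrives, the sample from 0 s has left the 60-second window, so only the last three samples contribute to the average.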


CPU Probe

  • CPU available to the probe

    • Calculated using the CPU utilization metric

    • Probe is eligible for

      • Available idle time

      • A share of system and user time

(Equation 3)
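Equation 3 itself is not preserved in this transcript. The sketch below is one possible reading of the two bullet points and should not be taken as the presentation's formula: it assumes the probe gets all idle time plus an equal share of the busy (user plus system) time, split among the competing runnable processes and the probe. The function name and the equal-share rule are both assumptions.

```python
def available_cpu_fraction(idle, user, system, n_competing):
    """Estimate the CPU fraction a new probe could obtain.

    idle, user and system are fractions of CPU time (summing to at most 1);
    n_competing is the number of runnable processes already on the node.
    The equal-share rule is an assumption, not the presentation's Equation 3.
    """
    if n_competing < 0:
        raise ValueError("n_competing must be non-negative")
    busy = user + system
    fair_share = busy / (n_competing + 1)  # probe competes as one more process
    return idle + fair_share


# Hypothetical utilization figures for illustration.
fraction = available_cpu_fraction(idle=0.5, user=0.25, system=0.25, n_competing=1)
print(fraction)
```

Unlike load average, a figure derived this way ignores processes blocked on IO, which may be why a CPU-utilization-based estimate behaves better under block IO load.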

CPU Probe

  • The table compares the QoS predicted using Equations 1 and 3 under block IO load

  • QoS predicted using Equation 3 shows the correct characteristic

Comparison of results

  • Compare the QoS results obtained using the three equations for the CPU probe under different loads

    • Equation 1 does not give a correct prediction under block IO load

    • Equations 2 and 3 give acceptable results under any load condition

CPU Probe – Comparison of results

LC – CPU Load

LC + LB – CPU + Block IO Load

LC + LCh – CPU + Character IO Load

LCh + LB – Character + Block IO Load

Disk IO Probe

  • Modified ‘Bonnie’ to act as both a block IO probe and a character IO probe

  • Considered the block IO probe, as most of the applications were block IO intensive

  • Correlated the probe's execution time under different load conditions

  • Predicted QoS using the three equations and compared the results

Disk IO Probe – Comparison of results

LC – CPU Load

LC + LB – CPU + Block IO Load

LC + LCh – CPU + Character IO Load

LCh + LB – Character + Block IO Load

CMSIM Results

  • Predicted the execution time using QoS from Equation 2

  • Percentage error against the measured execution time was acceptable
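Turning a predicted QoS into an execution-time prediction, and scoring it against a measurement, might look like the sketch below. It assumes the relation T_execution = T_noload / QoS, which is inferred from the definitions and not confirmed by the slides. The function names are hypothetical and the numbers are placeholders, not CMSIM results.

```python
def predicted_execution_time(t_noload, qos):
    """Assumed relation: T_execution = T_noload / QoS."""
    if not 0 < qos <= 1:
        raise ValueError("QoS must lie in (0, 1]")
    return t_noload / qos


def percent_error(predicted, measured):
    """Absolute error of the prediction as a percentage of the measured time."""
    return abs(predicted - measured) / measured * 100.0


# Placeholder values for illustration only.
t_pred = predicted_execution_time(t_noload=100.0, qos=0.5)
err = percent_error(t_pred, measured=250.0)
print(t_pred, err)
```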

Problem Areas

  • Effect of swapping

    • Occurs when available memory is less than the size of the task

    • The Linux kernel dynamically changes task priorities and swaps tasks accordingly

    • QoS is difficult to predict in this case

Problem Areas

  • Metric sampling frequency of the monitoring system

    • Up-to-date metric values give a better QoS prediction

    • At higher sampling frequencies, monitoring itself loads the node

  • Change in state after submission of a task

    • QoS cannot account for load changes after the task is submitted

    • Submission or removal of other tasks may change QoS

Conclusion

  • Equations 2 and 3 provide better QoS prediction for CPU-bound applications

  • Equation 1 can be used for IO-bound applications

  • Execution time was successfully predicted for CMSIM, a mostly CPU-bound job

  • Load balancing programs can use the derived equations for job submission
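As an illustration of the last point, a load balancer could submit a job to the node with the highest predicted QoS. This is a minimal sketch under that assumption. The selection rule, the function name `pick_node`, and the node names are hypothetical and not taken from the presentation.

```python
def pick_node(predicted_qos):
    """Return the node name with the highest predicted QoS.

    predicted_qos maps node name -> predicted QoS in (0, 1].
    """
    if not predicted_qos:
        raise ValueError("no candidate nodes")
    return max(predicted_qos, key=predicted_qos.get)


# Hypothetical predictions for three worker nodes.
candidates = {"node01": 0.4, "node02": 0.9, "node03": 0.7}
best = pick_node(candidates)
print(best)
```

As noted under the problem areas, such a choice reflects the node state at submission time only, so predictions would need refreshing as other jobs arrive or finish.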


Thanks
