Measuring Quality of Service on Worker Node in Cluster

Rohitashva Sharma, R S Mundada, Sonika Sachdeva, P S Dhekne, Computer Division, BARC, Mumbai, India

Helge Meinhard, Tony Cass, Olof Barring, CERN, Geneva, Switzerland

CHEP 06


INTRODUCTION

  • Quality of Service

    • Defines how well suited a node is to a given type of task

    • Needed for better (ideally optimum) utilization of resources

  • Computer Division, BARC, and IT Division, CERN, collaborated to explore ways to predict QoS

QoS – Definition

  • QoS (Quality of Service) defines how well suited the node is to a given task

  • QoS relates two execution times

    • Texecution = wall clock execution time of the task on the node

    • Tnoload = wall clock execution time of the task on the same node without load

  • QoS varies between 0 and 1
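
The slide's equation is not reproduced in this transcript. One form consistent with the two definitions and with the 0 to 1 range is QoS = Tnoload / Texecution. The sketch below assumes that ratio; both the formula and the function name `qos` are assumptions, not the authors' stated definition.

```python
def qos(t_noload: float, t_execution: float) -> float:
    """Assumed form: QoS = Tnoload / Texecution (not taken from the slides)."""
    if t_noload <= 0 or t_execution <= 0:
        raise ValueError("execution times must be positive")
    # A loaded node should not beat its own no-load time, so clamp to 1.
    return min(1.0, t_noload / t_execution)


# Illustrative numbers only, not measurements from the presentation.
print(qos(60.0, 60.0))   # unloaded node
print(qos(60.0, 120.0))  # task takes twice as long under load
```

Under this assumption, an unloaded node scores 1 and the score falls towards 0 as the load stretches the execution time.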

Methodology

  • Three task categories

    • CPU intensive

    • Disk IO intensive

    • Network IO intensive

  • A representative probe program for each category

  • A load-generating program for each category

Methodology

  • Monitor system metrics

    • Load average, CPU utilization, memory utilization, disk utilization, swap utilization, etc.

  • Execute probe programs under different load conditions (generated using the load-generating programs)

  • Correlate the probe's execution time with the system metrics and the probe's no-load execution time
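
The measurement step can be sketched as follows: record a system metric, run the probe, and keep both values for later correlation. This is a minimal illustration, assuming the 1-minute load average from `os.getloadavg()` as the only metric. `run_probe` and `toy_cpu_probe` are hypothetical names, and the toy loop merely stands in for a real probe such as Linpack.

```python
import os
import time


def run_probe(probe, *args):
    """Time one probe run and record the load average seen just before it."""
    try:
        load_1min = os.getloadavg()[0]   # 1-minute load average (Unix only)
    except (AttributeError, OSError):
        load_1min = None                 # metric unavailable on this platform
    start = time.perf_counter()
    probe(*args)
    elapsed = time.perf_counter() - start  # wall clock execution time
    return {"load_avg": load_1min, "t_execution": elapsed}


def toy_cpu_probe(n=200_000):
    """Stand-in for a CPU-intensive probe (hypothetical, not Linpack)."""
    total = 0.0
    for i in range(1, n):
        total += i ** 0.5
    return total


sample = run_probe(toy_cpu_probe)
print(sample)
```

Repeating this under each generated load level yields the (metric, execution time) pairs that the correlation step needs.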

Probe Selection

  • A probe should

    • Represent real-world applications

    • Have a short execution time

    • Be non-interactive

  • Selected probes are

    • Linpack for CPU-intensive tasks

    • Bonnie for disk-IO-intensive tasks

    • None for network-IO-intensive tasks (category not considered)

Load Generating Programs

  • Generate load in a given category

  • Should have a long execution time

  • Should allow the load level to be varied

  • Two types of disk IO load

    • Block IO (IO in large data blocks)

    • Character IO (IO in small data blocks)
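
The difference between the two disk IO load types can be illustrated with a single writer whose block size is a parameter. This is only a sketch: `write_load` is a hypothetical helper, the sizes are arbitrary, and a real load generator would run for much longer and alongside the probe.

```python
import os
import tempfile
import time


def write_load(total_bytes: int, block_size: int) -> float:
    """Write total_bytes to a temporary file in chunks of block_size bytes.

    Returns the wall clock time taken. A large block_size approximates
    block IO; a small one approximates character IO.
    """
    chunk = b"x" * block_size
    start = time.perf_counter()
    # buffering=0 so each write() call reaches the OS instead of being
    # merged into larger writes by Python's own buffer.
    with tempfile.NamedTemporaryFile(buffering=0) as f:
        written = 0
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            written += f.write(chunk[:n])
        os.fsync(f.fileno())   # push the data to disk, not just the page cache
    return time.perf_counter() - start


# Sizes are illustrative; vary total_bytes to vary the load.
t_block = write_load(4 * 1024 * 1024, block_size=1024 * 1024)
t_char = write_load(4 * 1024 * 1024, block_size=64)
print(f"block IO: {t_block:.4f}s, character IO: {t_char:.4f}s")
```

Changing `total_bytes` or looping the call is one simple way to vary the load level.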

SETUP

  • 32 node cluster

  • Each node consists of

    • [email protected] GHz

    • 640 MB memory

    • 40 GB HDD

    • Red Hat Linux 7.3

  • EDG Fabric Monitoring System used to gather system metrics

CPU Probe

  • CPU probe run under different load conditions

  • Execution time correlated with load average

  • Execution time varies linearly with load average (Equation 1)

  • The correlation breaks down under block IO load
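
A linear dependence of this kind can be checked with an ordinary least squares fit of execution time against load average. The sketch below uses made-up sample points and a hypothetical helper `fit_line`; it illustrates the fitting step only and does not reproduce Equation 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up samples: load average and probe execution time in seconds.
load_avg = [0.0, 1.0, 2.0, 3.0]
t_exec = [10.0, 20.0, 30.0, 40.0]

t_noload, slope = fit_line(load_avg, t_exec)
print(t_noload, slope)
```

The intercept estimates the no-load execution time and the slope gives the extra time per unit of load average.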

CPU Probe

  • Load average represents the combined CPU and IO load

  • The CPU probe depends only on the CPU load

  • Two ways to isolate the CPU load

    • Average CPU load (VmStatR)

    • Calculate the CPU available to the probe

CPU Probe

  • Average CPU load

    • 1-minute running average of the run queue length

    • Called VmStatR

    • Predicted QoS is given by Equation 2
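
A 1-minute running average of the run queue can be sketched with a sliding time window. The class name `RunningAverage`, the sampling interval, and the sample values are all illustrative assumptions; the name VmStatR suggests the `r` column of `vmstat` as the input, but that reading is also an assumption.

```python
from collections import deque


class RunningAverage:
    """Average of the samples seen in the last `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value: float) -> float:
        self.samples.append((timestamp, value))
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= timestamp - self.window:
            self.samples.popleft()
        return sum(v for _, v in self.samples) / len(self.samples)


# Made-up run queue lengths sampled every 20 seconds.
avg = RunningAverage(window=60.0)
for t, r in [(0, 2), (20, 4), (40, 0), (60, 2), (80, 4)]:
    vmstat_r = avg.add(t, r)
print(vmstat_r)
```

Because only the run queue enters the average, processes blocked on IO do not inflate it the way they inflate the load average.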

CPU Probe

  • CPU available to the probe (Equation 3)

    • Calculated from the CPU utilization metric

    • The probe is eligible for

      • The available idle time

      • A share of the system and user time
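
The idea of "idle time plus a share of the busy time" can be made concrete as below. Equation 3 itself is not reproduced in this transcript, so the sharing rule here (an equal split of the user and system time between the probe and the processes already running) is purely an assumption, as are the function name `available_cpu` and its parameters.

```python
def available_cpu(idle: float, user: float, system: float, n_running: int) -> float:
    """Fraction of one CPU the probe might get (illustrative assumption).

    idle, user and system are CPU utilization percentages.
    n_running is the number of processes already competing for the CPU.
    """
    if n_running < 0:
        raise ValueError("n_running must be non-negative")
    busy = user + system
    # Assumed sharing rule: the probe takes all idle time, plus an equal
    # share of the busy time once it competes with the n_running processes.
    share = busy / (n_running + 1)
    return min(100.0, idle + share) / 100.0


# Made-up utilization figures: 50% idle, one competing process.
print(available_cpu(idle=50.0, user=40.0, system=10.0, n_running=1))
```

Under this rule a fully idle node yields 1.0, and the value shrinks as more processes compete for the busy portion.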

CPU Probe

  • The table compares the QoS predicted using Equations 1 and 3 under block IO load

  • QoS predicted using Equation 3 shows the correct behaviour

Comparison of results

  • Compare the QoS results obtained using the three equations for the CPU probe under different loads

    • Equation 1 does not give a correct prediction under block IO load

    • Equations 2 and 3 give acceptable results under all load conditions

CPU Probe – Comparison of results

LC – CPU Load

LC + LB – CPU + Block IO Load

LC + LCh – CPU + Character IO Load

LCh + LB – Character IO + Block IO Load

Disk IO Probe

  • Modified ‘Bonnie’ to serve as both a block IO probe and a character IO probe

  • Used the block IO probe, since most of the applications were block IO intensive

  • Correlated the probe's execution time under different load conditions

  • Predicted QoS using the three equations and compared the results

Disk IO Probe – Comparison of results

LC – CPU Load

LC + LB – CPU + Block IO Load

LC + LCh – CPU + Character IO Load

LCh + LB – Character IO + Block IO Load

CMSIM Results

  • Predicted the execution time using QoS from Equation 2

  • Percentage error against the measured execution time was acceptable
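
The prediction and its error check can be sketched as follows, assuming QoS is the ratio Tnoload / Texecution, so that the predicted execution time is Tnoload / QoS. That ratio form is an assumption, the function names are hypothetical, and the numbers are made up rather than CMSIM measurements.

```python
def predicted_time(t_noload: float, qos: float) -> float:
    """Predicted wall clock time, assuming QoS = Tnoload / Texecution."""
    if not 0.0 < qos <= 1.0:
        raise ValueError("qos must be in (0, 1]")
    return t_noload / qos


def percent_error(predicted: float, measured: float) -> float:
    """Absolute prediction error as a percentage of the measured time."""
    return abs(predicted - measured) / measured * 100.0


# Made-up numbers, not CMSIM measurements.
t_pred = predicted_time(t_noload=100.0, qos=0.5)
print(t_pred, percent_error(t_pred, measured=210.0))
```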

Problem Areas

  • Effect of swapping

    • Occurs when the available memory is less than the size of the task

    • The Linux kernel dynamically changes task priorities and swaps tasks accordingly

    • This makes QoS difficult to predict

Problem Areas

  • Metric sampling frequency of the monitoring system

    • Up-to-date metric values give better QoS prediction

    • At higher sampling frequencies, monitoring itself loads the node

  • Change in state after submission of a task

    • QoS prediction cannot account for load changes after the task is submitted

    • Submission or removal of other tasks may change QoS

Conclusion

  • Equations 2 and 3 provide better QoS prediction for CPU-bound applications

  • Equation 1 can be used for IO-bound applications

  • Execution time was successfully predicted for CMSIM, a mostly CPU-bound job

  • Load-balancing programs can use the derived equations when submitting jobs
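
As an illustration of how a load-balancing program might use a predicted QoS, the sketch below picks the node with the highest value. The node names, the QoS values, and the helper `pick_node` are hypothetical; how each node's QoS is computed is left to whichever equation suits the job type.

```python
def pick_node(predicted_qos: dict) -> str:
    """Return the node with the highest predicted QoS."""
    if not predicted_qos:
        raise ValueError("no candidate nodes")
    return max(predicted_qos, key=predicted_qos.get)


# Hypothetical node names and QoS values.
nodes = {"node01": 0.42, "node02": 0.87, "node03": 0.65}
print(pick_node(nodes))
```

A real scheduler would refresh these values shortly before each submission, since the load on a node can change after the metrics are sampled.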

Thanks
