
Monitoring and performance measurement in Production Grid Environments

Presentation Transcript


  1. Monitoring and performance measurement in Production Grid Environments
     David Wallom

  2. Overview
     • Who uses monitoring?
     • Aspects of performance measurement
     • Tools for monitoring
     • Adding a new service into a monitoring framework

  3. Who are the consumers of monitoring?
     • Grid/VO management
       • Responsible for designing & maintaining requirements
       • Verify fulfillment of SLAs by resource providers
     • System administrators
       • Notified of problems
       • Enough information to understand context of problem
     • End users
       • View results and compare to problems they are having
       • Debug user account/environment issues
       • Advanced users: feedback to Grid/VO

  4. Monitoring from a user perspective
     • Things that need to work for the Grid:
       • Can I log in?
       • Is my application[s] available on connected systems?
       • Can I get to my input data?
       • What credentials do I need?
       • Can I get the input data to the application?
       • How long will my application take to run?
       • …

  5. Performance Measurement
     • Depends on monitoring of:
       • Availability
       • Usage

  6. Measuring Availability
     • Test the following grid functionality
       • User authorization
       • System information publishing
       • Data transfer to and from system
       • Submission of tasks onto the system
     • Measurement of other functionality
     • Type of system
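
As an illustration of how results from the functionality tests listed above might be turned into an availability figure, here is a minimal sketch. It assumes availability is taken as the fraction of probes that passed; the check names, the `ProbeResult` type and both functions are hypothetical and are not taken from the slides or from any monitoring tool.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical check names mirroring the functionality listed on the slide.
CHECKS = ("authorization", "information_publishing", "data_transfer", "job_submission")


@dataclass(frozen=True)
class ProbeResult:
    system: str   # name of the grid system that was probed
    check: str    # one of CHECKS
    passed: bool  # True if the probe succeeded


def availability(results: Iterable[ProbeResult], system: str) -> float:
    """Fraction of recorded probes for `system` that passed (0.0 if there are none)."""
    relevant = [r for r in results if r.system == system]
    if not relevant:
        return 0.0
    return sum(r.passed for r in relevant) / len(relevant)


def failing_checks(results: Iterable[ProbeResult], system: str) -> List[str]:
    """Sorted names of the checks that failed at least once for `system`."""
    return sorted({r.check for r in results if r.system == system and not r.passed})
```

Reporting the failing check names alongside the fraction gives administrators the context the earlier slide asks for, rather than a bare percentage.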

  7. Measuring Usage
     • Within each system, we need to know:
       • Current load
         • e.g. queue lengths, number of running processes on an SMP system
       • Knowledge of network connectivity
       • Total throughput rate for a submitted user job
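
One way to make "current load" comparable between systems of different sizes is to divide the load average by the number of cores. The sketch below assumes that definition; `normalised_load` and `local_load` are hypothetical helper names, and only the local host is sampled (queue lengths and network measurements would need scheduler-specific or network-specific probes that are not shown).

```python
import os
from typing import Optional


def normalised_load(load_average: float, cpu_count: int) -> float:
    """Load average divided by core count; values near 1.0 suggest a fully busy system."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def local_load() -> Optional[float]:
    """1-minute load per core on this host, or None where the load average is unavailable."""
    try:
        one_minute, _, _ = os.getloadavg()
    except (AttributeError, OSError):
        # os.getloadavg is missing on some platforms and may fail on others.
        return None
    return normalised_load(one_minute, os.cpu_count() or 1)
```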

  8. Tools for monitoring availability
     • Systems status
     • Grid status
     • All system and grid status monitoring

  9. Ganglia
     • Developed out of the HPC community
     • Will monitor worker nodes as well as system head nodes
     • Can have sub-nodes reporting to a master to create grid monitoring
     • Example:
       • http://oxgrid-vom.ierc.ox.ac.uk/ganglia/

  10. Big Brother
     • Designed to monitor individual systems
     • Simple interface giving immediate feedback on overall system status
     • Different providers can be added for additional services, such as different processes to be monitored
     • Can be difficult to look at historical trends, though
     • Example:
       • http://cerb-mds.bris.ac.uk/bb/bb.html

  11. Grid Interoperability Test Scripts
     • Developed by Southampton e-Science Centre
     • Tests each of the standard grid functionalities in series for a specified node
     • Wrapper to test many systems in parallel
     • Example of the results:
       • http://www.ngs.ac.uk/ops/gits/oxford/NationalGridService.html
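
The "series per node, parallel across nodes" pattern described on this slide can be sketched as follows. This is not the GITS code; `check_node`, `check_many` and the check callables are hypothetical, and a thread pool is assumed to be adequate because such checks mostly wait on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A check takes a node name and returns True on success (hypothetical signature).
Check = Tuple[str, Callable[[str], bool]]


def check_node(node: str, checks: List[Check]) -> Dict[str, bool]:
    """Run the checks one after another on a single node."""
    results: Dict[str, bool] = {}
    for name, check in checks:
        try:
            results[name] = bool(check(node))
        except Exception:
            # A check that raises is recorded as a failure, not a crash of the run.
            results[name] = False
    return results


def check_many(nodes: List[str], checks: List[Check], max_workers: int = 8) -> Dict[str, Dict[str, bool]]:
    """Run the per-node series on many nodes in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_node = pool.map(lambda n: check_node(n, checks), nodes)
        return dict(zip(nodes, per_node))
```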

  12. INCA
     • Developed by SDSC and TeraGrid
     • Extensible framework for monitoring
     • Tests the following as standard
       • Static system information
       • Installed software versions
       • Network performance
       • Load both on head and queue system if available
     • Additionally, the UK NGS has developed a plug-in for the GITS tests
     • Example:
       • http://inca.grid-support.ac.uk/

  13. Testing the behaviour of a Grid
     1. Define a set of concrete requirements for connected systems
     2. Write tests to verify requirements
     3. Periodically run tests and collect data across all of the systems
     4. Publish data and archive for reporting
     5. Automate steps 3 and 4 to provide real-time system status information
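
The run, collect and archive loop from this slide could look roughly like the sketch below. It assumes tests are plain callables and that results are archived as JSON lines in a local file; the function names, record fields and the five-minute default interval are illustrative choices, not part of the presentation.

```python
import json
import time
from typing import Callable, Dict, List

# A test takes a system name and returns True if the requirement is met.
Test = Callable[[str], bool]


def run_once(systems: List[str], tests: Dict[str, Test],
             now: Callable[[], float] = time.time) -> List[dict]:
    """Run every test against every system and collect one record per result."""
    records = []
    for system in systems:
        for name, test in tests.items():
            try:
                ok, error = bool(test(system)), None
            except Exception as exc:
                ok, error = False, repr(exc)
            records.append({"time": now(), "system": system,
                            "test": name, "ok": ok, "error": error})
    return records


def archive(records: List[dict], path: str) -> None:
    """Append records to a JSON-lines file so history is kept for reporting."""
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def run_forever(systems: List[str], tests: Dict[str, Test],
                path: str, interval_s: float = 300.0) -> None:
    """Automate the run-and-archive steps: repeat at a fixed interval."""
    while True:
        archive(run_once(systems, tests), path)
        time.sleep(interval_s)
```

Keeping an append-only archive addresses the historical-trend gap noted for some of the tools, since the latest records give current status while older ones remain available for reporting.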

  14. Connecting to existing production systems
     • Determine monitoring requirements for systems to be connected
     • Write independent tests for the service being provided
     • Write information providers to fit tests into existing monitoring frameworks
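
To illustrate the last point, an information provider can be a thin adapter that wraps a framework-independent test and emits its result in whatever form a given framework expects. The sketch below uses an invented one-line "host service colour detail" format purely for illustration; it is not the input format of Ganglia, Big Brother or INCA, and `make_provider` is a hypothetical name.

```python
from typing import Callable, Tuple

# An independent test: takes a host name, returns (passed, human-readable detail).
Check = Callable[[str], Tuple[bool, str]]


def make_provider(service: str, check: Check) -> Callable[[str], str]:
    """Wrap an independent test as a provider that emits a status line.

    The output format is hypothetical; a real provider would follow the
    conventions of the target monitoring framework.
    """
    def provider(host: str) -> str:
        try:
            ok, detail = check(host)
        except Exception as exc:
            ok, detail = False, f"check raised {exc!r}"
        colour = "green" if ok else "red"
        return f"{host} {service} {colour} {detail}"
    return provider
```

Because the test itself knows nothing about the framework, the same check can be reused behind several providers, one per framework.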

  15. Conclusions
     • Monitoring must be based on a well-known set of requirements for admins (both VO and systems) & users
     • There are several products available to provide monitoring frameworks; each can be extended beyond its initial capabilities
     • Life would be made a lot simpler if there were a standard monitoring schema that could be used to plug grid and system information into all monitoring frameworks!
