1 / 21

Virtual Host PMP & MONALISA Monitoring Service Shawn McKee smckee@umich

Virtual Host PMP & MONALISA Monitoring Service Shawn McKee smckee@umich.edu Yang Xia yxia@hep.caltech.edu GGF8 Meeting 2003. End to End Performance Issues.

cain-wilson
Download Presentation

Virtual Host PMP & MONALISA Monitoring Service Shawn McKee smckee@umich

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Virtual Host PMP & MONALISA Monitoring Service Shawn McKee smckee@umich.edu Yang Xia yxia@hep.caltech.edu GGF8 Meeting 2003

  2. End to End Performance Issues E2E performance requires knowledge of the end-systems along each network path, with the expect E2E performance of the path, taking into account detailed information about the end-system. • What is the CPU, disks, interfaces and memory of the end hosts…and the performance of these? • What is the network interface: type, firmware, parameters and expected performance, both long-term and based upon recent results? • System state is critical; what is the CPU load and estimated bus loading? • What is the upcoming workload in the network, both locally and globally?

  3. Adapting Existing Tools for E2E • Many of the tools being developed for managing data-intensive Grids systems, namely MonALISA, have to address similar issues in monitoring and planning. • We could use some of the same tools and higher level services that will give performance estimates end-to-end based on the above information and/or recent historical data.

  4. Start at the ends: Hosts • We must enable “data acquisition” from the hosts involved. • We need a system which can dynamically download a data gathering application which can run on most systems… • JAVA seems to be the most likely candidate. • Pervasive • Can be cryptographically signed • Permissions can be fine grained • Runs on MANY OS’s

  5. Static Info GUID Operating system and version Processor details NIC info (firmware, brand, type) Memory info TCP stack parameters Dynamic Info Interrupts/sec CPU usage NIC bandwidth, errors, queue lengths Memory usage Bus usage What to “acquire” from each host? • Test Info • IP address of the test destination. • Trace route between host and destination. • Ping test between host and destination. • Iperf test between host and destination.

  6. Host Application • Having a system accessible thru the web and supporting Linux and Windows would give us the broadest initial coverage. • First time users connect to a E2E server or peer-to-peer system and download a signed Java applet to their host.

  7. Starting the Application • The user allows the applet to start (security signing) • The applet starts and creates a GUID for this host and records, in standard format, the “stable” host details. Each new invocation of the applet will verify the currency of the stable information • The applet can provide the GUID and host details to a registration server with or without “anonymization” of identifying details

  8. Path Testing • The series of tests is run by a Java application and could be a defined set of measurements: • Ping (reachability/RTT) • Traceroute (both forward and backward) • One-way loss (each direction) • Iperf (bandwidth EACH way, measured simultaneously) • Missing components are downloaded from servers. • Bandwidth testing is done both ways, simultaneously to find duplex problems on the path • Dynamic host information is recorded at both ends during each sub-test

  9. Test Results • Test results would be saved locally and a summary given to the user. • A Java analysis applet could parse the info, looking for common problems • Results could be “logged” with a central service • Logged events could be further analyzed by central servers with access to current network details. Users could be referred to the most likely problem domain with current contact information provided by the central server

  10. Some Goals • Put the “wizard” knowledge into the applets • Enable ordinary users to perform state of the art testing • Provide a reference set of network testing applications by host type for users • Define a network measurements database for the network users community • Interoperate with PMP stations in the network • Instrument applications to automatically provide data to system

  11. M NALISA MONitoring Agents using a Large Integrated Services Architecture http://monalisa.cacr.caltech.edu Principle Author: Iosif Legrand California Institute of Technology

  12. MonALISA Design Considerations • Act as a true dynamic service and provide the necessary functionally to be used by any other services that require such information (Jini, UDDI - WSDL / SOAP) • - mechanism to dynamically discover all the "Farm Units" used by a community • - remote event notification for changes in the any system • - lease mechanism for each registered unit • Allow dynamic configuration and the list of monitor parameters. • Integrate existing monitoring tools ( SNMP, LSF, Ganglia, Hawkeye …) • It provides: - single-farm values and details for each node - network aspect - real time information - historical data and extracted trend information - listener subscription / notification - (mobile) agent filters and alarm triggers algorithms for prediction and decision-support

  13. CLIENT Lookup Lookup Service Service Lookup Lookup Service Service Register Service ID Register with ID Service Ask for a lease Get a lease for DT JINI – Network Services A Service Registers with at least one Lookup Service using the same ID. It provides information about its functionality and the URL addressed from where interested clients may get the dynamic code to use it. The Service must ask each Lookup Service for a lease and periodically renew it. If a Service fails to renew the lease, it is removed form the Lookup Service Directory. When problems are solved, it can re-register. jar jar Web Web Server Server Publish the “Interface” jar The lease mechanism allows the Lookup Service to keep an up to date directory of services and correctly handle network problems.

  14. MonaLISA: Data Collection Dynamic Thread Pool Other tools (Ganglia, MRT…) PULL SNMP get & walk rsh | ssh remote scripts End-To-End measurements Farm Monitor Configuration Control Trap Listener PUSH snmp trap WEB Server Dynamic loading of modules or agents Trap Agent (ucd – snmp) perl

  15. Data Collection Modules MonaLisa is a monitoring Framework: SNMP (walk and get ) for computing nodes, routers and switches • Scripts , dedicated application (programs) which may be invoked on remote systems • Interface to Gangia • Interface to LSF and PBS • Interface to Hawkeye (Wisconsin) • Interface to LDAP ( MDS ) ( Florida) • Interface to IEPM-BW measurements • Specialized modules for VRVS

  16. Traffic from CERN into Geant IEPM- BW Measurements @ SLAC IEPM - Production Traffic CERN-US Real-time Traffic from CERN into DataTAG DataTAG - - Load on the Farm Nodes @ CALTECH @ CALTECH Global Client / Dynamic Discovery

  17. Global Views : CPU, IO, Disk, Internet Traffic …

  18. Regional Centers Discovery & Data access

  19. Access to historical and real-time values Past values are presented and the GUI remains a registered listener and the new vales are added Real Time Histograms for various parameters

  20. Monitoring VRVS Reflectors

  21. SUMMARY • MonaLisa is able to dynamically discover all the "Farm Units" used by a community and through the remote event notification mechanism keeps an update state for the entire system • Automatic & secure code update (services and clients) . • Dynamic configuration for farms / network elements. Secure Admin interface. • Access to aggregate farm values and all the details for each node • Selected real time / historical data for any subscribed listeners • Active filter agents to process the data and provided dedicated / customized information to other services or clients. • Dynamic proxies and WSDL pages for services. • Embedded SQL Data Base and can work with any relational DB. Accepts multiple customized Data Writers (e.g. to LDAP) as dynamically loadable modules. • Embedded SNMP support and interfaces with other tools ( LSF, Ganglia, Hawkeye…) . Easy to develop user defined modules to collect data. • Dedicate pseudo-clients for repository or decision making units • It proved to be a stable and reliable service

More Related