Iosif Legrand California Institute of Technology

An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems. February 2006. Iosif Legrand California Institute of Technology. The MonALISA Framework.

Presentation Transcript


  1. An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems. Iosif Legrand, California Institute of Technology. February 2006.

  2. The MonALISA Framework
     • MonALISA is a dynamic, distributed service system capable of collecting any type of information from different systems, analyzing it in near real time, and providing support for automated control decisions and global optimization of workflows in complex grid systems.
     • The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.

  3. MonALISA is a Dynamic, Distributed Service Architecture
     • The framework is based on a hierarchical structure of loosely coupled agents acting as distributed services. These are independent, autonomous entities able to discover one another and to cooperate using a dynamic set of proxies or self-describing protocols.
     • An agent-based architecture makes it possible to invest the system with increasing degrees of intelligence, to reduce complexity, and to make global systems manageable in real time. For effective use of distributed resources, these services provide adaptability and self-organization.

  4. MonALISA Service & Data Handling
     [Diagram. Labels: Lookup Service (x2); Discovery; Registration; MonALISA Service; Data Cache Service & DB; Data Stores; Postgres; MySQL; Web Service (WSDL, SOAP); Web client; Client (other service); Java; Communications via the ML Proxy; Predicates & Agents; Applications; Configuration Control (SSL); user-defined loadable modules to write/send data]

  5. The MonALISA Discovery System & Services
     Fully distributed system with no single point of failure.
     [Layered diagram. Labels: Global services or clients (clients, HL services, repositories); Proxies (dynamic load balancing, scalability & replication, security AAA for clients); MonALISA services and agents (distributed system for gathering and analyzing information); Network of JINI-LUSs, secure & public (distributed dynamic discovery based on a lease mechanism and REN)]
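The lease mechanism named on slide 5 can be illustrated with a small sketch. In a lease-based lookup service, a registration is only valid for a limited time and disappears unless the service renews it, so crashed services drop out of discovery without any explicit cleanup. The class and method names below (`LeaseRegistry`, `register`, `renew`, `lookup`) are hypothetical and are not the JINI or MonALISA API; this is only a toy model of the idea.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    service_id: str
    expires_at: float


class LeaseRegistry:
    """Toy lookup service: entries vanish unless their lease is renewed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so expiry can be tested
        self._leases = {}        # service_id -> Lease

    def register(self, service_id, duration_s):
        """Grant a lease that is valid for duration_s seconds."""
        lease = Lease(service_id, self._clock() + duration_s)
        self._leases[service_id] = lease
        return lease

    def renew(self, service_id, duration_s):
        """Extend a live lease; return False if it has already expired."""
        self._purge()
        if service_id not in self._leases:
            return False  # the service has to register again
        self._leases[service_id].expires_at = self._clock() + duration_s
        return True

    def lookup(self):
        """Return the ids of all services whose lease is still valid."""
        self._purge()
        return sorted(self._leases)

    def _purge(self):
        now = self._clock()
        self._leases = {
            sid: lease for sid, lease in self._leases.items()
            if lease.expires_at > now
        }
```

A service would call `renew` periodically, well before its lease runs out; a client calling `lookup` then only ever sees services that were alive recently.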

  6. Monitoring Internet2 Backbone Network
     • Test for a Land Speed Record
     • ~7 Gb/s in a single TCP stream from Geneva to Caltech

  7. The UltraLight Network
     [Network map. Labels: BNL; ESnet IN/OUT]

  8. Monitoring Network Topology: Latency, Routers
     [Topology views. Labels: Networks; Routers; AS]

  9. Monitoring the GLORIAD Ring

  10. Monitoring Grid Sites, Running Jobs, Network Traffic, and Connectivity
     [Screenshot panels: Jobs; Topology; Accounting]

  11. Monitoring OSG: Resources, Jobs & Accounting
     [Screenshot panels: Running Jobs; Accounting]
     • 42 sites
     • ~4,000 nodes (10,000 CPUs)
     • Thousands of jobs
     • 60,000 parameters

  12. FTP Data Transfer between Grid Sites
     [Plot: Total FTP Traffic per VO]

  13. Bandwidth Challenge at SC2005
     • 151 Gb/s
     • ~500 TB total in 4 h

  14. End User / Client Agent: LISA (Localhost Information Service Agent)
     • Authorization
     • Service discovery
     • Local detection of the hardware and software configuration
     • Complete end-system monitoring: per-process load, I/O and network throughput, etc.
     • End-to-end performance measurements
     • Will act as an active listener for all events related to the requests generated by its local applications.

  15. Host Monitoring at SC2005
     [Screenshot panels: TCP Settings; Host/System Information; Network Device Information]
     • Many “network” problems are actually end-host problems: misconfigured or underpowered end systems.
     • The LISA application was designed to monitor the end host and its view of the network.
     • For SC|05 we used LISA to gather the relevant host details related to network performance.
     • Information on the system, TCP configuration and network device setup was gathered and made accessible from one site.
     • Future plans are to coordinate this with LISA and deploy it as part of OSG. The Tier-2 centers are a primary target.

  16. Available Bandwidth Measurements
     • Embedded Pathload module.

  17. Coordination Service for Available Bandwidth Measurements
     • Enforces measurement fairness
     • Avoids multiple probes on shared network segments
     • Dynamic configuration of measurement timing
     • Logs events
     • Provides service redundancy by using a master-slave model
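One way to picture the coordination rules on slide 17 is a scheduler that starts a probe only when none of its network segments is already being probed, serves requests in arrival order, and logs each event. The sketch below is an assumption about how such a coordinator could work, not the MonALISA implementation; `ProbeCoordinator` and its methods are hypothetical names, and the master-slave redundancy is not modeled.

```python
from collections import deque


class ProbeCoordinator:
    """Grants bandwidth probes so that no two share a network segment."""

    def __init__(self):
        self._busy = set()       # segments used by running probes
        self._active = {}        # probe_id -> frozenset of segments
        self._waiting = deque()  # FIFO queue of (probe_id, segments)
        self.log = []            # event log: (event, probe_id)

    def request(self, probe_id, segments):
        """Queue a probe; return the ids of probes that may start now."""
        self._waiting.append((probe_id, frozenset(segments)))
        self.log.append(("queued", probe_id))
        return self._dispatch()

    def finish(self, probe_id):
        """Release a probe's segments; return the ids that may start now."""
        self._busy -= self._active.pop(probe_id)
        self.log.append(("finished", probe_id))
        return self._dispatch()

    def _dispatch(self):
        started, still_waiting = [], deque()
        reserved = set()  # segments claimed by earlier probes still waiting
        for probe_id, segments in self._waiting:
            if segments & (self._busy | reserved):
                # Conflict: keep waiting, and stop later requests from
                # overtaking on the same segments (simple FIFO fairness).
                still_waiting.append((probe_id, segments))
                reserved |= segments
            else:
                self._active[probe_id] = segments
                self._busy |= segments
                started.append(probe_id)
                self.log.append(("started", probe_id))
        self._waiting = still_waiting
        return started
```

Probes on disjoint paths run concurrently, while a probe that shares a segment with a running or earlier-queued probe waits until that segment is released.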

  18. Monitoring the Execution of Jobs and the Time Evolution
     [Diagram. Labels: Submit a Job; DAG; Split Jobs; Lifelines for Jobs; Job1, Job2, Job3, Job 31, Job 32]

  19. ApMon – Application Monitoring
     • Library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA services
     • Flexibility, dynamic configuration, high communication performance
     • Automated system monitoring
     • Accounting information
     [Diagram. Applications linked with ApMon send monitoring data over UDP/XDR to MonALISA services running on MonALISA hosts. Datagram fields: Time; IP; procID; parameter1: value; parameter2: value; ... Application monitoring example: Mbps_out: 0.52; Status: reading; MB_inout: 562.4. System monitoring example: load1: 0.24; processes: 97; pages_in: 83. The ApMon configuration is generated automatically by a servlet / CGI script (Config Servlet) and reloaded dynamically. Label: No lost packets]
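The UDP/XDR path on slide 19 can be sketched as follows: name/value parameters are packed into one XDR-encoded datagram and sent to a service without waiting for a reply. The datagram layout used here (cluster name, node name, parameter count, then name/double pairs) is a hypothetical one chosen for illustration and is not ApMon's actual wire format; the function names are likewise assumptions. Only the XDR primitives (big-endian integers and doubles, strings padded to a multiple of four bytes) follow the XDR standard.

```python
import socket
import struct


def xdr_string(text):
    """XDR string: 4-byte big-endian length, bytes, zero padding to 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_double(value):
    """XDR double: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)


def encode_datagram(cluster, node, params):
    """Pack parameters using the hypothetical layout described above."""
    parts = [xdr_string(cluster), xdr_string(node),
             struct.pack(">i", len(params))]
    for name, value in params.items():
        parts.append(xdr_string(name))
        parts.append(xdr_double(float(value)))
    return b"".join(parts)


def decode_datagram(payload):
    """Inverse of encode_datagram, as a receiving service might apply it."""
    offset = 0

    def read_string():
        nonlocal offset
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        text = payload[offset:offset + length].decode("utf-8")
        offset += length + (4 - length % 4) % 4
        return text

    cluster = read_string()
    node = read_string()
    (count,) = struct.unpack_from(">i", payload, offset)
    offset += 4
    params = {}
    for _ in range(count):
        name = read_string()
        (value,) = struct.unpack_from(">d", payload, offset)
        offset += 8
        params[name] = value
    return cluster, node, params


def send_params(host, port, cluster, node, params):
    """Fire-and-forget UDP send; delivery is not acknowledged."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_datagram(cluster, node, params), (host, port))
```

For example, `send_params("127.0.0.1", 8884, "MyCluster", "node01", {"load1": 0.24, "processes": 97, "pages_in": 83})` would send the system-monitoring values shown on the slide; the host, port, cluster and node names are placeholders.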

  20. MonALISA Agents to Create on Demand an Optical Path or Tree
     [Diagram with numbered steps 1 to 4. Labels: Optical Switch (x3), each with an ML Agent that controls and monitors the switch; MonALISA services; Discovery & Secure Connection; ML proxy services used in agent communication; a host runs an ML Daemon: >ml_path IP1 IP4 “copy file IP4”]
     • Time to create a path on demand: <1 s, independent of the location and the number of connections

  21. Monitoring and Controlling Optical Planes
     [Screenshot panels: Controlling; Port power monitoring]

  22. Monitoring Optical Switches
     Agents to create on demand an optical path.

  23. Communities using MonALISA
     Major communities:
     • OSG • CMS • ALICE • D0 • STAR • VRVS • LCG RUSSIA • SE Europe GRID • APAC Grid • UNAM Grid • ABILENE • ULTRALIGHT • GLORIAD • LHC Net • RoEduNET
     MonALISA running 24 x 7 at 250 sites:
     • Collecting 250,000 parameters in near real time
     • Update rate of 25,000 parameter updates per second
     • Monitoring 12,000 computers, more than 100 WAN links, and thousands of Grid jobs running concurrently
     Demonstrated at:
     • SC2003 • Telecom World 2003 • WSIS 2003 • SC 2004 • I2 2005 • TERENA 2005 • IGrid 2005 • SC 2005
     [Image labels: ABILENE; CMS-DC04; GRID3; VRVS; ALICE]

  24. The MonALISA Architecture Provides:
     • Distributed registration and discovery for services and applications.
     • Monitoring of all aspects of complex systems:
        • System information for computer nodes and clusters
        • Network information: WAN and LAN
        • The performance of applications, jobs or services
        • End-user systems and their performance
        • Video streaming
     • Interaction with any other services to provide customized information, based on monitoring data, in near real time
     • Secure, remote administration for services and applications
     • Agents to supervise applications, trigger alarms, restart or reconfigure them, and notify other services when certain conditions are detected.
     • The MonALISA framework is used to develop higher-level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
     • Graphical user interfaces to visualize complex information
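The supervising agents listed on slide 24 trigger an action when certain conditions are detected. A minimal sketch of that pattern, assuming a simple rule of the form "parameter above a limit for N consecutive updates", is shown below. `ThresholdAgent` and its arguments are hypothetical names for illustration and are not part of MonALISA.

```python
class ThresholdAgent:
    """Calls on_alarm once a parameter exceeds a limit N times in a row."""

    def __init__(self, parameter, limit, consecutive, on_alarm):
        self.parameter = parameter
        self.limit = limit
        self.consecutive = consecutive
        self.on_alarm = on_alarm   # callback: notify, restart, reconfigure...
        self._hits = 0             # current run of over-limit updates

    def update(self, name, value):
        """Feed one monitoring update; return True if the alarm fired."""
        if name != self.parameter:
            return False
        if value > self.limit:
            self._hits += 1
        else:
            self._hits = 0         # condition cleared, re-arm the agent
        if self._hits == self.consecutive:
            self.on_alarm(name, value)
            return True
        return False
```

Requiring several consecutive violations keeps a single noisy sample from triggering a restart; the alarm fires once per episode and is re-armed when the value drops back under the limit.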
