

ECHO. A System Monitoring and Management Tool. Yitao Duan and Dawey Huang.



Presentation Transcript

  1. ECHO: A System Monitoring and Management Tool • Yitao Duan and Dawey Huang

  2. Challenge • How can we manage all these machines?

  3. Goal • Aimed at networked system management • Better tools for • Discovering system states • Enhancing system availability • Monitoring network and system statistics • Error detection and correction • Fault tolerance for specific network applications (such as a web server)

  4. Overview • Distributed agents gather information • Centralized Control Unit (CCU) monitors and analyzes data, and takes control action if needed • Scripting language for automatic decision making • Web browser user interface

  5. (Diagram; recoverable labels: EchoMe Daemon, SNMP Tool)

  6. Centralized Control Unit • Information collection • Machine information • Network information • Information analysis • Individual Machine analysis • Collaborative network analysis • Action • System modification • Network routing

  7. Information Collection • Two approaches investigated • EchoMe Daemons running on hosts and reporting system information to the server • SNMP to discover router connectivity and states • Daemon mostly for collecting local information, which is much more detailed • SNMP for network connectivity

  8. EchoMe Daemon • Automatically discovers a node (node reporting stage) • EchoMe Daemon starts up at machine boot • Sends OS type/machine info to the CCU • Registers a session in the CCU • CCU sends the node a monitor program based on the node's OS/machine type and executes it on the node • Monitor program sends an information packet to the CCU periodically
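The node reporting stage on slide 8 can be sketched as a small in-memory simulation. This is only an illustration: the class, method, and field names (`ControlUnit`, `register`, `report`, `MONITORS`, `boot_daemon`) and the monitor program names are hypothetical, since the slides do not describe the wire format or the actual API.

```python
import itertools
import platform


class ControlUnit:
    """Hypothetical stand-in for the CCU side of the handshake."""

    # Assumed mapping from OS type to a monitor program name.
    MONITORS = {"Linux": "monitor_linux", "FreeBSD": "monitor_bsd"}

    def __init__(self):
        self._ids = itertools.count(1)
        self.sessions = {}  # session id -> node info
        self.reports = []   # (session id, packet) pairs

    def register(self, node_info):
        """Register a session and pick a monitor for the node's OS."""
        session_id = next(self._ids)
        self.sessions[session_id] = node_info
        monitor = self.MONITORS.get(node_info["os"], "monitor_generic")
        return session_id, monitor

    def report(self, session_id, packet):
        """Accept a periodic information packet from a registered node."""
        if session_id not in self.sessions:
            raise KeyError(f"unknown session {session_id}")
        self.reports.append((session_id, packet))


def boot_daemon(ccu, os_type=None, machine=None):
    """Node side: send OS/machine info, get a session and monitor back."""
    info = {"os": os_type or platform.system(),
            "machine": machine or platform.machine()}
    return ccu.register(info)


ccu = ControlUnit()
session, monitor = boot_daemon(ccu, os_type="Linux", machine="x86_64")
ccu.report(session, {"load1": 0.42})
```

In a real deployment the two sides would talk over the network and the monitor program would be shipped to the node; here both are reduced to function calls so the ordering of the steps is easy to follow.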

  9. Router Connectivity Discovery by SNMP • Routers implement SNMP • Program can run on any host within Millennium • Given a router (obtainable from the local host's gateway information), query its ipRouteTable • Traverse all its neighboring routers, performing the same query • Recursion stops at a specified distance
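A minimal sketch of the distance-limited traversal described on slide 9. It uses an iterative breadth-first search rather than the recursion the slide mentions, and it does not perform any SNMP queries: `get_next_hops` is a hypothetical caller-supplied function standing in for an ipRouteTable lookup, and the addresses in the toy topology are made up.

```python
from collections import deque


def discover_routers(start, get_next_hops, max_distance):
    """Breadth-first traversal of routers up to max_distance hops away.

    get_next_hops(router) is a caller-supplied (hypothetical) function that
    returns the next-hop router addresses found in that router's route table.
    """
    distance = {start: 0}
    edges = set()
    queue = deque([start])
    while queue:
        router = queue.popleft()
        if distance[router] >= max_distance:
            continue  # stop expanding at the specified distance
        for neighbor in get_next_hops(router):
            edges.add(frozenset((router, neighbor)))
            if neighbor not in distance:
                distance[neighbor] = distance[router] + 1
                queue.append(neighbor)
    return distance, edges


# Toy topology standing in for real SNMP responses (made-up addresses).
TOPOLOGY = {
    "10.0.0.1": ["10.0.1.1", "10.0.2.1"],
    "10.0.1.1": ["10.0.0.1", "10.0.3.1"],
    "10.0.2.1": ["10.0.0.1"],
    "10.0.3.1": ["10.0.1.1", "10.0.4.1"],
    "10.0.4.1": ["10.0.3.1"],
}

found, links = discover_routers("10.0.0.1", lambda r: TOPOLOGY.get(r, []), 2)
```

Tracking visited routers in `distance` keeps the traversal from looping when two routers list each other as next hops, which is the normal case.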

  10. System Information • Number and speed of the CPUs • Total physical and swap memory installed • System clock • Uptime • Kernel version • Percent CPU user, nice, system and idle • One, five and fifteen minute load averages • Number of running processes and total number of processes • Amount of free, shared, buffered, cached and swap memory
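As an illustration of how a monitor program might gather two of the items above (load averages and process counts), here is a sketch that parses text in the format of Linux's `/proc/loadavg`. The slides do not say how the actual monitor collects these values, so the choice of source and the function name `parse_loadavg` are assumptions; the sample line is made up.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format.

    Fields: 1-, 5- and 15-minute load averages, then
    running/total scheduling entities, then the last PID used.
    """
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "running": int(running),
        "total": int(total),
    }


sample = "0.20 0.18 0.12 1/80 11206\n"
stats = parse_loadavg(sample)
```

On a Linux host the same function can be fed `open("/proc/loadavg").read()`.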

  11. Network Information • Network Interfaces • /proc/net/dev or CTL_NET/AF_LINK • SNMP: interfaces.ifTable • ARP cache – direct neighbors • /proc/net/arp or RTF_LLINFO • SNMP: ip.ipNetToMediaTable • Route Table • /proc/net/route or NET_RT_DUMP • SNMP: ip.ipRouteTable
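For the route table item, Linux exposes the kernel routing table as text in `/proc/net/route`, with addresses written as 8-digit hex words in host byte order. The sketch below decodes that format under the assumption of a little-endian host; the function names and the sample rows are illustrative only and are not taken from the ECHO code.

```python
import socket
import struct


def hex_to_ip(hex_word):
    """Decode an 8-digit hex word as written on a little-endian host."""
    return socket.inet_ntoa(struct.pack("<I", int(hex_word, 16)))


def parse_route_table(text):
    """Return one dict per route from /proc/net/route-style text."""
    routes = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        routes.append({
            "iface": fields[0],
            "destination": hex_to_ip(fields[1]),
            "gateway": hex_to_ip(fields[2]),
            "mask": hex_to_ip(fields[7]),
        })
    return routes


# Made-up sample in the same column layout as the real file.
SAMPLE = """Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth0 00000000 0101A8C0 0003 0 0 0 00000000 0 0 0
eth0 0001A8C0 00000000 0001 0 0 0 00FFFFFF 0 0 0
"""

routes = parse_route_table(SAMPLE)
```

The first sample row is a default route through a gateway; the second is a directly attached /24 network with no gateway.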

  12. Information Analysis • CCU: a relational database • Front end, parsing engine • Individual Node Analysis • Collaborative Analysis

  13. Parsing Engine • IPACKET is in standard XML format • IPACKET uses incremental updates: each new packet specifies only the differences from the previous packet • Parsing Engine parses the IPACKET into objects and inserts them into the iface tables accordingly • <ID ??> <DATATYPE> DATA </DATATYPE></ID>
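The incremental-update idea can be sketched as follows: each packet is parsed into typed values, and a later packet that carries only the changed items is merged over the previous state. The XML layout used here (`<ipacket>`, `<item id="...">`, and the `int`/`float`/`string` type tags) is a hypothetical stand-in loosely modeled on the pattern shown on the slide; it is not the real IPACKET schema, which the slides do not spell out.

```python
import xml.etree.ElementTree as ET

# Assumed set of data type tags and how to convert their text.
CASTS = {"int": int, "float": float, "string": str}


def parse_packet(xml_text):
    """Return {item id: typed value} for one packet."""
    root = ET.fromstring(xml_text)
    values = {}
    for item in root.findall("item"):
        typed = item[0]  # single child element names the data type
        values[item.get("id")] = CASTS[typed.tag](typed.text.strip())
    return values


def apply_update(state, xml_text):
    """Merge a packet that carries only changed items into prior state."""
    merged = dict(state)
    merged.update(parse_packet(xml_text))
    return merged


first = """<ipacket>
  <item id="uptime"><int> 120 </int></item>
  <item id="load1"><float> 0.42 </float></item>
</ipacket>"""
delta = """<ipacket>
  <item id="uptime"><int> 180 </int></item>
</ipacket>"""

state = apply_update({}, first)
state = apply_update(state, delta)
```

After the second call, `uptime` reflects the new packet while `load1` is carried over from the first, which is the point of sending differences only.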

  14. IFACE Tables • The client node registers a unique nodeid in iface_node_table • It starts a session for reporting information to the CCU • Each time, the client node reports information by sending an information packet (ipacket) • The CCU processes this packet, creates a unique statement id from iface_index_table and parses the information into each iface_?DATA_table.
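The registration and insertion flow above might look like this with an in-memory SQLite database. The table names `iface_node_table` and `iface_index_table` come from the slide; everything else is assumed for illustration, including the column names, the helper functions, and `iface_load_data_table` as one hypothetical instance of the `iface_?DATA_table` pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE iface_node_table (
    nodeid INTEGER PRIMARY KEY, hostname TEXT UNIQUE);
CREATE TABLE iface_index_table (
    stmtid INTEGER PRIMARY KEY, nodeid INTEGER, received TEXT);
CREATE TABLE iface_load_data_table (
    stmtid INTEGER, name TEXT, value REAL);
""")


def register_node(hostname):
    """Give a new client node its unique nodeid."""
    cur = conn.execute(
        "INSERT INTO iface_node_table (hostname) VALUES (?)", (hostname,))
    return cur.lastrowid


def store_packet(nodeid, received, values):
    """Allocate a statement id for one packet and store its values."""
    cur = conn.execute(
        "INSERT INTO iface_index_table (nodeid, received) VALUES (?, ?)",
        (nodeid, received))
    stmtid = cur.lastrowid
    conn.executemany(
        "INSERT INTO iface_load_data_table VALUES (?, ?, ?)",
        [(stmtid, name, value) for name, value in values.items()])
    return stmtid


node = register_node("node1.example")
first = store_packet(node, "2000-01-01T00:00:00", {"load1": 0.42, "load5": 0.30})
second = store_packet(node, "2000-01-01T00:01:00", {"load1": 0.50})
rows = conn.execute("SELECT COUNT(*) FROM iface_load_data_table").fetchone()[0]
```

Keying every data row by statement id lets later analysis steps join back to the index table to find which node sent a value and when.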

  15. Individual Node Analysis • Cleans up iface_?data_table by transferring and categorizing data into each node's own data table • Runs as a background process on the CCU • Examples: • Network statistics over time table • Network route change reporting • Network usage of nodes (packets, TCP/UDP connection counts) • Node's system state over time table • Node's configuration change table

  16. Collaborative Analysis • Groups specific information in the iface_?data_tables and ninfo_?data_tables to generate special tables for user viewing/analysis • Examples • Network connectivity graph • Network graph between two nodes or routes • Network snapshot table • All nodes' current network statistics table • All nodes' current state table

  17. Interface to View Analysis • Web interface • Viewable in a web browser • Web session • Displays analysis • Takes action input from user • Java Servlet + JSP • Security control • Data objects map to tables in collaborative analysis

  18. Action • Daemon capable of receiving and executing binary programs from the CCU • Command module issues commands in response to certain events • Add pseudo interface to a host • Reroute a host • Initialize new program • Etc.
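One way to picture a command module that issues commands in response to events is a simple dispatch table. This is a sketch under stated assumptions, not the ECHO design: the event types, the rule functions, and the command dictionaries are all hypothetical, and the commands are only returned as data rather than sent to any daemon.

```python
from collections import defaultdict


class CommandModule:
    """Maps event types to handlers that produce commands for a node."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        """Register a handler for one event type."""
        self._handlers[event_type].append(handler)

    def dispatch(self, event):
        """Return the commands issued for one event (possibly none)."""
        commands = []
        for handler in self._handlers.get(event["type"], []):
            command = handler(event)
            if command is not None:
                commands.append(command)
        return commands


def reroute_on_link_down(event):
    # Hypothetical rule: send the affected host through a backup gateway.
    return {"node": event["node"], "action": "reroute", "via": event["backup"]}


def restart_if_critical(event):
    # Hypothetical rule: only restart services flagged as critical.
    if event.get("critical"):
        return {"node": event["node"], "action": "start_program",
                "program": event["service"]}
    return None


module = CommandModule()
module.on("link_down", reroute_on_link_down)
module.on("service_down", restart_if_critical)

issued = module.dispatch({"type": "link_down", "node": "node1", "backup": "gw2"})
ignored = module.dispatch({"type": "service_down", "node": "node2", "service": "httpd"})
```

Separating the decision (which command to issue) from the delivery (sending it to a daemon) keeps the rules easy to test without touching real hosts.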

  19. Security • OpenSSL encryption • EchoMe Daemon runs as nobody • System Modification Program needs to do suexec (root password required)

  20. System Stat Table

  21. Transcripts for SNMP Router Discovery …… Iterating neighbors of .... IP address: IP address: IP address: IP address: IP address: IP address: IP address: IP address: IP address: IP address: In getIPRouteTable. nHops = 8 Setting target to ……

  22. Partial Router Connectivity on Millennium Discovered by SNMP

  23. Conclusion • Information collection methods feasible • Automatic discovery • Comprehensive and accurate information about system • Needs user feedback

  24. Future Work • More (or fewer) features based on user feedback • User interface • More on information analysis and decision making • Full deployment on Millennium
