Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp Parallel Research Group,

SCMS. Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp. Parallel Research Group, Computer and Network System Research Laboratory, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand. Phone: (662) 942 8555 Ext. 1416. Fax: (662) 5614621.


Presentation Transcript


  1. SCMS: An Extensible Cluster Management Tool for Beowulf Cluster
     Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp
     Parallel Research Group, Computer and Network System Research Laboratory
     Department of Computer Engineering, Faculty of Engineering
     Kasetsart University, Bangkok, Thailand
     Phone: (662) 942 8555 Ext. 1416
     Fax: (662) 5614621
     Email: pu@smile.cpe.ku.ac.th

  2. Motivation
     • Beowulf clusters have become one of the most widely used platforms for high-performance computing
     • Very large and complex Beowulf clusters are starting to appear
     • System management is still a challenging task. There is a need for:
       • An effective way to navigate and interact with cluster components
       • Mechanisms and tools to perform collective commands
       • Services such as monitoring, fault detection, and recovery
       • Special software tools that recognize the particular characteristics and needs of cluster administration

  3. SCMS: An Extensible Cluster Management Tool for Beowulf Cluster
     • A collection of system management tools for Beowulf clusters
     • The package includes:
       • Portable real-time monitoring
       • Parallel Unix commands
       • Alarm system
       • A large collection of graphical user interface tools for users and system administrators
         • Checking user status
         • Remote software installation
         • System disk space and process space status
         • Booting up and shutting down nodes
         • Changing node configuration remotely
       • Web/VRML interface
     • The current version, 1.1, supports only RedHat Linux

  4. Portable Real-time Monitoring
     • Provides global access to node information
       • Interfaces with the local OS to get node information
       • Collects the information at a single point
     • Provides heartbeat and node health diagnostics
     • Provides an API for applications to access the information. The API is available in C, Java, and TCL/TK.
     • System architecture
       • Client/server
       • Layered architecture
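
The heartbeat idea on this slide can be illustrated with a minimal Python sketch. This is not SCMS code: the class name `HeartbeatMonitor`, the timeout parameter, and the rule "a node is suspect once it has been silent longer than the timeout" are all assumptions made for illustration.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time per node (hypothetical helper, not part of SCMS)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        # Record that `node` reported in at time `now` (seconds).
        self.last_seen[node] = now

    def silent_nodes(self, now: float) -> List[str]:
        # Nodes whose last heartbeat is older than the timeout, in name order.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Passing the clock value in explicitly, rather than reading it inside the class, keeps the sketch deterministic and easy to test.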

  5. System Architecture
     [Diagram labels: Configuration Management, Task Scheduling, Performance Monitoring, Parallel Unix command; Resource Management API (C, TCL, Java); SMA; System Information Repository; CMA (one per node); HAL API; HAL; Local OS (Linux)]
     • CMA - Control and Monitoring Agent
       • Gets system information from the local operating system on each node
       • Portability is achieved using the HAL (Hardware Abstraction Layer)
     • SMA - System Management Agent
       • Runs on the management node to collect information from the CMAs
     • RMI - Resource Management Interface
       • Library that provides an interface to the functionality of the SMA
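
To make the layering concrete, here is a small Python sketch of the CMA / SMA / HAL split described on this slide. It is an illustration under stated assumptions, not the SCMS implementation: the class and method names, the metric names `load` and `mem_free_mb`, and the stubbed readers are all hypothetical, and no network transport is modeled.

```python
from typing import Callable, Dict, Iterable


class HAL:
    """Hypothetical hardware abstraction layer: portable metric names
    mapped to OS-specific reader functions."""

    def __init__(self, readers: Dict[str, Callable[[], float]]):
        self._readers = readers

    def read(self, metric: str) -> float:
        return self._readers[metric]()


class CMA:
    """Control and Monitoring Agent: one per node, reads local values via the HAL."""

    def __init__(self, node: str, hal: HAL, metrics: Iterable[str]):
        self.node = node
        self.hal = hal
        self.metrics = list(metrics)

    def report(self) -> Dict[str, float]:
        return {m: self.hal.read(m) for m in self.metrics}


class SMA:
    """System Management Agent: gathers CMA reports into one repository."""

    def __init__(self) -> None:
        self.repository: Dict[str, Dict[str, float]] = {}

    def collect(self, agents: Iterable[CMA]) -> None:
        for agent in agents:
            self.repository[agent.node] = agent.report()

    def query(self, node: str, metric: str) -> float:
        # Stands in for the library layer that applications would call.
        return self.repository[node][metric]


def build_demo() -> SMA:
    # Stubbed readers replace real OS calls so the sketch runs anywhere.
    agents = [
        CMA(
            f"node{i}",
            HAL({"load": lambda i=i: 0.1 * i, "mem_free_mb": lambda: 512.0}),
            ["load", "mem_free_mb"],
        )
        for i in range(1, 4)
    ]
    sma = SMA()
    sma.collect(agents)
    return sma
```

Only the `HAL` readers would need to change to support another operating system, which is the portability argument the slide makes.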

  6. Parallel Unix Command
     [Diagram labels: pps -aux; command; data; ps -aux on each node]
     • Parallel versions of commonly used Unix commands, such as pps, pls, and prm
     • Follows the scalable Unix tool model (Lusk and Gropp 1994)
     • Graphical user interface for these commands
       • Ease of use
       • Filtering of output data
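
The fan-out pattern behind a parallel command can be sketched in Python as follows. This is a hedged illustration, not how pps is implemented: `parallel_run`, the `runner` callback, and `fake_runner` are hypothetical names, and the remote-execution mechanism is deliberately left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


def parallel_run(
    nodes: Sequence[str],
    command: str,
    runner: Callable[[str, str], str],
    keep: Optional[Callable[[str], bool]] = None,
) -> Dict[str, List[str]]:
    """Run `command` on every node concurrently and gather output lines per node.

    `runner(node, command)` is supplied by the caller and returns the command's
    text output; `keep` optionally filters the returned lines.
    """

    def one(node: str):
        lines = runner(node, command).splitlines()
        if keep is not None:
            lines = [line for line in lines if keep(line)]
        return node, lines

    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return dict(pool.map(one, nodes))


def fake_runner(node: str, command: str) -> str:
    # Stand-in for remote execution; returns canned process listings.
    return f"root 1 init\nalice 42 {node}-job\n"
```

For example, `parallel_run(["n1", "n2"], "ps -aux", fake_runner, keep=lambda line: line.startswith("alice"))` gathers only the lines for user alice from both nodes, which mirrors the output filtering mentioned on the slide.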

  7. Alarm System
     [Diagram labels: Notification/action; Config; Alarm Manager; Detector (several)]
     • A set of daemons that monitor important system parameters
       • Processor utilization, memory usage, main board temperature, and more
     • Users can specify the alarm condition and the action to be taken
     • Issues the alarm and shuts down part of the system if needed
     • Notifications are sent by email. Future releases will add pager, ICQ, and speech synthesis
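
A user-specified condition paired with an action can be modeled as below. This Python sketch is an assumption-laden illustration: `AlarmRule`, `AlarmManager`, the metric name `temp_c`, and the threshold used in the usage note are all hypothetical and do not reflect the SCMS configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class AlarmRule:
    """One user-defined rule: when `condition` holds for `metric`, run `action`."""
    name: str
    metric: str
    condition: Callable[[float], bool]
    action: Callable[[str, float], None]


class AlarmManager:
    """Evaluates every rule against the readings reported for a node."""

    def __init__(self, rules: Iterable[AlarmRule]):
        self.rules = list(rules)

    def check(self, node: str, readings: Dict[str, float]) -> List[str]:
        fired = []
        for rule in self.rules:
            value = readings.get(rule.metric)
            # Rules whose metric is missing from the readings are skipped.
            if value is not None and rule.condition(value):
                rule.action(node, value)
                fired.append(rule.name)
        return fired
```

A rule such as `AlarmRule("hot-board", "temp_c", lambda v: v > 70.0, notify)` would call a caller-supplied `notify(node, value)` function, which could send email or trigger a shutdown.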

  8. SCMS Utilities
     SCMS comes with many GUI utilities:
     • Node status
     • Control panel
     • Disk space
     • Process status
     • Shutdown/reboot
     • Remote login
     • User status
     • Package installation

  9. SCMS Screen Shot

  10. KCAP Web and VRML based Interface for SCMS
     [Diagram labels: Web Generator; Web Tree; System Config; Web server; VRML World Generator; VRML World; External Network; Real time Monitoring]
     • Two versions of the web interface are available
       • KCAP: normal web interface
       • KCAP-VR: VRML interface that allows you to walk through and interact with your cluster
     • A Java applet is used to report real-time system information

  11. KCAP and KCAP-VR Screen shot

  12. Future Works
     [Diagram labels: Application; MPI; KSIX (Kasetsart System Interconnect eXecutive); Node OS and Node Hardware (one per node); Interconnection Network]
     • KSIX: a framework to support parallel tools and applications
     • Offers features such as:
       • Process control and signal delivery
       • Naming services
       • Event-based communication

  13. SQMS: SMILE Queuing Management System
     [Diagram labels: Submitter; Remote Queue; Task Queue; Scheduler; Task Node Allocator; Cluster Nodes]
     • Batch scheduler for sequential and parallel tasks
     • Static and dynamic load balancing
     • Reconfigurable scheduling policy
     • Auto docking between clusters
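
One way to read "reconfigurable scheduling policy" is that the rule for picking the next task is swappable. The Python sketch below illustrates that reading only; it is not SQMS code, and `Task`, `fifo`, `smallest_first`, and `schedule` are hypothetical names. Load balancing and inter-cluster docking are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Task:
    name: str
    nodes_needed: int  # 1 for a sequential task, more for a parallel one


# A policy returns the index of the task to try next.
Policy = Callable[[Sequence[Task]], int]


def fifo(queue: Sequence[Task]) -> int:
    return 0


def smallest_first(queue: Sequence[Task]) -> int:
    return min(range(len(queue)), key=lambda i: queue[i].nodes_needed)


def schedule(
    queue: Sequence[Task], free_nodes: Sequence[str], policy: Policy
) -> List[Tuple[str, List[str]]]:
    """Place tasks on free nodes in the order chosen by `policy`.

    Stops as soon as the chosen task does not fit in the remaining nodes.
    """
    pending = list(queue)
    free = list(free_nodes)
    placed: List[Tuple[str, List[str]]] = []
    while pending:
        idx = policy(pending)
        task = pending[idx]
        if task.nodes_needed > len(free):
            break
        pending.pop(idx)
        placed.append((task.name, free[: task.nodes_needed]))
        free = free[task.nodes_needed:]
    return placed
```

Swapping `fifo` for `smallest_first` changes which tasks run first without touching the allocation loop, which is the design point of keeping the policy as a separate function.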
