
O.S.C.A.R.

Presentation Transcript


  1. O.S.C.A.R. Open Source Cluster Application Resources

  2. Overview • What is O.S.C.A.R.? • History • Installation • Operation • Spin-offs • Conclusions

  3. History • CCDK (Community Cluster Development Kit) • OCG (Open Cluster Group) • OSCAR (the Open Source Cluster Application Resource) • IBM, Dell, SGI and Intel working closely together • ORNL – Oak Ridge National Laboratory

  4. First Meeting • Tim Mattson and Stephen Scott • The group agreed on the following: • That the adoption of clusters for mainstream, high-performance computing is inhibited by a lack of well-accepted software stacks that are robust and easy to use by the general user. • That the group embraces the open-source model of software distribution. Anything contributed to the group must be freely distributable, preferably as source code under the Berkeley open-source license. • That the group can accomplish its goals by propagating best-known practices built up through many years of hard work by cluster computing pioneers.

  5. Initial Thoughts • Differing architectures (small, medium, large) • Two paths of progress, R&D and ease of use • Primarily for non-computer-savvy users. • Scientists • Academics • Homogeneous system

  6. Timeline • Initial meeting in 2000 • Beta development started the same year • First distribution, OSCAR 1.0 in 2001 at LinuxWorld Expo in New York City • Today up to OSCAR 5.1 • Heterogeneous system • Far more robust • More user friendly

  7. Supported Distributions – 5.0

  8. Installation • Detailed Installation notes • Detailed User guide • Basic idea: • Configure head node (server) • Configure image for client nodes • Configure network • Distribute node images • Manage your own cluster!!

  9. Head Node • Install by running the ./install_cluster eth1 script • The GUI will auto-launch • Choose the desired step in the GUI; make sure each step is complete before proceeding to the next one • All of the configuration can be done from this system from now on
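
In practice the head-node step above reduces to one command, run as root from wherever the OSCAR distribution was unpacked (the /opt/oscar path below is only an assumed location; replace eth1 with whichever interface faces the private cluster network):

    # Run as root on the head node, from the unpacked OSCAR directory.
    cd /opt/oscar            # assumed unpack location; adjust to your setup
    ./install_cluster eth1   # eth1 = interface on the private cluster network
    # The OSCAR Installation Wizard GUI then launches automatically.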

  10. Download • Subversion is used • Default is the OSCAR SVN • Can set up custom SVN • Allows for up-to-date installation • Allows for controlled rollouts of multiple clusters • OPD also has powerful command-line functionality (LWP for proxy servers)
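
Since OPD relies on Perl's LWP, proxy support comes from the standard environment variables; a minimal sketch of a command-line download session behind a proxy (the proxy address is a placeholder, and the opd script's location can vary between OSCAR versions):

    # Point LWP at the site proxy (placeholder address), then run OPD.
    export http_proxy=http://proxy.example.com:3128
    export ftp_proxy=$http_proxy
    ./opd    # run from the OSCAR top-level directory; select packages when prompted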

  11. Select & Configure OSCAR packages • Customize server up to your liking/needs • Some packages can be customized • This step is very crucial, choice of packages can affect performance as well as compatibility

  12. Installation of Server Node • Simply installs the packages which were selected • Automatically configures the server node • Now the Head (server) node is ready to manage, administer and schedule jobs for its client nodes

  13. Build Client Image • Choose a name • Specify the packages within the package file • Specify the distribution • Be wary of the automatic reboot if network boot is manually selected as the default

  14. Building the Client Image …

  15. Define Clients • This step creates the network structure of the nodes • It's advisable to assign IPs based on physical links • The GUI has shortcomings regarding multiple IP ranges • An incorrect setup can lead to an error during node installation

  16. Define Clients

  17. Setup Networking • SIS – System Installation Suite • SystemImager • MAC addresses are collected by scanning the network • Each MAC address must be linked to a node • A network boot method must be selected (rsync, multicast, or BitTorrent) • Make sure the clients support PXE boot, or create boot CDs • Your own kernel can be used if the one supplied with SIS does not work

  18. Client Installation and Test • After the network is properly configured, installation can begin • All nodes are installed and rebooted • Once the system imaging is complete, a test can be run to ensure the cluster is working properly • At this point, the cluster is ready to begin parallel job scheduling

  19. Operation • Admin packages are: • Torque Resource Manager • Maui Scheduler • C3 • pfilter • System Imager Suite • Switcher Environment Manager • OPIUM • Ganglia

  20. Operation • Library packages: • LAM/MPI • OpenMPI • MPICH • PVM

  21. Torque Resource Manager • Server on the Head node • A “mom” daemon on the clients • Handles job submission and execution • Keeps track of cluster resources • Has its own scheduler but uses Maui by default • The commands are not intuitive; the documentation must be read • Derived from OpenPBS • http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.1.1_torque_overview
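
To make the "not intuitive" commands concrete, here is a minimal Torque job script plus the usual submission and monitoring commands; the resource requests and program name are placeholders, not values from the slides:

    #!/bin/bash
    #PBS -N hello_job           # job name
    #PBS -l nodes=2:ppn=2       # request 2 nodes, 2 processors per node (placeholder)
    #PBS -l walltime=00:10:00   # wall-clock limit
    cd $PBS_O_WORKDIR           # Torque starts jobs in $HOME; return to the submit directory
    mpirun -np 4 ./hello        # launch the parallel program (placeholder name)

    # On the head node:
    #   qsub job.sh       submit the script above; prints the job id
    #   qstat             list queued and running jobs
    #   pbsnodes -a       show the state of every client node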

  22. Maui Scheduler • Handles job scheduling • Sophisticated algorithms • Customizable • Much literature exists on its algorithms • Has a commercial successor called Moab • Accepted as the unofficial HPC standard for scheduling • http://www.clusterresources.com/pages/resources/documentation.php
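
Maui ships its own query tools on top of Torque; a few that are commonly used from the head node (the job id is a placeholder):

    showq            # Maui's view of running, idle and blocked jobs
    checkjob 1234    # detailed scheduling state of one job (1234 is a placeholder id)
    showbf           # show the "backfill window": resources free right now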

  23. C3 - Cluster Command & Control • Developed by ORNL • A collection of tools for cluster administration • Commands: • cget, cpush, crm, cpushimage • cexec, cexecs, ckill, cshutdown • cnum, cname, clist • Cluster Configuration Files • http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.3.1_c3_overview
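
A few hedged examples of the C3 commands listed above, assuming the default cluster definition in /etc/c3.conf covers all client nodes:

    cexec uptime                 # run a command on every client node in parallel
    cexecs uptime                # same, but one node at a time (serial)
    cpush /etc/hosts /etc/hosts  # push a file from the head node to all clients
    cget /etc/motd /tmp/         # gather a file from each client into /tmp on the head node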

  24. pfilter • Cluster traffic filter • By default, client nodes can only send outgoing communications to hosts outside the scope of the cluster • If it is desirable to open up the client nodes, the pfilter config file must be modified
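
The pfilter rule syntax is version-specific, so the sketch below only shows the general workflow; the config path and service name are assumptions based on common OSCAR setups, not taken from the slides:

    vi /etc/pfilter.conf           # assumed config location; add rules for the ports/hosts to open
    /etc/init.d/pfilter restart    # assumed init script; reload so the new rules take effect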

  25. System Imager Suite • A tool for network-based Linux installations • Image-based; you can even chroot into an image • Also has a database which contains the cluster configuration information • Tied in with C3 • Can handle multiple images per cluster • Completely automated once an image is created • http://wiki.systemimager.org/index.php/Main_Page
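
Chrooting into an image works because SystemImager keeps each image as a complete filesystem tree on the head node; a hedged example (the path and image name below are assumptions; adjust them to your installation):

    ls /var/lib/systemimager/images/                           # list the stored images (assumed default path)
    chroot /var/lib/systemimager/images/oscarimage /bin/bash   # inspect or patch the image from inside
    # Changes made here reach the clients on their next (re)installation.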

  26. Switcher Environment Manager • Handles “dot” files • Does not limit advanced users • Designed to help non-savvy users • Has guards in place that prevent system destruction • Selects which MPI to use, on a per-user basis • Operates on two levels: user and system • The Modules package is included for advanced users (and is used by switcher)
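
A hedged example of choosing an MPI implementation with switcher; the available names depend on what is installed, so "open-mpi" below is only illustrative:

    switcher mpi --list               # show the installed MPI implementations
    switcher mpi --show               # show the current setting
    switcher mpi = open-mpi           # set the calling user's default (name is illustrative)
    switcher mpi = open-mpi --system  # as root, set the system-wide default instead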

  27. OPIUM • Login is handled by the Head node • Once a connection is established, client nodes do not require authentication • Synchronization is run by root at intervals • It stores hash values of the password in the .ssh folder along with a “salt” • Password changes must be done at the Head node, as all changes propagate from there

  28. Ganglia • Distributed Monitoring System • Low overhead per node • XML for data representation • Robust • Used in many cluster and grid solutions • http://ganglia.info/papers/science.pdf
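
Because each node's gmond daemon answers with its full XML state on TCP port 8649 by default, the monitoring data can be checked with nothing more than a TCP client (the node name is a placeholder):

    nc node01 8649 | head -n 20   # dump the first lines of gmond's XML state from a client node
    # The Ganglia web front end renders the same XML as graphs on the head node.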

  29. LAM/MPI • LAM - Local Area Multicomputer • LAM initializes the runtime environment on a selected set of nodes • Implements MPI-1 and some of MPI-2 • MPICH2 can be used if installed • A two-tiered debugging system exists: snapshot and communication log • Daemon-based • http://www.lam-mpi.org/
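
A minimal LAM/MPI session, assuming a hostfile that lists the client nodes and an MPI program compiled with mpicc (hello.c is a placeholder):

    lamboot -v hostfile      # start the LAM daemons on the nodes listed in 'hostfile'
    mpicc hello.c -o hello   # compile the MPI program
    mpirun -np 4 ./hello     # run 4 processes across the booted nodes
    lamhalt                  # shut the LAM daemons down when finished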

  30. Open MPI • Replacement for LAM/MPI • The same team is working on it • LAM/MPI is relegated to upkeep only; all new development goes into Open MPI • Much more robust (more operating systems and schedulers supported) • Full MPI-2 compliance • Much higher performance • http://www.open-mpi.org/
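
Open MPI drops LAM's separate boot step; when built with Torque support it discovers the allocated nodes on its own, so inside a Torque job a run can be as short as this (hello.c again a placeholder):

    ompi_info | head         # confirm which Open MPI build and components are installed
    mpicc hello.c -o hello   # Open MPI's compiler wrapper
    mpirun -np 4 ./hello     # no lamboot needed; the node list comes from the Torque allocation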

  31. PVM – Parallel Virtual Machine • Fills a role similar to LAM/MPI • Can be run outside of the scope of Torque and Maui • Supports Windows nodes as well • Much better portability • Not as robust and powerful as Open MPI • http://www.csm.ornl.gov/pvm/
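
PVM is normally driven from its own console rather than through Torque/Maui; a short hedged session (host and program names are placeholders):

    pvm                        # start the PVM console; this also starts the local pvmd daemon
    # at the "pvm>" prompt:
    #   add node01 node02      # add client nodes to the virtual machine
    #   conf                   # list the hosts currently in the virtual machine
    #   spawn -4 ./mypvmprog   # start 4 copies of a PVM program (placeholder name)
    #   halt                   # shut down the whole virtual machine and exit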

  32. Spin-offs • HA-OSCAR - http://xcr.cenit.latech.edu/ha-oscar/ • VMware with OSCAR - http://www.vmware.com/vmtn/appliances/directory/341 • SSI-OSCAR - http://ssi-oscar.gforge.inria.fr/ • SSS-OSCAR - http://www.csm.ornl.gov/oscar/sss/

  33. Conclusions • Future Direction • Open MPI • Windows, Mac OS?
