Condor and DRBL
Bruno Gonçalves & Stefan Boettcher
Emory University
Motivation
• Maximize computing power while minimizing costs
• Optimize the use of the resources that are already available
• Maximize resource availability
• Permit peaceful coexistence with previously existing operating systems
Condor Week 2006
Software
• Fedora Core Linux http://fedora.redhat.com/
  • Other distributions can be used as well
• Diskless Remote Boot in Linux (DRBL) http://drbl.sourceforge.net
• Condor clustering software http://www.cs.wisc.edu/condor/
Hardware
• Server (complete machine)
  • Large HDD
  • Several network cards
• Client (stripped-down machine)
  • CPU
  • RAM
  • Network card
DRBL
• Uses PXE or Etherboot to let clients boot over the network
• All files can reside on the server and be accessed via NFS (clients don't need hard drives!)
• The server only provides file sharing and user authentication; all software runs on the client's own resources
DRBL Installation (I)
# drblsrv -i
• Updates the system (similar to "up2date", etc.)
• Makes sure the relevant services (dhcpd, NFS, NIS, tftpboot, etc.) are installed
• Configures the necessary services
• Selects the kernel to be used by the clients
DRBL Installation (II)
# drblpush -i
• Which network interfaces to use
• Client booting options (text/GUI)
• How many clients, and their hostnames
• MAC address to IP/hostname binding (if any)
• "Pushes" all the configurations to the clients (creating new clients if necessary)
• Must be rerun whenever the structure of the cluster changes
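To make the MAC address to IP/hostname binding concrete, here is a minimal Python sketch that renders bindings as ISC dhcpd `host` declarations. It is only an illustration of what such a binding expresses, not the output of drblpush; the hostnames, MAC addresses, IP addresses, and function names are all hypothetical.

```python
# Illustrative only: renders MAC-to-IP bindings in ISC dhcpd "host" syntax.
# All hostnames, MAC addresses and IPs below are made-up examples.

def dhcp_host_entry(hostname, mac, ip):
    """Return one dhcpd 'host' declaration pinning a MAC address to an IP."""
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"}}"
    )

# Hypothetical client table: (hostname, MAC address, IP address)
clients = [
    ("node001", "00:11:22:33:44:01", "192.168.110.1"),
    ("node002", "00:11:22:33:44:02", "192.168.110.2"),
]

def render_bindings(client_table):
    """Join the declarations for every client into one configuration snippet."""
    return "\n".join(dhcp_host_entry(*client) for client in client_table)

if __name__ == "__main__":
    print(render_bindings(clients))
```

With a binding like this, a given network card always receives the same address and hostname, which is what lets a machine keep a fixed identity in the cluster.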
Structure
[Diagram labels: Internet; DRBL server/Firewall; Central Manager; 192.168.110.x; 192.168.120.x; Compute nodes]
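The two private address ranges on this slide can be modeled with Python's standard `ipaddress` module. The sketch below assumes each range is a /24 network (the slide only shows `192.168.110.x` and `192.168.120.x`) and uses a hypothetical helper, `routed_through_server`, to tell whether two addresses sit on the same subnet or whether their traffic would have to pass through the server.

```python
import ipaddress

# Assumption: each "x" range on the slide is a /24 private subnet.
SUBNETS = [
    ipaddress.ip_network("192.168.110.0/24"),
    ipaddress.ip_network("192.168.120.0/24"),
]

def subnet_of(addr):
    """Return the cluster subnet containing addr, or None if it is outside."""
    ip = ipaddress.ip_address(addr)
    for net in SUBNETS:
        if ip in net:
            return net
    return None

def routed_through_server(src, dst):
    """True unless both addresses are on the same cluster subnet."""
    src_net = subnet_of(src)
    dst_net = subnet_of(dst)
    return src_net is None or dst_net is None or src_net != dst_net

if __name__ == "__main__":
    print(routed_through_server("192.168.110.5", "192.168.110.9"))
    print(routed_through_server("192.168.110.5", "192.168.120.9"))
```

Jobs whose nodes exchange a lot of data are therefore best kept on a single subnet where possible.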
Condor Installation
# ./condor_install
• All machines share the same password files
• All filesystems are NFS-mounted and shared among all the machines
• Configure Condor for all DRBL clients, even nonexistent ones
Dedicated Cluster
• The number of configured clients can be larger than the number of machines (making it easy to add more machines later)
• Clients boot to text mode
• Condor is configured for dedicated resources
Windows Computer Lab
• The number of nodes should match the number of machines
• MAC address binding can be used for extra security
• Nodes can PXE-boot when they are available for computation (evenings / holidays / vacations) and go back to Windows when strictly necessary (mornings)
• Condor's checkpointing (and flocking) facilities allow jobs to run on whichever resources are available at a given time
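The availability rule for lab machines can be written down as a small predicate. The Python sketch below is one possible reading of it: the opening and closing hours are assumptions (the slide only says "evening" and "morning"), holidays are not modeled, and `available_for_computation` is a hypothetical name.

```python
from datetime import datetime

# Assumed lab hours on weekdays; the slides give no exact times.
LAB_OPEN_HOUR = 8
LAB_CLOSE_HOUR = 18

def available_for_computation(when):
    """True if a lab machine could be PXE-booted into the cluster at 'when'."""
    if when.weekday() >= 5:  # Saturday (5) or Sunday (6): lab assumed closed
        return True
    # On weekdays the machine is needed for Windows during lab hours.
    return not (LAB_OPEN_HOUR <= when.hour < LAB_CLOSE_HOUR)

if __name__ == "__main__":
    print(available_for_computation(datetime(2006, 4, 24, 10, 0)))  # Monday morning
    print(available_for_computation(datetime(2006, 4, 24, 22, 0)))  # Monday evening
```

Because a machine can leave the pool at the next lab opening, checkpointing is what lets a long job survive the switch back to Windows and resume on another node.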
Centralized Cluster Management
• drbl-doit
  • Run a command on all clients
• drbl-cp-host, drbl-rm-host
  • cp/rm a file or directory on all clients
• drbl-useradd, drbl-userdel
  • Add/delete user accounts
• drbl-client-service
  • Control services on clients (drbl-client-service condor start)
Advantages
• Flexible
  • Easily add and remove machines (plug and play)
  • Usable for both dedicated and opportunistic clustering
• Stable
  • Running for months without problems, even with nodes being added, removed and upgraded
  • Both clients and server can be rebooted without (too much) harm
• Efficient
  • "Biggest bang for your buck"
Disadvantages
• Not ideal for I/O-intensive applications (NFS overhead)
• Communication between nodes on different subnets is routed through the server
• All communication with the outside world has to go through the server
The End
Questions? Suggestions?