CE + WN installation and configuration
Vanessa Hamar
Universidad de Los Andes – Mérida, Venezuela
12th EELA Tutorial, Lima, 24-29 September 2007

Outline
• What is a Computing Element (CE)?
• What is a Torque Server?
• What is a Worker Node?
• How to install and configure a Computing Element with Torque Server.
• How to install and configure a Worker Node with Torque.

What is a CE?
• The CE is a service representing a computing resource.
• Its main functionality is job management (job submission, job control, etc.).
• For job submission, the CE can work in:
  • push model (the job is pushed to a CE for execution).
  • pull model (the CE asks the WMS for jobs).

What is Torque?
• TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource manager providing control over batch jobs and distributed compute resources.
• The Torque system is composed of:
  • pbs_server, which provides the basic batch services such as receiving/creating a batch job or protecting the job against system crashes.
  • job_scheduler, which contains the site's policy used to decide which job must be executed.
  • pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.

What is a Worker Node?
• The Worker Node (WN) is a set of clients required to run jobs sent by the CE via the Local Resource Management System. It currently includes:
  • the gLite I/O Client,
  • the Logging and Bookkeeping Client,
  • the R-GMA Client and
  • the WMS Checkpointing library.

Installing CE + Torque Server and WN + Torque

Preliminary and common steps
• Start from an installation of SLC 3.0.8
• Install the Java SDK
• Remove LAM and Postfix
• Check the hostname
• Install and configure the ntp daemon
• Install the X.509 host certificates in /etc/grid-security and check their file permissions
• Install the latest version of glite-yaim
• Install the middleware

Installing pre-requisites
• Java is not included in the distribution. Install it separately (version >= 1.4.2_08):
    apt-get install j2sdk

Installing pre-requisites
• Depending on the package set you selected when installing the operating system, the lam package may be installed on your WN. If so, remove it:
    apt-get remove lam
• There is a known installation conflict between the 'torque-clients' rpm and the 'postfix' mail package (Savannah bug #5509). If you are going to install Torque, uninstall the postfix package:
    apt-get remove postfix

Installing pre-requisites
• Check the FQDN hostname
• Ensure that the hostnames of your machines are correctly set. The following command should print the fully qualified domain name of the machine:
    hostname -f

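The hostname check can be scripted so that it is easy to repeat on every node. The following is a minimal sketch, not part of gLite or YAIM: is_fqdn is a hypothetical helper that only tests the shape of the name (at least one dot, no empty labels), so a name such as localhost.localdomain would still pass and needs a manual look.

```shell
#!/bin/sh
# is_fqdn: hypothetical helper, succeeds if the name looks fully qualified.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;   # empty, leading/trailing dot, or empty label
        *.*)           return 0 ;;   # at least one dot between labels
        *)             return 1 ;;   # short name such as "localhost"
    esac
}

# Fall back to the plain hostname if "hostname -f" is not available.
name=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo "")

if is_fqdn "$name"; then
    echo "OK: $name"
else
    echo "WARNING: '$name' is not fully qualified"
fi
```
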
Installing pre-requisites
• Time synchronization among all gLite nodes is mandatory. Install ntp if it is not already available on your system:
    apt-get install ntp
• Add your time server to /etc/ntp.conf:
    restrict <time_server_IP_address> mask 255.255.255.255 nomodify notrap noquery
    server <time_server_name>
  (you can use ntp-1.infn.it, IP 193.206.144.10)
• Edit /etc/ntp/step-tickers and add the hostname(s) of your time server(s).
• If you are running a firewall, allow inbound communication on the NTP port:
    -A INPUT -s <NTP-serverIP-1> -p udp --dport 123 -j ACCEPT
• Activate the ntpd service with the following commands:
    ntpdate <your ntp server name>
    service ntpd start
    chkconfig ntpd on
• You can check ntpd's status with:
    ntpq -p

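When several nodes need the same NTP settings, the two /etc/ntp.conf entries can be generated rather than typed by hand. This is only a sketch: ntp_conf_entries is a hypothetical helper, and it prints to standard output so the result can be reviewed before it is appended to the real file.

```shell
#!/bin/sh
# ntp_conf_entries: hypothetical helper, prints the restrict/server pair
# for one time server. Arguments: server name, server IP address.
ntp_conf_entries() {
    server_name=$1
    server_ip=$2
    printf 'restrict %s mask 255.255.255.255 nomodify notrap noquery\n' "$server_ip"
    printf 'server %s\n' "$server_name"
}

# Example with the time server mentioned on the slide.
ntp_conf_entries ntp-1.infn.it 193.206.144.10
```

Once the output looks right, it could be appended as root with a redirection such as `ntp_conf_entries ntp-1.infn.it 193.206.144.10 >> /etc/ntp.conf`.
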
Installing pre-requisites
• Install glite-yaim:
    apt-get install glite-yaim-core
    apt-get install glite-yaim-clients

Installing pre-requisites
• Request host certificates for the CE from a CA.
• Copy the host certificate and key (hostcert.pem and hostkey.pem) to /etc/grid-security.
• Change the permissions:
    chmod 644 hostcert.pem
    chmod 400 hostkey.pem

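The permission step can be rehearsed on a scratch directory before it is applied to the real certificate directory (/etc/grid-security in the preliminary steps). In this sketch set_cert_perms and show_mode are hypothetical helpers, and the empty files created with touch merely stand in for the real certificate and key.

```shell
#!/bin/sh
# set_cert_perms: hypothetical helper, applies the two modes from the slide
# to the certificate files found in the given directory.
set_cert_perms() {
    dir=$1
    chmod 644 "$dir/hostcert.pem" &&   # certificate: readable by everyone
    chmod 400 "$dir/hostkey.pem"       # private key: readable by its owner only
}

# show_mode: hypothetical helper, prints the 10-character mode string of a file.
show_mode() {
    ls -l "$1" | cut -c1-10
}

# Rehearsal on a scratch directory with placeholder files.
scratch=$(mktemp -d)
touch "$scratch/hostcert.pem" "$scratch/hostkey.pem"
set_cert_perms "$scratch"
show_mode "$scratch/hostcert.pem"
show_mode "$scratch/hostkey.pem"
rm -rf "$scratch"
```

On a real node the call would be `set_cert_perms /etc/grid-security`, run as root.
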
Installing CE+Torque Server via apt
• All site configuration values must be set in a site configuration file using key-value pairs.
• This file is shared among all the different gLite node types, so edit it once and keep it in a safe place.
• Copy the /opt/glite/yaim/examples/site-info.def template (provided by the glite-yaim-core package) to your reference directory for the installation (e.g. /root/siteinfo):
    cp /opt/glite/yaim/examples/site-info.def /root/siteinfo/site-info.def
• A good syntax test for your site configuration file is to source it manually:
    source site-info.def

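The syntax test can be made a little safer by parsing the file first and sourcing it in a subshell, so that a broken file cannot disturb the current shell. The sketch below assumes bash is installed; check_site_info is a hypothetical helper, and the three keys in the sample file are illustrative only, so take the real keys and values from the shipped template.

```shell
#!/bin/sh
# check_site_info: hypothetical helper, succeeds if the file parses and sources cleanly.
check_site_info() {
    file=$1
    # Parse only: catches unbalanced quotes and similar errors without running anything.
    bash -n "$file" 2>/dev/null || return 1
    # Source in a subshell so that no variable leaks into the calling shell.
    ( . "$file" ) >/dev/null 2>&1
}

# Illustrative key-value pairs written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SITE_NAME=MY-SITE
CE_HOST=ce01.example.org
VOS="eela"
EOF

if check_site_info "$sample"; then
    echo "site-info.def syntax looks fine"
else
    echo "site-info.def has a syntax error"
fi
rm -f "$sample"
```

On a real installation the helper would be called as `check_site_info /root/siteinfo/site-info.def`.
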
Installing CE+Torque Server via apt
• The configuration is stored in a directory structure which will be extended in the near future. Currently the following are used: the site-info.def file and the vo.d directory.

Installing CE+Torque Server via apt
• The /root/siteinfo/vo.d directory
• Each file name in this directory has to be the lower-cased version of a VO name defined in site-info.def. The matching file should contain the definitions for that VO, which override the ones defined in site-info.def. For example, for the eela VO:
    SW_DIR=$VO_SW_DIR/eela
    DEFAULT_SE=$CLASSIC_HOST
    STORAGE_DIR=$CLASSIC_STORAGE_DIR/eela

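The naming rule for vo.d files is easy to get wrong when a VO name contains capitals, so it may help to derive the file name mechanically. This sketch uses a scratch directory in place of /root/siteinfo; vo_file_name is a hypothetical helper, and the three definitions are the ones shown on the slide.

```shell
#!/bin/sh
# vo_file_name: hypothetical helper, prints the lower-cased VO name,
# which is the file name expected inside vo.d.
vo_file_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

siteinfo=$(mktemp -d)            # scratch stand-in for /root/siteinfo
mkdir -p "$siteinfo/vo.d"
vo_file="$siteinfo/vo.d/$(vo_file_name EELA)"

# Quoted heredoc: the $VARIABLES are written literally and are expanded
# only when the file is later sourced.
cat > "$vo_file" <<'EOF'
SW_DIR=$VO_SW_DIR/eela
DEFAULT_SE=$CLASSIC_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/eela
EOF

echo "wrote $vo_file"
```
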
Installing CE+Torque Server via apt
• List the Worker Nodes, one fully qualified hostname per line, in wn-list.conf:
    vi /opt/glite/yaim/etc/wn-list.conf
    limaXX.ring.pucp.edu.pe
    limaXX.ring.pucp.edu.pe
    …..

Installing CE+Torque Server via apt
• Install the node:
    /opt/glite/yaim/bin/yaim -i -s /root/siteinfo/site-info.def -m glite-CE
• Configure the node:
    /opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n lcg-CE_torque -n MPI_CE -n BDII_site

Installing CE+Torque Server via apt
• If the installation succeeds, the following components are installed:
  • gLite in /opt/glite
  • Condor in /opt/condor-x.y.z (where x.y.z is the current Condor version)
  • Globus in /opt/globus
  • Tomcat in /var/lib/tomcat5
  • Torque in /var/spool/pbs

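A quick way to confirm the result is to test for each of the directories listed above. The following sketch only checks that the directories exist, not that the services inside them work; report_components is a hypothetical helper, and its optional argument is a path prefix added here purely so the check can be tried against a scratch tree.

```shell
#!/bin/sh
# report_components: hypothetical helper, reports which install directories exist.
report_components() {
    root=${1:-}      # optional path prefix for dry runs; empty on a real CE
    for d in /opt/glite /opt/globus /var/lib/tomcat5 /var/spool/pbs; do
        if [ -d "$root$d" ]; then
            printf '%-7s %s\n' found "$d"
        else
            printf '%-7s %s\n' missing "$d"
        fi
    done
    # The Condor version is not fixed, so accept any /opt/condor-* directory.
    condor=missing
    for d in "$root"/opt/condor-*; do
        if [ -d "$d" ]; then
            condor=found
        fi
    done
    printf '%-7s %s\n' "$condor" '/opt/condor-*'
}

report_components
```
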
Installing CE+Torque Server via apt
• Edit /etc/ssh/sshd_config and add the following lines at the end:
    HostbasedAuthentication yes
    IgnoreUserKnownHosts yes
    IgnoreRhosts yes
• Restart the server with:
    /sbin/service sshd restart

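If the configuration is run more than once, appending the three lines blindly would duplicate them. The sketch below appends an option only when no active line for it exists yet; add_sshd_option is a hypothetical helper, it does not change an option that is already set to a different value, and it is shown here on a scratch file rather than on /etc/ssh/sshd_config.

```shell
#!/bin/sh
# add_sshd_option: hypothetical helper, appends "key value" to the file
# unless an uncommented line for that key is already present.
add_sshd_option() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}[[:space:]]" "$file"; then
        echo "already set: $key"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

conf=$(mktemp)       # scratch stand-in for a copy of /etc/ssh/sshd_config
add_sshd_option "$conf" HostbasedAuthentication yes
add_sshd_option "$conf" IgnoreUserKnownHosts yes
add_sshd_option "$conf" IgnoreRhosts yes
add_sshd_option "$conf" IgnoreRhosts yes     # second call appends nothing
cat "$conf"
```
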
Installing CE+Torque Server via apt
• On the CE, generate an updated version of /etc/ssh/ssh_known_hosts by running:
    edg-pbs-shostsequiv
    edg-pbs-knownhosts
• Copy that file to all the Worker Nodes.

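Copying the file to every Worker Node can be driven from the same node list used by YAIM. The sketch below is a dry run that only prints one scp command per node, so the commands can be inspected before anything is copied. print_copy_commands is a hypothetical helper, and using scp as root is an assumption rather than something the tutorial prescribes.

```shell
#!/bin/sh
# print_copy_commands: hypothetical helper, prints one scp command per
# Worker Node listed (one hostname per line) in the given file.
print_copy_commands() {
    list=$1
    while IFS= read -r wn; do
        case "$wn" in
            "") continue ;;      # skip blank lines
        esac
        echo "scp /etc/ssh/ssh_known_hosts root@$wn:/etc/ssh/ssh_known_hosts"
    done < "$list"
}

# Dry run against a scratch list with made-up hostnames.
list=$(mktemp)
printf '%s\n' wn01.example.org wn02.example.org > "$list"
print_copy_commands "$list"
rm -f "$list"
```

On the CE the list would be /opt/glite/yaim/etc/wn-list.conf, and the printed commands could be executed by piping the output to sh once they have been checked.
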
Installing WN via apt
• Install the node:
    /opt/glite/yaim/bin/yaim -i -s /root/siteinfo/site-info.def -m glite-WN -m glite-torque-client-config
• Configure the node:
    /opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n WN_torque
