Local cluster setup notes
From ATLAS-TRIUMF
Some notes on our local cluster as it is set up. I will try to organize them better as I understand more.
To add a user:
- Create a home directory /srv/home/<username> on one of the local machines, then run chown UID:UID /srv/home/<username>, where UID is the UID you are going to use for the user. Copy at least the .bashrc file from /etc/skel to /srv/home/<username>.
- If you want to match a trshare user ID, make sure you create the account with the correct UID and use the same number as the GID. If the UID and GID do not match, logins across the cluster will get messy and confusing.
- It is very important to create the new user's home directory NOT in the default /home/<username> but in the physical local home directory, /srv/home/<username>.
- Log on (as myself) to the cluster master machine
- cd NISfiles
- svn update
- make useradd (and follow the instructions to add the user)
- as root, "service autofs reload"
- as root on all clients, "service autofs reload"
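The home-directory part of the steps above could be sketched roughly as follows. This is only an illustration, not the script the cluster uses: the function name make_home and the HOME_ROOT and SKEL_DIR override variables are made up here, and the NIS side (make useradd in NISfiles) is not covered.

```shell
#!/bin/sh
# Sketch: create the physical home directory for a new user.
# make_home, HOME_ROOT and SKEL_DIR are hypothetical names for this example.
# The defaults are the paths from these notes; override them to try the
# function in a scratch directory first.
make_home() {
  user=$1
  uid=$2
  home_root=${HOME_ROOT:-/srv/home}
  skel_dir=${SKEL_DIR:-/etc/skel}

  if [ -z "$user" ] || [ -z "$uid" ]; then
    echo "usage: make_home <username> <uid>" >&2
    return 2
  fi

  # Create the directory under /srv/home, NOT under the automounted /home.
  mkdir -p "$home_root/$user" || return 1

  # Copy at least .bashrc from the skeleton directory, if it is there.
  if [ -f "$skel_dir/.bashrc" ]; then
    cp "$skel_dir/.bashrc" "$home_root/$user/" || return 1
  fi

  # Use the same number for UID and GID. Only root may chown to another user,
  # so the step is skipped (with a message) when run unprivileged.
  if [ "$(id -u)" -eq 0 ]; then
    chown -R "$uid:$uid" "$home_root/$user" || return 1
  else
    echo "not root: skipping chown $uid:$uid $home_root/$user" >&2
  fi
}

# Example (as root), with a placeholder user name and UID:
#   make_home someuser 12345
```

Running it once with HOME_ROOT pointed at a temporary directory is a cheap way to check the result before touching /srv/home.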
To add a node:
- log out all users (/home will be unmounted so this is really important)
- log in as root
- umount /home
- umount /data (or whatever your scratch area that you are planning to share is called)
- mkdir /srv/data
- mkdir /srv/home
- put /srv/data and /srv/home in /etc/fstab
- mount /srv/data and /srv/home
- edit /etc/exports
- edit /etc/hosts.allow (amandad, portmap, mountd)
- edit /etc/hosts.deny
- on atlascm (as myself):
  - cd NISfiles
  - edit atlas-hosts.txt
  - edit auto.data
  - add new client
  - make commit
  - make install
  - make clean
- back on the new node, as root:
  - chkconfig nfs on
  - authconfig: check "use NIS"; the next page asks for the NIS settings, which should be domain atlas.triumf.ca and server atlascm.triumf.ca. On SL5 this is done on the command line: authconfig --enablenis --nisdomain atlas.triumf.ca --nisserver atlascm.triumf.ca --update. Afterwards (though Kel says this is actually NOT needed for us), edit /etc/nsswitch.conf and change "hosts: files nis dns" back to "hosts: files dns".
  - service nfs start
  - ypmatch atlas_hosts netgroup
  - service ypbind restart
  - service autofs reload
Then check the firewall settings and optionally set up nagios and ganglia checks for the node.
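As a quick sanity check after the fstab step, something like the sketch below can confirm that both shared areas are listed in /etc/fstab. The function names fstab_has_mount and check_node_mounts are invented for this example, and the check only looks at the mount-point column; it says nothing about whether the entries are otherwise correct.

```shell
#!/bin/sh
# Sketch: check that the shared areas from these notes appear in an fstab file.
# fstab_has_mount and check_node_mounts are hypothetical helper names.

# Succeed if mount point $2 is the second field of a non-comment line in file $1.
fstab_has_mount() {
  awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$1"
}

# Print one line for each expected mount point missing from the file
# (default: /etc/fstab).
check_node_mounts() {
  fstab=${1:-/etc/fstab}
  for mp in /srv/home /srv/data; do
    fstab_has_mount "$fstab" "$mp" || echo "missing from $fstab: $mp"
  done
}

# Example:
#   check_node_mounts            # checks /etc/fstab
#   check_node_mounts /tmp/fstab # checks a copy
```

If your scratch area is not called /srv/data, adjust the list in check_node_mounts accordingly.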
To add a node in ganglia:
- yum install ganglia-gmond
- scp yourusername@arthurpc:/etc/gmond.conf /etc/gmond.conf
- /etc/init.d/gmond stop
- chkconfig --del gmond
- chkconfig --level 345 gmond on
- chkconfig --list | grep gmond
- /etc/init.d/gmond start
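The chkconfig --list | grep gmond step is a visual check. If you would rather have a yes/no answer, a small filter along these lines could be used; enabled_345 is a made-up name, and the sketch assumes the usual chkconfig --list layout of a service name followed by 0:off ... 6:off style fields.

```shell
#!/bin/sh
# Sketch: read `chkconfig --list` output on stdin and succeed only if the
# named service is "on" in runlevels 3, 4 and 5.
# enabled_345 is a hypothetical helper name.
enabled_345() {
  awk -v svc="$1" '
    $1 == svc && /3:on/ && /4:on/ && /5:on/ { ok = 1 }
    END { exit !ok }'
}

# Example:
#   chkconfig --list | enabled_345 gmond && echo "gmond enabled in 3, 4, 5"
```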
To add a node in nagios:
Install nagios (do this on all nodes which require monitoring other than ping)
edit /etc/yum.repos.d/epel.repo
< enabled=1
---
> enabled=0
9d8
< includepkgs= nagios* nrpe
yum install nagios
yum install nrpe # for remote checking
yum install nagios-plugins-{users,ping,procs,load,disk,nrpe}
yum install nagios-plugins-{http,ssh,ftp,mysql} # may as well have these
Add to /etc/hosts.allow (this is for machines other than ds01, but should be OK for ds01 also):
# --- for nrpe ---
nrpe: localhost atlas-tier3-ds01.triumf.ca
Then start nrpe and enable it at boot:
service nrpe start
chkconfig nrpe on
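To confirm that the hosts.allow entry is in place, a rough check like the one below may help. hosts_allow_has is an invented name, and the sketch handles only the simple "daemon: client client ..." form used in these notes; it does not understand wildcards, patterns, or multi-daemon lines from the full hosts.allow syntax.

```shell
#!/bin/sh
# Sketch: succeed if file $1 has a line "<daemon>: ... <client> ..." for
# daemon $2 and client $3. hosts_allow_has is a hypothetical helper name.
# Only plain, literal client names separated by spaces or commas are handled.
hosts_allow_has() {
  awk -F: -v d="$2" -v c="$3" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    {
      name = $1
      gsub(/[[:space:]]/, "", name)           # daemon name without blanks
      if (name != d) next
      n = split($2, clients, /[[:space:],]+/) # client list after the colon
      for (i = 1; i <= n; i++)
        if (clients[i] == c) found = 1
    }
    END { exit !found }' "$1"
}

# Example:
#   hosts_allow_has /etc/hosts.allow nrpe atlas-tier3-ds01.triumf.ca \
#     && echo "nrpe entry present"
```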
--Isabel 13:54, 31 October 2007 (PST)

