Upgrade the cluster to SL5

From ATLAS-TRIUMF


Upgrading from SL4 to SL5

I now have kickstarts for SL5.5 (one for 32-bit, one for 64-bit), based on Kel's TRIUMF kickstarts. To burn the boot CD, find the CD writer's device name with cdrecord -scanbus (or cdrecord dev=ATAPI -scanbus), then run cdrecord -dev=6,0,0 triumf-ks-boot.iso, substituting whatever device was reported. To make the .iso with a custom .cfg, use Kel's script: /triumfcs/trshare/kray/kickstarts/sl54-x86_64/make-ksImage

Before running the kickstart I rsync /etc, /var and /root, so that I can restore files from them afterwards.
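
A minimal sketch of that backup step. The destination /data/sl4-backup is a hypothetical path (the notes do not say where the copies went); it only needs to be somewhere the kickstart will not reformat. The RUN variable is also an invention of this sketch: it defaults to echo, so the commands are printed rather than executed.

```shell
# Hypothetical backup location; pick a partition the kickstart leaves alone.
BACKUP_DEST="${BACKUP_DEST:-/data/sl4-backup}"

# Copy /etc, /var and /root into $BACKUP_DEST.
# RUN defaults to "echo" (print only); call with RUN= (empty) as root to execute.
backup_system_dirs() {
    local dir
    for dir in /etc /var /root; do
        # -a preserves ownership, permissions and timestamps. There is no
        # trailing slash on $dir, so it lands in $BACKUP_DEST/etc and so on.
        ${RUN-echo} rsync -a "$dir" "$BACKUP_DEST/"
    done
}
```

Calling backup_system_dirs prints the three rsync commands; RUN= backup_system_dirs runs them for real.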

When I ran the kickstart, I selected "custom layout" and kept the original data and home partitions with their original mount points (which I first identified with df to make sure I was keeping the correct /dev/sdxy partitions) and reformatted the / and /boot partitions.

I left the default firewall disabled (I run my own later) and switched SELinux to Disabled (nasty Athena problems if you don't do that...). I did not enable kdump. I left the TRIUMF default time setup.

I use network login and enable and configure NIS (see Local cluster setup notes for parameters).

I therefore don't need to create any local accounts to log in as myself.

I log on as myself and do su -l. Then I restore the original /etc/ssh contents so I don't get those "keys have changed" messages when using ssh. I also restore the original /etc/exports, and copy in /etc/hosts.allow from a good SL5 node.

I copy in the /etc/krb5.conf and /etc/gmond.conf files from nodes that are already set up in SL5.

I install ganglia-gmond, nrpe and nagios and turn them on.

I do chkconfig nfs on and start nfs (for some reason I forgot this in the kickstart).
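
The install-and-enable steps above can be sketched as follows. The package names come from the notes; the init script names gmond and nrpe are assumptions, and the RUN variable is a convenience of this sketch that defaults to echo so nothing is changed unless it is set to empty.

```shell
# Packages named in the notes (run as root):
#   yum install ganglia-gmond nrpe nagios

# Enable each named service at boot and start it now.
# RUN defaults to "echo" (print only); call with RUN= (empty) as root to execute.
enable_services() {
    local svc
    for svc in "$@"; do
        ${RUN-echo} chkconfig "$svc" on    # start at boot
        ${RUN-echo} service "$svc" start   # start immediately
    done
}

# Example (service names gmond and nrpe are assumed):
#   RUN= enable_services gmond nrpe nfs
```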

I make root mail go to atlast3admin and turn on logwatch.
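
The notes do not say how root's mail is redirected. One common way on SL5 is an alias in /etc/aliases followed by newaliases, so the sketch below assumes that mechanism; the function name set_root_alias is made up for illustration.

```shell
# Replace any active "root:" alias in the given aliases file with one
# that points at the given target. Commented-out entries are left alone.
set_root_alias() {
    local aliases_file="$1" target="$2"
    sed -i '/^root:/d' "$aliases_file"          # drop the old active entry (GNU sed)
    echo "root: $target" >> "$aliases_file"     # append the new one
}

# Example (as root), then rebuild the alias database:
#   set_root_alias /etc/aliases atlast3admin && newaliases
```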

I comment out the last line in /etc/hosts.

Below here is purely historical

Athena production cache kits from 15.6.3 onward are only being distributed in SL5, so it is time to upgrade the cluster.

This is the current status of the ATLAS machines.

My first upgrade was my desktop machine, isabel1.

I started from the TRIUMF SL5.4 64-bit kickstart. To burn the boot CD, find the CD writer's device name with cdrecord -scanbus (or cdrecord dev=ATAPI -scanbus), then run cdrecord -dev=6,0,0 triumf-ks-boot.iso, substituting whatever device was reported. To make the .iso with a custom .cfg, use Kel's script: /triumfcs/trshare/kray/kickstarts/sl54-x86_64/make-ksImage

When I ran the kickstart, I selected "custom layout" and kept the original data and home partitions with their original mount points (which I first identified with df to make sure I was keeping the correct /dev/sdxy partitions) and reformatted the / and /boot partitions.

I left the default firewall disabled (I run my own later) and switched SELinux to Disabled (nasty Athena problems if you don't do that...). I did not enable kdump. I left the TRIUMF default time setup.

I use network login and enable and configure NIS (see Local cluster setup notes for parameters).

I therefore don't need to create any local accounts to log in as myself.

There's an annoying but apparently harmless message when I log in: "$HOME/.dmrc is being ignored. User's $HOME must be owned by user and not writable by others". It does find my desktop, but access to the home directory (which is on this machine) seems a bit slow.

It takes quite a bit of work to get the data and home partitions properly exported to the cluster, though I can see the rest of the cluster fine. The crucial thing is the /etc/exports file, which has to list the partitions I want to export (duh...). I run my iptables firewall script, edit /etc/hosts.allow and /etc/hosts.deny, and add the atlas mount to /etc/auto.mount.
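
Since a missing /etc/exports entry is the easy thing to overlook, a quick check along these lines may help. The function name missing_exports and the mount points /home and /data in the example are hypothetical; substitute the real ones reported by df.

```shell
# Print every directory that has no active entry in the given exports file.
# Returns non-zero if at least one directory is missing.
missing_exports() {
    local exports_file="$1"; shift
    local dir status=0
    for dir in "$@"; do
        # The first field of an exports entry is the exported directory;
        # commented-out lines never match because their first field starts with "#".
        if ! awk -v d="$dir" '$1 == d {found=1} END {exit !found}' "$exports_file"; then
            echo "not exported: $dir"
            status=1
        fi
    done
    return $status
}

# Example (mount points are assumptions):
#   missing_exports /etc/exports /home /data
# After editing /etc/exports, "exportfs -ra" re-exports everything listed.
```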

Installing openAFS is a bit of a pain.

  yum install openafs 
  yum install kernel-module-openafs-`uname -r`
  yum install openafs-client 
  yum install openafs-krb5
  yum install krb5-workstation
  yum install pam_krb5

Edit /etc/krb5.conf and /usr/vice/etc/ThisCell to put in the CERN stuff (copy it from another node...).

Don't forget all the chkconfig stuff for AFS and NFS services (sorry for lack of precision, my documents are full of swear words here and a bit hard to read).

Reboot the machine to get all the AFS and NFS stuff working.... yes, I know you should be able to restart the services individually, but it was just much easier to reboot.

Firefox out of the box is 3.0.15 which is good, except that the Java version I get from the repositories is out by one minor release from the most recent one and a lot of Java things whine. But for the first time EVER on this computer I can get sound with EVO out of the box! Note: this is because the TRIUMF kickstart replaces the default Java RPMs with versions rebuilt at TRIUMF which work properly. Kel has since updated the TRIUMF Java version and Java is now completely happy with NO intervention from me, which is fantastic.

Thunderbird is 2.0.0.22 which is not so good; I have to reinstall Lightning 0.9 and Google Provider to get my calendars back (Lightning 0.9 is in one of the epel repos).

I do all the missing extras from the ATLAS-Canada SL5 setup. A lot are included in the TRIUMF kickstart.

I set up Nagios and Ganglia, again following the existing Local cluster setup notes.

I remove OpenOffice 2 with yum and install OpenOffice 3 from their website (tar file of rpms, just follow instructions) so that I can open .docx and .xlsx files. I get some error messages (related to the Java runtime) when I do this, but at least I can open the files.

Then I installed NX Client and the FreeNX server.

Then, after much puzzling about why I wasn't getting logwatch files:

  yum install logwatch

And one more thing to fix a problem with dq2-put:

  yum install compat-openldap

which is a problem I would not have had if I had REALLY done all the steps in the ATLAS-Canada SL5 setup, which includes yum install compat-openldap.

For my personal desktop I also like to run cssh, so I edited rpmforge.repo:

  enabled = 1
  includepkgs = clusterssh perl-X11-Protocol

and installed clusterssh.
