TRIUMFAtlasTier3
From ATLAS-TRIUMF
Tier-3 @ TRIUMF
For our local group's computing needs, we have a Tier-3 cluster that currently consists of:
- 2 x 20 TB RAID-6 storage servers (2 and 4 cores, respectively)
- 2 dual quad-core machines
- 4 dual hexa-core machines, each with 16 TB of RAID-6 data disks
The storage server is configured with xrootd to serve the files and also acts as the head node for the PBS batch system.
The Tier-3 uses the ATLASLocalRootBase and is part of the TRIUMF ATLAS NIS cluster.
Storage Server
To log into the older storage nodes, use: ssh atlas-tier3-ds02.triumf.ca (or atlas-tier3-ds03.triumf.ca)
There are two visible RAID-6 disks of ~8 TB each, /data/ds02_1 and /data/ds02_2. Please store your data there. There are currently no restrictions, so please use the available disk space responsibly.
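Since there are no quotas, it is worth checking the free space before writing a large dataset. `df -h /data/ds02_1 /data/ds02_2` does this from the shell; below is a minimal Python 3 sketch of the same check (it assumes Python 3 is available on the node, and the function name is just for illustration).

```python
import shutil

# The two RAID-6 data disks on atlas-tier3-ds02 named on this page.
DATA_DISKS = ["/data/ds02_1", "/data/ds02_2"]


def free_space_report(paths):
    """Return (path, free_GB, percent_used) for every path that can be queried."""
    report = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
        except OSError:
            continue  # disk not mounted or not readable on this machine
        pct_used = 100.0 * usage.used / usage.total if usage.total else 0.0
        report.append((path, usage.free / 1e9, pct_used))
    return report


if __name__ == "__main__":
    for path, free_gb, pct_used in free_space_report(DATA_DISKS):
        print(f"{path}: {free_gb:.1f} GB free ({pct_used:.0f}% used)")
```

Disks that are not mounted on the machine you run it on are simply skipped.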
The new machines atlas-tier3-c[10-13].triumf.ca also contain considerable disk space, which is aggregated into a /global filesystem using GlusterFS. This volume is available on ds02, ds03 and c[8-13]. Please put your data in /global/username. Currently no quotas are enforced on this volume; if the 64 TB starts to disappear, quotas can easily be enforced.
Note that the /global volume is built from the /srv/data disks on all of the c[10-13] machines: a file /global/foo/bar is physically stored at /srv/data/foo/bar on one of them. You are strongly encouraged to write to /global, not directly to /srv/data on c[10-13]. Writing through /global lets GlusterFS balance the load at the time of writing, so that files are distributed evenly among the machines. Otherwise GlusterFS has to rebalance periodically, which is time-consuming and perplexing for users, who find that a file has moved from one machine to another.
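To make the /global to /srv/data mapping concrete, here is a toy Python sketch. brick_path() translates a /global path into the path it has under /srv/data (the per-machine directories GlusterFS calls bricks) on whichever machine holds the file. toy_host_for() illustrates hash-based placement by hashing the path to pick one of c10-c13. It is NOT GlusterFS's actual algorithm (GlusterFS hashes the file name against per-directory hash ranges), so it will not tell you where a real file lives. Both function names are hypothetical.

```python
import hashlib
import posixpath

GLOBAL_ROOT = "/global"
BRICK_ROOT = "/srv/data"
# Machines whose /srv/data disks make up /global, as listed on this page.
BRICK_HOSTS = [f"atlas-tier3-c{n}.triumf.ca" for n in range(10, 14)]


def brick_path(global_path):
    """Map /global/foo/bar to /srv/data/foo/bar (same relative path on the brick)."""
    norm = posixpath.normpath(global_path)
    rel = posixpath.relpath(norm, GLOBAL_ROOT)
    if rel == "." or rel.startswith(".."):
        raise ValueError(f"{global_path} is not a file under {GLOBAL_ROOT}")
    return posixpath.join(BRICK_ROOT, rel)


def toy_host_for(global_path):
    """Toy placement: hash the normalized path and pick one host.
    Illustrates deterministic hash-based spreading only; NOT GlusterFS's real hash."""
    digest = hashlib.sha256(posixpath.normpath(global_path).encode()).digest()
    return BRICK_HOSTS[int.from_bytes(digest[:4], "big") % len(BRICK_HOSTS)]


if __name__ == "__main__":
    p = "/global/foo/bar"
    print(toy_host_for(p), brick_path(p))
```

The point of hashing at write time is that the same path always maps to the same machine without any central lookup table, which is why writing directly to /srv/data bypasses the balancing.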
To mount /global as a client, you need a 64-bit SL5 machine. First install glusterfs-core-3.2.2-1.x86_64.rpm and glusterfs-fuse-3.2.2-1.x86_64.rpm from the GlusterFS download pages. Then, as root, run mkdir /global && mount -t glusterfs atlas-tier3-c10.triumf.ca:/global /global to create the /global mount.
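After mounting, a quick way to confirm that /global really is the GlusterFS volume (and not an empty local directory) is to look it up in /proc/mounts, where a GlusterFS FUSE mount is normally listed with filesystem type fuse.glusterfs. `mount | grep glusterfs` gives the same answer by hand; the Python sketch below does it programmatically (function names are hypothetical).

```python
# Filesystem type under which a GlusterFS FUSE mount normally appears in /proc/mounts.
GLUSTER_FSTYPE = "fuse.glusterfs"


def gluster_mounts(mounts_text):
    """Return {mount_point: source} for GlusterFS FUSE mounts in /proc/mounts-style text."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fstype = fields[0], fields[1], fields[2]
        if fstype == GLUSTER_FSTYPE:
            found[mount_point] = source
    return found


def global_is_mounted(mount_point="/global", mounts_file="/proc/mounts"):
    """True if mount_point is currently a GlusterFS mount on this machine."""
    try:
        with open(mounts_file) as fh:
            return mount_point in gluster_mounts(fh.read())
    except OSError:
        return False  # e.g. /proc/mounts not available
```

For example, global_is_mounted() should return True on a client where the mount command above succeeded.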
Batch System
The Tier-3 uses the PBS batch system. For now, jobs are submitted from atlas-tier3-ds02 with qsub and monitored with qstat. For qstat, the option -r lists all running jobs, -a all jobs, -q all queues, and -f jobid gives the full details of the corresponding job. Here is a nice reference for user commands. Currently there are three queues, short, medium and long, with the following configuration:
Queue    Memory  CPU Time  Walltime  Node  Run  Que  Lm  State
long     --      48:00:00  72:00:00  --    0    0    --  E R
medium   --      06:00:00  12:00:00  --    0    0    --  E R
short    --      00:15:00  00:30:00  --    0    0    --  E R
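A simple rule for choosing a queue is to take the shortest one whose walltime limit still covers your job. The Python sketch below encodes the walltime limits from the queue table and builds (but does not run) a matching qsub command. The function names are hypothetical and only walltime is checked, not the CPU-time limits; `-q` and `-l walltime=` are standard qsub options.

```python
# Walltime limits (hh:mm:ss) taken from the queue table on this page.
QUEUE_WALLTIME = {"short": "00:30:00", "medium": "12:00:00", "long": "72:00:00"}


def to_seconds(hms):
    """Convert 'hh:mm:ss' to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s


def pick_queue(walltime, limits=QUEUE_WALLTIME):
    """Return the queue with the smallest walltime limit that still fits `walltime`."""
    need = to_seconds(walltime)
    fitting = [(to_seconds(lim), q) for q, lim in limits.items() if to_seconds(lim) >= need]
    if not fitting:
        raise ValueError(f"walltime {walltime} exceeds every queue limit")
    return min(fitting)[1]


def qsub_command(script, walltime):
    """Build a qsub argument list requesting the chosen queue and walltime."""
    return ["qsub", "-q", pick_queue(walltime), "-l", f"walltime={walltime}", script]
```

On atlas-tier3-ds02 you could then submit with subprocess.run(qsub_command("myjob.sh", "02:00:00"), check=True), which would go to the medium queue.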
The 16 cores (np = 8 each) of the two dual quad-core machines, atlas-tier3-c8 and atlas-tier3-c9, are currently included in the cluster, and their status can be checked with pbsnodes:
atlas-tier3-c8.triumf.ca
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux atlas-tier3-c8.triumf.ca 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 13:05:22 CST 2009 x86_64,
          sessions=? 0,nsessions=? 0,nusers=0,idletime=2500258,totmem=5945064kb,availmem=5838544kb,physmem=16430832kb,ncpus=8,
          loadave=0.00,netload=39541887,state=free,jobs=? 0,rectime=1240527393

atlas-tier3-c9.triumf.ca
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux atlas-tier3-c9.triumf.ca 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 13:05:22 CST 2009 x86_64,
          sessions=? 0,nsessions=? 0,nusers=0,idletime=142580,totmem=5945064kb,availmem=5820212kb,physmem=16430832kb,ncpus=8,
          loadave=0.00,netload=66485121,state=free,jobs=? 0,rectime=1240527354
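For a quick overview of many nodes, the pbsnodes output can be parsed into a dictionary. This Python sketch (function names hypothetical) handles the "key = value" attribute lines and a status value that wraps over several lines, as in the output shown here; it assumes node names appear on lines without an "=" sign.

```python
import subprocess


def parse_pbsnodes(text):
    """Parse `pbsnodes -a` style text into {node_name: {attribute: value}}.

    The comma-separated `status` attribute is further split into a dict.
    Lines containing '=' but no ' = ' are treated as continuations of the
    previous attribute (e.g. a wrapped `status` line).
    """
    nodes = {}
    current = None   # attribute dict of the node being read
    last_key = None  # attribute that a wrapped line would continue
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            if current is None:
                continue  # attribute before any node name: ignore
            key, value = (part.strip() for part in line.split(" = ", 1))
            current[key] = value
            last_key = key
        elif "=" in line and current is not None and last_key is not None:
            current[last_key] += line  # wrapped continuation of a long value
        else:
            current = nodes.setdefault(line, {})  # bare host name starts a new node
            last_key = None
    for attrs in nodes.values():
        status = attrs.get("status", "")
        attrs["status"] = dict(
            item.split("=", 1) for item in status.split(",") if "=" in item
        )
    return nodes


def show_nodes():
    """Print state, slot count and load average per node (needs pbsnodes in PATH)."""
    out = subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True, check=True).stdout
    for name, attrs in parse_pbsnodes(out).items():
        print(name, attrs.get("state", "?"), "np=" + attrs.get("np", "?"),
              "loadave=" + attrs["status"].get("loadave", "?"))
```

Calling show_nodes() on atlas-tier3-ds02 would print one line per worker node, e.g. its state, np = 8 and the current load average.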
Below is an example of job submission that I recently tested for parallel processing of a cosmic-ray sample using ten nodes.
Example
The submission script submit_job.sh takes three arguments: the list of input files, and the first and last section. It calls a second script, run_job_cosmic.sh, which contains the actual submission of the individual PBS jobs.
For example, I ran: ./submit_job.sh cosmic_list.txt 1 10
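The two scripts themselves are not reproduced on this page, and the way they split the work between them may differ. Purely as an illustration, here is a Python sketch of a submitter with the same interface (file list, first section, last section) that submits one PBS job per section, using run_job_cosmic.sh as a stand-in job script. The queue choice, job names and the environment variable names INPUT_LIST and SECTION are hypothetical; `-q`, `-N` and `-v` are standard qsub options.

```python
import os
import subprocess
import sys

# Hypothetical defaults: the queue and the variable names passed to the job
# script are illustrative choices, not taken from the original scripts.
DEFAULT_QUEUE = "medium"
JOB_SCRIPT = "run_job_cosmic.sh"


def build_commands(file_list, first, last, queue=DEFAULT_QUEUE, job_script=JOB_SCRIPT):
    """Return one qsub command (as an argument list) per section in [first, last]."""
    if first > last:
        raise ValueError("first section must not be larger than last section")
    commands = []
    for section in range(first, last + 1):
        commands.append([
            "qsub",
            "-q", queue,
            "-N", f"cosmic_{section}",                          # job name shown by qstat
            "-v", f"INPUT_LIST={file_list},SECTION={section}",  # env vars for the job script
            job_script,
        ])
    return commands


def submit(file_list, first, last):
    """Submit the jobs. The list path is made absolute so the job can find it
    from any working directory (assuming it sits on a shared filesystem)."""
    for cmd in build_commands(os.path.abspath(file_list), first, last):
        subprocess.run(cmd, check=True)  # qsub prints the new job id


if __name__ == "__main__":
    if len(sys.argv) == 4:
        submit(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
    else:
        print(f"usage: {sys.argv[0]} <file_list> <first_section> <last_section>")
```

Calling build_commands() first is a cheap dry run: it shows exactly which qsub commands would be issued without submitting anything.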
In this example, the data are read from the storage server disk and the output is written to the local /tmp on the worker nodes to reduce I/O. At the end of the job, the output is copied to the storage server. The /tmp disk has 100 GB, so please keep your temporary job output below ~10 GB and don't forget to clean up at the end of your job.
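The scratch pattern described here (write to local /tmp, copy back at the end, always clean up) can be sketched in Python as follows. It assumes the destination is a path visible from the worker node, for example a directory under /global, which is available on c[8-13]; if you copy back via xrootd or scp instead, replace the copy loop. The function names are hypothetical, and the ~10 GB guideline is only checked with a printed warning.

```python
import os
import shutil
import tempfile

# The page asks to keep temporary job output in /tmp below ~10 GB.
SCRATCH_LIMIT_BYTES = 10 * 10**9


def dir_size(path):
    """Total size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def run_with_scratch(produce_output, dest_dir, scratch_base="/tmp"):
    """Run produce_output(scratch_dir) in a private directory under scratch_base,
    copy everything it wrote to dest_dir, and always delete the scratch directory."""
    scratch = tempfile.mkdtemp(prefix="tier3job_", dir=scratch_base)
    try:
        produce_output(scratch)
        size = dir_size(scratch)
        if size > SCRATCH_LIMIT_BYTES:
            print(f"warning: {size / 1e9:.1f} GB of scratch output exceeds the ~10 GB guideline")
        os.makedirs(dest_dir, exist_ok=True)
        for name in os.listdir(scratch):
            src = os.path.join(scratch, name)
            if os.path.isdir(src):
                shutil.copytree(src, os.path.join(dest_dir, name), dirs_exist_ok=True)
            else:
                shutil.copy2(src, dest_dir)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)  # clean up /tmp even if the job failed
```

Putting the cleanup in a finally clause means a crashed job does not leave its scratch files behind on the worker node's /tmp.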
If you are interested in running a VNC client, here are instructions on how to do that.

