How to set up a Cronus User Interface
From ATLAS-TRIUMF
Introduction
Cronus is the ATLAS code name for a distributed workload management system based on Condor glide-in technology. It was pioneered by the CDF CAF group and has been in use within ATLAS production since late 2006. This User Interface installation gives easy access for user analysis and is intended to let users evaluate the system. Once installed, it interfaces seamlessly with Ganga.
What am I getting myself into?
You will be running a personal Condor Scheduler (Schedd), which will maintain a queue of your jobs and pass them on to a 'regional' Scheduler for processing. A short while after submission, your personal Schedd can be switched off; it will get updated status information the next time it is started. The Schedd must be restarted on the same machine (or rather, the same file system). Initially, for testing, there will be only one regional Schedd. The only requirements for this machine are that it stays up and on the network most of the time.
Requirements
- An X509 user certificate, registered in the ATLAS VO and the Canada VOMS group (optional to start with)
- A range of 100 inbound ports must be open in any firewall. If the regional Schedd is on-site, this is probably not an issue.
- bash. I don't use csh, but you can adapt the scripts if you like.
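As a quick sanity check on the port requirement, the sketch below counts the ports in an inclusive range and confirms that it covers at least 100. The range 9600-9699 is a hypothetical example, not a value from this installation. Condor itself can be restricted to a port range with the LOWPORT and HIGHPORT configuration settings; whether the Cronus UI setup reads them from the same place is an assumption to verify locally.

```shell
#!/bin/bash
# Hypothetical port range; substitute the range your site actually opens.
LOW=9600
HIGH=9699

# Count the ports in an inclusive range: port_count LOW HIGH
port_count() {
    echo $(( $2 - $1 + 1 ))
}

if [ "$(port_count "$LOW" "$HIGH")" -ge 100 ]; then
    echo "Range $LOW-$HIGH covers $(port_count "$LOW" "$HIGH") ports"
else
    echo "Range $LOW-$HIGH is too small; 100 inbound ports are required" >&2
fi
```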
Installation instructions
- First get a proxy with grid-proxy-init
- Download the installation script UIinstall.sh
- $ chmod u+x UIinstall.sh
- $ ./UIinstall.sh
- $ cd condor-6.9.1_UI
- $ source setup.sh
- $ condor_master
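Since the first step depends on a valid proxy, it can help to check for one before running the installer. The following is a minimal sketch, not part of UIinstall.sh: the helper names have_cmd and check_proxy are made up for illustration, and it assumes the Globus grid-proxy-info tool is on your PATH once your grid environment is set up.

```shell
#!/bin/bash
# Report whether a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check for a valid proxy before running the installer.
check_proxy() {
    if ! have_cmd grid-proxy-info; then
        echo "grid-proxy-info not found; set up your grid environment first" >&2
        return 1
    fi
    # With -exists, grid-proxy-info exits nonzero when no valid proxy is found.
    if ! grid-proxy-info -exists; then
        echo "No valid proxy; run grid-proxy-init" >&2
        return 1
    fi
    echo "Proxy OK"
}

check_proxy || echo "Fix the proxy before running ./UIinstall.sh"
```

If the check passes, continue with the chmod and ./UIinstall.sh steps as listed.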
Testing
- Now condor_status and condor_q should work
- $ cd condor_var/examples/
- $ condor_submit helloWorld.jdl
- You should see the state change Q -> R -> C and then output in output/N.0.out
- "condor_q -l" shows the full job details, including:
- GridJobId = "condor rodvm01.triumf.ca rodvm00.triumf.ca 2807.0"
- This means the job has reached the regional scheduler and is in its queue with ID 2807
- "condor_q -name rodvm01.triumf.ca -l 2807" to see detail in that queue
- GridJobId = "condor rodvm01.triumf.ca rodvm00.triumf.ca 2807.0"
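The lookup in the regional queue can be derived from the GridJobId string itself. The sketch below assumes the field layout seen in the single example above (the second field is the regional Schedd name, and the last field is the job ID with a ".0" suffix); the function names are made up for illustration.

```shell
#!/bin/bash
# Example value copied from the condor_q -l output above.
grid_job_id='condor rodvm01.triumf.ca rodvm00.triumf.ca 2807.0'

# Print the second field, which the step above passes to "condor_q -name".
regional_schedd() {
    echo "$1" | awk '{print $2}'
}

# Print the job number: the last field with everything after the first dot removed.
regional_cluster() {
    echo "$1" | awk '{print $NF}' | cut -d. -f1
}

# Prints: condor_q -name rodvm01.triumf.ca -l 2807
echo "condor_q -name $(regional_schedd "$grid_job_id") -l $(regional_cluster "$grid_job_id")"
```

The script only prints the command rather than running it, so it can be tried on a machine without Condor installed.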
More fun
- Graphical usage monitor
- Run "condor_userprio" to see usage shares and priorities. The user with the lowest "effective priority" gets the next job start.

