LArMonTools Tips

From ATLAS-TRIUMF


LArMonTools Release Information

Overview of Releases

An overview of the releases can be found here.

All development is currently in the Trunk.

LArMonTools Performance

Memory and CPU time for LArMonTools as a function of tag: here.

Memory and CPU time for LArMonTools as a function of tool: here.

LArMonTools Coding and Validation Information

This section provides information on tools for validation and debugging. The Software Development Workbook can be found at (https://twiki.cern.ch/twiki/bin/view/Atlas/SoftwareDevelopmentWorkBook). It is strongly recommended that all developers and managers read the ATLAS coding standards (https://twiki.cern.ch/twiki/bin/view/Atlas/CodingStandards).

Information on how to use SVN can be found at (https://twiki.cern.ch/twiki/bin/view/Atlas/SoftwareDevelopmentWorkBookSVN).

Note: make sure your LD_LIBRARY_PATH contains your test_area.
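One way to check this is with a small standalone Python helper. This is only a sketch: the function name in_library_path and the directories used when calling it are made up for illustration, and it simply tests whether any LD_LIBRARY_PATH entry lies inside the given directory.

```python
import os


def in_library_path(test_area, library_path=None):
    """Return True if any LD_LIBRARY_PATH entry lies inside test_area."""
    if library_path is None:
        # Fall back to the value in the current environment.
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    area = os.path.normpath(test_area)
    for entry in library_path.split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        entry = os.path.normpath(entry)
        if entry == area or entry.startswith(area + os.sep):
            return True
    return False
```

Calling in_library_path("/path/to/your/test_area") with no second argument checks the environment of the current shell.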

1) The standard validation recipe for Rel 15 and 16 is described at https://twiki.cern.ch/twiki/bin/view/Atlas/RecoRealData. Test your code in the appropriate nightly (generally the most recent nightly that compiled for AtlasTier0 release 15.5.X.Y). The required validations typically include one cosmic and one collision simulation transform:

Reco_trf.py AMI=q120
Reco_trf.py AMI=q121

Please double check the standard validation recipe web page (https://twiki.cern.ch/twiki/bin/view/Atlas/RecoRealData) for the most recent procedure.

Also compile your code with the options used in the nightly compilation:

gmake -j6 PEDANTIC=1 VERBOSE=1

2) CPU time and memory can be monitored with perfmon. An example is:


RAWtoESD_trf.py inputBSFile=/afs/cern.ch/atlas/project/rig/data/data10_7TeV.00153565.physics_L1CaloEM.merge.RAW._lb0420._0001.1 maxEvents=250 autoConfiguration=everything --athenaopts=--stdcmalloc preExec=rec.doDetailedPerfMon=True,,rec.doNameAuditor=True,,from@PerfMonComps.PerfMonFlags@import@jobproperties@as@pmjp,,pmjp.PerfMonFlags.doPostProcessing=True outputDQMonitorFile=mymon.root outputESDFile=myesd.pool.root outputMuonCalibNtup=mymuoncalib.root outputNTUP_TRKVALIDFile=mytrkvalidntup.root outputTAGComm=mytagcom.pool.root
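In the command above, the preExec statements appear to be joined with ",," and spaces inside a statement are written as "@". The following Python sketch builds such an argument from ordinary statements. The helper name make_preexec is hypothetical, and the encoding is inferred from this example rather than taken from the transform documentation.

```python
def make_preexec(statements):
    """Encode Python statements as a single preExec argument.

    Assumption (inferred from the example command): statements are
    separated by ',,' and spaces are replaced by '@'.
    """
    encoded = [s.strip().replace(" ", "@") for s in statements]
    return "preExec=" + ",,".join(encoded)


arg = make_preexec([
    "rec.doDetailedPerfMon=True",
    "rec.doNameAuditor=True",
    "from PerfMonComps.PerfMonFlags import jobproperties as pmjp",
    "pmjp.PerfMonFlags.doPostProcessing=True",
])
```

Writing the statements as a plain list makes it easier to add or remove a flag without miscounting commas in a long command line.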


2) Here is a useful script that runs BS to ESD with only the minimum required for LArMonTools. (Note: ESD to AOD is not intended to work, and the input BS file name may need to be changed.) The perfmon output can be found in the results_bstoesd directory. Additional job options can be passed to Reco_trf.py by adding them to the preInclude list in the transform arguments.

#!/bin/bash
for i in bstoesd esdtoaod merged; do
    if [ -d "results_${i}/" ];then
	echo "Remove results_$i before starting."
	exit 1
    fi
done
touch starttime
Reco_trf.py inputBSFile=/scratchdisk2/inugent/daq.ATLAS.0091900.physics.IDCosmic.LB0001.SFO-2._0001.data skipEvents=0 maxEvents=-1 trigStream=IDCosmic beamType=cosmics conditionsTag=COMCOND-ES1C-001-01 geometryVersion=ATLAS-GEO-03-00-00 outputESDFile=myESD.pool.root HIST=myMergedMonitoring.root preExec=rec.abortOnUncheckedStatusCode=False,,rec.doTrigger=False,,rec.doInDet=False,,rec.doMuon=False,,rec.doEgamma=False,,rec.doJetMissingETTag=False,,rec.doMuonCombined=False,,rec.doTau=False,,rec.doMonitoring=True,,rec.doPerfMon=True,,rec.doDetailedPerfMon=True,,DQMonFlags.doGlobalMon=False,,DQMonFlags.doTileMon=False,,DQMonFlags.doCaloMon=False,,DQMonFlags.doMuonCombinedMon=False preInclude=RecExCommon/RecoUsefulFlags.py,RecExCommission/MinimalCommissioningSetup.py,RecJobTransforms/UseOracle.py,RecJobTransforms/debugConfig.py --ignoreunknown --athenaopts='--stdcmalloc'

touch endtime
for i in bstoesd; do
    if [ ! -f ntuple_${i}.pmon.gz -o ntuple_${i}.pmon.gz -ot starttime ]; then
	echo "Job didn't finish like it should (problems in ${i})"
	exit 1
    fi
done

MONFILE_bstoesd=Monitor.root
if [ ! -f $MONFILE_bstoesd -o $MONFILE_bstoesd -ot starttime ]; then
    echo "Did not find bstoesd monitor output file ($MONFILE_bstoesd): "
    exit 1
fi

THISDIR=$PWD

for i in bstoesd; do
    RESULTDIR=$THISDIR/results_${i}
    mkdir $RESULTDIR
    PMONFILE=ntuple_${i}.pmon.gz
    cp $THISDIR/$PMONFILE $RESULTDIR
    cd $RESULTDIR
    PMONSUMMARY=ntuple_$i.perfmon.summary.txt
    perfmon.py $PMONFILE
    if [ $? != 0 -o ! -f $PMONSUMMARY ]; then
	echo "Problems in perfmon.py $PMONFILE"
	cd $THISDIR
        exit 1
    fi
    /afs/cern.ch/user/s/sschaetz/public/monmanagers.sh $PMONSUMMARY
    if [ ! -f monmanagers.txt -o ! -f monmanagers_CPU.txt ];then
	echo "Problems in producing monmanagers summary files for $i step."
	cd $THISDIR
        exit 1
    fi
    cd $THISDIR
done
cp $MONFILE_bstoesd $THISDIR/results_bstoesd

touch reallyendtime
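
The script relies on a simple freshness test: an output file only counts if it exists and is not older than the starttime marker. For readers who prefer Python, the same check could be sketched as below. The helper is_fresh is hypothetical and not part of any ATLAS tool.

```python
import os


def is_fresh(path, marker):
    """Mirror the shell test: path exists and is not older than marker."""
    if not os.path.isfile(path):
        return False
    if not os.path.exists(marker):
        # Without a marker there is nothing to be older than.
        return True
    # The shell's -ot means "strictly older than", so equal mtimes pass.
    return os.path.getmtime(path) >= os.path.getmtime(marker)
```

For example, is_fresh("ntuple_bstoesd.pmon.gz", "starttime") corresponds to the check the script performs on the perfmon ntuple.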

3) A job option file for running LArRODMonTool.

from LArROD.LArRODFlags import larRODFlags
larRODFlags.doDSP.set_Value_and_Lock(True) # T
larRODFlags.readRawChannels.set_Value_and_Lock(True) # T
from LArConditionsCommon.LArCondFlags import larCondFlags
larCondFlags.useShape.set_Value_and_Lock(True)
# do single Version ?
larCondFlags.SingleVersion.set_Value_and_Lock(True)
larCondFlags.OFCShapeFolder.set_Value_and_Lock("")

from LArROD.LArRODFlags import larRODFlags
larRODFlags.readDigits.set_Value_and_Lock(True)

4) A modified version of the online job options that does not include any other subdetector.

# -- About data format
# -- I believe it should not be our job to set these flags.
# -- They should be retrieved automatically online. Same for BField.
from AthenaCommon.GlobalFlags import globalflags
globalflags.DataSource.set_Value_and_Lock('data')
globalflags.InputFormat.set_Value_and_Lock("bytestream")
globalflags.ConditionsTag.set_Value_and_Lock('COMCOND-MONC-001-00')
globalflags.DetDescrVersion.set_Value_and_Lock('ATLAS-GEO-03-00-00')


# -- Data type
from AthenaCommon.BeamFlags import jobproperties
jobproperties.Beam.beamType = 'cosmics'


# -- Detector flags
from AthenaCommon.DetFlags import DetFlags
DetFlags.all_setOff()  #Switched off to avoid INDET geometry problems
DetFlags.Calo_setOn()  
DetFlags.digitize.all_setOff()

# -- Output flags
from RecExConfig.RecFlags import rec
rec.doESD = True
rec.doAOD = False
rec.doCBNT = False
rec.doJiveXML = False
rec.doWriteESD = False
rec.doWriteAOD = False
rec.doWriteTAG = False

rec.readESD = False
rec.readRDO = True
from AthenaCommon.AthenaCommonFlags import athenaCommonFlags
athenaCommonFlags.BSRDOInput = [
'/castor/cern.ch/grid/atlas/DAQ/2009/00127963/physics_L1Calo/data09_calophys.00127963.physics_L1Calo.daq.RAW._lb0000._SFO-1._0001.data'
 ]
athenaCommonFlags.EvtMax = 500
athenaCommonFlags.SkipEvents = 0


# -- Reco flags
rec.Commissioning = False
rec.doTruth = False
rec.doInDet = False
rec.doMuon = False
rec.doMuonCombined = False # if True, crash in LArMuId from ImpactInCaloAlgInDet
rec.doEgamma = False
rec.doTau = False
rec.doJetMissingETTag = False
rec.doLArg = True
rec.doTile = True

# -- Monitoring flags
rec.doMonitoring = True

from AthenaMonitoring.DQMonFlags import DQMonFlags
DQMonFlags.doGlobalMon = False
DQMonFlags.doEgammaMon = False
DQMonFlags.doMissingEtMon = False
DQMonFlags.doTauMon = False
DQMonFlags.doTileMon = False
DQMonFlags.doJetTagMon = False

# -- Debug flags
rec.doPerfMon = True
rec.doDetailedPerfMon = True
rec.doDumpProperties = True
rec.abortOnUncheckedStatusCode.set_Value_and_Lock(False)
rec.OutputLevel = INFO

################################
# LAR/CALO MONITORING SWITCHES #
################################
if 'CONFIG' not in dir():
    CONFIG = 'LArMon'

print "DEBUG: CONFIG =", CONFIG

if CONFIG=='CaloMon':
    DQMonFlags.doLArMon = False
    DQMonFlags.doCaloMon = True

elif CONFIG=='DSPMon':
    DQMonFlags.doLArMon = True
    DQMonFlags.doCaloMon = False
    rec.doTile = False

    from LArROD.LArRODFlags import larRODFlags
    larRODFlags.doDSP.set_Value_and_Lock(True) # T
    larRODFlags.readRawChannels.set_Value_and_Lock(True) # T
    from LArConditionsCommon.LArCondFlags import larCondFlags
    larCondFlags.useShape.set_Value_and_Lock(True)
 
    # Turn Clusters reco OFF
    from CaloRec.CaloRecFlags import jobproperties
    jobproperties.CaloRecFlags.doCaloTopoCluster.set_Value_and_Lock(False)
    jobproperties.CaloRecFlags.doCaloCluster.set_Value_and_Lock(False)
    jobproperties.CaloRecFlags.doEmCluster.set_Value_and_Lock(False)
    jobproperties.CaloRecFlags.doCaloEMTopoCluster.set_Value_and_Lock(False)
    
else: # default = LArMon
    DQMonFlags.doLArMon = True
    DQMonFlags.doCaloMon = False
    rec.doTile = False

    # Turn Clusters reco OFF
    from CaloRec.CaloRecFlags import jobproperties
    jobproperties.CaloRecFlags.doCaloTopoCluster.set_Value_and_Lock(False)
    jobproperties.CaloRecFlags.doCaloCluster.set_Value_and_Lock(False)
    jobproperties.CaloRecFlags.doEmCluster.set_Value_and_Lock(False)
    jobproperties.CaloRecFlags.doCaloEMTopoCluster.set_Value_and_Lock(False)

# -- Dirty fixes - to be understood 
rec.doDetStatus = False    # Added by Henric as it crashes on access to GLOBAL_OFL 
rec.doHist = False         # Added by J.L. as it crashes on some TrigHist service
rec.doTrigger = False      # Added by J.L. If you turn trigger ON, it's crashing

# -- Some basic reco settings from Walter
include("RecExOnline/SimpleLarCondFlags.py")

# -- To read raw channels from DSP instead of building them from Digits
if CONFIG=='DSPMon': # should be TRUE all the time
    from LArROD.LArRODFlags import larRODFlags
    larRODFlags.readDigits.set_Value_and_Lock(True)
else: # can be switched to True or False
    from LArROD.LArRODFlags import larRODFlags
    # Changed by Fabien for a cosmics run 32 sample
    larRODFlags.readDigits.set_Value_and_Lock(True)
    
# -- Not sure why we still need this one
LArDigitKey = "FREE"

# -- Main jobOpt
include("RecExCommon/RecExCommon_topOptions.py")

# -- private parameters
if CONFIG=='DSPMon':
    from LArMonTools.LArMonToolsConf import LArRODMonTool
    LArRODMonTool.SkipKnownProblematicChannels = False
    LArRODMonTool.SkipNullPed = False
    
# -- printout
print "CHECK POINT PRINTING"
globalflags.print_JobProperties()
  
# -- Over-writes come at the end
MessageSvc = Service("MessageSvc")
MessageSvc.OutputLevel = INFO
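
The CONFIG switch in the job options above selects one of three monitoring setups. As a reading aid, the following standalone Python sketch summarizes what each value ends up with. The helper monitoring_switches and the keys disableClusters and lockDSP are hypothetical names for this summary only; the code does not touch the real DQMonFlags, rec, or larRODFlags objects.

```python
def monitoring_switches(config="LArMon"):
    """Summarize the settings selected by CONFIG in the job options."""
    if config == "CaloMon":
        # Only calorimeter monitoring; rec.doTile keeps its earlier value (True)
        # and the cluster reconstruction flags are left untouched.
        return {"doLArMon": False, "doCaloMon": True, "doTile": True,
                "disableClusters": False, "lockDSP": False}
    # DSPMon and the default (LArMon) share the same monitoring switches.
    switches = {"doLArMon": True, "doCaloMon": False, "doTile": False,
                "disableClusters": True, "lockDSP": False}
    if config == "DSPMon":
        # DSPMon additionally locks doDSP and readRawChannels to True.
        switches["lockDSP"] = True
    return switches
```

Any value other than CaloMon or DSPMon falls through to the LArMon defaults, which matches the else branch of the job options.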


5) Valgrind is a useful tool for checking memory usage and identifying the source of crashes and memory leaks in your software. Here is an example command to run valgrind, where TopJobOption.py is the top job option that you want to run. More information on valgrind can be found at (https://twiki.cern.ch/twiki/bin/view/Atlas/UsingValgrind).

valgrind  --leak-check=yes  --trace-children=yes --num-callers=8 --show-reachable=yes `which athena.py` --stdcmalloc TopJobOption.py >! valgrind_fix1.log 2>&1

Valgrind can also be used to track down memory leaks or other crashes in a specific package. First, modify the requirements file in the cmt directory of the package you want to monitor by adding:

private
macro cppdebugflags '$(cppdebugflags_s)'
macro_remove componentshr_linkopts "-Wl,-s"


Then remake the package:

gmake clean
gmake binclean
cmt config
source setup.sh
gmake


Now you can run your transform job with valgrind from your working directory.

valgrind --tool=memcheck --trace-children=yes --num-callers=8 --leak-check=yes --show-reachable=yes <your job option commands>

An example is shown below:

valgrind --tool=memcheck --trace-children=yes --num-callers=8 --leak-check=yes --show-reachable=yes /afs/cern.ch/atlas/software/builds/nightlies/15.5.X.Y/AtlasTier0/rel_2/InstallArea/share/bin/Reco_trf.py inputBSFile=/afs/cern.ch/user/g/gencomm/w0/problematicEvents/daq.extract.evt1673836run92048._0001.data,/afs/cern.ch/user/g/gencomm/w0/problematicEvents/daq.extract.evt1756246run91890._0001.data,/afs/cern.ch/user/g/gencomm/w0/problematicEvents/daq.extract.evt1756635run91890._0001.data,/afs/cern.ch/user/g/gencomm/w0/problematicEvents/daq.extract.evt3167858run91890._0001.data skipEvents=0 maxEvents=-1 trigStream=IDCosmic beamType=cosmics conditionsTag=COMCOND-ES1C-001-01 geometryVersion=ATLAS-GEO-03-00-00 outputESDFile=myESD.pool.root HIST=myMergedMonitoring.root preExec=rec.abortOnUncheckedStatusCode=False,,rec.doTrigger=False,,rec.doInDet=False,,rec.doMuon=False,,rec.doEgamma=False,,rec.doJetMissingETTag=False,,rec.doMuonCombined=False,,rec.doTau=False,,rec.doMonitoring=True,,rec.doPerfMon=True,,rec.doDetailedPerfMon=True,,DQMonFlags.doGlobalMon=False,,DQMonFlags.doTileMon=False,,DQMonFlags.doCaloMon=False,,DQMonFlags.doMuonCombinedMon=False preInclude=RecExCommon/RecoUsefulFlags.py,RecExCommission/MinimalCommissioningSetup.py,RecJobTransforms/UseOracle.py,RecJobTransforms/debugConfig.py --ignoreunknown --athenaopts=--stdcmalloc


Note: running Valgrind on Reco_trf.py requires a large amount of memory, so make sure your machine has enough resources.
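
Because --trace-children=yes makes Valgrind follow every child process, a single log can contain several leak summaries, one per process ID. The Python sketch below collects the byte counts from lines in memcheck's LEAK SUMMARY layout and groups them by PID. The helper leak_summary is hypothetical, and the sample text is invented for illustration; it is not output from a real job.

```python
import re

# Matches lines such as "==1234==    definitely lost: 1,024 bytes in 2 blocks".
LEAK_LINE = re.compile(
    r"^==(\d+)==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summary(log_text):
    """Return {pid: {category: bytes}} from the LEAK SUMMARY lines of a log."""
    summary = {}
    for line in log_text.splitlines():
        match = LEAK_LINE.match(line)
        if match:
            pid, category, nbytes, _blocks = match.groups()
            per_pid = summary.setdefault(int(pid), {})
            per_pid[category] = int(nbytes.replace(",", ""))
    return summary


# Invented sample text in memcheck's summary layout, for illustration only.
sample = """\
==4242== LEAK SUMMARY:
==4242==    definitely lost: 1,024 bytes in 2 blocks
==4242==      possibly lost: 0 bytes in 0 blocks
==4242==    still reachable: 512 bytes in 8 blocks
"""
result = leak_summary(sample)
```

To use it on a real log, read the file (for example valgrind_fix1.log from the command above) into a string and pass it to leak_summary.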
