HighLAND
Implementing your own analysis package - a tutorial

WORK IN PROGRESS !!! For the moment have a look at the tutorial package highland2/tutorialAnalysis

Here we explain step by step the tutorialAnalysis package, which can be used to understand most elements of the framework.

This example contains two analysis algorithms; the one described here is called tutorialAnalysis.

Each analysis algorithm ultimately inherits from the class AnalysisAlgorithm, which has many methods, some of them mandatory.

Creating the analysis package

To create your own analysis, the first thing you should do is create a cmt package. After going into the folder where highland2 and psyche live, just type

cmt create highland2/tutorialAnalysis v0r0

This will create a folder tutorialAnalysis inside highland2, with a subfolder v0r0. Inside it, two folders will appear: cmt and src. The file with instructions on how to compile and link your code is the requirements file, under the cmt folder. This file is initially almost empty, and we should fill it with the appropriate instructions. You can have a look at the requirements file under tutorialAnalysis:

package highland2/tutorialAnalysis
version v2r10
manager Anselmo Cervera <anselmo.cervera@cern.ch>
author nd280-software@mailman.t2k.org
branches cmt src doc app
# Indicate the packages that are used by this one
use baseTrackerAnalysis * highland2
# Build methods to include.
document doxygen doxygen -group=documentation src/*.cpp src/*.h src/*.hxx src/*.cxx ../doc/*.dox
# Build information used by packages that use this one.
macro tutorialAnalysis_cppflags " -DTUTORIALANALYSIS_USED"
macro tutorialAnalysis_linkopts " -L$(TUTORIALANALYSISROOT)/$(tutorialAnalysis_tag) -ltutorialAnalysis "
macro tutorialAnalysis_stamps " $(TUTORIALANALYSISROOT)/$(tutorialAnalysis_tag)/tutorialAnalysis.stamp"
# The paths to find this library.
path_remove PATH "$(TUTORIALANALYSISROOT)"
path_prepend PATH "$(TUTORIALANALYSISROOT)/$(tutorialAnalysis_tag)"
path_remove LD_LIBRARY_PATH "$(TUTORIALANALYSISROOT)"
path_prepend LD_LIBRARY_PATH "$(TUTORIALANALYSISROOT)/$(tutorialAnalysis_tag)"
path_remove DYLD_LIBRARY_PATH "" \
Darwin "$(TUTORIALANALYSISROOT)"
path_prepend DYLD_LIBRARY_PATH "" \
Darwin "$(TUTORIALANALYSISROOT)/$(tutorialAnalysis_tag)"
# The library to build
library tutorialAnalysis *.cxx ../dict/*.cxx
# The executables to build (.exe files)
application RunTutorialAnalysis ../app/RunTutorialAnalysis*.cxx
application RunUseTutorialAnalysis ../app/RunUseTutorialAnalysis*.cxx
# tests
document doxygen doxygen -group=documentation ../scripts/* ../doc/*.dox

Once we have the proper requirements file, we go inside the cmt directory and do:

cmt config
make

Notice that cmt config has to be run only the first time, to create the Makefile. At this point nothing will be compiled, because there are no source code files under the src folder yet.

Start filling the algorithm

The first files we should put there are tutorialAnalysis.hxx and tutorialAnalysis.cxx, which should contain the declaration and definition of the class tutorialAnalysis and its methods. This class should inherit from AnalysisAlgorithm or from another analysis algorithm (inheriting ultimately from AnalysisAlgorithm).

AnalysisAlgorithm has some pure virtual methods (=0) that must be implemented in the derived analysis.

virtual bool Initialize() = 0;
virtual void DefineProductions() = 0;
virtual void DefineInputConverters() = 0;
virtual void DefineSelections() = 0;
virtual void DefineCorrections() = 0;
virtual void DefineSystematics() = 0;
virtual void DefineConfigurations() = 0;
virtual void DefineMicroTrees(bool addBase=true) = 0;
virtual void DefineTruthTree() = 0;
virtual void FillMicroTrees(bool addBase=true) = 0;
virtual void FillToyVarsInMicroTrees(bool addBase=true) = 0;
virtual void FillTruthTree() = 0;

But don't worry, as we will see below, most of them are already implemented in the baseAnalysis class.
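To make this concrete, a minimal class declaration could look as follows. This is only a sketch: the actual tutorialAnalysis.hxx in the package declares a few more methods, and it inherits from baseTrackerAnalysis (which derives from AnalysisAlgorithm via baseAnalysis and already implements most of the pure virtual methods above).

#ifndef tutorialAnalysis_h
#define tutorialAnalysis_h

#include "baseTrackerAnalysis.hxx"

class tutorialAnalysis: public baseTrackerAnalysis {
public:
  tutorialAnalysis(AnalysisAlgorithm* ana = NULL);
  virtual ~tutorialAnalysis(){}

  // Mandatory methods not already provided by the base classes
  bool Initialize();
  void DefineSelections();
  void DefineCorrections();
  void DefineSystematics();
  void DefineConfigurations();
  void DefineMicroTrees(bool addBase = true);
  void DefineTruthTree();
  void FillMicroTrees(bool addBase = true);
  void FillToyVarsInMicroTrees(bool addBase = true);
  void FillTruthTree();
};

#endif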

Other methods are optional; they allow fine-tuning of the analysis or handling of more complicated cases:

virtual void Finalize(){}
virtual bool InitializeSpill(){return true;}
virtual void FinalizeSpill(){}
virtual void InitializeBunch(){}
virtual void FinalizeBunch(){}
virtual void InitializeConfiguration(){}
virtual bool FinalizeConfiguration(){return true;}
virtual void InitializeToy(){}
virtual void FinalizeToy(){}
virtual void InitializeSelection(const SelectionBase&){}
virtual void FinalizeSelection(const SelectionBase&){}
virtual void FillCategories(){}
virtual void FillConfigTree(){}

From the list of mandatory methods above, we now go through the most relevant ones one by one.

Including an event selection

baseAnalysis does not define any selection. Selections should be added to the selection manager in the method DefineSelections:

//********************************************************************
void tutorialAnalysis::DefineSelections(){
//********************************************************************
/* In this method the user adds to the SelectionManager (accessed via sel()) the selections to be run,
defined in other files. In general an analysis has a single selection, which could have multiple branches.
Each of the branches is in fact a different selection (different cuts and actions), but by defining them as branches
we usually gain in speed and simplicity, since the steps that are common are applied only once.
Sometimes the user does not want to spend time merging two selections into a single one (as branches), but prefers to run
both independently. This is in general done for quick checks (are the selections mutually exclusive?, etc.).
This is possible in highland2. An example of how to do that is shown below.
In this case we add two selections to the SelectionManager, providing:
- Name of the selection (kTrackerTutorial, kTrackerTutorialBranches)
- Title, the one dumped by the DrawingTools::DumpSelections() method. It is an explanation of the selection
- Pointer to the selection. The argument in the constructor (false) indicates that the
step sequence is not broken when a cut is not passed. In this way we can save intermediate information for events
not passing the entire selection
*/
// Add a simple selection with no branches
sel().AddSelection("kTrackerTutorial", "tutorial selection", new tutorialSelection(false));
// Add a more complicated selection with branches
sel().AddSelection("kTrackerTutorialBranches", "tutorial selection with branches", new tutorialBranchesSelection(false));
// Disable the ones not enabled in parameters file
if (!ND::params().GetParameterI("tutorialAnalysis.Selections.RunSelectionWithoutBranches"))
sel().DisableSelection("kTrackerTutorial");
if (!ND::params().GetParameterI("tutorialAnalysis.Selections.RunSelectionWithBranches"))
sel().DisableSelection("kTrackerTutorialBranches");
}
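The selection classes themselves (tutorialSelection, tutorialBranchesSelection) live in their own source files and inherit from SelectionBase, whose DefineSteps method declares the sequence of actions and cuts. A minimal sketch of the pattern, assuming the EventQualityCut and TotalMultiplicityCut step classes from psyche; the real tutorialSelection.cxx in the package defines a longer sequence:

//********************************************************************
void tutorialSelection::DefineSteps(){
//********************************************************************

  // Steps are applied in order. For cuts, the last argument tells whether
  // the step sequence is broken when the cut is not passed
  AddStep(StepBase::kCut, "event quality", new EventQualityCut(),      true);
  AddStep(StepBase::kCut, "> 0 tracks",    new TotalMultiplicityCut(), true);

  // A selection must define at least one branch, even a trivial one
  SetBranchAlias(0, "trunk");
}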

We can now compile and run the executable, providing input and output files:

../$TUTORIALANALYSISCONFIG/RunTutorialAnalysis.exe -o output.root input.root
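The executable itself comes from a small main program under the app folder (RunTutorialAnalysis.cxx, built by the requirements file above). A minimal sketch, following the usual highland2 pattern with the AnalysisLoop class from highlandCore:

#include "tutorialAnalysis.hxx"
#include "AnalysisLoop.hxx"

int main(int argc, char* argv[]){
  // Create the analysis algorithm and let the loop parse the command line
  // (-o output file, number of events, etc.) and run over the input file
  tutorialAnalysis* ana = new tutorialAnalysis();
  AnalysisLoop loop(ana, argc, argv);
  loop.Execute();
  return 0;
}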

The output file contains a few ROOT trees:

tree      description
----      -----------
default   nominal selection
truth     all true signal events
header    POT information
config    configuration of the analysis

The default tree will not contain much information yet, only the variables added automatically by the framework (run and event numbers, the accum_level reached by each event in the selection, etc.).
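A quick way to check what is there is a plain ROOT session (standard ROOT, nothing highland2-specific):

root -l output.root
root [1] TTree* tree = (TTree*)_file0->Get("default");
root [2] tree->Print();              // list the variables stored in the default tree
root [3] tree->Draw("accum_level");  // plot how far each event got in the selection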

Adding variables to the micro-tree

The next step is to add variables to the micro-tree such that we can understand the selection. This is done in the DefineMicroTrees method.

//********************************************************************
void tutorialAnalysis::DefineMicroTrees(bool addBase){
//********************************************************************
/* "Micro-trees" is the name we give to the standard analysis trees appearing in the output file.
There is always a micro-tree called "default", which should contain the basic info to understand our selection.
The user can add extra micro-trees by adding configurations to the analysis (see the DefineConfigurations method below).
Here we give an example of the different variables that can be added. Have a look at OutputManager.hxx under highlandCore
to see all available methods.
*/
// -------- Add variables to the analysis tree ----------------------
// Variables from baseTrackerAnalysis (run, event, ..., tracker related stuff, ...)
if (addBase) baseTrackerAnalysis::DefineMicroTrees(addBase);
// --- Single variables -------
AddVarD( output(), selmu_costheta, "muon candidate reconstructed cos(theta) w.r.t. the neutrino direction"); // Double
AddVarF( output(), selmu_truemom, "muon candidate true momentum"); // Float
AddVarI( output(), selmu_detectors, "muon candidate detectors"); // Integer
AddVarC( output(), selmu_sense, "muon candidate sense"); // Char
AddVarF( output(), selmu_deltaPhi, "angle between muon candidate and the proton candidate (HMP track)"); // Float
AddVarF( output(), selmu_pT, "muon candidate reconstructed transverse momentum w.r.t. the neutrino direction"); // Float
AddVarI( output(), nLongTPCTracks, "number of long TPC tracks"); // Integer
// --- Vector variables -------
// selmu_ntpcs is the counter, which is added automatically as an integer variable
AddVarVI(output(), selmu_tpc_det, "muon candidate TPC number", selmu_ntpcs);
AddVarVI(output(), selmu_tpc_nnodes, "muon candidate #nodes in each TPC", selmu_ntpcs); // Vector of integers
AddVarVF(output(), selmu_tpc_mom, "muon candidate reconstructed momentum in each TPC", selmu_ntpcs); // Vector of floats
AddVarVD(output(), selmu_tpc_truemom, "muon candidate true momentum in each TPC", selmu_ntpcs); // Vector of doubles
// --- 3D and 4D vector variables (i.e. directions and positions) -------
AddVar4VF(output(), selmu_pos, "muon candidate reconstructed position"); // 4-Vector of floats
AddVar3VF(output(), selmu_dir, "muon candidate reconstructed direction"); // 3-Vector of floats
// --- Matrix variables -------
// AddVarMF(output(), selmu_tpc, "",selmu_necals,-30,4);
// --- 3D and 4D matrix variables (i.e. directions and positions for constituents) -------
AddVar4MF(output(), selmu_tpc_pos, "muon candidate true position in each tpc", selmu_ntpcs);
AddVar3MF(output(), selmu_tpc_dir, "muon candidate true direction in each tpc", selmu_ntpcs);
// Now we add toy-dependent variables.
// There could be many variables that are toy-dependent. We don't need to save all of them as toy variables,
// but only the ones we are interested in plotting for different toys.
// TOY VARIABLES ARE VERY SPACE CONSUMING, SO WE SHOULD MINIMISE THEIR NUMBER !!!!
// --- single toy variables -------
AddToyVarF(output(), selmu_mom, "muon candidate reconstructed momentum");
// --- vector toy variables -------
AddToyVarVF(output(), selmu_tpc_dedx, "muon candidate dEdx (CT) in each TPC", NMAXTPCS);
}
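The variables declared here are filled event by event in FillMicroTrees, and the toy variables in FillToyVarsInMicroTrees (called once per toy experiment). A minimal sketch of the pattern, assuming the muon candidate was stored as MainTrack in the selection box; see tutorialAnalysis.cxx in the package for the complete version:

//********************************************************************
void tutorialAnalysis::FillMicroTrees(bool addBase){
//********************************************************************

  // Variables from the base package
  if (addBase) baseTrackerAnalysis::FillMicroTreesBase(addBase);

  // Fill variables for the muon candidate found by the selection
  if (box().MainTrack){
    output().FillVectorVarFromArray(selmu_pos, box().MainTrack->PositionStart, 4);
    if (box().MainTrack->TrueObject)
      output().FillVar(selmu_truemom, box().MainTrack->GetTrueParticle()->Momentum);
  }
}

//********************************************************************
void tutorialAnalysis::FillToyVarsInMicroTrees(bool addBase){
//********************************************************************

  if (addBase) baseTrackerAnalysis::FillToyVarsInMicroTreesBase(addBase);

  // The reconstructed momentum is toy-dependent, since variation systematics can modify it
  if (box().MainTrack)
    output().FillToyVar(selmu_mom, box().MainTrack->Momentum);
}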

Adding systematics

//********************************************************************
void tutorialAnalysis::DefineSystematics(){
//********************************************************************
/* Systematics will modify the effective number of events passing the selection and the distribution of the observables.
After corrections have been applied, it is possible to
"manipulate" the information again using systematics. In this case the purpose is different: we want to test several values of
a given detector property and check the impact on the number of selected events. This allows propagating systematic errors numerically.
There are two kinds of systematics:
- Variations: modify the input data (momentum, PID variables, etc.). The selection is run on the modified data, such that
the result of the selection can be different
- Weights: do not modify the input data, so the selection is not affected. Simply a weight is added to the event.
Since events with different values of a given observable (i.e. momentum) can have different weights,
distributions of that observable may change.
In the example about the deposited energy (in DefineCorrections()), the correction introduced cannot be perfectly known.
The 4% and 6% mentioned have an error (i.e. 0.5%).
This error should be propagated as a systematic. A given number of toy experiments will be run with different values of the scaling parameter
for the deposited energy (i.e. for muons 0.93, 0.935, 0.95, ..., following
a Gaussian distribution with mean 0.94 and sigma 0.005). If a cut on the deposited energy (or a variable using it)
is applied, the number of selected events could differ depending on the scaling applied.
The RMS of the number of selected events over all toy experiments represents the systematic error induced by the deposited energy scaling.
*/
// Some systematics are defined in baseTrackerAnalysis (have a look at baseTrackerAnalysis/vXrY/src/baseTrackerAnalysis.cxx)
// ----- We can add here extra systematics ----------
// An example of variation systematic, which is added to the EventVariationManager
evar().AddEventVariation(kTutorialVariation, "TutorialVarSyst", new tutorialVariationSystematics());
// An example of weight systematic, which is added to the EventWeightManager
eweight().AddEventWeight( kTutorialWeight, "TutorialWeightSyst", new tutorialWeightSystematics());
}
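The systematics classes themselves inherit from EventVariationBase (variations) or EventWeightBase (weights), both from psyche. As an illustration of the weight case, below is a sketch of the usual ComputeWeight pattern. The sigma value is purely illustrative, and the exact method signature may differ between psyche versions, so check tutorialWeightSystematics.cxx in the package:

//********************************************************************
Weight_h tutorialWeightSystematics::ComputeWeight(const ToyExperiment& toy, const AnaEventC& event, const ToyBoxB& box){
//********************************************************************

  Weight_h eventWeight = 1.;

  // Scale the weight by 1 + sigma * (Gaussian throw for this toy).
  // sigma = 5% is an illustrative value; the real class reads its parameters from a data file
  Float_t sigma = 0.05;
  eventWeight.Systematic = 1. + sigma * toy.GetToyVariations(_index)->Variations[0];
  eventWeight.Correction = 1.;

  return eventWeight;
}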

Adding configurations

//********************************************************************
void tutorialAnalysis::DefineConfigurations(){
//********************************************************************
/*
The user can run several analyses in parallel, minimising the access to disk.
Those parallel analyses are called configurations. Although this might be extended in the future, currently
configurations only allow you to specify which systematic errors will be propagated, the number of toy experiments
and the random seed.
A tree for each configuration is produced in the output file, with the same name as the configuration.
By default a single configuration (called "default") is run, producing
a single tree (with name default). This tree does not have any systematics enabled, and hence it represents the nominal selection.
*/
// Some configurations are defined in baseTrackerAnalysis (have a look at baseTrackerAnalysis/vXrY/src/baseTrackerAnalysis.cxx)
Int_t ntoys = ND::params().GetParameterI("baseAnalysis.Systematics.NumberOfToys");
Int_t randomSeed = ND::params().GetParameterI("baseAnalysis.Systematics.RandomSeed");
// Add a new configuration called "tutorial_syst" with the following properties:
// - the number of toys defined in the baseAnalysis parameters file
// - the random seed for toy experiments defined in the baseAnalysis parameters file
// - The ToyMaker used to create the toy experiments (random throws for each systematic parameter).
// Have a look at baseToyMaker in baseAnalysis/vXrY/src
// - This configuration has two systematics kTutorialWeight and kTutorialVariation, defined above
AddConfiguration(conf(), tutorial_syst, ntoys, randomSeed, new baseToyMaker(randomSeed));
conf().EnableSystematic(kTutorialWeight, tutorial_syst); // Enable the kTutorialWeight in the tutorial_syst configuration
conf().EnableSystematic(kTutorialVariation, tutorial_syst); // Enable the kTutorialVariation in the tutorial_syst configuration
// Disable the configuration when requested
if (!ND::params().GetParameterI("tutorialAnalysis.Systematics.EnableTutorialSystematics"))
conf().DisableConfiguration(tutorial_syst);
}
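The switches read with ND::params() above live in the package's parameters file (tutorialAnalysis.parameters.dat under the parameters folder), using the standard format of ND280 parameter files:

< tutorialAnalysis.Selections.RunSelectionWithoutBranches = 1 >
< tutorialAnalysis.Selections.RunSelectionWithBranches = 1 >
< tutorialAnalysis.Systematics.EnableTutorialSystematics = 1 >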

Computing efficiencies: the truth tree

The main trees in the output file are the "default" and "truth" trees. The "default" tree contains all selected events (the ones that passed a given cut, usually specified in the parameters file). This is the tree used to optimise the selection cuts, to make selection plots and to compute or plot the purity of the selection. The main reason for having a separate tree, called "truth", is to allow the efficiency computation. The "default" tree does not contain all true signal events, because it would be too big: the preselection for that tree is based only on reconstructed quantities (selection cuts). Instead, to compute true signal efficiencies we need all true signal events to be present in the output tree. The "truth" tree fulfils this condition. Since, in general, this tree will contain many events, only a reduced set of variables (the ones needed to compute the efficiency) is stored in it.

SUMMARY: The "truth" tree also appears in the output file. It contains all interactions we are interested in, regardless of whether the selection was passed or not. This is the tree that should be used to compute signal efficiencies.

//********************************************************************
void tutorialAnalysis::DefineTruthTree(){
//********************************************************************
// Variables from baseTrackerAnalysis (run, event, ...)
baseTrackerAnalysis::DefineTruthTree();
//--- muon variables -------
AddVarI( output(), truelepton_pdg, "true lepton PDG");
AddVar4VF(output(), truelepton_pos, "true lepton position");
// Moved to baseAnalysis !!!!!!!
// AddVarF( output(), truelepton_mom, "true lepton momentum");
// AddVarF( output(), truelepton_costheta, "true lepton cos(theta) w.r.t. neutrino direction");
// AddVar3VF(output(), truelepton_dir, "true lepton direction");
}
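Two counterpart methods complete the picture: one decides which true interactions enter the truth tree, the other fills its variables. A minimal sketch, using the method names from baseAnalysis and a purely illustrative signal definition (true muon neutrino interactions in the FGD1 fiducial volume); the actual signal definition is in tutorialAnalysis.cxx:

//********************************************************************
bool tutorialAnalysis::CheckFillTruthTree(const AnaTrueVertex& vtx){
//********************************************************************

  // Save only the true interactions we want the efficiency for (illustrative definition)
  return (vtx.NuPDG == 14 && anaUtils::InFiducialVolume(SubDetId::kFGD1, vtx.Position));
}

//********************************************************************
void tutorialAnalysis::FillTruthTree(const AnaTrueVertex& vtx){
//********************************************************************

  // Variables from the base package (true neutrino info, etc.)
  baseTrackerAnalysis::FillTruthTreeBase(vtx);

  // Fill the variables defined in DefineTruthTree above
  output().FillVar(truelepton_pdg, vtx.LeptonPDG);
  output().FillVectorVarFromArray(truelepton_pos, vtx.Position, 4);
}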
