Creating a Validation Mode in VIBE#
There are a few things to note about any new mode that is added to VIBE:
Indexing by group
When adding a new mode to VIBE, you must place it in the correct directory. All validation modes live in the analysis_validation_modes directory, which contains one subdirectory per group (listing dated Dec 02, 2024):
$ ls vibe/analysis_validation_modes
example physics tracking
Most validation modes will reside in physics. If your working group is distinct from the ones already listed, open an issue in VIBE detailing which working group you need added, and one of the maintainers will assist with adding it.
Naming the python file
The naming convention for the python file is straightforward and can be altered later if needed. All modes should have their names starting with the name of the python class inside the .py file, followed by any additional identifying information. For this example I will place my file in the example group subdirectory and name it like so:
$ touch vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
And just like that we have created the python file that will be our new mode!
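The directory and naming rules above can be sketched as a small path builder. This is only an illustration: `mode_file_path` is a hypothetical helper and not part of VIBE, and the `_validation_mode.py` suffix is simply copied from the example file name above.

```python
from pathlib import PurePosixPath

# Root directory that holds every validation mode, as described above.
MODES_ROOT = PurePosixPath("vibe/analysis_validation_modes")


def mode_file_path(group: str, mode_name: str) -> PurePosixPath:
    """Build the expected file path for a mode (hypothetical helper, not a VIBE API)."""
    return MODES_ROOT / group / f"{mode_name}_validation_mode.py"


print(mode_file_path("example", "docExampleMode"))
```

Calling it with the group `example` and the mode name `docExampleMode` reproduces the path used in the `touch` command above.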
Validation Mode Template#
A ‘mode’ in VIBE is defined within a python class that inherits from ValidationModeBaseClass. The class must define two methods; if either is missing, VIBE will not run. Below is an example mode containing the two required methods:
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
import basf2
from typing import List
from vibe.core.utils.misc import fancy_validation_mode_header
from vibe.core.validation_mode import ValidationModeBaseClass
from vibe.core.helper.histogram_tools import HistVariable, Histogram, HistComponent

__all__ = [
    "DocsExampleMode"
]


@fancy_validation_mode_header
class DocsExampleMode(ValidationModeBaseClass):
    # Unique identifier of this mode within VIBE
    name = 'docExampleMode'

    def create_basf2_path(self) -> basf2.Path:
        path = basf2.Path()
        ...
        return path

    @property
    def analysis_validation_histograms(self) -> List[Histogram]:
        ...
        return [hist1, hist2,...]
The name must be unique among all modes. This unique name is used throughout VIBE to identify the mode that is currently running.
The create_basf2_path method is where you add all your basf2-specific code and build the path that the method returns.
The analysis_validation_histograms property lets a mode create custom histograms by returning a list of Histogram objects defined specifically for that mode.
Note that the class is decorated with fancy_validation_mode_header. This is for documentation purposes and is necessary for every new mode.
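As an illustration of the uniqueness requirement on name, here is a minimal sketch of how a clash between two modes could be detected. It does not show how VIBE itself enforces the rule: `FakeModeBase` is a stand-in for ValidationModeBaseClass so the snippet runs without VIBE, and `duplicated_names` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Type


class FakeModeBase:
    """Stand-in for ValidationModeBaseClass so the sketch runs without VIBE."""
    name = ""


class ModeA(FakeModeBase):
    name = "docExampleMode"


class ModeB(FakeModeBase):
    name = "anotherMode"


class ModeC(FakeModeBase):
    name = "docExampleMode"  # clashes with ModeA


def duplicated_names(modes: Iterable[Type[FakeModeBase]]) -> List[str]:
    """Return every mode name that is used by more than one class."""
    counts = Counter(mode.name for mode in modes)
    return sorted(name for name, n in counts.items() if n > 1)


print(duplicated_names([ModeA, ModeB]))
print(duplicated_names([ModeA, ModeB, ModeC]))
```

The first call finds no clash, while the second reports the name shared by ModeA and ModeC.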
The basf2 path#
create_basf2_path
This function contains the basf2 steering path, which writes out one n-tuple file. This function must return a basf2 path!
Warning
Please do not use the common variablesToNtuple function provided in the modularAnalysis package. Instead use the custom variables_to_validation_ntuple() wrapper function provided by the framework like so:
def create_basf2_path(self) -> basf2.Path:
    path = basf2.Path()
    ...
    # Use the framework wrapper instead of modularAnalysis.variablesToNtuple
    self.variables_to_validation_ntuple(
        decay_str="docExample",
        variables=['list', 'of', 'variables'],
        path=path
    )
    return path
Warning
b2luigi pickles the basf2 path before sending the project to the grid, so you have to make sure that the basf2 path you provide is picklable. Aliases are nevertheless supported. The picklability of the basf2 path is checked by the unit tests.
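To get a feeling for what "picklable" means, the following sketch uses only Python's standard pickle module. It is independent of basf2 and is not the check performed by VIBE's unit tests; `is_picklable` is a hypothetical helper, and the dictionary contents are made-up examples.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if obj can be serialised with pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain data such as cut strings and variable lists pickles without trouble.
print(is_picklable({"cut": "Mbc > 5.27", "variables": ["Mbc", "deltaE"]}))

# A lambda cannot be pickled, so anything that holds one is not picklable either.
print(is_picklable(lambda x: x))
```

The practical consequence is to keep the path built from plain strings, numbers and lists rather than from locally defined functions or lambdas.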
Let us expand our DocsExampleMode template to include a working basf2 path:
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
import basf2
import modularAnalysis as ma
from typing import List
from vibe.core.utils.misc import fancy_validation_mode_header
from vibe.core.validation_mode import ValidationModeBaseClass
from vibe.core.helper.histogram_tools import HistVariable, Histogram, HistComponent

__all__ = [
    "DocsExampleMode"
]


@fancy_validation_mode_header
class DocsExampleMode(ValidationModeBaseClass):
    name = 'docExampleMode'

    def create_basf2_path(self) -> basf2.Path:
        path = basf2.Path()
        ma.fillParticleList(...)
        # Combine the final-state particles into the B0 candidate
        ma.reconstructDecay(
            decayString="B0:sl_rec -> pi-:rec mu+:rec",
            cut="",
            path=path
        )
        # Write the n-tuple through the framework wrapper
        self.variables_to_validation_ntuple(
            decay_str="B0:sl_rec",
            variables=["weMbc(m1,0)", "isSignal",...],
            path=path
        )
        return path

    @property
    def analysis_validation_histograms(self) -> List[Histogram]:
        ...
        return [hist1, hist2,...]
Once the path is defined, the path manipulation can be as complex as necessary, so long as the resultant path can be pickled! Lastly, we write out our ntuples using self.variables_to_validation_ntuple as discussed before. And with that our basf2 path is complete! Note that we don’t have to worry about loading input mdst or defining output paths; VIBE handles all that in the background for you.
The histogram list#
For plotting, a list of Histogram objects has to be provided in your mode class. For example:
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
#--
from vibe.core.helper.histogram_tools import HistVariable, Histogram, HistComponent
#---
    @property
    def analysis_validation_histograms(self) -> List[Histogram]:
        return [
            Histogram(
                name='Mbc',
                title=r"$B \rightarrow \pi \ell \nu$",
                hist_variable=HistVariable(
                    df_label=makeROOTCompatible(variable='weMbc(m1,m0)'),
                    label=r"$M_{bc}$ (cleaned ROE)",
                    unit=r"GeV/$c^2$",  # closing $ added so the math expression is balanced
                    bins=50,
                    scope=(5.24, 5.29)
                ),
                hist_components=[
                    HistComponent(
                        label='All'
                    )
                ],
            )
        ]
The output of this plot is shown on page 11 of docs/validation_framework_tutorial.pdf.
Super neat! All that information was rendered nicely to a histogram as intended. We can go one step further, though: if we want multiple components overlaid on a single histogram, this is also possible in VIBE. We add them to the hist_components argument of Histogram.
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
#--
from vibe.core.helper.histogram_tools import HistVariable, Histogram, HistComponent
#---
    @property
    def analysis_validation_histograms(self) -> List[Histogram]:
        return [
            # Single component: every candidate
            Histogram(
                name='Mbc',
                title=r"$B \rightarrow \pi \ell \nu$",
                hist_variable=HistVariable(
                    df_label=makeROOTCompatible(variable='weMbc(m1,m0)'),
                    label=r"$M_{bc}$ (cleaned ROE)",
                    unit=r"GeV/$c^2$",
                    bins=50,
                    scope=(5.24, 5.29)
                ),
                hist_components=[
                    HistComponent(
                        label='All'
                    )
                ],
            ),
            # Two components overlaid: truth-matched signal and background
            Histogram(
                name='Mbc matched',
                title=r"$B \rightarrow \pi \ell \nu$",
                hist_variable=HistVariable(
                    df_label=makeROOTCompatible(variable='weMbc(m1,m0)'),
                    label=r"$M_{bc}$ (cleaned ROE)",
                    unit=r"GeV/$c^2$",
                    bins=50,
                    scope=(5.24, 5.29)
                ),
                hist_components=[
                    HistComponent(
                        label='Signal',
                        additional_cut_str='isSignalAcceptMissingNeutrino == 1',
                        color='purple'
                    ),
                    HistComponent(
                        label='Background',
                        additional_cut_str='isSignalAcceptMissingNeutrino == 0',
                        color='cyan'
                    ),
                ],
            )
        ]
Again looking to the validation_framework_tutorial.pdf we can see how our components were rendered to the histogram.
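To make the roles of bins, scope and the per-component cut more concrete, here is a plain-Python sketch of the bookkeeping involved. It is a simplified illustration, not VIBE's plotting code: the candidate values are made up, `fill_histogram` and `select` are hypothetical helpers, the short column name isSignal stands in for the truth flag, and excluding the upper edge of the scope is a simplifying assumption.

```python
from typing import Dict, List, Tuple

# Toy candidates with made-up values: Mbc in GeV/c^2 plus a 0/1 truth flag.
CANDIDATES: List[Dict[str, float]] = [
    {"Mbc": 5.2795, "isSignal": 1},
    {"Mbc": 5.2805, "isSignal": 1},
    {"Mbc": 5.2455, "isSignal": 0},
    {"Mbc": 5.2605, "isSignal": 0},
    {"Mbc": 5.3000, "isSignal": 0},  # outside the scope, so never counted
]


def fill_histogram(values: List[float], bins: int, scope: Tuple[float, float]) -> List[int]:
    """Count values in `bins` equal-width bins over `scope` (upper edge excluded)."""
    low, high = scope
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value < high:
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts


def select(rows: List[Dict[str, float]], flag: int) -> List[float]:
    """Stand-in for a cut such as 'isSignal == 1': keep Mbc of the matching rows."""
    return [row["Mbc"] for row in rows if row["isSignal"] == flag]


# One set of counts per component, sharing the same binning.
signal = fill_histogram(select(CANDIDATES, 1), bins=50, scope=(5.24, 5.29))
background = fill_histogram(select(CANDIDATES, 0), bins=50, scope=(5.24, 5.29))
print(sum(signal), sum(background))
```

With bins = 50 and scope = (5.24, 5.29), each bin is about 0.001 GeV/c^2 wide, and each component fills its own set of counts on the same binning before being overlaid.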
Efficiency calculation - Optional#
If a release-by-release efficiency should be calculated for the mode, implement the get_number_of_signal_for_efficiency method in your mode class. This method returns the number of signal events from a pandas DataFrame. The framework automatically calculates the efficiency using the correct number of produced events.
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
#--
import pandas as pd
#---
    def get_number_of_signal_for_efficiency(self, df: pd.DataFrame) -> float:
        # Sum of the 0/1 truth-matching flag gives the number of signal candidates
        return df['isSignalAcceptMissingNeutrino'].sum()
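As a rough picture of what the framework does with the returned number, the sketch below divides a signal count by a number of produced events. The exact formula VIBE uses and the way it obtains the number of produced events are not described here, so both the `efficiency` helper and the value 1000 are assumptions made purely for illustration.

```python
from typing import List


def efficiency(n_signal: float, n_produced: int) -> float:
    """Fraction of produced events that end up as selected signal (assumed definition)."""
    if n_produced <= 0:
        raise ValueError("n_produced must be positive")
    return n_signal / n_produced


# Toy 0/1 truth flags, as a column like 'isSignalAcceptMissingNeutrino' might hold.
flags: List[int] = [1, 0, 1, 1, 0, 0, 1, 0]

# Summing a 0/1 column counts the signal candidates, like df[column].sum().
n_signal = sum(flags)

# 1000 produced events is a made-up number for this example.
print(efficiency(n_signal, n_produced=1000))
```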
Offline Cuts - Optional#
If you want to manipulate the pandas DataFrame offline (e.g. apply cuts, add columns, etc.), you can do this by implementing the offline_df_manipulation method:
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
#--
import pandas as pd
#---
    def offline_df_manipulation(self, df: pd.DataFrame) -> pd.DataFrame:
        # Random best candidate selection (BCS) offline:
        # shuffle the rows, then keep the first candidate of each event
        df = df.sample(frac=1.0).groupby(by=["__event__"]).head(1)
        return df
Warning
Whatever is applied here is applied to any plot or efficiency calculation.
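To see what the shuffle-then-head idiom used in offline_df_manipulation does, here is a small self-contained sketch on a toy DataFrame. The `candidate` column, the `random_best_candidate` helper and the fixed `random_state` are illustrative assumptions added so the example is reproducible; only the `__event__` column comes from the snippet above.

```python
import pandas as pd


def random_best_candidate(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Keep one randomly chosen candidate per event."""
    # Shuffle all rows, then keep the first row that appears for each event.
    shuffled = df.sample(frac=1.0, random_state=seed)
    return shuffled.groupby(by=["__event__"]).head(1)


# Toy input: event 1 has two candidates, event 2 has one, event 3 has three.
toy = pd.DataFrame({
    "__event__": [1, 1, 2, 3, 3, 3],
    "candidate": [0, 1, 0, 0, 1, 2],
})

selected = random_best_candidate(toy)
print(selected.sort_values("__event__"))
```

Whichever candidates are drawn, the result holds exactly one row per event, which is why every plot and efficiency calculation downstream sees the reduced sample.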
In the workflow, the ROOT files in the merged output directory are opened and the data manipulation is applied. The standard output of this data manipulation is a parquet file (https://parquet.apache.org/) saved in the offline_processing directory. This parquet file is used for the plotting section of VIBE. If you need the manipulated data saved in another output format as well, for validation purposes outside of VIBE, use the @add_output_formats decorator like so:
# vibe/analysis_validation_modes/example/docExampleMode_validation_mode.py
import basf2
import modularAnalysis as ma
import pandas as pd
from typing import List
from vibe.core.utils.misc import fancy_validation_mode_header
from vibe.core.validation_mode import ValidationModeBaseClass, add_output_formats
from vibe.core.helper.histogram_tools import HistVariable, Histogram, HistComponent

__all__ = [
    "DocsExampleMode"
]


@fancy_validation_mode_header
class DocsExampleMode(ValidationModeBaseClass):
    name = 'docExampleMode'

    # -- other methods of our VIBE mode

    @add_output_formats('root')
    def offline_df_manipulation(self, df: pd.DataFrame) -> pd.DataFrame:
        # Random best candidate selection (BCS) offline
        df = df.sample(frac=1.0).groupby(by=["__event__"]).head(1)
        return df
In this example VIBE will also run a task (in parallel to the parquet task) that creates a root file with the changes made in offline_df_manipulation.
Note
There is no need to specify parquet in the decorator, as the parquet file is always created by default.
Available Output Formats
parquet
root
If there is another format you wish to have your output data in, open an issue and we can add it to VIBE so that it can be toggled using the decorator.
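For readers unfamiliar with decorators that take arguments, the following sketch shows one way such a decorator could record the requested formats on a function. It is not VIBE's implementation of add_output_formats: `record_output_formats` and the `output_formats` attribute are hypothetical names invented for this example.

```python
from typing import Callable


def record_output_formats(*formats: str) -> Callable:
    """Hypothetical stand-in for a format-registering decorator; not VIBE's code."""
    def decorator(func: Callable) -> Callable:
        # 'parquet' is always present; extra formats follow without duplicates.
        extra = tuple(f for f in dict.fromkeys(formats) if f != "parquet")
        func.output_formats = ("parquet",) + extra
        return func
    return decorator


@record_output_formats("root")
def offline_df_manipulation(df):
    # The manipulation itself is unchanged by the decorator.
    return df


print(offline_df_manipulation.output_formats)
```

In this sketch the decorator only annotates the function; a workflow could then read the attribute to decide which extra output tasks to schedule.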
And there we have it, our basic validation mode in VIBE along with all the bells and whistles you can add to it.