Mode Configuration#

Version update (v0.8.0)

In version 0.8.0, all configuration files were uniformly converted to TOML files. The contents of these TOML files are validated using Pydantic.

Version update (v0.6.0)

Since version 0.6.0 you no longer have to register your mode manually; the mode is registered automatically depending on where it is defined. For example, a mode defined under vibe.analysis_validation_modes.physics is automatically associated with DPGroup.physics.
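For illustration, a minimal sketch of such a mode file. The base class name and its create_basf2_path hook are taken from the reference below; the import path is an assumption:

# vibe/analysis_validation_modes/physics/my_example_mode.py
from vibe.modes import ValidationModeBaseClass  # import path assumed

class MyExampleMode(ValidationModeBaseClass):
    # Living under ...analysis_validation_modes.physics, this mode is
    # auto-registered with DPGroup.physics.
    def create_basf2_path(self, **kwargs):
        # Build and return the basf2 path for one dataset; kwargs come from
        # the dataset's `kwargs` entry in the TOML config.
        ...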

To get your mode to run, you have to define a config with the information about the datasets this mode should run on. This is handled via a TOML file with the same name as the Python file in which you defined your mode, placed in the same directory. For example, if you created the mode MyExampleMode under vibe.analysis_validation_modes.physics.my_example_mode.py, the path to your config is vibe.analysis_validation_modes.physics.my_example_mode.toml.
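A minimal sketch of loading and validating such a config by hand, assuming Python ≥ 3.11 for tomllib; the import path of ModeConfigModel (documented below) is an assumption:

import tomllib
from pathlib import Path

from vibe.config_models import ModeConfigModel  # import path assumed

# The config sits next to the mode's Python file (illustrative checkout path).
config_path = Path("vibe/analysis_validation_modes/physics/my_example_mode.toml")

with config_path.open("rb") as f:
    raw_config = tomllib.load(f)

# Raises pydantic.ValidationError on unknown fields or a malformed dataset_dict.
ModeConfigModel(**raw_config)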

The config contains several blocks of information:

1. Information about the datasets to run on: In this block, all datasets the mode will run on have to be defined with unique identifiers. The block is opened by the header [dataset_dict], and each dataset within it is introduced by a sub-table header of the form [dataset_dict.your_identifier]. Each dataset has an lpn, the only required parameter, which can be either a path on the grid or a local path. In the latter case, one has to add the additional argument offline = true. Furthermore, it is possible to define individual kwargs for each dataset that are used when processing it. An example dataset_dict could look like this:

[dataset_dict]
    [dataset_dict.release-08-00-00]
    lpn = "/path/to/lpn/on/grid/mdst"

    [dataset_dict.release-08-01-00]
    lpn = "/path/to/another/lpn/on/grid/mdst"
    kwargs = {"test": 1}

    [dataset_dict.prerelease-09-00-00c]
    lpn = "/path/to/local/files/on/kekcc/mdst"
    offline = true

Tip

Defining local datasets that are run offline is especially useful for testing whether the mode works as intended!

Note

As offline=false is the default, this does not have to be explicitly specified for datasets that are run on the grid with gbasf2.

When running offline on KEKCC or any LSF-compatible server, it is possible to exploit the LSF queue. By setting the additional parameter batch = true for a dataset, each input file contained in the lpn will spawn an LSF job. This is especially useful if one wants to run offline over a large number of input files. An example setting would look like this:

[dataset_dict]
    [dataset_dict.prerelease-09-00-00c]
    lpn = "/path/to/local/files/on/kekcc/mdst/*"
    batch = true

With these settings, an LSF job is spawned for each file in /path/to/local/files/on/kekcc/mdst/.

Note

The batch flag should be considered a sub-flag of offline: when offline=true, VIBE runs locally in a single process, executing jobs sequentially; when batch=true, VIBE still runs locally but spawns multiple processes via the batch system, essentially parallelizing VIBE's workflow.

Warning

You have to make sure that the specified folder contains only the input files you want to run on and no additional files.

2. Additional gbasf2 parameters (optional): It is possible to set the gbasf2 CPU time or pass additional gbasf2 parameters in the [gbasf2_settings] block. For example:

[gbasf2_settings]
    gbasf2_cpu_time = 1000
    gbasf2_additional_params = "--banned_site=LCG.Torino.it,LCG.ULAKBIM.tr"

Note

These settings are applied to all datasets that are run with gbasf2.

Mode Configuration Validator Models#

pydantic model ModeConfigModel[source]#

Bases: BaseModel

This Pydantic model aims to validate all configuration TOMLs of VIBE modes. It is the entry point through which all datasets have their structure validated.

Specifically, each value of the dataset_dict (or skim_dataset_dict) must pass validation by the DatasetDictConfigModel. Note that unknown fields are forbidden in this model, meaning a ValidationError will be raised if an unknown field is encountered.

Note

This is slightly misleading for skim modes. When running a skim production mode in VIBE, the dataset_dict should not be set, as VIBE manages it in the TOML file at runtime. Instead, the skim_dataset_dict should be set.

We utilise the model_validator decorator to perform a before AND after check on the input data.

Before Model Validator

This function is run BEFORE the input data from the TOML is parsed by the Pydantic model validation. In the before model validator, we check whether a skim_dataset_dict was passed, as is the case for skim_production modes. If so, we set the dataset_dict to a default value of {}, as this Pydantic model requires a dataset_dict field to be present for its validation.
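For reference, the before validator as reproduced from the model's docstring:

@model_validator(mode="before")
@classmethod
def check_if_skim_dataset_was_parsed(cls, data: Any) -> Any:
    if isinstance(data, dict):
        if "skim_dataset_dict" in data.keys():
            data['dataset_dict'] = {}
    return data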

After Model Validator

This function is run AFTER the input data has been validated by the Pydantic model and instantiated as a Pydantic model class. This after validator checks that the values of the dataset_dict (or skim_dataset_dict) pass validation by the DatasetDictConfigModel Pydantic model.
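Likewise, the after validator as reproduced from the model's docstring:

@model_validator(mode="after")
def check_dataset_dict_formatted_correctly(self) -> Self:
    for dataset_name, dataset_config in self.dataset_dict.items():
        DatasetDictConfigModel(dataset_name=dataset_name, **dataset_config)

    for skim_dataset_name, skim_dataset_config in self.skim_dataset_dict.items():
        DatasetDictConfigModel(dataset_name=skim_dataset_name, **skim_dataset_config)

    return self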

Show JSON schema
{
   "title": "ModeConfigModel",
   "description": "This Pydantic model aims to validate all configuration TOMLs of VIBE modes.\nThis model is an entry point in which all datasets will have their\nstructure validated.\n\nSpecifically, we must validate that each value of the `dataset_dict` (or `skim_dataset_dict`)\nis validated by the `DatasetDictConfigModel`. Note we forbid any unknown fields in this model,\nmeaning a ValidationError will be raised if an unknown field is encountered.\n\n.. note::\n    This is slightly misleading for skim modes. When running a skim production mode in VIBE\n    the `dataset_dict` should not be set as VIBE will manage this in the TOML file during runtime.\n    Instead, the `skim_dataset_dict`\n\nWe utilise the model_validator decorator to perform a before AND after check on the input data.\n\n**Before Model Validator**\n\n.. code-block python\n\n    @model_validator(mode=\"before\")\n    @classmethod\n    def check_if_skim_dataset_was_parsed(cls, data: Any) -> Any:\n        if isinstance(data, dict):\n            if \"skim_dataset_dict\" in data.keys():\n                data['dataset_dict'] = {}\n        return data\n\nThis function is ran BEFORE the input data from the TOML is parsed into the Pydantic Model validation. In the before model validator, we check that if the `skim_dataset_dict` is parsed for skim_production modes.\nIf parsed, we need to set the `dataset_dict` to a default value of {} as this Pydantic Model requires a\n`dataset_dict` field to be entered for its validation.\n\n**After Model Validator**\n\n.. code-block python\n\n    @model_validator(mode=\"after\")\n    def check_dataset_dict_formatted_correctly(self) -> Self:\n        for dataset_name, dataset_config in self.dataset_dict.items():\n            DatasetDictConfigModel(dataset_name=dataset_name, **dataset_config)\n\n        for skim_dataset_name, skim_dataset_config in self.skim_dataset_dict.items():\n            DatasetDictConfigModel(dataset_name=skim_dataset_name, **skim_dataset_config)\n\n        return self\n\nThis function is ran AFTER the input data has been validated by the Pydantic model and is not instantiated\nas a Pydantic Model class. This after validator is checking that the values of the `dataset_dict` (or `skim_dataset_dict`)\nparse the `DatasetDictConfigModel` Pydantic model.",
   "type": "object",
   "properties": {
      "dataset_dict": {
         "additionalProperties": {
            "additionalProperties": true,
            "type": "object"
         },
         "title": "Dataset Dict",
         "type": "object"
      },
      "skim_dataset_dict": {
         "additionalProperties": {
            "additionalProperties": true,
            "type": "object"
         },
         "default": {},
         "title": "Skim Dataset Dict",
         "type": "object"
      },
      "gbasf2_settings": {
         "additionalProperties": true,
         "default": {},
         "title": "Gbasf2 Settings",
         "type": "object"
      },
      "additional_mode_settings": {
         "additionalProperties": true,
         "default": {},
         "title": "Additional Mode Settings",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "dataset_dict"
   ]
}

Config:
  • extra: str = forbid

Fields:
  • additional_mode_settings (dict[str, Any])
  • dataset_dict (dict[str, dict])
  • gbasf2_settings (dict[str, Any])
  • skim_dataset_dict (dict[str, dict])

Validators:
  • check_dataset_dict_formatted_correctly » all fields
  • check_if_skim_dataset_was_parsed » all fields

field additional_mode_settings: dict[str, Any] = {}#
Validated by:
  • check_dataset_dict_formatted_correctly
  • check_if_skim_dataset_was_parsed

field dataset_dict: dict[str, dict] [Required]#
Validated by:
  • check_dataset_dict_formatted_correctly
  • check_if_skim_dataset_was_parsed

field gbasf2_settings: dict[str, Any] = {}#
Validated by:
  • check_dataset_dict_formatted_correctly
  • check_if_skim_dataset_was_parsed

field skim_dataset_dict: dict[str, dict] = {}#
Validated by:
  • check_dataset_dict_formatted_correctly
  • check_if_skim_dataset_was_parsed

validator check_dataset_dict_formatted_correctly  »  all fields[source]#
validator check_if_skim_dataset_was_parsed  »  all fields[source]#
pydantic model DatasetDictConfigModel[source]#

Bases: BaseModel

This Pydantic model validates the configuration of a given input dataset.

A model_validator (after) is used to check that if batch=True, then offline must also be True. If not, an assertion error is raised.
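A minimal, self-contained sketch of this check; the validator name check_for_batch_true appears in the reference below, but its exact body here is an assumption:

from pydantic import BaseModel, model_validator

class DatasetDictConfigModel(BaseModel):  # other fields elided for brevity
    offline: bool = False
    batch: bool = False

    @model_validator(mode="after")
    def check_for_batch_true(self) -> "DatasetDictConfigModel":
        # Batch submission only makes sense for offline (local) running.
        if self.batch:
            assert self.offline, "batch=true requires offline=true"
        return self

With this sketch, a config setting batch = true without offline = true fails validation.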

Show JSON schema
{
   "title": "DatasetDictConfigModel",
   "description": "This Pydantic model validates the configuration of a given input dataset.\n\nA `model_validator` (after) is used to check that if `batch=True`, then `offline` must also be `True`.\nIf not, an assertion error is raised.",
   "type": "object",
   "properties": {
      "dataset_name": {
         "description": "(Internal use only) Name of the dataset. This field is excluded from parsing.",
         "title": "Dataset Name",
         "type": "string"
      },
      "lpn": {
         "description": "The path to the dataset. This can be a local path, glob pattern, or gbasf2 logical path name (LPN).",
         "pattern": "^\\/(?:[^\\/]+\\/)*[^\\/]+$",
         "title": "Lpn",
         "type": "string"
      },
      "globaltags": {
         "default": [],
         "description": "List of global tags to be set at the start of the basf2 path. Default is an empty list.",
         "items": {
            "type": "string"
         },
         "title": "Globaltags",
         "type": "array"
      },
      "gbasf2_submission_campaign": {
         "default": "",
         "description": "Submission campaign suffix attached to the end of the gbasf2 project name.",
         "title": "Gbasf2 Submission Campaign",
         "type": "string"
      },
      "kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments passed to the `create_basf2_path` method of ValidationModeBaseClass.",
         "title": "Kwargs",
         "type": "object"
      },
      "offline": {
         "default": false,
         "description": "Flag indicating that the dataset must be run offline (i.e., not submitted to the Belle II grid).",
         "title": "Offline",
         "type": "boolean"
      },
      "batch": {
         "default": false,
         "description": "Flag to tell VIBE to submit the dataset to the LSF batch system. Requires `offline=True`.",
         "title": "Batch",
         "type": "boolean"
      }
   },
   "required": [
      "dataset_name",
      "lpn"
   ]
}

Fields:
  • batch (bool)
  • dataset_name (str)
  • gbasf2_submission_campaign (str)
  • globaltags (list[str])
  • kwargs (dict[str, Any])
  • lpn (str)
  • offline (bool)

Validators:
  • check_for_batch_true » all fields
  • declare_lpn_type » all fields
field batch: bool = False#

Flag to tell VIBE to submit the dataset to the LSF batch system. Requires offline=True.

Validated by:
  • check_for_batch_true
  • declare_lpn_type
field dataset_name: str [Required]#

(Internal use only) Name of the dataset. This field is excluded from parsing.

Validated by:
  • check_for_batch_true
  • declare_lpn_type
field gbasf2_submission_campaign: str = ''#

Submission campaign suffix attached to the end of the gbasf2 project name.

Validated by:
  • check_for_batch_true
  • declare_lpn_type
field globaltags: list[str] = []#

List of global tags to be set at the start of the basf2 path. Default is an empty list.

Validated by:
  • check_for_batch_true
  • declare_lpn_type
field kwargs: dict[str, Any] = {}#

Additional keyword arguments passed to the create_basf2_path method of ValidationModeBaseClass.

Validated by:
  • check_for_batch_true
  • declare_lpn_type
field lpn: str [Required]#

The path to the dataset. This can be a local path, glob pattern, or gbasf2 logical path name (LPN).

Constraints:
  • pattern = ^/(?:[^/]+/)*[^/]+$

Validated by:
  • check_for_batch_true
  • declare_lpn_type
field offline: bool = False#

Flag indicating that the dataset must be run offline (i.e., not submitted to the Belle II grid).

Validated by:
  • check_for_batch_true
  • declare_lpn_type
validator check_for_batch_true  »  all fields[source]#
validator declare_lpn_type  »  all fields[source]#
This function dynamically determines which type of LPN we are working with, one of:
  • Offline single rootfile
  • Offline globbed directory
  • Single datablock grid LPN
  • Grid collection
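A sketch of the kind of classification this validator performs; the category names and the glob/sub-block heuristics here are assumptions, not the actual implementation:

import re

def classify_lpn(lpn: str, offline: bool) -> str:
    """Assign one of the four LPN categories listed above (names illustrative)."""
    if offline:
        # Local running: a glob pattern points at a directory of files,
        # otherwise the LPN is a single ROOT file.
        return "offline_globbed_directory" if any(c in lpn for c in "*?[") else "offline_single_rootfile"
    # Grid running: a trailing sub-block (e.g. .../sub00) is taken to mark a
    # single datablock; otherwise the LPN refers to a whole collection.
    return "single_datablock_grid_lpn" if re.search(r"/sub\d+$", lpn) else "grid_collection"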