Variable registry

The variable registry is a convenient way to store and retrieve the plotting information of variables. The plotting information of each variable is stored in a yaml file and can then be read or modified using dedicated functions. Each variable is identified by a variable_key, which is used to look up its information in the registry.

Create the registry

To create a variable registry, you only need to provide the list of variable keys you want to store information for. The function create_variable_registry() will automatically create a variable_registry.yaml file with default information for each variable:

from plothist import create_variable_registry

variable_keys = ["variable_0", "variable_1", "variable_2"]

create_variable_registry(variable_keys)

For each variable in the list, the following information is stored by default in the variable_registry.yaml file:

variable_0:
    name: variable_0
    bins: auto
    range:
    - min
    - max
    label: variable_0
    log: false
    legend_location: best
    legend_ncols: 1
    docstring: ''


variable_1:
    ...

Since the structure is built automatically, it is then easy to modify the plotting information by hand inside the yaml file.

To add new variables to an already existing variable_registry.yaml file, you only need to add the new variable keys to the variable_keys list and call create_variable_registry() again. By default, the information on the variables already present in the registry is not overwritten. The hand-written modifications are kept, unless the reset parameter is set to True.
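The non-destructive behaviour described above can be pictured with plain dictionaries. The sketch below is not plothist's implementation: default_entry and merge_keys are hypothetical helpers, the registry is held in memory rather than read from the yaml file, and the assumption that reset only affects the listed keys is ours.

```python
def default_entry(variable_key):
    # Default plotting information, mirroring the yaml shown above
    return {
        "name": variable_key,
        "bins": "auto",
        "range": ["min", "max"],
        "label": variable_key,
        "log": False,
        "legend_location": "best",
        "legend_ncols": 1,
        "docstring": "",
    }


def merge_keys(registry, variable_keys, reset=False):
    # Hypothetical helper: add missing keys, keep existing entries unless reset
    merged = dict(registry)
    for key in variable_keys:
        if reset or key not in merged:
            merged[key] = default_entry(key)
    return merged


registry = merge_keys({}, ["variable_0", "variable_1"])
registry["variable_0"]["label"] = "Hand-written label"

# Adding variable_2 keeps the hand-written label of variable_0
registry = merge_keys(registry, ["variable_0", "variable_1", "variable_2"])
print(registry["variable_0"]["label"])

# With reset=True the default label is restored
fresh = merge_keys(registry, ["variable_0", "variable_1", "variable_2"], reset=True)
print(fresh["variable_0"]["label"])
```

The first print shows the hand-written label, the second the default one.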

Getting the plotting information

To get the plotting information of a variable, you can use the get_variable_from_registry() function, which returns a dictionary with the plotting information:

from plothist import get_variable_from_registry

variable = get_variable_from_registry("variable_0")

print(variable)
# {'name': 'variable_0', 'bins': 'auto', 'range': ['min', 'max'], 'label': 'variable_0', 'log': False, 'legend_location': 'best', 'legend_ncols': 1, 'docstring': ''}

Update the registry

Several functions are available to modify the plotting information of the variables in the registry and to add or remove parameters.

Binning and ranges

The update_variable_registry_binning() function automatically updates two parameters in the yaml file. The bins parameter is set to the length of [numpy.histogram_bin_edges](https://numpy.org/doc/2.1/reference/generated/numpy.histogram_bin_edges.html#numpy-histogram-bin-edges) minus one, that is, the number of bins (the bins are regular). The range parameter is set to the min and max values of the variable in the dataset:

from plothist import update_variable_registry_binning

update_variable_registry_binning(df, variable_keys)

The number of bins and the range have been updated for all the variables in variable_keys. The yaml file is now:

variable_0:
    name: variable_0
    bins: 121 # = len(numpy.histogram_bin_edges(df["variable_0"], bins="auto")) - 1
    range:
    - -10.55227774892869    # min(df["variable_0"])
    - 10.04658448558009     # max(df["variable_0"])
    label: variable_0
    log: false
    legend_location: best
    legend_ncols: 1
    docstring: ''


variable_1:
    ...

Then, you may manually modify the yaml to get a more suitable binning and range to display in the plot.

Calling this function again on the same variable keys will not overwrite their bins or range parameter, unless the overwrite parameter is set to True.
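For regular bins, the two stored quantities are linked in a simple way: N bins over [min, max] correspond to N + 1 equally spaced edges. The pure-Python sketch below illustrates this with a hypothetical regular_edges helper and made-up data. It does not reproduce numpy's "auto" estimator, so the number of bins is passed in by hand.

```python
def regular_edges(values, n_bins):
    # Hypothetical helper: n_bins regular bins spanning min(values)..max(values)
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins)]
    edges.append(high)  # append the upper edge directly to avoid rounding drift
    return edges


values = [-2.0, -0.5, 0.1, 0.3, 1.2, 4.0]  # made-up data
edges = regular_edges(values, n_bins=4)

# What would be stored in the registry for this variable
entry_update = {"bins": len(edges) - 1, "range": [edges[0], edges[-1]]}
print(entry_update)
```

The stored number of bins is the number of edges minus one, and the stored range is given by the first and last edge.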

Add or modify variable properties

You can also add new plotting properties or modify the existing ones by using the update_variable_registry() function and a custom dictionary:

from plothist import update_variable_registry

new_properties = {
    "text": "default_text",
    "more_info": None,
    "new_property": False,
    "custom_list": [1, "a", True],
    "custom_value": 0,
}

update_variable_registry(new_properties, variable_keys)

This will add the new properties to the yaml file for all the variables in variable_keys:

variable_0:
    name: variable_0
    bins: 121
    range:
    - -10.55227774892869
    - 10.04658448558009
    label: variable_0
    log: false
    legend_location: best
    legend_ncols: 1
    docstring: ''
    text: default_text
    more_info: null         # None is converted to null in yaml
    new_property: false     # False is converted to false in yaml
    custom_list:            # The list is displayed on multiple lines
    - 1
    - a
    - true                  # True is converted to true in yaml
    custom_value: 0


variable_1:
    ...

The same get_variable_from_registry() function can be used to get the new properties.

To modify existing properties, call update_variable_registry() with the new properties and the overwrite parameter set to True. The existing property values are then replaced by the new ones.
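The difference between adding and overwriting can be sketched with a plain dictionary. apply_properties below is a hypothetical helper standing in for what happens to a single registry entry. It illustrates the behaviour described here and is not the library's code.

```python
def apply_properties(entry, new_properties, overwrite=False):
    # Hypothetical helper: return a copy of entry with new_properties applied
    updated = dict(entry)
    for key, value in new_properties.items():
        # Existing properties are only replaced when overwrite is True
        if overwrite or key not in updated:
            updated[key] = value
    return updated


entry = {"name": "variable_0", "label": "variable_0", "log": False}
new_properties = {"label": "New label", "text": "default_text"}

added = apply_properties(entry, new_properties)
replaced = apply_properties(entry, new_properties, overwrite=True)

print(added["label"], added["text"])
print(replaced["label"], replaced["text"])
```

Without overwrite, only the missing text property is added and the label is left alone. With overwrite=True, the label is replaced as well.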

Remove parameters

To remove a parameter from the plotting information, you can use the remove_variable_registry_parameters() function:

from plothist import remove_variable_registry_parameters

remove_variable_registry_parameters(["bins", "range", "log", "legend_ncols", "new_property"], variable_keys)

The yaml file is updated:

variable_0:
    name: variable_0
    label: variable_0
    legend_location: best
    docstring: ''
    text: default_text
    more_info: null
    custom_list:
    - 1
    - a
    - true
    custom_value: 0


variable_1:
    ...

Simple example

Here is an example of how to create, update, and use the variable registry to plot histograms. A similar example can be found in Correlations with variable registry.

from plothist import (
    make_hist,
    plot_hist,
    create_variable_registry,
    update_variable_registry,
    update_variable_registry_binning,
    get_variable_from_registry,
    add_text,
)
import matplotlib.pyplot as plt

variable_keys = ["variable_0", "variable_1", "variable_2"]

# Create the registry
create_variable_registry(variable_keys)

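# Update the number of bins and range
# (df is assumed to be an existing dataset, e.g. a pandas DataFrame, with one column per variable name)
update_variable_registry_binning(df, variable_keys)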

# Add custom info
update_variable_registry({"text": "my analysis"}, variable_keys)

for variable_key in variable_keys:
    # Get the variable information using the key. variable is a dictionary
    variable = get_variable_from_registry(variable_key)

    fig, ax = plt.subplots()

    # Make the histogram using the variable information from the registry
    h = make_hist(df[variable["name"]], bins=variable["bins"], range=variable["range"])
    plot_hist(h, ax=ax)

    # Get the label and range from the registry
    ax.set_xlabel(variable["label"])
    ax.set_xlim(variable["range"])
    ax.set_ylabel("Entries")

    # Get the custom text from the registry
    add_text(variable["text"], ax=ax)

    fig.savefig(f"{variable_key}.pdf", bbox_inches="tight")

Advanced example

It is sometimes useful to plot the same variable with different plotting parameters. In the registry, the variable_key passed to get_variable_from_registry() identifies the entry, while the name parameter is the name of the variable in the dataset. Several variable keys can therefore share the same name.

For example, to plot a zoomed view of a variable while keeping the original one, you can create a new variable key with the same name and different plotting parameters:

variable_0:
    name: variable_0
    bins: 121
    range:
    - -10
    - 10
    label: $Variable_{0}$
    log: false
    legend_location: best
    legend_ncols: 1
    docstring: ''

variable_0_zoom:
    name: variable_0
    bins: 121
    range:
    - -1
    - 1
    label: Zoom of $Variable_{0}$
    log: false
    legend_location: upper right
    legend_ncols: 1
    docstring: ''


variable_1:
    ...

Then use the new variable key like any other:

variable_keys = ["variable_0", "variable_0_zoom", "variable_1"]

for variable_key in variable_keys:
    variable = get_variable_from_registry(variable_key)
    ...
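The distinction between key and name can be illustrated without plothist. In the sketch below, the registry is an in-memory stand-in for the yaml file above (only the fields needed here), and the dataset is made up. Both are assumptions for illustration only.

```python
# In-memory stand-in for the yaml file above (only the fields used here)
registry = {
    "variable_0": {"name": "variable_0", "range": [-10, 10]},
    "variable_0_zoom": {"name": "variable_0", "range": [-1, 1]},
}

# Made-up dataset: one list of values per column name
data = {"variable_0": [-8.0, -0.7, -0.2, 0.4, 0.9, 3.5, 12.0]}

in_range = {}
for variable_key, variable in registry.items():
    low, high = variable["range"]
    # The column is looked up by name, the plotting range by key
    column = data[variable["name"]]
    in_range[variable_key] = [x for x in column if low <= x <= high]

print(in_range["variable_0"])
print(in_range["variable_0_zoom"])
```

Both keys read the same column of the dataset, but each one selects the values that fall inside its own range.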