Variable registry
The variable registry is a convenient tool to store and retrieve the plotting information of variables. The idea is to store the plotting information of each variable in a yaml file, and then read or modify it with dedicated functions. Each variable is identified by a variable_key, which is used to retrieve its information from the registry.
Create the registry
To create a variable registry, you only need to provide the list of variable keys you want to store information for. The function create_variable_registry() automatically creates a variable_registry.yaml file with default information for each variable:
from plothist import create_variable_registry
variable_keys = ["variable_0", "variable_1", "variable_2"]
create_variable_registry(variable_keys)
For each variable in the list, the following information is stored by default in the variable_registry.yaml file:
variable_0:
  name: variable_0
  bins: auto
  range:
  - min
  - max
  label: variable_0
  log: false
  legend_location: best
  legend_ncols: 1
  docstring: ''
variable_1:
  ...
Since the structure is built automatically, the plotting information can then easily be modified by hand in the yaml file.
To add new variables to an existing variable_registry.yaml file, add the new variable keys to the variable_keys list and call create_variable_registry() again. By default, the information of the variables already present in the registry is not overwritten, so hand-written modifications are kept, unless the reset parameter is set to True.
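The default entry and the "keep existing entries unless reset" rule can be pictured with plain dictionaries. This is a minimal in-memory sketch, not plothist's implementation: default_entry and add_variables are hypothetical helpers, the default values are copied from the yaml shown above, and no file is read or written.

```python
def default_entry(key):
    """Return the default plotting information for a new variable key."""
    return {
        "name": key,
        "bins": "auto",
        "range": ["min", "max"],
        "label": key,
        "log": False,
        "legend_location": "best",
        "legend_ncols": 1,
        "docstring": "",
    }


def add_variables(registry, keys, reset=False):
    """Add default entries for new keys; existing entries are kept unless reset is True."""
    for key in keys:
        if reset or key not in registry:
            registry[key] = default_entry(key)
    return registry


# A registry in which variable_0 was already edited by hand
registry = {"variable_0": {**default_entry("variable_0"), "label": "$Variable_{0}$"}}

add_variables(registry, ["variable_0", "variable_1"])
print(registry["variable_0"]["label"])  # the hand-written label is kept
print(registry["variable_1"]["label"])  # the new key gets the default label

add_variables(registry, ["variable_0"], reset=True)
print(registry["variable_0"]["label"])  # reset restores the default label
```

Keying the decision on "is this key already present?" is what lets repeated calls be safe: only keys that are missing get defaults.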
Getting the plotting information
To get the plotting information of a variable, you can use the get_variable_from_registry() function, which returns a dictionary with the plotting information:
from plothist import get_variable_from_registry
variable = get_variable_from_registry("variable_0")
print(variable)
# {'name': 'variable_0', 'bins': 'auto', 'range': ['min', 'max'], 'label': 'variable_0', 'log': False, 'legend_location': 'best', 'legend_ncols': 1, 'docstring': ''}
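A freshly created entry stores the strings "min" and "max" as its range. If the code consuming the dictionary does not handle these placeholders itself, a small conversion may be needed before the range is used with real data. The resolve_range helper below is hypothetical (it is not part of plothist) and simply replaces each placeholder with the minimum or maximum of the data.

```python
def resolve_range(variable, values):
    """Return a numeric [low, high] range, replacing "min"/"max" placeholders with data extrema."""
    low, high = variable["range"]
    if low == "min":
        low = min(values)
    if high == "max":
        high = max(values)
    return [low, high]


variable = {"name": "variable_0", "bins": "auto", "range": ["min", "max"]}
values = [0.5, -2.0, 3.25, 1.0]
print(resolve_range(variable, values))  # [-2.0, 3.25]
```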
Update the registry
Several functions are available to modify the plotting information of the variables in the registry, and to add or remove parameters.
Binning and ranges
The update_variable_registry_binning() function reads the variables from a dataset (df below) and updates two parameters in the yaml file. The number of bins is set to the length of [numpy.histogram_bin_edges](https://numpy.org/doc/2.1/reference/generated/numpy.histogram_bin_edges.html#numpy-histogram-bin-edges) (computed with bins="auto") minus one, the bins being regular, and the range is set to the minimum and maximum values of the variable in the dataset:
from plothist import update_variable_registry_binning
update_variable_registry_binning(df, variable_keys)
The number of bins and the range have been updated for all the variables in variable_keys. The yaml file is now:
variable_0:
  name: variable_0
  bins: 121 # = len(numpy.histogram_bin_edges(df["variable_0"], bins="auto")) - 1
  range:
  - -10.55227774892869 # min(df["variable_0"])
  - 10.04658448558009 # max(df["variable_0"])
  label: variable_0
  log: false
  legend_location: best
  legend_ncols: 1
  docstring: ''
variable_1:
  ...
You may then modify the yaml file by hand to choose a binning and range better suited to the plot.
Calling this function again on the same variable keys does not overwrite their bins and range parameters, unless the overwrite parameter is set to True.
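The binning rule described above can be reproduced with NumPy directly. This is a minimal sketch, not plothist's implementation: update_binning is a hypothetical helper working on an in-memory dict, the data are synthetic, and it assumes that a parameter counts as "not yet set" while it still holds its default placeholder ("auto" for bins, ["min", "max"] for range).

```python
import numpy as np


def update_binning(registry, data, keys, overwrite=False):
    """Fill bins and range from the data.

    Assumption: a parameter is only updated while it still holds its default
    placeholder, or when overwrite is True.
    """
    for key in keys:
        entry = registry[key]
        values = np.asarray(data[entry["name"]], dtype=float)
        if overwrite or entry["bins"] == "auto":
            # Number of regular bins chosen by NumPy's "auto" estimator
            entry["bins"] = len(np.histogram_bin_edges(values, bins="auto")) - 1
        if overwrite or entry["range"] == ["min", "max"]:
            entry["range"] = [float(values.min()), float(values.max())]
    return registry


rng = np.random.default_rng(0)
data = {"variable_0": rng.normal(size=1000)}  # synthetic data for illustration
registry = {"variable_0": {"name": "variable_0", "bins": "auto", "range": ["min", "max"]}}

update_binning(registry, data, ["variable_0"])
print(registry["variable_0"]["bins"], registry["variable_0"]["range"])

# A hand-tuned value survives a second call unless overwrite=True
registry["variable_0"]["bins"] = 40
update_binning(registry, data, ["variable_0"])
print(registry["variable_0"]["bins"])  # still 40
```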
Add or modify variable properties
You can also add new plotting properties or modify existing ones by calling the update_variable_registry() function with a custom dictionary:
from plothist import update_variable_registry
new_properties = {
    "text": "default_text",
    "more_info": None,
    "new_property": False,
    "custom_list": [1, "a", True],
    "custom_value": 0,
}
update_variable_registry(new_properties, variable_keys)
This adds the new properties to all the variables in variable_keys in the yaml file:
variable_0:
  name: variable_0
  bins: 121
  range:
  - -10.55227774892869
  - 10.04658448558009
  label: variable_0
  log: false
  legend_location: best
  legend_ncols: 1
  docstring: ''
  text: default_text
  more_info: null # None is converted to null in yaml
  new_property: false # False is converted to false in yaml
  custom_list: # The list is displayed on multiple lines
  - 1
  - a
  - true # True is converted to true in yaml
  custom_value: 0
variable_1:
  ...
The same get_variable_from_registry() function can be used to get the new properties.
To modify existing properties, call update_variable_registry() with the new values and the overwrite parameter set to True; the existing property values are then replaced by the new ones.
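The add-versus-overwrite rule can be sketched with plain dictionaries. update_properties is a hypothetical helper, not plothist's code; it only mirrors the behaviour described above, and the deepcopy is a design choice of this sketch so that a mutable value such as custom_list is not shared between variables.

```python
import copy


def update_properties(registry, properties, keys, overwrite=False):
    """Add properties to each entry; existing values change only when overwrite is True."""
    for key in keys:
        entry = registry[key]
        for name, value in properties.items():
            if overwrite or name not in entry:
                # Copy so that each variable gets its own list/dict object
                entry[name] = copy.deepcopy(value)
    return registry


registry = {"variable_0": {"name": "variable_0", "label": "variable_0"}}

update_properties(registry, {"text": "default_text", "label": "x"}, ["variable_0"])
print(registry["variable_0"])  # label unchanged, text added

update_properties(registry, {"label": "x"}, ["variable_0"], overwrite=True)
print(registry["variable_0"]["label"])  # x
```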
Remove parameters
To remove a parameter from the plotting information, you can use the remove_variable_registry_parameters() function:
from plothist import remove_variable_registry_parameters
remove_variable_registry_parameters(["bins", "range", "log", "legend_ncols", "new_property"], variable_keys)
The yaml file is updated:
variable_0:
  name: variable_0
  label: variable_0
  legend_location: best
  docstring: ''
  text: default_text
  more_info: null
  custom_list:
  - 1
  - a
  - true
  custom_value: 0
variable_1:
  ...
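Removing parameters amounts to deleting keys from each entry. The sketch below works on an in-memory dict; remove_parameters is a hypothetical helper, and silently skipping a parameter that an entry does not have is an assumption of this sketch, not documented plothist behaviour.

```python
def remove_parameters(registry, parameters, keys):
    """Delete the given parameters from each listed entry.

    Assumption: parameters missing from an entry are silently skipped.
    """
    for key in keys:
        for parameter in parameters:
            registry[key].pop(parameter, None)
    return registry


registry = {
    "variable_0": {"name": "variable_0", "bins": 121, "log": False, "text": "default_text"},
    "variable_1": {"name": "variable_1", "log": False},
}
remove_parameters(registry, ["bins", "log"], ["variable_0", "variable_1"])
print(registry)
# {'variable_0': {'name': 'variable_0', 'text': 'default_text'}, 'variable_1': {'name': 'variable_1'}}
```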
Simple example
Here is an example of how to create, update, and use the variable registry to plot histograms. A similar example can be found in Correlations with variable registry.
from plothist import (
    make_hist,
    plot_hist,
    create_variable_registry,
    update_variable_registry,
    update_variable_registry_binning,
    get_variable_from_registry,
    add_text,
)
import matplotlib.pyplot as plt
# df is the dataset, e.g. a pandas DataFrame with one column per variable
variable_keys = ["variable_0", "variable_1", "variable_2"]
# Create the registry
create_variable_registry(variable_keys)
# Update the number of bins and range
update_variable_registry_binning(df, variable_keys)
# Add custom info
update_variable_registry({"text": "my analysis"}, variable_keys)
for variable_key in variable_keys:
    # Get the variable information using the key. variable is a dictionary
    variable = get_variable_from_registry(variable_key)
    fig, ax = plt.subplots()
    # Make the histogram using the variable information from the registry
    h = make_hist(df[variable["name"]], bins=variable["bins"], range=variable["range"])
    plot_hist(h, ax=ax)
    # Get the label and range from the registry
    ax.set_xlabel(variable["label"])
    ax.set_xlim(variable["range"])
    ax.set_ylabel("Entries")
    # Get the custom text from the registry
    add_text(variable["text"], ax=ax)
    fig.savefig(f"{variable_key}.pdf", bbox_inches="tight")
Advanced example
It is sometimes useful to plot the same variable with different plotting parameters. get_variable_from_registry() identifies a variable by its variable_key, while the name parameter is the name of the variable in the dataset, so several keys can share the same name.
For example, to plot a zoom on a variable while keeping the original plot, you can create a new variable key with the same name and different plotting parameters:
variable_0:
  name: variable_0
  bins: 121
  range:
  - -10
  - 10
  label: $Variable_{0}$
  log: false
  legend_location: best
  legend_ncols: 1
  docstring: ''
variable_0_zoom:
  name: variable_0
  bins: 121
  range:
  - -1
  - 1
  label: Zoom of $Variable_{0}$
  log: false
  legend_location: upper right
  legend_ncols: 1
  docstring: ''
variable_1:
  ...
Then use the new variable key like any other:
variable_keys = ["variable_0", "variable_0_zoom", "variable_1"]
for variable_key in variable_keys:
    variable = get_variable_from_registry(variable_key)
    ...
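The reason this works can be sketched with plain dictionaries. Both keys point to the same name, so they read the same column, and only the plotting parameters differ. The registry entries, the data values, and the values_in_range helper below are made up for illustration and are not part of plothist.

```python
def values_in_range(registry, data, key):
    """Look up a key, read the data column named by its "name", and keep values inside its range."""
    variable = registry[key]
    values = data[variable["name"]]  # several keys may share the same column
    low, high = variable["range"]
    return [v for v in values if low <= v <= high]


registry = {
    "variable_0": {"name": "variable_0", "range": [-10, 10]},
    "variable_0_zoom": {"name": "variable_0", "range": [-1, 1]},
}
data = {"variable_0": [-8.0, -0.5, 0.2, 0.9, 4.0]}

for key in ["variable_0", "variable_0_zoom"]:
    print(key, values_in_range(registry, data, key))
# variable_0 [-8.0, -0.5, 0.2, 0.9, 4.0]
# variable_0_zoom [-0.5, 0.2, 0.9]
```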