Plot result of a fit

This section shows how to plot the result of a fit with pyhf, RooFit or zfit.

pyhf

Here is an example of a fit with pyhf, whose result is then shown using plothist:

import pyhf
import numpy as np
from plothist import make_hist, plot_data_model_comparison

pyhf.set_backend("numpy")

# Define model
data_yield = [100, 50]
model = pyhf.simplemodels.uncorrelated_background(
    signal=[20, 0.0],
    bkg=[75.0, 50.0],
    bkg_uncertainty=[np.sqrt(75.0), np.sqrt(50.0)],
)
data = data_yield + model.config.auxdata
# Maximum likelihood fit
bestfit_pars = pyhf.infer.mle.fit(data, model)
# Extract fit results
model_yields = model.main_model.expected_data(bestfit_pars, return_by_sample=True)

bins = [0, 0.5, 1.0]
data_hist = make_hist(bins=bins)
signal_hist = make_hist(bins=bins)
background_hist = make_hist(bins=bins)

# Set explicitly the content and variance of each bin.
# For the signal and background histograms, we assume no variance.
data_hist[:] = np.c_[data_yield, data_yield]
background_hist[:] = np.c_[model_yields[0], [0.0, 0.0]]
signal_hist[:] = np.c_[model_yields[1], [0.0, 0.0]]

fig, ax_main, ax_comparison = plot_data_model_comparison(
    data_hist=data_hist,
    stacked_components=[background_hist, signal_hist],
    stacked_labels=["Background", "Signal"],
    model_uncertainty=False,
    xlabel="Variable",
    ylabel="Entries",
    comparison="pull",
)
ax_main.set_xticks([0, 0.5, 1.0])
ax_main.tick_params(axis="x", which="minor", bottom=False, top=False)
ax_comparison.set_xticks([0, 0.5, 1.0])
ax_comparison.tick_params(axis="x", which="minor", bottom=False, top=False)

fig.savefig("pyhf.svg", bbox_inches="tight")
pyhf example

In this trivial case, the pulls are null because the model has enough freedom to perfectly fit the two data points.

RooFit or zfit

Two steps are necessary when you want to use plothist to plot the result of a fit using RooFit or zfit.

Getting the PDF

First you need to convert your functions to scipy functions. This can be done using a small function.

With RooFit

This should be called after you have fitted your model and you have a RooAbsPdf object. The idea is to evaluate the RooFit PDF at a large number of points and then interpolate it with a scipy function. This function can then be saved with pickle, or used directly in the following step.

Warning

For a complex PDF that depends on multiple observables, be sure get the correct PDF projection before calling this function. If it doesn’t work, you can use the other method described in Getting RooFit PDFs from the canvas.

import numpy as np
from scipy.interpolate import interp1d
import pickle


def save_pdf(var, pdf, path="pdf.pkl", n_points=10000):
    """
    Save a RooFit PDF as a scipy.interpolate.interp1d function.

    Parameters
    ----------
    var : RooRealVar
       The variable to evaluate the PDF at.
    pdf : RooAbsPdf
       The PDF to save.
    path : str, optional
       The path to save the PDF to. Should end with `.pkl`. Default is "pdf.pkl".
    n_points : int, optional
       The number of points to evaluate the PDF at. Default is 10000.

    Returns
    -------
    pdf_func : scipy.interpolate.interp1d
       The PDF as a function.

    Notes
    -----
    The PDF is saved as a scipy.interpolate.interp1d function with pickle.
    """

    pdf_x = np.zeros(n_points)

    xlim = (var.getMin(), var.getMax())
    # Get a sample of x values
    x = np.linspace(*xlim, n_points)

    for i in range(len(x)):
        var.setVal(x[i])
        # Evaluate the PDF at the given x value
        pdf_x[i] = pdf.getVal(var)

    # Interpolate the PDF
    pdf_func = interp1d(x, pdf_x)

    with open(path, "wb") as f:
        print(f"Saving model to {f.name}")
        pickle.dump(pdf_func, f)

    return pdf_func

With zfit

This should be called after you have fitted your model and you have a zfit.pdf.BasePDF object. The idea is the same as for RooFit: evaluate the PDF at a large number of points and then interpolate it with a scipy function. This function can then be saved with pickle, or used directly in the following step.

from scipy.interpolate import interp1d
import pickle


def save_pdf(var, pdf, path="pdf.pkl", n_points=10000):
    """
    Save a PDF from zfit as a callable function.

    Parameters
    ----------
    var : zfit.Space
        The variable to evaluate the PDF at.
    pdf : zfit.pdf.BasePDF
        The PDF to save.
    path : str, optional
        The path to save the PDF to. Default is "pdf.pkl".
    n_points : int, optional
        The number of points to evaluate the PDF at. Default is 10000.

    Returns
    -------
    pdf_func : scipy.interpolate.interp1d

    Notes
    -----
    The PDF is saved as a scipy.interpolate.interp1d function with pickle.
    """

    lower, upper = var.limits
    x = np.linspace(lower[-1][0], upper[0][0], n_points)

    # Evaluate the PDF at the given points
    pdf_x = zfit.run(pdf.pdf(x, norm_range=var))

    # Interpolate the PDF
    pdf_func = interp1d(x, pdf_x)

    with open(path, "wb") as f:
        print(f"Saving model to {f.name}")
        pickle.dump(pdf_func, f)

    return pdf_func

Renormalize the PDF

A pdf_func you get from a scipy function or from the saved pickle file for RooFit or zfit has an area of 1. When you want to plot it, you need to multiply it by the bin width of your histogram, the number of expected events in the range for this PDF and divide by the integral of the PDF in the range. The small function below performs this renormalization:

from scipy.integrate import quad


def renormalize(pdf, x_range, n_bins, n_data):
    """
    Renormalize a PDF to its corresponding number of data events.

    Parameters
    ----------
    pdf : callable
       The PDF to renormalize.
    x_range : tuple
       The range of the PDF.
    n_bins : int
       The number of bins. Regular binning is assumed.
    n_data : int
       The number of predicted data events in the x_range associated to the pdf.

    Returns
    -------
    pdf : callable
       The renormalized PDF.
    """

    xmin, xmax = x_range
    bin_width = (xmax - xmin) / n_bins
    integral = quad(pdf, xmin, xmax)[0]
    # Note: If x_range is equal to the full range of the PDF, the integral is equal to 1.

    def renormalized_pdf(x):
        return pdf(x) * n_data * bin_width / integral

    return renormalized_pdf

Note

The renormalize function could also be done with a lambda x: function, but it should be use with precautions, see here.

Then you can use plot_model() or plot_data_model_comparison() (see Advanced example using asymmetry comparison) to plot the PDF and do all sort of comparisons with the plothist interface:

Advanced asymmetry comparison

Getting RooFit PDFs from the canvas

Some PDFs normalization are not easy to get from the RooFit PDF object. If the two steps above did not work, you can use the canvas to get the PDF. This solution has the advantage of being already normalized to the data sample. The disadvantage is that the resulting PDF is bin dependent, so when plotting your data, you need to use the same bins as the ones used to create the canvas.

To get the PDFs from the canvas, you first need plot the desired PDFs on a frame with plotOn(). Then, you need to save the canvas as a root file with canvas.SaveAs("root_file.root").

The main idea is that when you do a plotOn() on a frame, the function is saved as a TGraph object. You can then get the x and y values of the graph and interpolate it to get a function. The function is then saved in a list with the name of the function. The PDF order in the list is the same as the order you used to plot them on the frame:

import ROOT
from scipy.interpolate import interp1d


def get_pdf_list(root_file_name, canvas_name="canvas"):
    # Open the ROOT file
    root_file = ROOT.TFile(root_file_name, "READ")

    # Get the TCanvas from the file
    canvas = root_file.Get(canvas_name)

    pdf_list = []
    pdf_names = []

    ## If you have multiple pads, you need to specify which one you want to get the PDF from
    # pad = canvas.GetPrimitive("pad_name")
    ## Then loop over the primitives of the pad and not the canvas
    # for obj in pad.GetListOfPrimitives():

    for obj in canvas.GetListOfPrimitives():
        if isinstance(obj, ROOT.TGraph) and not isinstance(obj, ROOT.TGraphAsymmErrors):
            # Get the x and y values of the TGraph
            pdf_names.append(obj.GetName())
            x_values = obj.GetX()
            y_values = obj.GetY()

            # Interpolate the TGraph to get a function
            pdf_func = interp1d(x_values, y_values)

            pdf_list.append(pdf_func)

    print(f"\nPDFs from {root_file_name} saved in the list:")
    for k_name, pdf_name in enumerate(pdf_names):
        print(f"\t[{k_name}] {pdf_name}")
    print()

    return pdf_list