.. _plot-fit-result-label:

====================
Plot result of a fit
====================

This section shows how to plot the result of a fit with ``pyhf``, ``RooFit`` or ``zfit``.

pyhf
****

Here is an example of a fit with ``pyhf``, whose result is then shown using ``plothist``:

.. code-block:: python

   import pyhf
   import numpy as np
   from plothist import make_hist, plot_data_model_comparison

   pyhf.set_backend("numpy")

   # Define model
   data_yield = [100, 50]
   model = pyhf.simplemodels.uncorrelated_background(
       signal=[20, 0.0],
       bkg=[75.0, 50.0],
       bkg_uncertainty=[np.sqrt(75.0), np.sqrt(50.0)],
   )
   data = data_yield + model.config.auxdata
   # Maximum likelihood fit
   bestfit_pars = pyhf.infer.mle.fit(data, model)
   # Extract fit results
   model_yields = model.main_model.expected_data(bestfit_pars, return_by_sample=True)

   bins = [0, 0.5, 1.0]
   data_hist = make_hist(bins=bins)
   signal_hist = make_hist(bins=bins)
   background_hist = make_hist(bins=bins)

   # Set explicitly the content and variance of each bin.
   # For the signal and background histograms, we assume no variance.
   data_hist[:] = np.c_[data_yield, data_yield]
   background_hist[:] = np.c_[model_yields[0], [0.0, 0.0]]
   signal_hist[:] = np.c_[model_yields[1], [0.0, 0.0]]

   fig, ax_main, ax_comparison = plot_data_model_comparison(
       data_hist=data_hist,
       stacked_components=[background_hist, signal_hist],
       stacked_labels=["Background", "Signal"],
       model_uncertainty=False,
       xlabel="Variable",
       ylabel="Entries",
       comparison="pull",
   )
   ax_main.set_xticks([0, 0.5, 1.0])
   ax_main.tick_params(axis="x", which="minor", bottom=False, top=False)
   ax_comparison.set_xticks([0, 0.5, 1.0])
   ax_comparison.tick_params(axis="x", which="minor", bottom=False, top=False)

   fig.savefig("pyhf.svg", bbox_inches="tight")

.. image:: ../img/pyhf_example.svg
   :alt: pyhf example
   :width: 500

In this trivial case, the pulls are null because the model has enough freedom to perfectly fit the two data points.

RooFit or zfit
**************

Two steps are necessary when you want to use ``plothist`` to plot the result of a fit using ``RooFit`` or ``zfit``.

Getting the PDF
===============

First you need to convert your functions to scipy functions. This can be done using a small function.

With RooFit
-----------

This should be called after you have fitted your model and you have a ``RooAbsPdf`` object. The idea is to evaluate the ``RooFit`` PDF at a large number of points and then interpolate it with a scipy function. This function can then be saved with pickle, or used directly in the following step.


.. warning::

   For a complex PDF that depends on multiple observables, be sure get the correct PDF projection before calling this function. If it doesn't work, you can use the other method described in :ref:`plot-roofit-canvas-solution-label`.


.. code-block:: python

   import numpy as np
   from scipy.interpolate import interp1d
   import pickle


   def save_pdf(var, pdf, path="pdf.pkl", n_points=10000):
       """
       Save a RooFit PDF as a scipy.interpolate.interp1d function.

       Parameters
       ----------
       var : RooRealVar
          The variable to evaluate the PDF at.
       pdf : RooAbsPdf
          The PDF to save.
       path : str, optional
          The path to save the PDF to. Should end with `.pkl`. Default is "pdf.pkl".
       n_points : int, optional
          The number of points to evaluate the PDF at. Default is 10000.

       Returns
       -------
       pdf_func : scipy.interpolate.interp1d
          The PDF as a function.

       Notes
       -----
       The PDF is saved as a scipy.interpolate.interp1d function with pickle.
       """

       pdf_x = np.zeros(n_points)

       xlim = (var.getMin(), var.getMax())
       # Get a sample of x values
       x = np.linspace(*xlim, n_points)

       for i in range(len(x)):
           var.setVal(x[i])
           # Evaluate the PDF at the given x value
           pdf_x[i] = pdf.getVal(var)

       # Interpolate the PDF
       pdf_func = interp1d(x, pdf_x)

       with open(path, "wb") as f:
           print(f"Saving model to {f.name}")
           pickle.dump(pdf_func, f)

       return pdf_func

With zfit
---------

This should be called after you have fitted your model and you have a ``zfit.pdf.BasePDF`` object. The idea is the same as for ``RooFit``: evaluate the PDF at a large number of points and then interpolate it with a scipy function. This function can then be saved with pickle, or used directly in the following step.

.. code-block:: python

    from scipy.interpolate import interp1d
    import pickle


    def save_pdf(var, pdf, path="pdf.pkl", n_points=10000):
        """
        Save a PDF from zfit as a callable function.

        Parameters
        ----------
        var : zfit.Space
            The variable to evaluate the PDF at.
        pdf : zfit.pdf.BasePDF
            The PDF to save.
        path : str, optional
            The path to save the PDF to. Default is "pdf.pkl".
        n_points : int, optional
            The number of points to evaluate the PDF at. Default is 10000.

        Returns
        -------
        pdf_func : scipy.interpolate.interp1d

        Notes
        -----
        The PDF is saved as a scipy.interpolate.interp1d function with pickle.
        """

        lower, upper = var.limits
        x = np.linspace(lower[-1][0], upper[0][0], n_points)

        # Evaluate the PDF at the given points
        pdf_x = zfit.run(pdf.pdf(x, norm_range=var))

        # Interpolate the PDF
        pdf_func = interp1d(x, pdf_x)

        with open(path, "wb") as f:
            print(f"Saving model to {f.name}")
            pickle.dump(pdf_func, f)

        return pdf_func


Renormalize the PDF
===================

A ``pdf_func`` you get from a scipy function or from the saved pickle file for ``RooFit`` or ``zfit`` has an area of 1. When you want to plot it, you need to multiply it by the bin width of your histogram, the number of expected events in the range for this PDF and divide by the integral of the PDF in the range. The small function below performs this renormalization:

.. code-block:: python

   from scipy.integrate import quad


   def renormalize(pdf, x_range, n_bins, n_data):
       """
       Renormalize a PDF to its corresponding number of data events.

       Parameters
       ----------
       pdf : callable
          The PDF to renormalize.
       x_range : tuple
          The range of the PDF.
       n_bins : int
          The number of bins. Regular binning is assumed.
       n_data : int
          The number of predicted data events in the x_range associated to the pdf.

       Returns
       -------
       pdf : callable
          The renormalized PDF.
       """

       xmin, xmax = x_range
       bin_width = (xmax - xmin) / n_bins
       integral = quad(pdf, xmin, xmax)[0]
       # Note: If x_range is equal to the full range of the PDF, the integral is equal to 1.

       def renormalized_pdf(x):
           return pdf(x) * n_data * bin_width / integral

       return renormalized_pdf

.. note::
   The renormalize function could also be done with a ``lambda x:`` function, but it should be use with precautions, see `here <https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result>`_.

Then you can use :func:`plot_model() <plothist.plotters.plot_model>` or :func:`plot_data_model_comparison() <plothist.plotters.plot_data_model_comparison>` (see :ref:`advanced-asymmetry-label`) to plot the PDF and do all sort of comparisons with the ``plothist`` interface:

.. image:: ../img/asymmetry_comparison_advanced.png
   :alt: Advanced asymmetry comparison
   :width: 500


.. _plot-roofit-canvas-solution-label:
Getting RooFit PDFs from the canvas
===================================

Some PDFs normalization are not easy to get from the ``RooFit`` PDF object.
If the two steps above did not work, you can use the canvas to get the PDF.
This solution has the advantage of being already normalized to the data sample.
The disadvantage is that the resulting PDF is bin dependent, so when plotting your data, you need to use the same bins as the ones used to create the canvas.

To get the PDFs from the canvas, you first need plot the desired PDFs on a frame with ``plotOn()``.
Then, you need to save the canvas as a root file with ``canvas.SaveAs("root_file.root")``.

The main idea is that when you do a ``plotOn()`` on a frame, the function is saved as a ``TGraph`` object. You can then get the ``x`` and ``y`` values of the graph and interpolate it to get a function. The function is then saved in a list with the name of the function. The PDF order in the list is the same as the order you used to plot them on the frame:

.. code-block:: python

   import ROOT
   from scipy.interpolate import interp1d


   def get_pdf_list(root_file_name, canvas_name="canvas"):
       # Open the ROOT file
       root_file = ROOT.TFile(root_file_name, "READ")

       # Get the TCanvas from the file
       canvas = root_file.Get(canvas_name)

       pdf_list = []
       pdf_names = []

       ## If you have multiple pads, you need to specify which one you want to get the PDF from
       # pad = canvas.GetPrimitive("pad_name")
       ## Then loop over the primitives of the pad and not the canvas
       # for obj in pad.GetListOfPrimitives():

       for obj in canvas.GetListOfPrimitives():
           if isinstance(obj, ROOT.TGraph) and not isinstance(obj, ROOT.TGraphAsymmErrors):
               # Get the x and y values of the TGraph
               pdf_names.append(obj.GetName())
               x_values = obj.GetX()
               y_values = obj.GetY()

               # Interpolate the TGraph to get a function
               pdf_func = interp1d(x_values, y_values)

               pdf_list.append(pdf_func)

       print(f"\nPDFs from {root_file_name} saved in the list:")
       for k_name, pdf_name in enumerate(pdf_names):
           print(f"\t[{k_name}] {pdf_name}")
       print()

       return pdf_list