.. _user_guide_evaluation:

Running Evaluation Challenges
=============================

pepbench provides a standardized **evaluation framework** for
pre-ejection period (PEP) extraction pipelines via
:class:`pepbench.evaluation.PepEvaluationChallenge`.
Each **challenge** pairs a pipeline with a dataset and yields metrics at
several levels: per sample (heartbeat), per datapoint, and aggregated
across the whole dataset.

Key classes and functions
-------------------------

* :class:`pepbench.evaluation.PepEvaluationChallenge` – runs evaluation
  across a dataset of annotated samples.
* :class:`pepbench.evaluation.ChallengeResults` – tuple-like container
  for aggregated and per-sample results.
* :func:`pepbench.evaluation.score_pep_evaluation` – default scoring
  function.

Initialising a challenge
------------------------

You need:

* a dataset subclassing
  :class:`pepbench.datasets.BasePepDatasetWithAnnotations`
  (e.g., :class:`~pepbench.datasets.EmpkinsDataset` with
  ``only_labeled=True``), and
* a scoring function (usually
  :func:`~pepbench.evaluation.score_pep_evaluation`).

.. code-block:: python

   from pepbench.datasets import EmpkinsDataset
   from pepbench.evaluation import PepEvaluationChallenge, score_pep_evaluation

   ds = EmpkinsDataset(
       base_path="/path/to/empkins",
       only_labeled=True,
       exclude_missing_data=True,
       label_type="average",
   )

   challenge = PepEvaluationChallenge(
       dataset=ds,
       scoring=score_pep_evaluation,
   )

Running the challenge on a pipeline
-----------------------------------

.. code-block:: python

   from pepbench.pipelines import PepExtractionPipeline
   from pepbench.algorithms.heartbeat_segmentation import HeartbeatSegmentationNeurokit
   from pepbench.algorithms.ecg import QPeakExtractionVanLien2013
   from pepbench.algorithms.icg import (
       BPointExtractionLozano2007LinearRegression,
       CPointExtractionScipyFindPeaks,
   )
   from pepbench.algorithms.outlier_correction import OutlierCorrectionLinearInterpolation

   pipeline = PepExtractionPipeline(
       heartbeat_segmentation_algo=HeartbeatSegmentationNeurokit(),
       q_peak_algo=QPeakExtractionVanLien2013(),
       b_point_algo=BPointExtractionLozano2007LinearRegression(),
       c_point_algo=CPointExtractionScipyFindPeaks(),
       outlier_correction_algo=OutlierCorrectionLinearInterpolation(),
   )

   # Run the evaluation (internally loops over all datapoints)
   challenge = challenge.run(pipeline)

   # Convert internal results to DataFrames
   challenge = challenge.results_as_df()

After calling
:meth:`~pepbench.evaluation.PepEvaluationChallenge.results_as_df`, the
challenge instance carries four main result attributes:

* ``results_agg_mean_std_`` – mean and standard deviation across datapoints
* ``results_agg_total_`` – overall counts (e.g. the number of valid vs. invalid PEP values)
* ``results_single_`` – one row per datapoint
* ``results_per_sample_`` – per-sample / per-beat results

Each attribute is a pandas DataFrame.
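
How these levels relate to each other can be sketched with plain pandas
on made-up numbers. The snippet below does not call pepbench; the column
names (``participant``, ``abs_error_ms``) and the aggregation steps are
assumptions for illustration, and the actual metrics depend on the
scoring function:

.. code-block:: python

   import pandas as pd

   # Hypothetical per-beat absolute PEP errors in ms; NaN marks a beat
   # for which no valid PEP could be computed (illustrative data only)
   per_sample = pd.DataFrame(
       {
           "participant": ["VP_001"] * 3 + ["VP_002"] * 3,
           "abs_error_ms": [2.0, 4.0, 6.0, 10.0, float("nan"), 14.0],
       }
   )

   # Per-datapoint level: one value per participant (NaN beats are skipped)
   single = per_sample.groupby("participant")["abs_error_ms"].mean()

   # Aggregated level: mean and standard deviation across datapoints
   agg_mean_std = single.agg(["mean", "std"])

   # Total counts over all beats, independent of the datapoint grouping
   n_valid = int(per_sample["abs_error_ms"].notna().sum())
   n_invalid = int(per_sample["abs_error_ms"].isna().sum())

The same pattern (group per datapoint first, then summarise across
datapoints) keeps datapoints with many beats from dominating the
aggregated mean.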

Example: inspecting per-datapoint performance
---------------------------------------------

.. code-block:: python

   single = challenge.results_single_
   print(single.head())

   # Sort by RMSE against reference PEP (column name depends on scoring)
   single_sorted = single.sort_values("rmse_pep")
   print(single_sorted[["participant", "condition", "rmse_pep"]].head())
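
Because the metric column name depends on the scoring function, it can
help to check for the column before sorting. The following sketch uses a
small stand-in DataFrame instead of ``challenge.results_single_``; the
column names and values are assumptions, not pepbench output:

.. code-block:: python

   import pandas as pd

   # Stand-in for challenge.results_single_ (made-up values)
   single = pd.DataFrame(
       {
           "participant": ["VP_001", "VP_002", "VP_003"],
           "condition": ["rest", "rest", "stress"],
           "rmse_pep": [12.5, 3.1, 7.8],
       }
   )

   metric = "rmse_pep"  # assumed name; inspect single.columns to confirm
   if metric not in single.columns:
       raise KeyError(f"{metric!r} not found, available: {list(single.columns)}")

   # The two datapoints with the largest error, worst first
   worst = single.nlargest(2, metric)
   print(worst)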

Example: using ChallengeResults directly
----------------------------------------

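If you call :func:`~pepbench.evaluation.score_pep_evaluation` manually or
in custom workflows, it returns a
:class:`~pepbench.evaluation.ChallengeResults` object:

.. code-block:: python

   from pepbench.evaluation import ChallengeResults, score_pep_evaluation

   # Score a single datapoint, here the first one in the dataset
   datapoint = next(iter(ds))

   results: ChallengeResults = score_pep_evaluation(
       pipeline=pipeline,
       datapoint=datapoint,
   )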

   agg_mean_std = results.agg_mean_std
   agg_total = results.agg_total
   per_sample = results.per_sample
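
What "tuple-like" means in practice can be illustrated with a standard
:class:`typing.NamedTuple`. ``SketchResults`` below is a hypothetical
stand-in that mirrors the three fields used above; whether
``ChallengeResults`` is implemented this way is an assumption:

.. code-block:: python

   from typing import Any, NamedTuple


   class SketchResults(NamedTuple):
       """Hypothetical stand-in, not the pepbench class."""

       agg_mean_std: Any
       agg_total: Any
       per_sample: Any


   # Made-up placeholder values instead of real DataFrames
   results = SketchResults(
       agg_mean_std={"rmse_pep": 5.0},
       agg_total={"n_valid": 10},
       per_sample=[1.0, 2.0],
   )

   # Fields can be read by name ...
   by_name = results.per_sample

   # ... or unpacked positionally, like a regular tuple
   agg_mean_std, agg_total, per_sample = results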

Saving results to disk
----------------------

The challenge can write its results to disk:

.. code-block:: python

   challenge.save_results(
       folder_path="results/2025-01-01",
       filename_stub="lozano_qvanlien",
   )

This creates files (e.g. CSVs) with aggregated and per-sample metrics,
which is convenient for papers or further statistical analysis.
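
For further analysis, the saved files can be read back with pandas. The
sketch below assumes CSV output and a file name starting with the
``filename_stub``; the exact file names written by ``save_results`` are
not specified here, so the example first writes a stand-in file into a
temporary folder:

.. code-block:: python

   import tempfile
   from pathlib import Path

   import pandas as pd

   with tempfile.TemporaryDirectory() as tmp:
       folder = Path(tmp)

       # Stand-in for a file written by save_results (hypothetical name)
       pd.DataFrame(
           {"participant": ["VP_001", "VP_002"], "rmse_pep": [12.5, 3.1]}
       ).to_csv(folder / "lozano_qvanlien_results_single.csv", index=False)

       # Load every CSV that starts with the stub, keyed by file name
       loaded = {
           path.stem: pd.read_csv(path)
           for path in sorted(folder.glob("lozano_qvanlien*.csv"))
       }

   print(list(loaded))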

Plotting signals and results
----------------------------

pepbench provides plotting helpers such as
:func:`pepbench.plotting.plot_signals_from_challenge_results`, which
visualizes ECG/ICG signals together with the estimated and reference PEP:

.. code-block:: python

   from pepbench.plotting import plot_signals_from_challenge_results

   datapoint = next(iter(ds))
   pep_per_sample = challenge.results_per_sample_.loc[datapoint.index_as_tuples()[0]]

   fig, axes = plot_signals_from_challenge_results(
       datapoint=datapoint,
       pep_results_per_sample=pep_per_sample,
       normalize_time=True,
       add_pep=True,
   )

   fig.suptitle("Example PEP extraction vs reference")

For more plotting options, see the Plotting API reference.