.. _user_guide_pipelines:

Building and Customizing PEP Extraction Pipelines
=================================================

A **PEP extraction pipeline** in pepbench is a ``tpcp`` pipeline that
chains heartbeat segmentation, Q-peak detection, C- and B-point
detection, optional outlier correction, and finally the computation of
the pre-ejection period (PEP) for each heartbeat.
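
The final step is plain arithmetic: for each heartbeat, the PEP is the
interval from the Q-peak (ECG) to the B-point (ICG). The following
stdlib-only sketch illustrates just that step. It assumes that events are
given as sample indices, one per heartbeat and already aligned, with
``None`` marking an undetected event; negative intervals become NaN,
loosely mirroring the ``handle_negative_pep="nan"`` option used in the
minimal pipeline below. The helper ``compute_pep_ms`` is hypothetical and
not part of the pepbench API.

.. code-block:: python

   import math


   def compute_pep_ms(q_peak_samples, b_point_samples, sampling_rate_hz):
       """Hypothetical helper: one PEP value in ms per heartbeat.

       PEP is taken as (B-point - Q-peak) converted from samples to ms.
       Missing events (None) and negative intervals both yield NaN.
       """
       pep_ms = []
       for q, b in zip(q_peak_samples, b_point_samples):
           if q is None or b is None:
               pep_ms.append(math.nan)  # an event could not be detected
               continue
           value = (b - q) * 1000.0 / sampling_rate_hz
           pep_ms.append(value if value >= 0 else math.nan)  # negative PEP -> NaN
       return pep_ms


   # Three heartbeats at 1000 Hz; in the third, the B-point precedes the Q-peak.
   print(compute_pep_ms([100, 1100, 2100], [205, 1190, 2050], sampling_rate_hz=1000))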

The main class is :class:`pepbench.pipelines.PepExtractionPipeline`.

Conceptual structure
--------------------

A pipeline is configured by *choosing algorithms* for each step:

* ``heartbeat_segmentation_algo`` – heartbeat boundaries from the ECG
* ``q_peak_algo`` – Q-peaks in the ECG
* ``b_point_algo`` – B-points in the ICG
* ``c_point_algo`` – C-points in the ICG (optional, but required by some B-point algorithms)
* ``outlier_correction_algo`` – optional post-processing of the detected B-points

The pipeline then provides:

* methods: :meth:`~pepbench.pipelines.PepExtractionPipeline.run`,
  :meth:`~pepbench.pipelines.PepExtractionPipeline.safe_run`
* result attributes: ``heartbeat_segmentation_results_``, ``q_peak_results_``,
  ``c_point_results_``, ``b_point_results_``,
  ``b_point_after_outlier_correction_results_``, ``pep_results_``
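
The run-then-read-attributes pattern can be illustrated without pepbench.
The class below is a hypothetical toy, not pepbench's implementation:
step algorithms are passed to the constructor as parameters, and ``run``
stores every intermediate result in an attribute ending with ``_`` and
returns ``self``. The "detector" is a placeholder, and beats are paired
simply by order rather than through a heartbeat segmentation.

.. code-block:: python

   class ToyPepPipeline:
       """Hypothetical stand-in mimicking the tpcp conventions (not pepbench code)."""

       def __init__(self, q_peak_algo, b_point_algo):
           # Parameters are stored as given; no computation happens here.
           self.q_peak_algo = q_peak_algo
           self.b_point_algo = b_point_algo

       def run(self, datapoint):
           # Every step's output is stored in an attribute with a trailing "_".
           self.q_peak_results_ = self.q_peak_algo(datapoint["ecg"])
           self.b_point_results_ = self.b_point_algo(datapoint["icg"])
           # Toy "PEP" in samples: B-point index minus Q-peak index, beat by beat.
           self.pep_results_ = [
               b - q for q, b in zip(self.q_peak_results_, self.b_point_results_)
           ]
           return self  # enables pipeline.run(dp).pep_results_


   def local_maxima(signal):
       """Placeholder detector: indices of strict local maxima."""
       return [
           i for i in range(1, len(signal) - 1)
           if signal[i - 1] < signal[i] > signal[i + 1]
       ]


   dp = {"ecg": [0, 3, 0, 0, 4, 0], "icg": [0, 0, 5, 0, 0, 6, 0]}
   pipe = ToyPepPipeline(q_peak_algo=local_maxima, b_point_algo=local_maxima)
   print(pipe.run(dp).pep_results_)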

A minimal pipeline
------------------

.. code-block:: python

   from pepbench.algorithms.heartbeat_segmentation import HeartbeatSegmentationNeurokit
   from pepbench.algorithms.ecg import QPeakExtractionVanLien2013
   from pepbench.algorithms.icg import (
       BPointExtractionLozano2007LinearRegression,
       CPointExtractionScipyFindPeaks,
   )
   from pepbench.algorithms.outlier_correction import OutlierCorrectionLinearInterpolation
   from pepbench.pipelines import PepExtractionPipeline
   from pepbench.datasets import EmpkinsDataset

   # 1. Load a dataset
   ds = EmpkinsDataset(
       base_path="/path/to/empkins",
       only_labeled=True,
       exclude_missing_data=True,
   )

   datapoint = next(iter(ds))  # single datapoint

   # 2. Configure algorithms
   heartbeat_algo = HeartbeatSegmentationNeurokit()
   q_algo = QPeakExtractionVanLien2013(time_interval_ms=40)
   c_algo = CPointExtractionScipyFindPeaks()
   b_algo = BPointExtractionLozano2007LinearRegression()
   outlier_algo = OutlierCorrectionLinearInterpolation()

   # 3. Build the pipeline
   pipeline = PepExtractionPipeline(
       heartbeat_segmentation_algo=heartbeat_algo,
       q_peak_algo=q_algo,
       b_point_algo=b_algo,
       c_point_algo=c_algo,
       outlier_correction_algo=outlier_algo,
       handle_negative_pep="nan",
       handle_missing_events="warn",
   )

   # 4. Run on a single datapoint
   pipeline = pipeline.safe_run(datapoint)

   pep_df = pipeline.pep_results_
   print(pep_df.head())

Why ``safe_run``?
-----------------

:meth:`~pepbench.pipelines.PepExtractionPipeline.safe_run` wraps
:meth:`~pepbench.pipelines.PepExtractionPipeline.run` with additional
sanity checks:

* verifies that ``run`` returns ``self``
* checks that result attributes are set and follow the ``*_`` naming convention
* checks that input parameters are not mutated

When experimenting with custom pipelines or algorithms, prefer
``safe_run``. Once your setup is stable, you can call ``run`` directly
to skip the extra checks and their small overhead.
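
To make the three checks concrete, here is a heavily simplified,
hypothetical re-implementation. It is not tpcp's actual code, which is
more thorough; it assumes a freshly constructed pipeline whose attributes
are exactly its parameters, and parameter values comparable with ``==``.
It snapshots the parameters, calls ``run``, and then verifies the return
value, the new result attributes, and that no parameter changed.

.. code-block:: python

   import copy

   _MISSING = object()  # sentinel for attributes that disappear during run


   def safe_run_sketch(pipeline, datapoint):
       """Hypothetical, simplified version of the safe_run checks (not tpcp's code)."""
       params_before = copy.deepcopy(vars(pipeline))
       result = pipeline.run(datapoint)

       # Check 1: run must return the pipeline itself.
       if result is not pipeline:
           raise ValueError("run must return self")

       # Check 2: run must set at least one new attribute, and all new ones end in "_".
       new_attrs = set(vars(pipeline)) - set(params_before)
       if not new_attrs or not all(name.endswith("_") for name in new_attrs):
           raise ValueError(f"unexpected result attributes: {sorted(new_attrs)}")

       # Check 3: input parameters must not be modified by run.
       for name, value in params_before.items():
           if getattr(pipeline, name, _MISSING) != value:
               raise ValueError(f"parameter {name!r} was modified during run")
       return pipeline


   class ScalePipeline:
       def __init__(self, scale=2):
           self.scale = scale

       def run(self, datapoint):
           self.result_ = [x * self.scale for x in datapoint]
           return self


   class ForgetsToReturnSelf(ScalePipeline):
       def run(self, datapoint):
           self.result_ = datapoint  # bug: no "return self"


   print(safe_run_sketch(ScalePipeline(), [1, 2, 3]).result_)
   try:
       safe_run_sketch(ForgetsToReturnSelf(), [1, 2, 3])
   except ValueError as err:
       print(err)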

Inspecting intermediate results
-------------------------------

Because the pipeline exposes results per step, you can inspect or plot
intermediate stages:

.. code-block:: python

   hb = pipeline.heartbeat_segmentation_results_
   q = pipeline.q_peak_results_
   c = pipeline.c_point_results_
   b_raw = pipeline.b_point_results_
   b_corr = pipeline.b_point_after_outlier_correction_results_
   pep = pipeline.pep_results_

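   # Example: attach the heartbeat table to the PEP results (both indexed per
   # heartbeat); rsuffix avoids a pandas error if the tables share column names
   pep_with_hb = pep.join(hb, how="left", rsuffix="_hb")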

Configuring parameters
----------------------

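All algorithms and the pipeline follow the tpcp ``get_params``
/ ``set_params`` convention. Parameters of nested algorithms are
addressed as ``<step>__<parameter>``, e.g. ``q_peak_algo__time_interval_ms``.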

.. code-block:: python

   # Inspect all parameters (including nested algorithms)
   print(pipeline.get_params())

   # Change the Q-peak offset
   pipeline = pipeline.set_params(q_peak_algo__time_interval_ms=50)

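   # Change outlier correction behavior (available parameter names depend on
   # the chosen outlier correction algorithm; check the get_params() output)
   pipeline = pipeline.set_params(
       outlier_correction_algo__max_gap_beats=3,
   )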

After changing parameters, call ``safe_run`` again to recompute all results.
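
The ``<step>__<parameter>`` keys follow the double-underscore convention
familiar from scikit-learn: the part before ``__`` selects a nested
object, the part after it names a parameter of that object. The helper
below is a hypothetical illustration (not tpcp's implementation, and
without its validation logic) of how such a key can be resolved with plain
attribute access. The toy classes reuse the parameter name from the
example above purely for illustration.

.. code-block:: python

   def set_nested_param(obj, key, value):
       """Hypothetical helper: resolve "a__b__c" to obj.a.b.c and set it."""
       *path, param_name = key.split("__")
       target = obj
       for attr in path:
           target = getattr(target, attr)  # walk down into the nested algorithm
       if not hasattr(target, param_name):
           raise ValueError(f"{type(target).__name__} has no parameter {param_name!r}")
       setattr(target, param_name, value)
       return obj


   class ToyQPeakAlgo:
       def __init__(self, time_interval_ms=40):
           self.time_interval_ms = time_interval_ms


   class ToyPipeline:
       def __init__(self, q_peak_algo):
           self.q_peak_algo = q_peak_algo


   pipe = set_nested_param(ToyPipeline(ToyQPeakAlgo()), "q_peak_algo__time_interval_ms", 50)
   print(pipe.q_peak_algo.time_interval_ms)

Rejecting unknown names, as the sketch does, catches typos early instead
of silently creating a new attribute.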

Working with multiple datapoints
--------------------------------

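A pipeline instance is not tied to a particular dataset: the same
configuration can be applied to every datapoint. Because ``run`` stores
its results on the instance, clone the pipeline for each datapoint so
that results from different datapoints do not overwrite each other.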

.. code-block:: python

   pep_results = []
   for dp in ds:
       # clone() returns a fresh copy with the same parameters but no results
       res = pipeline.clone().safe_run(dp)
       pep_results.append(res.pep_results_)

For large evaluation runs, you will normally let
:class:`PepEvaluationChallenge` handle this loop (see
:ref:`user_guide_evaluation`).
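
Why clone before every run? The stdlib sketch below shows what goes wrong
without it. ``toy_clone`` is a hypothetical stand-in for ``clone``: like
tpcp's version, it builds a new instance from deep-copied constructor
parameters and does not carry over result attributes, but it is not the
library's implementation.

.. code-block:: python

   import copy
   import inspect


   def toy_clone(obj):
       """Hypothetical clone: new instance from deep-copied constructor parameters.

       Result attributes (names ending in "_") are not carried over.
       """
       names = inspect.signature(type(obj).__init__).parameters
       params = {n: copy.deepcopy(getattr(obj, n)) for n in names if n != "self"}
       return type(obj)(**params)


   class OffsetPipeline:
       """Hypothetical toy pipeline that stores its output on the instance."""

       def __init__(self, offset=0):
           self.offset = offset

       def run(self, datapoint):
           self.pep_results_ = [x + self.offset for x in datapoint]
           return self


   pipeline = OffsetPipeline(offset=10)
   datapoints = [[1, 2], [3, 4]]

   # Reusing one instance: both list entries are the same object,
   # so only the results of the last datapoint survive.
   shared = [pipeline.run(dp) for dp in datapoints]
   print([r.pep_results_ for r in shared])  # [[13, 14], [13, 14]]

   # Cloning first: each datapoint gets its own instance and keeps its own results.
   cloned = [toy_clone(pipeline).run(dp) for dp in datapoints]
   print([r.pep_results_ for r in cloned])  # [[11, 12], [13, 14]]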

``PepExtractionPipelineReferenceQPeaks`` and ``PepExtractionPipelineReferenceBPoints``
--------------------------------------------------------------------------------------

pepbench also includes convenience pipelines that use **reference
annotations** instead of algorithmic detection for either Q-peaks or
B-points.

These are useful for:

* upper-bound comparisons: evaluating one detection step while the other
  event is taken from the “perfect” reference annotations
* sanity checks of datasets and of the scoring logic
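
Using reference Q-peaks isolates the B-point step. Write :math:`t_Q` and
:math:`t_B` for the Q-peak and B-point times of one heartbeat, take the
PEP as :math:`t_B - t_Q`, and let "alg" denote the pipeline with
reference Q-peaks and algorithmic B-points. The Q-peak term then cancels
in the PEP error:

.. math::

   \mathrm{PEP}_\mathrm{alg} - \mathrm{PEP}_\mathrm{ref}
   = \left(t_B^\mathrm{alg} - t_Q^\mathrm{ref}\right) - \left(t_B^\mathrm{ref} - t_Q^\mathrm{ref}\right)
   = t_B^\mathrm{alg} - t_B^\mathrm{ref}

Any remaining PEP error is therefore exactly the B-point detection error.
The same argument with reference B-points isolates the Q-peak algorithm.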

See the API reference of :mod:`pepbench.pipelines` for the parameters of
these specialized classes.